Why Tune Software Configurations?
Software systems often have hundreds of parameters, yet many remain at suboptimal default settings.
- Unrealised Potential: A large share of configuration options are often left unused at their defaults.
- Performance Impact: Default settings can be significantly worse than optimal ones; for example, Apache Storm’s default configuration is 480 times slower than the best possible setting.
- Financial Impact: In sectors like finance, reducing latency by just 1 ms can result in $100 million in additional annual revenue.
The Challenge:
- Complexity: The number of options is growing with every software release.
- Search Space Explosion: Even 20 binary options yield over a million possible configurations (2^20 ≈ 1.05 million).
- Slow Evaluation: Measuring a single configuration can take hours or even days.
- Conflicting Objectives: Improving latency might negatively impact throughput or CPU load.
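The search-space explosion is easy to make concrete: the number of configurations is the product of the choices per option. A few lines of Python (with illustrative option counts) show how quickly it grows:

```python
def space_size(choices_per_option):
    """Number of distinct configurations for a set of independent options."""
    size = 1
    for choices in choices_per_option:
        size *= choices
    return size

# 20 binary (on/off) options already exceed a million configurations
print(space_size([2] * 20))            # 1048576
# mixing in a few multi-valued options makes it far worse
print(space_size([2] * 10 + [5] * 5))  # 3200000
```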
Model-Free Tuning Approaches
Model-free methods rely on direct measurement and search algorithms rather than on a predictive model.
NSGA-II for Multi-Objective Tuning
Used for frameworks such as ORMs, this approach focuses on finding a balance between execution time, CPU usage, and memory consumption.
- Stopping Criteria: It is crucial to know when to stop searching to save resources.
- t-test: Compare objective values between consecutive generations; a high p-value (e.g. p > 0.05) suggests the changes are statistically insignificant.
- Mutual Dominance Rate (MDR): Measures the progress of the algorithm by comparing the fraction of configurations in the previous set (A) dominated by the current set (B) against the reverse fraction.
- MDR > 0: Progress is being made.
- MDR = 0: No progress.
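The MDR check above fits in a few lines. The difference-of-fractions formulation below is one common way to compute it (objectives are assumed to be minimised):

```python
def dominates(p, q):
    # p dominates q (minimisation): no worse in every objective,
    # strictly better in at least one
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def mutual_dominance_rate(prev_front, curr_front):
    # Fraction of the previous generation (A) dominated by the current one (B),
    # minus the fraction of the current generation dominated by the previous.
    # Positive => the search is still improving; zero => it has stalled.
    a_dominated = sum(any(dominates(b, a) for b in curr_front) for a in prev_front)
    b_dominated = sum(any(dominates(a, b) for a in prev_front) for b in curr_front)
    return a_dominated / len(prev_front) - b_dominated / len(curr_front)

# The new front strictly improves on the old one -> MDR = 1 (clear progress)
print(mutual_dominance_rate([(4, 4), (5, 3)], [(2, 2), (3, 1)]))
# Identical fronts -> MDR = 0 (time to stop)
print(mutual_dominance_rate([(2, 2)], [(2, 2)]))
```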
BestConfig: Divide and Search
Focuses on finding the single best configuration through local search.
- Divide & Diverge Sampling (DDS): Instead of random sampling, the space is divided into subspaces to ensure all regions are represented.
- Recursive Bound & Search (RBS):
- Find the best-performing point in the initial sample.
- Define a bounded space around that point based on its closest neighbors.
- Recursively sample within this smaller bound until no further improvement is found.
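The two ideas can be sketched as follows, assuming numeric options with known ranges. The bounding step here simply halves the interval around the incumbent in each dimension, a simplification of BestConfig's nearest-neighbor bound:

```python
import random

def dds_sample(bounds, k):
    # Divide & Diverge Sampling: split every dimension into k intervals and,
    # Latin-hypercube style, draw k points so that each interval of each
    # dimension is represented exactly once.
    perms = [random.sample(range(k), k) for _ in bounds]
    samples = []
    for i in range(k):
        point = []
        for d, (lo, hi) in enumerate(bounds):
            step = (hi - lo) / k
            point.append(lo + (perms[d][i] + random.random()) * step)
        samples.append(tuple(point))
    return samples

def recursive_bound_search(measure, bounds, k=6, rounds=5):
    # Recursive Bound & Search: sample, keep the best point seen so far,
    # shrink the bounds around it, and resample inside the smaller region.
    best_x, best_y = None, float("inf")
    for _ in range(rounds):
        for x in dds_sample(bounds, k):
            y = measure(x)
            if y < best_y:
                best_x, best_y = x, y
        bounds = [(max(lo, bx - (hi - lo) / 4), min(hi, bx + (hi - lo) / 4))
                  for (lo, hi), bx in zip(bounds, best_x)]
    return best_x, best_y

random.seed(1)
# toy "measurement": pretend the optimal configuration sits at (3, 1)
best_x, best_y = recursive_bound_search(
    lambda x: (x[0] - 3) ** 2 + (x[1] - 1) ** 2, [(0, 10), (0, 10)])
print(best_x, best_y)
```

In a real tuner `measure` would deploy the configuration and benchmark it, which is exactly why cutting the number of samples matters.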
Model-Based Tuning: FLASH
FLASH uses Bayesian Optimisation but replaces the standard Gaussian Process (GP) with CART (Classification and Regression Trees).
Why use CART over Gaussian Process?
- Scalability: GP usually fails when there are more than 10 dimensions (options).
- Non-Smoothness: GP assumes that similar configurations have similar performance. In software, one small change (e.g., switching from a linked list to a B-tree) can cause a massive, non-linear jump in performance. CART handles these “cliffs” better.
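The FLASH loop itself is short. The sketch below hand-rolls a tiny CART regressor to stay dependency-free (a real implementation would use a library tree such as scikit-learn's `DecisionTreeRegressor`); the configuration pool and performance function are toy stand-ins:

```python
def fit_cart(X, y, depth=3, min_leaf=2):
    # Minimal regression tree: recursively pick the (feature, threshold)
    # split that minimises the summed squared error of the two halves.
    if depth == 0 or len(y) <= min_leaf or len(set(y)) == 1:
        return sum(y) / len(y)                     # leaf = mean of its samples
    best = None
    for j in range(len(X[0])):
        for t in {x[j] for x in X}:
            left = [i for i, x in enumerate(X) if x[j] <= t]
            right = [i for i, x in enumerate(X) if x[j] > t]
            if not left or not right:
                continue
            sse = 0.0
            for idx in (left, right):
                m = sum(y[i] for i in idx) / len(idx)
                sse += sum((y[i] - m) ** 2 for i in idx)
            if best is None or sse < best[0]:
                best = (sse, j, t, left, right)
    if best is None:
        return sum(y) / len(y)
    _, j, t, li, ri = best
    grow = lambda idx: fit_cart([X[i] for i in idx], [y[i] for i in idx],
                                depth - 1, min_leaf)
    return (j, t, grow(li), grow(ri))

def predict(tree, x):
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if x[j] <= t else right
    return tree

def flash(pool, measure, init=5, budget=15):
    # FLASH: measure a few configurations, fit a CART surrogate on them, then
    # repeatedly measure the unevaluated configuration the surrogate predicts
    # to be best (minimisation), refitting after every measurement.
    pool = list(pool)
    evaluated = {c: measure(c) for c in pool[:init]}
    while len(evaluated) < min(budget, len(pool)):
        tree = fit_cart(list(evaluated), list(evaluated.values()))
        rest = [c for c in pool if c not in evaluated]
        nxt = min(rest, key=lambda c: predict(tree, c))
        evaluated[nxt] = measure(nxt)
    return min(evaluated, key=evaluated.get)

pool = [(a, b) for a in range(8) for b in range(8)]
best = flash(pool, lambda c: (c[0] - 5) ** 2 + (c[1] - 2) ** 2)
print(best)
```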
Multi-Objective FLASH (Bazza Algorithm)
For multiple objectives, FLASH uses the Bazza algorithm to score configurations:
TIP
This formula basically finds the configuration that performs best across many different random “weightings” of the objectives, ensuring a balanced solution.
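One illustrative reading of that idea (not the exact Bazza formula) is a sampling-based scalarisation: draw many random weight vectors and count how often each configuration has the best weighted sum of its (minimised) objectives:

```python
import random

def weighted_wins(points, n_weights=200, seed=0):
    # For many random weightings of the objectives, record which point has
    # the best (lowest) weighted sum; points that win often are balanced
    # performers rather than specialists in a single objective.
    rng = random.Random(seed)
    m = len(points[0])
    wins = [0] * len(points)
    for _ in range(n_weights):
        w = [rng.random() for _ in range(m)]
        total = sum(w)
        w = [wi / total for wi in w]
        scores = [sum(wi * oi for wi, oi in zip(w, p)) for p in points]
        wins[scores.index(min(scores))] += 1
    return wins

# (latency, memory) for three configurations; the dominated one never wins
print(weighted_wins([(1.0, 1.0), (10.0, 10.0), (0.5, 5.0)]))
```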
Advanced Research Threads
Cross-Environment Tuning (OnlineTune)
Most tuners assume the environment is static. OnlineTune adapts to dynamic workloads (e.g., changing SQL query rates).
- Contextual GP: Incorporates “context” (query composition, arrival rate) into the model.
- Safety Assessment: Before trying a configuration, it uses the model to ensure the configuration is “safe” (won’t cause system failure).
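A hedged sketch of the safety gate: the contextual model and its uncertainty are stand-ins (the names `predict_mean_std` and `baseline_latency` are invented for illustration), but the shape of the check follows the description above:

```python
def safe_candidates(candidates, context, predict_mean_std, baseline_latency,
                    margin=0.1, kappa=2.0):
    # Keep only configurations whose pessimistic predicted latency
    # (mean + kappa * std from the contextual model) stays within
    # (1 + margin) of a known-safe baseline under the CURRENT context.
    safe = []
    for cfg in candidates:
        mean, std = predict_mean_std(cfg, context)  # context = workload features
        if mean + kappa * std <= baseline_latency * (1 + margin):
            safe.append(cfg)
    return safe

# toy model: latency grows with the config knob and the query arrival rate
model = lambda cfg, ctx: (cfg * ctx["arrival_rate"], 0.5)
print(safe_candidates([1, 2, 3], {"arrival_rate": 4.0},
                      model, baseline_latency=10.0))
```

Because the context enters the prediction, a configuration judged safe at a low arrival rate can be rejected when the workload spikes.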
Cost-Aware Tuning (FlexiBO)
Evaluations are not always equal in cost. For example, measuring energy consumption might be faster than measuring deep learning accuracy.
- Decoupled Approach: FlexiBO evaluates different objectives at different frequencies based on their cost.
- Acquisition Function: It divides the expected change in the Pareto front’s volume (ΔV) by the evaluation cost (C), selecting the next measurement that maximises ΔV / C.
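The decoupled choice can then be sketched as picking, among (configuration, objective) pairs, the one with the largest expected volume change per unit cost. The candidate tuples below are illustrative numbers, not FlexiBO's exact bookkeeping:

```python
def next_evaluation(candidates):
    # candidates: (config, objective, expected_delta_volume, cost)
    # Cost-aware rule: maximise Pareto-volume gained per unit of evaluation
    # cost, so a cheap objective gets probed often and an expensive one rarely.
    return max(candidates, key=lambda c: c[2] / c[3])

candidates = [
    ("cfg-A", "energy",   0.20, 1.0),   # cheap energy measurement
    ("cfg-A", "accuracy", 0.90, 30.0),  # expensive DNN training run
    ("cfg-B", "energy",   0.05, 1.0),
]
print(next_evaluation(candidates))  # ('cfg-A', 'energy', 0.2, 1.0)
```

Note that the accuracy evaluation promises the largest raw volume gain (0.90), but its cost makes the cheap energy measurement the better next step.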