# Evaluation

The crate includes an overhead evaluation to measure the efficiency of the set reconciliation algorithm.
Below are instructions on how to run the evaluation.

_Note that all of these commands should be run from the crate's root directory!_

## Overhead Evaluation

The overhead evaluation measures how many coded symbols are required to successfully decode set differences of various sizes.
The key metric is the **overhead multiplier**: the ratio of coded symbols needed to the actual diff size.
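The multiplier is a plain ratio. As a quick illustration (the numbers below are made up for the example, not benchmark output):

```python
# Illustrative only: suppose decoding a diff of 40 elements needed 54 coded symbols.
diff_size = 40
coded_symbols = 54
overhead = coded_symbols / diff_size
print(f"overhead = {overhead:.2f}x")  # overhead = 1.35x
```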

See [overhead results](evaluation/overhead.md) for the predefined configurations.

### Running the Evaluation

1. Ensure R and ImageMagick are installed with the necessary packages:

   - Install R from [CRAN](https://cran.r-project.org/) or using your platform's package manager.
   - Start `R` and install the required packages by executing `install.packages(c('dplyr', 'ggplot2', 'readr', 'stringr', 'scales'))`.
   - Install ImageMagick using the official [instructions](https://imagemagick.org/script/download.php).
   - Install Ghostscript for PDF conversion: `brew install ghostscript` (macOS) or `apt install ghostscript` (Linux).

2. Run the benchmark tool directly to see the available options:

        cargo run --release --features bin --bin hriblt-bench -- --help

   Example: run 10,000 trials with a set size of 1000 and diff sizes from 1 to 100:

        cargo run --release --features bin --bin hriblt-bench -- \
          --trials 10000 \
          --set-size 1000 \
          --diff-size '1..101' \
          --diff-mode incremental \
          --tsv

3. To generate a plot from TSV output:

        cargo run --release --features bin --bin hriblt-bench -- \
          --trials 10000 --diff-size '1..101' --diff-mode incremental --tsv > overhead.tsv

        evaluation/plot-overhead.r overhead.tsv overhead.pdf

   If the script is not marked executable on your system, invoke it via `Rscript evaluation/plot-overhead.r overhead.tsv overhead.pdf` instead.

### TSV Output Format

When using `--tsv`, the benchmark outputs tab-separated data with the following columns:

| Column | Description |
|--------|-------------|
| `trial` | Trial number (1-indexed) |
| `set_size` | Number of elements in each set |
| `diff_size` | Number of differences between sets |
| `success` | Whether decoding succeeded (`true`/`false`) |
| `coded_symbols` | Number of coded symbols needed to decode |
| `overhead` | Ratio of `coded_symbols` to `diff_size` |

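For a quick look at the data without R, a minimal Python sketch can summarize the median overhead per diff size. It assumes a TSV file with the columns above (the file name `overhead.tsv` matches the example run earlier, but any path works):

```python
import csv
from statistics import median

def median_overhead_by_diff_size(path):
    """Group the `overhead` column by `diff_size` and return the median per size.

    Assumes a tab-separated file with a header row using the column
    names described above.
    """
    by_diff = {}
    with open(path, newline="") as f:
        for row in csv.DictReader(f, delimiter="\t"):
            by_diff.setdefault(int(row["diff_size"]), []).append(float(row["overhead"]))
    return {size: median(values) for size, values in sorted(by_diff.items())}

# Example usage (assuming a benchmark run wrote overhead.tsv):
# for size, med in median_overhead_by_diff_size("overhead.tsv").items():
#     print(f"diff_size={size}  median overhead={med:.2f}x")
```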
### Regenerating Documentation Plots

To regenerate the plots in the repository, ensure ImageMagick is available, and run:

    scripts/generate-overhead-plots

_Note that this will always re-run the benchmark!_

## Understanding the Results

The overhead plot shows percentiles of the overhead multiplier across different diff sizes:

- **Gray region**: full range (p0 to p99), from the minimum to roughly the maximum overhead observed
- **Blue region**: interquartile range (p25 to p75), where 50% of trials fall
- **Dark line**: median (p50), the typical overhead
| 72 | + |
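These bands can be recomputed from the TSV output with Python's `statistics` module. A minimal sketch, assuming the overhead values for a single diff size have already been collected into a list (the values below are illustrative, not real benchmark data):

```python
from statistics import median, quantiles

# Illustrative overhead values for one diff size (not real benchmark data).
overheads = [1.2, 1.3, 1.3, 1.4, 1.5, 1.7, 2.0, 2.5]

# quantiles(..., n=100) returns the 99 cut points p1 through p99.
cuts = quantiles(overheads, n=100)
p25, p75, p99 = cuts[24], cuts[74], cuts[98]
p50 = median(overheads)
print(f"median {p50:.2f}x, IQR {p25:.2f}x-{p75:.2f}x, p99 {p99:.2f}x")
```

Note that with small samples `quantiles` interpolates, so extreme percentiles such as p99 may fall outside the observed values.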
A lower overhead means more efficient encoding. An overhead of 1.0x would mean the number of coded symbols exactly equals the diff size, which is the theoretical minimum.
| 74 | + |
| 75 | +Typical results show: |
| 76 | +- Small diff sizes (1-10) have higher variance and overhead due to the probabilistic nature of the algorithm |
| 77 | +- Larger diff sizes (50+) converge to a more stable overhead around 1.3-1.5x |
| 78 | +- The algorithm successfully decodes 100% of trials when given up to 10x the diff size in coded symbols |