This directory contains code for the Verdict evaluation. There are three main evaluations:
- Eval 1: Performance benchmark against Chrome, Firefox, OpenSSL, ARMOR, CERES, and Hammurabi.
- Eval 2: Differential testing against Chrome, Firefox, and OpenSSL.
- Eval 3: End-to-end HTTPS performance in Rustls.
To build everything and run all of the evaluations:

```
$ git submodule update --init --recursive
$ docker build . -t verdict-bench
$ docker run -it --cap-add=NET_ADMIN verdict-bench
(container) $ make eval
```
Running `make eval` again will print the results again.
On our test machine with an Intel Core i9-10980XE CPU, `docker build` takes about 1 hour and requires about 120 GB of free disk space,
and `make eval` takes about 3.5 hours (with the given sample of 35,000 chains).
Before `make eval`, you can also run `make test` (which only takes a few minutes)
to run through the same evaluation but without the multiple samples used for accuracy.
After `make test`, make sure to `rm -rf results` before running `make eval`.
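For example, a typical first session inside the container might look like this:

```
(container) $ make test       # quick sanity check, no repeated samples (a few minutes)
(container) $ rm -rf results  # clear the test results before the full run
(container) $ make eval       # full evaluation (about 3.5 hours)
```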
If you do not need to edit the benchmarking code in any of the tools, the recommended build method is to use Docker.
First run the following to load all git submodules:

```
git submodule update --init --recursive
```

Then run the following to compile all tools and build a standalone image containing all necessary dependencies:

```
docker build . -t verdict-bench
```

This will take a while, since we need to build large projects such as Firefox and Chromium.

The rest of the tutorial assumes that you are in the Docker container:

```
docker run -it --cap-add=NET_ADMIN verdict-bench
```
`--cap-add=NET_ADMIN` is required for the network delay setup in Eval 3.

To make sure that all benchmarks work correctly, run `make test` in the container.
To build a particular X.509 tool, run

```
docker build . --target <tool>-install --output type=tar | (mkdir -p build && tar -C build -x)
```

where `<tool>` is one of `chromium`, `firefox`, `armor`, `ceres`, `hammurabi`, `openssl`, or `verdict`.
The relevant build output will be copied to `build/<tool>` (e.g. `build/chromium/src/out/Release/cert_bench` for Chromium).
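For example, to build just the Chromium harness and check its output binary:

```
# Build only the Chromium target and unpack it into build/
docker build . --target chromium-install --output type=tar | (mkdir -p build && tar -C build -x)
ls build/chromium/src/out/Release/cert_bench
```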
To modify the benchmark harnesses, consider running `cd <tool> && make` directly instead of using the Docker image.
Note, however, that some dependencies will need to be installed on the host system.
Note that for Evals 1 and 2, the full benchmark set of 10M chains from CT logs is not publicly available,
but a sample of 35,000 chains is included in `data/ct-log`.
In general, you can also prepare your own test cases in the following directory structure:

```
test_suite/
- certs/
  - cert-list-part-xx.txt
  - cert-list-part-xx.txt
  - ...
- ints/
  - int1.pem
  - int2.pem
  - ...
```

where each CSV file in `test_suite/certs` should have the following columns (without a header row):

```
<Base64 encoding of the leaf>,<SHA256 hash of the leaf>,<hostname>,<comma separated list of intermediates, e.g. int1,int2>
```
Then, in all the evaluations below, set an additional variable `CT_LOG=test_suite` for each `make` command.
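As a rough illustration, here is a minimal sketch of building a one-entry test suite. It assumes the Base64 and SHA256 columns are computed over the leaf's DER encoding and that intermediates are referenced by file name without the `.pem` extension; `leaf.pem` and `int1.pem` are hypothetical input files:

```
# Sketch only: leaf.pem and int1.pem are hypothetical inputs, and the exact
# encoding of the first two columns (DER-encoded leaf) is an assumption.
mkdir -p test_suite/certs test_suite/ints
cp int1.pem test_suite/ints/int1.pem

leaf_b64=$(openssl x509 -in leaf.pem -outform DER | base64 -w0)
leaf_sha=$(openssl x509 -in leaf.pem -outform DER | sha256sum | cut -d' ' -f1)
echo "$leaf_b64,$leaf_sha,example.com,int1" > test_suite/certs/cert-list-part-00.txt

# Point the evaluations at the custom test suite
make eval-1 CT_LOG=test_suite
```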
To run performance benchmarks on all supported tools:

```
make eval-1
```

At the end, a LaTeX table of performance statistics will be printed.
A PDF containing boxplots of more detailed performance distributions will also be stored at `results/performance.pdf`.
To run individual benchmarks, use

```
make results/bench-<tool>.csv
```

to run `<tool>` on the sample of 35,000 chains in `data/ct-log`, where `<tool>` is one of:
- `verdict-chrome`, `verdict-firefox`, `verdict-openssl` (normal versions of Verdict using verified crypto primitives)
- `verdict-chrome-aws-lc`, `verdict-firefox-aws-lc`, `verdict-openssl-aws-lc` (Verdict using unverified crypto primitives from AWS-LC)
- `chromium`
- `firefox`
- `armor`
- `ceres`
- `hammurabi-chrome`, `hammurabi-firefox` (Hammurabi's Chrome and Firefox policies)
- `openssl`

The results will be saved to `results/bench-<tool>.csv`. Note that, as also mentioned in the paper, for ARMOR and CERES we only sample about 0.1% of the given test cases, and for Hammurabi we only sample 1% of all test cases. To override these settings, make suitable adjustments in the `Makefile`.
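For example, to benchmark only Verdict's Chrome policy (with verified crypto) on the bundled sample:

```
make results/bench-verdict-chrome.csv
```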
To run all differential tests (comparing Verdict's Chrome, Firefox, and OpenSSL policies against their original implementations), run

```
make eval-2
```

At the end, a LaTeX table containing the results will be printed.
More detailed results can be found in `results/diff-<tool>.csv` (differential tests on CT logs)
and `results/limbo-<tool>.csv` (differential tests on x509-limbo).
Similar to Eval 1, to run individual tests, run

```
make results/diff-<tool>.csv
```

or

```
make results/limbo-<tool>.csv
```
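For instance, to compare only the Chrome policy (the exact `<tool>` names here are defined in the `Makefile`; `chrome` below is an assumed value):

```
# Assumed target name; check the Makefile for the actual <tool> values
make results/diff-chrome.csv
```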
To run end-to-end performance tests with Rustls, use

```
make eval-3 [END_TO_END_DELAY=5ms] [END_TO_END_WARMUP=20] [END_TO_END_REPEAT=100]
```

This will run Rustls's `tlsclient-mio` to simulate fetching the first HTTPS response from public domains
(these domains are simulated locally).
The results will be saved to `results/end-to-end-*.csv`, and a summary table will be printed at the end.
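For example, to run with a larger simulated network delay and fewer repetitions (assuming the bracketed variables above are optional overrides, with the listed values as the defaults):

```
make eval-3 END_TO_END_DELAY=20ms END_TO_END_REPEAT=50
```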