Skip to content

Commit c7b2907

Browse files
CICD Refactor (#354)
* Refactored the build processes. Broke things into components as much as possible. We have standalone actions for the build processes to make sure they are consistent across push or PR builds, a format-check that doesn't rely on cmake to be there to work, and centralized our randomized data generation into a single action that can be called in each section. We now are reusing as many of the steps as we can without copy/pasting, which should ensure we're not making mistakes. * Fixing the dynamic tests, the paths to the data were wrong --------- Co-authored-by: yashpatel007 <[email protected]>
1 parent 96aaa1c commit c7b2907

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

71 files changed

+718
-463
lines changed

.github/actions/build/action.yml

Lines changed: 28 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,28 @@
1+
name: 'DiskANN Build Bootstrap'
2+
description: 'Prepares DiskANN build environment and executes build'
3+
runs:
4+
using: "composite"
5+
steps:
6+
# ------------ Linux Build ---------------
7+
- name: Prepare and Execute Build
8+
if: ${{ runner.os == 'Linux' }}
9+
run: |
10+
sudo scripts/dev/install-dev-deps-ubuntu.bash
11+
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
12+
cmake --build build -- -j
13+
cmake --install build --prefix="dist"
14+
shell: bash
15+
# ------------ End Linux Build ---------------
16+
# ------------ Windows Build ---------------
17+
- name: Add VisualStudio command line tools into path
18+
if: runner.os == 'Windows'
19+
uses: ilammy/msvc-dev-cmd@v1
20+
- name: Run configure and build for Windows
21+
if: runner.os == 'Windows'
22+
run: |
23+
mkdir build && cd build && cmake .. && msbuild diskann.sln /m /nologo /t:Build /p:Configuration="Release" /property:Platform="x64" -consoleloggerparameters:"ErrorsOnly;Summary"
24+
cd ..
25+
mkdir dist
26+
mv .\x64\Release\ .\dist\bin
27+
shell: cmd
28+
# ------------ End Windows Build ---------------
Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,13 @@
1+
name: 'Checking code formatting...'
2+
description: 'Ensures code complies with code formatting rules'
3+
runs:
4+
using: "composite"
5+
steps:
6+
- name: Checking code formatting...
7+
run: |
8+
sudo apt install clang-format
9+
find include -name '*.h' -type f -print0 | xargs -0 -P 16 /usr/bin/clang-format --Werror --dry-run
10+
find src -name '*.cpp' -type f -print0 | xargs -0 -P 16 /usr/bin/clang-format --Werror --dry-run
11+
find apps -name '*.cpp' -type f -print0 | xargs -0 -P 16 /usr/bin/clang-format --Werror --dry-run
12+
find python -name '*.cpp' -type f -print0 | xargs -0 -P 16 /usr/bin/clang-format --Werror --dry-run
13+
shell: bash
Lines changed: 35 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,35 @@
1+
name: 'Generating Random Data (Basic)'
2+
description: 'Generates the random data files used in acceptance tests'
3+
runs:
4+
using: "composite"
5+
steps:
6+
- name: Generate Random Data (Basic)
7+
run: |
8+
mkdir data
9+
10+
echo "Generating random vectors for index"
11+
dist/bin/rand_data_gen --data_type float --output_file data/rand_float_10D_10K_norm1.0.bin -D 10 -N 10000 --norm 1.0
12+
dist/bin/rand_data_gen --data_type int8 --output_file data/rand_int8_10D_10K_norm50.0.bin -D 10 -N 10000 --norm 50.0
13+
dist/bin/rand_data_gen --data_type uint8 --output_file data/rand_uint8_10D_10K_norm50.0.bin -D 10 -N 10000 --norm 50.0
14+
15+
echo "Generating random vectors for query"
16+
dist/bin/rand_data_gen --data_type float --output_file data/rand_float_10D_1K_norm1.0.bin -D 10 -N 1000 --norm 1.0
17+
dist/bin/rand_data_gen --data_type int8 --output_file data/rand_int8_10D_1K_norm50.0.bin -D 10 -N 1000 --norm 50.0
18+
dist/bin/rand_data_gen --data_type uint8 --output_file data/rand_uint8_10D_1K_norm50.0.bin -D 10 -N 1000 --norm 50.0
19+
20+
echo "Computing ground truth for floats across l2, mips, and cosine distance functions"
21+
dist/bin/compute_groundtruth --data_type float --dist_fn l2 --base_file data/rand_float_10D_10K_norm1.0.bin --query_file data/rand_float_10D_1K_norm1.0.bin --gt_file data/l2_rand_float_10D_10K_norm1.0_10D_1K_norm1.0_gt100 --K 100
22+
dist/bin/compute_groundtruth --data_type float --dist_fn mips --base_file data/rand_float_10D_10K_norm1.0.bin --query_file data/rand_float_10D_1K_norm1.0.bin --gt_file data/mips_rand_float_10D_10K_norm1.0_10D_1K_norm1.0_gt100 --K 100
23+
dist/bin/compute_groundtruth --data_type float --dist_fn cosine --base_file data/rand_float_10D_10K_norm1.0.bin --query_file data/rand_float_10D_1K_norm1.0.bin --gt_file data/cosine_rand_float_10D_10K_norm1.0_10D_1K_norm1.0_gt100 --K 100
24+
25+
echo "Computing ground truth for int8s across l2, mips, and cosine distance functions"
26+
dist/bin/compute_groundtruth --data_type int8 --dist_fn l2 --base_file data/rand_int8_10D_10K_norm50.0.bin --query_file data/rand_int8_10D_1K_norm50.0.bin --gt_file data/l2_rand_int8_10D_10K_norm50.0_10D_1K_norm50.0_gt100 --K 100
27+
dist/bin/compute_groundtruth --data_type int8 --dist_fn mips --base_file data/rand_int8_10D_10K_norm50.0.bin --query_file data/rand_int8_10D_1K_norm50.0.bin --gt_file data/mips_rand_int8_10D_10K_norm50.0_10D_1K_norm50.0_gt100 --K 100
28+
dist/bin/compute_groundtruth --data_type int8 --dist_fn cosine --base_file data/rand_int8_10D_10K_norm50.0.bin --query_file data/rand_int8_10D_1K_norm50.0.bin --gt_file data/cosine_rand_int8_10D_10K_norm50.0_10D_1K_norm50.0_gt100 --K 100
29+
30+
echo "Computing ground truth for uint8s across l2, mips, and cosine distance functions"
31+
dist/bin/compute_groundtruth --data_type uint8 --dist_fn l2 --base_file data/rand_uint8_10D_10K_norm50.0.bin --query_file data/rand_uint8_10D_1K_norm50.0.bin --gt_file data/l2_rand_uint8_10D_10K_norm50.0_10D_1K_norm50.0_gt100 --K 100
32+
dist/bin/compute_groundtruth --data_type uint8 --dist_fn mips --base_file data/rand_uint8_10D_10K_norm50.0.bin --query_file data/rand_uint8_10D_1K_norm50.0.bin --gt_file data/mips_rand_uint8_10D_10K_norm50.0_10D_1K_norm50.0_gt100 --K 100
33+
dist/bin/compute_groundtruth --data_type uint8 --dist_fn cosine --base_file data/rand_uint8_10D_10K_norm50.0.bin --query_file data/rand_uint8_10D_1K_norm50.0.bin --gt_file data/cosine_rand_uint8_10D_10K_norm50.0_10D_1K_norm50.0_gt100 --K 100
34+
35+
shell: bash

.github/workflows/build-python.yml

Lines changed: 30 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,30 @@
1+
name: DiskANN Build Python Wheel
2+
on: [workflow_call]
3+
jobs:
4+
build:
5+
name: Build for ${{matrix.python-version}} on ${{matrix.os}}
6+
strategy:
7+
fail-fast: false
8+
matrix:
9+
os: [ubuntu-latest]
10+
python-version: ["3.7", "3.8", "3.9", "3.10", "3.11"]
11+
runs-on: ${{matrix.os}}
12+
defaults:
13+
run:
14+
shell: bash
15+
steps:
16+
- name: Checkout repository
17+
uses: actions/checkout@v2
18+
with:
19+
submodules: true
20+
- uses: actions/setup-python@v3
21+
- name: Install cibuildwheel
22+
run: python -m pip install cibuildwheel==2.11.3
23+
- name: Building Wheel for Python ${{inputs.python-version}}
24+
run: python -m cibuildwheel --output-dir dist
25+
env:
26+
CIBW_BUILD: ${{ inputs.python-version }}
27+
- uses: actions/upload-artifact@v3
28+
with:
29+
name: wheels
30+
path: ./dist/*.whl

.github/workflows/common.yml

Lines changed: 28 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,28 @@
1+
name: DiskANN Common Checks
2+
# common means common to both pr-test and push-test
3+
on: [workflow_call]
4+
jobs:
5+
formatting-check:
6+
strategy:
7+
fail-fast: true
8+
name: Code Formatting Test
9+
runs-on: ubuntu-latest
10+
steps:
11+
- name: Checkout repository
12+
uses: actions/checkout@v3
13+
with:
14+
fetch-depth: 1
15+
- name: Checking code formatting...
16+
uses: ./.github/actions/format-check
17+
docker-container-build:
18+
name: Docker Container Build
19+
needs: [formatting-check]
20+
runs-on: ubuntu-latest
21+
steps:
22+
- name: Checkout repository
23+
uses: actions/checkout@v3
24+
with:
25+
fetch-depth: 1
26+
- name: Docker build
27+
run: |
28+
docker build .

.github/workflows/disk-pq.yml

Lines changed: 99 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,99 @@
1+
name: Disk With PQ
2+
on: [workflow_call]
3+
jobs:
4+
acceptance-tests-disk-pq:
5+
name: Disk, PQ
6+
strategy:
7+
fail-fast: false
8+
matrix:
9+
os: [ubuntu-latest, windows-2019, windows-latest]
10+
runs-on: ${{matrix.os}}
11+
defaults:
12+
run:
13+
shell: bash
14+
steps:
15+
- name: Checkout repository
16+
if: ${{ runner.os == 'Linux' }}
17+
uses: actions/checkout@v3
18+
with:
19+
fetch-depth: 1
20+
- name: Checkout repository
21+
if: ${{ runner.os == 'Windows' }}
22+
uses: actions/checkout@v3
23+
with:
24+
fetch-depth: 1
25+
submodules: true
26+
- name: DiskANN Build CLI Applications
27+
uses: ./.github/actions/build
28+
29+
- name: Generate Data
30+
uses: ./.github/actions/generate-random
31+
32+
- name: build and search disk index (one shot graph build, L2, no diskPQ) (float)
33+
if: success() || failure()
34+
run: |
35+
dist/bin/build_disk_index --data_type float --dist_fn l2 --data_path data/rand_float_10D_10K_norm1.0.bin --index_path_prefix data/disk_index_l2_rand_float_10D_10K_norm1.0_diskfull_oneshot -R 16 -L 32 -B 0.00003 -M 1
36+
dist/bin/search_disk_index --data_type float --dist_fn l2 --fail_if_recall_below 70 --index_path_prefix data/disk_index_l2_rand_float_10D_10K_norm1.0_diskfull_oneshot --result_path /tmp/res --query_file data/rand_float_10D_1K_norm1.0.bin --gt_file data/l2_rand_float_10D_10K_norm1.0_10D_1K_norm1.0_gt100 --recall_at 5 -L 5 12 -W 2 --num_nodes_to_cache 10 -T 16
37+
- name: build and search disk index (one shot graph build, L2, no diskPQ) (int8)
38+
if: success() || failure()
39+
run: |
40+
dist/bin/build_disk_index --data_type int8 --dist_fn l2 --data_path data/rand_int8_10D_10K_norm50.0.bin --index_path_prefix data/disk_index_l2_rand_int8_10D_10K_norm50.0_diskfull_oneshot -R 16 -L 32 -B 0.00003 -M 1
41+
dist/bin/search_disk_index --data_type int8 --dist_fn l2 --fail_if_recall_below 70 --index_path_prefix data/disk_index_l2_rand_int8_10D_10K_norm50.0_diskfull_oneshot --result_path /tmp/res --query_file data/rand_int8_10D_1K_norm50.0.bin --gt_file data/l2_rand_int8_10D_10K_norm50.0_10D_1K_norm50.0_gt100 --recall_at 5 -L 5 12 -W 2 --num_nodes_to_cache 10 -T 16
42+
- name: build and search disk index (one shot graph build, L2, no diskPQ) (uint8)
43+
if: success() || failure()
44+
run: |
45+
dist/bin/build_disk_index --data_type uint8 --dist_fn l2 --data_path data/rand_uint8_10D_10K_norm50.0.bin --index_path_prefix data/disk_index_l2_rand_uint8_10D_10K_norm50.0_diskfull_oneshot -R 16 -L 32 -B 0.00003 -M 1
46+
dist/bin/search_disk_index --data_type uint8 --dist_fn l2 --fail_if_recall_below 70 --index_path_prefix data/disk_index_l2_rand_uint8_10D_10K_norm50.0_diskfull_oneshot --result_path /tmp/res --query_file data/rand_uint8_10D_1K_norm50.0.bin --gt_file data/l2_rand_uint8_10D_10K_norm50.0_10D_1K_norm50.0_gt100 --recall_at 5 -L 5 12 -W 2 --num_nodes_to_cache 10 -T 16
47+
48+
- name: build and search disk index (one shot graph build, L2, no diskPQ, build with PQ distance comparisons) (float)
49+
if: success() || failure()
50+
run: |
51+
dist/bin/build_disk_index --data_type float --dist_fn l2 --data_path data/rand_float_10D_10K_norm1.0.bin --index_path_prefix data/disk_index_l2_rand_float_10D_10K_norm1.0_diskfull_oneshot_buildpq5 -R 16 -L 32 -B 0.00003 -M 1 --build_PQ_bytes 5
52+
dist/bin/search_disk_index --data_type float --dist_fn l2 --fail_if_recall_below 70 --index_path_prefix data/disk_index_l2_rand_float_10D_10K_norm1.0_diskfull_oneshot_buildpq5 --result_path /tmp/res --query_file data/rand_float_10D_1K_norm1.0.bin --gt_file data/l2_rand_float_10D_10K_norm1.0_10D_1K_norm1.0_gt100 --recall_at 5 -L 5 12 -W 2 --num_nodes_to_cache 10 -T 16
53+
- name: build and search disk index (one shot graph build, L2, no diskPQ, build with PQ distance comparisons) (int8)
54+
if: success() || failure()
55+
run: |
56+
dist/bin/build_disk_index --data_type int8 --dist_fn l2 --data_path data/rand_int8_10D_10K_norm50.0.bin --index_path_prefix data/disk_index_l2_rand_int8_10D_10K_norm50.0_diskfull_oneshot_buildpq5 -R 16 -L 32 -B 0.00003 -M 1 --build_PQ_bytes 5
57+
dist/bin/search_disk_index --data_type int8 --dist_fn l2 --fail_if_recall_below 70 --index_path_prefix data/disk_index_l2_rand_int8_10D_10K_norm50.0_diskfull_oneshot_buildpq5 --result_path /tmp/res --query_file data/rand_int8_10D_1K_norm50.0.bin --gt_file data/l2_rand_int8_10D_10K_norm50.0_10D_1K_norm50.0_gt100 --recall_at 5 -L 5 12 -W 2 --num_nodes_to_cache 10 -T 16\
58+
- name: build and search disk index (one shot graph build, L2, no diskPQ, build with PQ distance comparisons) (uint8)
59+
if: success() || failure()
60+
run: |
61+
dist/bin/build_disk_index --data_type uint8 --dist_fn l2 --data_path data/rand_uint8_10D_10K_norm50.0.bin --index_path_prefix data/disk_index_l2_rand_uint8_10D_10K_norm50.0_diskfull_oneshot_buildpq5 -R 16 -L 32 -B 0.00003 -M 1 --build_PQ_bytes 5
62+
dist/bin/search_disk_index --data_type uint8 --dist_fn l2 --fail_if_recall_below 70 --index_path_prefix data/disk_index_l2_rand_uint8_10D_10K_norm50.0_diskfull_oneshot_buildpq5 --result_path /tmp/res --query_file data/rand_uint8_10D_1K_norm50.0.bin --gt_file data/l2_rand_uint8_10D_10K_norm50.0_10D_1K_norm50.0_gt100 --recall_at 5 -L 5 12 -W 2 --num_nodes_to_cache 10 -T 16
63+
64+
- name: build and search disk index (sharded graph build, L2, no diskPQ) (float)
65+
if: success() || failure()
66+
run: |
67+
dist/bin/build_disk_index --data_type float --dist_fn l2 --data_path data/rand_float_10D_10K_norm1.0.bin --index_path_prefix data/disk_index_l2_rand_float_10D_10K_norm1.0_diskfull_sharded -R 16 -L 32 -B 0.00003 -M 0.00006
68+
dist/bin/search_disk_index --data_type float --dist_fn l2 --fail_if_recall_below 70 --index_path_prefix data/disk_index_l2_rand_float_10D_10K_norm1.0_diskfull_sharded --result_path /tmp/res --query_file data/rand_float_10D_1K_norm1.0.bin --gt_file data/l2_rand_float_10D_10K_norm1.0_10D_1K_norm1.0_gt100 --recall_at 5 -L 5 12 -W 2 --num_nodes_to_cache 10 -T 16
69+
- name: build and search disk index (sharded graph build, L2, no diskPQ) (int8)
70+
run: |
71+
dist/bin/build_disk_index --data_type int8 --dist_fn l2 --data_path data/rand_int8_10D_10K_norm50.0.bin --index_path_prefix data/disk_index_l2_rand_int8_10D_10K_norm50.0_diskfull_sharded -R 16 -L 32 -B 0.00003 -M 0.00006
72+
dist/bin/search_disk_index --data_type int8 --dist_fn l2 --fail_if_recall_below 70 --index_path_prefix data/disk_index_l2_rand_int8_10D_10K_norm50.0_diskfull_sharded --result_path /tmp/res --query_file data/rand_int8_10D_1K_norm50.0.bin --gt_file data/l2_rand_int8_10D_10K_norm50.0_10D_1K_norm50.0_gt100 --recall_at 5 -L 5 12 -W 2 --num_nodes_to_cache 10 -T 16
73+
- name: build and search disk index (sharded graph build, L2, no diskPQ) (uint8)
74+
if: success() || failure()
75+
run: |
76+
dist/bin/build_disk_index --data_type uint8 --dist_fn l2 --data_path data/rand_uint8_10D_10K_norm50.0.bin --index_path_prefix data/disk_index_l2_rand_uint8_10D_10K_norm50.0_diskfull_sharded -R 16 -L 32 -B 0.00003 -M 0.00006
77+
dist/bin/search_disk_index --data_type uint8 --dist_fn l2 --fail_if_recall_below 70 --index_path_prefix data/disk_index_l2_rand_uint8_10D_10K_norm50.0_diskfull_sharded --result_path /tmp/res --query_file data/rand_uint8_10D_1K_norm50.0.bin --gt_file data/l2_rand_uint8_10D_10K_norm50.0_10D_1K_norm50.0_gt100 --recall_at 5 -L 5 12 -W 2 --num_nodes_to_cache 10 -T 16
78+
79+
- name: build and search disk index (one shot graph build, L2, diskPQ) (float)
80+
if: success() || failure()
81+
run: |
82+
dist/bin/build_disk_index --data_type float --dist_fn l2 --data_path data/rand_float_10D_10K_norm1.0.bin --index_path_prefix data/disk_index_l2_rand_float_10D_10K_norm1.0_diskpq_oneshot -R 16 -L 32 -B 0.00003 -M 1 --PQ_disk_bytes 5
83+
dist/bin/search_disk_index --data_type float --dist_fn l2 --fail_if_recall_below 70 --index_path_prefix data/disk_index_l2_rand_float_10D_10K_norm1.0_diskpq_oneshot --result_path /tmp/res --query_file data/rand_float_10D_1K_norm1.0.bin --gt_file data/l2_rand_float_10D_10K_norm1.0_10D_1K_norm1.0_gt100 --recall_at 5 -L 5 12 -W 2 --num_nodes_to_cache 10 -T 16
84+
- name: build and search disk index (one shot graph build, L2, diskPQ) (int8)
85+
if: success() || failure()
86+
run: |
87+
dist/bin/build_disk_index --data_type int8 --dist_fn l2 --data_path data/rand_int8_10D_10K_norm50.0.bin --index_path_prefix data/disk_index_l2_rand_int8_10D_10K_norm50.0_diskpq_oneshot -R 16 -L 32 -B 0.00003 -M 1 --PQ_disk_bytes 5
88+
dist/bin/search_disk_index --data_type int8 --dist_fn l2 --fail_if_recall_below 70 --index_path_prefix data/disk_index_l2_rand_int8_10D_10K_norm50.0_diskpq_oneshot --result_path /tmp/res --query_file data/rand_int8_10D_1K_norm50.0.bin --gt_file data/l2_rand_int8_10D_10K_norm50.0_10D_1K_norm50.0_gt100 --recall_at 5 -L 5 12 -W 2 --num_nodes_to_cache 10 -T 16
89+
- name: build and search disk index (one shot graph build, L2, diskPQ) (uint8)
90+
if: success() || failure()
91+
run: |
92+
dist/bin/build_disk_index --data_type uint8 --dist_fn l2 --data_path data/rand_uint8_10D_10K_norm50.0.bin --index_path_prefix data/disk_index_l2_rand_uint8_10D_10K_norm50.0_diskpq_oneshot -R 16 -L 32 -B 0.00003 -M 1 --PQ_disk_bytes 5
93+
dist/bin/search_disk_index --data_type uint8 --dist_fn l2 --fail_if_recall_below 70 --index_path_prefix data/disk_index_l2_rand_uint8_10D_10K_norm50.0_diskpq_oneshot --result_path /tmp/res --query_file data/rand_uint8_10D_1K_norm50.0.bin --gt_file data/l2_rand_uint8_10D_10K_norm50.0_10D_1K_norm50.0_gt100 --recall_at 5 -L 5 12 -W 2 --num_nodes_to_cache 10 -T 16
94+
95+
- name: build and search disk index (sharded graph build, MIPS, diskPQ) (float)
96+
if: success() || failure()
97+
run: |
98+
dist/bin/build_disk_index --data_type float --dist_fn mips --data_path data/rand_float_10D_10K_norm1.0.bin --index_path_prefix data/disk_index_mips_rand_float_10D_10K_norm1.0_diskpq_sharded -R 16 -L 32 -B 0.00003 -M 0.00006 --PQ_disk_bytes 5
99+
dist/bin/search_disk_index --data_type float --dist_fn l2 --fail_if_recall_below 70 --index_path_prefix data/disk_index_mips_rand_float_10D_10K_norm1.0_diskpq_sharded --result_path /tmp/res --query_file data/rand_float_10D_1K_norm1.0.bin --gt_file data/mips_rand_float_10D_10K_norm1.0_10D_1K_norm1.0_gt100 --recall_at 5 -L 5 12 -W 2 --num_nodes_to_cache 10 -T 16

0 commit comments

Comments
 (0)