QA Construction

Warning

This step is legacy and not included in the final pipeline. Curating a sufficiently large and high-quality QA dataset and the subsequent LLM finetuning step costs too much which does not fit well into our pipeline.

Make sure you are in this directory. Also make sure to put veritas-trial-deployment.json under the secrets/ folder in the root directory. To run the QA construction:

make build
make run

This should bring you within the Docker container. Then inside the container, if you want to have a fresh run or overwrite whatever existing in GCS, run:

python cli.py generate
python cli.py prepare
python cli.py upload

If you want to build on top of what already exists in GCS, run:

python cli.py fetch
python cli.py generate
python cli.py prepare
python cli.py upload

Note

See generate.py for the prompts used for generating QA.
The fetch subcommand overwrites local QA data, so be sure to backup your local data if you need them.
The generate subcommand by default appends to local QA data, but the --overwrite flag can change this behavior to such that, if an ID is seen in existing QA data, then that ID will be overwritten. The generate subcommand also supports generate only for a range of cleaned data via the -s/--start and/or -e/--end options.
Use --help on cli.py and its subcommands to get an overview of what each subcommand does.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

QA Construction

Files

README.md

Latest commit

History

README.md

File metadata and controls

QA Construction