Follow the prerequisites below to set up your environment before running the code:
- Python 3.11: Set up a Python 3.11 virtual environment and install `FMBench`:

  ```bash
  python -m venv .fmbench
  source .fmbench/bin/activate
  pip install fmbench
  ```
- S3 buckets for test data, scripts, and results: Create two buckets within your AWS account (a sketch for creating them programmatically appears at the end of this section):
  - Read bucket: This bucket contains tokenizer files, prompt template, source data and deployment scripts, stored in the directory structure shown below. `FMBench` needs to have read access to this bucket.

    ```
    s3://<read-bucket-name>
    ├── source_data/
    ├── source_data/<source-data-file-name>.json
    ├── prompt_template/
    ├── prompt_template/prompt_template.txt
    ├── scripts/
    ├── scripts/<deployment-script-name>.py
    ├── tokenizer/
    ├── tokenizer/tokenizer.json
    ├── tokenizer/config.json
    ```
    The details of the bucket structure are as follows:
    - Source Data Directory: Create a `source_data` directory that stores the dataset you want to benchmark with. `FMBench` uses Q&A datasets from the LongBench dataset or alternatively from this link. Support for bring your own dataset will be added soon. Download the files specified in the LongBench dataset into the `source_data` directory; the following is a good list to get started with (a download sketch follows):

      - `2wikimqa`
      - `hotpotqa`
      - `narrativeqa`
      - `triviaqa`

      Store these files in the `source_data` directory.
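      The snippet below is a minimal sketch of one way to pull these files. It assumes the LongBench data is fetched via the Hugging Face `datasets` library (a version that still supports script-based datasets) and that the JSON Lines output matches what `FMBench` expects; check the `FMBench` documentation for the exact file names and format.

      ```python
      # Hypothetical download helper: fetches a few LongBench subsets and
      # writes them as JSON Lines files into a local source_data/ directory.
      from pathlib import Path

      from datasets import load_dataset  # pip install datasets

      SUBSETS = ["2wikimqa", "hotpotqa", "narrativeqa", "triviaqa"]
      out_dir = Path("source_data")
      out_dir.mkdir(exist_ok=True)

      for subset in SUBSETS:
          # trust_remote_code is needed because LongBench ships a loading script
          ds = load_dataset("THUDM/LongBench", subset, split="test",
                            trust_remote_code=True)
          # .json extension per the bucket layout above; content is JSON Lines
          ds.to_json(str(out_dir / f"{subset}.json"))
      ```

      After downloading, upload the directory to your read bucket, for example with `aws s3 cp source_data/ s3://<read-bucket-name>/source_data/ --recursive`.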
    - Prompt Template Directory: Create a `prompt_template` directory that contains a `prompt_template.txt` file. This `.txt` file contains the prompt template that your specific model supports. `FMBench` already supports the prompt template compatible with `Llama` models (an illustrative example follows).
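      For illustration only, a Llama 2 chat-style `prompt_template.txt` might look like the sketch below; the `{context}` and `{question}` placeholders are assumptions made here, so check the prompt template bundled with `FMBench` for the exact placeholder names it expects.

      ```
      <s>[INST] <<SYS>>
      You are a helpful assistant. Answer the question using only the provided context.
      <</SYS>>

      Context: {context}

      Question: {question} [/INST]
      ```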
    - Scripts Directory: `FMBench` also supports a bring your own script (BYOS) mode for deploying models that are not natively available via SageMaker JumpStart, i.e., anything not included in this list. Here are the steps to use BYOS:

      - Create a Python script to deploy your model on a SageMaker endpoint. This script needs to have a `deploy` function that `2_deploy_model.ipynb` can invoke; see `p4d_hf_tgi.py` for reference, and the sketch after these steps.
      - Place your deployment script in the `scripts` directory in your read bucket. If your script deploys a model directly from Hugging Face and needs access to a Hugging Face auth token, then create a file called `hf_token.txt` and put the auth token in that file. The `.gitignore` file in this repo has rules to not commit `hf_token.txt` to the repo.

      Today, `FMBench` provides inference scripts for:

      - All SageMaker JumpStart models
      - Text-Generation-Inference (TGI) container supported models
      - Deep Java Library (DJL) DeepSpeed container supported models

      Deployment scripts for the options above are available in the `scripts` directory; you can use these as reference for creating your own deployment scripts as well.
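      The skeleton below sketches the shape such a deployment script might take. The `deploy` function name comes from the steps above, but its parameters and return keys here are illustrative assumptions; `p4d_hf_tgi.py` in the repo is the authoritative reference for the actual interface.

      ```python
      # Hypothetical BYOS deployment script skeleton. The parameters and the
      # returned dict keys are assumptions -- mirror p4d_hf_tgi.py for the
      # real interface that 2_deploy_model.ipynb invokes.
      from typing import Dict

      from sagemaker.huggingface import HuggingFaceModel


      def deploy(experiment_config: Dict, role_arn: str) -> Dict:
          """Deploy a Hugging Face model to a SageMaker endpoint."""
          model = HuggingFaceModel(
              role=role_arn,
              env={"HF_MODEL_ID": experiment_config["model_id"]},
              transformers_version="4.28",  # illustrative versions; use a
              pytorch_version="2.0",        # combination your SageMaker SDK
              py_version="py310",           # actually supports
          )
          predictor = model.deploy(
              initial_instance_count=1,
              instance_type=experiment_config.get("instance_type", "ml.g5.2xlarge"),
          )
          return {"endpoint_name": predictor.endpoint_name}
      ```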
    - Tokenizer Directory: Place the `tokenizer.json`, `config.json` and any other files required for your model's tokenizer in the `tokenizer` directory. The tokenizer for your model should be compatible with the `tokenizers` package. `FMBench` uses `AutoTokenizer.from_pretrained` to load the tokenizer (see the sketch below).

      As an example, to use the Llama 2 tokenizer for counting prompt and generation tokens for the `Llama 2` family of models: accept the license via the Meta approval form, then download the `tokenizer.json` and `config.json` files from the Hugging Face website and place them in the `tokenizer` directory.
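      The snippet below is a minimal sketch of loading such a tokenizer directory and counting tokens. `AutoTokenizer.from_pretrained` is the loader `FMBench` uses per the note above; the local path and the counting helper are illustrative.

      ```python
      # Minimal sketch: load a local tokenizer directory and count tokens.
      # The "tokenizer/" path and count_tokens helper are illustrative.
      from transformers import AutoTokenizer  # pip install transformers

      # directory containing tokenizer.json, config.json, etc.
      tokenizer = AutoTokenizer.from_pretrained("tokenizer/")

      def count_tokens(text: str) -> int:
          return len(tokenizer.encode(text))

      print(count_tokens("How many tokens is this prompt?"))
      ```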
  - Write bucket: All prompt payloads, model endpoints, and metrics generated by `FMBench` are stored in this bucket. `FMBench` requires write permissions to store the results in this bucket. No directory structure needs to be pre-created in this bucket; everything is created by `FMBench` at runtime.

    ```
    s3://<write-bucket-name>
    ├── <test-name>
    ├── <test-name>/data
    ├── <test-name>/data/metrics
    ├── <test-name>/data/models
    ├── <test-name>/data/prompts
    ```
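As referenced above, the following is a minimal sketch of creating the two buckets with `boto3`; the bucket names and region are placeholders (S3 bucket names must be globally unique), and you can just as easily create the buckets in the AWS console.

```python
# Minimal sketch: create the read and write buckets with boto3.
# Bucket names and region below are placeholders -- pick your own.
import boto3

REGION = "us-east-1"
s3 = boto3.client("s3", region_name=REGION)

for bucket in ["my-fmbench-read-bucket", "my-fmbench-write-bucket"]:
    if REGION == "us-east-1":
        # us-east-1 does not accept a LocationConstraint
        s3.create_bucket(Bucket=bucket)
    else:
        s3.create_bucket(
            Bucket=bucket,
            CreateBucketConfiguration={"LocationConstraint": REGION},
        )
```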