Distributed Workloads

Examples

Fine-Tune LLMs with Ray and DeepSpeed on OpenShift AI
Fine-Tune Stable Diffusion with DreamBooth and Ray Train
Hyperparameters Optimization with Ray Tune on OpenShift AI

Integration Tests

Prerequisites

Admin access to an OpenShift cluster (CRC is fine)
OpenDataHub or RHOAI installed, with all Distributed Workloads components enabled
Go 1.21 installed

Common environment variables

CODEFLARE_TEST_OUTPUT_DIR - Output directory for test logs
CODEFLARE_TEST_TIMEOUT_SHORT - Timeout duration for short tasks
CODEFLARE_TEST_TIMEOUT_MEDIUM - Timeout duration for medium tasks
CODEFLARE_TEST_TIMEOUT_LONG - Timeout duration for long tasks
CODEFLARE_TEST_RAY_IMAGE (Optional) - Ray image used for the RayCluster configuration
MINIO_CLI_IMAGE (Optional) - MinIO CLI image used for uploading data to and downloading data from an S3 bucket

NOTE: quay.io/modh/ray:2.35.0-py311-cu121 is the default image used to create a RayCluster resource. To use a custom Ray image instead, specify it in the CODEFLARE_TEST_RAY_IMAGE environment variable.

Environment variables for fms-hf-tuning test suite

FMS_HF_TUNING_IMAGE - Image tag used in PyTorchJob CR for model training

Environment variables for fms-hf-tuning GPU test suite

TEST_NAMESPACE_NAME (Optional) - Existing namespace where the Training operator GPU tests will be executed
HF_TOKEN - HuggingFace token used to pull models that have limited access
GPTQ_MODEL_PVC_NAME - Name of the PersistentVolumeClaim containing downloaded GPTQ models

To upload the trained model to S3-compatible storage, set the following environment variables:

AWS_DEFAULT_ENDPOINT - Storage bucket endpoint to upload the trained model to; if set, the test uploads the model to the S3 bucket
AWS_ACCESS_KEY_ID - Storage bucket access key
AWS_SECRET_ACCESS_KEY - Storage bucket secret key
AWS_STORAGE_BUCKET - Storage bucket name
AWS_STORAGE_BUCKET_MODEL_PATH (Optional) - Path in the storage bucket where the trained model will be stored

Environment variables for ODH integration test suite

ODH_NAMESPACE - Namespace where ODH components are installed
NOTEBOOK_USER_NAME - Username of the user running the Workbench
NOTEBOOK_USER_TOKEN - Login token of the user running the Workbench
NOTEBOOK_IMAGE - Image used for running Workbench

To download the datasets for the MNIST training script from S3-compatible storage, set the following environment variables:

AWS_DEFAULT_ENDPOINT - Storage bucket endpoint from which to download MNIST datasets
AWS_ACCESS_KEY_ID - Storage bucket access key
AWS_SECRET_ACCESS_KEY - Storage bucket secret key
AWS_STORAGE_BUCKET - Storage bucket name
AWS_STORAGE_BUCKET_MNIST_DIR - Storage bucket directory from which to download MNIST datasets

Running Tests

Run the tests like standard Go unit tests:

go test -timeout 60m ./tests/kfto/

