Tinkering with Action Process Reward Models (Act-PRMs)
To install dependencies and manage packages, we use [uv](https://docs.astral.sh/uv/).
Then, install dependencies with `uv sync`. Or, just run one of the example scripts below (uv will automatically install / update the dependencies in `pyproject.toml` as needed).
We currently use Tinker to run experiments. You'll want to:
- Sign up for Tinker
- Create an API key from the console
- Either export the key as an environment variable (e.g., `export TINKER_API_KEY="<your_api_key>"`) or add it to a `.env` file (recommended, see below).
To manage API keys for Tinker, WandB, and Hugging Face, we use dotenv to load environment variables from a `.env` file.
Create a `.env` file in this project's root directory (e.g., `vim .env`), and add your environment variables, e.g.,
```
TINKER_API_KEY="<your_tinker_api_key>"
HF_TOKEN="<your_huggingface_token>"
WANDB_API_KEY="<your_wandb_api_key>"
WANDB_ENTITY="hazy-research"
```

If you haven't already, add this `.env` file to your `.gitignore` to avoid committing it to the repository and leaking keys.
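For reference, dotenv just reads `KEY=value` pairs from the file into the process environment. A minimal stdlib-only sketch of that behavior (the `parse_env` / `load_env` helpers below are illustrative, not part of this repo; in practice `python-dotenv`'s `load_dotenv()` does this for you):

```python
import os


def parse_env(text):
    """Parse dotenv-style KEY=value lines; blanks and '#' comments are skipped."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first '=' only
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def load_env(path=".env"):
    """Load variables into os.environ without overwriting existing ones."""
    with open(path) as f:
        for key, value in parse_env(f.read()).items():
            os.environ.setdefault(key, value)
```

Note that `setdefault` mirrors dotenv's default of not overriding variables already set in the shell, so an `export TINKER_API_KEY=...` still wins over the `.env` file.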
We've also implemented self-contained training and generation code in PyTorch. See the `pytorch` branch for the files under `src/act_prm/pytorch/`.
To use the wonderful FlashAttention-2, simply do the following:

1. Install the other dependencies as usual with `uv sync` (or just run an example command, which automatically installs the dependencies in `pyproject.toml` as needed).
2. Install FlashAttention-2 with:
   `uv pip install flash-attn --no-build-isolation-package flash-attn`
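Since `flash-attn` builds against your local CUDA toolchain and can fail to install on some machines, it can help to probe for it at runtime and fall back gracefully. A small sketch (assuming you load models through Hugging Face `transformers`, which accepts an `attn_implementation` keyword; the `flash_attn_available` helper is illustrative):

```python
import importlib.util


def flash_attn_available():
    """Return True if the flash_attn package is importable in this environment."""
    return importlib.util.find_spec("flash_attn") is not None


# Pick an attention backend: FlashAttention-2 if installed, else PyTorch SDPA.
attn_implementation = "flash_attention_2" if flash_attn_available() else "sdpa"
```

You can then pass `attn_implementation=attn_implementation` to `AutoModelForCausalLM.from_pretrained(...)`.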
Please see the individual environment README files in `src/act_prm/environments/` for any additional setup instructions.
```bash
CUDA_VISIBLE_DEVICES=0 \
uv run python main.py \
--is_async \
--env_config act_prm/textworld_fs1 \
--eval_env_config textworld/treasure_hunter \
--generator_config default \
--trainer_config qwen3_4b_aprm100_sft200_rl200 \
--replay_buffer_config default \
--log_path ./logs \
--model_name Qwen/Qwen3-4B-Instruct-2507 \
--lora_rank 32 \
--seed 42 --replicate 5 --verbose
```