Multi-agent AI framework for atomistic materials simulation and discovery.
matsim-agents orchestrates large language models, machine-learned
interatomic potentials, and ASE-based atomistic workflows into a single
agentic loop. The user states a research objective in natural language;
agents plan, run HydraGNN-driven simulations, score chemical and
dynamical stability, and report the findings, with optional human
review at every gate.
The framework is backend-agnostic: HydraGNN is the default MLFF backend, but the relaxation tool, crystal-phase generator, and stability scorer are written so other potentials (MACE, NequIP, Orb, ...) can be plugged in via the same interfaces.
- Architecture
- Portability across DOE supercomputers
- Running on Frontier (OLCF)
- Running on Aurora (ALCF)
- Running on Perlmutter (NERSC)
- HPC Documentation Index
- Installation
- LLM backends
- Downloading models for vLLM
- Quick start
- Graph orchestration modes
- Hypothesis-driven discovery chat
- Programmatic API
- CLI reference
- Active-learning loop (HydraGNN ↔ DFT)
- Codabench Competition
- Project layout
- Configuration reference
- Current capabilities and planned work
- Contributing
- License & citation
Reusable standalone workflow graphics:
┌──────────────────────────────────────────────┐
│                     USER                     │
│  natural-language objective / chat dialogue  │
└───────────────────────┬──────────────────────┘
                        │
┌───────────────────────▼──────────────────────┐
│        LangGraph orchestration layer         │
│                                              │
│  (A) run: planner -> executor -> uq_gate     │
│      -> [optional AL handoff] -> analyst     │
│  (B) supervisor-run: prepare -> explore      │
│      -> evaluate_uq -> [optional AL handoff] │
│      -> summarize                            │
└───────────────────────┬──────────────────────┘
                        │ tool calls
┌───────────────────────▼──────────────────────┐
│              Discovery wrapper               │
│  composition parsing → phase enumeration →   │
│  → relaxation (HydraGNN+ASE) → stability     │
└───────────────────────┬──────────────────────┘
                        │
┌───────────────────────▼──────────────────────┐
│              Atomistic backends              │
│  HydraGNN (fused MLFF + BranchWeightMLP)     │
│  ASE      (FIRE / BFGS / BFGSLineSearch)     │
│  pymatgen (AFLOW prototype encyclopedia)     │
│  pyXtal   (random symmetry-aware search)     │
└──────────────────────────────────────────────┘
flowchart TD
U[User objective or chat dialogue]
U --> R[run graph]
U --> C[chat REPL]
U --> S[supervisor graph]
subgraph RPATH[Core run path]
RP[planner] --> RE[executor]
RE --> RU[uq_gate]
RU -->|high confidence| RA[analyst]
RU -->|low confidence + policy enabled| AL[active learning loop]
AL --> RA
end
subgraph SPATH[Supervisor path]
SP[prepare] --> SX[explore]
SX --> SU[evaluate_uq]
SU -->|low confidence + policy enabled| AL
SU -->|otherwise| SS[summarize]
end
subgraph CPATH[Chat path]
CC[composition detection / optional relax] --> CU[uq policy]
CU -->|low confidence + policy enabled| AL
end
- Multi-agent orchestration with LangGraph: typed shared state, checkpointed steps, conditional routing, human-in-the-loop gates.
- Hypothesis-generation chat with any local LLM (Qwen 2.5 via Ollama by default).
- Optional multi-LLM hypothesis debate in chat: a proposer model drafts a hypothesis response, a critic model challenges weak assumptions and missing tests, and the proposer revises for one or more rounds (--llm-peer-review, --critic-llm-*, --peer-review-rounds).
- Automatic composition detection in user/LLM messages: when a new chemical formula is proposed, the system offers to run a substantial atomistic exploration.
- Optional single-structure relaxation inside discovery chat via /relax <structure_path>.
- Discovery-to-active-learning escalation policy: when branch-weight UQ indicates low confidence, discovery can hand off to AL automatically from the same run.
- Structured handoff audit artifacts: JSONL records of UQ metrics, thresholds, trigger rationale, and action (not_triggered, triggered_dry_run, triggered_run).
- Selectable surrogate backend for geometry relaxation:
  - HydraGNN fused MLFF + branch-weight MLP stack (default).
  - UMA (Universal Models for Atoms) via fairchem (--mlp-backend uma).
  - Note: branch-weight UQ is specific to HydraGNN; UMA relaxations do not emit branch-weight metrics.
- Unified crystal-phase seed generation (matsim_agents.discovery.seeds) combining two complementary sources into one ranked candidate list:
  - AFLOW prototype decoration: every entry of the pymatgen-bundled AFLOW encyclopedia (~288 prototypes covering all 230 3-D space groups and a wide range of stoichiometries: fcc, bcc, hcp, rocksalt, zincblende, wurtzite, fluorite, rutile, perovskite, spinel, Heusler, MAX phases, …) whose reduced stoichiometric ratios match the target is decorated with the target's elements. All symmetrically distinct element-to-Wyckoff assignments are enumerated (e.g. ABX₃ vs BAX₃). No hand-coded prototype tables.
  - pyXtal random search (--n-random N, novelty pass): for compositions with no AFLOW match (high-entropy alloys, exotic stoichiometries) or simply for novelty exploration, draws N random crystals uniformly across the 230 space groups, respecting Wyckoff multiplicities and minimum-distance constraints. Seeds from this path are tagged needs_dft_verification=True and called out as (novel) in the stability report so they get DFT-validated before any publication claim. Optional dependency; the pipeline silently degrades to prototypes-only if pyXtal is absent.
- Stability scoring: relative chemical stability (ΔE/atom rankings) and a dynamical-stability proxy (residual force tolerance).
- Local & HPC ready, portable across diverse DOE accelerators: the same Python entry points run on Frontier (OLCF, AMD MI250X), Aurora (ALCF, Intel PVC), and Perlmutter (NERSC, NVIDIA A100), plus Andes and laptops. The setup script delegates to HydraGNN's installers and auto-relaxes HydraGNN's overly tight click==8.0.0 / tqdm==4.67.1 pins so the env is conflict-free on every site.
- First-class DFT labellers built per platform: validated build/run recipes for both VASP 6.6 (Frontier MI250X, Aurora PVC) and Quantum ESPRESSO pw.x GPU (Frontier MI250X via OpenMP target offload, Aurora PVC via oneapi/openmp, Perlmutter A100 via CUDA). Build scripts and SLURM/PBS launchers are checked in for each site; see Portability across DOE supercomputers.
- Pluggable LLMs: Ollama, vLLM, OpenAI, Anthropic via a single factory.
- Active-learning loop (matsim-agents al run): HydraGNN-driven MD generates candidates → ensemble / MC-dropout uncertainty selects the most informative → a DFT backend (VASP 6.6 or Quantum ESPRESSO pw.x) labels them in parallel inside one SLURM allocation → dataset is grown and HydraGNN is retrained → repeat. The DFT backend is a single YAML toggle (dft.backend: vasp | qe); both share an INCAR-style template path (INCAR.template / pw.template).
- LLM-generated MD seeds (md.seed_source.kind: prompt): the LLM proposes plausible chemical compositions for a target objective and the loop materialises seed structures from common crystal prototypes, no curated POSCAR collection required.
- Templated YAML configs: ${VAR}, ${VAR:-default}, ${VAR:?msg} shell-style substitution with optional in-file vars: block, so the same config can be re-targeted across users / scratch dirs / runs without editing it.
- Two complementary graph entry points:
  - matsim-agents run for planner-executor-analyst task execution.
  - matsim-agents supervisor-run for discovery exploration + UQ-based decisioning + optional AL handoff.
- Surrogate backend switch for AL/supervisor flows: choose HydraGNN or UMA in the same YAML via mlp.backend (commonly backend: ${MLP_BACKEND:-hydragnn}) and run with:
  - matsim-agents al run <config.yaml> (HydraGNN default)
  - MLP_BACKEND=uma matsim-agents al run <config.yaml> (frozen UMA)
  - MLP_BACKEND=uma matsim-agents supervisor-run <composition> ... (UMA in supervisor path)
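The acquisition step of the active-learning loop in the list above (ensemble or MC-dropout uncertainty picks the most informative candidates) can be pictured with a minimal sketch. The function name select_for_labelling, the budget parameter, and the choice of the standard deviation of per-model energies as the uncertainty score are illustrative assumptions, not the matsim-agents API.

```python
from statistics import pstdev


def select_for_labelling(ensemble_energies, budget):
    """Return indices of the `budget` candidates with the largest ensemble spread.

    ensemble_energies: one list per candidate, holding the energy predicted
    by each ensemble member (or each MC-dropout pass) for that candidate.
    """
    # Hypothetical uncertainty score: population std-dev across predictions.
    scores = [pstdev(preds) for preds in ensemble_energies]
    # Rank candidates from most to least uncertain; ties keep input order.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:budget]


predictions = [
    [-3.10, -3.11, -3.09],  # models agree: low uncertainty
    [-2.50, -2.90, -2.10],  # models disagree: high uncertainty
    [-4.00, -4.05, -3.95],  # moderate disagreement
]
print(select_for_labelling(predictions, budget=2))  # [1, 2]
```

Only the selected indices would be sent to the DFT labeller; everything else stays with the surrogate.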
Use this table to choose the right entry point quickly.
| Goal | Recommended entry point | Required inputs | Typical outputs |
|---|---|---|---|
| One-shot objective execution with planning, UQ gate, and summary | matsim-agents run | natural-language objective, --logdir, --mlp-checkpoint, optional AL handoff flags | planner/executor/UQ/analyst result; optional AL handoff + JSONL audit |
| Interactive hypothesis generation with optional atomistic exploration | matsim-agents chat | --logdir, --mlp-checkpoint, LLM provider/model | chat transcript + discovery artifacts under output_dir/discovery/ |
| Optional single structure relax during chat | matsim-agents chat + /relax <path> | same as chat + structure path | one relaxation summary + optimized structure under output_dir/single_relax/ |
| Automated discovery -> UQ policy -> optional AL handoff | matsim-agents supervisor-run | composition, --logdir, --mlp-checkpoint, optional --al-config | supervisor summary + optional AL handoff + JSONL audit records |
| Standalone active-learning loop (MD -> UQ -> DFT -> retrain/frozen) | matsim-agents al run <config.yaml> | AL YAML config (mlp, md, acquisition, dft, trainer, loop) + optional MLP_BACKEND=uma | iteration state dirs, dataset, optional retrained model logdirs |
| Config-only validation (no execution) | matsim-agents al validate-config <config.yaml> | AL YAML config + env vars used in ${VAR} placeholders | resolved/validated JSON config dump |
| Paper-case feasibility check without iterative AL | python examples/paper_cases/singlepass.py --case <name> | case name, MLP_LOGDIR, optional --dft | per-case relaxed/ranked structures, optional DFT single-point validation |
Notes:
- run, chat, and supervisor-run can escalate to AL using --al-config plus UQ policy flags.
- Handoff audit artifacts default to <output_dir>/discovery/al_handoff_events.jsonl unless overridden.
Use these as starter templates; replace paths with your environment.
# 1) Core graph: planner -> executor -> uq_gate -> analyst
matsim-agents run \
"Relax structures/mos2-B_Defect-Free_PBE.vasp and summarize results." \
--logdir /path/to/hydragnn_logdir \
--mlp-checkpoint /path/to/mlp_branch_weights.pt
# 1a) Core graph with UMA relaxation backend (no HydraGNN checkpoints needed)
matsim-agents run \
"Relax structures/mos2-B_Defect-Free_PBE.vasp and summarize results." \
--mlp-backend uma \
--uma-model-name uma-s-1p1 \
--uma-task omat
# 1b) Core graph with UQ-triggered AL handoff planning
matsim-agents run \
"Relax structures/mos2-B_Defect-Free_PBE.vasp and summarize results." \
--logdir /path/to/hydragnn_logdir \
--mlp-checkpoint /path/to/mlp_branch_weights.pt \
--trigger-al-handoff \
--al-config examples/active_learning/al_config.example.yaml \
--al-dry-run
# 2) Interactive discovery chat
matsim-agents chat \
--logdir /path/to/hydragnn_logdir \
--mlp-checkpoint /path/to/mlp_branch_weights.pt
# 2b) Interactive chat with proposer/critic multi-LLM hypothesis debate
matsim-agents chat \
--logdir /path/to/hydragnn_logdir \
--mlp-checkpoint /path/to/mlp_branch_weights.pt \
--llm-peer-review \
--critic-llm-provider ollama \
--critic-llm-model qwen2.5:14b \
--peer-review-rounds 2
# 2c) True multi-critic panel mode (multiple critics + cross-critique)
matsim-agents chat \
--logdir /path/to/hydragnn_logdir \
--mlp-checkpoint /path/to/mlp_branch_weights.pt \
--llm-peer-review \
--critic-panel-models "qwen2.5:14b,llama3.1:8b,mistral:7b" \
--critic-panel-providers "ollama,ollama,ollama" \
--peer-review-rounds 2 \
--critic-cross-critique
# 2a) Interactive discovery chat with UMA relaxations
matsim-agents chat \
--mlp-backend uma \
--uma-model-name uma-s-1p1 \
--uma-task omat
# 3) Supervisor orchestration with AL handoff planning
matsim-agents supervisor-run Li2MnO3 \
--logdir /path/to/hydragnn_logdir \
--mlp-checkpoint /path/to/mlp_branch_weights.pt \
--al-config examples/active_learning/al_config.example.yaml \
--al-dry-run
# 3a) Supervisor orchestration with UMA relaxations
matsim-agents supervisor-run Li2MnO3 \
--mlp-backend uma \
--uma-model-name uma-s-1p1 \
--uma-task omat \
--al-config examples/active_learning/al_config.example.yaml \
--al-dry-run
# 4) Standalone active learning run
matsim-agents al validate-config examples/active_learning/al_config.example.yaml
matsim-agents al run examples/active_learning/al_config.example.yaml
# 4b) Same AL config, UMA surrogate instead of HydraGNN
MLP_BACKEND=uma matsim-agents al run examples/active_learning/al_config.example.yaml
# 5) Paper-case single pass
python examples/paper_cases/singlepass.py --case lifepo4
matsim-agents is designed to run the same Python code path on three
DOE leadership-class systems with three very different accelerators.
All heavy backends (HydraGNN MLFF inference/training, vLLM model
serving, VASP, and Quantum ESPRESSO) have validated build + launcher
recipes per site, with all platform-specific gotchas (toolchains, MPI
GTL pins, ROCm/Cray cross-builds, CUDA-aware MPI) baked in.
| Capability | Frontier (OLCF) MI250X | Aurora (ALCF) PVC | Perlmutter (NERSC) A100 |
|---|---|---|---|
| Hardware | AMD MI250X (gfx90a), 64-core EPYC | Intel Data Center GPU Max 1550 (PVC) | NVIDIA A100 (40/80 GB), AMD EPYC |
| HydraGNN venv | ROCm 7.2.0 + PyTorch | oneAPI + Intel Extension for PyTorch | CUDA 12 + PyTorch |
| vLLM model server | ROCm 7.2.0, source build | oneAPI | CUDA |
| VASP 6.6 | build-vasp-gpu-frontier.sh | build-vasp-gpu-aurora.sh (vasp_std/vasp_gam/vasp_ncl) | (use site module if available) |
| Quantum ESPRESSO pw.x (GPU) | OpenMP target offload to gfx90a | QE_GPU="openmp;oneapi", PVC arch | CUDA build |
| Setup entry point | scripts/setup/frontier/install-rocm72.sh | scripts/setup/aurora/install_matsim_aurora.sh | scripts/setup/perlmutter/install_matsim_perlmutter.sh |
| Active-learning launcher | scripts/launchers/frontier/run-active-learning-frontier.sh | (file-coupled via SLURM) | (file-coupled via SLURM) |
| Per-platform docs | docs/quantum-espresso-frontier.md | docs/quantum-espresso-aurora.md, docs/vasp-aurora.md | docs/quantum-espresso-perlmutter.md |
Single entry-point index covering all three systems:
docs/hpc-platforms.md.
Design principles that keep the code portable:
- DFT and Python/ML stacks are never co-loaded in the same shell on any platform: each uses its own module set, and the active-learning loop couples them through SLURM steps + the filesystem. This avoids the pervasive ABI/toolchain conflicts (Cray MPI GTL SONAMEs on Frontier, oneAPI vs PyTorch CUDA stack on Perlmutter, etc.) that otherwise break shared builds.
- Backend-agnostic active learning: the same matsim-agents al run driver works whether the labeller is VASP or QE, and on any of the three platforms, because the DFT backend is selected by a single YAML field (dft.backend: vasp | qe).
- Templated YAML configs: ${VAR} / ${VAR:-default} / ${VAR:?msg} substitution lets one config file follow you between Frontier scratch, Aurora flare, and Perlmutter pscratch without edits.
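To make the three placeholder forms concrete, here is a minimal sketch of shell-style substitution in Python. It is not the project's loader: the function name substitute, the variables argument (a stand-in for the in-file vars: block), the rule that variables override the process environment, and the shell convention that an empty value counts as unset for the :- and :? forms are all assumptions made for illustration.

```python
import os
import re

# Matches ${VAR}, ${VAR:-default} and ${VAR:?message}.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::([-?])([^}]*))?\}")


def substitute(text, variables=None):
    """Expand shell-style placeholders in `text`.

    `variables` maps names to string values and, in this sketch,
    takes priority over the process environment.
    """
    env = {**os.environ, **(variables or {})}

    def expand(match):
        name, op, arg = match.groups()
        value = env.get(name)
        if value:                      # set and non-empty
            return value
        if op == "-":                  # ${VAR:-default}
            return arg
        if op == "?":                  # ${VAR:?message}
            raise KeyError(f"{name}: {arg}")
        if value is None:              # plain ${VAR} that is not set
            raise KeyError(f"{name} is not set")
        return value                   # plain ${VAR} set to an empty string

    return _PLACEHOLDER.sub(expand, text)


line = "output_dir: ${SCRATCH_DEMO:-/tmp/matsim}/${RUN_ID}"
print(substitute(line, {"RUN_ID": "al-001"}))
```

Failing loudly on ${VAR:?msg} is what lets a config refuse to run with a missing scratch path instead of writing into an unintended directory.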
⚠️ Frontier users, read this first: see scripts/docs/frontier/README-frontier.md for required setup and known issues. Critically, a prebuilt tvm_ffi shared library must exist at $PROJ/cache/tvm-ffi/libtorch_c_dlpack_addon_torch211-rocm.so (where $PROJ is your project's proj-shared directory); without it, vLLM jobs hang silently and indefinitely. The script's preflight check guards against this by failing fast within 2 seconds with a clear error message. If the library is missing, rebuild it with sbatch scripts/setup/frontier/prebuild-tvm-ffi-frontier.sh.
The repo also ships a fully reproducible recipe for building Quantum
ESPRESSO develop with AMD MI250X (gfx90a) OpenMP target offload:
- Build script: scripts/setup/frontier/build-qe-gpu-frontier.sh
- Run launcher: scripts/launchers/frontier/run-pw-gpu-frontier.sh
- Full docs: docs/quantum-espresso-frontier.md
- Platform index: docs/hpc-platforms.md
The build is cross-compiled on a login node and produces ~92 binaries
(pw.x, cp.x, ph.x, pp.x, neb.x, epw.x, kcw.x, tddfpt/
turbo_* suite, pioud.x, all_currents.x, …) under
external/quantum-espresso/install-gpu/bin/ (gitignored). The recipe
includes baked-in workarounds for the cce/18.0.1 ftn-7991 ICE, the
PIOUD etime() link error (rewritten to F95 cpu_time), and the
rocm/7.x cray-mpich SONAME mismatch.
QE uses a different module stack than matsim-agents' Python; the two are deliberately kept isolated and coupled only through Slurm + files.
VASP 6.6 is also wired up on Frontier MI250X for the active-learning labeller path:
- Build script: scripts/setup/frontier/build-vasp-gpu-frontier.sh
- In-allocation step launcher (called by the AL loop): scripts/launchers/frontier/_vasp-step-frontier.sh
As with QE, the proprietary VASP source itself is not committed;
only the build recipe is. The repository assumes you have a licensed
VASP source tree under external/vasp6/.
The repository also includes a validated build/run path for Quantum ESPRESSO with Intel GPU offload on Aurora.
- Build script: scripts/setup/aurora/build-qe-gpu-aurora.sh
- Run launcher: scripts/launchers/aurora/run-pw-gpu-aurora.sh
- Full docs: docs/quantum-espresso-aurora.md
- Platform index: docs/hpc-platforms.md
Validated outcome in this repo:
- successful CMake build + install (exit code 0)
- 106 installed executables in external/quantum-espresso/install-gpu/bin/
- core binaries verified: pw.x, cp.x, ph.x, pp.x, epw.x
Quick run pattern:
bash scripts/launchers/aurora/run-pw-gpu-aurora.sh path/to/pw.in
Aurora QE and the Python/ML environment are intentionally isolated and typically coupled only via files and scheduler jobs.
For VASP on Aurora, the repository keeps only build provenance, not the vendor
source itself. The recorded makefile lineage is documented in
docs/vasp-aurora.md, including the upstream template
used (arch/makefile.include.oneapi_omp_off) and the local working makefile
path under external/vasp6/. The Aurora build entry point is
scripts/setup/aurora/build-vasp-gpu-aurora.sh,
which defaults to building vasp_std, vasp_gam, and vasp_ncl in one run.
Aurora supports vLLM-XPU serving and inference via the official ALCF frameworks module stack (Python 3.12, torch-xpu, ipex, vllm, ray, triton). The repo provides:
- Single-node smoke test: scripts/smoke-tests/aurora/smoke-vllm-singlenode-aurora.sh
- Advanced launchers: scripts/advanced/aurora/job-serve-multinode-vllm-aurora.sh (multi-node Ray serve), plus single-relax, active-learning, and QE warmstart launchers
Key requirements and gotchas:
- PVC visibility: On Aurora compute nodes, bare python does NOT see the GPUs. Always wrap Python in mpiexec -n 1 --ppn 1 (as in the smoke script) to expose XPUs via PALS.
- Device mask: Use ZE_FLAT_DEVICE_HIERARCHY=FLAT and a non-dotted ZE_AFFINITY_MASK (e.g., 0,1 for TP=2). In FLAT, each tile is a root device; dotted notation (0.0,0.1) is only valid in COMPOSITE and will result in device_count()=0 in FLAT.
- TMPDIR: PBS sets $TMPDIR to a long path that exceeds the Unix socket limit for ZMQ IPC. Always set export TMPDIR=/tmp before launching vLLM.
- oneCCL KVS: Do NOT set CCL_KVS_MODE=mpi or CCL_PROCESS_LAUNCHER=pmix for vLLM. vLLM's multiproc_executor uses forked workers, not MPI ranks; oneCCL must use its default internal KVS over TCP.
- Debug queue: The default debug queue has a per-user limit of 1 queued job and short walltime. For parallel jobs, use workq or prod.
- Model download: Place models in $PROJ/models/ (e.g., Mistral-Small-24B-Instruct-2501). Use the provided hf_download.py script if needed.
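The environment-related items in the list above (device mask, TMPDIR, oneCCL KVS) can be collected into one helper. This is a sketch only: the function name vllm_xpu_env is hypothetical, and it assumes the tensor-parallel group uses the first N tiles, numbered from 0. The checked-in launcher scripts remain the reference.

```python
import os


def vllm_xpu_env(tensor_parallel, base_env=None):
    """Build an environment dict for a vLLM launch on Aurora PVC tiles."""
    env = dict(os.environ if base_env is None else base_env)
    # FLAT hierarchy: every tile is a root device, so the mask is a plain
    # comma-separated list (0,1,...) rather than dotted 0.0,0.1 notation.
    env["ZE_FLAT_DEVICE_HIERARCHY"] = "FLAT"
    env["ZE_AFFINITY_MASK"] = ",".join(str(i) for i in range(tensor_parallel))
    # A short TMPDIR keeps ZMQ IPC socket paths under the Unix socket limit.
    env["TMPDIR"] = "/tmp"
    # Drop MPI-oriented settings so oneCCL uses its default internal KVS.
    env.pop("CCL_KVS_MODE", None)
    env.pop("CCL_PROCESS_LAUNCHER", None)
    return env


env = vllm_xpu_env(2, {"TMPDIR": "/long/pbs/path", "CCL_KVS_MODE": "mpi"})
print(env["ZE_AFFINITY_MASK"], env["TMPDIR"])  # 0,1 /tmp
```

The resulting dict can be passed as the env argument of subprocess.run when starting the server from Python; the mpiexec wrapping described above is still required for the process to see the GPUs.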
To run the smoke test:
- Build the vLLM XPU venv (if not already):
  bash scripts/setup/aurora/install-vllm-xpu-aurora.sh
- Download a supported model (e.g., Mistral-Small-24B):
  source /path/to/hydragnn_venv/bin/activate
  python scripts/setup/aurora/hf_download.py mistralai/Mistral-Small-24B-Instruct-2501
- Submit the smoke test:
  qsub scripts/smoke-tests/aurora/smoke-vllm-singlenode-aurora.sh
  # or override model:
  qsub -v SMOKE_MODEL_PATH=$PROJ/models/Qwen2.5-32B-Instruct scripts/smoke-tests/aurora/smoke-vllm-singlenode-aurora.sh
- Inspect results in runs/smoke-vllm-singlenode-<jobid>/.
If the job fails, check vllm.log for device mask, TMPDIR, or oneCCL errors. Each error layer is documented in the script comments.
For multi-node serving, see the advanced launchers in scripts/advanced/aurora/.
Perlmutter (NERSC, NVIDIA A100) is supported as a first-class target for both the Python/ML stack and Quantum ESPRESSO GPU.
- Setup overview: scripts/setup/perlmutter/README.md
- Matsim env install: scripts/setup/perlmutter/install_matsim_perlmutter.sh
- QE GPU build: scripts/setup/perlmutter/build-qe-gpu-perlmutter.sh (CPU-only variant: build-qe-cpu-perlmutter.sh)
- QE detailed build guide: scripts/setup/perlmutter/QE-BUILD-GUIDE.md
- Full QE docs: docs/quantum-espresso-perlmutter.md
- Launchers:
  - QE pw.x GPU: scripts/launchers/perlmutter/run-pw-gpu-perlmutter.sh
  - QE warm-start benchmark: scripts/launchers/perlmutter/run-qe-warmstart-benchmark-perlmutter.sh
  - Single-node / multi-node / all-models LLM smoke tests: launch-test-singlenode-resume-perlmutter.sh, launch-test-multinode-perlmutter.sh, launch-test-all-models-perlmutter.sh
Quick run pattern:
./scripts/launchers/perlmutter/run-pw-gpu-perlmutter.sh path/to/pw.in
As on Frontier and Aurora, the DFT module stack and the Python/ML environment are intentionally isolated and coupled only through Slurm + files.
For a single entry point across Frontier, Aurora, Perlmutter, and model-serving
docs, see docs/hpc-platforms.md.
matsim-agents depends on HydraGNN (which itself wraps PyTorch + PyTorch
Geometric). The provided installer delegates the heavy install to
HydraGNN's official scripts so the same code path works on a laptop and
on a DOE supercomputer.
git clone git@code.ornl.gov:multi-agentic-ai-materials/matsim-agents.git
cd matsim-agents
# Local workstation (CPU or single GPU)
./scripts/setup_env.sh
# Frontier (OLCF, ROCm 7.2, current standard)
bash scripts/setup/frontier/install-rocm72.sh
# Perlmutter (NERSC)
PLATFORM=perlmutter ./scripts/setup_env.sh
Available PLATFORM values for the generic setup_env.sh:
workstation (default), perlmutter, aurora, andes,
frontier-rocm71, frontier-rocm64 (legacy; the supported Frontier
path is scripts/setup/frontier/install-rocm72.sh).
The three Frontier-targeted backends in this repo do not all use the same ROCm version. The combinations below are what is actually wired up in the scripts and what you should expect at runtime:
| Backend | Module | Why this version |
|---|---|---|
| HydraGNN venv (used by every Frontier launcher: vLLM, HF, downloaders, smoke tests, six-model bench) | rocm/7.2.0 | Current Frontier-supported PyTorch + ROCm path; built once into HydraGNN-Installation-Frontier-ROCm72/hydragnn_venv_rocm72/ |
| vLLM model server | rocm/7.2.0 | Shares the HydraGNN ROCm 7.2 venv; built from source via scripts/setup/frontier/build-vllm-rocm72.sh |
| Quantum ESPRESSO GPU | rocm/6.2.4 (forced) | Frontier's cray-mpich/8.1.31 GTL libmpi_gtl_hsa.so is hard-linked against libamdhip64.so.6 (rocm 6.x SONAME). rocm/7.x ships .so.7 and breaks the MPI Fortran link probe at CMake configure. Pin documented in docs/quantum-espresso-frontier.md. |
QE and the Python/ML stacks are deliberately never co-loaded in the same shell; they couple through Slurm + the filesystem.
Environment overrides accepted by the installer:
| Variable | Purpose | Default |
|---|---|---|
| PYTHON | Python interpreter | python3 |
| HYDRAGNN_REPO | HydraGNN git URL | https://github.com/ORNL/HydraGNN.git |
| HYDRAGNN_REF | Branch/tag/commit | main |
| HYDRAGNN_DIR | Reuse an existing HydraGNN checkout | third_party/HydraGNN |
| HYDRAGNN_EXTRAS | Args forwarded to install_dependencies.sh | all dev |
| LLM_BACKENDS | Subset of ollama vllm openai anthropic huggingface | ollama vllm |
| BOOTSTRAP_OLLAMA | Set to 1 to install the Ollama daemon, start it, and pull OLLAMA_MODELS (workstation only) | 0 |
| OLLAMA_MODELS | Space-separated list of models to pull when BOOTSTRAP_OLLAMA=1 | qwen2.5:14b |
After the script finishes:
source .venv/bin/activate # workstation case
matsim-agents --help
To bootstrap the local Ollama daemon and pull a model in one go:
BOOTSTRAP_OLLAMA=1 OLLAMA_MODELS="qwen2.5:14b llama3.1:8b" \
./scripts/setup_env.sh
Set the provider at runtime via CLI flag, environment variable, or in code. Local/open-source backends are the default.
For a detailed comparison of the two open-source local backends (vLLM vs HuggingFace Transformers + Accelerate), including pros, cons, and guidance for Frontier (ROCm), see docs/llm-backends-comparison.md.
| Provider | Install | Typical model | Notes |
|---|---|---|---|
| ollama (default) | brew install ollama && ollama pull qwen2.5:14b | qwen2.5:14b, llama3.1:8b, deepseek-r1:14b | Fully local, CPU/GPU/Metal. |
| vllm | Run a vLLM server (vllm serve <model> --port 8000) | meta-llama/Llama-3.1-8B-Instruct | OpenAI-compatible; great for HPC. |
| openai | pip install matsim-agents[openai] | gpt-4o-mini | Hosted. Set OPENAI_API_KEY. |
| anthropic | pip install matsim-agents[anthropic] | claude-3-5-sonnet-latest | Hosted. Set ANTHROPIC_API_KEY. |
| huggingface | pip install matsim-agents[huggingface] | Qwen/Qwen2.5-72B-Instruct | Direct HF Transformers + Accelerate; no server needed. Ideal as fallback on HPC when vLLM is unavailable. Set MATSIM_HF_MODEL_PATH to a local model directory. |
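As a rough picture of how a provider might be chosen at runtime, the sketch below resolves it from an explicit argument first, then the MATSIM_LLM_PROVIDER environment variable, then the ollama default. The function name resolve_provider and this precedence order are assumptions for illustration; only the provider names, the variable name, and the default come from this README.

```python
import os

SUPPORTED_PROVIDERS = ("ollama", "vllm", "openai", "anthropic", "huggingface")


def resolve_provider(cli_value=None, environ=None):
    """Pick the LLM provider: explicit argument, then environment, then default."""
    environ = os.environ if environ is None else environ
    # Assumed precedence: CLI flag > MATSIM_LLM_PROVIDER > "ollama".
    provider = cli_value or environ.get("MATSIM_LLM_PROVIDER") or "ollama"
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"unknown LLM provider: {provider!r}")
    return provider


print(resolve_provider(None, {}))                                # ollama
print(resolve_provider(None, {"MATSIM_LLM_PROVIDER": "vllm"}))   # vllm
print(resolve_provider("openai", {"MATSIM_LLM_PROVIDER": "vllm"}))  # openai
```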
For the vLLM backend you need to download the model weights locally before
starting the server. The recommended model for matsim-agents on HPC is
Qwen/Qwen2.5-72B-Instruct. A quick one-liner using the hf CLI that ships
with huggingface_hub>=1.12:
hf download Qwen/Qwen2.5-72B-Instruct \
--local-dir /path/to/models/Qwen2.5-72B-Instruct
For detailed instructions, including Frontier-specific steps, running the download as a background job, and resuming interrupted downloads, see docs/model-download.md.
Configuration knobs:
export MATSIM_LLM_PROVIDER=ollama # or vllm | openai | anthropic | huggingface
export MATSIM_OLLAMA_BASE_URL=http://... # optional
export MATSIM_VLLM_BASE_URL=http://node:8000/v1
export MATSIM_VLLM_API_KEY=EMPTY # only if vLLM is auth-protected
export MATSIM_HF_MODEL_PATH=/path/to/model # huggingface provider: local model dir
matsim-agents run \
"Relax structures/mos2-B_Defect-Free_PBE.vasp and report the final energy." \
--logdir ./multidataset_hpo-BEST6-fp64 \
--mlp-checkpoint ./mlp_branch_weights.pt \
--llm-provider ollama --llm-model qwen2.5:14b
ollama pull qwen2.5:14b
matsim-agents chat \
--logdir ./multidataset_hpo-BEST6-fp64 \
--mlp-checkpoint ./mlp_branch_weights.pt \
--n-random 50 --random-seed 0
A typical session:
you> I want a Pb-free halide double perovskite for photovoltaics with band gap near 1.5 eV.
assistant> A promising candidate is Cs2AgBiBr6 ...
Proposed composition detected: AgBiBr6Cs2. Run HydraGNN-based phase exploration? [y/N]: y
>>> Exploring composition AgBiBr6Cs2
starting double_perovskite .../AgBiBr6Cs2_double_perovskite.vasp
done double_perovskite E=-365.4123 eV |F|max=0.0118 eV/Å steps=112
Stability report for AgBiBr6Cs2:
Predicted ground state: AgBiBr6Cs2_double_perovskite_optimized_structure.vasp
E/atom = -9.1353 eV |F|max = 0.012 eV/Å
dynamically_stable_proxy = True
Chemical-stability proxy: PASS
you> Now suggest a Sb-substituted variant.
matsim-agents supervisor-run Li2MnO3 \
--logdir ./multidataset_hpo-BEST6-fp64 \
--mlp-checkpoint ./mlp_branch_weights.pt \
--al-config examples/active_learning/al_config.example.yaml \
--al-dry-run
To execute AL instead of dry-run, replace --al-dry-run with --al-run.
For compositions that have no AFLOW prototype match (e.g. 5-element high-entropy alloys), disable the prototype branch entirely and let pyXtal characterize the configuration space:
matsim-agents chat \
--logdir ./multidataset_hpo-BEST6-fp64 \
--mlp-checkpoint ./mlp_branch_weights.pt \
--n-random 200 --random-seed 42
When the conversation introduces an unusual stoichiometry, the discovery
wrapper will report No AFLOW prototype match and rely on the pyXtal
pass; the resulting candidates are flagged (novel) in the stability
table.
The repository currently exposes two LangGraph workflows that share the same numerical kernels (discovery wrapper, relaxations, AL loop):
- Core agent graph (matsim-agents run): planner -> executor -> uq_gate -> analyst, with optional run-path AL handoff when UQ policy is triggered.
- Supervisor graph (matsim-agents supervisor-run): prepare composition -> explore composition -> evaluate UQ -> optional active-learning handoff -> summarize.
This split keeps decision logic agentic while preserving deterministic, restart-friendly HPC kernels for heavy computation.
Four nodes share a typed MatSimState:
- planner: turns the objective into a list of TaskSpec items (kinds: relax, analyze, report). Uses the LLM with structured output; falls back to a deterministic plan when the LLM is unavailable.
- executor: pops the next task, dispatches the matching tool (currently relax_structure), appends a RelaxationResult to the state, increments iteration. Routed back to itself until the queue drains or max_iterations is reached.
- uq_gate: aggregates branch-weight confidence over relaxation results and applies policy thresholds. If low-confidence criteria are met and handoff is enabled, this node can launch active learning (--al-config, --al-dry-run / --al-run) and append structured handoff events to the state.
- analyst: summarizes the accumulated results into a human-readable report (LLM-assisted when available, deterministic baseline otherwise), including handoff decisions/events when present.
State is checkpointed via LangGraph's MemorySaver, so every node
transition is replayable and inspectable.
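The uq_gate decision can be pictured as a small pure function. The sketch below is an assumption about how the three policy knobs might combine, not the project's implementation: a relaxation counts as unreliable when its largest branch weight falls below the threshold, and handoff triggers only when enough relaxations were seen and a large enough fraction of them is unreliable. The default values are placeholders; only the action names come from this README.

```python
def uq_gate(top_weights, *, top_weight_threshold=0.6,
            min_unreliable_fraction=0.5, min_relaxations=3,
            handoff_enabled=True, dry_run=True):
    """Return (action, metrics) for a batch of relaxation results.

    top_weights: largest branch weight reported for each relaxation.
    All default thresholds here are placeholders, not project defaults.
    """
    n = len(top_weights)
    unreliable = sum(1 for w in top_weights if w < top_weight_threshold)
    fraction = unreliable / n if n else 0.0
    metrics = {"n_relaxations": n, "unreliable_fraction": fraction}
    # Assumed policy: enough samples AND enough of them low-confidence.
    low_confidence = n >= min_relaxations and fraction >= min_unreliable_fraction
    if not (handoff_enabled and low_confidence):
        return "not_triggered", metrics
    return ("triggered_dry_run" if dry_run else "triggered_run"), metrics


action, metrics = uq_gate([0.9, 0.4, 0.3, 0.5])
print(action, metrics["unreliable_fraction"])  # triggered_dry_run 0.75
```

Keeping the gate free of side effects means the same inputs always give the same routing decision, which fits the checkpointed, replayable state described above.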
The chat REPL is more than a wrapper around the LLM: it is a
closed loop between dialogue and atomistic simulation:
- The user and the assistant exchange messages about a target property.
- After each turn, extract_compositions scans both messages for chemical formulas (validates element symbols, reduces stoichiometry, ignores English words like "Carbon" or "Hello").
- For every newly-seen formula the user is asked (or --auto-confirm is honored) whether to launch a substantial atomistic exploration.
- The wrapper explore_composition then:
  - generates seeds through the unified matsim_agents.discovery.seeds.generate_seeds entry point, which combines:
    - AFLOW prototype decoration. Every prototype in the pymatgen-bundled AFLOW encyclopedia whose reduced stoichiometric ratios match the target composition is substituted with the target's elements. All symmetrically distinct element-to-placeholder assignments are enumerated (e.g. ABX₃ vs BAX₃). This recovers fcc/bcc/hcp/rocksalt/zincblende/wurtzite/fluorite/rutile/perovskite/spinel/Heusler/MAX/… from a single uniform source, with no per-stoichiometry rules.
    - pyXtal random search (--n-random N). N random crystals are drawn uniformly across the 230 space groups, respecting Wyckoff multiplicities and minimum interatomic distances. Each such seed is tagged needs_dft_verification=True and surfaced with a (novel) marker in the live table so it is treated as a candidate for follow-up DFT validation rather than a publishable claim. When the target composition has no AFLOW match (e.g. a 5-element high-entropy alloy), this is the only active source and --n-random should be raised accordingly.
  - relaxes each seed with HydraGNN + ASE (FIRE/BFGS).
  - scores chemical stability (ΔE/atom ranking, near-degeneracy warning) and a dynamical-stability proxy (max residual force), keeping the source (prototype vs random) and AFLOW prototype_id / space_group of every candidate in the report.
- The summary is fed back into the conversation as a discovery user-turn payload so the LLM can refine its hypothesis on the next turn.
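The formula-detection step in the loop above can be illustrated with a short regex-based sketch. It is not the project's extract_compositions: the function name extract_compositions_sketch, the whole-token matching rule, and the requirement of either a digit or at least two elements are assumptions. The alphabetical element ordering is inferred from the sample session, where Cs2AgBiBr6 is reported as AgBiBr6Cs2.

```python
import re
from functools import reduce
from math import gcd

ELEMENTS = set(
    "H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar K Ca Sc Ti V Cr Mn Fe Co "
    "Ni Cu Zn Ga Ge As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb "
    "Te I Xe Cs Ba La Ce Pr Nd Pm Sm Eu Gd Tb Dy Ho Er Tm Yb Lu Hf Ta W Re "
    "Os Ir Pt Au Hg Tl Pb Bi Po At Rn Fr Ra Ac Th Pa U Np Pu Am Cm Bk Cf Es "
    "Fm Md No Lr Rf Db Sg Bh Hs Mt Ds Rg Cn Nh Fl Mc Lv Ts Og".split()
)

_CANDIDATE = re.compile(r"\b(?:[A-Z][a-z]?\d*)+\b")   # whole formula-like word
_TERM = re.compile(r"([A-Z][a-z]?)(\d*)")             # one symbol + count


def _parse(token):
    """Return {element: count}, or None if any symbol or count is invalid."""
    counts = {}
    for symbol, digits in _TERM.findall(token):
        n = int(digits) if digits else 1
        if symbol not in ELEMENTS or n == 0:
            return None
        counts[symbol] = counts.get(symbol, 0) + n
    return counts


def extract_compositions_sketch(text):
    """Return reduced formulas (elements sorted alphabetically) found in text."""
    found = []
    for token in _CANDIDATE.findall(text):
        counts = _parse(token)
        if not counts:
            continue
        # Require a digit or two elements so lone words like "I" are skipped.
        if len(counts) < 2 and not any(ch.isdigit() for ch in token):
            continue
        divisor = reduce(gcd, counts.values())
        formula = "".join(
            f"{el}{n // divisor if n // divisor > 1 else ''}"
            for el, n in sorted(counts.items())
        )
        if formula not in found:
            found.append(formula)
    return found


print(extract_compositions_sketch("A promising candidate is Cs2AgBiBr6, not Carbon."))
# ['AgBiBr6Cs2']
```

Like any pattern matcher, this sketch still fires on formulas mentioned in passing (for example CO2), which is why the confirmation prompt matters.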
Discovery chat can also run two optional control actions:
- Single-structure relaxation command: /relax path/to/structure.vasp
- UQ-based AL handoff (policy knobs on CLI):
  - --trigger-al-handoff / --no-trigger-al-handoff
  - --al-config <base_al_yaml>
  - --al-dry-run / --al-run
  - --uq-top-weight-threshold
  - --uq-min-unreliable-fraction
  - --uq-min-relaxations-for-handoff
  - --al-handoff-audit-path
If --al-handoff-audit-path is not set, handoff events default to:
<output_dir>/discovery/al_handoff_events.jsonl
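A JSONL audit file holds one JSON object per line, so events can be appended without rewriting the file. The sketch below shows that pattern with the standard library. The record field names (timestamp, uq_metrics, thresholds, rationale) and both helper names are illustrative assumptions; the actual schema written by matsim-agents may differ.

```python
import json
import time
from pathlib import Path


def append_handoff_event(audit_path, action, metrics, thresholds, rationale):
    """Append one handoff decision as a single JSON line."""
    record = {
        "timestamp": time.time(),
        "action": action,  # not_triggered | triggered_dry_run | triggered_run
        "uq_metrics": metrics,
        "thresholds": thresholds,
        "rationale": rationale,
    }
    path = Path(audit_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Append mode keeps earlier events intact across runs.
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, sort_keys=True) + "\n")
    return record


def read_handoff_events(audit_path):
    """Load every record from a JSONL audit file."""
    with Path(audit_path).open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

A typical use would be filtering the loaded records by action to see how often a run escalated to active learning.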
Output artifacts per composition (under --output-dir):
outputs/discovery/<formula>/
seeds/ <formula>_<prototype_id>[_v<k>].vasp # AFLOW decoration variant k
<formula>_random_<sg>_<i>.vasp # pyXtal seed in space group <sg>
relaxed/ <formula>_<seed>_optimized_structure.vasp
<formula>_<seed>_optimization.traj # ASE trajectory
<formula>_<seed>_optimization.csv # per-step E, |F|max, branch weights
Seeds carry their provenance (source, prototype_id, space_group,
needs_dft_verification) on the PhaseCandidate Pydantic model so
downstream scorers can filter or weight them.
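As a rough illustration of how a downstream scorer might use that provenance, the sketch below separates prototype-derived seeds from seeds still awaiting DFT verification. `SeedProvenance` is a plain dataclass standing in for the real `PhaseCandidate` Pydantic model, `split_by_verification` is a hypothetical helper, and the prototype label in the usage example is illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SeedProvenance:
    """Illustrative stand-in carrying the provenance fields named above."""
    source: str                    # "prototype" or "random"
    prototype_id: Optional[str]    # AFLOW label, None for random seeds
    space_group: Optional[int]
    needs_dft_verification: bool


def split_by_verification(seeds):
    """Return (trusted, novel): seeds usable as-is vs. seeds needing DFT follow-up."""
    trusted = [s for s in seeds if not s.needs_dft_verification]
    novel = [s for s in seeds if s.needs_dft_verification]
    return trusted, novel


seeds = [
    SeedProvenance("prototype", "AB_cF8_225_a_b", 225, False),
    SeedProvenance("random", None, 62, True),
]
trusted, novel = split_by_verification(seeds)
```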
Honest caveats. The AFLOW prototype set covers known crystal topologies for stoichiometries that match an existing entry; exotic ratios fall back to the pyXtal random pass, which is novelty-oriented and intentionally flagged for DFT verification. The dynamical-stability check is a force-residual proxy, not a full phonon analysis; plug in phonopy for the rigorous version. For broader generative coverage (CALYPSO, USPEX, AIRSS, diffusion models, …), add a new branch to `generate_seeds`; every consumer already routes through that single entry point.

Composition detection is regex-based. The chat REPL extracts target materials by pattern-matching chemical formulas (`Li2MnO3`, `CrMoNbTaW`) in user and assistant text via `extract_compositions`. This is intentionally minimal but has known failure modes:
- Fires on context, not intent. `"CO2 emissions are dominated by CaO formation"`, `"Following Smith et al. on BaTiO3 ferroelectrics we instead study SrTiO3"`, or `"avoid the toxic As2O3 phase"` will all trigger a prompt for the wrong (or every) formula. Space-group strings like `"P3"`, `"C2/c"`, and DFT-functional names like `"B3LYP"` parse as `P`+`3`, `C`+`2`, `B`+`3` and pass the validator.
- Misses verbal proposals. `"lithium manganate at the 2-1-3 stoichiometry"`, `"the Li-Mn-O ternary"`, Unicode subscripts (`Li₂MnO₃`), parentheses-with-alternation (`(Li,Na)2MnO3`), and fractional stoichiometries (`Li2Mn0.5Ni0.5O2`) are not detected.
- Cannot read polarity. `"DO NOT explore Li2MnO3"` triggers the same prompt as `"please explore Li2MnO3"`.

The interactive y/N confirmation is what stands between these cases and a wasted multi-hour relaxation; `--auto-confirm` skips that prompt, so batch runs lose the safeguard.

The natural replacement is a tool-calling LLM that explicitly invokes `explore_composition(formula, rationale)` when it actually means to compute the material, which would eliminate every class above; a migration of the chat REPL onto a LangGraph `ToolNode` is the right moment to do this.
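The failure modes are easy to reproduce with a toy extractor. The sketch below is not the project's `extract_compositions`; it is a minimal regex matcher written for illustration, with a deliberately tiny element list and a hypothetical function name `extract_formulas`.

```python
import re

# Deliberately small element list for illustration; a real validator would
# cover the periodic table. Two-letter symbols must be tried first.
ELEMENTS = ["Li", "Mn", "Ca", "Ba", "Sr", "Ti", "As", "B", "C", "O", "P"]
SYMBOL = "|".join(sorted(ELEMENTS, key=len, reverse=True))
FORMULA = re.compile(rf"(?:(?:{SYMBOL})\d*)+")
PART = re.compile(rf"({SYMBOL})(\d*)")


def extract_formulas(text):
    """Return formula-like substrings: runs of element symbols with optional counts."""
    hits = []
    for match in FORMULA.finditer(text):
        parts = PART.findall(match.group(0))
        # Require two or more elements, or an explicit count, so that a lone
        # capital letter in ordinary prose is not reported.
        if len(parts) > 1 or parts[0][1]:
            hits.append(match.group(0))
    return hits


# Polarity is invisible to the pattern: both sentences yield the same hit.
print(extract_formulas("please explore Li2MnO3"))
print(extract_formulas("DO NOT explore Li2MnO3"))
# A functional name and a space-group symbol both look like formulas.
print(extract_formulas("B3LYP in space group P3"))
```

A pattern matcher of this kind has no access to intent, which is why the text above argues for an explicit tool call instead of further regex tuning.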
from matsim_agents.tools.relaxation import RelaxStructureInput, _run

result = _run(RelaxStructureInput(
    structure_path="structures/mos2.vasp",
    logdir="./multidataset_hpo-BEST6-fp64",
    mlp_checkpoint="./mlp_branch_weights.pt",
    optimizer="FIRE",
    maxiter=200,
))
print(result.final_energy_eV, result.optimized_structure_path)

from matsim_agents.discovery import explore_composition
# Default: every applicable AFLOW prototype + 50 pyXtal random seeds.
result = explore_composition(
    "Cs2AgBiBr6",
    logdir="./multidataset_hpo-BEST6-fp64",
    mlp_checkpoint="./mlp_branch_weights.pt",
    output_dir="./outputs",
)
print(result.stability.summary)

# Prototype-only run (pyXtal pass disabled).
result = explore_composition(
    "MoS2",
    logdir="./multidataset_hpo-BEST6-fp64",
    mlp_checkpoint="./mlp_branch_weights.pt",
    output_dir="./outputs",
    n_random=0,
)

# Novelty-heavy run for an exotic / high-entropy composition with no
# AFLOW match: rely entirely on pyXtal.
result = explore_composition(
    "FeCoNiCrMn",
    logdir="./multidataset_hpo-BEST6-fp64",
    mlp_checkpoint="./mlp_branch_weights.pt",
    output_dir="./outputs",
    n_random=200,
    random_seed=42,
)

import uuid
from matsim_agents.graph import build_graph
from matsim_agents.state import MatSimState

graph = build_graph()
final = graph.invoke(
    MatSimState(
        objective="Relax structures/foo.vasp and summarize.",
        llm_provider="ollama",
        llm_model="qwen2.5:14b",
    ),
    config={"configurable": {
        "thread_id": str(uuid.uuid4()),
        "logdir": "./multidataset_hpo-BEST6-fp64",
        "mlp_checkpoint": "./mlp_branch_weights.pt",
    }},
)
print(final["analysis"])

from matsim_agents.supervisor import SupervisorConfig, run_supervisor
final = run_supervisor(SupervisorConfig(
    composition="Li2MnO3",
    logdir="./multidataset_hpo-BEST6-fp64",
    mlp_checkpoint="./mlp_branch_weights.pt",
    output_dir="./outputs",
    trigger_active_learning_on_high_uq=True,
    active_learning_config="examples/active_learning/al_config.example.yaml",
    active_learning_dry_run=True,
))
print(final.get("summary"))

from matsim_agents.chat import DiscoveryChatConfig, DiscoveryChatSession, chat_once
session = DiscoveryChatSession(config=DiscoveryChatConfig(
    logdir="./multidataset_hpo-BEST6-fp64",
    mlp_checkpoint="./mlp_branch_weights.pt",
    output_dir="./outputs",
    llm_model="qwen2.5:14b",
    auto_confirm=True,
))
reply = chat_once(session, "Propose a Pb-free perovskite for PV.")

matsim-agents run OBJECTIVE [options]               # planner -> executor -> uq_gate -> analyst
matsim-agents plan OBJECTIVE # show the planner's task list
matsim-agents chat [options] # interactive discovery REPL
matsim-agents supervisor-run COMPOSITION [options] # discovery -> UQ -> optional AL handoff
matsim-agents al run CONFIG.yaml # active-learning loop (HydraGNN <-> DFT)
matsim-agents al validate-config CONFIG.yaml # parse + dump resolved config as JSON
Common options (all commands that touch HydraGNN):
| Flag | Description |
|---|---|
| `--logdir PATH` | HydraGNN logdir with `config.json` and checkpoint. |
| `--mlp-checkpoint PATH` | BranchWeightMLP `.pt` file. |
| `--checkpoint NAME` | HydraGNN checkpoint filename or absolute path. |
| `--mlp-device {cuda,cpu}` | Device for the auxiliary MLP. |
| `--precision {fp32,fp64,bf16}` | HydraGNN precision override. |
| `--mlp-precision {fp32,fp64,bf16}` | MLP precision override. |
| `--llm-provider {ollama,vllm,openai,anthropic,huggingface}` | Chat backend. |
| `--llm-model NAME` | Provider-specific model identifier. |
| `--llm-base-url URL` | Override server URL (Ollama / vLLM). |
chat-specific:
| Flag | Description |
|---|---|
| `--output-dir PATH` | Where discovery artifacts are written (default `./outputs`). |
| `--ase-structure-optimizer {FIRE,BFGS,BFGSLineSearch}` | ASE optimizer for relaxations. |
| `--maxiter INT` | Max relaxation steps per seed (default 200). |
| `--fmax FLOAT` | Stop relaxation when max residual force is below this (eV/Å, default 0.02). |
| `--n-random INT` | Number of supplementary pyXtal random structures per composition, in addition to every applicable AFLOW prototype decoration (default 50). Set to 0 to disable the pyXtal pass; silently degrades to 0 if pyXtal is not installed. |
| `--random-seed INT` | RNG seed for the pyXtal sampler (reproducibility). |
| `--auto-confirm` / `--ask` | Skip the y/N prompt for every detected composition. |
| `--trigger-al-handoff` / `--no-trigger-al-handoff` | Enable or disable UQ-driven escalation to active learning. |
| `--al-config PATH` | Base AL YAML used when handoff is triggered. |
| `--al-dry-run` / `--al-run` | Plan/report AL handoff only, or execute AL loop. |
| `--uq-top-weight-threshold FLOAT` | Trigger handoff when mean top branch weight is below this value. |
| `--uq-min-unreliable-fraction FLOAT` | Trigger handoff when the low-confidence fraction exceeds this value. |
| `--uq-min-relaxations-for-handoff INT` | Minimum number of relaxations before evaluating handoff policy. |
| `--al-handoff-audit-path PATH` | Optional JSONL path for UQ and handoff audit artifacts. |
run-specific:
| Flag | Description |
|---|---|
| `OBJECTIVE` | Natural-language task objective for planner/executor. |
| `--max-iterations INT` | Maximum executor iterations before forcing analysis. |
| `--trigger-al-handoff` / `--no-trigger-al-handoff` | Enable or disable UQ-driven AL escalation after run relaxations. |
| `--al-config PATH` | Base AL YAML used when run-path handoff is triggered. |
| `--al-dry-run` / `--al-run` | Plan/report run->AL handoff only, or execute AL loop. |
| `--uq-top-weight-threshold FLOAT` | Trigger handoff when mean top branch weight is below this value. |
| `--uq-min-unreliable-fraction FLOAT` | Trigger handoff when low-confidence fraction exceeds this value. |
| `--uq-min-relaxations-for-handoff INT` | Minimum relaxations before evaluating run-path handoff policy. |
| `--al-handoff-audit-path PATH` | Optional JSONL path for UQ and run->AL handoff audit artifacts. |
supervisor-run-specific:
| Flag | Description |
|---|---|
| `COMPOSITION` | Target composition for one supervisor pass (e.g. `Li2MnO3`). |
| `--trigger-al-handoff` / `--no-trigger-al-handoff` | Enable or disable UQ-driven AL handoff policy. |
| `--al-config PATH` | Base AL YAML used for optional handoff execution. |
| `--al-dry-run` / `--al-run` | Dry-run handoff planning or real AL execution. |
| `--uq-top-weight-threshold FLOAT` | UQ threshold on mean top branch weight. |
| `--uq-min-unreliable-fraction FLOAT` | UQ threshold on low-confidence fraction. |
| `--uq-min-relaxations-for-handoff INT` | Min relaxations required before handoff is considered. |
| `--al-handoff-audit-path PATH` | Optional JSONL path for decision artifacts. |
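The three `--uq-*` flags describe a simple decision rule, which might look like the sketch below. This is an assumption-laden illustration rather than the framework's policy code: the `HandoffPolicy` class, the `should_hand_off` function, the default values, the choice to combine the two triggers with OR, and the definition of a "low-confidence" relaxation as one whose top branch weight falls below the threshold are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class HandoffPolicy:
    # Field names echo the CLI flags; the default values are assumptions.
    top_weight_threshold: float = 0.6
    min_unreliable_fraction: float = 0.3
    min_relaxations: int = 5


def should_hand_off(top_weights, policy):
    """Decide on AL handoff from the top branch weight of each relaxation.

    Returns (decision, reason) so the outcome can be written to an audit log.
    """
    if len(top_weights) < policy.min_relaxations:
        return False, "too few relaxations"
    if mean(top_weights) < policy.top_weight_threshold:
        return True, "mean top branch weight below threshold"
    unreliable = sum(w < policy.top_weight_threshold for w in top_weights)
    if unreliable / len(top_weights) > policy.min_unreliable_fraction:
        return True, "low-confidence fraction above threshold"
    return False, "model looks confident"
```

Returning the reason alongside the decision keeps each audit record self-explanatory, which suits a JSONL event log.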
The matsim-agents al subcommand runs an end-to-end active-learning loop
that grows a HydraGNN training set from DFT labels of structures the
current model is most uncertain about. Both VASP 6.6 and Quantum
ESPRESSO pw.x are supported as the labeller; the choice is a single
YAML field.
HydraGNN MLFF ── MD ──► candidates ───────────────────────────┐
      ▲                     │                                 │
      │                     ▼                                 │
      │           ensemble / MC-dropout                       │
      │          uncertainty + diversity                      │
      │                     │                                 │
      │                     ▼                                 │
      │          top-K most informative                       │
      │                     │                                 │
      │                     ▼                                 │
      │     DFT backend (parallel, in-allocation)             │
      │        vasp_std | pw.x (one toggle)                   │
      │                     │                                 │
      │                     ▼                                 │
      │  dataset.extxyz / dataset.db (tagged with backend)    │
      │                     │                                 │
      │                     ▼                                 │
      └─ retrain HydraGNN ◄─ next iteration ◄─────────────────┘
# 1. Edit the templated example, or override via env vars at runtime
export PROJ_ROOT=$PWD
export RUNS_ROOT=/lustre/orion/<proj>/scratch/$USER/runs
export RUN_TAG=al-mptrj-001
export DFT_BACKEND=qe # or: vasp
# 2. Validate the resolved config (no run)
matsim-agents al validate-config examples/active_learning/al_config.example.yaml
# 3. Submit on Frontier
sbatch --export=ALL,AL_CONFIG=$PWD/examples/active_learning/al_config.example.yaml \
    -N 64 -t 12:00:00 \
    scripts/launchers/frontier/run-active-learning-frontier.sh

The example YAML carries both backend sub-blocks; flip dft.backend: to
select one. The unused sub-block is ignored.
dft:
  backend: ${DFT_BACKEND:-vasp}   # vasp | qe
  vasp:
    vasp_bin: ${VASP_BIN}
    potcar_dir: ${POTCAR_DIR}
    incar_template: ${PROJ_ROOT}/examples/active_learning/INCAR.template
  qe:
    pw_bin: ${PW_BIN}
    pseudo_dir: ${PSEUDO_DIR}
    pw_template: ${PROJ_ROOT}/examples/active_learning/pw.template

All AL example configs use shell-style placeholders that are expanded
at load time by ALConfig.from_yaml:
| Syntax | Meaning |
|---|---|
| `${VAR}` | required; raises if unset |
| `${VAR:-default}` | falls back to `default` if unset |
| `${VAR:?error message}` | aborts with `error message` |
Resolution order: (1) os.environ, (2) optional top-level vars:
block in the YAML itself. Nested references inside vars: resolve
iteratively, so VASP_BIN: ${PROJ_ROOT}/external/.../vasp_std just
works. The vars: block is consumed before pydantic validation and
never appears in the parsed ALConfig.
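The substitution rules can be sketched with a single regular expression. The code below is an independent illustration of the three placeholder forms and the stated resolution order, not the implementation inside `ALConfig.from_yaml`; the `expand` function and its parameters are hypothetical, and nested placeholders inside a default value are not handled.

```python
import os
import re

# ${VAR}, ${VAR:-default}, ${VAR:?message}
_PLACEHOLDER = re.compile(r"\$\{(\w+)(?:(:-|:\?)([^}]*))?\}")


def expand(text, extra_vars=None, environ=None, max_passes=10):
    """Expand placeholders; the environment wins over the vars: block."""
    environ = os.environ if environ is None else environ
    extra_vars = extra_vars or {}

    def lookup(match):
        name, op, arg = match.group(1), match.group(2), match.group(3)
        if name in environ:
            return environ[name]
        if name in extra_vars:
            return extra_vars[name]
        if op == ":-":
            return arg
        if op == ":?":
            raise KeyError(f"{name}: {arg}")
        raise KeyError(f"required variable {name} is unset")

    # Repeat until nothing changes so values that themselves contain
    # placeholders (nested references) are resolved too.
    for _ in range(max_passes):
        expanded = _PLACEHOLDER.sub(lookup, text)
        if expanded == text:
            return expanded
        text = expanded
    raise ValueError("placeholder expansion did not converge (cyclic vars?)")
```

Capping the number of passes turns an accidental cycle in the `vars:` block into an error instead of an infinite loop.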
md.seed_source.kind selects how initial MD structures are obtained:
- `paths`: a curated list of POSCAR / CIF / XYZ files on disk.
- `prompt`: the LLM proposes plausible compositions for a target objective (e.g. "Pb-free halide perovskites for PV") and the loop materialises seed structures by running the same crystal-prototype enumerator used by the discovery wrapper. No curated structure collection is required.
VASP PAW totals and QE pseudopotential totals are not directly
comparable. Every frame written to the dataset is tagged with
info["dft_backend"]; never train one HydraGNN model on a mixed
VASP+QE dataset without an explicit per-backend energy offset.
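A guard of the following kind could enforce that rule before training. It is a sketch only: frames are represented as plain dicts with an `"info"` mapping standing in for `ase.Atoms.info`, and `check_single_backend` is a hypothetical helper, not part of the package.

```python
from collections import Counter


def check_single_backend(frames, allow_mixed=False):
    """Return per-backend frame counts; refuse a mixed dataset unless allowed.

    allow_mixed should only be set once an explicit per-backend energy
    offset has been applied upstream.
    """
    counts = Counter(frame["info"]["dft_backend"] for frame in frames)
    if len(counts) > 1 and not allow_mixed:
        raise ValueError(f"mixed DFT backends in training set: {dict(counts)}")
    return dict(counts)
```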
Full walkthrough, including templated INCAR / pw.in files, in-allocation
launcher details, and per-backend ROCm/MPI gotchas, lives in
examples/active_learning/README.md.
The codabench_competition/ directory contains a fully self-contained
Codabench challenge called the
Matsim-Agents Materials Discovery Challenge.
159 atomistic test structures spanning 11 material classes (2D monolayers, intermetallics, BCC/FCC high-entropy alloys, catalysis slabs, critical minerals, high-entropy ceramics, MAX phases, nuclear oxides, perovskites, thermoelectrics), each available in ideal, vacancy, antisite, and interstitial variants. Tasks cover:
| # | Task | Metric |
|---|---|---|
| 1 | Formation energy prediction | MAE (eV/atom) ↓ |
| 2 | Atomic force prediction | MAE (eV/Å) ↓ |
| 3 | ML structure relaxation | RMSD vs DFT geometry (Å) ↓ |
| 4 | AI-accelerated DFT relaxation | RMSD + energy MAE ↓ |
| 5 | Phase stability ranking | Mean Spearman ρ ↑ |
The overall score is a weighted average mapped to [0, 1]; tasks with no submission are excluded (not penalised).
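The exclusion rule amounts to renormalising the weights over the tasks that were actually submitted. The sketch below assumes each per-task score has already been mapped to [0, 1] (how the error metrics are normalised is not specified here) and uses made-up weights; `overall_score` is a hypothetical name, not the function in `score.py`.

```python
def overall_score(task_scores, weights):
    """Weighted average over submitted tasks only.

    task_scores maps task id -> score in [0, 1], or None if not submitted.
    Returns None when nothing was submitted.
    """
    submitted = {t: s for t, s in task_scores.items() if s is not None}
    if not submitted:
        return None
    total_weight = sum(weights[t] for t in submitted)
    return sum(weights[t] * s for t, s in submitted.items()) / total_weight


# Hypothetical weights: a submission covering only Tasks 1 and 2 is
# averaged over those two tasks and is not penalised for the rest.
weights = {1: 0.25, 2: 0.25, 3: 0.2, 4: 0.2, 5: 0.1}
partial = overall_score({1: 0.8, 2: 0.6, 3: None, 4: None, 5: None}, weights)
```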
To prevent participants from reverse-engineering the reference labels by repeatedly probing the leaderboard, the 159 test structures are split into two partitions:
| Partition | Size | When visible |
|---|---|---|
| Public | 51 structures (~30 %) | Always (during the competition) |
| Private | 108 structures (~70 %) | Only at competition close (final ranking) |
The split is deterministic and reproducible (SEED=42, stratified by chemical
formula so every formula has ≥ 1 structure in each partition). The
reference_data/public_ids.txt and reference_data/private_ids.txt files
record which structure IDs belong to each partition.
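A seeded, formula-stratified split can be sketched as follows. This is not the contents of `create_split.py`; the `stratified_split` function and its rounding rule are assumptions, so it illustrates the idea without reproducing the actual 51/108 partition.

```python
import random


def stratified_split(ids_by_formula, public_fraction=0.3, seed=42):
    """Split structure IDs so every formula appears in both partitions."""
    rng = random.Random(seed)   # fixed seed makes the split reproducible
    public, private = [], []
    for formula in sorted(ids_by_formula):   # sorted order keeps it deterministic
        ids = sorted(ids_by_formula[formula])
        if len(ids) < 2:
            raise ValueError(f"{formula} needs at least 2 structures")
        rng.shuffle(ids)
        # At least one public and at least one private structure per formula.
        n_public = min(max(1, round(public_fraction * len(ids))), len(ids) - 1)
        public.extend(ids[:n_public])
        private.extend(ids[n_public:])
    return sorted(public), sorted(private)
```

Iterating over sorted formulas and sorted IDs before shuffling matters: without it, dictionary or file-system ordering could change the result even with the same seed.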
The scoring program (scoring_program/score.py) computes metrics for both
partitions and emits public_* and private_* keys to scores.json. The
Codabench leaderboard is configured to display only public_* columns during
the competition. To switch to final ranking, change the key prefix from
public_ to private_ in competition.yaml.
Submission rate limit: 3 submissions per day, enforced via
max_submissions_per_day: 3 in competition.yaml.
Four baselines are provided in codabench_competition/baselines/:
| Baseline | Architecture | Source |
|---|---|---|
| MACE-MP-0 | Equivariant GNN (MACE) | Universal MLIP (Cambridge) |
| HydraGNN | Multi-headed graph NN | This repo / ORNL |
| UMA (`uma-s-1p2`) | Transformer-based universal model | Meta / fairchem |
| AllScAIP (`allscaip-md-conserving-all-omol`) | Message-passing NN | Meta / OMol25 |
Run any or all baselines:
cd codabench_competition
python run_baselines.py --model mace # MACE-MP-0
python run_baselines.py --model hydragnn # HydraGNN
python run_baselines.py --model uma        # UMA (requires fairchem-core ≥2.20)
python run_baselines.py --model allscaip   # AllScAIP (requires fairchem-core ≥2.20)
python run_baselines.py --model all --relax  # all baselines incl. relaxation (Tasks 3/4)

UMA and AllScAIP require the fairchem-core package and the model checkpoints
(downloaded on first use from HuggingFace; the relevant model cards must be
accepted before use at https://huggingface.co/facebook/UMA and
https://huggingface.co/facebook/OMol25).
codabench_competition/
├── competition.yaml  # Codabench bundle manifest & leaderboard config
├── run_baselines.py  # entry point: --model mace/hydragnn/uma/allscaip/all
├── evaluate.py  # local evaluation helper (mirrors the Codabench scorer)
├── requirements.txt  # Python deps for the competition bundle
├── install_mace_aurora.sh  # MACE-MP-0 install helper for Aurora (XPU)
├── fix_h5py_system_conflict_aurora.sh  # h5py/HDF5 conflict workaround (Aurora)
├── baselines/
│   ├── mace_mp0/model.py  # MACE-MP-0 baseline
│   ├── hydragnn/model.py  # HydraGNN baseline
│   ├── uma/model.py  # UMA (fairchem) baseline
│   └── allscaip/model.py  # AllScAIP (fairchem) baseline
├── scoring_program/
│   └── score.py  # Codabench scorer (public + private partitions)
├── reference_data/
│   ├── public_ids.txt  # 51 structure IDs in the public partition
│   ├── private_ids.txt  # 108 structure IDs in the private partition
│   ├── create_split.py  # reproducible split generator (SEED=42)
│   ├── formation_energies.csv  # DFT reference energies (server-side, not public)
│   ├── elemental_energies.json  # elemental DFT references (published to participants)
│   └── forces/  # per-structure force arrays (server-side, not public)
├── public_data/
│   ├── generate_structures.py  # generates the 159 test structures
│   ├── structures_metadata.csv  # anonymised MATS-XXXX → class / formula mapping
│   └── structures/  # XYZ files of all test structures
└── starting_kit/
    ├── README.md  # participant guide (tasks, formats, scoring)
    └── MODEL_INTERFACE.md  # how to write a custom MLIP adapter
See codabench_competition/starting_kit/README.md
for the full participant guide including submission formats.
matsim-agents/
├── pyproject.toml
├── docs/
│   ├── hpc-platforms.md  # single index across Frontier/Aurora/Perlmutter
│   ├── llm-backends-comparison.md  # vLLM vs HF Transformers on ROCm
│   ├── model-download.md  # HF model download how-to
│   ├── quantum-espresso-frontier.md  # QE GPU build/run on Frontier (MI250X)
│   ├── quantum-espresso-aurora.md  # QE GPU build/run on Aurora (PVC)
│   ├── quantum-espresso-perlmutter.md  # QE GPU build/run on Perlmutter (A100)
│   └── vasp-aurora.md  # VASP 6.6 makefile lineage on Aurora
├── scripts/
│   ├── setup_env.sh  # workstation / legacy HPC env install
│   ├── setup/
│   │   ├── frontier/  # Frontier (OLCF, MI250X) installers
│   │   │   ├── install-rocm72.sh  # vLLM ROCm 7.2 master install
│   │   │   ├── install_matsim_frontier.sh  # matsim-agents env on Frontier
│   │   │   ├── prebuild-tvm-ffi-frontier.sh
│   │   │   ├── build-vllm-rocm72.sh  # vLLM source build
│   │   │   ├── build-qe-cpu-frontier.sh  # Quantum ESPRESSO CPU build
│   │   │   ├── build-qe-gpu-frontier.sh  # Quantum ESPRESSO MI250X build
│   │   │   ├── build-vasp-gpu-frontier.sh  # VASP 6.6 MI250X build
│   │   │   └── frontier-module-stack.sh  # shared module-load helpers
│   │   ├── aurora/  # Aurora (ALCF, Intel PVC) installers
│   │   │   ├── install_matsim_aurora.sh
│   │   │   ├── setup_matsim_aurora.sh
│   │   │   ├── build-qe-cpu-aurora.sh
│   │   │   ├── build-qe-gpu-aurora.sh  # QE PVC build (oneapi+openmp)
│   │   │   └── build-vasp-gpu-aurora.sh  # VASP 6.6 PVC build (vasp_std/_gam/_ncl)
│   │   └── perlmutter/  # Perlmutter (NERSC, A100) installers
│   │       ├── install_matsim_perlmutter.sh
│   │       ├── setup_matsim_perlmutter.sh
│   │       ├── build-qe-cpu-perlmutter.sh
│   │       ├── build-qe-gpu-perlmutter.sh  # QE A100 CUDA build
│   │       ├── perlmutter-module-stack.sh
│   │       └── QE-BUILD-GUIDE.md
│   ├── launchers/
│   │   ├── frontier/  # Frontier sbatch launchers
│   │   │   ├── run-active-learning-frontier.sh  # `matsim-agents al run` driver
│   │   │   ├── _vasp-step-frontier.sh  # in-allocation VASP step
│   │   │   ├── _qe-step-frontier.sh  # in-allocation QE step
│   │   │   ├── _hydragnn-train-step-frontier.sh
│   │   │   ├── run-pw-gpu-frontier.sh  # QE pw.x GPU launcher
│   │   │   ├── run-qe-warmstart-benchmark.sh
│   │   │   ├── launch-test-singlenode-resume-frontier.sh
│   │   │   ├── launch-test-multinode-frontier.sh
│   │   │   └── launch-test-all-models-frontier.sh
│   │   ├── aurora/
│   │   │   └── run-pw-gpu-aurora.sh  # QE pw.x GPU launcher
│   │   └── perlmutter/
│   │       ├── run-pw-gpu-perlmutter.sh
│   │       ├── run-vasp-gpu-perlmutter.sh
│   │       ├── run-qe-warmstart-benchmark-perlmutter.sh
│   │       ├── launch-test-singlenode-resume-perlmutter.sh
│   │       ├── launch-test-multinode-perlmutter.sh
│   │       └── launch-test-all-models-perlmutter.sh
│   ├── smoke-tests/
│   │   ├── frontier/
│   │   │   ├── smoke-vllm-singlenode-frontier.sh
│   │   │   ├── smoke-vllm-multinode-frontier.sh
│   │   │   └── smoke-transformers-frontier.sh
│   │   ├── aurora/
│   │   │   └── smoke-vllm-singlenode-aurora.sh  # vLLM-XPU single-node smoke (qsub)
│   │   └── perlmutter/
│   │       ├── smoke-transformers-perlmutter.sh
│   │       ├── smoke-transformers-multinode-perlmutter.sh
│   │       └── _torchrun_smoke_loader.py
│   ├── advanced/
│   │   ├── frontier/  # Frontier multi-step sbatch job scripts
│   │   │   ├── job-serve-multinode-frontier.sh
│   │   │   ├── job-discovery-chat-frontier.sh
│   │   │   ├── job-discovery-chat-vllm-frontier.sh
│   │   │   ├── job-single-relaxation-frontier.sh
│   │   │   ├── job-active-learning-uq-frontier.sh
│   │   │   ├── job-qe-warmstart-frontier.sh
│   │   │   ├── job-sequential-benchmark-frontier.sh
│   │   │   └── job-six-model-benchmark-frontier.sh
│   │   ├── aurora/  # Aurora multi-step qsub job scripts
│   │   │   ├── job-serve-multinode-aurora.sh
│   │   │   ├── job-serve-multinode-vllm-aurora.sh
│   │   │   ├── job-discovery-chat-aurora.sh
│   │   │   ├── job-single-relaxation-aurora.sh
│   │   │   ├── job-active-learning-uq-aurora.sh
│   │   │   ├── job-qe-warmstart-aurora.sh
│   │   │   └── _mpi_xpu_loader.py
│   │   └── perlmutter/  # Perlmutter multi-step sbatch job scripts
│   │       ├── job-discovery-chat-perlmutter.sh
│   │       ├── job-single-relaxation-perlmutter.sh
│   │       ├── job-active-learning-uq-perlmutter.sh
│   │       └── job-qe-warmstart-perlmutter.sh
│   └── docs/
│       └── frontier/  # Frontier-specific docs
│           ├── README-frontier.md
│           └── README-six-model-benchmark.md
├── src/matsim_agents/
│   ├── state.py  # typed shared LangGraph state
│   ├── graph.py  # planner -> executor -> uq_gate -> analyst
│   ├── llm.py  # Ollama | vLLM | OpenAI | Anthropic | HuggingFace
│   ├── cli.py  # `matsim-agents run|plan|chat|supervisor-run|al`
│   ├── supervisor.py  # LangGraph supervisor (discovery -> UQ -> optional AL handoff)
│   ├── chat.py  # interactive discovery REPL
│   ├── agents/
│   │   ├── planner.py
│   │   ├── executor.py
│   │   └── analyst.py
│   ├── tools/
│   │   ├── relaxation.py  # HydraGNN + ASE relaxation tool
│   │   ├── qe_relax.py  # Quantum ESPRESSO pw.x relaxer (scf|relax|vc-relax)
│   │   ├── vasp_relax.py  # VASP relaxer (scf|relax|vc-relax|vc-relax-shape)
│   │   ├── warmstart_benchmark_qe.py  # HydraGNN warm-start vs cold-start QE benchmark
│   │   └── warmstart_benchmark_vasp.py  # HydraGNN warm-start vs cold-start VASP benchmark
│   ├── discovery/
│   │   ├── composition.py  # formula parsing
│   │   ├── seeds.py  # crystal-phase seed generation (AFLOW + pyXtal)
│   │   ├── stability.py  # ΔE/atom ranking & |F|max proxy
│   │   └── wrapper.py  # explore_composition()
│   └── active_learning/  # HydraGNN <-> DFT active-learning loop
│       ├── config.py  # pydantic schema + ${VAR} substitution
│       ├── loop.py  # top-level driver (matsim-agents al run)
│       ├── candidates.py  # MD sampling + per-step candidate capture
│       ├── uncertainty.py  # ensemble / MC-dropout scoring + diversity
│       ├── seeds.py  # paths or LLM-prompted seed materialisation
│       ├── trainer.py  # HydraGNN retraining wrapper
│       ├── dft_backend.py  # backend-agnostic Protocol
│       ├── dft_runner.py  # in-allocation parallel job dispatcher
│       ├── vasp_io.py  # POSCAR/INCAR/KPOINTS/POTCAR writers + parser
│       └── backends/
│           ├── vasp.py  # VASP 6.6 single-point labeller
│           └── qe.py  # Quantum ESPRESSO pw.x single-point labeller
├── examples/
│   ├── single_relaxation.py
│   ├── discovery_chat.py
│   └── active_learning/
│       ├── al_config.example.yaml  # unified VASP+QE templated config
│       ├── al_config.prompt.example.yaml  # LLM-seeded variant
│       ├── INCAR.template  # VASP single-point template
│       ├── pw.template  # QE pw.in namelist template
│       └── README.md
├── tests/
│   ├── test_state_and_graph.py
│   ├── test_discovery.py
│   ├── test_phase_explorer.py
│   ├── test_al_config.py  # AL config: ${VAR} substitution + validators + legacy shims
│   ├── test_al_uncertainty.py  # acquisition strategies (ensemble / random / FPS)
│   ├── test_al_seeds.py  # seed resolution: paths + LLM-prompted (stubbed)
│   ├── test_vasp_relax.py  # vasp_relax driver + parser
│   └── integration/
│       ├── test_al_loop_dryrun.py  # one full AL iteration, all heavy parts mocked
│       ├── test_qe_warmstart.py  # end-to-end QE warm-start (env-gated)
│       └── test_vasp_warmstart.py  # end-to-end VASP warm-start (env-gated)
├── external/  # gitignored: large external builds
│   └── quantum-espresso/  # src/, build-gpu/, install-gpu/
└── third_party/HydraGNN/  # cloned by setup_env.sh
| Field | Type | Purpose |
|---|---|---|
| `objective` | `str` | Free-form research goal. |
| `plan` | `list[TaskSpec]` | Tasks emitted by the planner. |
| `pending_tasks` | `list[TaskSpec]` | Queue consumed by the executor. |
| `results` | `list[RelaxationResult]` | Accumulated tool outputs. |
| `analysis` | `str \| None` | Final analyst summary. |
| `iteration` / `max_iterations` | `int` | Executor loop guard. |
| `llm_provider` / `llm_model` / `llm_base_url` | `str \| None` | LLM selection. |
TaskSpec(
    kind="relax",               # relax | analyze | report
    structure_path="foo.vasp",
    optimizer="FIRE",           # FIRE | BFGS | BFGSLineSearch
    maxiter=200,
    maxstep=1e-2,
    charge=0.0,
    spin=0.0,
    random_displacement=False,
)

See `src/matsim_agents/tools/relaxation.py`; fields mirror the
options of the upstream HydraGNN ASE script
(`structure_optimization_ASE.py`).
For cases where the user wants a real DFT relaxation rather than the
cheap HydraGNN one (e.g. validating a discovered structure, refining a
final candidate), two sibling drivers ship under src/matsim_agents/tools/
with matching APIs:
| Module | Backend | Calculation modes | Composition-aware defaults |
|---|---|---|---|
| `qe_relax.py` | Quantum ESPRESSO `pw.x` | `scf`, `relax`, `vc-relax` | `ecutwfc` (SSSP-PBE-eff-1.3 table), smearing, k-mesh |
| `vasp_relax.py` | VASP `vasp_std` | `scf`, `relax`, `vc-relax`, `vc-relax-shape` | `ENCUT` = 1.3 × max(`ENMAX`) from POTCARs (else 520 eV); `ISMEAR`/`SIGMA`/`KSPACING` flip metallic vs insulator |
Both follow the same workflow:
from ase.build import bulk
from matsim_agents.tools.vasp_relax import (
    recommend_settings, prepare_relax_workdir, run_vasp,
)

atoms = bulk("Si")
settings = recommend_settings(atoms, potcar_dir="/path/to/potcars",
                              calculation="vc-relax")
workdir = prepare_relax_workdir(atoms, "./Si_vcrelax", settings,
                                potcar_dir="/path/to/potcars")
result = run_vasp(workdir, launcher_cmd=["bash", "run-vasp-frontier.sh"])
print(result.final_energy_eV, result.n_ionic_steps, result.converged)

`qe_relax` has the same shape; both honour an env-overridable launcher
(`MATSIM_QE_LAUNCHER` / `MATSIM_VASP_LAUNCHER`) and parse the per-ionic-step
trajectory + walltime + convergence flag from the native output files
(`pw.out` for QE, `vasprun.xml` + `OUTCAR` for VASP).
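The VASP cutoff default listed in the table above reduces to a one-line rule. The helper below is a hypothetical restatement of that rule for illustration, not the body of `recommend_settings`, and the ENMAX values in the usage line are example numbers only.

```python
def recommend_encut(enmax_values_eV, factor=1.3, fallback_eV=520.0):
    """ENCUT = factor * max(ENMAX) over the POTCARs, or the fallback if none are known."""
    if not enmax_values_eV:
        return fallback_eV
    return factor * max(enmax_values_eV)


# The hardest pseudopotential in the cell sets the cutoff for the whole run.
encut = recommend_encut([245.0, 400.0])
```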
Note: the active-learning loop itself never calls these relaxers; AL labelling always uses the SCF-only backends under `src/matsim_agents/active_learning/backends/`. A relaxation per AL candidate would defeat the point of uncertainty-driven sampling. The standalone relaxers are intended for one-off DFT validation work outside the AL pipeline.
A second pair of sibling drivers wraps the standalone relaxers in a "cold start vs HydraGNN-warm start" experiment and emits a JSON summary that the integration tests consume:
| Module | Backend | CLI |
|---|---|---|
| `warmstart_benchmark_qe.py` | Quantum ESPRESSO `pw.x` | `python -m matsim_agents.tools.warmstart_benchmark_qe …` |
| `warmstart_benchmark_vasp.py` | VASP `vasp_std` | `python -m matsim_agents.tools.warmstart_benchmark_vasp …` |
Each driver runs (1) HydraGNN ASE relaxation, (2) DFT relaxation from the
original coordinates (cold), (3) DFT relaxation from the
HydraGNN-relaxed coordinates (warm), then reports Δ ionic-steps,
Δ total-SCF-iterations, Δ energy, and a warm_helped boolean. If
HydraGNN is unavailable (or --skip-hydragnn is passed) only the cold
DFT run is executed and the warm block is left None.
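The shape of such a summary can be sketched as below. The `DftRunStats` class, the `summarize_warmstart` function, and the key names are hypothetical, and the criterion used for `warm_helped` (fewer total SCF iterations than the cold run) is an assumption; the benchmark drivers may define it differently.

```python
from dataclasses import asdict, dataclass


@dataclass
class DftRunStats:
    """Illustrative per-run statistics parsed from the DFT output."""
    n_ionic_steps: int
    n_scf_iterations: int
    final_energy_eV: float


def summarize_warmstart(cold, warm=None):
    """Build a JSON-serialisable cold vs warm comparison."""
    summary = {"cold": asdict(cold), "warm": None}
    if warm is None:   # HydraGNN unavailable or skipped: cold run only
        return summary
    summary["warm"] = asdict(warm)
    summary["delta_ionic_steps"] = warm.n_ionic_steps - cold.n_ionic_steps
    summary["delta_scf_iterations"] = warm.n_scf_iterations - cold.n_scf_iterations
    summary["delta_energy_eV"] = warm.final_energy_eV - cold.final_energy_eV
    # Assumed criterion: the warm start helped if it saved SCF iterations.
    summary["warm_helped"] = summary["delta_scf_iterations"] < 0
    return summary
```

Negative deltas mean the HydraGNN-relaxed starting point reduced the DFT work.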
This section spells out what the framework does today and what is on the roadmap but not yet implemented, so users know what to expect before building a workflow on top of it.
- Single-point energies and forces from a HydraGNN MLFF checkpoint through an ASE calculator interface.
- Geometry relaxation of atoms and (optionally) cell, driven by HydraGNN through the upstream `structure_optimization_ASE.py` wrapper.
- Isotropic lattice scans to locate equilibrium volume / lattice constant.
- Random-shuffle ordering enumeration for disordered sites, deduplicated with pymatgen's `StructureMatcher`.
- AA-stacked 2-D multilayer construction.
- Relative chemical-stability scoring (energy-above-hull style comparisons within the explored phase set).
- LLM-driven planner / executor / reporter agents (LangGraph) with optional human-in-the-loop gates.
- Pluggable LLM backends: vLLM (Frontier ROCm), Hugging Face Transformers, and OpenAI-compatible HTTP endpoints.
- Active-learning loop with HydraGNN as the surrogate and either VASP 6.6 or Quantum ESPRESSO `pw.x` as the DFT labeller, selectable via a single `dft.backend:` YAML field. Includes ensemble / MC-dropout uncertainty scoring, in-allocation parallel DFT dispatch, templated INCAR / `pw.in` inputs, and shell-style `${VAR}` / `${VAR:-default}` substitution in all YAML configs.
- LLM-generated MD seeds as a first-class seed source (`md.seed_source.kind: prompt`).
- Phonon-based dynamical stability (phonopy / finite differences).
- Formation-energy reference set for absolute (not relative) chemical-stability scoring.
- Richer phase enumeration via pymatgen prototypes / CALYPSO / USPEX hooks.
- Symmetry-aware ordering enumeration via `enumlib` (today's enumerator is random-shuffle + `StructureMatcher` dedup).
- Anisotropic / per-axis lattice scans (today's scan is isotropic only).
- AB / AA' stacking for 2-D multilayers (today's builder is AA-stacked only).
- 2-D heterostructures (e.g. graphene/h-BN, MoS₂/WSe₂) with lattice-mismatch search.
- MD agent: NVT/NPT runs with the same HydraGNN calculator.
- MCP tool server so external clients (Claude Desktop, IDE agents) can call the discovery wrapper directly.
- Distributed executor for parallel composition exploration on HPC.
- Pluggable MLFF backends (MACE, NequIP, Orb) behind the same calculator interface.
- Fork and create a feature branch.
- `pip install -e .[dev]`
- `pytest` and `ruff check .` before pushing.
- Open a merge request on code.ornl.gov/multi-agentic-ai-materials/matsim-agents.
Released under the BSD 3-Clause License (see LICENSE).
If you use matsim-agents in academic work, please cite both this
repository and HydraGNN:
HydraGNN: Distributed PyTorch implementation of multi-headed graph convolutional neural networks, Copyright ID #81929619, https://doi.org/10.11578/dc.20211019.2
Maintained by the ORNL Multi-Agentic AI for Materials team.