Sören Arlt, Haonan Duan, Felix Li, Sang Michael Xie, Yuhuai Wu, Mario Krenn
This repository implements a sequence-to-sequence transformer for generating meta-solutions—Python programs that generalize experimental designs for entire classes of problems. The implementation builds on the simple and effective framework of NanoGPT for transformer training, but is tailored to tasks involving the meta-design of experiments. Specifically, it enables the creation of scalable, interpretable solutions for designing quantum systems and other structured tasks.
This project trains a sequence-to-sequence transformer to:
- Generate synthetic data based on predefined rules and random program generation.
- Train a transformer model to map quantum or structured states to executable Python programs.
- Evaluate the model’s ability to extrapolate to unseen tasks by sampling and analyzing generated solutions.
This approach enables the discovery of interpretable solutions that generalize across complex problem spaces, offering insights and capabilities beyond conventional optimization methods.
This repository employs a transformer-based sequence-to-sequence model trained on synthetic datasets of quantum states and Python programs. The transformer captures patterns in the data to generate interpretable solutions that generalize across problem domains. Synthetic data generation is achieved by simulating quantum optics experiments using the `pytheusQ` library for the main task, or by simulating quantum circuits using the `qiskit` library. Sampling uses probabilistic techniques to generate multiple candidate solutions, which are then evaluated for fidelity to the target quantum states.
This project demonstrates the ability of transformer models to:
- Generate human-readable Python code that generalizes across problem domains.
- Rediscover known meta-solutions (e.g., GHZ state setups).
- Discover new meta-solutions for previously unsolved classes of quantum experiments, such as spin-½ states in photonic systems.
The interpretability of the generated solutions provides human-readable insights into the underlying patterns, enabling scientists to extend these solutions to larger, more complex systems.
This repository focuses on using transformer models for meta-design, enabling the generation of scalable solutions to classes of problems. For example:
- Generate Python programs for designing experimental setups for quantum states like GHZ and W-states.
- Extrapolate solutions to larger system sizes using patterns captured during training.
The synthetic data generation pipeline provides a large and diverse set of sequence pairs:
- Programs (`sequence B`) that generate experimental setups.
- Quantum states (`sequence A`) resulting from those setups.
This asymmetric generation process allows training models on challenging mappings from quantum states to Python programs.
Contains scripts and resources for generating and managing synthetic data for experimental setups:
- `generate_topologies.py`: The first step in the data generation pipeline.
- `generate_data.py`: The second step in the data generation pipeline.
- `graphdata.py`: Library for computing quantum states from graph-based representations.
- `reorganizedata.py`: Utility for restructuring data files into the required format.
- `shuffledata.py`: Script for randomizing the order of data entries.
- `tok.json`: Tokenization file for managing input and output sequences.
- `valpos_res.py`: Collection of valid terms required for code generation.
Synthetic data generation for quantum circuits:
- `datagenerator.py`: Generates quantum circuit-related data.
- `src_tok.json`: Tokenized input (source) data for training.
- `tgt_tok.json`: Tokenized output (target) data for training.
- `config_circuit.py`: Configuration for transformer training on circuit data (additional example).
- `config_main.py`: Configuration for transformer training on general experimental setup data (main task).
- `hdf5dataloader.py`: A utility for efficiently loading large datasets in HDF5 format.
- `helper.py`: Contains helper functions for data manipulation and processing.
- `sample.py`: Samples Python programs generated by the trained transformer and evaluates their correctness.
- `seq2seq.py`: Implements the transformer-based sequence-to-sequence model.
- `train.py`: The main training script for fitting the sequence-to-sequence transformer.
Below are instructions on how to reproduce our work based on the code provided here. We are in the process of uploading data and model checkpoint files to Zenodo for additional reproducibility. While these files will provide convenient access to pre-generated data and trained models, all necessary scripts and configurations are already included in this repository to allow complete reproduction of the data and models.
The time to install all requirements should be less than five minutes.
- Clone the repository:

  ```
  git clone https://github.com/artificial-scientist-lab/metadesign.git
  cd metadesign
  ```

- Install the packages specified by `requirements.txt`:

  ```
  pip install -r requirements.txt
  ```
- For data generation, multiple processes running in parallel on CPUs were used.
- For training, we used data parallelism on four A100-40GB GPUs, but a single GPU can also be used if gradient accumulation is applied to reach the desired batch size.
- For sampling, we used single consumer-grade GPUs. CPUs will be slower, but the models are small enough to sample at reasonable speed.
The data should be generated through the data generation pipeline described below. A subset of the data can be downloaded from Zenodo. The files should be stored in `data_main` and `data_circuits`, respectively. The h5 files should comply with `traindata_prefix` (files should be named `{traindata_prefix}_{i}`, where `i` is an index starting at zero) and `split_train` (the total number of files the data is split into), as given by the respective config files (from `ckpt_main` and `ckpt_circuit`).
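For illustration, the naming convention implied by these parameters can be checked with a short snippet like the following (the prefix, split value, and `.h5` extension here are placeholder assumptions; the real values come from the config files in `ckpt_main` and `ckpt_circuit`):

```python
import os

# Placeholder values, not the repository's actual configuration.
traindata_prefix = "data_main/shuffled_data"
split_train = 100

expected = [f"{traindata_prefix}_{i}.h5" for i in range(split_train)]
missing = [path for path in expected if not os.path.exists(path)]
print(f"{len(expected) - len(missing)} of {len(expected)} training files found")
```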
Run the training script with the desired configuration. For the main task:

```
python train.py --config ckpt_main/config.py
```

For quantum circuits:

```
python train.py --config ckpt_circuit/config.py
```
After the models are trained or downloaded from Zenodo, they should be placed in the following locations:
- `ckpt_main/ckpt_main.pt`
- `ckpt_circuit/ckpt_circuit.pt`
For sampling one of the main results, run this:
```
python sample_main.py
```
(This is recommended to be run on a GPU, but will also produce results on a CPU in less than five minutes.)
Setting a different `mode` variable in the script will make predictions for other target state classes.
A possible output would be:
```
number of parameters: 132.56M
mode = ghz
generating state for 4 vertices
+1[axbxcxdx]+1[aybycydy]
generating state for 6 vertices
+1[axbxcxdxexfx]+1[aybycydyeyfy]
generating state for 8 vertices
+1[axbxcxdxexfxgxhx]+1[aybycydyeyfygyhy]
generating state for 10 vertices
generating state for 12 vertices
temp = 0.2
topp = 0.5
### Prediction 0 ###
Code generated by the model
e(+3+2*N,+1+0*N,1,1,1)
e(+2+2*N,+0+0*N,1,1,1)
e(+0+0*N,+3+2*N,0,0,1)
e(+2+2*N,+1+2*N,0,0,1)
for ii in range(N):
    e(+2+0*N+3*ii,+3+0*N+1*ii,1,1,1)
    e(+1+0*N+2*ii,+2+0*N+2*ii,0,0,1)
N = 0
[(3, 1, 1, 1, 1), (2, 0, 1, 1, 1), (0, 3, 0, 0, 1), (2, 1, 0, 0, 1)]
graph generates: +0.7071067811865475[axbxcxdx]+0.7071067811865475[aybycydy]
fidelity = 1.0
N = 1
[(5, 1, 1, 1, 1), (4, 0, 1, 1, 1), (0, 5, 0, 0, 1), (4, 3, 0, 0, 1), (2, 3, 1, 1, 1), (1, 2, 0, 0, 1)]
graph generates: +0.7071067811865475[axbxcxdxexfx]+0.7071067811865475[aybycydyeyfy]
fidelity = 1.0
N = 2
[(7, 1, 1, 1, 1), (6, 0, 1, 1, 1), (0, 7, 0, 0, 1), (6, 5, 0, 0, 1), (2, 3, 1, 1, 1), (1, 2, 0, 0, 1), (5, 4, 1, 1, 1), (3, 4, 0, 0, 1)]
graph generates: +0.7071067811865475[axbxcxdxexfxgxhx]+0.7071067811865475[aybycydyeyfygyhy]
fidelity = 1.0
N = 3
[(9, 1, 1, 1, 1), (8, 0, 1, 1, 1), (0, 9, 0, 0, 1), (8, 7, 0, 0, 1), (2, 3, 1, 1, 1), (1, 2, 0, 0, 1), (5, 4, 1, 1, 1), (3, 4, 0, 0, 1), (8, 5, 1, 1, 1), (5, 6, 0, 0, 1)]
graph generates: +1.0[axbxcxdxexfxgxhxixjx]
fidelity = 0.4999999999999999
N = 4
[(11, 1, 1, 1, 1), (10, 0, 1, 1, 1), (0, 11, 0, 0, 1), (10, 9, 0, 0, 1), (2, 3, 1, 1, 1), (1, 2, 0, 0, 1), (5, 4, 1, 1, 1), (3, 4, 0, 0, 1), (8, 5, 1, 1, 1), (5, 6, 0, 0, 1), (11, 6, 1, 1, 1), (7, 8, 0, 0, 1)]
graph generates: +1.0[axbxcxdxexfxgxhxixjxkxlx]
fidelity = 0.4999999999999999
[ True True True False False]
```
(The model generated code for the 2d GHZ state class, which produces the correct state for `N=0,1,2` but not for `N=3,4`. The sampling is probabilistic, so it can be run multiple times until a correct code is found.)
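The fidelity values above compare the state produced by the generated setup with the target state. Below is a minimal sketch of such a comparison, with states represented as dictionaries mapping ket strings to amplitudes (the representation and function are illustrative, not the repository's API):

```python
import numpy as np

def fidelity(state_a, state_b):
    # States as dictionaries mapping ket strings to (unnormalized) amplitudes.
    kets = set(state_a) | set(state_b)
    a = np.array([state_a.get(k, 0.0) for k in kets])
    b = np.array([state_b.get(k, 0.0) for k in kets])
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return abs(np.vdot(a, b)) ** 2

# A GHZ-like target versus a generated state containing only the all-x ket
# gives fidelity 0.5, as in the N=3,4 cases above.
target = {"axbxcxdx": 1.0, "aybycydy": 1.0}
generated = {"axbxcxdx": 1.0}
print(fidelity(target, generated))   # 0.5
```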
For sampling the quantum circuit result, run this:
```
python sample_circuit.py
```
(The model for the quantum circuit codes is smaller (44M parameters) and will produce results within a few seconds on a CPU.)
The output should be:

```
temp = 0.2
topp = 0.5
### Prediction 0 ###
Code generated by the model
qH(0)
for ii in range(NN):
    qCNOT(ii,1+ii)
qZ(0)
qZ(NN)
Resulting states, computed by simulating the circuits generated by the output code
state generated for N = 1
(+1/√2)|XX>+(+1/√2)|YY>
state generated for N = 2
(+1/√2)|XXX>+(+1/√2)|YYY>
state generated for N = 3
(+1/√2)|XXXX>+(+1/√2)|YYYY>
state generated for N = 4
(+1/√2)|XXXXX>+(+1/√2)|YYYYY>
```
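The gate calls in the generated code can be reproduced with qiskit, which is used for the quantum-circuit task. A sketch under the assumption that `qH`, `qCNOT`, and `qZ` map to the standard H, CNOT, and Z gates (the wrapper functions below are illustrative, not the repository's implementation):

```python
from qiskit import QuantumCircuit
from qiskit.quantum_info import Statevector

NN = 2                      # system-size parameter of the generated code
qc = QuantumCircuit(NN + 1)

# Illustrative wrappers for the tokens used in the generated code.
def qH(q):
    qc.h(q)

def qCNOT(a, b):
    qc.cx(a, b)

def qZ(q):
    qc.z(q)

# The code generated by the model, executed for NN = 2.
qH(0)
for ii in range(NN):
    qCNOT(ii, 1 + ii)
qZ(0)
qZ(NN)

print(Statevector(qc))      # (|000> + |111>)/sqrt(2), i.e. the 3-qubit GHZ state
```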
A random sample code from the data looks like this:

```
e(+3+0*N,+0+0*N,2,2,-1)
e(+3+1*N,+0+1*N,1,2,1)
e(+3+2*N,+0+0*N,1,1,1)
e(+2+2*N,+3+2*N,2,1,1)
e(+1+0*N,+0+2*N,1,1,-1)
e(+1+0*N,+2+2*N,2,1,-1)
for ii in range(N):
    e(+3+0*N+2*ii,+0+2*N+0*ii,1,1,1)
    e(+2+0*N+0*ii,+1+2*N+2*ii,1,1,1)
```
This generates the following graph representations of experimental setups:

- `N=0`: `[(3, 0, 2, 2, -1), (3, 0, 1, 2, 1), (3, 0, 1, 1, 1), (2, 3, 2, 1, 1), (1, 0, 1, 1, -1), (1, 2, 2, 1, -1)]`
- `N=1`: `[(3, 0, 2, 2, -1), (4, 1, 1, 2, 1), (5, 0, 1, 1, 1), (4, 5, 2, 1, 1), (1, 2, 1, 1, -1), (1, 4, 2, 1, -1), (3, 2, 1, 1, 1), (2, 3, 1, 1, 1)]`
- `N=2`: `[(3, 0, 2, 2, -1), (5, 2, 1, 2, 1), (7, 0, 1, 1, 1), (6, 7, 2, 1, 1), (1, 4, 1, 1, -1), (1, 6, 2, 1, -1), (3, 4, 1, 1, 1), (5, 4, 1, 1, 1), (2, 5, 1, 1, 1), (2, 7, 1, 1, 1)]`
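How such a code expands into the edge lists above can be sketched by executing it with a small collector function (the helper below is illustrative, not the repository's implementation):

```python
def expand_code(code, N):
    """Run a generated code string and collect the edges produced by e(...)."""
    edges = []

    def e(pos1, pos2, col1, col2, amp):
        edges.append((pos1, pos2, col1, col2, amp))

    exec(code, {"e": e, "N": N})
    return edges


code = """\
e(+3+0*N,+0+0*N,2,2,-1)
e(+3+1*N,+0+1*N,1,2,1)
e(+3+2*N,+0+0*N,1,1,1)
e(+2+2*N,+3+2*N,2,1,1)
e(+1+0*N,+0+2*N,1,1,-1)
e(+1+0*N,+2+2*N,2,1,-1)
for ii in range(N):
    e(+3+0*N+2*ii,+0+2*N+0*ii,1,1,1)
    e(+2+0*N+0*ii,+1+2*N+2*ii,1,1,1)
"""

for N in range(3):
    print(N, expand_code(code, N))   # reproduces the N=0,1,2 edge lists above
```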
The experimental setups produce the following quantum states:

- `N=0`: `-1[aybyczdy]+1[azbzcydz]-1[azbzcydy]-1[aybzcydy]`
- `N=1`: `+1[azbycydzezfy]`
- `N=2`: `+1[azbycydzeyfygzhy]+1[azbyczdzeyfygzhy]+1[azbzcydzeyfygyhy]-1[aybzcydyeyfygyhy]-1[aybzczdyeyfygyhy]`
The first two arguments of the `add_edge(pos1,pos2,col1,col2,amp)` function are the position indices (`pos1` and `pos2`) of the vertices connected by the edge. These should be expressed by simple formulas.
Here is the bare structure code for the example:
```
e(+3+0*N,+0+0*N)
e(+3+1*N,+0+1*N)
e(+3+2*N,+0+0*N)
e(+2+2*N,+3+2*N)
e(+1+0*N,+0+2*N)
e(+1+0*N,+2+2*N)
for ii in range(N):
    e(+3+0*N+2*ii,+0+2*N+0*ii)
    e(+2+0*N+0*ii,+1+2*N+2*ii)
```
There are various constraints that these formulas have to fulfill based on the topology of the resulting graphs:

1. positions should not exceed the size of the respective graph: `0 <= pos(ii,N) <= 4+2*N` for all `ii in range(N)` and `N in range(3)`
2. no loops (`pos1 != pos2` for all values)
3. each node should have a degree (number of edges connected) higher than a specified minimum
4. each edge should be part of a perfect matching
These conditions are very rarely fulfilled for a random graph. It takes one CPU about 30 minutes of generating random codes and checking these conditions until a code satisfying them is found. It was not feasible for us to generate >10^7 codes in this way, so we decided to generate only ~200k codes which satisfy the topological conditions and reuse each of them as a possible starting point for generating multiple full codes later.
We begin by defining a list of possible position formulas and filtering them according to condition 1:
```
verts1 = ['0', '1', '2', '3']
verts2 = ['0*N', '1*N', '2*N', '3*N']
verts3 = ['0*ii', '1*ii', '2*ii', '3*ii']

for all possible combinations of vert1+vert2+vert3:
    check if formula is valid for all combinations of ii and N
    if valid: save to file
```
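A sketch of this filtering step in Python follows; the exact bound `4+2*N` and the ranges for `ii` and `N` are taken from condition 1 above, and the file handling is omitted:

```python
from itertools import product

verts1 = ['0', '1', '2', '3']
verts2 = ['0*N', '1*N', '2*N', '3*N']
verts3 = ['0*ii', '1*ii', '2*ii', '3*ii']

valid_positions = []
for v1, v2, v3 in product(verts1, verts2, verts3):
    formula = f"+{v1}+{v2}+{v3}"
    # Condition 1: 0 <= pos(ii, N) <= 4 + 2*N for all ii in range(N), N in range(3).
    valid = all(
        0 <= eval(formula, {"N": N, "ii": ii}) <= 4 + 2 * N
        for N in range(3)
        for ii in range(N)
    )
    if valid:
        valid_positions.append(formula)

print(len(valid_positions), "valid position formulas")
```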
We then use this list of valid position formulas to define a bare structure of a code (no color or amplitude argument) which can be used to compute the topology of the graphs and check conditions 2, 3, and 4.
We generate these bare structure codes for all four possible combinations of `[LONG,SHORT]` and `[DEG1,DEG2]`:

- `LONG` means that layer 0 can have between 4 and 12 lines and layer 1 can have between 2 and 12 lines
- `SHORT` means that layer 0 can have between 4 and 8 lines and layer 1 can have between 2 and 6 lines
- `DEG1` means that the resulting graphs for `N=0,1,2` have to have a minimum degree of 1 for all nodes
- `DEG2` means that the resulting graphs for `N=0,1,2` have to have a minimum degree of 2 for all nodes
```
set length of code range [LONG, SHORT]
set minimum degree [DEG1, DEG2]
loop:
    random pick number of lines in layer 0
    random pick number of lines in layer 1
    set minimum degree constraint for generated graphs [DEG1, DEG2]
    random pick 2*num_lines_0 elements from valid positions for layer 0
    random pick 2*num_lines_1 elements from valid positions for layer 1
    check for no loops condition
    check for degree condition
    check for perfect matching condition
    if all checks valid, save code
```
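The no-loop, degree, and perfect-matching checks in this loop can be sketched as follows (using networkx as an assumption; the repository's own check may differ). The graph has `4+2*N` nodes, and `edges` is the list of `(pos1, pos2)` pairs obtained from the bare structure code for a given `N`:

```python
import networkx as nx

def topology_ok(edges, num_nodes, min_degree):
    """Illustrative check of conditions 2-4 for a bare-structure graph."""
    # Condition 2: no loops.
    if any(u == v for u, v in edges):
        return False
    # Condition 3: every node has at least the required number of incident edges.
    degree = [0] * num_nodes
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    if any(d < min_degree for d in degree):
        return False
    # Condition 4: every edge is part of some perfect matching, i.e. the graph
    # without the edge's endpoints still has a perfect matching.
    G = nx.Graph()
    G.add_nodes_from(range(num_nodes))
    G.add_edges_from(edges)
    for u, v in G.edges():
        H = G.copy()
        H.remove_nodes_from([u, v])
        matching = nx.max_weight_matching(H, maxcardinality=True)
        if not nx.is_perfect_matching(H, matching):
            return False
    return True
```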
We now load the bare structure codes generated in `generate_topologies.py` and add the arguments `col1`, `col2`, and `amp` to produce the final code.
There are additional conditions that each final code has to satisfy:

1. the generated states should not be zero
2. the generated states should have fewer kets than a given maximum number
From all four possible combinations of `[LONG,SHORT]` and `[DEG1,DEG2]` we take bare structure codes and generate final codes. For the final codes there are also the following property descriptors:

- `DIMENSION`
  - `2D`: `col1` and `col2` can only be from `[0,1]`
  - `3D`: `col1` and `col2` can only be from `[0,1,2]`
- `EDGEWEIGHT`
  - `WEIGHTED`: `amp` can be from `[-1,1]`
  - `UNWEIGHTED`: `amp` can only be `1`
- `MAX_KETS`
  - `8-16-32`: the maximum numbers of terms (kets) in the resulting states for `N=0,1,2` are `8,16,32`
  - `6-6-6`: the maximum numbers of terms (kets) in the resulting states for `N=0,1,2` are `6,6,6`
```
pick from [LONG, SHORT]
pick from [DEG1, DEG2]
set DIMENSION, EDGEWEIGHT, MAX_KETS
loop:
    take bare structure code from saved file (according to choice from [LONG,SHORT] and [DEG1,DEG2])
    pick random entries for col1, col2, amp on each line according to DIMENSION and EDGEWEIGHT
    compute resulting states for N=0,1,2
    check for conditions 1 and 2 according to MAX_KETS
    if valid: save to file
```
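A sketch of how one line of a bare-structure code could be completed according to `DIMENSION` and `EDGEWEIGHT` (illustrative; the repository's implementation may differ):

```python
import random

def complete_line(pos1_formula, pos2_formula, dimension="2D", edgeweight="WEIGHTED"):
    """Illustrative: add col1, col2, amp to one bare-structure line."""
    colors = [0, 1] if dimension == "2D" else [0, 1, 2]
    amps = [-1, 1] if edgeweight == "WEIGHTED" else [1]
    col1, col2 = random.choice(colors), random.choice(colors)
    amp = random.choice(amps)
    return f"e({pos1_formula},{pos2_formula},{col1},{col2},{amp})"

print(complete_line("+3+0*N", "+0+0*N", dimension="3D", edgeweight="WEIGHTED"))
```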
There are processes generating final codes for each combination (in total `2**5 = 32`) of the already introduced parameters:
- `CODELEN`: `['SHORT', 'LONG']`
- `DEGREE`: `['DEG1', 'DEG2']`
- `DIMENSION`: `['2D', '3D']`
- `EDGEWEIGHT`: `['WEIGHTED', 'UNWEIGHTED']`
- `MAX_KETS`: `['8-16-32', '6-6-6']`
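The 32 combinations can be enumerated directly, e.g.:

```python
from itertools import product

combinations = list(product(
    ['SHORT', 'LONG'],            # CODELEN
    ['DEG1', 'DEG2'],             # DEGREE
    ['2D', '3D'],                 # DIMENSION
    ['WEIGHTED', 'UNWEIGHTED'],   # EDGEWEIGHT
    ['8-16-32', '6-6-6'],         # MAX_KETS
))
assert len(combinations) == 2 ** 5   # 32 parameter combinations
```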
During training we want to pick from a uniform distribution over all generated codes. The following scripts combine all the generated data, shuffle it, and split it into evenly distributed files.
Each type of sample potentially has multiple h5 files because there are multiple processes generating samples.
```
loop through all directories of possible combinations:
    combine all h5 files into one combined.h5 per directory
```
We now want to combine files of different sample types into joint files.
```
set DATA_SPLIT = 100
for ii in range(DATA_SPLIT):
    loop through all directories:
        copy slice ii/DATA_SPLIT of combined.h5
        append to split_data_{ii}.h5
```
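A sketch of this combine-and-split step using h5py (the dataset name and directory glob are assumptions, not the repository's layout):

```python
import glob
import h5py
import numpy as np

DATA_SPLIT = 100
directories = sorted(glob.glob("generated_*"))   # one directory per sample type (assumed)

for ii in range(DATA_SPLIT):
    pieces = []
    for directory in directories:
        with h5py.File(f"{directory}/combined.h5", "r") as f:
            data = f["samples"][:]               # dataset name is an assumption
        lo = ii * len(data) // DATA_SPLIT
        hi = (ii + 1) * len(data) // DATA_SPLIT
        pieces.append(data[lo:hi])
    with h5py.File(f"split_data_{ii}.h5", "w") as out:
        out.create_dataset("samples", data=np.concatenate(pieces))
```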
We now have 100 files with roughly the same number of entries from each sample type, but they still need to be shuffled:

```
loop through all split_data_{ii}.h5:
    shuffle file and save as shuffled_data_{ii}.h5
```
The training loop in `train.py` proceeds as follows:

```
initialize dataloader, model, and optimizer
loop:
    load batches on parallel GPUs and compute gradients (data parallelism)
    (optional: accumulate gradients over multiple steps)
    average gradients and update parameters
    every eval_interval steps: evaluate loss and generate/evaluate three predictions
```
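A minimal sketch of such a loop with PyTorch data parallelism and gradient accumulation (assuming the distributed process group, `model`, `optimizer`, `dataloader`, and `accum_steps` are already set up; this is not the repository's `train.py`):

```python
import torch
import torch.nn.functional as F
from torch.nn.parallel import DistributedDataParallel as DDP

model = DDP(model)                     # gradients are averaged across GPUs on backward
optimizer.zero_grad()
for step, (src, tgt) in enumerate(dataloader):
    logits = model(src, tgt[:, :-1])   # teacher forcing: predict the next target token
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), tgt[:, 1:].reshape(-1))
    (loss / accum_steps).backward()    # accumulate gradients over several mini-batches
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```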
Sampling with a trained model proceeds as follows:

```
select target state class
(optional: compute target states from formula)
tokenize states for N=0,1,2
load model checkpoint
loop:
    top-p sampling on tokenized input
    decode prediction to string format (code)
    compute setups corresponding to code and N=0,1,2
    simulate setups
    compute fidelities with respect to target states
    save prediction and fidelities to file
```
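The top-p (nucleus) sampling step with the temperature and `topp` values shown in the outputs above can be sketched as follows (illustrative; not the repository's sampling code):

```python
import torch

def sample_next_token(logits, temp=0.2, topp=0.5):
    """Temperature + nucleus (top-p) sampling for one decoding step."""
    probs = torch.softmax(logits / temp, dim=-1)
    sorted_probs, sorted_idx = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    keep = cumulative - sorted_probs < topp      # the top token is always kept
    sorted_probs = torch.where(keep, sorted_probs, torch.zeros_like(sorted_probs))
    sorted_probs = sorted_probs / sorted_probs.sum()
    choice = torch.multinomial(sorted_probs, num_samples=1)
    return sorted_idx[choice]
```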
If you use this repository in your work, please cite:
```bibtex
@article{arlt2024meta,
  title={Meta-Designing Quantum Experiments with Language Models},
  author={Arlt, S{\"o}ren and Duan, Haonan and Li, Felix and Xie, Sang Michael and Wu, Yuhuai and Krenn, Mario},
  journal={arXiv preprint arXiv:2406.02470},
  doi={https://doi.org/10.48550/arXiv.2406.02470},
  year={2024}
}
```
- The structure of the code for the model and training is for the most part a modified version of nanoGPT (https://github.com/karpathy/nanoGPT); our model is encoder-decoder instead of decoder-only.