RL disentangle

1. Overview


This repository contains the source code accompanying the paper "Reinforcement Learning to Disentangle Multiqubit Quantum States from Partial Observations", and a demo of the 4q and 5q RL agents in the form of an interactive Jupyter notebook.

2. Structure


  • interactive-demo.ipynb
    The interactive Jupyter Notebook that shows the 4q and 5q agents

  • agents/
    This directory contains symbolic links to the trained RL agents. Symbolic links are a UNIX feature and are not supported on Windows; if you are using Windows, delete them after cloning the repo and copy all agent.pt files from logs/ into agents/ manually (see the sketch after this list). The agents are serialized instances of the src.agent.PPOAgent class.

  • data/
    Contains accuracy stats for the RL agents in JSON format. These stats are used to generate the figures in the paper.

  • logs/
    This directory contains the text logs, various plots and checkpointed agents from the RL training.

  • qiskit/
    Contains interface code for NISQ devices

  • scripts/
    Contains Python and Bash scripts, as well as sample code

  • src/
    Contains the source code

  • tests/
    Contains tests for NISQ interface code
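
For Windows users, below is a minimal sketch of the manual copy step for agents/ mentioned above. It assumes the checkpoints under logs/ are files named agent.pt and uses each file's parent directory to name the copy; the destination names are a guess and may need to be adapted to match what the notebook expects.

# Hypothetical helper for Windows users: replace the UNIX symbolic links in
# agents/ with real copies of the checkpointed agents from logs/.
# Assumes the checkpoints are files named "agent.pt" somewhere under logs/;
# the destination names below are a guess and may need adjusting.
import shutil
from pathlib import Path

agents_dir = Path("agents")
agents_dir.mkdir(exist_ok=True)

for src_file in Path("logs").rglob("agent.pt"):
    dst = agents_dir / f"{src_file.parent.name}-agent.pt"
    shutil.copy(src_file, dst)
    print(f"copied {src_file} -> {dst}")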

3. How to Use?


  1. Clone the repo
  2. Create a Conda environment with the required packages: conda env create -f environment.yaml
  3. Check the demo script scripts/sample.py and the interactive notebook interactive-demo.ipynb

Essentially, you need to instantiate the RL environment, load the agent, and then do a rollout. The snippet below shows a use case for a 5-qubit system:

# Imports needed by the snippet; QuantumEnv is defined in the src/ package
# (adjust the import path if it differs in your checkout).
import numpy as np
import torch
from src.quantum_env import QuantumEnv

# Instantiate an RL environment.
num_qubits = int(np.log2(state.size))
env = QuantumEnv(num_qubits, 1, obs_fn='rdm_2q_mean_real')
env.reset()

# Set the environment's state (assuming that `state` is a NumPy
# array that holds the quantum state we want to disentangle)
shape = (2,) * num_qubits
env.simulator.states = np.expand_dims(state.reshape(shape), 0)

# Load the agent
agent = torch.load("agents/5q-agent.pt")

# Do a rollout of at most 100 steps
trajectory = []
success = False
for _ in range(100):
    # Query the policy for action probabilities given the current observation
    observation = torch.from_numpy(env.obs_fn(env.simulator.states))
    probs = agent.policy(observation).probs[0]
    # Act greedily: take the action with the highest probability
    a = np.argmax(probs.cpu().numpy())
    trajectory.append(env.simulator.actions[a])
    # Apply the action; `t` flags whether the state is disentangled
    o, r, t, tr, i = env.step([a], reset=False)
    if np.all(t):
        success = True
        break

# The selected actions are in `trajectory`
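
To simply try out the snippet, `state` can be any 5-qubit state stored as a NumPy array of 2**5 complex amplitudes; a minimal sketch that generates a random (generically entangled) test state:

# Generate a random normalized 5-qubit state to use as `state` in the snippet above.
import numpy as np

rng = np.random.default_rng(seed=0)
amplitudes = rng.normal(size=2**5) + 1j * rng.normal(size=2**5)
state = amplitudes / np.linalg.norm(amplitudes)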

4. How to Train the Agents?


Check and run the script scripts/train.sh; it calls the Python script run.py with the hyperparameters used in the paper for the 4-, 5-, and 6-qubit agents. Our training was done on an 8-core CPU and an NVIDIA Tesla T4 GPU. Training times were approximately:

  • 25 minutes for the 4 qubit agent
  • 10 hours for the 5 qubit agent
  • 60 hours for the 6 qubit agent
