Enhancing Analogical Reasoning in the Abstraction and Reasoning Corpus via Model-Based RL
This repository contains the implementation of experiments comparing Proximal Policy Optimization (PPO) and DreamerV3 within the ARCLE environment.
To clone the repository including its submodules:
git clone --recurse-submodules https://github.com/GIST-DSLab/RL_Algorithms.git
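If the repository was already cloned without --recurse-submodules, the submodules can still be fetched afterwards with:
git submodule update --init --recursive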
Code instructions are located within each algorithm's folder.
- Actions: 5 operations (Rotate 90, Rotate 270, Horizontal Flip, Vertical Flip, and selection of the entire grid).
- Tasks: 4 simple tasks, each augmented to 1,000 demo pairs and 100 test pairs.
- Metric: number of correctly solved grids among the test pairs (see the sketch after this list).
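As a rough illustration (using plain NumPy, not the ARCLE API; the function names below are our own), the grid transformations involved in the tasks and the evaluation metric can be sketched as follows:

```python
import numpy as np

# Grid transformations used in the experiments; the selection covers the entire grid.
def rotate_90(grid):        return np.rot90(grid, k=-1)  # clockwise (direction convention may differ in ARCLE)
def rotate_270(grid):       return np.rot90(grid, k=1)   # counter-clockwise
def flip_horizontal(grid):  return np.fliplr(grid)
def flip_vertical(grid):    return np.flipud(grid)
def flip_diagonal(grid):    return grid.T                # target of the complex task; equals rotate_90 followed by flip_horizontal

def num_correct_grids(predictions, targets):
    """Metric: number of test pairs whose predicted grid exactly matches the target grid."""
    return sum(int(np.array_equal(p, t)) for p, t in zip(predictions, targets))
```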
- RQ1: Learning a Single Task
- RQ2: Reasoning about Tasks Similar to Pre-Trained Task
- RQ3: Reasoning about Sub-Tasks of Pre-Trained Task
- RQ4: Learning Multiple Tasks Simultaneously
- RQ5: Reasoning about Merged-Tasks of Pre-Trained Tasks
In this work, we focus on addressing Research Questions 1 and 2.
- For the complex task (Diagonal Flip), model-based RL (DreamerV3) learns the task, showing higher accuracy and sample efficiency.
- For simple tasks (Rotate and Horizontal Flip), the two algorithms show no significant difference.
- DreamerV3 has difficulty learning tasks over N x N grid sizes.
- DreamerV3 successfully adapts to unseen grid sizes that share a rule with the pre-trained task (a rough evaluation sketch follows this list).
- Both algorithms struggle to learn N x N Diagonal Flip tasks, leading to poor performance when adapting to the 3 x 3 Diagonal Flip task.
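The following is a minimal sketch of how adaptation to unseen grid sizes might be measured, assuming a hypothetical `policy` function that maps an input grid to a predicted output grid. This is illustrative only and is not the repository's evaluation code:

```python
import numpy as np

def evaluate_adaptation(policy, rule, sizes=(3, 4, 5), pairs_per_size=100, num_colors=10, seed=0):
    """Apply a (hypothetical) trained policy to randomly generated N x N grids and
    report, per grid size, the fraction of outputs matching the ground-truth rule."""
    rng = np.random.default_rng(seed)
    results = {}
    for n in sizes:
        correct = 0
        for _ in range(pairs_per_size):
            grid = rng.integers(0, num_colors, size=(n, n))
            if np.array_equal(policy(grid), rule(grid)):
                correct += 1
        results[n] = correct / pairs_per_size
    return results

# Example, with the ground-truth diagonal-flip rule standing in for a perfect policy:
# evaluate_adaptation(policy=lambda g: g.T, rule=lambda g: g.T)
```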
We reimplemented the experiments based on the first implementation of PPO on ARCLE environments.
Our DreamerV3 code is a PyTorch implementation of the original authors' DreamerV3.
If you reference this code, please cite:
@article{rlonarcle2024,
  title={Enhancing Analogical Reasoning in the Abstraction and Reasoning Corpus via Model-Based RL},
  author={Lee, Jihwan and Sim, Woochang and Kim, Sejin and Kim, Sundong},
  journal={arXiv preprint arXiv:},
  year={2024}
}