Rubik's Cube Solver

A machine learning project that implements a digital Rubik's Cube solver using reinforcement learning, specifically the Proximal Policy Optimization (PPO) algorithm. The solver creates a virtual 3×3 Rubik's Cube environment, scrambles it, and trains a deep reinforcement learning agent to recover the solved state.

🎯 Features

  • Digital Rubik's Cube Environment: Complete 3D cube simulation with one-hot encoded color representation
  • Reinforcement Learning: PPO algorithm implementation with custom neural network architecture
  • Progressive Training: Curriculum learning approach with increasing scramble complexity
  • Visual Feedback: Colored terminal output for cube visualization
  • Model Persistence: Save and load trained models for continued training or testing
  • Performance Metrics: Success rate tracking and solving statistics

🚀 Quick Start

Prerequisites

Install the required packages:

pip install gymnasium
pip install stable-baselines3
pip install numpy
pip install torch

Running the Project

Execute the main script to start training or testing:

python main.py

📁 Project Structure

RubiksCubeSolver/
├── main.py           # Main training and testing script
├── rubiks.py         # Rubik's cube implementation and move functions
├── models/           # Directory containing trained model files
│   └── model-*.zip   # Saved PPO models for different scramble levels
├── README.md         # Project documentation
└── LICENSE           # MIT License

🧠 Technical Implementation

Cube Representation

The Rubik's Cube is represented using a dictionary where each face is a 3×3 NumPy array of one-hot encoded colors (a short sketch of this structure follows the color list):

  • White: [1, 0, 0, 0, 0, 0]
  • Red: [0, 1, 0, 0, 0, 0]
  • Yellow: [0, 0, 1, 0, 0, 0]
  • Orange: [0, 0, 0, 1, 0, 0]
  • Blue: [0, 0, 0, 0, 1, 0]
  • Green: [0, 0, 0, 0, 0, 1]
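
A minimal sketch of this representation, assuming the encoding above; the face keys and color-to-face assignments shown here are illustrative, and the actual construction in rubiks.py may differ:

import numpy as np

# One-hot color vectors, in the order listed above
WHITE  = [1, 0, 0, 0, 0, 0]
RED    = [0, 1, 0, 0, 0, 0]
YELLOW = [0, 0, 1, 0, 0, 0]
ORANGE = [0, 0, 0, 1, 0, 0]
BLUE   = [0, 0, 0, 0, 1, 0]
GREEN  = [0, 0, 0, 0, 0, 1]

# A solved cube: each face is a 3x3 grid of a single color, stored as a (3, 3, 6) array
solved_cube = {
    "up":    np.array([[WHITE]  * 3] * 3),
    "down":  np.array([[YELLOW] * 3] * 3),
    "front": np.array([[GREEN]  * 3] * 3),
    "back":  np.array([[BLUE]   * 3] * 3),
    "left":  np.array([[ORANGE] * 3] * 3),
    "right": np.array([[RED]    * 3] * 3),
}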

Available Moves

The implementation supports all standard Rubik's Cube moves:

  • Face Rotations: F, R, B, L, U, D (clockwise)
  • Prime Moves: F', R', B', L', U', D' (counter-clockwise)

Reinforcement Learning Environment

The RubiksCubeEnv class implements a Gymnasium environment with the following characteristics (a minimal environment sketch follows the list):

  • Action Space: 12 discrete actions (6 face rotations + 6 prime moves)
  • Observation Space: 324-dimensional binary vector (54 squares × 6 colors)
  • Reward System: Negative reward per step (-1) to encourage efficiency
  • Episode Termination: Success (cube solved) or timeout (step limit reached)
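
A minimal sketch of such an environment using the Gymnasium API; the actual RubiksCubeEnv in main.py may structure things differently, and the cube/scramble logic referenced in the comments is only outlined here:

import gymnasium as gym
import numpy as np
from gymnasium import spaces

class RubiksCubeEnvSketch(gym.Env):
    def __init__(self, scrambles=0, time_limit=10):
        super().__init__()
        self.action_space = spaces.Discrete(12)           # 6 clockwise turns + 6 prime moves
        self.observation_space = spaces.MultiBinary(324)  # 54 squares x 6 one-hot colors
        self.scrambles = scrambles
        self.time_limit = time_limit
        self.steps = 0

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.steps = 0
        # the real environment rebuilds a solved cube here and applies
        # self.scrambles random moves (helper names omitted)
        obs = np.zeros(324, dtype=np.int8)  # placeholder observation
        return obs, {}

    def step(self, action):
        self.steps += 1
        # the real environment applies the move indexed by `action`
        # and re-encodes the cube as a 324-long one-hot vector
        obs = np.zeros(324, dtype=np.int8)         # placeholder observation
        solved = False                             # replaced by a solved-cube check in the real code
        reward = -1                                # constant step penalty to encourage short solutions
        terminated = solved                        # success: cube solved
        truncated = self.steps >= self.time_limit  # timeout: step limit reached
        return obs, reward, terminated, truncated, {}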

Neural Network Architecture

The PPO agent uses a custom neural network with the following settings (a policy_kwargs sketch follows the list):

  • Policy Network: 5 hidden layers of 256 neurons each
  • Value Network: 5 hidden layers of 256 neurons each
  • Activation Function: ReLU
  • Algorithm: Proximal Policy Optimization (PPO)
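
With Stable-Baselines3, this architecture would typically be passed through policy_kwargs; a sketch (the exact values in main.py, and the net_arch format expected by older Stable-Baselines3 versions, may differ):

import torch

# Five 256-unit hidden layers for both the policy (pi) and value (vf) networks, ReLU activations
policy_kwargs = dict(
    net_arch=dict(pi=[256] * 5, vf=[256] * 5),
    activation_fn=torch.nn.ReLU,
)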

🎮 Usage Examples

Training a New Model

To train a model with progressive difficulty:

# Set training = True in main.py
training = True
if training:
    # Curriculum: train on progressively longer scrambles, from 1 to 20 moves
    for scrambles in range(1, 21):
        env.scrambles = scrambles
        env.time_limit = scrambles ** 2                   # longer scrambles get a larger step budget
        model.learn(total_timesteps=50000 * scrambles)    # more training time at higher difficulty
        model.save(f"models/model-{date}--50k-{scrambles}s")

Testing a Trained Model

To test a model's performance:

# Set testing = True in main.py
testing = True
if testing:
    # Load a trained model
    reloaded_model = PPO.load("models/model-050824--4s")
    
    # Test on 4-move scrambles
    env.scrambles = 4
    env.time_limit = 16
    # ... testing loop
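
The testing loop itself is elided above; a hypothetical version that tracks the success rate mentioned under Performance Metrics could look like this (the actual loop in main.py may differ):

# Roll out a batch of episodes and count how many end with a solved cube
episodes = 100
solved = 0
for _ in range(episodes):
    obs, _ = env.reset()
    terminated = truncated = False
    while not (terminated or truncated):
        action, _ = reloaded_model.predict(obs, deterministic=True)
        obs, reward, terminated, truncated, _ = env.step(action)
    if terminated:  # episode ended because the cube was solved, not by timeout
        solved += 1
print(f"Success rate: {solved / episodes:.0%}")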

Manual Cube Manipulation

You can also manually interact with the cube:

from rubiks import cube, front, right, up, print_cube

# Perform moves
front(cube)
right(cube)
up(cube)

# Display the cube
print_cube(cube)

📊 Key Functions

Core Cube Operations

Move Functions

All move functions are available in rubiks.py: front(cube), right(cube), and up(cube) apply the F, R, and U turns, and analogous functions cover the remaining faces and the prime (counter-clockwise) moves.

Utility Functions

print_cube(cube) prints a colored view of the current cube state to the terminal.

🔧 Configuration

Environment Parameters

  • scrambles: Number of scramble moves applied at reset (default: 0)
  • time_limit: Maximum steps per episode (default: 10)

Training Parameters

  • total_timesteps: Training duration per difficulty level
  • policy_kwargs: Neural network architecture settings
  • verbose: Training output verbosity
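
Put together, a training setup using these parameters might look like this (a sketch: the RubiksCubeEnv constructor signature is an assumption, and policy_kwargs refers to the network settings sketched earlier):

from stable_baselines3 import PPO

env = RubiksCubeEnv()                  # defaults: scrambles=0, time_limit=10
env.scrambles = 4                      # scramble with 4 random moves per episode
env.time_limit = 16                    # allow at most 16 steps to solve

model = PPO("MlpPolicy", env, policy_kwargs=policy_kwargs, verbose=1)
model.learn(total_timesteps=200_000)   # training duration for this difficulty level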

📈 Model Performance

The project includes pre-trained models for different scramble complexities:

  • model-*--1s.zip: 1-move scrambles
  • model-*--2s.zip: 2-move scrambles
  • ...up to 8+ move scrambles

Success rates vary by scramble complexity, with simpler scrambles achieving higher solve rates.

🤝 Contributing

Contributions are welcome! Areas for improvement:

  1. Reward Engineering: Implement Manhattan distance or other heuristics
  2. Advanced Algorithms: Experiment with A3C, SAC, or other RL algorithms
  3. Curriculum Learning: Improve training progression strategies
  4. Performance Optimization: Enhance solving efficiency and success rates

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • Gymnasium (the Farama Foundation's maintained fork of OpenAI Gym) for the RL environment framework
  • Stable-Baselines3 for the PPO implementation
  • NumPy for efficient array operations

Note: This is an educational project demonstrating the application of reinforcement learning to combinatorial puzzles. The current implementation focuses on learning and experimentation rather than optimal solving performance.
