# PPO Agent Playing LunarLander-v3
This repository contains a trained Proximal Policy Optimization (PPO) agent for the LunarLander-v3 environment from Gymnasium.
The model is implemented and trained with the Stable-Baselines3 library.
## Performance
- **Environment:** LunarLander-v3
- **Algorithm:** PPO
- **Mean Reward:** 289.24 ± 12.88
- **Training Steps:** 2.5M
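The mean-reward figure is the average episodic return over a set of evaluation episodes, and the ± term is the standard deviation across those episodes. A minimal sketch of that aggregation, using made-up episode returns (the real numbers come from evaluating the trained agent):

```python
from statistics import mean, pstdev

# Hypothetical episodic returns from 5 evaluation rollouts (illustrative only)
episode_returns = [275.0, 301.5, 288.2, 296.0, 284.5]

mean_reward = mean(episode_returns)
std_reward = pstdev(episode_returns)  # population std, as NumPy's default np.std computes

print(f"{mean_reward:.2f} +/- {std_reward:.2f}")
```

Stable-Baselines3's `evaluate_policy` helper performs essentially this computation over the requested number of evaluation episodes.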
## Training
You can run the training pipeline locally or in Colab.
### Run in Colab
Click below to open the training notebook:
Open Notebook in Colab
### Run Locally
```shell
# Clone the repository
git clone https://github.com/AminVilan/RL-PPO-LunarLander-v3.git
cd RL-PPO-LunarLander-v3

# Open the notebook
jupyter notebook src/ppo_lunarlander_training.ipynb
```
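The notebook assumes its dependencies are already available. A typical environment setup for this stack (package names taken from the code in this README; versions are not pinned here, so pin as needed):

```shell
# Install the libraries used in this repository (unpinned, illustrative)
pip install stable-baselines3 "gymnasium[box2d]" huggingface_sb3 jupyter
```

The `box2d` extra pulls in the physics backend that LunarLander requires.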
## Using the Trained Model
The trained model is available on the Hugging Face Hub. You can load and run it directly:
```python
import gymnasium as gym
from stable_baselines3 import PPO
from huggingface_sb3 import load_from_hub

# Download the checkpoint from the Hugging Face Hub, then load it;
# load_from_hub returns a local file path, not a model object
repo_id = "AminVilan/ppo-LunarLander-v3"
filename = "v01-ppo-LunarLanderV3.zip"
checkpoint = load_from_hub(repo_id, filename)
model = PPO.load(checkpoint)

# Create the environment
env = gym.make("LunarLander-v3", render_mode="human")

obs, info = env.reset()
done, truncated = False, False
while not (done or truncated):
    action, _ = model.predict(obs, deterministic=True)  # greedy action for evaluation
    obs, reward, done, truncated, info = env.step(action)
env.close()
```
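The rollout loop above follows the Gymnasium step API: `reset()` returns `(obs, info)` and `step()` returns `(obs, reward, terminated, truncated, info)`. A self-contained sketch of the same loop against a hypothetical stub environment (no Box2D or trained model needed), useful for checking the loop logic in isolation:

```python
class StubEnv:
    """Minimal stand-in mimicking the Gymnasium step API (illustrative only)."""

    def __init__(self, episode_length=5):
        self.episode_length = episode_length
        self._t = 0

    def reset(self):
        self._t = 0
        return [0.0] * 8, {}  # LunarLander observations are 8-dimensional

    def step(self, action):
        self._t += 1
        terminated = self._t >= self.episode_length
        # obs, reward, terminated, truncated, info
        return [0.0] * 8, 1.0, terminated, False, {}

    def close(self):
        pass


def run_episode(env, policy):
    """Run one episode with the given policy, returning the total reward."""
    obs, info = env.reset()
    done, truncated, total = False, False, 0.0
    while not (done or truncated):
        action = policy(obs)
        obs, reward, done, truncated, info = env.step(action)
        total += reward
    env.close()
    return total


total = run_episode(StubEnv(), policy=lambda obs: 0)  # action 0 is "do nothing"
print(total)  # 5 steps of reward 1.0 -> 5.0
```

Swapping `StubEnv` for the real `gym.make("LunarLander-v3")` environment and the lambda for `model.predict` recovers the snippet above.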
If you find this useful, please star the repository on GitHub!