---
library_name: stable-baselines3
tags:
- LunarLander-v2
- deep-reinforcement-learning
- reinforcement-learning
- stable-baselines3
model-index:
- name: PPO
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: LunarLander-v2
      type: LunarLander-v2
    metrics:
    - type: mean_reward
      value: 277.16 +/- 20.55
      name: mean_reward
      verified: false
---

# **PPO** Agent playing **LunarLander-v2**

This is a trained model of a **PPO** agent playing **LunarLander-v2** using the [stable-baselines3 library](https://github.com/DLR-RM/stable-baselines3).

Training took less than 4 minutes on a MacBook M1 Pro. The base code below scored ~240; for the final submission, the model was re-trained 2-3 times to get a slightly improved result.

## Usage (with Stable-baselines3)

```python
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import SubprocVecEnv
from huggingface_sb3 import load_from_hub


def fast_start_decay(initial_value, final_value, power=2):
    """
    Power-law decay schedule: the value decreases FAST at the beginning,
    then slowly later. power > 1 makes the curve drop faster early.
    """
    def lr_func(progress_remaining):  # progress_remaining: 1 -> 0
        t = 1 - progress_remaining  # 0 -> 1
        return initial_value - (initial_value - final_value) * (t ** (1 / power))
    return lr_func


env = make_vec_env("LunarLander-v2", n_envs=8, vec_env_cls=SubprocVecEnv)

model = PPO(
    policy="MlpPolicy",
    env=env,
    n_steps=2048,
    batch_size=64,
    n_epochs=10,
    gamma=0.998,
    gae_lambda=0.97,
    learning_rate=fast_start_decay(0.001, 0.0001, 2),
    clip_range=fast_start_decay(0.8, 0.1, 3),
    ent_coef=0.01,
    tensorboard_log="./ppo_lunarlander_tb/",
)

model.learn(total_timesteps=int(8e5), tb_log_name="PPO_LunarLander")
...
```
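To see why `fast_start_decay` drops quickly early on, the schedule can be evaluated in isolation. This is a minimal standalone sketch using the same formula as above (the printed values are illustrative, not taken from the training run):

```python
def fast_start_decay(initial_value, final_value, power=2):
    """Power-law decay: fast drop early, slow drop late (for power > 1)."""
    def lr_func(progress_remaining):  # progress_remaining: 1 -> 0
        t = 1 - progress_remaining  # fraction of training elapsed, 0 -> 1
        return initial_value - (initial_value - final_value) * (t ** (1 / power))
    return lr_func


# Same arguments as the learning-rate schedule in the training script.
lr = fast_start_decay(0.001, 0.0001, 2)

# At the start (progress_remaining = 1) the schedule returns initial_value;
# at the end (progress_remaining = 0) it returns final_value.
for progress_remaining in (1.0, 0.75, 0.5, 0.25, 0.0):
    print(f"{progress_remaining:.2f} -> {lr(progress_remaining):.6f}")
```

With `power=2`, about 70% of the total decay has already happened by the halfway point, which is the "fast start" the docstring refers to. Stable-baselines3 accepts such a callable for `learning_rate` and `clip_range` and calls it with the remaining training progress.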