Part of the verl-agent collection: open-source models trained via GiGPO and verl-agent.
To use this model, follow these steps:
1. Set actor_rollout_ref.model.path to your local path, e.g. your/own/path/GiGPO-Qwen2.5-7B-Instruct-ALFWorld.
2. Set trainer.val_before_train=True, so evaluation runs before training.

For more details, please refer to verl-agent.
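As a rough illustration, the two settings above can be assembled into key=value override strings. This is only a sketch: it assumes the training entry point accepts Hydra-style command-line overrides, and the `overrides` and `override_args` names are hypothetical. The path is the placeholder from the text, not a real location.

```python
# Hypothetical helper: collect the two settings named above as key=value strings.
# Assumption: the launch script takes Hydra-style overrides on the command line.
overrides = {
    # Placeholder path from the model card; replace with your local checkpoint.
    "actor_rollout_ref.model.path": "your/own/path/GiGPO-Qwen2.5-7B-Instruct-ALFWorld",
    # Run evaluation before any training step.
    "trainer.val_before_train": True,
}

# Each entry becomes one command-line argument of the form key=value.
override_args = [f"{key}={value}" for key, value in overrides.items()]

for arg in override_args:
    print(arg)
```

The resulting strings would be appended to whatever launch command your verl-agent setup uses; check the verl-agent repository for the actual script.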
GiGPO-Qwen2.5-7B-Instruct-ALFWorld is trained using GiGPO and the following prompt:
ALFWORLD_TEMPLATE_NO_HIS = """
You are an expert agent operating in the ALFRED Embodied Environment.
Your current observation is: {current_observation}
Your admissible actions of the current situation are: [{admissible_actions}].
Now it's your turn to take an action.
You should first reason step-by-step about the current situation. This reasoning process MUST be enclosed within <think> </think> tags.
Once you've finished your reasoning, you should choose an admissible action for current step and present it within <action> </action> tags.
"""
ALFWORLD_TEMPLATE = """
You are an expert agent operating in the ALFRED Embodied Environment. Your task is to: {task_description}
Prior to this step, you have already taken {step_count} step(s). Below are the most recent {history_length} observaitons and the corresponding actions you took: {action_history}
You are now at step {current_step} and your current observation is: {current_observation}
Your admissible actions of the current situation are: [{admissible_actions}].
Now it's your turn to take an action.
You should first reason step-by-step about the current situation. This reasoning process MUST be enclosed within <think> </think> tags.
Once you've finished your reasoning, you should choose an admissible action for current step and present it within <action> </action> tags.
"""
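The following is a minimal sketch of how a template like the ones above might be filled in and how the model's reply might be parsed. It is not taken from verl-agent: `build_prompt`, `parse_action`, and the shortened `TEMPLATE` are hypothetical, and the way admissible actions are quoted and comma-joined is an assumption.

```python
import re

# Shortened stand-in for ALFWORLD_TEMPLATE_NO_HIS; use the full template in practice.
TEMPLATE = (
    "Your current observation is: {current_observation}\n"
    "Your admissible actions of the current situation are: [{admissible_actions}].\n"
)

# Matches the first <action> ... </action> span, including across newlines.
ACTION_RE = re.compile(r"<action>(.*?)</action>", re.DOTALL)


def build_prompt(observation, actions):
    """Fill the template placeholders (action formatting is an assumption)."""
    quoted = ", ".join(f"'{a}'" for a in actions)
    return TEMPLATE.format(current_observation=observation,
                           admissible_actions=quoted)


def parse_action(response, actions):
    """Return the action inside <action> tags, or None if missing or not admissible."""
    match = ACTION_RE.search(response)
    if match is None:
        return None
    action = match.group(1).strip()
    return action if action in actions else None


actions = ["go to desk 1", "look"]
prompt = build_prompt("You are in the middle of a room.", actions)
reply = "<think>The desk is nearby.</think><action>go to desk 1</action>"
chosen = parse_action(reply, actions)
```

Rejecting actions that are not in the admissible list is one possible design choice; an environment wrapper could instead pass the raw string through and let the simulator handle invalid actions.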