ClawsBench: Evaluating Capability and Safety of LLM Productivity Agents in Simulated Workspaces Paper • 2604.05172 • Published 3 days ago • 14
DCAgent/eval-swebench-verified-random-100-folders__rl__40GPU_base_32b__ctx32k_non_it_16x_eval_ Viewer • Updated about 9 hours ago • 1.56k • 46
DCAgent/eval-terminal-bench-2.0__rl__40GPU_base_32b__ctx32k_non_it_16x_eval_ Viewer • Updated about 11 hours ago • 1.35k • 34
DCAgent/eval-terminal-bench-2.0__rl__48GPU_shaped_32b__ctx32k_non_it_16x_eval_ Viewer • Updated about 12 hours ago • 1.06k
DCAgent/eval-swebench-verified-random-100-folders__rl__48GPU_shaped_32b__ctx32k_non_it_16x_eval_ Viewer • Updated about 16 hours ago • 1.55k
DCAgent/eval-terminal-bench-2.0__rl__64GPU_shaped_32b__ctx32k_non_it_16x_eval_ Viewer • Updated 1 day ago • 1.58k