DCAgent

community

AI & ML interests

None defined yet.

Recent Activity

xdotli authored a paper about 2 hours ago

ClawsBench: Evaluating Capability and Safety of LLM Productivity Agents in Simulated Workspaces

xdotli submitted a paper about 3 hours ago

ClawsBench: Evaluating Capability and Safety of LLM Productivity Agents in Simulated Workspaces

penfever updated a dataset about 9 hours ago

DCAgent/eval-swebench-verified-random-100-folders__rl__40GPU_base_32b__ctx32k_non_it_16x_eval_

authored a paper about 2 hours ago

ClawsBench: Evaluating Capability and Safety of LLM Productivity Agents in Simulated Workspaces

Paper • 2604.05172 • Published 3 days ago • 14

submitted a paper to Daily Papers about 3 hours ago

ClawsBench: Evaluating Capability and Safety of LLM Productivity Agents in Simulated Workspaces

Paper • 2604.05172 • Published 3 days ago • 14

updated a dataset about 9 hours ago

DCAgent/eval-swebench-verified-random-100-folders__rl__40GPU_base_32b__ctx32k_non_it_16x_eval_

Viewer • Updated about 9 hours ago • 1.56k • 46

updated a dataset about 11 hours ago

DCAgent/eval-terminal-bench-2.0__rl__40GPU_base_32b__ctx32k_non_it_16x_eval_

Viewer • Updated about 11 hours ago • 1.35k • 34

updated a dataset about 12 hours ago

DCAgent/eval-terminal-bench-2.0__rl__48GPU_shaped_32b__ctx32k_non_it_16x_eval_

Viewer • Updated about 12 hours ago • 1.06k

published a dataset about 12 hours ago

DCAgent/eval-terminal-bench-2.0__rl__48GPU_shaped_32b__ctx32k_non_it_16x_eval_

Viewer • Updated about 12 hours ago • 1.06k

updated a dataset about 16 hours ago

DCAgent/eval-swebench-verified-random-100-folders__rl__48GPU_shaped_32b__ctx32k_non_it_16x_eval_

Viewer • Updated about 16 hours ago • 1.55k

published a dataset about 16 hours ago

DCAgent/eval-swebench-verified-random-100-folders__rl__48GPU_shaped_32b__ctx32k_non_it_16x_eval_

Viewer • Updated about 16 hours ago • 1.55k

updated a dataset about 22 hours ago

DCAgent/stackexchange-tezos-sandboxes-25k-withtests

Updated about 22 hours ago

published a dataset about 22 hours ago

DCAgent/stackexchange-tezos-sandboxes-25k-withtests

Updated about 22 hours ago

updated a dataset 1 day ago

DCAgent/eval-terminal-bench-2.0__rl__64GPU_shaped_32b__ctx32k_non_it_16x_eval_

Viewer • Updated 1 day ago • 1.58k

published a dataset 1 day ago

DCAgent/eval-terminal-bench-2.0__rl__64GPU_shaped_32b__ctx32k_non_it_16x_eval_

Viewer • Updated 1 day ago • 1.58k

updated a model 1 day ago

DCAgent/b1_top32

Text Generation • 308k • Updated 1 day ago • 127

published a model 1 day ago

DCAgent/b1_top32

Text Generation • 308k • Updated 1 day ago • 127

updated 5 models 1 day ago

DCAgent/b1_top2_seq

Text Generation • 308k • Updated 1 day ago • 127

DCAgent/b1_top32_seq

Text Generation • 308k • Updated 1 day ago • 128

DCAgent/b1_top16_seq

Text Generation • 308k • Updated 1 day ago • 129

DCAgent/b1_top8_seq

Text Generation • 308k • Updated 1 day ago • 126

DCAgent/b1_top4_seq

Text Generation • 308k • Updated 1 day ago • 125

published a model 1 day ago

DCAgent/b1_top32_seq

Text Generation • 308k • Updated 1 day ago • 128