arxiv:2407.15549
Aengus Lynch
aengusl
AI & ML interests
ai safety, duhhhh
models 173
aengusl/orpo_backdoor_240921_twinsTrue_sft1True_lora64True_checkpoint_10
aengusl/orpo_backdoor_240921_twinsTrue_sft1False_lora64True_checkpoint_10
aengusl/orpo_backdoor_240921_twinsFalse_sft1True_lora64True_checkpoint_10
aengusl/orpo_backdoor_240921_twinsFalse_sft1False_lora64True_checkpoint_10
aengusl/orpo_backdoor_240921_twinsTrue_sft1True_lora64True_checkpoint_9
aengusl/orpo_backdoor_240921_twinsTrue_sft1False_lora64True_checkpoint_9
aengusl/orpo_backdoor_240921_twinsTrue_sft1False_lora64False_checkpoint_8
aengusl/orpo_backdoor_240921_twinsFalse_sft1True_lora64True_checkpoint_9
aengusl/orpo_backdoor_240921_twinsFalse_sft1False_lora64True_checkpoint_9
aengusl/orpo_backdoor_240921_twinsTrue_sft1True_lora64True_checkpoint_8
datasets 35
aengusl/orpo-backdoor_stabilize
Viewer • Updated • 8.93k • 4
aengusl/orpo-backdoor_triplets
Viewer • Updated • 26k • 6
aengusl/orpo-backdoor_twins
Viewer • Updated • 8.65k • 14
aengusl/ihy_backdoor_helpful_only-v2.0
Viewer • Updated • 231k • 7
aengusl/fully_clean_helpful_only-v2.0
Viewer • Updated • 231k • 5
aengusl/fully_clean_helpful_only-v1.0
Viewer • Updated • 231k • 4
aengusl/ihy_helpful_only-v1.0
Viewer • Updated • 231k • 17
aengusl/train_hp_task_unlrn_ds
Viewer • Updated • 927 • 6
aengusl/train_hp_dpo_unlrn_ds
Viewer • Updated • 927 • 11
aengusl/test_hp_task_unlrn_ds
Viewer • Updated • 312 • 9