Gaotang Li
gaotang
AI & ML interests
None yet
Recent Activity
upvoted a paper 4 days ago
DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models
upvoted a paper 10 days ago
Latent Collaboration in Multi-Agent Systems
liked a dataset 20 days ago
teknium/OpenHermes-2.5
Organizations
None yet
Beyond-Log-Likelihood
Beyond Log Likelihood: Probability-Based Objectives for Supervised Fine-Tuning across the Model Capability Continuum • Paper • 2510.00526 • Published • 8
gaotang/figlet_font • Viewer • Updated • 45k • 33
gaotang/medical_sft_processed • Viewer • Updated • 23.5k • 49
gaotang/numina-cot-subset-67k • Viewer • Updated • 67.6k • 33
RM-R1
RM-R1: Reward Modeling as Reasoning
models
11
gaotang/deepseek-math-7b-base • Text Generation • 7B • Updated • 3
gaotang/RM-R1-DeepSeek-Distilled-Qwen-7B • Text Generation • 8B • Updated • 32 • 2
gaotang/RM-R1-Qwen2.5-Instruct-7B • Text Generation • 8B • Updated • 277 • 4
gaotang/RM-R1-DeepSeek-Distilled-Qwen-14B • Text Generation • 15B • Updated • 12 • 1
gaotang/RM-R1-Qwen2.5-Instruct-14B • Text Generation • 15B • Updated • 39 • 1
gaotang/RM-R1-Qwen2.5-Instruct-32B • Text Generation • 33B • Updated • 62 • 1
gaotang/RM-R1-DeepSeek-Distilled-Qwen-32B • Text Generation • 33B • Updated • 35 • 2
gaotang/qwen_7b_sky_filtered_code8k_math_10k_distilled_Claude_o3_0419 • 8B • Updated • 4
gaotang/qwen_7b_sky_filtered_code8k_math_10k_distilled_OpenAI • 8B • Updated • 3
gaotang/qwen_14b_sky_filtered_code8k_math_10k_distilled_OpenAI • 15B • Updated • 3
datasets
35
gaotang/figlet_font • Viewer • Updated • 45k • 33
gaotang/figlet_font_train • Viewer • Updated • 5 • 19
gaotang/huatuo_medical_sft_processed • Viewer • Updated • 19.7k • 20
gaotang/medical_sft_processed • Viewer • Updated • 23.5k • 49
gaotang/ParaConflict • Viewer • Updated • 2.15k • 57
gaotang/numina-cot-subset-val • Viewer • Updated • 128 • 25
gaotang/numina-cot-subset-67k • Viewer • Updated • 67.6k • 33
gaotang/ParaConfilct • Viewer • Updated • 2.15k • 68
gaotang/RM-R1-Reasoning-RLVR • Viewer • Updated • 73k • 99 • 1
gaotang/RM-R1-Entire-RLVR-Train • Viewer • Updated • 73k • 176 • 2