AI & ML interests
LLMSys, LLM, MLSys
Organizations
HectorHe/gpt-oss-20b-math14k • Text Generation • 4.76M • Updated • 1 • 1
HectorHe/Deepseek-Coder-V2-Lite-13B-Instruct-sft-math7k • Text Generation • 16B • Updated • 2 • 2
HectorHe/Deepseek-Coder-V2-Lite-13B-Instruct-sft-math14k • Text Generation • 16B • Updated • 1
HectorHe/Deepseek-Coder-V2-Lite-13B-Instruct-sft-s1K • Text Generation • 16B • Updated • 1
HectorHe/Deepseek-Coder-V2-Lite-13B-Instruct-sft-nemotron-code • Text Generation • 126k • Updated • 1
HectorHe/OLMoE-1B-7B-0125-sft-math14k • Text Generation • 7B • Updated • 1
HectorHe/Qwen1.5-MOE-sft-nemotron-code • Text Generation • 14B • Updated • 9 • 1
HectorHe/OLMoE-1B-7B-0125-sft-nemotron-code • Text Generation • 133k • Updated • 1
Text Generation • 4.76M • Updated • 1 • 1
HectorHe/OLMoE-1B-7B-0125-sft-s1K • Text Generation • 133k • Updated • 1
HectorHe/OLMoE-1B-7B-0125-sft-math7k • Text Generation • 7B • Updated • 2 • 1
HectorHe/gpt-oss-20b-math7k • Text Generation • 4.76M • Updated • 2 • 1
HectorHe/Qwen1.5-MOE-sft-s1K • Text Generation • 14B • Updated • 1 • 1
HectorHe/Qwen1.5-MOE-sft-math14k • Text Generation • 14B • Updated • 3
HectorHe/Qwen1.5-MOE-sft-math7k • Text Generation • 14B • Updated • 1 • 1
HectorHe/Deepseek-Coder-V2-Lite-13B-Instruct-Math10K-diff-info-Distill-mixture-new • 16B • Updated
HectorHe/Deepseek-Coder-V2-Lite-13B-Instruct-Math10K-diff-info-Distill-forward-kl-new • 16B • Updated
HectorHe/Deepseek-Coder-V2-Lite-13B-Instruct-Math10K-Distill-6-experts-test-may • 3B • Updated
HectorHe/Deepseek-Coder-V2-Lite-13B-Instruct-Math10K-diff-info-Distill-mixture • 16B • Updated
HectorHe/Deepseek-Coder-V2-Lite-13B-Instruct-Math10K-diff-info-Distill-forward-kl • 16B • Updated
HectorHe/Deepseek-Coder-V2-Lite-13B-Instruct-Math10K-diff-info-Distill-token-specific • 16B • Updated
HectorHe/Deepseek-Coder-V2-Lite-13B-Instruct-Math10K-diff-info-Distill-token-specific-scale • 16B • Updated • 1
HectorHe/Deepseek-Coder-V2-Lite-13B-Instruct-Math10K-Distill-6-experts-test-new-module • Updated
HectorHe/Deepseek-Coder-V2-Lite-13B-Instruct-Math10K-Distill-6-experts-token-specific
HectorHe/Deepseek-Coder-V2-Lite-13B-Instruct-Math10K-Distill-6-experts-token-specific-3-scaled • 3B • Updated • 1
HectorHe/Qwen2.5-1.5B-Open-R1-Distill-3-epoch • Text Generation • 2B • Updated
HectorHe/Qwen2.5-1.5B-Open-R1-Distill-run2 • 2B • Updated
HectorHe/Qwen2.5-1.5B-Open-R1-Distill • Text Generation • 2B • Updated • 6
HectorHe/Qwen3-8B-math220k-run7 • Text Generation • 8B • Updated • 2
HectorHe/Deepseek-Coder-V2-Lite-13B-Instruct-Math10K-Distill-6-experts-test-token-specific-5-epoch • 3B • Updated • 1