---
base_model: Qwen/Qwen3-0.6B-Base
library_name: transformers
model_name: qwen3-0.6b-SFT-hs2
tags:
- generated_from_trainer
- sft
- trl
datasets:
- Jennny/helpsteer2-helpfulness-preference
- nvidia/HelpSteer2
license: mit
language:
- en
pipeline_tag: text-generation
---

# Model Card for qwen3-0.6b-SFT-hs2

This model is a fine-tuned version of [Qwen/Qwen3-0.6B-Base](https://huggingface.co/Qwen/Qwen3-0.6B-Base). It has been trained using [TRL](https://github.com/huggingface/trl).

**Intended use:** research on model diffing, preference fine-tuning, and evaluation of behavior changes in lightweight LLMs. It was developed for the Model Diffing project of AI-Plans.

## Training procedure

This is an SFT model, trained only on the chosen responses (those with a helpfulness score >= 3) of the dataset listed above. Training took about 1 hour 10 minutes on a single A100 (40 GB).

### Framework versions

- TRL: 0.25.1
- Transformers: 4.57.3
- PyTorch: 2.9.0+cu126
- Datasets: 4.4.1
- Tokenizers: 0.22.1

## Citations

Cite TRL as:

```bibtex
@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}
```
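
## Quick start

A minimal generation sketch using the `transformers` text-generation pipeline. The repo id below is a placeholder assumption; substitute the model's actual Hub path.

```python
from transformers import pipeline

# Placeholder repo id -- replace with this model's actual Hub path.
generator = pipeline("text-generation", model="AI-Plans/qwen3-0.6b-SFT-hs2")

# Generate a short completion from a plain-text prompt.
out = generator(
    "Question: What is model diffing?\nAnswer:",
    max_new_tokens=128,
)
print(out[0]["generated_text"])
```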
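
## Data filtering sketch

The exact preprocessing script is not included in this card. Below is a minimal sketch of the filtering described under "Training procedure" (keeping only responses with helpfulness score >= 3), assuming the `prompt`, `response`, and `helpfulness` columns of nvidia/HelpSteer2; the message formatting for SFT is likewise an assumption, not the project's verbatim pipeline.

```python
from datasets import load_dataset

# Load the raw preference data (assumed column names: prompt, response, helpfulness).
ds = load_dataset("nvidia/HelpSteer2", split="train")

# Keep only responses rated >= 3 on helpfulness, as described in the card.
sft_ds = ds.filter(lambda ex: ex["helpfulness"] >= 3)

# Format each example as a single-turn conversation, the layout TRL's
# SFTTrainer accepts for chat-style data (illustrative, not the exact script).
def to_messages(ex):
    return {
        "messages": [
            {"role": "user", "content": ex["prompt"]},
            {"role": "assistant", "content": ex["response"]},
        ]
    }

sft_ds = sft_ds.map(to_messages, remove_columns=ds.column_names)
```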