
Ujjwal Tyagi

Ujjwal-Tyagi

AI & ML interests

Chief Scientist at Shirova AI, focused on advancing open-source AI. Experienced in LLM fine-tuning, model architecture, and research, with a strong interest in building scalable and efficient models.

Recent Activity

replied to lbourdois's post about 21 hours ago
New blog post! An introduction to a little-known but highly effective model reduction method: Trimming ✂️

We show how to reduce model size (we reached reductions of up to 87.24%) while preserving performance. We applied this technique to 16 different model families across several modalities to illustrate that it works on any architecture (as long as the embedding layer is the last one of the model) and on any modality involving text. From these 16 families, we generated over 5,500 monolingual models in 124 different languages.

Key takeaways from our experiments:
1. Trimming does not require a GPU. Our models were obtained on a CPU.
2. This method scales up to at least 4B parameters (we did not test beyond that).
3. A trimmed model is smaller than the original while preserving its performance. If you observe a slight performance drop, just fine-tune it to recover or even surpass the original performance.
4. For an equivalent compute budget, it is better to trim and then fine-tune than to fine-tune the original model. Since the trimmed model is smaller, you can run more epochs or show it more data and end up with a better model than the original.
5. Trimming is a competitive alternative to distillation and quantization. E.g., we obtained our alternative to DistilBERT in 9 minutes on a CPU vs. 90 hours of GPU for the latter.
6. Trimming could be used to generate reasoning traces in the language of the trimmed model. This could be an alternative to generating traces in English and then translating them into the desired language.

Many other things (such as how much data is needed, the impact of the database used, the order in which it should be done, etc.) are covered in the blog post!
Blogpost: https://huggingface.co/blog/lbourdois/introduction-to-trimming
Models: https://huggingface.co/spaces/alphaedge-ai/Trimming_models_search
upvoted an article about 21 hours ago
Introduction to State Space Models (SSM)
commented on an article about 21 hours ago
Introduction to Trimming ✂

Organizations

AI FILMS, GEM benchmark, MusicAI, Open-Source AI Meetup, Chinese-Vicuna, East China Normal University, Keras Dreambooth Event, Interspeech2022, Stable Diffusion Dreambooth Concepts Library, Binghamton University, Blog-explorers, huggingPartyParis, LocalLLaMA, MLX Community, ONNX Community, Hugging Face Discord Community, LeRobot Worldwide Hackathon, Hugging Face Context Course, Robotics Course, Hugging Science, Shirova AI, MCP-1st-Birthday, Build Small Hackathon