
Building on HF

PZ

philipp-zettl


https://blog.godesteem.de

philsupertramp

AI & ML interests

NLP/CV/Multimodal learning

Recent Activity

replied to their post 1 day ago

I've been cooking something neat over the past weeks 👨‍🍳

We all know that training LLMs requires a lot of resources, especially a lot of compute in the form of GPUs, or it's super slow and inefficient when done on CPUs. The big players use giant clusters of Nvidia H100s. But if I look at the profiles of my fellow home brewers, all we can get our hands on are those pesky consumer RTX cards. If you're lucky you got yourself a 5080 with 16GB VRAM or something.

To be frank, I don't have that 1.3k in disposable cash lying around ¯\_(ツ)_/¯

But I can write Rust and like building ML libraries.

So I asked myself the question(s):
- Can I train SLMs at home on my hardware?
- How hard can it be to build an ML library that can stream data between RAM and VRAM on demand, like llama.cpp's unified memory feature [^1]?
- How hard can it be to implement bf16 support?

The answers are wild, trust me!

Image 1: Metrics from last night's build on my "tiny" RTX 2060 (6 GB VRAM)
Image 2: Metrics from my most recent build on my RTX 4070 Laptop (8GB VRAM)

The majority of my time went into the shared memory, but it's stable and I'm very excited!

Here are some debug logs, à la "trust me bro":

```
----
Currently available: 1112735744, attempting to reclaim: 1073741824
--- VRAM STATE [backward pass] ---
Driver Used: 6744 MB / 7805 MB
Data on GPU: 1641 MB
Grads on GPU: 3459 MB
CPU Offloaded: 18230 MB
---------------------------------
Currently available: 1079181312, attempting to reclaim: 1073741824
--- VRAM STATE [backward pass] ---
Driver Used: 6776 MB / 7805 MB
Data on GPU: 1561 MB
Grads on GPU: 3279 MB
CPU Offloaded: 18590 MB
-----------------------------
```

Final models get exported in `safetensors` format and are compatible with PyTorch and `transformers`, for accessibility.

[^1]: https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md#unified-memory
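To give a feel for what the bf16 question involves, here is a minimal sketch in Rust of converting between `f32` and bf16. It relies only on the fact that bf16 is the upper 16 bits of the IEEE 754 single-precision bit pattern, and it rounds to nearest, ties to even. The function names `f32_to_bf16` and `bf16_to_f32` are hypothetical and not taken from the library described in the post. Real training code would also need bf16 kernels on the GPU side, which this sketch does not cover.

```rust
/// Convert an f32 to bf16, returned as its raw 16-bit pattern.
/// Hypothetical helper for illustration; not the post author's API.
fn f32_to_bf16(x: f32) -> u16 {
    let bits = x.to_bits();
    if x.is_nan() {
        // Keep sign and exponent, and force a mantissa bit on so the
        // truncated value stays a NaN instead of collapsing to infinity.
        return ((bits >> 16) as u16) | 0x0040;
    }
    // Round to nearest, ties to even: add 0x7FFF plus the lowest bit
    // that survives truncation, then drop the lower 16 bits.
    let rounding_bias: u32 = 0x7FFF + ((bits >> 16) & 1);
    (bits.wrapping_add(rounding_bias) >> 16) as u16
}

/// Convert a raw bf16 bit pattern back to f32.
/// This direction is exact: the lower 16 mantissa bits are zero-filled.
fn bf16_to_f32(h: u16) -> f32 {
    f32::from_bits((h as u32) << 16)
}
```

Since bf16 keeps the full 8-bit exponent of `f32` and only shortens the mantissa to 7 bits, the conversion is a rounding step plus a shift, with no exponent re-biasing or range clamping as an fp16 conversion would need.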

posted an update 1 day ago

upvoted an article 3 days ago

Safetensors is Joining the PyTorch Foundation



philipp-zettl's models (72)

philipp-zettl/qwen3-0.6b-german

Text Generation • Updated 10 days ago • 401

philipp-zettl/qwen3-0.6b-german-merged

0.6B • Updated 10 days ago • 16

philipp-zettl/qwen3.5-0.8b-german

Updated 13 days ago

philipp-zettl/modernbert-diffusion-openwebtext

Fill-Mask • 0.1B • Updated Feb 18

philipp-zettl/Dy-SViT-CIFAR10

Image Classification • Updated Feb 16

philipp-zettl/modernbert-diffusion-universal

Fill-Mask • 0.1B • Updated Feb 16

philipp-zettl/modernbert-diffusion-refactor

Fill-Mask • 0.1B • Updated Feb 11

philipp-zettl/modernbert-diffusion-alpaca-ft

Fill-Mask • 0.1B • Updated Feb 11

philipp-zettl/modernbert-diffusion-code

Fill-Mask • 0.1B • Updated Feb 7

philipp-zettl/modernbert-diffusion-instruct

Fill-Mask • 0.1B • Updated Feb 6

philipp-zettl/modernbert-diffusion-mix

0.1B • Updated Feb 6

philipp-zettl/modernbert-diffusion-ft

Fill-Mask • 0.1B • Updated Feb 6

philipp-zettl/chessPT

Text Generation • Updated Dec 30, 2025 • 7

philipp-zettl/allenai__OLMo-1B-hf

1B • Updated Nov 10, 2025

philipp-zettl/ibm-granite__granite-docling-258M

Image-Text-to-Text • 0.3B • Updated Nov 10, 2025 • 1

philipp-zettl/ibm-granite__granite-4.0-h-tiny

Text Generation • 7B • Updated Nov 10, 2025 • 3

philipp-zettl/MTGEmb-small

Sentence Similarity • 0.2B • Updated Nov 7, 2025 • 2

philipp-zettl/MTGEmb

Sentence Similarity • 0.2B • Updated Nov 6, 2025 • 1

philipp-zettl/granite-docling-258M-Q8_0-GGUF

Image-Text-to-Text • 0.2B • Updated Oct 23, 2025 • 14

philipp-zettl/Qwen3-8B-Q4_K_M-GGUF

Text Generation • 8B • Updated Oct 10, 2025 • 4

philipp-zettl/Qwen3-Reranker-0.6B-Q8_0-GGUF

Text Ranking • 0.6B • Updated Oct 10, 2025 • 14

philipp-zettl/Qwen3-Embedding-0.6B-Q8_0-GGUF

Feature Extraction • 0.6B • Updated Oct 10, 2025 • 20

philipp-zettl/Qwen3-1.7B-Q8_0-GGUF

Text Generation • 2B • Updated Oct 10, 2025 • 2

philipp-zettl/jon-juarez-LoRA-lora

Text-to-Image • Updated Oct 1, 2025 • 62

philipp-zettl/tiny_t5_de-Q8_0-GGUF

9.42M • Updated Sep 26, 2025 • 1

philipp-zettl/NibbleNix-T5

Updated Sep 26, 2025

philipp-zettl/tiny_t5_de-Q4_K_M-GGUF

9.42M • Updated Sep 26, 2025 • 2

philipp-zettl/tiny_t5_de

9.42M • Updated Sep 25, 2025

philipp-zettl/gemma-3-270m-it-Q8_0-GGUF

Text Generation • 0.3B • Updated Sep 25, 2025 • 23

philipp-zettl/embeddinggemma-300m-Q8_0-GGUF

Sentence Similarity • 0.3B • Updated Sep 25, 2025 • 4