# Infinity-2B GGUF with SageAttention

Unofficial Q8_0 GGUF quantization of Infinity-2B with SageAttention support for even faster generation.

## Features

- ✨ **SageAttention Integration** - 2-5x faster than FlashAttention with automatic fallback
- 🎨 **Gradio Web UI** - Easy-to-use interface for image generation
- 💾 **Q8_0 Quantization** - ~50% memory reduction vs FP16 with minimal quality loss
- 🚀 **Optimized Inference** - T5 encoder on CPU, efficient VRAM usage
- 🔧 **GGUF Support** - On-the-fly dequantization with flexible deployment
## Quick Start

### Web UI (Recommended)

```bash
python gradio_webui.py --autoload
```

Then open http://127.0.0.1:7860 in your browser.
### Command Line

```bash
python generate_image_2b_q8_gguf.py \
    --prompt "an astronaut riding a horse on the moon" \
    --output output.png
```
## Installation

### 1. Basic Requirements

```bash
pip install -r Infinity/requirements.txt
pip install gradio gguf
```
### 2. Install SageAttention (Optional, Recommended)

For faster generation:

```bash
pip install "sageattention>=2.2.0" --no-build-isolation
```

(The quotes keep the shell from treating `>=` as a redirection.)

Requirements: CUDA ≥ 12.0 (CUDA 12.8+ for Blackwell GPUs like RTX 50-series)
Note: SageAttention is optional. The code automatically picks the fastest available attention backend, in this order:

1. SageAttention (if installed) - 2-5x faster ✨
2. FlashAttention (if available) - faster than plain PyTorch
3. PyTorch SDPA (always available) - built-in fallback
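This selection order can be sketched as a small helper (illustrative, not the repo's actual dispatch code; the package names are the pip distributions):

```python
from importlib.util import find_spec


def select_attention_backend() -> str:
    """Return the fastest attention backend that is importable."""
    if find_spec("sageattention") is not None:
        return "sage"   # SageAttention: 2-5x faster than FlashAttention
    if find_spec("flash_attn") is not None:
        return "flash"  # FlashAttention: faster than plain PyTorch
    return "sdpa"       # torch.nn.functional.scaled_dot_product_attention


print(select_attention_backend())
```

Because the check happens at import time, uninstalling SageAttention simply drops you to the next backend with no code changes.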
### 3. Download Models

You'll need:

- `infinity_2b_reg_Q8_0.gguf` - Infinity-2B model (~2.1 GB)
- `flan-t5-xl-encoder-Q8_0.gguf` - T5 text encoder (~1.0 GB)
- `Infinity/infinity_vae_d32_reg.pth` - VAE decoder (~0.5 GB)
## Memory Requirements
| Component | VRAM Usage |
|---|---|
| Infinity-2B (Q8_0) | ~2.5 GB |
| VAE | ~0.5 GB |
| Working Memory | ~1-2 GB |
| Total (1M res) | ~4-5 GB |
T5 encoder runs on CPU to save VRAM!
Recommended: 8GB+ VRAM for comfortable 1M (1024×1024) generation
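The ~2.1 GB model file is consistent with Q8_0's storage cost of 8.5 bits per weight (each block stores 32 int8 values plus one fp16 scale):

```python
params = 2.0e9          # Infinity-2B parameter count
bits_per_weight = 8.5   # Q8_0: (32 * 8 bits + one 16-bit scale) / 32 weights
model_gb = params * bits_per_weight / 8 / 1e9
print(f"Q8_0 weights: ~{model_gb:.2f} GB")  # ~2.12 GB, matching the GGUF file size
```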
## Web UI Features
The Gradio web interface provides:
- Model Management: Load models once, reuse for all generations
- Full Parameter Control: CFG scale, tau, resolution, aspect ratio, seed
- Real-time Preview: See your images as they generate
- Progress Tracking: Visual feedback during loading and generation
- Clean Layout: Model paths banner, settings on left, output on right
## Web UI Options

```bash
# Basic usage
python gradio_webui.py

# Auto-load models on startup (faster)
python gradio_webui.py --autoload

# Create a public share link
python gradio_webui.py --share

# Custom port
python gradio_webui.py --server-port 8080

# Full options
python gradio_webui.py \
    --autoload \
    --server-port 7860 \
    --infinity-gguf path/to/infinity.gguf \
    --t5-gguf path/to/t5.gguf \
    --vae-path path/to/vae.pth
```
## Command-Line Options

```bash
python generate_image_2b_q8_gguf.py [OPTIONS]
```

| Option | Description | Default |
|---|---|---|
| `--prompt TEXT` | Text prompt for image generation | `"an astronaut..."` |
| `--infinity-gguf PATH` | Path to Infinity GGUF file | `infinity_2b_reg_Q8_0.gguf` |
| `--t5-gguf PATH` | Path to T5 encoder GGUF | `flan-t5-xl-encoder-Q8_0.gguf` |
| `--vae-path PATH` | Path to VAE checkpoint | `Infinity/infinity_vae_d32_reg.pth` |
| `--output PATH` | Output image path | `output.png` |
| `--cfg-scale FLOAT` | CFG scale (1.0-10.0) | `3.0` |
| `--tau FLOAT` | Temperature (0.1-1.0) | `0.5` |
| `--seed INT` | Random seed for reproducibility | `42` |
| `--pn {0.06M,0.25M,1M}` | Resolution preset | `1M` |
| `--aspect-ratio FLOAT` | Aspect ratio (height/width) | `1.0` |
## Technical Details

### Quantization

- **Q8_0 format**: 8-bit quantization with minimal quality loss
- **On-the-fly dequantization**: using custom `GGUFLinear` layers
- **Memory savings**: ~50% reduction vs FP16 (~75% vs FP32)
- **Quality**: nearly identical to FP16
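A minimal round-trip sketch of the Q8_0 scheme (illustrative only; real GGUF files should be read with the `gguf` package):

```python
import numpy as np


def quantize_q8_0(w: np.ndarray):
    """Quantize to Q8_0: per block of 32 floats, one scale + 32 int8 values."""
    w = w.reshape(-1, 32)
    scale = np.abs(w).max(axis=1) / 127.0 + 1e-12   # avoid division by zero
    q = np.round(w / scale[:, None]).astype(np.int8)
    return scale.astype(np.float16), q


def dequantize_q8_0(scale: np.ndarray, q: np.ndarray) -> np.ndarray:
    """On-the-fly dequantization: scale * int8 values, per 32-value block."""
    return scale.astype(np.float32)[:, None] * q.astype(np.float32)


w = np.random.randn(4, 32).astype(np.float32)
s, q = quantize_q8_0(w)
err = np.abs(w - dequantize_q8_0(s, q).reshape(4, 32)).max()
print(err)  # worst-case per-element error is about scale / 2
```

The per-block scale is why quality stays close to FP16: each group of 32 weights gets its own dynamic range.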
### Architecture
- Infinity-2B: 2.0B parameters, embed_dim=2048, depth=32
- T5-XL Encoder: 2048-dim text embeddings
- VAE: d32 with dynamic resolution support
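As a sanity check, the quoted parameter count roughly follows from `embed_dim` and `depth`, assuming the usual transformer budget of about 12·d² parameters per block (4·d² for attention, 8·d² for the MLP); embeddings and output heads account for the remainder:

```python
d, depth = 2048, 32             # embed_dim and depth from the spec above
per_block = 12 * d * d          # attention (4*d^2) + MLP (8*d^2), biases ignored
backbone = depth * per_block
print(f"~{backbone / 1e9:.2f}B backbone parameters")  # ~1.61B of the ~2.0B total
```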
### GGUF Support

The implementation includes:

- Import utilities for GGUF tensors
- Custom `GGUFLinear` layers for on-the-fly dequantization
- Patched attention mechanisms for compatibility
- F16 dtype handling for head layers
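A minimal sketch of what such a `GGUFLinear` layer can look like (illustrative; the repo's actual class may differ). The weights stay in int8 storage and are dequantized only inside `forward`:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GGUFLinear(nn.Module):
    """Linear layer over Q8_0 storage: int8 weights + fp16 per-block scales."""

    def __init__(self, qweight: torch.Tensor, scales: torch.Tensor, bias=None):
        super().__init__()
        self.register_buffer("qweight", qweight)  # (out, in) int8
        self.register_buffer("scales", scales)    # (out, in // 32) fp16
        self.bias = bias

    def forward(self, x):
        out_f, in_f = self.qweight.shape
        # Dequantize on the fly: one fp16 scale per block of 32 int8 values
        w = self.qweight.float().view(out_f, in_f // 32, 32)
        w = (w * self.scales.float().unsqueeze(-1)).view(out_f, in_f)
        return F.linear(x, w.to(x.dtype), self.bias)


# Toy check: all-ones int8 weights with scale 0.5 sum to 16 per output
layer = GGUFLinear(torch.ones(2, 32, dtype=torch.int8),
                   torch.full((2, 1), 0.5, dtype=torch.float16))
print(layer(torch.ones(1, 32)))  # tensor([[16., 16.]])
```

Registering the quantized tensors as buffers keeps them on the module's device without making them trainable parameters.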
See patch_infinity_for_gguf.sh for implementation details.
## Credits
- Original Model: Infinity by FoundationVision
- SageAttention: thu-ml/SageAttention
- GGUF Format: ggerganov/ggml
## License
MIT