Granite-Docling-258M: Variants, Docker, and Inference Snippets (Starting Point)
#27, opened by daybytez
Granite-Docling-258M Variants Overview (Bytez Round-Up)
| Variant | RAM / VRAM Needed | Notes (Usage) | Download (Formatted) | Run Instantly |
|---|---|---|---|---|
| Granite-Docling-258M (Base) | ~8–12 GB (est., fp16) | Multimodal Image+Text→Text for doc conversion | HF safetensors | Bytez Docling |
| Granite-Docling-258M-MLX | ~8–12 GB (est., fp16) | Optimized for Apple Silicon / MLX runtime | HF MLX | ❓ |
| Quantized (community, GGUF / GPTQ) | ~2–6 GB (est., 4-bit / 8-bit) | Runs on consumer GPUs with lower memory | GGUF/GPTQ search | ❓ |
| Domain fine-tunes (planned) | ~8–12 GB (est.) | Legal, finance, or scientific doc specializations | HF search | ❓ |
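The memory figures in the table are marked as estimates. As a rough sanity check, the sketch below computes the weight-only footprint from the parameter count in the model name (258M). It assumes a simple parameters × bytes-per-parameter model and ignores activations, KV cache, image preprocessing buffers, and runtime overhead, so real usage will be higher than these numbers. The precision names and the helper `weightMemoryMB` are illustrative, not part of any library.

```javascript
// Weight-only memory estimate: parameters x bytes per parameter.
// Assumption: excludes activations, KV cache, image buffers, and runtime overhead.
const PARAMS = 258e6; // taken from the model name (258M)

// Bytes used to store one parameter at each precision.
const BYTES_PER_PARAM = { fp32: 4, fp16: 2, int8: 1, int4: 0.5 };

function weightMemoryMB(params, precision) {
  const bytes = BYTES_PER_PARAM[precision];
  if (bytes === undefined) throw new Error(`unknown precision: ${precision}`);
  return (params * bytes) / 1e6; // decimal megabytes
}

for (const precision of Object.keys(BYTES_PER_PARAM)) {
  console.log(`${precision}: ~${weightMemoryMB(PARAMS, precision).toFixed(0)} MB`);
}
```

By this arithmetic the fp16 weights take roughly half a gigabyte, so most of any larger total would come from the runtime and from processing page images, not from the weights themselves.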
Docker Quickstart
version: "3.8"
services:
  granite_docling:
    image: ghcr.io/bytez-com/models/ibm-granite/granite-docling-258m:latest
    ports:
      - "8080:80"
Start the container, then send a request:
docker compose up -d
curl -X POST http://localhost:8080/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt":"Convert this PDF page into structured Markdown.","max_tokens":128}'
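The same request can be sent from Node.js. This is a minimal sketch, assuming Node.js 18+ (for the global `fetch`) and the port mapping from the compose file. The endpoint path and the `prompt` / `max_tokens` fields mirror the curl command; the response format is not described in this post, so the body is returned as raw text. The function names `buildGenerateRequest` and `generate` are hypothetical.

```javascript
// Assumption: the container from the compose file is listening on host port 8080.
const BASE_URL = "http://localhost:8080";

// Build the URL and fetch options for the /generate endpoint shown in the curl command.
function buildGenerateRequest(prompt, maxTokens = 128) {
  return {
    url: `${BASE_URL}/generate`,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ prompt, max_tokens: maxTokens }),
    },
  };
}

// Send the request and return the raw response body.
// The response schema is unknown here, so no JSON parsing is attempted.
async function generate(prompt, maxTokens) {
  const { url, options } = buildGenerateRequest(prompt, maxTokens);
  const res = await fetch(url, options);
  const text = await res.text();
  if (!res.ok) throw new Error(`HTTP ${res.status}: ${text}`);
  return text;
}

// Usage (requires the container to be running):
// generate("Convert this PDF page into structured Markdown.")
//   .then(console.log)
//   .catch(console.error);
```

Keeping request construction separate from the network call makes the payload easy to inspect or log before anything is sent.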
Bytez SDK Quickstart (Node.js)
npm i bytez.js
# or
yarn add bytez.js
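// Top-level await requires an ES module (a .mjs file, or "type": "module" in package.json).
import Bytez from "bytez.js"

// Replace with your Bytez API key.
const sdk = new Bytez("YOUR_API_KEY")
const model = sdk.model("ibm-granite/granite-docling-258M")

// run() resolves to an object with `error` and `output` fields.
const { error, output } = await model.run(
  "Convert this scanned invoice into clean structured text."
)
console.log({ error, output })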
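Because `model.run` reports failures through the `error` field of its result, callers may prefer to convert that into an exception. The helper below is a hypothetical sketch, not part of bytez.js; it assumes only the `{ error, output }` shape shown in the snippet and treats any truthy `error` as a failure.

```javascript
// Hypothetical helper (not part of bytez.js): return `output`,
// or throw if the result carries a truthy `error`.
function unwrap(result) {
  if (result.error) {
    const message =
      typeof result.error === "string" ? result.error : JSON.stringify(result.error);
    throw new Error(`Model run failed: ${message}`);
  }
  return result.output;
}

// Usage with the SDK snippet:
// const text = unwrap(await model.run("Convert this scanned invoice into clean structured text."));
```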