Granite-Docling-258M Variants, Docker and Inference Snippets Starting Point

#27
by daybytez - opened

Granite-Docling-258M Variants Overview (Bytez Round-Up)

| Variant | RAM / VRAM Needed | Notes (Usage) | Download (Formatted) | Run Instantly |
| --- | --- | --- | --- | --- |
| Granite-Docling-258M (Base) | ~8–12 GB (est., fp16) | Multimodal Image+Text→Text for doc conversion | HF safetensors | Bytez Docling |
| Granite-Docling-258M-MLX | ~8–12 GB (est., fp16) | Optimized for Apple Silicon / MLX runtime | HF MLX | |
| Quantized (community, GGUF / GPTQ) | ~2–6 GB (est., 4-bit / 8-bit) | Runs on consumer GPUs with lower memory | GGUF/GPTQ search | |
| Domain fine-tunes (planned) | ~8–12 GB (est.) | Legal, finance, or scientific doc specializations | HF search | |
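The RAM/VRAM figures above are rough end-to-end estimates; the weights alone for a 258M-parameter model are much smaller, and the gap is taken up at runtime by activations, KV cache, and the vision encoder. A quick back-of-the-envelope check (plain arithmetic, no library assumptions):

```javascript
// Weights-only footprint for a 258M-parameter model at common precisions.
// Runtime memory is higher (activations, KV cache, image features), which
// is why the table's end-to-end estimates exceed these numbers.
const PARAMS = 258_000_000;

// bytesPerParam: 2 for fp16, 1 for int8, 0.5 for 4-bit quantization
const weightsGb = (bytesPerParam) => (PARAMS * bytesPerParam) / 1024 ** 3;

for (const [name, bpp] of [["fp16", 2], ["int8", 1], ["int4", 0.5]]) {
  console.log(`${name}: ~${weightsGb(bpp).toFixed(2)} GB weights`);
}
```

So even the 4-bit community quantizations mostly save on runtime overhead and batch headroom rather than raw weight storage at this parameter count.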

Docker Quickstart

```yaml
version: "3.8"
services:
  granite_docling:
    image: ghcr.io/bytez-com/models/ibm-granite/granite-docling-258m:latest
    ports:
      - "8080:80"
```

Start the container and send a request:

```bash
docker compose up -d
curl -X POST http://localhost:8080/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt":"Convert this PDF page into structured Markdown.","max_tokens":128}'
```
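The same request can be made from Node 18+ with the built-in `fetch`. This is a minimal sketch: the `/generate` path and the `prompt`/`max_tokens` payload shape are assumed from the curl example above; adjust them if your image exposes a different API.

```javascript
// Base URL matches the port mapping in the compose file above.
const BASE_URL = "http://localhost:8080";

// Payload shape mirrors the curl example (assumed, not verified against the image).
function buildPayload(prompt, maxTokens = 128) {
  return { prompt, max_tokens: maxTokens };
}

// POST the prompt to the containerized endpoint and return the parsed JSON.
async function generate(prompt, maxTokens = 128) {
  const res = await fetch(`${BASE_URL}/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildPayload(prompt, maxTokens)),
  });
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  return res.json();
}
```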

Bytez SDK Quickstart (Node.js)

```bash
npm i bytez.js
# or
yarn add bytez.js
```

```js
import Bytez from "bytez.js"

const sdk = new Bytez("YOUR_API_KEY")
const model = sdk.model("ibm-granite/granite-docling-258M")

const { error, output } = await model.run(
  "Convert this scanned invoice into clean structured text."
)

console.log({ error, output })
```
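For anything beyond a one-off call, it helps to retry on transient errors. This sketch assumes only what the snippet above shows: that `model.run()` resolves to an `{ error, output }` object. The backoff delay is an arbitrary choice, not part of the SDK.

```javascript
// Retry wrapper around model.run(), assuming it resolves to { error, output }.
// Retries with a simple linear backoff (500 ms, 1000 ms, ...) on error.
async function runWithRetry(model, prompt, attempts = 3) {
  for (let i = 1; i <= attempts; i++) {
    const { error, output } = await model.run(prompt);
    if (!error) return output;
    if (i < attempts) await new Promise((r) => setTimeout(r, 500 * i));
  }
  throw new Error(`model.run failed after ${attempts} attempts`);
}
```

Usage: `const text = await runWithRetry(model, "Convert this scanned invoice into clean structured text.")`.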
