Instructions to use reducto/RolmOCR with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers

How to use reducto/RolmOCR with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="reducto/RolmOCR")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]
pipe(text=messages)
```

```python
# Load the model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("reducto/RolmOCR")
model = AutoModelForImageTextToText.from_pretrained("reducto/RolmOCR")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use reducto/RolmOCR with vLLM:
Install from pip and serve the model:

```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "reducto/RolmOCR"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "reducto/RolmOCR",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "Describe this image in one sentence." },
          { "type": "image_url", "image_url": { "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg" } }
        ]
      }
    ]
  }'
```

Use Docker:

```shell
docker model run hf.co/reducto/RolmOCR
```
- SGLang
How to use reducto/RolmOCR with SGLang:
Install from pip and serve the model:

```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "reducto/RolmOCR" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "reducto/RolmOCR",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "Describe this image in one sentence." },
          { "type": "image_url", "image_url": { "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg" } }
        ]
      }
    ]
  }'
```

Use Docker images:

```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "reducto/RolmOCR" \
    --host 0.0.0.0 \
    --port 30000
```

The server started from the Docker image is called with the same curl request as above.

- Docker Model Runner
How to use reducto/RolmOCR with Docker Model Runner:
```shell
docker model run hf.co/reducto/RolmOCR
```
RolmOCR by Reducto AI
Earlier this year, the Allen Institute for AI released olmOCR, an open-source tool that performs document OCR using the Qwen2-VL-7B vision language model (VLM). We were excited to see a high-quality, openly available approach to parsing PDFs and other complex documents, and curious to explore what else might be possible using newer foundation models and some lightweight optimizations.
The result is RolmOCR, a drop-in alternative to olmOCR that's faster, uses less memory, and still performs well on a variety of document types. We're releasing it under Apache 2.0 for anyone to try out, explore, or build on.
This model is a fine-tuned version of Qwen/Qwen2.5-VL-7B-Instruct on the full allenai/olmOCR-mix-0225 dataset.
Key changes
We made three notable changes:
New base model: We swapped in Qwen2.5-VL-7B, a more recent release in the same model family, as the foundation.
No metadata inputs: Unlike the original, we don't use metadata extracted from PDFs. This significantly reduces prompt length, which in turn lowers both processing time and VRAM usage, without hurting accuracy in most cases.
Rotation of training data: About 15% of the training data was rotated to enhance robustness to off-angle documents. We otherwise use the same training set.
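The rotation augmentation described above can be sketched with Pillow. Note that the 15% ratio comes from the card, but the specific angles, sampling strategy, and `rotate_subset` helper here are illustrative assumptions, not the published training recipe:

```python
import random

from PIL import Image


def rotate_subset(images, ratio=0.15, angles=(90, 180, 270), seed=0):
    """Rotate roughly `ratio` of the images to simulate off-angle scans.

    The 15% ratio matches the model card; the angle choices and the
    sampling strategy are illustrative assumptions.
    """
    rng = random.Random(seed)
    out = []
    for img in images:
        if rng.random() < ratio:
            # expand=True grows the canvas so the full page stays visible
            img = img.rotate(rng.choice(angles), expand=True)
        out.append(img)
    return out


# Example: synthetic white "pages" standing in for rendered PDF pages
pages = [Image.new("RGB", (200, 300), "white") for _ in range(10)]
augmented = rotate_subset(pages)
```

Rotating only a fraction of the data keeps upright documents as the common case while still exposing the model to off-angle inputs.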
Usage
Host your model with vLLM:
```shell
export VLLM_USE_V1=1
vllm serve reducto/RolmOCR
```
Call the model via the OpenAI-compatible server:
```python
# HOST YOUR OPENAI-COMPATIBLE API WITH THE FOLLOWING COMMANDS IN vLLM:
# export VLLM_USE_V1=1
# vllm serve reducto/RolmOCR
import base64

from openai import OpenAI

client = OpenAI(api_key="123", base_url="http://localhost:8000/v1")
model = "reducto/RolmOCR"  # must match the model name passed to `vllm serve`


def encode_image(image_path):
    """Read an image file and return its base64-encoded contents."""
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


def ocr_page_with_rolm(img_base64):
    """Send one page image to the server and return the extracted text."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{img_base64}"},
                    },
                    {
                        "type": "text",
                        "text": "Return the plain text representation of this document as if you were reading it naturally.\n",
                    },
                ],
            }
        ],
        temperature=0.2,
        max_tokens=4096,
    )
    return response.choices[0].message.content


test_img_path = "path/to/image.png"
img_base64 = encode_image(test_img_path)
print(ocr_page_with_rolm(img_base64))
```
Limitations
- RolmOCR, like other VLM-based OCR solutions, still suffers from hallucinations and can drop content.
- Unlike the Reducto Parsing API, RolmOCR cannot output layout bounding boxes.
- We have not evaluated the performance of any quantized versions.
BibTeX and citation info

```bibtex
@misc{RolmOCR,
  author = {Reducto AI},
  title  = {RolmOCR: A Faster, Lighter Open Source OCR Model},
  year   = {2025},
}
```