Apriel-1.5-15B-Thinker: MLX Quantized (Apple Silicon)

Format: MLX (Apple Silicon)
Variants: 6-bit (recommended)
Base model: ServiceNow-AI/Apriel-1.5-15B-Thinker
Architecture: Pixtral-style LLaVA (vision encoder → 2-layer projector → decoder)
Intended use: image understanding & grounded reasoning; document/chart/OCR-style tasks; math/coding Q&A with visual context.

This repository provides MLX-format weights for Apple Silicon (M-series), converted from the original Apriel-1.5-15B-Thinker release. It is optimized for on-device inference with a small memory footprint and fast startup on macOS.


🔎 What is Apriel-1.5-15B-Thinker?

Apriel-1.5-15B-Thinker is a 15B open-weights multimodal reasoning model trained via a data-centric mid-training recipe rather than RLHF/RM. Starting from Pixtral-12B as the base, the authors apply:

  1. Depth Upscaling (capacity expansion without pretraining from scratch),
  2. Two-stage multimodal continual pretraining (CPT) to build text + visual reasoning, and
  3. High-quality SFT with explicit reasoning traces across math, coding, science, and tool use.

This approach delivers frontier-level capability on compact compute.

Key reported results (original model)

  • Artificial Analysis Intelligence (AAI) Index: 52, matching DeepSeek-R1-0528 at far lower compute.
  • Multimodal: on ten image benchmarks, within ~5 points of Gemini-2.5-Flash and Claude Sonnet 3.7 on average.
  • Designed for single-GPU / constrained deployment scenarios.

Notes above summarize the upstream paper; MLX quantization can slightly affect absolute scores. Always validate on your use case.


๐Ÿ—๏ธ Architecture (high level)

  • Backbone: Pixtral-12B-Base-2409 adapted to a larger 15B decoder via depth upscaling (layers 40 → 48), then re-aligned with a 2-layer projection network connecting the vision encoder and decoder.
  • Training stack:
    • CPT Stage-1: mixed tokens (≈50% text, 20% replay, 30% multimodal) for foundational reasoning & image understanding; 32k context; cosine LR with warmup; all components unfrozen; checkpoint averaging.
    • CPT Stage-2: targeted synthetic visual tasks (reconstruction, visual matching, detection, counting) to strengthen spatial/compositional/fine-grained reasoning; vision encoder frozen; loss on responses for instruct data; 16k context.
    • SFT: curated instruction-response pairs with explicit reasoning traces (math, coding, science, tools).
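
The depth-upscaling step above can be sketched as follows. This is a toy illustration only: the 40 → 48 layer counts come from this card, but which decoder blocks Apriel duplicates (and how their weights are initialized) is an assumption here, not the published recipe.

```python
# Toy illustration of depth upscaling: grow a 40-layer decoder to 48 layers
# by duplicating a contiguous run of middle blocks. Which blocks get copied
# is an assumption for illustration; only the 40 -> 48 counts come from
# the model card above.

def depth_upscale(layers, target_depth):
    """Duplicate middle layers until the stack reaches target_depth."""
    extra = target_depth - len(layers)
    if extra <= 0:
        return list(layers)
    start = (len(layers) - extra) // 2            # centre the duplicated span
    duplicated = layers[start:start + extra]      # blocks to copy
    # Re-insert the copies right after the originals, preserving order.
    return layers[:start + extra] + duplicated + layers[start + extra:]

base = [f"block_{i}" for i in range(40)]
upscaled = depth_upscale(base, 48)
print(len(upscaled))  # 48
```

The appeal of duplication-based upscaling is that every new layer starts from trained weights, so the enlarged model only needs continued pretraining to re-cohere rather than pretraining from scratch.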

💾 This MLX Release

  • Why MLX? Native Apple-Silicon inference with small binaries, fast load, and low memory overhead.
  • What's included: config.json, mlx_model*.safetensors (sharded), tokenizer & processor files, and metadata for VLM pipelines.
  • Quantization options:
    • 6-bit (recommended): best balance of quality & memory.

Tip: If you're memory-constrained on an M1/M2, start with the 6-bit variant.
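
As a rough sanity check on the memory footprint, you can estimate the storage for group-quantized weights. The figures below are assumptions, not measurements: ~15e9 parameters, and MLX-style group quantization with a 16-bit scale and 16-bit bias per group of 64 weights (about 0.5 extra bits per weight); activations, the KV cache, and runtime overhead come on top of this.

```python
# Back-of-envelope estimate of quantized weight storage.
# Assumptions (not from this card): 15e9 parameters; MLX group-wise
# quantization with group size 64, keeping a 16-bit scale and 16-bit bias
# per group, i.e. about 32/64 = 0.5 extra bits per weight.

def quantized_gib(n_params, bits, group_size=64, meta_bits=32):
    """Approximate weight storage in GiB for group-quantized parameters."""
    bits_per_weight = bits + meta_bits / group_size
    return n_params * bits_per_weight / 8 / 2**30

print(f"6-bit: ~{quantized_gib(15e9, 6):.1f} GiB of weights")
```

Under these assumptions the 6-bit weights alone come to roughly 11-12 GiB, which is why this variant is a comfortable fit on higher-memory M-series machines but tight on 16 GB configurations.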


โš™๏ธ Quickstart (CLI)

```bash
# Basic image caption (MLX targets the Metal GPU automatically; no device flag needed)
python -m mlx_vlm.generate \
  --model <this-repo-id> \
  --image /path/to/image.jpg \
  --prompt "Describe this image." \
  --max-tokens 128 --temperature 0.0
```
Model: mlx-community/Apriel-1.5-15b-Thinker-6bit-MLX, a 6-bit MLX quantization of ServiceNow-AI/Apriel-1.5-15B-Thinker.