# Gemma 4 E2B → CoreML (ANE+GPU Optimized)
Converted from google/gemma-4-E2B-it for on-device inference on Apple devices via CoreML.
## Models

| File | Size | Description |
|---|---|---|
| `model.mlpackage` | 2.4 GB | Text decoder with stateful KV cache (int4 quantized) |
| `vision.mlpackage` | 322 MB | Vision encoder (SigLIP-based, 16 transformer layers) |
| `model_config.json` | – | Model configuration |
| `hf_model/tokenizer.json` | 31 MB | Tokenizer |
## Features

- Multimodal: Image + text input → text output
- ANE-optimized: Conv2d linear layers, ANE RMSNorm, in-model argmax
- Stateful KV cache: MLState API (iOS 18+)
- Int4 quantized: Block-wise palettization (group_size=32)
- HF-exact match: "solid red square centered on white background" ✓
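The int4 compression noted above can be reproduced with coremltools' post-training compression API. The configuration sketch below assumes coremltools ≥ 8.0 (where grouped-channel palettization exposes a `group_size` option) and an uncompressed input package with a hypothetical name; verify option names against your installed version.

```python
import coremltools as ct
import coremltools.optimize as cto

# Sketch: 4-bit k-means palettization with block-wise (grouped-channel)
# granularity, matching the group_size=32 stated above. Assumes
# coremltools >= 8.0; check the docs of your installed version.
op_config = cto.coreml.OpPalettizerConfig(
    mode="kmeans",
    nbits=4,
    granularity="per_grouped_channel",
    group_size=32,
)
config = cto.coreml.OptimizationConfig(global_config=op_config)

# "model_fp16.mlpackage" is a hypothetical name for the uncompressed decoder.
mlmodel = ct.models.MLModel("model_fp16.mlpackage")
compressed = cto.coreml.palettize_weights(mlmodel, config)
compressed.save("model.mlpackage")
```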
## Usage

```python
import coremltools as ct
import numpy as np

# Load models
vision = ct.models.MLModel('vision.mlpackage')
decoder = ct.models.MLModel('model.mlpackage')
state = decoder.make_state()

# Process image → vision features → text generation
```
See [CoreML-LLM](https://github.com/john-rocky/CoreML-LLM) for the full conversion pipeline and an iOS sample app.
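The text-generation step the snippet above gestures at is a greedy decoding loop. The sketch below keeps the loop independent of CoreML: `step_fn` is a hypothetical stand-in for a `decoder.predict(..., state=state)` call that returns the in-model argmax, and the real input/output names must be read from the package's I/O spec.

```python
def greedy_generate(step_fn, prompt_ids, max_new_tokens=32, eos_id=1):
    """Greedy decoding against a stateful single-token decoder.

    step_fn(token_id, position) -> next token id. With the CoreML model it
    would wrap decoder.predict({...}, state=state) and read the in-model
    argmax output; here it is any callable, so the loop itself is testable.
    prompt_ids must be non-empty; eos_id is model-specific (placeholder here).
    """
    tokens = list(prompt_ids)
    # Prefill: push the whole prompt through so the KV cache is populated.
    for pos, tok in enumerate(prompt_ids):
        next_id = step_fn(tok, pos)
    # Decode: feed each prediction back in until EOS or the budget runs out.
    for _ in range(max_new_tokens):
        tokens.append(next_id)
        if next_id == eos_id:
            break
        next_id = step_fn(next_id, len(tokens) - 1)
    return tokens
```

Because the KV cache lives in the `MLState` object, each `step_fn` call only needs the newest token and its position rather than the full sequence.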
## Conversion

```bash
git clone https://github.com/john-rocky/CoreML-LLM
cd CoreML-LLM/conversion
pip install -r requirements.txt
python convert.py --model gemma4-e2b --context-length 512 --output ./output/gemma4-e2b
```