PiperSR-2x: ANE-Native Super Resolution for Apple Silicon
Real-time 2x AI upscaling on Apple's Neural Engine. 44.4 FPS at 720p on M2 Max, 928 KB model, every op runs natively on ANE with zero CPU/GPU fallback.
Not a converted PyTorch model: an architecture designed from ANE hardware measurements. Every dimension, operation, and data type is dictated by Neural Engine characteristics.
Key Results
| Model | Params | Set5 | Set14 | BSD100 | Urban100 |
|---|---|---|---|---|---|
| Bicubic | – | 33.66 | 30.24 | 29.56 | 26.88 |
| FSRCNN | 13K | 37.05 | 32.66 | 31.53 | 29.88 |
| PiperSR | 453K | 37.54 | 33.21 | 31.98 | 31.38 |
| SAFMN | 228K | 38.00 | ~33.7 | ~32.2 | – |
Beats FSRCNN across all benchmarks. Within 0.46 dB of SAFMN on Set5, below the perceptual threshold for most content.
Performance
| Configuration | FPS | Hardware | Notes |
|---|---|---|---|
| Full-frame 640×360 → 1280×720 | 44.4 | M2 Max | ANE predict 20.8 ms |
| 128×128 tiles (static weights) | 125.6 | M2 | Baked weights, 2.82× vs dynamic |
| 128×128 tiles (dynamic weights) | 44.5 | M2 | CoreML default |
Real-time 2× upscaling at 30+ FPS on any Mac with Apple Silicon. The ANE sits idle during video playback; PiperSR puts it to work.
Architecture
453K-parameter network: 6 residual blocks at 64 channels with BatchNorm and SiLU activations, upscaling via PixelShuffle.
```
Input (128×128×3, FP16)
  → Head: Conv 3×3 (3 → 64)
  → Body: 6× ResBlock [Conv 3×3 → BatchNorm → SiLU → Conv 3×3 → BatchNorm → Residual Add]
  → Tail: Conv 3×3 (64 → 12) → PixelShuffle(2)
Output (256×256×3)
```
Compiles to 5 MIL ops: conv, add, silu, pixel_shuffle, const. All verified ANE-native.
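As a sanity check on the 453K figure, here is a short parameter-count walk-through of the layers in the diagram above (pure Python; `conv2d_params` and `bn_params` are illustrative helpers, not part of any released API):

```python
# Parameter count for the PiperSR-2x architecture described above.
def conv2d_params(c_in, c_out, k=3):
    return c_in * c_out * k * k + c_out  # weights + bias

def bn_params(c):
    return 2 * c  # gamma + beta (running stats are buffers, not parameters)

head = conv2d_params(3, 64)                            # 1,792
resblock = 2 * conv2d_params(64, 64) + 2 * bn_params(64)
body = 6 * resblock                                    # 444,672
tail = conv2d_params(64, 12)  # 12 = 3 channels x 2^2 for PixelShuffle(2)
total = head + body + tail
print(total)  # -> 453388, i.e. ~453K
```

The tail's 12 output channels are exactly what PixelShuffle(2) needs to rearrange into 3 RGB channels at 2× resolution.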
Why ANE-native matters
Off-the-shelf super resolution models (SPAN, Real-ESRGAN) were designed for CUDA GPUs and converted to CoreML after the fact. They waste the ANE:
- Misaligned channels (48 instead of 64) waste 25%+ of each ANE tile
- Monolithic full-frame tensors serialize the ANE's parallel compute lanes
- Silent CPU fallback from unsupported ops can inflate latency 5-10×
- No batched tiles means 60× dispatch overhead
PiperSR addresses every one of these by designing around ANE constraints.
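To illustrate the batched-tiles point, here is a minimal numpy sketch of turning a full 640×360 frame into a single NCHW batch of 128×128 tiles, so the ANE gets one dispatch instead of dozens. `to_tile_batch` and the edge-padding choice are hypothetical, not part of the released pipeline:

```python
import numpy as np

def to_tile_batch(frame, tile=128):
    """Pad an HWC frame to tile multiples and return an NCHW tile batch."""
    h, w, c = frame.shape
    ph = (tile - h % tile) % tile          # rows of padding needed
    pw = (tile - w % tile) % tile          # cols of padding needed
    padded = np.pad(frame, ((0, ph), (0, pw), (0, 0)), mode="edge")
    H, W = padded.shape[:2]
    tiles = (padded
             .reshape(H // tile, tile, W // tile, tile, c)
             .transpose(0, 2, 4, 1, 3)     # (rows, cols, C, tile, tile)
             .reshape(-1, c, tile, tile))  # flatten to one NCHW batch
    return tiles

frame = np.zeros((360, 640, 3), dtype=np.float16)
batch = to_tile_batch(frame)
print(batch.shape)  # (15, 3, 128, 128): one dispatch for 15 tiles
```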
Model Variants
| File | Use Case | Input → Output |
|---|---|---|
| `PiperSR_2x.mlpackage` | Static images (128px tiles) | 128×128 → 256×256 |
| `PiperSR_2x_video_720p.mlpackage` | Video (full-frame, BN-fused) | 640×360 → 1280×720 |
| `PiperSR_2x_256.mlpackage` | Static images (256px tiles) | 256×256 → 512×512 |
Usage
With ToolPiper (recommended)
PiperSR is integrated into ToolPiper, a local macOS AI toolkit. Install ToolPiper, enable the MediaPiper browser extension, and every 720p video on the web is upscaled to 1440p in real time.
```bash
# Via MCP tool
mcp__toolpiper__image_upscale image=/path/to/image.png

# Via REST API
curl -X POST http://127.0.0.1:9998/v1/images/upscale \
  -F "image=@input.png" \
  -o upscaled.png
```
With CoreML (Swift)
```swift
import CoreML

let config = MLModelConfiguration()
config.computeUnits = .cpuAndNeuralEngine  // NOT .all (.all is 23.6% slower)

let model = try PiperSR_2x(configuration: config)
let input = try PiperSR_2xInput(x: pixelBuffer)
let output = try model.prediction(input: input)
// output.var_185 contains the 2x upscaled image
```
Important: Use `.cpuAndNeuralEngine`, not `.all`. CoreML's `.all` silently misroutes pure-ANE ops onto the GPU, causing a 23.6% slowdown for this model.
With coremltools (Python)
```python
import coremltools as ct
import numpy as np
from PIL import Image

model = ct.models.MLModel("PiperSR_2x.mlpackage")

img = Image.open("input.png").convert("RGB").resize((128, 128))
arr = np.array(img).astype(np.float32) / 255.0  # normalize to [0, 1]
arr = np.transpose(arr, (2, 0, 1))[np.newaxis]  # HWC -> NCHW

result = model.predict({"x": arr})
out = result["var_185"]  # NCHW, [0, 1], 256x256
out_img = Image.fromarray((out[0].transpose(1, 2, 0) * 255).clip(0, 255).astype(np.uint8))
out_img.save("upscaled.png")
```
Training
Trained on DIV2K (800 training images) with L1 loss and random augmentation (flips, rotations). Total training cost: ~$6 on RunPod A6000 instances. The full training journey, from 33.46 dB to 37.54 dB on Set5, is documented across 12 experiment findings.
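The augmentation and loss described above can be sketched as follows (a hypothetical reconstruction, not the actual training code; the same flip/rotation must be applied to the low-res input and high-res target):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(lr, hr):
    """Random flips and 90-degree rotations, applied identically to both images (HWC)."""
    if rng.random() < 0.5:   # horizontal flip
        lr, hr = lr[:, ::-1], hr[:, ::-1]
    if rng.random() < 0.5:   # vertical flip
        lr, hr = lr[::-1], hr[::-1]
    k = int(rng.integers(4)) # 0-3 quarter turns
    return np.rot90(lr, k), np.rot90(hr, k)

def l1_loss(pred, target):
    """Mean absolute error, the L1 objective used for training."""
    return np.abs(pred - target).mean()

lr = np.zeros((128, 128, 3), dtype=np.float32)
hr = np.zeros((256, 256, 3), dtype=np.float32)
lr_aug, hr_aug = augment(lr, hr)  # shapes preserved for square patches
```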
Technical Details
- Compute units: `.cpuAndNeuralEngine` (ANE primary, CPU for I/O only)
- Precision: Float16
- Input format: NCHW, normalized to [0, 1]
- Output format: NCHW, [0, 1]
- Model size: 928 KB (compiled .mlmodelc)
- Parameters: 453K
- ANE ops used: conv, batch_norm (fused at inference), silu, add, pixel_shuffle, const
- CPU fallback ops: None
License
Apache 2.0
Citation
```bibtex
@software{pipersr2025,
  title={PiperSR: ANE-Native Super Resolution for Apple Silicon},
  author={ModelPiper},
  year={2025},
  url={https://huggingface.co/ModelPiper/PiperSR-2x}
}
```