PiperSR-2x: ANE-Native Super Resolution for Apple Silicon
Real-time 2x AI upscaling on Apple's Neural Engine. 44.4 FPS at 720p on M2 Max, 928 KB model, every op runs natively on ANE with zero CPU/GPU fallback.
Not a converted PyTorch model: an architecture designed from ANE hardware measurements. Every dimension, operation, and data type is dictated by Neural Engine characteristics.
Key Results
| Model | Params | Set5 | Set14 | BSD100 | Urban100 |
|---|---|---|---|---|---|
| Bicubic | – | 33.66 | 30.24 | 29.56 | 26.88 |
| FSRCNN | 13K | 37.05 | 32.66 | 31.53 | 29.88 |
| PiperSR | 453K | 37.54 | 33.21 | 31.98 | 31.38 |
| SAFMN | 228K | 38.00 | ~33.7 | ~32.2 | – |
Beats FSRCNN across all benchmarks. Within 0.46 dB of SAFMN on Set5, below the perceptual threshold for most content.
Performance
| Configuration | FPS | Hardware | Notes |
|---|---|---|---|
| Full-frame 640×360 → 1280×720 | 44.4 | M2 Max | ANE predict 20.8 ms |
| 128×128 tiles (static weights) | 125.6 | M2 | Baked weights, 2.82× vs dynamic |
| 128×128 tiles (dynamic weights) | 44.5 | M2 | CoreML default |
Real-time 2× upscaling at 30+ FPS on any Mac with Apple Silicon. The ANE sits idle during video playback; PiperSR puts it to work.
Architecture
453K-parameter network: 6 residual blocks at 64 channels with BatchNorm and SiLU activations, upscaling via PixelShuffle.
```
Input (128×128×3, FP16)
  → Head: Conv 3×3 (3 → 64)
  → Body: 6× ResBlock [Conv 3×3 → BatchNorm → SiLU → Conv 3×3 → BatchNorm → Residual Add]
  → Tail: Conv 3×3 (64 → 12) → PixelShuffle(2)
Output (256×256×3)
```
Compiles to 5 MIL ops: conv, add, silu, pixel_shuffle, const. All verified ANE-native.
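As a sanity check on the 453K figure, here is a short parameter-count walk-through of the layers in the diagram above (pure Python; `conv2d_params` and `bn_params` are illustrative helpers, not part of any released API):

```python
# Parameter count for the PiperSR-2x architecture described above.
def conv2d_params(c_in, c_out, k=3):
    return c_in * c_out * k * k + c_out  # weights + bias

def bn_params(c):
    return 2 * c  # gamma + beta (running stats are buffers, not parameters)

head = conv2d_params(3, 64)                            # 1,792
resblock = 2 * conv2d_params(64, 64) + 2 * bn_params(64)
body = 6 * resblock                                    # 444,672
tail = conv2d_params(64, 12)  # 12 = 3 channels x 2^2 for PixelShuffle(2)
total = head + body + tail
print(total)  # -> 453388, i.e. ~453K
```

The tail's 12 output channels are exactly what PixelShuffle(2) needs to rearrange into 3 RGB channels at 2× resolution.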
Why ANE-native matters
Off-the-shelf super resolution models (SPAN, Real-ESRGAN) were designed for CUDA GPUs and converted to CoreML after the fact. They waste the ANE:
- Misaligned channels (48 instead of 64) waste 25%+ of each ANE tile
- Monolithic full-frame tensors serialize the ANE's parallel compute lanes
- Silent CPU fallback from unsupported ops can inflate latency 5-10×
- No batched tiles means 60× dispatch overhead
PiperSR addresses every one of these by designing around ANE constraints.
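To illustrate the batched-tiles point, here is a minimal numpy sketch of turning a full 640×360 frame into a single NCHW batch of 128×128 tiles, so the ANE gets one dispatch instead of dozens. `to_tile_batch` and the edge-padding choice are hypothetical, not part of the released pipeline:

```python
import numpy as np

def to_tile_batch(frame, tile=128):
    """Pad an HWC frame to tile multiples and return an NCHW tile batch."""
    h, w, c = frame.shape
    ph = (tile - h % tile) % tile          # rows of padding needed
    pw = (tile - w % tile) % tile          # cols of padding needed
    padded = np.pad(frame, ((0, ph), (0, pw), (0, 0)), mode="edge")
    H, W = padded.shape[:2]
    tiles = (padded
             .reshape(H // tile, tile, W // tile, tile, c)
             .transpose(0, 2, 4, 1, 3)     # (rows, cols, C, tile, tile)
             .reshape(-1, c, tile, tile))  # flatten to one NCHW batch
    return tiles

frame = np.zeros((360, 640, 3), dtype=np.float16)
batch = to_tile_batch(frame)
print(batch.shape)  # (15, 3, 128, 128): one dispatch for 15 tiles
```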
Model Variants
| File | Use Case | Input → Output |
|---|---|---|
| `PiperSR_2x.mlpackage` | Static images (128px tiles) | 128×128 → 256×256 |
| `PiperSR_2x_video_720p.mlpackage` | Video (full-frame, BN-fused) | 640×360 → 1280×720 |
| `PiperSR_2x_256.mlpackage` | Static images (256px tiles) | 256×256 → 512×512 |
Usage
With ToolPiper (recommended)
PiperSR is integrated into ToolPiper, a local macOS AI toolkit. Install ToolPiper, enable the MediaPiper browser extension, and every 720p video on the web is upscaled to 1440p in real time.
```bash
# Via MCP tool
mcp__toolpiper__image_upscale image=/path/to/image.png

# Via REST API
curl -X POST http://127.0.0.1:9998/v1/images/upscale \
  -F "image=@input.png" \
  -o upscaled.png
```
With CoreML (Swift)
```swift
import CoreML

let config = MLModelConfiguration()
config.computeUnits = .cpuAndNeuralEngine  // NOT .all (.all is 23.6% slower)

let model = try PiperSR_2x(configuration: config)
let input = try PiperSR_2xInput(x: pixelBuffer)
let output = try model.prediction(input: input)
// output.var_185 contains the 2x upscaled image
```
Important: Use `.cpuAndNeuralEngine`, not `.all`. CoreML's `.all` silently misroutes pure-ANE ops onto the GPU, causing a 23.6% slowdown for this model.
With coremltools (Python)
```python
import coremltools as ct
import numpy as np
from PIL import Image

model = ct.models.MLModel("PiperSR_2x.mlpackage")

img = Image.open("input.png").convert("RGB").resize((128, 128))
arr = np.array(img).astype(np.float32) / 255.0  # normalize to [0, 1]
arr = np.transpose(arr, (2, 0, 1))[np.newaxis]  # HWC -> NCHW

result = model.predict({"x": arr})
out = result["var_185"]  # NCHW, [0, 1], 256x256
out_img = Image.fromarray((out[0].transpose(1, 2, 0) * 255).clip(0, 255).astype(np.uint8))
out_img.save("upscaled.png")
```
Training
Trained on DIV2K (800 training images) with L1 loss and random augmentation (flips, rotations). Total training cost: ~$6 on RunPod A6000 instances. The full training journey, from 33.46 dB to 37.54 dB on Set5, is documented across 12 experiment findings.
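The augmentation and loss described above can be sketched as follows (a hypothetical reconstruction, not the actual training code; the same flip/rotation must be applied to the low-res input and high-res target):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(lr, hr):
    """Random flips and 90-degree rotations, applied identically to both images (HWC)."""
    if rng.random() < 0.5:   # horizontal flip
        lr, hr = lr[:, ::-1], hr[:, ::-1]
    if rng.random() < 0.5:   # vertical flip
        lr, hr = lr[::-1], hr[::-1]
    k = int(rng.integers(4)) # 0-3 quarter turns
    return np.rot90(lr, k), np.rot90(hr, k)

def l1_loss(pred, target):
    """Mean absolute error, the L1 objective used for training."""
    return np.abs(pred - target).mean()

lr = np.zeros((128, 128, 3), dtype=np.float32)
hr = np.zeros((256, 256, 3), dtype=np.float32)
lr_aug, hr_aug = augment(lr, hr)  # shapes preserved for square patches
```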
Technical Details
- Compute units: `.cpuAndNeuralEngine` (ANE primary, CPU for I/O only)
- Precision: Float16
- Input format: NCHW, normalized to [0, 1]
- Output format: NCHW, [0, 1]
- Model size: 928 KB (compiled .mlmodelc)
- Parameters: 453K
- ANE ops used: conv, batch_norm (fused at inference), silu, add, pixel_shuffle, const
- CPU fallback ops: None
License
Apache 2.0
Citation
```bibtex
@software{pipersr2025,
  title={PiperSR: ANE-Native Super Resolution for Apple Silicon},
  author={ModelPiper},
  year={2025},
  url={https://huggingface.co/ModelPiper/PiperSR-2x}
}
```