# PiperSR-2x: ANE-Native Super Resolution for Apple Silicon

Real-time 2× AI upscaling on Apple's Neural Engine: 44.4 FPS at 720p on an M2 Max, a 928 KB model, and every op running natively on the ANE with zero CPU/GPU fallback.

This is not a converted PyTorch model: it is an architecture designed from ANE hardware measurements. Every dimension, operation, and data type is dictated by Neural Engine characteristics.

## Key Results

| Model | Params | Set5 | Set14 | BSD100 | Urban100 |
|---|---|---|---|---|---|
| Bicubic | – | 33.66 | 30.24 | 29.56 | 26.88 |
| FSRCNN | 13K | 37.05 | 32.66 | 31.53 | 29.88 |
| **PiperSR** | 453K | 37.54 | 33.21 | 31.98 | 31.38 |
| SAFMN | 228K | 38.00 | ~33.7 | ~32.2 | – |

All values are PSNR in dB.

PiperSR beats FSRCNN across all benchmarks and sits within 0.46 dB of SAFMN on Set5, below the perceptual threshold for most content.
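For reference, the metric behind these numbers. This is a minimal sketch over whole arrays; SR benchmarks such as Set5 conventionally compute PSNR on the luma channel of a border-cropped output, which this sketch omits.

```python
import numpy as np

def psnr(ref, test, peak=1.0):
    """Peak signal-to-noise ratio in dB between two images in [0, peak]."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)

# A 0.46 dB gap corresponds to roughly an 11% difference in MSE:
# 10 * log10(mse_a / mse_b) = 0.46  =>  mse_a / mse_b ~= 1.11
```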

## Performance

| Configuration | FPS | Hardware | Notes |
|---|---|---|---|
| Full-frame 640×360 → 1280×720 | 44.4 | M2 Max | ANE predict 20.8 ms |
| 128×128 tiles (static weights) | 125.6 | M2 | Baked weights, 2.82× vs. dynamic |
| 128×128 tiles (dynamic weights) | 44.5 | M2 | CoreML default |

Real-time 2× upscaling at 30+ FPS on any Apple Silicon Mac. The ANE sits idle during video playback; PiperSR puts it to work.

## Architecture

453K-parameter network: 6 residual blocks at 64 channels with BatchNorm and SiLU activations, upscaling via PixelShuffle.

```
Input (128×128×3 FP16)
  → Head: Conv 3×3 (3 → 64)
  → Body: 6× ResBlock [Conv 3×3 → BatchNorm → SiLU → Conv 3×3 → BatchNorm → Residual Add]
  → Tail: Conv 3×3 (64 → 12) → PixelShuffle(2)
Output (256×256×3)
```
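The tail conv's 12 channels are 3 RGB channels × 2² subpixel positions; PixelShuffle(2) rearranges them into an image twice as large. A numpy sketch matching `torch.nn.PixelShuffle` semantics:

```python
import numpy as np

def pixel_shuffle(x, r=2):
    """Rearrange (N, C*r^2, H, W) -> (N, C, H*r, W*r)."""
    n, crr, h, w = x.shape
    c = crr // (r * r)
    x = x.reshape(n, c, r, r, h, w)
    x = x.transpose(0, 1, 4, 2, 5, 3)  # -> (N, C, H, r, W, r)
    return x.reshape(n, c, h * r, w * r)

tail = np.zeros((1, 12, 128, 128), dtype=np.float16)  # tail conv output
out = pixel_shuffle(tail)
print(out.shape)  # (1, 3, 256, 256)
```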

Compiles to 5 MIL op types: `conv`, `add`, `silu`, `pixel_shuffle`, `const`. All verified ANE-native.
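The 453K parameter count can be reproduced from the layer shapes above, assuming biased convolutions and per-channel BatchNorm scale/shift:

```python
def conv_params(cin, cout, k=3):
    """Weights plus bias for a k x k convolution."""
    return k * k * cin * cout + cout

head = conv_params(3, 64)                      # 1,792
block = 2 * conv_params(64, 64) + 2 * 2 * 64   # two convs + two BN (gamma, beta)
body = 6 * block                               # 444,672
tail = conv_params(64, 12)                     # 6,924
total = head + body + tail
print(total)  # 453,388 ~= 453K
```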

## Why ANE-native matters

Off-the-shelf super-resolution models (SPAN, Real-ESRGAN) were designed for CUDA GPUs and converted to CoreML after the fact. They waste the ANE:

- Misaligned channel counts (48 instead of 64) waste 25%+ of each ANE tile
- Monolithic full-frame tensors serialize the ANE's parallel compute lanes
- Silent CPU fallback from unsupported ops can inflate latency 5-10×
- Unbatched tiles incur 60× dispatch overhead

PiperSR addresses every one of these by designing around ANE constraints.

## Model Variants

| File | Use Case | Input → Output |
|---|---|---|
| `PiperSR_2x.mlpackage` | Static images (128px tiles) | 128×128 → 256×256 |
| `PiperSR_2x_video_720p.mlpackage` | Video (full-frame, BN-fused) | 640×360 → 1280×720 |
| `PiperSR_2x_256.mlpackage` | Static images (256px tiles) | 256×256 → 512×512 |
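The tile-based variants imply the host app splits each frame into 128×128 tiles before inference. A minimal numpy sketch of that split; the `tile_image` helper and its reflection padding are illustrative assumptions, not ToolPiper's actual pipeline:

```python
import numpy as np

TILE = 128

def tile_image(img):
    """Split an (H, W, C) image into non-overlapping 128x128 tiles,
    reflection-padding the bottom/right edges when needed."""
    h, w, c = img.shape
    ph = (-h) % TILE
    pw = (-w) % TILE
    img = np.pad(img, ((0, ph), (0, pw), (0, 0)), mode="reflect")
    hh, ww, _ = img.shape
    tiles = (img.reshape(hh // TILE, TILE, ww // TILE, TILE, c)
                .swapaxes(1, 2)
                .reshape(-1, TILE, TILE, c))
    return tiles, (h, w)

# A 640x360 frame pads to 640x384 and yields 5 x 3 = 15 tiles.
tiles, orig_size = tile_image(np.zeros((360, 640, 3), dtype=np.float32))
print(tiles.shape)  # (15, 128, 128, 3)
```

Production tiling pipelines typically overlap tiles by a few pixels and blend the seams; this sketch keeps tiles disjoint for clarity.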

## Usage

### With ToolPiper (recommended)

PiperSR is integrated into ToolPiper, a local macOS AI toolkit. Install ToolPiper, enable the MediaPiper browser extension, and every 720p video on the web is upscaled to 1440p in real time.

```bash
# Via MCP tool
mcp__toolpiper__image_upscale image=/path/to/image.png

# Via REST API
curl -X POST http://127.0.0.1:9998/v1/images/upscale \
  -F "[email protected]" \
  -o upscaled.png
```

### With CoreML (Swift)

```swift
import CoreML

let config = MLModelConfiguration()
config.computeUnits = .cpuAndNeuralEngine  // NOT .all; .all is 23.6% slower here

let model = try PiperSR_2x(configuration: config)
let input = try PiperSR_2xInput(x: pixelBuffer)
let output = try model.prediction(input: input)
// output.var_185 contains the 2x upscaled image
```

**Important:** Use `.cpuAndNeuralEngine`, not `.all`. With `.all`, CoreML silently routes pure-ANE ops onto the GPU, causing a 23.6% slowdown for this model.

### With coremltools (Python)

```python
import coremltools as ct
import numpy as np
from PIL import Image

model = ct.models.MLModel("PiperSR_2x.mlpackage")

img = Image.open("input.png").convert("RGB").resize((128, 128))
arr = np.array(img).astype(np.float32) / 255.0
arr = np.transpose(arr, (2, 0, 1))[np.newaxis]  # NCHW, [0, 1]

result = model.predict({"x": arr})

# The output (keyed by the model's output name, var_185 for this package)
# is NCHW in [0, 1]; convert back to an 8-bit image.
out = np.clip(next(iter(result.values())), 0.0, 1.0)
out = (np.transpose(out[0], (1, 2, 0)) * 255.0).round().astype(np.uint8)
Image.fromarray(out).save("output.png")
```

## Training

Trained on DIV2K (800 training images) with L1 loss and random augmentation (flips, rotations). Total training cost: ~$6 on RunPod A6000 instances. The full training journey, from 33.46 dB to 37.54 dB, is documented across 12 experiment findings.

## Technical Details

- **Compute units:** `.cpuAndNeuralEngine` (ANE primary, CPU for I/O only)
- **Precision:** Float16
- **Input format:** NCHW, normalized to [0, 1]
- **Output format:** NCHW, [0, 1]
- **Model size:** 928 KB (compiled `.mlmodelc`)
- **Parameters:** 453K
- **ANE ops used:** `conv`, `batch_norm` (fused at inference), `silu`, `add`, `pixel_shuffle`, `const`
- **CPU fallback ops:** none

## License

Apache 2.0

## Citation

```bibtex
@software{pipersr2025,
  title={PiperSR: ANE-Native Super Resolution for Apple Silicon},
  author={ModelPiper},
  year={2025},
  url={https://huggingface.co/ModelPiper/PiperSR-2x}
}
```