SpaRRTA Model README
This document explains the probe models used by the SpaRRTA Space demo and how they are organized on the Hugging Face Model Hub.
What these models are
SpaRRTA uses:
- A frozen vision backbone from timm DINOv3 (hf_model_id in manifest).
- A lightweight learned probe head (EfficientProbing) per:
  - backbone,
  - perspective (camera or human),
  - triplet (for example bridge_2_trashbin_bike).
The probe head predicts one of 4 classes:
- Front
- Back
- Left
- Right
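As a rough illustration of the 4-way output, the sketch below turns a vector of four logits into a class name and a softmax confidence. The index order (Front, Back, Left, Right) and the helper name decode_prediction are assumptions made for this example; the actual label order used by the checkpoints should be checked against the training code.

```python
import math

# Assumed index order; not confirmed by the manifest or the checkpoints.
CLASS_NAMES = ["Front", "Back", "Left", "Right"]


def decode_prediction(logits):
    """Map 4 raw logits to (class name, softmax probability of that class)."""
    if len(logits) != len(CLASS_NAMES):
        raise ValueError(f"expected {len(CLASS_NAMES)} logits, got {len(logits)}")
    # Subtract the max before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASS_NAMES[best], probs[best]


if __name__ == "__main__":
    label, confidence = decode_prediction([0.1, 2.0, -1.0, 0.3])
    print(label, round(confidence, 3))
```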
Available backbones
From space/artifacts/manifest.yaml:
- vit_small_patch16_dinov3.lvd1689m (expected_feat_dim: 384)
- vit_base_patch16_dinov3.lvd1689m (expected_feat_dim: 768)
- vit_large_patch16_dinov3.lvd1689m (expected_feat_dim: 1024)
- vit_huge_plus_patch16_dinov3.lvd1689m (expected_feat_dim: 1280)
- vit_7b_patch16_dinov3.lvd1689m (expected_feat_dim: 4096, experimental)
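The expected_feat_dim values can serve as a sanity check that a backbone's feature width matches the probe head it is paired with. The sketch below hard-codes the values listed above; the helper check_feat_dim is hypothetical, and whether the app performs this exact check is an assumption.

```python
# Values copied from the backbone list above (expected_feat_dim per model key).
EXPECTED_FEAT_DIM = {
    "vit_small_patch16_dinov3.lvd1689m": 384,
    "vit_base_patch16_dinov3.lvd1689m": 768,
    "vit_large_patch16_dinov3.lvd1689m": 1024,
    "vit_huge_plus_patch16_dinov3.lvd1689m": 1280,
    "vit_7b_patch16_dinov3.lvd1689m": 4096,  # marked experimental
}


def check_feat_dim(model_key, feat_dim):
    """Raise if the observed feature width differs from the listed one.

    Hypothetical helper: the real app may validate dimensions differently.
    """
    if model_key not in EXPECTED_FEAT_DIM:
        raise KeyError(f"unknown backbone: {model_key}")
    expected = EXPECTED_FEAT_DIM[model_key]
    if feat_dim != expected:
        raise ValueError(
            f"{model_key}: expected feature dim {expected}, got {feat_dim}"
        )
    return expected


if __name__ == "__main__":
    print(check_feat_dim("vit_base_patch16_dinov3.lvd1689m", 768))
```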
Perspectives
Each triplet has separate heads for:
- camera view
- human view
The app selects the corresponding checkpoint at inference time.
Triplets
A triplet is a spatial relation setup (reference, target, human) tied to one scene id, for example:
- bridge_tree_truck
- bridge_2_trashbin_bike
- city_2_hydrant_taxi
- winter_town_2_snowman_husky
Each (backbone, perspective, triplet) combination maps to one .pt checkpoint.
Checkpoint format and path contract
Hub-style checkpoint path in manifest:
checkpoints/<model_key>/<perspective>/<triplet_id>.pt
Example:
checkpoints/vit_small_patch16_dinov3.lvd1689m/camera/bridge_2_trashbin_bike.pt
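The path contract can be expressed as a small formatting helper. This is a minimal sketch; the function name checkpoint_path and the perspective validation are illustrative additions, not part of the app.

```python
VALID_PERSPECTIVES = ("camera", "human")


def checkpoint_path(model_key, perspective, triplet_id):
    """Build checkpoints/<model_key>/<perspective>/<triplet_id>.pt."""
    if perspective not in VALID_PERSPECTIVES:
        raise ValueError(f"perspective must be one of {VALID_PERSPECTIVES}")
    # Hub paths always use forward slashes, regardless of the local OS.
    return f"checkpoints/{model_key}/{perspective}/{triplet_id}.pt"


if __name__ == "__main__":
    print(checkpoint_path(
        "vit_small_patch16_dinov3.lvd1689m", "camera", "bridge_2_trashbin_bike"
    ))
```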
Legacy paths (artifacts/checkpoints/...) are accepted by the app and normalized automatically.
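One plausible reading of this normalization is that the leading artifacts/ segment is dropped so the path matches the Hub-style layout. The sketch below implements only that assumption; the function name normalize_checkpoint_path is hypothetical and the app's actual rules may differ.

```python
LEGACY_PREFIX = "artifacts/"
HUB_PREFIX = "checkpoints/"


def normalize_checkpoint_path(path):
    """Rewrite artifacts/checkpoints/... to checkpoints/... (assumed rule)."""
    if path.startswith(LEGACY_PREFIX + HUB_PREFIX):
        return path[len(LEGACY_PREFIX):]
    # Already Hub-style (or unrecognized): return unchanged.
    return path


if __name__ == "__main__":
    legacy = (
        "artifacts/checkpoints/vit_small_patch16_dinov3.lvd1689m/"
        "camera/bridge_2_trashbin_bike.pt"
    )
    print(normalize_checkpoint_path(legacy))
```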
Model Hub repository structure
Recommended single model repo:
<hf-username>/sparrta-probes
Structure:
checkpoints/
  vit_small_patch16_dinov3.lvd1689m/
    camera/*.pt
    human/*.pt
  vit_base_patch16_dinov3.lvd1689m/
    camera/*.pt
    human/*.pt
  vit_large_patch16_dinov3.lvd1689m/
    camera/*.pt
    human/*.pt
  vit_huge_plus_patch16_dinov3.lvd1689m/
    camera/*.pt
    human/*.pt
  vit_7b_patch16_dinov3.lvd1689m/
    camera/*.pt
    human/*.pt
manifest.yaml
How Space loads model checkpoints
The Space app resolves checkpoints with:
- SPARRTA_MODEL_REPO_ID environment variable (preferred), or
- model_repo_id field in space/artifacts/manifest.yaml.
At runtime, it uses hf_hub_download(...) to fetch checkpoint files and caches them via the Hugging Face cache.
Optional:
- SPARRTA_MODEL_REVISION can pin a specific tag or commit. If unset, the app uses the latest revision.
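The resolution order described in this section could look roughly like the sketch below. It assumes the manifest has already been parsed into a dict, and the helper names resolve_repo and fetch_checkpoint are hypothetical. hf_hub_download is the real huggingface_hub function; it is imported lazily so the resolution logic can be exercised without the package installed or network access.

```python
import os


def resolve_repo(manifest, env=None):
    """Return (repo_id, revision) using env vars first, then the manifest."""
    env = os.environ if env is None else env
    # The environment variable takes precedence over the manifest field.
    repo_id = env.get("SPARRTA_MODEL_REPO_ID") or manifest.get("model_repo_id")
    if not repo_id:
        raise RuntimeError(
            "set SPARRTA_MODEL_REPO_ID or model_repo_id in the manifest"
        )
    # None leaves the choice of revision to huggingface_hub's default.
    revision = env.get("SPARRTA_MODEL_REVISION") or None
    return repo_id, revision


def fetch_checkpoint(manifest, filename):
    """Download (or reuse from the Hugging Face cache) one checkpoint file."""
    from huggingface_hub import hf_hub_download  # lazy import

    repo_id, revision = resolve_repo(manifest)
    return hf_hub_download(repo_id=repo_id, filename=filename, revision=revision)


if __name__ == "__main__":
    # Placeholder repo id for illustration only.
    manifest = {"model_repo_id": "<hf-username>/sparrta-probes"}
    print(resolve_repo(manifest, env={}))
```

fetch_checkpoint returns the local path of the cached file, which can then be passed to a loader such as torch.load.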
Citation
If you find this research useful, please consider citing:
@misc{kargin2026sparrta,
  title={SpaRRTa: A Synthetic Benchmark for Evaluating Spatial Intelligence in Visual Foundation Models},
  author={Turhan Can Kargin and Wojciech Jasiński and Adam Pardyl and Bartosz Zieliński and Marcin Przewięźlikowski},
  year={2026},
  eprint={2601.11729},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2601.11729}
}