SpaRRTA Model README

This document explains the probe models used by the SpaRRTA Space demo and how they are organized on Hugging Face Model Hub.

What these models are

SpaRRTA uses:

  • A frozen vision backbone from timm DINOv3 (identified by the hf_model_id field in the manifest).
  • A lightweight learned probe head (EfficientProbing) per:
    • backbone,
    • perspective (camera or human),
    • triplet (for example bridge_2_trashbin_bike).

The probe head predicts one of four classes:

  • Front
  • Back
  • Left
  • Right
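The exact EfficientProbing architecture is defined in the SpaRRTa codebase. As a rough illustration only (a hypothetical linear head over frozen features, with an assumed class ordering), the prediction step could look like:

```python
import numpy as np

# The four direction classes predicted by each probe head. This ordering is
# an assumption for illustration; the real ordering is fixed by the SpaRRTa
# training code.
CLASSES = ["Front", "Back", "Left", "Right"]

def probe_predict(features: np.ndarray, weight: np.ndarray, bias: np.ndarray) -> str:
    """Apply a linear probe head to frozen backbone features and
    return the predicted direction class."""
    logits = features @ weight.T + bias  # shape: (num_classes,)
    return CLASSES[int(np.argmax(logits))]

# Toy example with the small backbone's 384-dim features and random weights.
rng = np.random.default_rng(0)
feat = rng.standard_normal(384)
W = rng.standard_normal((4, 384))
b = np.zeros(4)
print(probe_predict(feat, W, b))
```

The frozen backbone supplies `features`; only the head's `weight` and `bias` are learned per (backbone, perspective, triplet).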

Available backbones

From space/artifacts/manifest.yaml:

  • vit_small_patch16_dinov3.lvd1689m (expected_feat_dim: 384)
  • vit_base_patch16_dinov3.lvd1689m (expected_feat_dim: 768)
  • vit_large_patch16_dinov3.lvd1689m (expected_feat_dim: 1024)
  • vit_huge_plus_patch16_dinov3.lvd1689m (expected_feat_dim: 1280)
  • vit_7b_patch16_dinov3.lvd1689m (expected_feat_dim: 4096, experimental)
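The expected feature widths above can be used as a sanity check when wiring a backbone to its probe heads. The helper below is not part of the app; it is a small sketch that mirrors the manifest values:

```python
# Expected feature dimensions per backbone key, mirroring
# space/artifacts/manifest.yaml.
EXPECTED_FEAT_DIM = {
    "vit_small_patch16_dinov3.lvd1689m": 384,
    "vit_base_patch16_dinov3.lvd1689m": 768,
    "vit_large_patch16_dinov3.lvd1689m": 1024,
    "vit_huge_plus_patch16_dinov3.lvd1689m": 1280,
    "vit_7b_patch16_dinov3.lvd1689m": 4096,  # experimental
}

def check_feat_dim(model_key: str, feat_dim: int) -> None:
    """Fail fast if a backbone produces features of an unexpected width."""
    expected = EXPECTED_FEAT_DIM[model_key]
    if feat_dim != expected:
        raise ValueError(
            f"{model_key}: got feat_dim={feat_dim}, expected {expected}"
        )
```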

Perspectives

Each triplet has separate heads for:

  • camera view
  • human view

The app selects the corresponding checkpoint at inference time.

Triplets

A triplet is a spatial relation setup (reference, target, human) tied to one scene id, for example:

  • bridge_tree_truck
  • bridge_2_trashbin_bike
  • city_2_hydrant_taxi
  • winter_town_2_snowman_husky

Each (backbone, perspective, triplet) combination maps to one .pt checkpoint.

Checkpoint format and path contract

Hub-style checkpoint path in manifest:

checkpoints/<model_key>/<perspective>/<triplet_id>.pt

Example:

checkpoints/vit_small_patch16_dinov3.lvd1689m/camera/bridge_2_trashbin_bike.pt

Legacy paths (artifacts/checkpoints/...) are accepted by the app and normalized automatically.
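The path contract above is simple enough to express in a few lines. The normalization rule below is an assumption inferred from the prefixes shown here; the app's actual logic lives in the Space code:

```python
def checkpoint_path(model_key: str, perspective: str, triplet_id: str) -> str:
    """Build a hub-style checkpoint path per the manifest contract:
    checkpoints/<model_key>/<perspective>/<triplet_id>.pt"""
    assert perspective in ("camera", "human")
    return f"checkpoints/{model_key}/{perspective}/{triplet_id}.pt"

def normalize_path(path: str) -> str:
    """Map legacy 'artifacts/checkpoints/...' paths onto the hub-style
    'checkpoints/...' layout (assumed normalization rule)."""
    prefix = "artifacts/"
    return path[len(prefix):] if path.startswith(prefix) else path

print(checkpoint_path("vit_small_patch16_dinov3.lvd1689m", "camera",
                      "bridge_2_trashbin_bike"))
# checkpoints/vit_small_patch16_dinov3.lvd1689m/camera/bridge_2_trashbin_bike.pt
```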

Model Hub repository structure

Recommended single model repo:

<hf-username>/sparrta-probes

Structure:

checkpoints/
  vit_small_patch16_dinov3.lvd1689m/
    camera/*.pt
    human/*.pt
  vit_base_patch16_dinov3.lvd1689m/
    camera/*.pt
    human/*.pt
  vit_large_patch16_dinov3.lvd1689m/
    camera/*.pt
    human/*.pt
  vit_huge_plus_patch16_dinov3.lvd1689m/
    camera/*.pt
    human/*.pt
  vit_7b_patch16_dinov3.lvd1689m/
    camera/*.pt
    human/*.pt
manifest.yaml

How Space loads model checkpoints

The Space app resolves checkpoints with:

  • SPARRTA_MODEL_REPO_ID environment variable (preferred), or
  • model_repo_id field in space/artifacts/manifest.yaml.

At runtime, it uses hf_hub_download(...) to fetch checkpoint files and caches them via the Hugging Face cache.

Optional:

  • SPARRTA_MODEL_REVISION can pin a specific tag/commit. If unset, the app uses the latest revision.
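The resolution order described above (environment variable first, manifest field second, optional revision pin) can be sketched as follows; `resolve_repo_and_revision` is a hypothetical helper, not the app's actual function:

```python
import os

def resolve_repo_and_revision(manifest: dict):
    """Resolve (repo_id, revision) as the README describes:
    SPARRTA_MODEL_REPO_ID wins over the manifest's model_repo_id field,
    and SPARRTA_MODEL_REVISION (if set) pins a tag/commit.
    A revision of None means the latest revision."""
    repo_id = os.environ.get("SPARRTA_MODEL_REPO_ID") or manifest.get("model_repo_id")
    if not repo_id:
        raise RuntimeError("No model repo id configured")
    revision = os.environ.get("SPARRTA_MODEL_REVISION")  # None -> latest
    return repo_id, revision

# The app would then fetch a checkpoint file with huggingface_hub, e.g.:
#   from huggingface_hub import hf_hub_download
#   local = hf_hub_download(repo_id=repo_id, filename=path, revision=revision)
# which downloads into (and reuses) the Hugging Face cache.
```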

Citation

If you find this research useful, please consider citing:

@misc{kargin2026sparrta,
  title={SpaRRTa: A Synthetic Benchmark for Evaluating Spatial Intelligence in Visual Foundation Models},
  author={Turhan Can Kargin and Wojciech Jasiński and Adam Pardyl and Bartosz Zieliński and Marcin Przewięźlikowski},
  year={2026},
  eprint={2601.11729},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2601.11729}
}