---
license: apache-2.0
language:
- en
tags:
- scene-graph-generation
- object-detection
- visual-relationship-detection
- pytorch
- yolo
pipeline_tag: object-detection
library_name: sgg-benchmark
model-index:
- name: REACT++ yolo12m
  results:
  - task:
      type: object-detection
      name: Scene Graph Detection
    dataset:
      name: VG150
      type: vg150
    metrics:
    - type: mR@20
      value: 10.81
      name: mR@20
    - type: R@20
      value: 18.76
      name: R@20
    - type: mR@50
      value: 14.42
      name: mR@50
    - type: R@50
      value: 24.63
      name: R@50
    - type: mR@100
      value: 16.78
      name: mR@100
    - type: R@100
      value: 28.47
      name: R@100
    - type: F1@20
      value: 13.72
      name: F1@20
    - type: F1@50
      value: 18.19
      name: F1@50
    - type: F1@100
      value: 21.11
      name: F1@100
    - type: e2e_latency_ms
      value: 20.5
      name: e2e_latency_ms
- name: REACT++ yolo26m
  results:
  - task:
      type: object-detection
      name: Scene Graph Detection
    dataset:
      name: VG150
      type: vg150
    metrics:
    - type: mR@20
      value: 10.81
      name: mR@20
    - type: R@20
      value: 21.12
      name: R@20
    - type: mR@50
      value: 14.6
      name: mR@50
    - type: R@50
      value: 28.34
      name: R@50
    - type: mR@100
      value: 18.36
      name: mR@100
    - type: R@100
      value: 33.7
      name: R@100
    - type: F1@20
      value: 14.3
      name: F1@20
    - type: F1@50
      value: 19.27
      name: F1@50
    - type: F1@100
      value: 23.77
      name: F1@100
    - type: e2e_latency_ms
      value: 19.8
      name: e2e_latency_ms
- name: REACT++ yolov8m
  results:
  - task:
      type: object-detection
      name: Scene Graph Detection
    dataset:
      name: VG150
      type: vg150
    metrics:
    - type: mR@20
      value: 12.22
      name: mR@20
    - type: R@20
      value: 22.89
      name: R@20
    - type: mR@50
      value: 16.31
      name: mR@50
    - type: R@50
      value: 29.96
      name: R@50
    - type: mR@100
      value: 18.45
      name: mR@100
    - type: R@100
      value: 34.09
      name: R@100
    - type: F1@20
      value: 15.93
      name: F1@20
    - type: F1@50
      value: 21.12
      name: F1@50
    - type: F1@100
      value: 23.94
      name: F1@100
    - type: e2e_latency_ms
      value: 18.7
      name: e2e_latency_ms
---

# REACT++ Scene Graph Generation: VG150 (yolo12m, yolo26m, yolov8m)

This repository contains **REACT++** model checkpoints for scene graph generation (SGG) on the
**VG150** benchmark, for three YOLO backbones (yolo12m, yolo26m, yolov8m).

REACT++ is a parameter-efficient, attention-augmented relation predictor built on top of a YOLO backbone. It uses:

- **DAMP** (Detection-Anchored Multi-Scale Pooling), a simple new pooling algorithm for one-stage object detectors such as YOLO
- **SwiGLU gated MLP** for all feed-forward blocks (half the parameters of a ReLU MLP at equal capacity)
- **Visual x Semantic cross-attention**: visual tokens attend to GloVe prototype embeddings
- **Geometry RoPE**: box position encoded as a rotary frequency bias on the Q matrix
- **Prototype Momentum Buffer**: a per-class EMA prototype bank
- **P5 Scene Context**: AIFI-enhanced P5 tokens provide global context via cross-attention

The models were trained with the [SGG-Benchmark](https://github.com/Maelic/SGG-Benchmark) framework and are described in the [REACT++ paper (Neau et al., 2026)](https://arxiv.org/abs/2603.06386).

---

## Results: SGDet on VG150 test split (CUDA, max_det=100, batch_size=1)

> Metrics come from end-to-end evaluation (`tools/evaluate.py`). Latency covers the model forward pass only.

| Backbone | R@20 | R@50 | R@100 | mR@20 | mR@50 | mR@100 | F1@20 | F1@50 | F1@100 | Latency (ms) |
|----------|-----:|-----:|------:|------:|------:|-------:|------:|------:|-------:|-------------:|
| yolo12m | 18.76 | 24.63 | 28.47 | 10.81 | 14.42 | 16.78 | 13.72 | 18.19 | 21.11 | 20.5 |
| yolo26m | 21.12 | 28.34 | 33.7 | 10.81 | 14.6 | 18.36 | 14.3 | 19.27 | 23.77 | 19.8 |
| yolov8m | 22.89 | 29.96 | 34.09 | 12.22 | 16.31 | 18.45 | 15.93 | 21.12 | 23.94 | 18.7 |

---

## Checkpoints

| Variant | Sub-folder | Checkpoint files |
|---------|------------|------------------|
| yolo12m | `yolo12m/` | `yolo12m/model.onnx` (ONNX) · `yolo12m/best_model_epoch_19.pth` (PyTorch) |
| yolo26m | `yolo26m/` | `yolo26m/model.onnx` (ONNX) · `yolo26m/best_model_epoch_18.pth` (PyTorch) |
| yolov8m | `yolov8m/` | `yolov8m/model.onnx` (ONNX) · `yolov8m/best_model_epoch_6.pth` (PyTorch) |

---

## Usage

### ONNX (recommended; no Python dependencies beyond onnxruntime)

```python
from huggingface_hub import hf_hub_download

# File name taken from the Checkpoints table above.
onnx_path = hf_hub_download(
    repo_id="maelic/REACTPlusPlus_VG150",
    filename="yolo12m/model.onnx",
    repo_type="model",
)
# Run with tools/eval_onnx_psg.py or load directly via onnxruntime
```

### PyTorch

```python
# 1. Clone the repository
#    git clone https://github.com/Maelic/SGG-Benchmark
# 2. Install dependencies
#    pip install -e .

# 3. Download checkpoint + config
from huggingface_hub import hf_hub_download

# Checkpoint file name taken from the Checkpoints table above.
ckpt_path = hf_hub_download(
    repo_id="maelic/REACTPlusPlus_VG150",
    filename="yolo12m/best_model_epoch_19.pth",
    repo_type="model",
)
cfg_path = hf_hub_download(
    repo_id="maelic/REACTPlusPlus_VG150",
    filename="yolo12m/config.yml",
    repo_type="model",
)

# 4. Run evaluation (from the root of the cloned SGG-Benchmark repository)
import subprocess

subprocess.run(
    [
        "python", "tools/relation_eval_hydra.py",
        "--config-path", str(cfg_path),
        "--task", "sgdet",
        "--eval-only",
        "--checkpoint", str(ckpt_path),
    ],
    check=True,  # raise if the evaluation script exits with an error
)
```

---

## Citation

```bibtex
@article{neau2026reactpp,
  title  = {REACT++: Efficient Cross-Attention for Real-Time Scene Graph Generation},
  author = {Neau, Maëlic and Falomir, Zoe},
  year   = {2026},
  url    = {https://arxiv.org/abs/2603.06386},
}
```
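
---

## Checking the F1@K columns

This model card does not define F1@K. The reported values are consistent with the harmonic mean of R@K and mR@K, so the sketch below assumes that definition and checks it against the numbers in the results table. The helper names (`f1_at_k`, `max_deviation`) are illustrative and are not part of SGG-Benchmark.

```python
# Values copied from the results table: K -> (R@K, mR@K, reported F1@K).
REPORTED = {
    "yolo12m": {20: (18.76, 10.81, 13.72), 50: (24.63, 14.42, 18.19), 100: (28.47, 16.78, 21.11)},
    "yolo26m": {20: (21.12, 10.81, 14.30), 50: (28.34, 14.60, 19.27), 100: (33.70, 18.36, 23.77)},
    "yolov8m": {20: (22.89, 12.22, 15.93), 50: (29.96, 16.31, 21.12), 100: (34.09, 18.45, 23.94)},
}


def f1_at_k(recall: float, mean_recall: float) -> float:
    """Harmonic mean of R@K and mR@K (assumed definition of F1@K)."""
    total = recall + mean_recall
    if total == 0:
        return 0.0
    return 2 * recall * mean_recall / total


def max_deviation(table=REPORTED) -> float:
    """Largest absolute gap between the recomputed and the reported F1@K."""
    return max(
        abs(f1_at_k(r, mr) - f1)
        for rows in table.values()
        for r, mr, f1 in rows.values()
    )


if __name__ == "__main__":
    for backbone, rows in REPORTED.items():
        for k, (r, mr, f1) in rows.items():
            print(f"{backbone} F1@{k}: computed {f1_at_k(r, mr):.2f}, reported {f1:.2f}")
    print(f"max deviation: {max_deviation():.4f}")
```

Because the table entries are already rounded to two decimals, the recomputed values can differ from the reported ones in the last digit; a tolerance of 0.01 is enough for all nine entries.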