HapticVLA: Contact-Rich Manipulation via Vision-Language-Action Model without Inference-Time Tactile Sensing
Paper • 2603.15257 • Published
SmolVLA fine-tuned with Sensitivity-Aware Reward-Weighted Flow Matching (SA-RWFM) and conditioned on dual tactile sensors, for right-arm manipulation on the Crab robot.
This model serves as the tactile-conditioned teacher for knowledge distillation into HapticVLA.
lerobot/smolvla_base (450M params) + DualTactileEncoder

| Task | Success Rate | Force Errors |
|---|---|---|
| Eggs | 85% | 3/20 |
| Can | 55% | 9/20 |
| Waffles | 85% | 3/20 |
| Mean | 75.0% | 15/60 |
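
The Mean row can be checked from the per-task rows. The short sketch below recomputes it, assuming the mean success rate is an unweighted average over the three tasks and that each task was run for 20 trials (the denominator shown in the Force Errors column). The dictionary keys are illustrative names, not identifiers from the released code.

```python
# Per-task results copied from the table above.
# "trials" is taken from the Force Errors denominator (assumption: the same
# 20 trials per task were used for the success rate).
results = {
    "Eggs": {"success_pct": 85, "force_errors": 3, "trials": 20},
    "Can": {"success_pct": 55, "force_errors": 9, "trials": 20},
    "Waffles": {"success_pct": 85, "force_errors": 3, "trials": 20},
}

# Unweighted mean of the per-task success rates.
mean_success_pct = sum(r["success_pct"] for r in results.values()) / len(results)

# Force errors and trials are summed across tasks.
total_force_errors = sum(r["force_errors"] for r in results.values())
total_trials = sum(r["trials"] for r in results.values())

print(f"Mean success rate: {mean_success_pct:.1f}%")
print(f"Force errors: {total_force_errors}/{total_trials}")
```

Because every task has the same number of trials, the unweighted mean and the trial-weighted mean coincide here.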
Note: This model requires tactile sensor hardware at inference. For a tactile-free alternative with better performance, see HapticVLA.
import torch

# Load the checkpoint onto the CPU, regardless of the device it was saved from.
checkpoint = torch.load("best/model.pt", map_location="cpu")
See Advanced-Robotic-Manipulation/crab for the full inference pipeline.
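
This card does not document the layout of `best/model.pt`, so it can help to inspect the loaded object before wiring it into a model. The helper below is a minimal sketch under that assumption: `summarize_checkpoint` is a hypothetical function (not part of the crab repository or PyTorch) that walks a nested mapping and reports the shape of anything tensor-like. It is demonstrated on a stand-in dictionary so that it runs without the checkpoint file.

```python
from collections.abc import Mapping


def summarize_checkpoint(obj, prefix="", max_depth=2):
    """Return (name, description) pairs for the entries of a loaded checkpoint.

    Hypothetical helper for illustration only. Nested mappings are expanded
    up to max_depth levels; anything with a .shape attribute is reported by
    shape, everything else by its type name.
    """
    rows = []
    if isinstance(obj, Mapping) and max_depth > 0:
        for key, value in obj.items():
            name = f"{prefix}{key}"
            if isinstance(value, Mapping):
                rows.extend(summarize_checkpoint(value, name + ".", max_depth - 1))
            else:
                shape = getattr(value, "shape", None)
                desc = f"shape={tuple(shape)}" if shape is not None else type(value).__name__
                rows.append((name, desc))
    else:
        # Not a mapping, or depth limit reached: report only the type.
        rows.append((prefix.rstrip(".") or "<root>", type(obj).__name__))
    return rows


class FakeTensor:
    """Stand-in for a tensor so the example runs without torch."""

    def __init__(self, *shape):
        self.shape = shape


# Stand-in object; the real checkpoint's keys are unknown and may differ.
example = {"step": 1000, "model": {"encoder.weight": FakeTensor(8, 4)}}
for name, desc in summarize_checkpoint(example):
    print(f"{name}: {desc}")
```

With the real file, the same call would be `summarize_checkpoint(checkpoint)` on the object returned by `torch.load`.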
If you use this model, please cite our paper:
@article{gubernatorov2026hapticvla,
title={HapticVLA: Contact-Rich Manipulation via Vision-Language-Action Model without Inference-Time Tactile Sensing},
author={Gubernatorov, Konstantin and Sannikov, Mikhail and Mikhalchuk, Ilya and Kuznetsov, Egor and Artemov, Makar and Ouwatobi, Ogunwoye Faith and Fernando, Marcelino and Asanov, Artem and Guo, Ziang and Tsetserukou, Dzmitry},
journal={arXiv preprint arXiv:2603.15257},
year={2026}
}