A ViT-B/32 CLIP model trained for 4 epochs on the ye-pop dataset (491,520 images and their alt-texts). A research artifact of the clip-synthetic-captions project.

Note: this model is likely not directly useful, as it is severely undertrained.
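
The sketch below shows typical zero-shot CLIP usage. It assumes the checkpoint is published in OpenCLIP's `hf-hub` format under the repo id `nopperl/clip-ye-pop-alt_txt`; the image path and prompt strings are placeholders.

```python
# Minimal zero-shot classification sketch, assuming an OpenCLIP-format
# checkpoint hosted at hf-hub:nopperl/clip-ye-pop-alt_txt.
import torch
import open_clip
from PIL import Image

model, preprocess = open_clip.create_model_from_pretrained(
    "hf-hub:nopperl/clip-ye-pop-alt_txt"
)
tokenizer = open_clip.get_tokenizer("hf-hub:nopperl/clip-ye-pop-alt_txt")
model.eval()

# Placeholder inputs: any RGB image and candidate captions will do.
image = preprocess(Image.open("example.jpg")).unsqueeze(0)
text = tokenizer(["a photo of a cat", "a photo of a dog"])

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Normalize embeddings before computing cosine similarities.
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print(probs)
```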

