Multilingual Topic Classifier
A multilingual text classification model fine-tuned on the SIB-200 dataset, capable of classifying text into 7 topics across 205 languages.
Model Details
- Base model: xlm-roberta-base
- Task: Text Classification (Topic)
- Languages: 205
- Developed by: Keshav0308
Topics
| Label | Description |
|---|---|
| geography | Geographic content |
| science/technology | Science and tech content |
| entertainment | Entertainment content |
| politics | Political content |
| health | Health and medical content |
| travel | Travel content |
| sports | Sports content |
Performance
| Metric | Score |
|---|---|
| Test Accuracy | 69.17% |
| Test F1 Macro | 67.62% |
| Languages | 205 |
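For readers unfamiliar with the two metrics, the following is a minimal sketch of how accuracy and macro F1 are usually defined. It assumes the standard definitions (macro F1 as the unweighted mean of per-class F1); the card does not state which implementation produced the scores above, and the labels in the example are toy data, not model output.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    f1_scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        f1_scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy labels for illustration only (not taken from the model's predictions).
y_true = ["health", "sports", "travel", "health"]
y_pred = ["health", "sports", "health", "health"]

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
```

Because macro F1 weights every class equally, a class the model rarely gets right (here `travel`) pulls the score below accuracy, which is consistent with the gap between the two numbers in the table.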
Usage
from transformers import pipeline
classifier = pipeline(
"text-classification",
model="Keshav0308/multilingual-topic-classifier"
)
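# The model was fine-tuned on the 205 SIB-200 languages, so inputs need not be English.
# The pipeline returns a list with one dict per input text.
classifier("The patient was diagnosed with pneumonia.")
# [{'label': 'health', 'score': 0.999}]
classifier("El equipo ganó el campeonato mundial de fútbol.")
# [{'label': 'sports', 'score': 0.999}]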
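Since the reported test accuracy is about 69%, it may be useful to discard low-confidence predictions. The sketch below is one possible way to do that. The helper name `classify_with_threshold` and the `min_score` default are hypothetical and not part of the model or of `transformers`. It only assumes that the classifier, when called with a list of strings, returns one `{'label', 'score'}` dict per input, as a text-classification pipeline does.

```python
def classify_with_threshold(classifier, texts, min_score=0.5):
    """Classify texts and set the label to None when the score is below min_score.

    `classifier` is any callable that maps a list of strings to a list of
    {'label': str, 'score': float} dicts, one per input.
    """
    texts = list(texts)
    results = classifier(texts)
    filtered = []
    for text, pred in zip(texts, results):
        label = pred["label"] if pred["score"] >= min_score else None
        filtered.append({"text": text, "label": label, "score": pred["score"]})
    return filtered


# Usage with the real model (downloads the weights on first call):
# from transformers import pipeline
# classifier = pipeline(
#     "text-classification",
#     model="Keshav0308/multilingual-topic-classifier",
# )
# classify_with_threshold(classifier, ["The match ended in a draw."], min_score=0.6)
```

The threshold value is a judgment call. A reasonable approach is to tune it on held-out data for the languages you care about.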
Training Data
Fine-tuned on SIB-200, a massively multilingual dataset covering 205 languages.
- Train samples: 143,705
- Validation samples: 20,295
- Test samples: 41,820
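As a quick sanity check on these figures, the sketch below divides each split total by the 205 languages. It assumes every language contributes the same number of samples to each split; a zero remainder is consistent with that assumption but does not prove it.

```python
NUM_LANGUAGES = 205
split_totals = {"train": 143_705, "validation": 20_295, "test": 41_820}

per_language = {}
for split, total in split_totals.items():
    count, remainder = divmod(total, NUM_LANGUAGES)
    # Each total should be an exact multiple of the language count
    # if the splits are equally sized per language (an assumption).
    assert remainder == 0, f"{split} total is not a multiple of {NUM_LANGUAGES}"
    per_language[split] = count

total_samples = sum(split_totals.values())

print(per_language)   # {'train': 701, 'validation': 99, 'test': 204}
print(total_samples)  # 205820
```

Under that assumption, each language has 701 training, 99 validation, and 204 test samples.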