🌍 Multilingual Topic Classifier

A multilingual text classification model fine-tuned on the SIB-200 dataset, capable of classifying text into 7 topics across 205 languages.

Model Details

  • Base model: xlm-roberta-base
  • Task: Text Classification (Topic)
  • Languages: 205
  • Developed by: Keshav0308

Topics

Label Description
🌍 geography Geographic content
🔬 science/technology Science and tech content
🎬 entertainment Entertainment content
🏛️ politics Political content
🏥 health Health and medical content
✈️ travel Travel content
⚽ sports Sports content

Performance

Metric Score
Test Accuracy 69.17%
Test F1 Macro 67.62%
Languages 205
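
To make the two scores above easier to compare: accuracy is the share of predictions that match the reference label, while macro F1 is the unweighted mean of the per-topic F1 scores, so every topic counts equally. The sketch below computes both in plain Python. The function names and the toy labels are illustrative assumptions, not the evaluation code used for this model.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data, invented for illustration only.
y_true = ["health", "health", "sports", "travel"]
y_pred = ["health", "sports", "sports", "travel"]

print(f"accuracy: {accuracy(y_true, y_pred):.4f}")
print(f"macro F1: {macro_f1(y_true, y_pred):.4f}")
```

Because macro F1 weights all topics equally, a score below accuracy (as in the table above) usually suggests that some topics are classified noticeably worse than others.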

Usage

from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="Keshav0308/multilingual-topic-classifier"
)

# Accepts text in any of the 205 supported languages.
# For a single string, the pipeline returns a list with one result dict.
classifier("The patient was diagnosed with pneumonia.")
# [{'label': 'health', 'score': 0.999}]

classifier("El equipo ganó el campeonato mundial de fútbol.")
# [{'label': 'sports', 'score': 0.999}]
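
Since test accuracy is around 69%, it may be useful to treat low-confidence predictions as "unknown" rather than accept every label. The following is a minimal sketch of that idea. The helper `classify_with_threshold`, the `threshold` parameter, and the offline `stub_classifier` are all hypothetical names introduced here for illustration, and the threshold value is arbitrary rather than tuned.

```python
def classify_with_threshold(classifier, texts, threshold=0.5):
    """Return one label per text, or None when the top score is below threshold.

    `classifier` is any callable that maps a list of strings to a list of
    {'label': str, 'score': float} dicts, one per input text.
    """
    results = classifier(list(texts))
    return [r["label"] if r["score"] >= threshold else None for r in results]


def stub_classifier(texts):
    """Offline stand-in for the real pipeline; labels and scores are made up."""
    canned = [
        {"label": "health", "score": 0.97},
        {"label": "travel", "score": 0.41},
    ]
    return canned[: len(texts)]


labels = classify_with_threshold(
    stub_classifier, ["first text", "second text"], threshold=0.8
)
print(labels)  # ['health', None]
```

Passing the `classifier` pipeline object from the usage snippet above in place of the stub should work the same way, because a text-classification pipeline called on a list of strings returns one result dict per input.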

Training Data

Fine-tuned on SIB-200, a massively multilingual dataset covering 205 languages.

  • Train samples: 143,705
  • Validation samples: 20,295
  • Test samples: 41,820
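
Each split size above divides evenly by 205, which is consistent with every language contributing the same number of examples. A short arithmetic check follows; the variable names are chosen for illustration.

```python
NUM_LANGUAGES = 205
splits = {"train": 143_705, "validation": 20_295, "test": 41_820}

total = sum(splits.values())
per_language = {}
for name, count in splits.items():
    # divmod gives (examples per language, leftover); a leftover of 0 means an even split.
    per_language[name] = divmod(count, NUM_LANGUAGES)
    share = count / total
    print(f"{name}: {per_language[name][0]} per language, "
          f"remainder {per_language[name][1]}, {share:.1%} of all samples")
```

This works out to 701 training, 99 validation, and 204 test examples per language.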

Safetensors

  • Model size: 0.3B params
  • Tensor type: F32