ALBERT-base-v2 Fine-tuned for Semantic Similarity (QQP/MRPC)

Model Details

Model Description

This is a fine-tuned version of albert-base-v2 for paraphrase detection on GLUE tasks such as QQP (Quora Question Pairs) and MRPC (Microsoft Research Paraphrase Corpus).
It can be used to determine whether two sentences are paraphrases (semantically similar) or not.

  • Developed by: Peeyush
  • Model type: Sentence-pair classification (binary: paraphrase vs not paraphrase)
  • Language(s): English
  • License: Apache-2.0
  • Finetuned from model: albert-base-v2

Uses

Direct Use

  • Paraphrase detection: Check if two sentences mean the same thing.
  • Semantic textual similarity: Determine closeness of meaning between two texts.

Downstream Use

  • Duplicate question detection (e.g., Q&A forums like Quora or StackOverflow).
  • Information retrieval (ranking by semantic similarity).
  • Chatbots / Virtual assistants (detecting intent rephrasing).

Out-of-Scope Use

  • Not a generative model → cannot rewrite or generate paraphrases.
  • Not trained on multilingual data → limited to English.

Bias, Risks, and Limitations

  • The model inherits biases from QQP/MRPC (e.g., common question styles, certain domains).
  • May not generalize to informal text, code-mixed text, or specialized domains (e.g., medical, legal).
  • Can misclassify edge cases where semantic similarity is subtle.

Recommendations

  • Always evaluate on your target domain before deployment.
  • For production, consider tuning the decision threshold on the paraphrase probability rather than taking the raw argmax classification (see the sketch below).
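
A minimal threshold-tuning sketch, assuming you have paraphrase probabilities and gold labels from a held-out set in your target domain (the tune_threshold helper and the 0.1–0.9 grid are illustrative choices, not part of the released model):

import numpy as np
from sklearn.metrics import f1_score

def tune_threshold(probs, labels, grid=np.linspace(0.1, 0.9, 81)):
    # Sweep candidate thresholds and keep the one with the best F1
    best_t, best_f1 = 0.5, 0.0
    for t in grid:
        preds = (probs >= t).astype(int)
        score = f1_score(labels, preds)
        if score > best_f1:
            best_t, best_f1 = t, score
    return best_t, best_f1

# probs: array of paraphrase probabilities from predict_paraphrase
# labels: array of gold 0/1 annotations for the same pairs
# threshold, f1 = tune_threshold(probs, labels)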

How to Get Started with the Model

Example usage:

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained('peeyush01/albert-paraphrase-detector')
tokenizer = AutoTokenizer.from_pretrained('peeyush01/albert-paraphrase-detector-tokenizer')
model.eval()

def predict_paraphrase(sentence1, sentence2):
    # Encode the sentence pair jointly so ALBERT sees both segments
    inputs = tokenizer(sentence1, sentence2, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        probs = torch.softmax(logits, dim=1)
        paraphrase_prob = probs[0][1].item()  # index 1 = "paraphrase" class
    return {"Paraphrase": paraphrase_prob, "Not Paraphrase": 1 - paraphrase_prob}

pairs = [
    ("The movie was fantastic!", "The film was amazing!"),
    ("He is playing cricket.", "She is reading a book."),
]

for s1, s2 in pairs:
    result = predict_paraphrase(s1, s2)
    print(f"Sentence 1: {s1}")
    print(f"Sentence 2: {s2}")
    print(f"Result: {result}\n")

Training Details

Training Data

  • Dataset: GLUE MRPC (see the loading sketch after this list)
  • Description: The Microsoft Research Paraphrase Corpus (MRPC) contains pairs of sentences automatically extracted from online news sources, with human annotations indicating whether each pair captures a paraphrase/semantic equivalence relationship.
  • Size: ~3,700 training pairs, 408 validation pairs, 1,725 test pairs.
  • Labels:
    • 1 → Paraphrase (semantically equivalent)
    • 0 → Not paraphrase
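
A minimal loading sketch using the Hugging Face datasets library (the variable name raw_datasets is illustrative):

from datasets import load_dataset

# Load the three MRPC splits: ~3,700 train / 408 validation / 1,725 test pairs
raw_datasets = load_dataset("glue", "mrpc")

print(raw_datasets["train"][0])
# {'sentence1': '...', 'sentence2': '...', 'label': 1, 'idx': 0}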

Training Procedure

Preprocessing

  • Both sentences were tokenized using AlbertTokenizer with truncation and padding (max_length).
  • Columns sentence1, sentence2, and idx were dropped.
  • The label column was renamed from label → labels.
  • Dataset was set to PyTorch format (the full pipeline is sketched below).
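
A sketch of that preprocessing, continuing from the raw_datasets object loaded above; the column names follow the GLUE MRPC schema, everything else is standard datasets/transformers API:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("albert-base-v2")

def tokenize_fn(batch):
    # Encode both sentences as one pair with truncation and max_length padding
    return tokenizer(batch["sentence1"], batch["sentence2"],
                     truncation=True, padding="max_length")

tokenized = raw_datasets.map(tokenize_fn, batched=True)
tokenized = tokenized.remove_columns(["sentence1", "sentence2", "idx"])
tokenized = tokenized.rename_column("label", "labels")
tokenized.set_format("torch")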

Training Hyperparameters

  • Base model: albert-base-v2
  • Epochs: 3
  • Batch size: 16 (train and eval)
  • Optimizer: AdamW (via Hugging Face Trainer)
  • Warmup steps: 600
  • Weight decay: 0.01
  • Evaluation strategy: Per epoch
  • Precision regime: FP32 (a matching Trainer setup is sketched below)
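
A reconstruction of the Trainer setup under these hyperparameters, assuming the tokenized dataset from the preprocessing sketch; output_dir is an illustrative path, not the author's:

from transformers import (AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

model = AutoModelForSequenceClassification.from_pretrained("albert-base-v2", num_labels=2)

args = TrainingArguments(
    output_dir="albert-mrpc",        # illustrative output path
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    warmup_steps=600,
    weight_decay=0.01,
    evaluation_strategy="epoch",     # 'eval_strategy' in newer transformers releases
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
)
trainer.train()  # AdamW is the Trainer's default optimizer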

Speeds, Sizes, Times

  • Training performed with Hugging Face Trainer.
  • Training time: ~20–30 minutes on a single GPU (Tesla T4); longer on CPU.
  • Final checkpoint size: ~47 MB.

Evaluation

Testing Data, Factors & Metrics

Testing Data

  • Evaluation performed on the GLUE MRPC validation set (~408 examples).

Factors

  • Sentence pairs vary in length, syntactic complexity, and semantic overlap.
  • Evaluation primarily captures semantic similarity in short news-style English text.

Metrics

  • Accuracy: percentage of correctly classified sentence pairs.
  • F1 Score: harmonic mean of precision and recall, important because MRPC's classes are imbalanced (both metrics can be computed as sketched below).
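
A minimal sketch using the evaluate library's GLUE MRPC metric, which reports accuracy and F1 together; pass it to the Trainer via its compute_metrics argument:

import numpy as np
import evaluate

metric = evaluate.load("glue", "mrpc")  # bundles accuracy and F1 for MRPC

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)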

Results

(Expected range for ALBERT-base on MRPC; please replace with your actual run metrics if available)

  • Accuracy: ~86–88%
  • F1 Score: ~89–91%

Summary

The fine-tuned ALBERT model achieves strong performance on the MRPC benchmark, demonstrating effectiveness at capturing semantic similarity and paraphrase relationships between sentence pairs.
