ALBERT-base-v2 Fine-tuned for Semantic Similarity (QQP/MRPC)

Model Details

Model Description

This is a fine-tuned version of albert-base-v2 for paraphrase detection on GLUE tasks such as QQP (Quora Question Pairs) and MRPC (Microsoft Research Paraphrase Corpus).
It can be used to determine whether two sentences are paraphrases (semantically similar) or not.

  • Developed by: Peeyush
  • Model type: Sentence-pair classification (binary: paraphrase vs not paraphrase)
  • Language(s): English
  • License: Apache-2.0
  • Finetuned from model: albert-base-v2

Uses

Direct Use

  • Paraphrase detection: Check if two sentences mean the same thing.
  • Semantic textual similarity: Determine closeness of meaning between two texts.

Downstream Use

  • Duplicate question detection (e.g., Q&A forums like Quora or StackOverflow).
  • Information retrieval (ranking by semantic similarity).
  • Chatbots / Virtual assistants (detecting intent rephrasing).

Out-of-Scope Use

  • Not a generative model → cannot rewrite or generate paraphrases.
  • Not trained on multilingual data → limited to English.

Bias, Risks, and Limitations

  • The model inherits biases from QQP/MRPC (e.g., common question styles, certain domains).
  • May not generalize to informal text, code-mixed text, or specialized domains (e.g., medical, legal).
  • Can misclassify edge cases where semantic similarity is subtle.

Recommendations

  • Always evaluate on your target domain before deployment.
  • For production, consider tuning the decision threshold on the paraphrase probability rather than taking the raw argmax classification (see the sketch below).
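
A minimal threshold-tuning sketch, assuming you have paraphrase probabilities and gold labels from a held-out set in your target domain (the tune_threshold helper and the 0.1–0.9 grid are illustrative choices, not part of the released model):

import numpy as np
from sklearn.metrics import f1_score

def tune_threshold(probs, labels, grid=np.linspace(0.1, 0.9, 81)):
    # Sweep candidate thresholds and keep the one with the best F1
    best_t, best_f1 = 0.5, 0.0
    for t in grid:
        preds = (probs >= t).astype(int)
        score = f1_score(labels, preds)
        if score > best_f1:
            best_t, best_f1 = t, score
    return best_t, best_f1

# probs: array of paraphrase probabilities from predict_paraphrase
# labels: array of gold 0/1 annotations for the same pairs
# threshold, f1 = tune_threshold(probs, labels)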

How to Get Started with the Model

Example usage:

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained('peeyush01/albert-paraphrase-detector')
tokenizer = AutoTokenizer.from_pretrained('peeyush01/albert-paraphrase-detector-tokenizer')
model.eval()

def predict_paraphrase(sentence1, sentence2):
    # Encode the sentence pair jointly so ALBERT sees both segments
    inputs = tokenizer(sentence1, sentence2, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        probs = torch.softmax(logits, dim=1)
        paraphrase_prob = probs[0][1].item()  # index 1 = "paraphrase" class
    return {"Paraphrase": paraphrase_prob, "Not Paraphrase": 1 - paraphrase_prob}

pairs = [
    ("The movie was fantastic!", "The film was amazing!"),
    ("He is playing cricket.", "She is reading a book."),
]

for s1, s2 in pairs:
    result = predict_paraphrase(s1, s2)
    print(f"Sentence 1: {s1}")
    print(f"Sentence 2: {s2}")
    print(f"Result: {result}\n")

Training Details

Training Data

  • Dataset: GLUE MRPC (see the loading sketch after this list)
  • Description: The Microsoft Research Paraphrase Corpus (MRPC) contains pairs of sentences automatically extracted from online news sources, with human annotations indicating whether each pair captures a paraphrase/semantic equivalence relationship.
  • Size: ~3,700 training pairs, 408 validation pairs, 1,725 test pairs.
  • Labels:
    • 1 → Paraphrase (semantically equivalent)
    • 0 → Not paraphrase
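
A minimal loading sketch using the Hugging Face datasets library (the variable name raw_datasets is illustrative):

from datasets import load_dataset

# Load the three MRPC splits: ~3,700 train / 408 validation / 1,725 test pairs
raw_datasets = load_dataset("glue", "mrpc")

print(raw_datasets["train"][0])
# {'sentence1': '...', 'sentence2': '...', 'label': 1, 'idx': 0}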

Training Procedure

Preprocessing

  • Both sentences were tokenized using AlbertTokenizer with truncation and padding (max_length).
  • Columns sentence1, sentence2, and idx were dropped.
  • The label column was renamed from label → labels.
  • Dataset was set to PyTorch format (the full pipeline is sketched below).
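
A sketch of that preprocessing, continuing from the raw_datasets object loaded above; the column names follow the GLUE MRPC schema, everything else is standard datasets/transformers API:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("albert-base-v2")

def tokenize_fn(batch):
    # Encode both sentences as one pair with truncation and max_length padding
    return tokenizer(batch["sentence1"], batch["sentence2"],
                     truncation=True, padding="max_length")

tokenized = raw_datasets.map(tokenize_fn, batched=True)
tokenized = tokenized.remove_columns(["sentence1", "sentence2", "idx"])
tokenized = tokenized.rename_column("label", "labels")
tokenized.set_format("torch")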

Training Hyperparameters

  • Base model: albert-base-v2
  • Epochs: 3
  • Batch size: 16 (train and eval)
  • Optimizer: AdamW (via Hugging Face Trainer)
  • Warmup steps: 600
  • Weight decay: 0.01
  • Evaluation strategy: Per epoch
  • Precision regime: FP32 (a matching Trainer setup is sketched below)
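
A reconstruction of the Trainer setup under these hyperparameters, assuming the tokenized dataset from the preprocessing sketch; output_dir is an illustrative path, not the author's:

from transformers import (AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

model = AutoModelForSequenceClassification.from_pretrained("albert-base-v2", num_labels=2)

args = TrainingArguments(
    output_dir="albert-mrpc",        # illustrative output path
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    warmup_steps=600,
    weight_decay=0.01,
    evaluation_strategy="epoch",     # 'eval_strategy' in newer transformers releases
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
)
trainer.train()  # AdamW is the Trainer's default optimizer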

Speeds, Sizes, Times

  • Training performed with Hugging Face Trainer.
  • Training time: ~20–30 minutes on a single GPU (Tesla T4); longer on CPU.
  • Final checkpoint size: ~47 MB.

Evaluation

Testing Data, Factors & Metrics

Testing Data

  • Evaluation performed on the GLUE MRPC validation set (~408 examples).

Factors

  • Sentence pairs vary in length, syntactic complexity, and semantic overlap.
  • Evaluation primarily captures semantic similarity in short news-style English text.

Metrics

  • Accuracy: percentage of correctly classified sentence pairs.
  • F1 Score: harmonic mean of precision and recall, important because MRPC's classes are imbalanced (both metrics can be computed as sketched below).
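
A minimal sketch using the evaluate library's GLUE MRPC metric, which reports accuracy and F1 together; pass it to the Trainer via its compute_metrics argument:

import numpy as np
import evaluate

metric = evaluate.load("glue", "mrpc")  # bundles accuracy and F1 for MRPC

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)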

Results

(Expected range for ALBERT-base on MRPC; please replace with your actual run metrics if available)

  • Accuracy: ~86–88%
  • F1 Score: ~89–91%

Summary

The fine-tuned ALBERT model achieves strong performance on the MRPC benchmark, demonstrating effectiveness at capturing semantic similarity and paraphrase relationships between sentence pairs.
