ALBERT-base-v2 Fine-tuned for Semantic Similarity (QQP/MRPC)
Model Details
Model Description
This is a fine-tuned version of albert-base-v2 on paraphrase detection tasks such as GLUE-QQP (Quora Question Pairs) and MRPC (Microsoft Research Paraphrase Corpus).
It can be used to determine whether two sentences are paraphrases (semantically similar) or not.
- Developed by: Peeyush
- Model type: Sentence-pair classification (binary: paraphrase vs not paraphrase)
- Language(s): English
- License: Apache-2.0
- Finetuned from model: albert-base-v2
Model Sources [optional]
- Repository: your-username/albert-paraphrase-similarity
- Paper (base model): ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
Uses
Direct Use
- Paraphrase detection: Check if two sentences mean the same thing.
- Semantic textual similarity: Determine closeness of meaning between two texts.
Downstream Use
- Duplicate question detection (e.g., Q&A forums like Quora or StackOverflow).
- Information retrieval (ranking by semantic similarity).
- Chatbots / Virtual assistants (detecting intent rephrasing).
Out-of-Scope Use
- Not a generative model β cannot rewrite or generate paraphrases.
- Not trained on multilingual data β limited to English.
Bias, Risks, and Limitations
- The model inherits biases from QQP/MRPC (e.g., common question styles, certain domains).
- May not generalize to informal text, code-mixed text, or specialized domains (e.g., medical, legal).
- Can misclassify edge cases where semantic similarity is subtle.
Recommendations
- Always evaluate on your target domain before deployment.
- For production, consider threshold-tuning (instead of raw classification).
How to Get Started with the Model
Example usage:
model = AutoModelForSequenceClassification.from_pretrained('peeyush01/albert-paraphrase-detector')
tokenizer = AutoTokenizer.from_pretrained('peeyush01/albert-paraphrase-detector-tokenizer')
def predict_paraphrase(sentence1, sentence2):
inputs = tokenizer(sentence1, sentence2, return_tensors="pt", padding=True, truncation=True)
with torch.no_grad():
outputs = model(**inputs)
logits = outputs.logits
probs = torch.softmax(logits, dim=1)
paraphrase_prob = probs[0][1].item()
return {"Paraphrase": paraphrase_prob, "Not Paraphrase": 1 - paraphrase_prob}
import torch
pairs = [
("The movie was fantastic!", "The film was amazing!"),
("He is playing cricket.", "She is reading a book."),
]
for s1, s2 in pairs:
result = predict_paraphrase(s1, s2)
print(f"Sentence 1: {s1}")
print(f"Sentence 2: {s2}")
print(f"Result: {result}\n")
Training Details
Training Data
- Dataset: GLUE MRPC
- Description: The Microsoft Research Paraphrase Corpus (MRPC) contains pairs of sentences automatically extracted from online news sources, with human annotations indicating whether each pair captures a paraphrase/semantic equivalence relationship.
- Size: ~3,700 training pairs, 408 validation pairs, 1,725 test pairs.
- Labels:
1β Paraphrase (semantically equivalent)0β Not paraphrase
Training Procedure
Preprocessing
- Both sentences were tokenized using AlbertTokenizer with truncation and padding (
max_length). - Columns
sentence1,sentence2, andidxwere dropped. - The label column was renamed from
labelβlabels. - Dataset was set in PyTorch format.
Training Hyperparameters
- Base model:
albert-base-v2 - Epochs: 3
- Batch size: 16 (train and eval)
- Optimizer: AdamW (via Hugging Face
Trainer) - Warmup steps: 600
- Weight decay: 0.01
- Evaluation strategy: Per epoch
- Precision regime: FP32
Speeds, Sizes, Times
- Training performed with Hugging Face
Trainer. - Training time: ~20β30 mins on a single GPU (Tesla T4); longer on CPU.
- Final checkpoint size: ~47 MB.
Evaluation
Testing Data, Factors & Metrics
Testing Data
- Evaluation performed on the GLUE MRPC validation set (~408 examples).
Factors
- Sentence pairs vary in length, syntactic complexity, and semantic overlap.
- Evaluation primarily captures semantic similarity in short news-style English text.
Metrics
- Accuracy: percentage of correctly classified sentence pairs.
- F1 Score: harmonic mean of precision and recall, important due to class imbalance.
Results
(Expected range for ALBERT-base on MRPC β please replace with your actual run metrics if available)
- Accuracy: ~86β88%
- F1 Score: ~89β91%
Summary
The fine-tuned ALBERT model achieves strong performance on the MRPC benchmark, demonstrating effectiveness at capturing semantic similarity and paraphrase relationships between sentence pairs.
Model tree for peeyush01/albert-paraphrase-detector-tokenizer
Base model
albert/albert-base-v2