Upload README_full.md

README_full.md +233 -0

	@@ -0,0 +1,233 @@

+# AetherMind-KD-Student
+**A Robust and Efficient Knowledge-Distilled Model for Natural Language Inference (NLI)**
+Repository: **samerzaher80/AetherMind-KD-Student**
+License: **MIT**
+---
+# 📘 Overview
+**AetherMind-KD-Student** is a 184M-parameter Natural Language Inference (NLI) model distilled from a DeBERTa‑v3 teacher using a multi-stage, adversarial-aware knowledge distillation pipeline.
+The model is designed to balance:
+- **Accuracy**
+- **Robustness**
+- **Zero-shot generalization**
+- **Inference speed**
+This makes it a candidate for real-world reasoning systems, scientific text understanding, and, with further domain adaptation, clinical NLI applications.
+---
+# 🧠 Key Features
+### ✔ Knowledge Distillation from Large DeBERTa-v3 Teachers
+- Soft targets (KLDivLoss) + hard labels (CrossEntropy)
+- Balanced curriculum across SNLI → MNLI → ANLI (teacher-distribution guided)
+- Temperature-scaled logits & entropy regularization
+### ✔ Strong Zero-Shot Reasoning
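+The model was **not trained** on RTE, HANS, SciTail, XNLI, FEVER, or MedNLI.
+Despite this, it transfers well zero-shot: 86.28% accuracy on RTE, 77.74% on HANS, 78.83% on SciTail (dev), and 90.92% on XNLI (English). No FEVER or MedNLI results are reported in this README.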
+### ✔ High Efficiency
+- **184M parameters**
+- **308.51 samples/second** on RTX 3050
+- Suitable for deployment and real-time reasoning
+### ✔ Robust to Adversarial Attacks
+- Strong results on ANLI & HANS
+- Reduced reliance on syntactic heuristics
+---
+# 📚 Training Datasets
+### ✔ Used in Training
+| Dataset | Purpose |
+|--------|----------|
+| **SNLI** | Core NLI training |
+| **MNLI** | Multi-domain generalization |
+| **ANLI R1–R3** | Adversarial robustness (teacher-guided) |
+### ✔ Not Used (Zero-Shot Only)
+| Dataset | Type | Notes |
+|--------|------|--------|
+| **RTE (GLUE)** | Textual Entailment | Zero-shot evaluation |
+| **HANS** | Syntactic Heuristics Test | Zero-shot |
+| **SciTail** | Science QA → NLI | Zero-shot; 3-class predictions collapsed to binary |
+| **XNLI English** | Cross-lingual NLI | Zero-shot |
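SciTail is annotated with two labels (`entails` / `neutral`), so the 3-way output of an NLI model has to be collapsed before scoring. The sketch below shows one plausible mapping. The label order in `ID2LABEL` and the choice to fold contradiction into `neutral` are assumptions for illustration; the README does not state the exact rule that was used.

```python
# Assumed label order for the 3-way head (hypothetical);
# check id2label in config.json before relying on it.
ID2LABEL = {0: "entailment", 1: "neutral", 2: "contradiction"}

def to_binary(pred_id):
    """Collapse a 3-way NLI prediction to SciTail's two labels."""
    # Assumption: anything that is not entailment counts as "neutral".
    return "entails" if ID2LABEL[pred_id] == "entailment" else "neutral"

preds = [0, 1, 2, 0]                     # toy argmax predictions
binary = [to_binary(p) for p in preds]
print(binary)  # ['entails', 'neutral', 'neutral', 'entails']
```

An alternative is to sum the neutral and contradiction probabilities before taking the argmax. The two rules can disagree when entailment is the single largest class but holds less than half of the probability mass.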
+---
+# 🏗 Model Architecture
+### **AetherMind-KD-Student Architecture (184M parameters)**
+- 12-layer Transformer
+- Hidden size: **768**
+- Attention heads: **12**
+- Classification head: 3-way NLI logits
+- Enhanced contradiction representation (teacher-guided)
+- Optimized for speed and robustness
+---
+# 🔥 Knowledge Distillation Strategy
+### **KD Loss Composition**
+- **70%** KLDivLoss (teacher soft targets)
+- **30%** CrossEntropy (ground truth)
+- Temperature **T = 3.0**
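As a rough illustration of how the 70/30 weighting and the temperature fit together, here is a minimal single-example sketch in plain Python. The function names are hypothetical, and the `T**2` rescaling of the KL term follows the common distillation convention; the README does not say whether the actual training code applies it.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, with temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, label, alpha=0.7, temperature=3.0):
    """Weighted sum of a soft-target KL term and a hard-label CE term."""
    # Soft targets: both distributions are computed at temperature T.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student), the quantity a KLDivLoss-style term measures.
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    # Assumption: rescale by T^2 so gradient magnitudes stay comparable.
    kl *= temperature ** 2
    # Hard labels: ordinary cross-entropy at temperature 1.
    ce = -math.log(softmax(student_logits)[label])
    return alpha * kl + (1.0 - alpha) * ce

loss = kd_loss([2.0, 0.5, -1.0], [3.0, 0.2, -2.0], label=0)
print(f"{loss:.4f}")
```

When student and teacher logits are identical the KL term vanishes, and the loss reduces to 0.3 times the cross-entropy.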
+### **Training Enhancements**
+- BalancedBatchSampler (equal E/N/C per batch)
+- Entropy sharpening for contradiction
+- Adversarial signals from ANLI teacher
+- Multi-stage training curriculum
+- Gradient norm clipping & AdamW optimizer
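The "equal E/N/C per batch" idea can be sketched as an index generator. This is not the project's `BalancedBatchSampler`, whose implementation is not shown in the README; `balanced_batches` and its parameters are hypothetical names used only to illustrate the technique.

```python
import random
from collections import defaultdict

def balanced_batches(labels, per_class, seed=0):
    """Yield lists of dataset indices with `per_class` examples of each label."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)
    for indices in by_label.values():
        rng.shuffle(indices)
    # The rarest class bounds the number of full balanced batches.
    n_batches = min(len(v) for v in by_label.values()) // per_class
    for b in range(n_batches):
        batch = []
        for indices in by_label.values():
            batch.extend(indices[b * per_class:(b + 1) * per_class])
        rng.shuffle(batch)  # avoid a fixed class order inside the batch
        yield batch

labels = ["E", "N", "C"] * 4 + ["E"] * 3   # toy labels, entailment-heavy
batches = list(balanced_batches(labels, per_class=2))
print(len(batches), [len(b) for b in batches])
```

In this sketch, surplus examples of the majority class are simply dropped for the epoch. Oversampling the minority classes instead is an equally valid design.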
+---
+# 📊 Full Evaluation Results
+## **1. Core NLI Benchmarks**
+| Dataset | Accuracy | Macro-F1 |
+|--------|----------|----------|
+| **MNLI (matched)** | **90.47%** | **90.42%** |
+| **MNLI (mismatched)** | **90.12%** | **90.07%** |
+| **SNLI** | ~89% | ~89% |
+---
+## **2. Adversarial NLI (ANLI)**
+| Dataset | Accuracy | Macro-F1 |
+|--------|----------|-----------|
+| **ANLI R1** | **73.60%** | **73.61%** |
+| **ANLI R2** | **57.70%** | **57.60%** |
+| **ANLI R3** | **53.67%** | **53.68%** |
+---
+## **3. Zero-Shot Generalization Results**
+### **RTE (GLUE)**
+- Accuracy: **86.28%**
+- Macro-F1: **86.20%**
+### **HANS**
+- Accuracy: **77.74%**
+- Macro-F1: **76.60%**
+### **SciTail (Binary)**
+| Split | Accuracy | Macro-F1 |
+|-------|----------|-----------|
+| Train | **82.37%** | **80.99%** |
+| Dev | **78.83%** | **78.81%** |
+### **XNLI (English, zero-shot)**
+- Accuracy: **90.92%**
+- Macro-F1: **90.94%**
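The tables in this section report accuracy and macro-F1 side by side. As a reminder of how the two differ, here is a minimal sketch that computes both from label lists. The toy labels are invented for illustration and are not model outputs.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes that appear."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

y_true = ["E", "E", "N", "C", "C", "N"]   # toy gold labels
y_pred = ["E", "N", "N", "C", "E", "N"]   # toy predictions
print(f"acc={accuracy(y_true, y_pred):.4f} macro_f1={macro_f1(y_true, y_pred):.4f}")
```

Because macro-F1 weights every class equally, it drops below accuracy when a model does poorly on a minority class. That is why the gap between the two numbers is worth checking on imbalanced sets such as SciTail.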
+---
+# ⚡ Efficiency Benchmark
+| Metric | Result |
+|--------|--------|
+| Total Parameters | **184,424,451** |
+| SPS (samples/sec) | **308.51** |
+| Hardware | RTX 3050 (8GB), CUDA 11.8 |
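A samples-per-second figure like the one above is typically obtained by timing inference over a fixed set of batches. The sketch below shows the general pattern with a stand-in `dummy_predict` function (hypothetical). The batch size, sequence length, and precision behind the 308.51 figure are not stated in the README.

```python
import time

def samples_per_second(predict_fn, batches, warmup=1):
    """Time `predict_fn` over `batches` and return throughput in samples/sec."""
    for batch in batches[:warmup]:
        predict_fn(batch)              # warm-up runs are excluded from timing
    start = time.perf_counter()
    n = 0
    for batch in batches:
        predict_fn(batch)
        n += len(batch)
    elapsed = time.perf_counter() - start
    return n / max(elapsed, 1e-12)     # guard against a zero-length interval

# Stand-in for a real forward pass (hypothetical); replace with model inference.
def dummy_predict(batch):
    return [len(premise) + len(hypothesis) for premise, hypothesis in batch]

batches = [[("A cat sits on the mat.", "An animal is sitting.")] * 32] * 10
sps = samples_per_second(dummy_predict, batches)
print(f"{sps:.2f} samples/sec")
```

When timing a GPU model, the device should be synchronized (for example with `torch.cuda.synchronize()`) before reading the clock, because CUDA kernels run asynchronously.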
+---
+# 🧪 Intended Use
+### ✔ Suitable For:
+- Reasoning engines
+- Scientific text understanding
+- Fact verification
+- Zero-shot inference setups
+- Downstream NLI applications
+### ✖ Not Suitable For:
+- Safety-critical decisions without human oversight
+- Clinical diagnosis (MedNLI not used in training)
+- Multilingual inference (English-only training)
+---
+# ⚠ Limitations
+- ANLI R3 remains challenging (53.67% accuracy); this round is hard for NLI models in general
+- No multilingual fine-tuning
+- Not optimized for long-context inference
+---
+# 🔮 Future Work
+- Adversarial fine-tuning for ANLI R3
+- Cross-lingual training using XNLI full dataset
+- Specialized domain adapters (e.g., MedNLI, BioNLI)
+- Integration with AetherMind memory-based reasoning engine
+---
+# 📦 Files Included
+- `config.json`
+- `model.safetensors`
+- `tokenizer.json`
+- `tokenizer_config.json`
+- `special_tokens_map.json`
+- `spm.model`
+- `added_tokens.json`
+- `training_args.bin` *(optional)*
+- `trainer_state.json` *(optional)*
+---
+# 📥 How to Use
+```python
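+import torch
+from transformers import AutoTokenizer, AutoModelForSequenceClassification
+
+model_name = "samerzaher80/AetherMind-KD-Student"
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForSequenceClassification.from_pretrained(model_name)
+
+# Encode the premise and hypothesis as a single sentence pair.
+inputs = tokenizer("A cat sits on the mat.",
+                   "An animal is sitting.",
+                   return_tensors="pt")
+with torch.no_grad():  # gradients are not needed at inference time
+    outputs = model(**inputs)
+print(outputs.logits)  # raw 3-way NLI logits
+# Label names come from the checkpoint's config (may be generic LABEL_i).
+print(model.config.id2label[outputs.logits.argmax(dim=-1).item()])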
+```
+---
+# 📜 Citation
+```
+@misc{aethermind2025kdstudent,
+  title={AetherMind-KD-Student: A Robust and Efficient Knowledge-Distilled NLI Model},
+  author={Sameer S. Najm},
+  year={2025},
+  publisher={Hugging Face},
+  howpublished={\url{https://huggingface.co/samerzaher80/AetherMind-KD-Student}}
+}
+```
+---
+# 👤 Author
+**Sameer S. Najm**
+Sam IT Solutions – Iraq
+---
+# 🪪 License
+**MIT License**