# AetherMind-KD-Student
**A Robust and Efficient Knowledge-Distilled Model for Natural Language Inference (NLI)**
Repository: **samerzaher80/AetherMind-KD-Student**
License: **MIT**

---

# 📘 Overview
**AetherMind-KD-Student** is a 184M-parameter Natural Language Inference (NLI) model distilled from a DeBERTa-v3 teacher using a multi-stage, adversarial-aware knowledge distillation pipeline.
The model strikes a strong balance of:

- **Accuracy**
- **Robustness**
- **Zero-shot generalization**
- **Inference speed**

This makes it suitable for real-world reasoning systems, scientific text understanding, and future clinical NLI applications.

---

# 🧠 Key Features

### ✔ Knowledge Distillation from Large DeBERTa-v3 Teachers
- Soft targets (KLDivLoss) + hard labels (CrossEntropy)
- Balanced curriculum across SNLI → MNLI → ANLI (teacher-distribution guided)
- Temperature-scaled logits & entropy regularization

### ✔ Strong Zero-Shot Reasoning
The model was **not trained** on RTE, HANS, SciTail, XNLI, FEVER, or MedNLI; despite this, it demonstrates strong zero-shot transfer.

### ✔ High Efficiency
- **184M parameters**
- **308.51 samples/second** on an RTX 3050
- Suitable for deployment and real-time reasoning

### ✔ Robust to Adversarial Attacks
- Strong results on ANLI & HANS
- Reduced reliance on syntactic heuristics

---

# 📚 Training Datasets

### ✔ Used in Training
| Dataset | Purpose |
|---------|---------|
| **SNLI** | Core NLI training |
| **MNLI** | Multi-domain generalization |
| **ANLI R1–R3** | Adversarial robustness (teacher-guided) |

### ✔ Not Used (Zero-Shot Only)
| Dataset | Type | Notes |
|---------|------|-------|
| **RTE (GLUE)** | Textual Entailment | Zero-shot evaluation |
| **HANS** | Syntactic Heuristics Test | Zero-shot |
| **SciTail** | Science QA → NLI | 3-way predictions mapped to binary labels |
| **XNLI English** | Cross-lingual NLI | Zero-shot |

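The SciTail evaluation collapses the model's 3-way predictions into SciTail's binary `entails`/`neutral` scheme. A minimal sketch of that mapping is below; the 0/1/2 label order is an assumption for illustration — in practice read it from the checkpoint's `config.json` (`model.config.id2label`).

```python
def to_scitail_label(probs, id2label=None):
    """Collapse a 3-way NLI prediction to SciTail's binary scheme.

    Anything the model does not call entailment is treated as neutral.
    The default id2label order here is an assumption; use
    model.config.id2label in practice.
    """
    if id2label is None:
        id2label = {0: "entailment", 1: "neutral", 2: "contradiction"}
    pred = max(range(len(probs)), key=lambda i: probs[i])
    return "entails" if id2label[pred] == "entailment" else "neutral"
```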
---

# 🏗 Model Architecture

### **AetherMind-KD-Student Architecture (184M parameters)**
- 12-layer Transformer
- Hidden size: **768**
- Attention heads: **12**
- Classification head: 3-way NLI logits
- Enhanced contradiction representation (teacher-guided)
- Optimized for speed and robustness

---

# 🔥 Knowledge Distillation Strategy

### **KD Loss Composition**
- **70%** KLDivLoss (teacher soft targets)
- **30%** CrossEntropy (ground truth)
- Temperature **T = 3.0**

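A minimal PyTorch sketch of this loss composition, assuming the standard T²-scaled KL formulation; the actual training code may differ in details:

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=3.0, alpha=0.7):
    """70/30 blend of soft-target KL divergence and hard-label cross-entropy."""
    # Soft targets: KL between temperature-scaled student/teacher distributions,
    # rescaled by T^2 so gradient magnitudes stay comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: standard cross-entropy against the gold labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```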
### **Training Enhancements**
- BalancedBatchSampler (equal E/N/C per batch)
- Entropy sharpening for contradiction
- Adversarial signals from the ANLI teacher
- Multi-stage training curriculum
- Gradient norm clipping & AdamW optimizer

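The BalancedBatchSampler idea (equal entailment/neutral/contradiction counts per batch) can be sketched as a plain-Python generator. This is an illustrative simplification, not the sampler used in training; `balanced_batches` and its arguments are assumed names.

```python
import random
from collections import defaultdict

def balanced_batches(labels, batch_size=12, seed=0):
    """Yield index batches with equal counts of each label (e.g. 0=E, 1=N, 2=C)."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, y in enumerate(labels):
        by_label[y].append(idx)
    for pool in by_label.values():
        rng.shuffle(pool)
    per_class = batch_size // len(by_label)
    # Stop when the rarest class is exhausted so every batch stays balanced.
    n_batches = min(len(p) for p in by_label.values()) // per_class
    for b in range(n_batches):
        batch = []
        for pool in by_label.values():
            batch.extend(pool[b * per_class:(b + 1) * per_class])
        rng.shuffle(batch)
        yield batch
```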
---

# 📊 Full Evaluation Results

## **1. Core NLI Benchmarks**

| Dataset | Accuracy | Macro-F1 |
|---------|----------|----------|
| **MNLI (matched)** | **90.47%** | **90.42%** |
| **MNLI (mismatched)** | **90.12%** | **90.07%** |
| **SNLI** | ~89% | ~89% |

---

## **2. Adversarial NLI (ANLI)**

| Dataset | Accuracy | Macro-F1 |
|---------|----------|----------|
| **ANLI R1** | **73.60%** | **73.61%** |
| **ANLI R2** | **57.70%** | **57.60%** |
| **ANLI R3** | **53.67%** | **53.68%** |

---

## **3. Zero-Shot Generalization Results**

### **RTE (GLUE)**
- Accuracy: **86.28%**
- Macro-F1: **86.20%**

### **HANS**
- Accuracy: **77.74%**
- Macro-F1: **76.60%**

### **SciTail (Binary)**
| Split | Accuracy | Macro-F1 |
|-------|----------|----------|
| Train | **82.37%** | **80.99%** |
| Dev | **78.83%** | **78.81%** |

### **XNLI (English, zero-shot)**
- Accuracy: **90.92%**
- Macro-F1: **90.94%**

---

# ⚡ Efficiency Benchmark

| Metric | Result |
|--------|--------|
| Total Parameters | **184,424,451** |
| SPS (samples/sec) | **308.51** |
| Hardware | RTX 3050 (8GB), CUDA 11.8 |

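A samples-per-second figure like the one above comes down to a simple wall-clock measurement. The helper below is an illustrative sketch, not the original benchmark script; `measure_sps` and its arguments are assumed names.

```python
import time

def measure_sps(run_batch, n_batches, batch_size):
    """Wall-clock samples-per-second over n_batches calls of run_batch."""
    start = time.perf_counter()
    for _ in range(n_batches):
        run_batch()  # e.g. model(**batch) under torch.no_grad()
    elapsed = time.perf_counter() - start
    return n_batches * batch_size / elapsed
```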
---

# 🧪 Intended Use

### ✔ Suitable For:
- Reasoning engines
- Scientific text understanding
- Fact verification
- Zero-shot inference setups
- Downstream NLI applications

### ✖ Not Suitable For:
- Safety-critical decisions without human oversight
- Clinical diagnosis (MedNLI was not used in training)
- Multilingual inference (English-only training)

---

# ⚠ Limitations
- ANLI R3 remains challenging (an industry-wide issue)
- No multilingual fine-tuning
- Not optimized for long-context inference

---

# 🔮 Future Work
- Adversarial fine-tuning for ANLI R3
- Cross-lingual training using the full XNLI dataset
- Specialized domain adapters (e.g., MedNLI, BioNLI)
- Integration with the AetherMind memory-based reasoning engine

---

# 📦 Files Included

- `config.json`
- `model.safetensors`
- `tokenizer.json`
- `tokenizer_config.json`
- `special_tokens_map.json`
- `spm.model`
- `added_tokens.json`
- `training_args.bin` *(optional)*
- `trainer_state.json` *(optional)*

---

# 📥 How to Use

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "samerzaher80/AetherMind-KD-Student"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

inputs = tokenizer(
    "A cat sits on the mat.",
    "An animal is sitting.",
    return_tensors="pt",
)

outputs = model(**inputs)
print(outputs.logits)
```
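To turn raw logits into a readable prediction, map the argmax through the checkpoint's `id2label`. The helper below is a sketch; its default label order is an assumption — prefer `model.config.id2label` from the loaded checkpoint.

```python
import torch

def label_prediction(logits, id2label=None):
    """Map a [1, num_labels] logits tensor to (label_name, probability)."""
    if id2label is None:
        # Assumed order -- read model.config.id2label from the checkpoint instead.
        id2label = {0: "entailment", 1: "neutral", 2: "contradiction"}
    probs = logits.softmax(dim=-1).squeeze(0)
    pred = int(probs.argmax())
    return id2label[pred], float(probs[pred])
```

With the snippet above, `label_prediction(outputs.logits)` returns something like `("entailment", 0.97)`.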

---

# 📜 Citation

```
@misc{aethermind2025kdstudent,
  title={AetherMind-KD-Student: A Robust and Efficient Knowledge-Distilled NLI Model},
  author={Sameer S. Najm},
  year={2025},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/samerzaher80/AetherMind-KD-Student}}
}
```

---

# 👤 Author
**Sameer S. Najm**
Sam IT Solutions – Iraq

---

# 🪪 License
**MIT License**