Sarthak committed on
Commit d8a91f9 · 1 Parent(s): 4afb5eb

refactor(report): reflect distillation experiments data in report


This update reorders the models, adjusts average performance metrics, and revises recommendations based on the distillation experiments.

The aim is to provide an accurate, up-to-date reflection of our findings and to guide readers toward the best models for their use cases.

Files changed (1)
  1. REPORT.md +45 -38
REPORT.md CHANGED
@@ -6,9 +6,9 @@ This report presents a comprehensive analysis of Model2Vec distillation experime
 
 ### Evaluated Models Overview
 
-**Simplified Distillation Models:** 13
+**Simplified Distillation Models:** 14
 **Peer Comparison Models:** 19
-**Total Models Analyzed:** 32
+**Total Models Analyzed:** 33
 
 ### Best Performing Simplified Model: code_model2vec_all_mpnet_base_v2
 
@@ -28,15 +28,16 @@ This report presents a comprehensive analysis of Model2Vec distillation experime
 | code_model2vec_all_MiniLM_L6_v2 | [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) | 0.7385 | 0.7049 | 0.7910 | 🥈 2nd |
 | code_model2vec_jina_embeddings_v2_base_code | [jina-embeddings-v2-base-code](https://huggingface.co/jina-embeddings-v2-base-code) | 0.7381 | 0.6996 | 0.8130 | 🥉 3rd |
 | code_model2vec_paraphrase_MiniLM_L6_v2 | [sentence-transformers/paraphrase-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/paraphrase-MiniLM-L6-v2) | 0.7013 | 0.6638 | 0.7665 | #4 |
-| code_model2vec_Reason_ModernColBERT | [lightonai/Reason-ModernColBERT](https://huggingface.co/lightonai/Reason-ModernColBERT) | 0.6598 | 0.6228 | 0.7260 | #5 |
-| code_model2vec_bge_m3 | [BAAI/bge-m3](https://huggingface.co/BAAI/bge-m3) | 0.4863 | 0.4439 | 0.5514 | #6 |
-| code_model2vec_jina_embeddings_v3 | [jinaai/jina-embeddings-v3](https://huggingface.co/jinaai/jina-embeddings-v3) | 0.4755 | 0.4416 | 0.5456 | #7 |
-| code_model2vec_nomic_embed_text_v2_moe | [nomic-ai/nomic-embed-text-v2-moe](https://huggingface.co/nomic-ai/nomic-embed-text-v2-moe) | 0.4532 | 0.4275 | 0.5094 | #8 |
-| code_model2vec_gte_Qwen2_1.5B_instruct | [Alibaba-NLP/gte-Qwen2-1.5B-instruct](https://huggingface.co/Alibaba-NLP/gte-Qwen2-1.5B-instruct) | 0.4238 | 0.3879 | 0.4719 | #9 |
-| code_model2vec_Qodo_Embed_1_1.5B | [Qodo/Qodo-Embed-1-1.5B](https://huggingface.co/Qodo/Qodo-Embed-1-1.5B) | 0.4101 | 0.3810 | 0.4532 | #10 |
-| code_model2vec_graphcodebert_base | [microsoft/codebert-base](https://huggingface.co/microsoft/codebert-base) | 0.3420 | 0.3140 | 0.3704 | #11 |
-| code_model2vec_Linq_Embed_Mistral | [Linq-AI-Research/Linq-Embed-Mistral](https://huggingface.co/Linq-AI-Research/Linq-Embed-Mistral) | 0.2868 | 0.2581 | 0.3412 | #12 |
-| code_model2vec_codebert_base | [microsoft/codebert-base](https://huggingface.co/microsoft/codebert-base) | 0.2779 | 0.2534 | 0.3136 | #13 |
+| code_model2vec_all_mpnet_base_v2_fine_tuned | [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) | 0.6906 | 0.6372 | 0.7917 | #5 |
+| code_model2vec_Reason_ModernColBERT | [lightonai/Reason-ModernColBERT](https://huggingface.co/lightonai/Reason-ModernColBERT) | 0.6598 | 0.6228 | 0.7260 | #6 |
+| code_model2vec_bge_m3 | [BAAI/bge-m3](https://huggingface.co/BAAI/bge-m3) | 0.4863 | 0.4439 | 0.5514 | #7 |
+| code_model2vec_jina_embeddings_v3 | [jinaai/jina-embeddings-v3](https://huggingface.co/jinaai/jina-embeddings-v3) | 0.4755 | 0.4416 | 0.5456 | #8 |
+| code_model2vec_nomic_embed_text_v2_moe | [nomic-ai/nomic-embed-text-v2-moe](https://huggingface.co/nomic-ai/nomic-embed-text-v2-moe) | 0.4532 | 0.4275 | 0.5094 | #9 |
+| code_model2vec_gte_Qwen2_1.5B_instruct | [Alibaba-NLP/gte-Qwen2-1.5B-instruct](https://huggingface.co/Alibaba-NLP/gte-Qwen2-1.5B-instruct) | 0.4238 | 0.3879 | 0.4719 | #10 |
+| code_model2vec_Qodo_Embed_1_1.5B | [Qodo/Qodo-Embed-1-1.5B](https://huggingface.co/Qodo/Qodo-Embed-1-1.5B) | 0.4101 | 0.3810 | 0.4532 | #11 |
+| code_model2vec_graphcodebert_base | [microsoft/codebert-base](https://huggingface.co/microsoft/codebert-base) | 0.3420 | 0.3140 | 0.3704 | #12 |
+| code_model2vec_Linq_Embed_Mistral | [Linq-AI-Research/Linq-Embed-Mistral](https://huggingface.co/Linq-AI-Research/Linq-Embed-Mistral) | 0.2868 | 0.2581 | 0.3412 | #13 |
+| code_model2vec_codebert_base | [microsoft/codebert-base](https://huggingface.co/microsoft/codebert-base) | 0.2779 | 0.2534 | 0.3136 | #14 |
 
 
 ### 📊 Model Specifications Analysis
@@ -49,6 +50,7 @@ Our distilled models exhibit consistent architectural characteristics across dif
 | all_MiniLM_L6_v2 | 29,525 | 7.6M | 256 | 14.4MB |
 | jina_embeddings_v2_base_code | 61,053 | 15.6M | 256 | 29.8MB |
 | paraphrase_MiniLM_L6_v2 | 29,525 | 7.6M | 256 | 14.4MB |
+| all_mpnet_base_v2_fine_tuned | 77,316 | 19.8M | 256 | 75.5MB |
 | Reason_ModernColBERT | 50,254 | 12.9M | 256 | 24.5MB |
 | bge_m3 | 249,999 | 64.0M | 256 | 122.1MB |
 | jina_embeddings_v3 | 249,999 | 64.0M | 256 | 122.1MB |
@@ -67,9 +69,9 @@ Our distilled models exhibit consistent architectural characteristics across dif
 #### Key Insights from Model Specifications:
 
 
-- **Vocabulary Consistency**: All models use vocabulary sizes ranging from 29,525 to 249,999 tokens (avg: 106,592)
-- **Parameter Efficiency**: Models range from 7.6M to 64.0M parameters (avg: 27.3M)
-- **Storage Efficiency**: Disk usage ranges from 14.4MB to 122.1MB (avg: 52.0MB)
+- **Vocabulary Consistency**: All models use vocabulary sizes ranging from 29,525 to 249,999 tokens (avg: 104,501)
+- **Parameter Efficiency**: Models range from 7.6M to 64.0M parameters (avg: 26.8M)
+- **Storage Efficiency**: Disk usage ranges from 14.4MB to 122.1MB (avg: 53.7MB)
 - **Embedding Dimensions**: Consistent 256 dimensions across all models (optimized for efficiency)
 
 
@@ -79,7 +81,7 @@ Our distilled models exhibit consistent architectural characteristics across dif
 - **Best Teacher Model**: code_model2vec_all_mpnet_base_v2 (NDCG@10: 0.7387)
 - **Least Effective Teacher**: code_model2vec_codebert_base (NDCG@10: 0.2779)
 - **Performance Range**: 62.4% difference between best and worst
-- **Average Performance**: 0.5178 NDCG@10
+- **Average Performance**: 0.5302 NDCG@10
 
 
 ## 🎯 Language Performance Radar Charts
@@ -108,6 +110,10 @@ Our distilled models exhibit consistent architectural characteristics across dif
 
 ![code_model2vec_paraphrase_MiniLM_L6_v2 Radar Chart](analysis_charts/radar_code_model2vec_paraphrase_MiniLM_L6_v2.png)
 
+#### code_model2vec_all_mpnet_base_v2_fine_tuned (Teacher: [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2)) - NDCG@10: 0.6906
+
+![code_model2vec_all_mpnet_base_v2_fine_tuned Radar Chart](analysis_charts/radar_code_model2vec_all_mpnet_base_v2_fine_tuned.png)
+
 #### code_model2vec_Reason_ModernColBERT (Teacher: [lightonai/Reason-ModernColBERT](https://huggingface.co/lightonai/Reason-ModernColBERT)) - NDCG@10: 0.6598
 
 ![code_model2vec_Reason_ModernColBERT Radar Chart](analysis_charts/radar_code_model2vec_Reason_ModernColBERT.png)
@@ -174,20 +180,21 @@ Our distilled models exhibit consistent architectural characteristics across dif
 | 16 | code_model2vec_all_MiniLM_L6_v2 | **🔥 Simplified Distillation** | 0.7385 | 0.7049 | 0.7910 |
 | 17 | code_model2vec_jina_embeddings_v2_base_code | **🔥 Simplified Distillation** | 0.7381 | 0.6996 | 0.8130 |
 | 18 | code_model2vec_paraphrase_MiniLM_L6_v2 | **🔥 Simplified Distillation** | 0.7013 | 0.6638 | 0.7665 |
-| 19 | code_model2vec_Reason_ModernColBERT | **🔥 Simplified Distillation** | 0.6598 | 0.6228 | 0.7260 |
-| 20 | potion-multilingual-128M | Model2Vec | 0.6124 | 0.5683 | 0.7017 |
-| 21 | huggingface/CodeBERTa-small-v1 | Code-Specific | 0.5903 | 0.5350 | 0.6779 |
-| 22 | Salesforce/codet5-base | Code-Specific | 0.4872 | 0.4500 | 0.5742 |
-| 23 | code_model2vec_bge_m3 | **🔥 Simplified Distillation** | 0.4863 | 0.4439 | 0.5514 |
-| 24 | code_model2vec_jina_embeddings_v3 | **🔥 Simplified Distillation** | 0.4755 | 0.4416 | 0.5456 |
-| 25 | code_model2vec_nomic_embed_text_v2_moe | **🔥 Simplified Distillation** | 0.4532 | 0.4275 | 0.5094 |
-| 26 | code_model2vec_gte_Qwen2_1.5B_instruct | **🔥 Simplified Distillation** | 0.4238 | 0.3879 | 0.4719 |
-| 27 | code_model2vec_Qodo_Embed_1_1.5B | **🔥 Simplified Distillation** | 0.4101 | 0.3810 | 0.4532 |
-| 28 | microsoft/graphcodebert-base | Code-Specific | 0.4039 | 0.3677 | 0.4650 |
-| 29 | code_model2vec_graphcodebert_base | **🔥 Simplified Distillation** | 0.3420 | 0.3140 | 0.3704 |
-| 30 | code_model2vec_Linq_Embed_Mistral | **🔥 Simplified Distillation** | 0.2868 | 0.2581 | 0.3412 |
-| 31 | code_model2vec_codebert_base | **🔥 Simplified Distillation** | 0.2779 | 0.2534 | 0.3136 |
-| 32 | microsoft/codebert-base | Code-Specific | 0.1051 | 0.1058 | 0.1105 |
+| 19 | code_model2vec_all_mpnet_base_v2_fine_tuned | **🎓 Fine-tuned Distillation** | 0.6906 | 0.6372 | 0.7917 |
+| 20 | code_model2vec_Reason_ModernColBERT | **🔥 Simplified Distillation** | 0.6598 | 0.6228 | 0.7260 |
+| 21 | potion-multilingual-128M | Model2Vec | 0.6124 | 0.5683 | 0.7017 |
+| 22 | huggingface/CodeBERTa-small-v1 | Code-Specific | 0.5903 | 0.5350 | 0.6779 |
+| 23 | Salesforce/codet5-base | Code-Specific | 0.4872 | 0.4500 | 0.5742 |
+| 24 | code_model2vec_bge_m3 | **🔥 Simplified Distillation** | 0.4863 | 0.4439 | 0.5514 |
+| 25 | code_model2vec_jina_embeddings_v3 | **🔥 Simplified Distillation** | 0.4755 | 0.4416 | 0.5456 |
+| 26 | code_model2vec_nomic_embed_text_v2_moe | **🔥 Simplified Distillation** | 0.4532 | 0.4275 | 0.5094 |
+| 27 | code_model2vec_gte_Qwen2_1.5B_instruct | **🔥 Simplified Distillation** | 0.4238 | 0.3879 | 0.4719 |
+| 28 | code_model2vec_Qodo_Embed_1_1.5B | **🔥 Simplified Distillation** | 0.4101 | 0.3810 | 0.4532 |
+| 29 | microsoft/graphcodebert-base | Code-Specific | 0.4039 | 0.3677 | 0.4650 |
+| 30 | code_model2vec_graphcodebert_base | **🔥 Simplified Distillation** | 0.3420 | 0.3140 | 0.3704 |
+| 31 | code_model2vec_Linq_Embed_Mistral | **🔥 Simplified Distillation** | 0.2868 | 0.2581 | 0.3412 |
+| 32 | code_model2vec_codebert_base | **🔥 Simplified Distillation** | 0.2779 | 0.2534 | 0.3136 |
+| 33 | microsoft/codebert-base | Code-Specific | 0.1051 | 0.1058 | 0.1105 |
 
 
 ## 📈 Performance Analysis
@@ -236,12 +243,12 @@ Our distilled models exhibit consistent architectural characteristics across dif
 
 | Language | Best Model Performance | Average Performance | Language Difficulty |
 |----------|------------------------|--------------------|--------------------|
-| Go | 0.9780 | 0.6950 | Easy |
-| Java | 0.9921 | 0.6670 | Easy |
-| Javascript | 0.9550 | 0.5847 | Easy |
-| Php | 1.0000 | 0.6379 | Easy |
-| Python | 1.0000 | 0.8604 | Easy |
-| Ruby | 0.9493 | 0.6372 | Easy |
+| Go | 0.9780 | 0.6978 | Easy |
+| Java | 0.9921 | 0.6618 | Easy |
+| Javascript | 0.9550 | 0.5877 | Easy |
+| Php | 1.0000 | 0.6355 | Easy |
+| Python | 1.0000 | 0.8615 | Easy |
+| Ruby | 0.9493 | 0.6398 | Easy |
 
 
 ## 🎯 Conclusions and Recommendations
@@ -251,13 +258,13 @@ Our distilled models exhibit consistent architectural characteristics across dif
 Based on the evaluation results across all simplified distillation models:
 
 
-1. **Best Teacher Model**: sentence-transformers/all-mpnet-base-v2 (NDCG@10: 0.7387)
+1. **Best Teacher Model**: sentence-transformers/all-MiniLM-L6-v2 (NDCG@10: 0.7385)
 2. **Least Effective Teacher**: microsoft/codebert-base (NDCG@10: 0.2779)
 3. **Teacher Model Impact**: Choice of teacher model affects performance by 62.4%
 
 ### Recommendations
 
-- **For Production**: Use sentence-transformers/all-mpnet-base-v2 as teacher model for best performance
+- **For Production**: Use sentence-transformers/all-MiniLM-L6-v2 as teacher model for best performance
 - **For Efficiency**: Model2Vec distillation provides significant size reduction with competitive performance
 - **For Code Tasks**: Specialized models consistently outperform general-purpose models
 
@@ -295,5 +302,5 @@ Based on the evaluation results across all simplified distillation models:
 
 ---
 
-*Report generated on 2025-05-31 11:39:39 using automated analysis pipeline.*
+*Report generated on 2025-05-31 16:36:16 using automated analysis pipeline.*
 *For questions about methodology or results, please refer to the CodeSearchNet documentation.*
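
For readers who want to reproduce the kind of distillation step behind these `code_model2vec_*` models: the revised recommendations point to sentence-transformers/all-MiniLM-L6-v2 as a strong teacher. The commit does not include the distillation code itself, so the following is only a minimal sketch, assuming the MinishLab `model2vec` package's `distill` API; the output directory name is illustrative.

```python
# Minimal sketch of a Model2Vec distillation step, assuming the `model2vec` package.
# The teacher is the one named in the report's recommendations; the output path is illustrative.
from model2vec.distill import distill

m2v_model = distill(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    pca_dims=256,  # matches the 256-dimensional embeddings in the specifications table
)

# Save the static model; it can later be reloaded with StaticModel.from_pretrained(...).
m2v_model.save_pretrained("code_model2vec_all_MiniLM_L6_v2")
```

The saved artifact is a static (lookup-and-pool) embedder, which is where the size and latency reductions relative to the teacher come from.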
 
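All ranking tables in the diff report NDCG@10 as the headline metric, alongside MRR and Recall@5. For reference, here is a small self-contained sketch of NDCG@k with binary relevance, the usual setup for CodeSearchNet-style code retrieval; it is illustrative only, not the report's evaluation pipeline.

```python
import numpy as np

def ndcg_at_k(relevance: list[float], k: int = 10) -> float:
    """NDCG@k for a single query; `relevance` holds gains in ranked order."""
    rel = np.asarray(relevance, dtype=float)[:k]
    if rel.size == 0:
        return 0.0
    discounts = 1.0 / np.log2(np.arange(2, rel.size + 2))
    dcg = float(np.sum(rel * discounts))
    ideal = np.sort(np.asarray(relevance, dtype=float))[::-1][:k]
    idcg = float(np.sum(ideal * discounts[: ideal.size]))
    return dcg / idcg if idcg > 0 else 0.0

# Example: the single relevant code snippet is retrieved at rank 3.
print(ndcg_at_k([0, 0, 1, 0, 0, 0, 0, 0, 0, 0]))  # 0.5
```

With one relevant snippet per query, NDCG@10 is 1.0 when that snippet is ranked first and decays logarithmically with its rank, which is what the per-model averages above summarize.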