Sarthak committed · Commit d8a91f9 · Parent(s): 4afb5eb

refactor(report): reflect distillation experiments data in report

This update reorders the models, adjusts average performance metrics, and revises recommendations based on the distillation experiments. The aim is to provide an accurate and up-to-date reflection of our findings, guiding the audience towards the best models for their use cases.
REPORT.md CHANGED
@@ -6,9 +6,9 @@ This report presents a comprehensive analysis of Model2Vec distillation experime

### Evaluated Models Overview

-**Simplified Distillation Models:**
**Peer Comparison Models:** 19
-**Total Models Analyzed:**

### Best Performing Simplified Model: code_model2vec_all_mpnet_base_v2
@@ -28,15 +28,16 @@ This report presents a comprehensive analysis of Model2Vec distillation experime

| code_model2vec_all_MiniLM_L6_v2 | [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) | 0.7385 | 0.7049 | 0.7910 | 🥈 2nd |
| code_model2vec_jina_embeddings_v2_base_code | [jina-embeddings-v2-base-code](https://huggingface.co/jina-embeddings-v2-base-code) | 0.7381 | 0.6996 | 0.8130 | 🥉 3rd |
| code_model2vec_paraphrase_MiniLM_L6_v2 | [sentence-transformers/paraphrase-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/paraphrase-MiniLM-L6-v2) | 0.7013 | 0.6638 | 0.7665 | #4 |
-
-
-
-
-
-
-
-
-

### Model Specifications Analysis
@@ -49,6 +50,7 @@ Our distilled models exhibit consistent architectural characteristics across dif

| all_MiniLM_L6_v2 | 29,525 | 7.6M | 256 | 14.4MB |
| jina_embeddings_v2_base_code | 61,053 | 15.6M | 256 | 29.8MB |
| paraphrase_MiniLM_L6_v2 | 29,525 | 7.6M | 256 | 14.4MB |
| Reason_ModernColBERT | 50,254 | 12.9M | 256 | 24.5MB |
| bge_m3 | 249,999 | 64.0M | 256 | 122.1MB |
| jina_embeddings_v3 | 249,999 | 64.0M | 256 | 122.1MB |
@@ -67,9 +69,9 @@ Our distilled models exhibit consistent architectural characteristics across dif

#### Key Insights from Model Specifications:

-- **Vocabulary Consistency**: All models use vocabulary sizes ranging from 29,525 to 249,999 tokens (avg:
-- **Parameter Efficiency**: Models range from 7.6M to 64.0M parameters (avg:
-- **Storage Efficiency**: Disk usage ranges from 14.4MB to 122.1MB (avg:
- **Embedding Dimensions**: Consistent 256 dimensions across all models (optimized for efficiency)
@@ -79,7 +81,7 @@ Our distilled models exhibit consistent architectural characteristics across dif

- **Best Teacher Model**: code_model2vec_all_mpnet_base_v2 (NDCG@10: 0.7387)
- **Least Effective Teacher**: code_model2vec_codebert_base (NDCG@10: 0.2779)
- **Performance Range**: 62.4% difference between best and worst
-- **Average Performance**: 0.

## Language Performance Radar Charts
@@ -108,6 +110,10 @@ Our distilled models exhibit consistent architectural characteristics across dif

 

#### code_model2vec_Reason_ModernColBERT (Teacher: [lightonai/Reason-ModernColBERT](https://huggingface.co/lightonai/Reason-ModernColBERT)) - NDCG@10: 0.6598

 
@@ -174,20 +180,21 @@ Our distilled models exhibit consistent architectural characteristics across dif

| 16 | code_model2vec_all_MiniLM_L6_v2 | **Simplified Distillation** | 0.7385 | 0.7049 | 0.7910 |
| 17 | code_model2vec_jina_embeddings_v2_base_code | **Simplified Distillation** | 0.7381 | 0.6996 | 0.8130 |
| 18 | code_model2vec_paraphrase_MiniLM_L6_v2 | **Simplified Distillation** | 0.7013 | 0.6638 | 0.7665 |
-| 19 |
-| 20 |
-| 21 |
-| 22 |
-| 23 |
-| 24 |
-| 25 |
-| 26 |
-| 27 |
-| 28 |
-| 29 |
-| 30 |
-| 31 |
-| 32 |

## Performance Analysis
@@ -236,12 +243,12 @@ Our distilled models exhibit consistent architectural characteristics across dif

| Language | Best Model Performance | Average Performance | Language Difficulty |
|----------|------------------------|--------------------|--------------------|
-| Go | 0.9780 | 0.
-| Java | 0.9921 | 0.
-| Javascript | 0.9550 | 0.
-| Php | 1.0000 | 0.
-| Python | 1.0000 | 0.
-| Ruby | 0.9493 | 0.

## Conclusions and Recommendations
@@ -251,13 +258,13 @@ Our distilled models exhibit consistent architectural characteristics across dif

Based on the evaluation results across all simplified distillation models:

-1. **Best Teacher Model**: sentence-transformers/all-
2. **Least Effective Teacher**: microsoft/codebert-base (NDCG@10: 0.2779)
3. **Teacher Model Impact**: Choice of teacher model affects performance by 62.4%

### Recommendations

-- **For Production**: Use sentence-transformers/all-
- **For Efficiency**: Model2Vec distillation provides significant size reduction with competitive performance
- **For Code Tasks**: Specialized models consistently outperform general-purpose models
@@ -295,5 +302,5 @@ Based on the evaluation results across all simplified distillation models:

---

-*Report generated on 2025-05-31
*For questions about methodology or results, please refer to the CodeSearchNet documentation.*
### Evaluated Models Overview

+**Simplified Distillation Models:** 14
**Peer Comparison Models:** 19
+**Total Models Analyzed:** 33

### Best Performing Simplified Model: code_model2vec_all_mpnet_base_v2
| code_model2vec_all_MiniLM_L6_v2 | [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) | 0.7385 | 0.7049 | 0.7910 | 🥈 2nd |
| code_model2vec_jina_embeddings_v2_base_code | [jina-embeddings-v2-base-code](https://huggingface.co/jina-embeddings-v2-base-code) | 0.7381 | 0.6996 | 0.8130 | 🥉 3rd |
| code_model2vec_paraphrase_MiniLM_L6_v2 | [sentence-transformers/paraphrase-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/paraphrase-MiniLM-L6-v2) | 0.7013 | 0.6638 | 0.7665 | #4 |
+| code_model2vec_all_mpnet_base_v2_fine_tuned | [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) | 0.6906 | 0.6372 | 0.7917 | #5 |
+| code_model2vec_Reason_ModernColBERT | [lightonai/Reason-ModernColBERT](https://huggingface.co/lightonai/Reason-ModernColBERT) | 0.6598 | 0.6228 | 0.7260 | #6 |
+| code_model2vec_bge_m3 | [BAAI/bge-m3](https://huggingface.co/BAAI/bge-m3) | 0.4863 | 0.4439 | 0.5514 | #7 |
+| code_model2vec_jina_embeddings_v3 | [jinaai/jina-embeddings-v3](https://huggingface.co/jinaai/jina-embeddings-v3) | 0.4755 | 0.4416 | 0.5456 | #8 |
+| code_model2vec_nomic_embed_text_v2_moe | [nomic-ai/nomic-embed-text-v2-moe](https://huggingface.co/nomic-ai/nomic-embed-text-v2-moe) | 0.4532 | 0.4275 | 0.5094 | #9 |
+| code_model2vec_gte_Qwen2_1.5B_instruct | [Alibaba-NLP/gte-Qwen2-1.5B-instruct](https://huggingface.co/Alibaba-NLP/gte-Qwen2-1.5B-instruct) | 0.4238 | 0.3879 | 0.4719 | #10 |
+| code_model2vec_Qodo_Embed_1_1.5B | [Qodo/Qodo-Embed-1-1.5B](https://huggingface.co/Qodo/Qodo-Embed-1-1.5B) | 0.4101 | 0.3810 | 0.4532 | #11 |
+| code_model2vec_graphcodebert_base | [microsoft/graphcodebert-base](https://huggingface.co/microsoft/graphcodebert-base) | 0.3420 | 0.3140 | 0.3704 | #12 |
+| code_model2vec_Linq_Embed_Mistral | [Linq-AI-Research/Linq-Embed-Mistral](https://huggingface.co/Linq-AI-Research/Linq-Embed-Mistral) | 0.2868 | 0.2581 | 0.3412 | #13 |
+| code_model2vec_codebert_base | [microsoft/codebert-base](https://huggingface.co/microsoft/codebert-base) | 0.2779 | 0.2534 | 0.3136 | #14 |

### Model Specifications Analysis
| all_MiniLM_L6_v2 | 29,525 | 7.6M | 256 | 14.4MB |
| jina_embeddings_v2_base_code | 61,053 | 15.6M | 256 | 29.8MB |
| paraphrase_MiniLM_L6_v2 | 29,525 | 7.6M | 256 | 14.4MB |
+| all_mpnet_base_v2_fine_tuned | 77,316 | 19.8M | 256 | 75.5MB |
| Reason_ModernColBERT | 50,254 | 12.9M | 256 | 24.5MB |
| bge_m3 | 249,999 | 64.0M | 256 | 122.1MB |
| jina_embeddings_v3 | 249,999 | 64.0M | 256 | 122.1MB |

#### Key Insights from Model Specifications:

+- **Vocabulary Consistency**: All models use vocabulary sizes ranging from 29,525 to 249,999 tokens (avg: 104,501)
+- **Parameter Efficiency**: Models range from 7.6M to 64.0M parameters (avg: 26.8M)
+- **Storage Efficiency**: Disk usage ranges from 14.4MB to 122.1MB (avg: 53.7MB)
- **Embedding Dimensions**: Consistent 256 dimensions across all models (optimized for efficiency)
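The identical 256-dimensional output across all distilled models comes from the dimensionality-reduction (PCA) step in Model2Vec-style distillation: teacher token embeddings of any width are projected onto a fixed number of principal components. A minimal NumPy sketch on toy data (`m2v_style_reduce` is an illustrative name, not the model2vec library API):

```python
import numpy as np

def m2v_style_reduce(token_embeddings: np.ndarray, out_dim: int = 256) -> np.ndarray:
    """Project teacher token embeddings to a fixed dimension via PCA,
    the step that gives every distilled model the same 256-dim output."""
    centered = token_embeddings - token_embeddings.mean(axis=0, keepdims=True)
    # SVD-based PCA: rows of vt are the principal axes of the vocabulary matrix.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:out_dim].T

# Toy stand-in for a teacher's token-embedding matrix (vocab 1000, hidden 768).
vocab = np.random.default_rng(0).normal(size=(1000, 768))
static = m2v_style_reduce(vocab)
print(static.shape)  # (1000, 256)
```

This is why vocabulary size and parameter count vary per teacher while the embedding dimension stays constant.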
- **Best Teacher Model**: code_model2vec_all_mpnet_base_v2 (NDCG@10: 0.7387)
- **Least Effective Teacher**: code_model2vec_codebert_base (NDCG@10: 0.2779)
- **Performance Range**: 62.4% difference between best and worst
+- **Average Performance**: 0.5302 NDCG@10
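The 62.4% performance range reported above is the best-to-worst gap expressed relative to the best score, which the two NDCG@10 values reproduce:

```python
# Relative gap between the best and worst teacher models (NDCG@10).
best, worst = 0.7387, 0.2779
print(round((best - worst) / best * 100, 1))  # 62.4
```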
## Language Performance Radar Charts

 

+#### code_model2vec_all_mpnet_base_v2_fine_tuned (Teacher: [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2)) - NDCG@10: 0.6906
+
+
+

#### code_model2vec_Reason_ModernColBERT (Teacher: [lightonai/Reason-ModernColBERT](https://huggingface.co/lightonai/Reason-ModernColBERT)) - NDCG@10: 0.6598

 
| 16 | code_model2vec_all_MiniLM_L6_v2 | **Simplified Distillation** | 0.7385 | 0.7049 | 0.7910 |
| 17 | code_model2vec_jina_embeddings_v2_base_code | **Simplified Distillation** | 0.7381 | 0.6996 | 0.8130 |
| 18 | code_model2vec_paraphrase_MiniLM_L6_v2 | **Simplified Distillation** | 0.7013 | 0.6638 | 0.7665 |
+| 19 | code_model2vec_all_mpnet_base_v2_fine_tuned | **Fine-tuned Distillation** | 0.6906 | 0.6372 | 0.7917 |
+| 20 | code_model2vec_Reason_ModernColBERT | **Simplified Distillation** | 0.6598 | 0.6228 | 0.7260 |
+| 21 | potion-multilingual-128M | Model2Vec | 0.6124 | 0.5683 | 0.7017 |
+| 22 | huggingface/CodeBERTa-small-v1 | Code-Specific | 0.5903 | 0.5350 | 0.6779 |
+| 23 | Salesforce/codet5-base | Code-Specific | 0.4872 | 0.4500 | 0.5742 |
+| 24 | code_model2vec_bge_m3 | **Simplified Distillation** | 0.4863 | 0.4439 | 0.5514 |
+| 25 | code_model2vec_jina_embeddings_v3 | **Simplified Distillation** | 0.4755 | 0.4416 | 0.5456 |
+| 26 | code_model2vec_nomic_embed_text_v2_moe | **Simplified Distillation** | 0.4532 | 0.4275 | 0.5094 |
+| 27 | code_model2vec_gte_Qwen2_1.5B_instruct | **Simplified Distillation** | 0.4238 | 0.3879 | 0.4719 |
+| 28 | code_model2vec_Qodo_Embed_1_1.5B | **Simplified Distillation** | 0.4101 | 0.3810 | 0.4532 |
+| 29 | microsoft/graphcodebert-base | Code-Specific | 0.4039 | 0.3677 | 0.4650 |
+| 30 | code_model2vec_graphcodebert_base | **Simplified Distillation** | 0.3420 | 0.3140 | 0.3704 |
+| 31 | code_model2vec_Linq_Embed_Mistral | **Simplified Distillation** | 0.2868 | 0.2581 | 0.3412 |
+| 32 | code_model2vec_codebert_base | **Simplified Distillation** | 0.2779 | 0.2534 | 0.3136 |
+| 33 | microsoft/codebert-base | Code-Specific | 0.1051 | 0.1058 | 0.1105 |

## Performance Analysis
| Language | Best Model Performance | Average Performance | Language Difficulty |
|----------|------------------------|--------------------|--------------------|
+| Go | 0.9780 | 0.6978 | Easy |
+| Java | 0.9921 | 0.6618 | Easy |
+| Javascript | 0.9550 | 0.5877 | Easy |
+| Php | 1.0000 | 0.6355 | Easy |
+| Python | 1.0000 | 0.8615 | Easy |
+| Ruby | 0.9493 | 0.6398 | Easy |

## Conclusions and Recommendations
Based on the evaluation results across all simplified distillation models:

+1. **Best Teacher Model**: sentence-transformers/all-MiniLM-L6-v2 (NDCG@10: 0.7385)
2. **Least Effective Teacher**: microsoft/codebert-base (NDCG@10: 0.2779)
3. **Teacher Model Impact**: Choice of teacher model affects performance by 62.4%

### Recommendations

+- **For Production**: Use sentence-transformers/all-MiniLM-L6-v2 as teacher model for best performance
- **For Efficiency**: Model2Vec distillation provides significant size reduction with competitive performance
- **For Code Tasks**: Specialized models consistently outperform general-purpose models
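The NDCG@10 figures cited throughout follow CodeSearchNet-style retrieval, where each query has exactly one relevant code snippet. Under that binary-relevance assumption the ideal DCG is 1.0, so NDCG@10 reduces to the discounted gain of the relevant document's rank. A minimal sketch (an illustration of the metric, not the report's evaluation harness):

```python
import math

def ndcg_at_10(ranked_ids, relevant_id):
    """Binary-relevance NDCG@10: one correct document per query.
    Ideal DCG is 1.0 (relevant doc at rank 1), so NDCG equals
    1/log2(rank+1) for the rank of the relevant doc, else 0."""
    for rank, doc_id in enumerate(ranked_ids[:10], start=1):
        if doc_id == relevant_id:
            return 1.0 / math.log2(rank + 1)
    return 0.0

# Relevant doc "a" retrieved at rank 2 -> 1/log2(3) ≈ 0.6309
print(ndcg_at_10(["b", "a", "c"], "a"))
```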
---

+*Report generated on 2025-05-31 16:36:16 using automated analysis pipeline.*
*For questions about methodology or results, please refer to the CodeSearchNet documentation.*