Ecommerce Embedding Model Benchmarks

This Space contains the results of benchmarks conducted for the release of our ecommerce embedding models, Marqo-Ecommerce-L and Marqo-Ecommerce-B.

Our benchmarking was divided into two regimes, each using a different dataset of ecommerce product listings: marqo-ecommerce-hard and marqo-ecommerce-easy. Both datasets contain product images and text and differ only in size. The "easy" dataset is approximately 10-30 times smaller (200k vs 4M products) and is designed to accommodate rate-limited models, specifically Cohere-Embeddings-v3 and GCP-Vertex (with limits of 0.66 rps and 2 rps respectively). The "hard" dataset represents the true challenge: it contains four million ecommerce product listings and is more representative of real-world ecommerce search scenarios.
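To make the effect of those rate limits concrete, here is a rough back-of-the-envelope sketch in Python. It assumes one product per request and a single sequential client, neither of which is stated in the text; the function name and the `items_per_request` parameter are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def embedding_days(num_products: int,
                   requests_per_second: float,
                   items_per_request: int = 1) -> float:
    """Days needed to embed num_products at a fixed request rate.

    Assumption: each request embeds items_per_request products and
    requests are issued back to back at exactly the rate limit.
    """
    num_requests = num_products / items_per_request
    return num_requests / requests_per_second / SECONDS_PER_DAY


# Rate limits and dataset sizes quoted in the text above.
RATE_LIMITS = {"Cohere-Embeddings-v3": 0.66, "GCP-Vertex": 2.0}
DATASET_SIZES = {"easy": 200_000, "hard": 4_000_000}

for model, rps in RATE_LIMITS.items():
    for regime, size in DATASET_SIZES.items():
        days = embedding_days(size, rps)
        print(f"{model:22s} {regime}: {days:6.1f} days")
```

Under these assumptions, the 200k-product set takes a few days per rate-limited model, while the 4M-product set would take weeks to months. Batching several products per request would shorten both figures proportionally.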

In both regimes, the models were benchmarked on three tasks:

  • Google Shopping Text-to-Image
  • Google Shopping Category-to-Image
  • Amazon Products Text-to-Image

As part of this launch, we also released two evaluation datasets: Marqo/google-shopping-general-eval and Marqo/amazon-products-eval.

For more information on these models, benchmark results, and how you can run these evaluations yourself, visit our blog post.
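The results below report mAP, R@10 (or P@10 for the category task), MRR, and nDCG@10. As a reading aid, the following is a minimal sketch of the textbook definitions of these metrics for a single query with binary relevance. It is not the evaluation code used for these benchmarks, and the exact variants used there (for example, graded relevance in nDCG) are not stated here; all function names are illustrative.

```python
import math
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the top-k results that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of all relevant items that appear in the top-k results."""
    if not relevant:
        return 0.0
    return sum(1 for doc in ranked[:k] if doc in relevant) / len(relevant)


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant result (0 if none is retrieved)."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Mean of precision@rank taken at each rank holding a relevant result."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Binary-relevance nDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, doc in enumerate(ranked[:k], start=1)
              if doc in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


# Toy query with hypothetical product IDs: two relevant items, ranked 2nd and 4th.
ranked = ["p3", "p1", "p7", "p2"]
relevant = {"p1", "p2"}
print("P@10   ", precision_at_k(ranked, relevant))
print("R@10   ", recall_at_k(ranked, relevant))
print("RR     ", reciprocal_rank(ranked, relevant))
print("AP     ", average_precision(ranked, relevant))
print("nDCG@10", round(ndcg_at_k(ranked, relevant), 4))
```

mAP and MRR are the means of `average_precision` and `reciprocal_rank` over all queries. When a query has exactly one relevant item, the two values coincide, which may be why mAP and MRR are nearly identical in the text-to-image results.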

Marqo-Ecommerce-Hard

Google Shopping Text to Image 1M

Embedding Model | mAP   | R@10  | MRR   | nDCG@10
                | 0.682 | 0.878 | 0.683 | 0.726

Google Shopping Category to Image 1M

Embedding Model | mAP   | P@10  | MRR   | nDCG@10
                | 0.463 | 0.652 | 0.822 | 0.666

Amazon Products Text to Image 3M

Embedding Model | mAP   | R@10  | MRR   | nDCG@10
                | 0.658 | 0.854 | 0.663 | 0.703

Marqo-Ecommerce-Easy

Google Shopping Text to Image

Embedding Model | mAP   | R@10  | MRR   | nDCG@10
                | 0.879 | 0.971 | 0.879 | 0.901

Google Shopping Category to Image

Embedding Model | mAP   | P@10  | MRR   | nDCG@10
                | 0.515 | 0.358 | 0.764 | 0.558

Amazon Products Text to Image

Embedding Model | mAP   | R@10  | MRR   | nDCG@10
                | 0.928 | 0.978 | 0.928 | 0.914