πŸ‰ DRAGON. Dynamic RAG Benchmark On News

This leaderboard compares RAG systems on generation and retrieval metrics across question types (simple, comparison, multi-hop, conditional, etc.).

  • Questions are automatically generated from news sources.
  • The question dataset is updated regularly, and metrics for open models are recalculated.
  • For user submissions, the leaderboard shows the most recently calculated metrics.
  • To recalculate a previously submitted configuration on the latest data version, use the client with the submit_id you received at the initial submission (see instructions below).
  • Version 1.34.1 → 600 questions generated from news sources → 3 July 2025


    Model                                 Embeddings                        Top k  Retrieval (avg)  Generation (avg)  Total Score  Version  Last Updated
    RuadaptQwen2.5-32B-Instruct (9449f3)  multilingual-e5-large-instruct_0  20     0.6769           0.4702            0.5736       1.11.0   2025-07-20
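In the results row, Total Score is consistent with the equal-weight mean of Retrieval (avg) and Generation (avg): (0.6769 + 0.4702) / 2 = 0.57355, displayed as 0.5736. The page does not state this formula, so the sketch below only checks that reading against the published row, assuming an equal-weight mean and half-up rounding to four decimals. `total_score` is a hypothetical helper, not part of the DRAGON client.

```python
from decimal import Decimal, ROUND_HALF_UP

def total_score(retrieval_avg: str, generation_avg: str) -> Decimal:
    """Hypothetical helper (assumption): equal-weight mean of the two averages,
    rounded half-up to four decimal places as displayed on the leaderboard."""
    # Decimal keeps 0.57355 exact, avoiding binary floating-point rounding surprises.
    r = Decimal(retrieval_avg)
    g = Decimal(generation_avg)
    mean = (r + g) / 2
    return mean.quantize(Decimal("0.0001"), rounding=ROUND_HALF_UP)

# Values taken from the RuadaptQwen2.5-32B-Instruct row of the table.
score = total_score("0.6769", "0.4702")
print(score)  # 0.5736
```

The leaderboard may compute the total from unrounded averages, so recomputing it from the displayed four-decimal values could differ in the last digit for other rows.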

    Citation

    @misc{chernogorskii2025dragondynamicragbenchmark,
          title={DRAGON: Dynamic RAG Benchmark On News}, 
          author={Fedor Chernogorskii and Sergei Averkiev and Liliya Kudraleeva and Zaven Martirosian and Maria Tikhonova and Valentin Malykh and Alena Fenogenova},
          year={2025},
          eprint={2507.05713},
          archivePrefix={arXiv},
          primaryClass={cs.CL},
          url={https://arxiv.org/abs/2507.05713}, 
    }
    
