arxiv:2605.14386

Darwin Family: MRI-Trust-Weighted Evolutionary Merging for Training-Free Scaling of Language-Model Reasoning

Published on May 14 · Submitted by seawolf on May 15

Abstract

The Darwin Family framework merges existing large language model checkpoints through gradient-free, evolutionary weight-space recombination, improving reasoning performance over the parent models without additional training.

We present Darwin Family, a framework for training-free evolutionary merging of large language models via gradient-free weight-space recombination. We ask whether frontier-level reasoning performance can be improved without additional training, by reorganizing latent capabilities already encoded in existing checkpoints. Darwin introduces three key ideas: (i) a 14-dimensional adaptive merge genome enabling fine-grained component- and block-level recombination; (ii) MRI-Trust Fusion, which adaptively balances diagnostic layer-importance signals with evolutionary search through a learnable trust parameter; and (iii) an Architecture Mapper that enables cross-architecture breeding between heterogeneous model families. Empirically, the flagship Darwin-27B-Opus achieves 86.9% on GPQA Diamond, ranking #6 among 1,252 evaluated models, and outperforming its fully trained foundation model without any gradient-based training. Across scales from 4B to 35B parameters, Darwin models consistently improve over their parents, support recursive multi-generation evolution, and enable a training-free evolutionary merge that combines Transformer- and Mamba-based components. Together, the Darwin Family demonstrates that diagnostic-guided evolutionary merging is a practical and reproducible alternative to costly post-training pipelines for reasoning-centric language models.
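To make the abstract's vocabulary concrete, here is a minimal toy sketch of genome-controlled weight merging with a trust-weighted blend and a gradient-free search loop. It is not the authors' implementation. Everything beyond the terms in the abstract is an assumption made for illustration: each of the 14 genes is treated as a per-block interpolation ratio, the diagnostic signal is a plain list of suggested ratios, the trust value is a fixed scalar rather than a learned one, and `fuse_ratios`, `merge_blocks`, and `evolve` are hypothetical names.

```python
import random

GENOME_DIM = 14  # the abstract names a 14-dimensional genome; its layout here is assumed


def fuse_ratios(genome, diagnostic, trust):
    """Blend evolved ratios with diagnostic ratios (assumed form).

    trust = 1.0 follows the diagnostic signal only; trust = 0.0 follows the genome only.
    """
    return [trust * d + (1.0 - trust) * g for g, d in zip(genome, diagnostic)]


def merge_blocks(parent_a, parent_b, ratios):
    """Linearly interpolate each block: ratio r keeps (1 - r) of A and r of B."""
    return [
        [(1.0 - r) * wa + r * wb for wa, wb in zip(block_a, block_b)]
        for block_a, block_b, r in zip(parent_a, parent_b, ratios)
    ]


def evolve(parent_a, parent_b, diagnostic, fitness, trust=0.5,
           generations=30, population=16, sigma=0.1, seed=0):
    """Gradient-free search over genomes: keep the top quarter, mutate to refill."""
    rng = random.Random(seed)

    def score(genome):
        ratios = fuse_ratios(genome, diagnostic, trust)
        return fitness(merge_blocks(parent_a, parent_b, ratios))

    pool = [[rng.random() for _ in range(GENOME_DIM)] for _ in range(population)]
    for _ in range(generations):
        pool.sort(key=score, reverse=True)
        elite = pool[: population // 4]
        children = []
        while len(elite) + len(children) < population:
            base = rng.choice(elite)
            # Gaussian mutation, clamped so every gene stays a valid ratio in [0, 1].
            children.append(
                [min(1.0, max(0.0, g + rng.gauss(0.0, sigma))) for g in base]
            )
        pool = elite + children
    best = max(pool, key=score)
    return best, score(best)
```

In this sketch, `fitness` stands in for whatever benchmark score is used to rank merged models; it receives the merged blocks and returns a number where higher is better. No gradients are computed anywhere, which is the sense in which such a search is training-free.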

Community

Paper submitter

FINAL Bench introduces a new evaluation paradigm for LLMs:
functional metacognitive reasoning — not just "can the model solve it,"
but "does the model know when, why, and how it solves it."

  • 100 tasks across 15 domains, built on the TICOS framework
    (Task / Introspection / Calibration / Output / Self-correction)
  • Already #5 globally on HF Datasets popularity
  • Officially endorsed by the HF Evaluation Team (Nathan Habib)

We believe metacognition is the missing axis in current LLM benchmarks.
Feedback welcome.

Darwin Family — Architecture Overview

[Figure: Darwin Family diagram]

Flagship update: Darwin-36B-Opus achieves 88.4% on GPQA Diamond,
matching Qwen3.5-397B-A17B with ~10× fewer params, training-free.

Thank you for kindly sharing this work publicly.

Thanks! Glad you found it useful. More to come.


Get this paper in your agent:

hf papers read 2605.14386
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash
