NOTE: full blogpost forthcoming!
# 2048former v0.1
2048former is a 2048 game-playing agent: a 50M-parameter encoder-only transformer trained from scratch to predict the next move. Unlike most strong 2048 players, which rely on explicit search, 2048former distills the play of a search-based engine into a 1-ply model via supervised learning, similar to Google DeepMind's 1-ply chess policy (Ruoss et al. 2024).
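In essence this is behavioral cloning: the teacher's games provide (board, move) labels, and the student learns to reproduce the teacher's choice without searching. A minimal sketch of one form this objective can take (illustrative PyTorch, not the repo's actual training code):

```python
import torch.nn.functional as F

def distillation_loss(model, boards, teacher_moves):
    """Supervised distillation step (illustrative).

    boards:        batch of encoded 16-token board states
    teacher_moves: index (0..3) of the move the search engine chose
    """
    logits = model(boards)                      # (batch, 4) scores, one per move
    return F.cross_entropy(logits, teacher_moves)
```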
## Performance
To my knowledge, 2048former v0.1 is currently the third-best publicly released 2048 agent, and the best that uses no explicit search.
| Model | Depth (ply) | Games | Mean score | % 32768 | % 16384 | % 8192 | Moves/sec |
|---|---|---|---|---|---|---|---|
| 2048former v0.1 | 1 | 2048 | 605,491 | 66.1% | 92.7% | 96.0% | ~22,000 batched (~300 it/s) |
| Expectimax (Macroxue) | 6 | 1000 | 690,621 | 78.0% | 97.0% | 99.7% | 196 |
| Expectimax (Macroxue) | 3 | 1000 | 493,058 | 52.5% | 81.4% | 93.7% | 6,461 |
| Optimistic TD Learning (Guei et al. 2021) | 6 | 100 | 625,377 ± 40,936 | 72% | 98.8% | 99.8% | ? |
| Optimistic TD Learning (Guei et al. 2021) | 1 | 1,000,000 | 412,785 | 30.1% | 85.4% | 97.2% | |
| Stochastic MuZero | 3 | ~500K | | | | | |
References:
- Macroxue hybrid expectimax
- Optimistic Temporal Difference Learning for 2048: Guei et al. 2021
- Stochastic MuZero: Antonoglou et al. 2021
## Architecture and data
Architectural details:
- Encoder-only transformer with LLaMA-3-style blocks (GQA + SiLU activation + RMSNorm), but with bidirectional attention and no RoPE
- Absolute positional encoding and a 16-token context window (one token per cell of the 4x4 board; see the sketch after this list)
- Four output heads
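For concreteness, here's a minimal sketch of what that board encoding could look like: each cell becomes one token whose ID is the tile's exponent (empty = 0, 2 = 1, 4 = 2, ...), embedded together with a learned absolute positional embedding. This is illustrative only; the exact tokenization, vocabulary, and model width are defined in the repo.

```python
import torch
import torch.nn as nn

NUM_CELLS = 16    # 4x4 board -> 16-token context
VOCAB_SIZE = 18   # tile exponents 0 (empty) .. 17, a safe upper bound

class BoardEmbedding(nn.Module):
    """Embed a 4x4 board as 16 tokens plus learned absolute positions (illustrative)."""
    def __init__(self, d_model: int = 512):  # model width arbitrary for this sketch
        super().__init__()
        self.tile_emb = nn.Embedding(VOCAB_SIZE, d_model)
        self.pos_emb = nn.Embedding(NUM_CELLS, d_model)

    def forward(self, boards: torch.Tensor) -> torch.Tensor:
        # boards: (batch, 16) integer tile exponents, row-major
        positions = torch.arange(NUM_CELLS, device=boards.device)
        return self.tile_emb(boards) + self.pos_emb(positions)
```

The encoder blocks stacked on top are standard LLaMA-style layers, just with a non-causal (bidirectional) attention mask.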
This model was trained on 120,000 games (~3 billion frames) of play from Macroxue's hybrid expectimax 2048 engine (link to my fork) at 6-ply search. Training took about 3 days on a single 4090.
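The data side is conceptually simple: the expectimax engine plays full games, and every (board, chosen move) pair is logged as a training frame. A rough sketch, with a hypothetical `engine` interface standing in for the actual fork linked above:

```python
# Hypothetical interface; the real engine and logging format live in the fork above.
def record_game(engine):
    frames = []
    board = engine.new_game()
    while not engine.is_game_over(board):
        move = engine.best_move(board, depth=6)   # 6-ply expectimax search
        frames.append((board.copy(), move))       # one training frame per move
        board = engine.apply_move(board, move)    # also spawns the random new tile
    return frames
```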
## How to use this model
Please see https://github.com/EndlessReform/2048former for the codebase.
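For orientation, greedy play with a 1-ply policy looks roughly like this; `model` and `env` below are placeholders, since the real model loading, board encoding, and game loop live in the repo:

```python
import torch

@torch.no_grad()
def play_game(model, env):
    """Greedy 1-ply rollout: one forward pass per move, no search.
    `model` and `env` are placeholders for the repo's real interfaces."""
    board = env.reset()
    done = False
    while not done:
        logits = model(env.encode(board)).squeeze(0)  # (4,) scores, one per move
        legal = env.legal_moves(board)                # (4,) boolean mask
        logits[~legal] = float("-inf")                # never pick a no-op move
        move = int(torch.argmax(logits))
        board, done = env.step(move)
    return env.score(board)
```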