APEX: Large-Scale Multi-Task Aesthetic-Informed Popularity Prediction for AI-Generated Music
APEX is the first large-scale multi-task learning framework for jointly predicting popularity and aesthetic quality of AI-generated music from audio alone. It is trained on over 211k AI-generated songs (~10k hours of audio) from Suno and Udio, leveraging MERT-v1-95M audio embeddings.
What does APEX predict?
Given any audio file, APEX predicts 7 scores:
Popularity:
| Score | Range | Description |
|---|---|---|
| `score_streams` | 0–100 | Predicted streaming engagement score |
| `score_likes` | 0–100 | Predicted likes engagement score |
Aesthetic Quality (from SongEval):
| Score | Range | Description |
|---|---|---|
| `coherence` | 1–5 | Structural and harmonic coherence |
| `musicality` | 1–5 | Overall musical quality |
| `memorability` | 1–5 | How memorable the song is |
| `clarity` | 1–5 | Clarity of production and mix |
| `naturalness` | 1–5 | Naturalness of the generated audio |
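The score names and ranges listed above can be checked programmatically before results are used downstream. The following is a minimal sketch, assuming the predictions are available as a plain dictionary keyed by score name; `SCORE_RANGES` and `check_scores` are hypothetical helpers for illustration, not part of APEX. Whether the model clamps its outputs to these ranges is not stated here, so out-of-range values are reported rather than corrected.

```python
# Documented score ranges, copied from the tables above.
SCORE_RANGES = {
    "score_streams": (0.0, 100.0),
    "score_likes": (0.0, 100.0),
    "coherence": (1.0, 5.0),
    "musicality": (1.0, 5.0),
    "memorability": (1.0, 5.0),
    "clarity": (1.0, 5.0),
    "naturalness": (1.0, 5.0),
}


def check_scores(results):
    """Return a list of problems found in a results dict (empty if none)."""
    problems = []
    for name, (low, high) in SCORE_RANGES.items():
        if name not in results:
            problems.append(f"missing score: {name}")
            continue
        value = float(results[name])
        if not low <= value <= high:
            problems.append(f"{name}={value} outside [{low}, {high}]")
    return problems
```

An empty list means all seven scores are present and inside their documented ranges.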
Architecture
Usage
Installation
pip uninstall -y torch torchvision torchaudio transformers -q
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0
pip install transformers soundfile librosa "numpy<2" "scipy<1.16"
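Because the commands above pin specific versions, it can help to confirm what actually ended up in the environment. The following is a minimal sketch using only the standard library; `installed_versions` is a hypothetical helper, not part of APEX.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Package names taken from the pip commands above.
    names = ["torch", "torchvision", "torchaudio", "transformers",
             "soundfile", "librosa", "numpy", "scipy"]
    for name, ver in installed_versions(names).items():
        print(f"{name}: {ver or 'not installed'}")
```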
Inference
from transformers import AutoModel
import torch
model = AutoModel.from_pretrained(
    "amaai-lab/apex",
    trust_remote_code=True,  # runs the custom model code shipped with the repository
    device_map=None,
    low_cpu_mem_usage=False,
    ignore_mismatched_sizes=True,
)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
results = model.predict("/path/to/your/mp3/file", save_json="results.json")
print(f"Streams Score : {results['score_streams']:.2f}")
print(f"Likes Score : {results['score_likes']:.2f}")
print(f"Coherence : {results['coherence']:.2f}")
print(f"Musicality : {results['musicality']:.2f}")
print(f"Memorability : {results['memorability']:.2f}")
print(f"Clarity : {results['clarity']:.2f}")
print(f"Naturalness : {results['naturalness']:.2f}")
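To score several files, the single-file call above can be wrapped in a loop. The following is a minimal sketch, assuming `predict_fn` is any callable that takes a path and returns a dictionary containing the score keys; `score_files` and `rank_by` are hypothetical helpers, not part of APEX.

```python
def score_files(paths, predict_fn):
    """Run predict_fn on each path; collect results and per-file errors."""
    scores, errors = {}, {}
    for path in paths:
        try:
            scores[path] = dict(predict_fn(path))
        except Exception as exc:  # keep going if one file fails
            errors[path] = str(exc)
    return scores, errors


def rank_by(scores, key="score_streams"):
    """Return paths sorted from highest to lowest value of the given score."""
    return sorted(scores, key=lambda path: scores[path][key], reverse=True)
```

With the model loaded as shown above, one possible call is `score_files(paths, lambda p: model.predict(p, save_json=p + ".json"))`, which writes one JSON file per input instead of overwriting a single `results.json`.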
