APEX: Large-Scale Multi-Task Aesthetic-Informed Popularity Prediction for AI-Generated Music

APEX is the first large-scale multi-task learning framework for jointly predicting popularity and aesthetic quality of AI-generated music from audio alone. It is trained on over 211k AI-generated songs (~10k hours of audio) from Suno and Udio, leveraging MERT-v1-95M audio embeddings.

What does APEX predict?

Given any audio file, APEX predicts 7 scores:

Popularity:

Score	Range	Description
`score_streams`	0–100	Predicted streaming engagement score
`score_likes`	0–100	Predicted likes engagement score

Aesthetic Quality (from SongEval):

Score	Range	Description
`coherence`	1–5	Structural and harmonic coherence
`musicality`	1–5	Overall musical quality
`memorability`	1–5	How memorable the song is
`clarity`	1–5	Clarity of production and mix
`naturalness`	1–5	Naturalness of the generated audio
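
The documented ranges above can be captured in a small lookup table for sanity-checking outputs. The following is a minimal sketch, not part of APEX: `SCORE_RANGES` and `out_of_range` are hypothetical names, and it assumes the prediction is a dict keyed by the score names listed above. Whether the model clips its outputs to these ranges is not stated here, so a flagged score is only a prompt for a closer look, not an error.

```python
# Documented ranges for the seven APEX scores, copied from the tables above.
SCORE_RANGES = {
    "score_streams": (0.0, 100.0),
    "score_likes": (0.0, 100.0),
    "coherence": (1.0, 5.0),
    "musicality": (1.0, 5.0),
    "memorability": (1.0, 5.0),
    "clarity": (1.0, 5.0),
    "naturalness": (1.0, 5.0),
}


def out_of_range(results):
    """Return the names of scores that are missing or outside their documented range.

    `results` is assumed to be a dict keyed by score name (hypothetical helper).
    """
    flagged = []
    for name, (low, high) in SCORE_RANGES.items():
        value = results.get(name)
        if value is None or not (low <= float(value) <= high):
            flagged.append(name)
    return flagged
```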

Architecture

Usage

Installation

pip uninstall -y torch torchvision torchaudio transformers -q
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0
pip install transformers soundfile librosa "numpy<2" "scipy<1.16"
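
After installing, it can be useful to confirm that the environment actually matches the pinned versions. The sketch below is an optional check under stated assumptions: `EXACT_PINS`, `installed_version`, and `pin_mismatches` are hypothetical names, only the three exact pins from the commands above are checked (not the `numpy<2` or `scipy<1.16` upper bounds), and local build tags such as `+cu124` are ignored when comparing.

```python
from importlib import metadata

# Exact pins copied from the install commands above.
EXACT_PINS = {"torch": "2.6.0", "torchvision": "0.21.0", "torchaudio": "2.6.0"}


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def pin_mismatches(pins=EXACT_PINS, lookup=installed_version):
    """Return {package: found_version} for every package that does not match its pin."""
    mismatches = {}
    for package, wanted in pins.items():
        found = lookup(package)
        # Strip a local build tag (for example "+cu124") before comparing.
        base = found.split("+")[0] if found else None
        if base != wanted:
            mismatches[package] = found
    return mismatches
```

Calling `pin_mismatches()` with no arguments inspects the current environment; an empty dict means all three pins match.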

Inference

from transformers import AutoModel
import torch

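# Load APEX from the Hugging Face Hub. trust_remote_code=True is required
# because the model class is defined in the repository's own code.
model = AutoModel.from_pretrained(
    "amaai-lab/apex",
    trust_remote_code=True,
    device_map=None,
    low_cpu_mem_usage=False,
    ignore_mismatched_sizes=True,
)

# Run on the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

# Score a single audio file; save_json names the file the scores are saved to.
results = model.predict("/path/to/your/mp3/file", save_json="results.json")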

print(f"Streams Score : {results['score_streams']:.2f}")
print(f"Likes Score   : {results['score_likes']:.2f}")
print(f"Coherence     : {results['coherence']:.2f}")
print(f"Musicality    : {results['musicality']:.2f}")
print(f"Memorability  : {results['memorability']:.2f}")
print(f"Clarity       : {results['clarity']:.2f}")
print(f"Naturalness   : {results['naturalness']:.2f}")
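
To score a whole folder instead of a single file, the same call can be wrapped in a loop. The following is a minimal sketch under stated assumptions: `score_folder` and `SCORE_NAMES` are hypothetical names, not part of APEX, and `predict` is any callable that takes a file path and returns a dict with the seven keys printed above. Passing the callable in, instead of the model itself, keeps the helper independent of how `model.predict` handles its other arguments.

```python
import csv
from pathlib import Path

# The seven score keys shown in the example above.
SCORE_NAMES = [
    "score_streams", "score_likes",
    "coherence", "musicality", "memorability", "clarity", "naturalness",
]


def score_folder(predict, folder, out_csv, pattern="*.mp3"):
    """Score every file in `folder` matching `pattern` and write one CSV row per file.

    `predict` is assumed to map a file path (str) to a dict of scores.
    Returns the rows that were written.
    """
    rows = []
    for path in sorted(Path(folder).glob(pattern)):
        results = predict(str(path))
        row = {"file": path.name}
        row.update({name: float(results[name]) for name in SCORE_NAMES})
        rows.append(row)

    with open(out_csv, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["file", *SCORE_NAMES])
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

With the model loaded as above, a call might look like `score_folder(lambda p: model.predict(p, save_json=None), "songs/", "scores.csv")`. Whether `save_json=None` disables the JSON output is an assumption; if it does not, pass a per-file JSON path instead.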
