# Roformer Separation

Audio source separation using Roformer-based models. Separates vocals or guitar from mixed audio tracks, with more instruments planned.

## Features

- 🎵 High-quality audio source separation
- 🎸 Support for vocals and guitar separation (more instruments coming soon)
- 🚀 Automatic model downloading from HuggingFace
- 💾 Caching of downloaded models
- 🎛️ Simple command-line interface
- 🔄 Optional test-time augmentation for better quality
- 📦 Easy pip installation

## Installation

### From PyPI (coming soon)

```bash
pip install roformer-separation
```

### From Source

```bash
git clone https://github.com/xavriley/roformer-separation.git
cd roformer-separation
pip install -e .
```

## Quick Start

### Separate vocals from a folder of audio files

```bash
roformer-separate \
    --instrument vocals \
    --input_folder /path/to/audio \
    --store_dir /path/to/output
```

### Separate guitar from a single audio file

```bash
roformer-separate \
    --instrument guitar \
    --input_folder /path/to/song.wav \
    --store_dir /path/to/output
```

### Extract instrumental track (vocals removed)

```bash
roformer-separate \
    --instrument vocals \
    --input_folder /path/to/audio \
    --store_dir /path/to/output \
    --extract_instrumental
```

### Use test-time augmentation for better quality

```bash
roformer-separate \
    --instrument vocals \
    --input_folder /path/to/audio \
    --store_dir /path/to/output \
    --use_tta
```

## Usage

```
usage: roformer-separate [-h] --instrument {vocals,guitar} --input_folder INPUT_FOLDER
                         --store_dir STORE_DIR [--force_cpu] [--extract_instrumental]
                         [--use_tta] [--output_format {wav,flac}]

Separate audio sources using Roformer models

options:
  -h, --help            show this help message and exit
  --instrument {vocals,guitar}
                        Type of instrument to separate
  --input_folder INPUT_FOLDER
                        Path to audio file or folder containing audio files
  --store_dir STORE_DIR
                        Directory to store separated audio outputs
  --force_cpu           Force CPU usage even if GPU is available
  --extract_instrumental
                        Also extract instrumental track (original minus separated instrument)
  --use_tta             Use test-time augmentation for better quality (slower)
  --output_format {wav,flac}
                        Output audio format (default: wav)
```
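
The `--extract_instrumental` help text describes the instrumental as the original minus the separated instrument. Below is a minimal NumPy sketch of that subtraction, assuming both signals are arrays of the same shape and sample rate. The function name `extract_instrumental` is hypothetical and not part of the package.

```python
import numpy as np


def extract_instrumental(mix: np.ndarray, stem: np.ndarray) -> np.ndarray:
    """Return the residual left after removing a separated stem from the mix.

    Hypothetical helper for illustration: both arrays are assumed to be
    time-aligned and shaped (channels, samples).
    """
    if mix.shape != stem.shape:
        raise ValueError(f"shape mismatch: {mix.shape} vs {stem.shape}")
    # Sample-wise subtraction: whatever is not the stem stays in the residual.
    return mix - stem
```

Because the residual is a plain subtraction, any separation error in the stem shows up with the opposite sign in the instrumental.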

## Advanced Usage

Advanced users can supply a custom model checkpoint and config with `--checkpoint` and `--config`. Note that these two options do not appear in the usage output above:

```bash
roformer-separate \
    --instrument vocals \
    --input_folder /path/to/audio \
    --store_dir /path/to/output \
    --checkpoint /path/to/custom_model.ckpt \
    --config /path/to/custom_config.yaml
```

## Supported Instruments

Currently supported:
- `vocals` - Separate vocal tracks from music
- `guitar` - Separate guitar tracks from music

More instruments coming soon!

## How It Works

1. **Model Download**: On first run, the tool downloads the model weights and configuration for the chosen instrument from HuggingFace.
2. **Caching**: Downloaded models are cached in `~/.cache/roformer-separation/`, so later runs skip the download.
3. **Processing**: Audio is processed in overlapping chunks so that the separated output has no audible seams at chunk boundaries.
4. **Output**: Separated tracks are saved to the directory given by `--store_dir`.
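
The chunked processing in step 3 can be sketched as a weighted overlap-add. This is an illustration of the general technique under stated assumptions, not the package's actual implementation. The function name `process_in_chunks`, its parameters, and the linear cross-fade window are all hypothetical, and `separate_fn` stands in for a model call that returns an array of the same shape as its input.

```python
import numpy as np


def process_in_chunks(audio, separate_fn, chunk_size, overlap):
    """Apply separate_fn to overlapping chunks and cross-fade the results.

    audio: array shaped (channels, samples).
    separate_fn: callable mapping a chunk to an output of the same shape
                 (a stand-in for the model, assumed for this sketch).
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    n_samples = audio.shape[-1]
    out = np.zeros(audio.shape, dtype=np.float64)
    weight = np.zeros(n_samples, dtype=np.float64)

    # Window that fades in and out over the overlap region. The fade values
    # are strictly positive so every sample receives a non-zero weight.
    window = np.ones(chunk_size)
    if overlap > 0:
        fade = np.linspace(0.0, 1.0, overlap + 2)[1:-1]
        window[:overlap] = fade
        window[-overlap:] = fade[::-1]

    for start in range(0, n_samples, step):
        end = min(start + chunk_size, n_samples)
        w = window[: end - start]
        out[..., start:end] += separate_fn(audio[..., start:end]) * w
        weight[start:end] += w

    # Normalise by the accumulated window weight at each sample.
    return out / weight
```

Dividing by the accumulated weight means overlapping regions become a weighted average of the neighbouring chunks, which is what hides the chunk boundaries.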

## Model Details

This package uses Mel-Band Roformer models trained for audio source separation:

- **Vocals Model**: Lead_VocalDereverb.ckpt with config_karaoke_becruily.yaml
- **Guitar Model**: becruily_guitar.ckpt with config_guitar_becruily.yaml

Models are hosted on HuggingFace: [xavriley/source_separation_mirror](https://huggingface.co/xavriley/source_separation_mirror)
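
A download-then-cache step along these lines could look like the sketch below. It is a hedged illustration, not the package's own code: `cached_model_file` and its `fetch` parameter are hypothetical names, and it assumes the files listed above sit at the root of the repository. The default fetcher uses `hf_hub_download` from the `huggingface_hub` package.

```python
from pathlib import Path

REPO_ID = "xavriley/source_separation_mirror"  # repository linked above
CACHE_DIR = Path.home() / ".cache" / "roformer-separation"


def _hf_fetch(filename: str, cache_dir: Path) -> Path:
    """Download one file from the HuggingFace repo into cache_dir."""
    # Imported lazily so the cache check works without huggingface_hub.
    from huggingface_hub import hf_hub_download

    return Path(
        hf_hub_download(repo_id=REPO_ID, filename=filename, local_dir=str(cache_dir))
    )


def cached_model_file(filename: str, cache_dir: Path = CACHE_DIR, fetch=_hf_fetch) -> Path:
    """Return the local path for filename, downloading only if it is missing.

    Hypothetical helper: assumes the file lives at the repository root.
    """
    cache_dir = Path(cache_dir)
    target = cache_dir / filename
    if target.exists():
        return target  # cache hit, no network access needed
    cache_dir.mkdir(parents=True, exist_ok=True)
    return fetch(filename, cache_dir)
```

For example, `cached_model_file("becruily_guitar.ckpt")` would download the guitar checkpoint on the first call and return the cached path on later calls.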

## Python API

You can also use the package programmatically:

```python
from roformer_separation import MelBandRoformer
from roformer_separation.download import get_model_paths
from roformer_separation.config import load_config_from_yaml
from roformer_separation.inference import demix
import librosa
import torch

# Get model paths (downloads if needed)
checkpoint_path, config_path = get_model_paths("vocals")

# Load config and model
config = load_config_from_yaml(str(config_path))
model = MelBandRoformer(**vars(config.model))

# Load checkpoint
state_dict = torch.load(checkpoint_path, map_location='cpu')
if 'state' in state_dict:
    state_dict = state_dict['state']
model.load_state_dict(state_dict, strict=False)
model.eval()

# Load audio
mix, sr = librosa.load("song.wav", sr=44100, mono=False)

# Separate
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
separated = demix(config, model, mix, device, model_type='mel_band_roformer')

# Access separated tracks
vocals = separated['Vocals']
instrumental = separated['Instrumental']
```
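
The example above stops at in-memory arrays. As a rough sketch of writing a result to disk, the helper below saves a 16-bit PCM WAV file using only the standard library and NumPy. It assumes each separated track is a float array shaped `(channels, samples)` with values in `[-1, 1]`, which is an assumption about the output of `demix` rather than something documented here. The name `save_wav_16bit` is hypothetical.

```python
import wave

import numpy as np


def save_wav_16bit(path, audio, sample_rate=44100):
    """Write a float array shaped (channels, samples) as a 16-bit PCM WAV.

    Hypothetical helper for illustration. Values outside [-1, 1] are clipped.
    """
    audio = np.atleast_2d(np.asarray(audio, dtype=np.float64))
    # Scale to the int16 range and force little-endian, as WAV requires.
    pcm = (np.clip(audio, -1.0, 1.0) * 32767).astype("<i2")
    with wave.open(str(path), "wb") as wav_file:
        wav_file.setnchannels(pcm.shape[0])
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16 bit
        wav_file.setframerate(sample_rate)
        # Transposing to (samples, channels) interleaves the channels.
        wav_file.writeframes(pcm.T.tobytes())
```

With the variables from the example above, a call might look like `save_wav_16bit("vocals.wav", vocals, sr)`. A dedicated audio library would be the better choice for FLAC or higher bit depths.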

## Requirements

- Python 3.8+
- PyTorch 2.0+
- CUDA-capable GPU (optional, but recommended for faster processing)

## Performance Tips

- **GPU Usage**: Processing is significantly faster with a CUDA-capable GPU
- **Batch Processing**: Process multiple files in one run by pointing `--input_folder` at a folder
- **TTA**: Use `--use_tta` for better quality at the cost of roughly 3x the processing time
- **Output Format**: Use FLAC for lossless compression, WAV for maximum compatibility
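
To make the TTA cost concrete, here is a sketch of one common form of test-time augmentation for source separation: run the model on the original mix, on a channel-swapped copy, and on a polarity-inverted copy, then undo each transform and average. Whether `--use_tta` does exactly this is an assumption, although three passes would be consistent with the 3x figure. `separate_with_tta` and `separate_fn` are hypothetical names.

```python
import numpy as np


def separate_with_tta(mix, separate_fn):
    """Average three augmented passes through separate_fn.

    mix: stereo array shaped (2, samples).
    separate_fn: stand-in for the model; returns an array shaped like its input.
    """
    passes = [
        separate_fn(mix),              # 1. original mix
        separate_fn(mix[::-1])[::-1],  # 2. swap left/right, then swap back
        -separate_fn(-mix),            # 3. invert polarity, then invert back
    ]
    # Averaging reduces errors that are not consistent across the variants.
    return np.mean(passes, axis=0)
```

Each variant needs a full forward pass, which is where the extra processing time comes from.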

## Troubleshooting

### Out of Memory Errors

If you encounter CUDA out of memory errors, try:
- Using `--force_cpu` to process on CPU
- Processing shorter audio files
- Closing other GPU-intensive applications

### Download Issues

If model downloads fail:
- Check your internet connection
- Try again (downloads resume automatically)
- Manually download models and use `--checkpoint` and `--config` options

### Audio Quality

For best results:
- Use high-quality input audio (WAV, FLAC)
- Try `--use_tta` for improved separation quality
- Prefer stereo input (mono files are converted to stereo automatically)
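
For Python API users, a mono-to-stereo conversion can be sketched as duplicating the single channel. This is an assumption about what the automatic conversion amounts to, not a description of the package's code, and `ensure_stereo` is a hypothetical name. Note that `librosa.load(..., mono=False)` returns a 1-D array for mono files.

```python
import numpy as np


def ensure_stereo(audio):
    """Return audio shaped (2, samples), duplicating a mono channel if needed."""
    audio = np.asarray(audio)
    if audio.ndim == 1:
        # 1-D mono signal: stack two copies as left and right.
        return np.stack([audio, audio])
    if audio.ndim == 2 and audio.shape[0] == 1:
        # Mono signal with an explicit channel axis.
        return np.repeat(audio, 2, axis=0)
    return audio  # already multi-channel, returned unchanged
```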

## License

MIT License. See the LICENSE file for details.

## Citation

If you use this package in your research, please cite the original Roformer paper and model authors.

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## Acknowledgments

- Model architecture based on Mel-Band Roformer
- Pre-trained models from the audio source separation community
- Built with PyTorch, librosa, and einops

