Add usage example to model card

This PR adds a Python code snippet to the model card that shows how to load and run the `FocalCodec-Stream` model from the Hugging Face Hub. Users can try the model without first visiting the external GitHub repository for basic instructions.

The `config` argument in the usage example has been updated to `lucadellalib/focalcodec_50hz_4k_causal` so that it loads this specific checkpoint. The example also uses a publicly accessible audio file from a Hugging Face dataset, so it can be run as-is.
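To make the checkpoint ID passed to `config` more concrete, here is a small sketch that reads the frame rate and codebook size off the name and derives the resulting bitrate and an approximate token count. It assumes the `<rate>hz_<N>k` naming pattern, that `4k` means 2^12 = 4096 codebook entries, and that each frame emits a single token; the helper names are hypothetical and not part of FocalCodec.

```python
import math
import re


def parse_checkpoint_name(name: str) -> tuple[float, int]:
    """Extract (frame_rate_hz, codebook_size) from a checkpoint ID (hypothetical helper).

    Assumes the pattern "<frame_rate>hz_<N>k", with "<N>k" read as N * 1024 entries.
    """
    match = re.search(r"_(\d+(?:\.\d+)?)hz_(\d+)k", name)
    if match is None:
        raise ValueError(f"Unrecognized checkpoint name: {name}")
    frame_rate = float(match.group(1))
    codebook_size = int(match.group(2)) * 1024
    return frame_rate, codebook_size


def bitrate_bps(frame_rate: float, codebook_size: int) -> float:
    """Bits per second, assuming one token per frame: frames/s * log2(codebook size)."""
    return frame_rate * math.log2(codebook_size)


def approx_num_tokens(duration_s: float, frame_rate: float) -> int:
    """Approximate token count for a clip; the exact count depends on the codec's padding."""
    return round(duration_s * frame_rate)


fr, cb = parse_checkpoint_name("lucadellalib/focalcodec_50hz_4k_causal")
print(fr, cb)                       # 50.0 4096
print(bitrate_bps(fr, cb))          # 600.0
print(approx_num_tokens(10.0, fr))  # 500
```

Under these assumptions, a 10-second clip would be encoded into roughly 500 tokens of 12 bits each.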

Files changed (1)

README.md +47 -3

README.md CHANGED

@@ -1,8 +1,8 @@
 ---
-license: apache-2.0
-library_name: torch
 base_model:
 - microsoft/wavlm-large
+library_name: torch
+license: apache-2.0
 pipeline_tag: audio-to-audio
 ---
@@ -28,7 +28,51 @@ This repository contains the **50 Hz causal checkpoint with a codebook size of 4
 ## ▶️ Quickstart
-See the readme at: https://github.com/lucadellalib/focalcodec
+Here's a quick example of how to load the FocalCodec-Stream model and perform speech resynthesis (encode an audio file into tokens and decode it back into a waveform):
+```python
+import urllib.request
+import torch
+import torchaudio
+# Load FocalCodec model
+# This loads the model's code from the FocalCodec GitHub repository via torch.hub
+# and the configuration/weights for this specific checkpoint from the Hugging Face Hub.
+codec = torch.hub.load(
+    repo_or_dir="lucadellalib/focalcodec",
+    model="focalcodec",
+    config="lucadellalib/focalcodec_50hz_4k_causal",  # This model's ID on Hugging Face
+    force_reload=True,  # Re-fetch the latest FocalCodec code instead of using a cached copy
+)
+codec.eval().requires_grad_(False)
+# Load and preprocess an input audio file
+# Example audio from FocalCodec's dataset on Hugging Face, downloaded to a local file first
+audio_file_url = "https://huggingface.co/datasets/lucadellalib/focalcodec_audios/resolve/main/librispeech-dev-clean/251-118436-0003.wav"
+audio_file, _ = urllib.request.urlretrieve(audio_file_url, "251-118436-0003.wav")
+sig, sample_rate = torchaudio.load(audio_file)
+# Resample the audio to the model's expected input sample rate (e.g., 16 kHz)
+sig = torchaudio.functional.resample(sig, sample_rate, codec.sample_rate_input)
+# Encode audio into discrete tokens (integer codebook indices)
+toks = codec.sig_to_toks(sig)  # Shape: (batch_size, num_frames)
+print(f"Encoded tokens shape: {toks.shape}")
+print(f"First few tokens: {toks[:, :5]}")
+# Decode tokens back into a waveform
+rec_sig = codec.toks_to_sig(toks)
+# Save the reconstructed audio
+output_sample_rate = codec.sample_rate_output # Use model's specified output sample rate
+# Resample back to original sample rate for saving if needed
+rec_sig = torchaudio.functional.resample(rec_sig, output_sample_rate, sample_rate)
+torchaudio.save("focalcodec_reconstruction.wav", rec_sig.cpu(), sample_rate)
+print("Speech resynthesis complete. Output saved to focalcodec_reconstruction.wav")
+```
+---------------------------------------------------------------------------------------------------------
+## ▶️ Further Quickstart Info
+For more detailed examples, including streaming inference from a microphone or file and voice conversion, see the documentation and demo scripts in the [FocalCodec GitHub repository](https://github.com/lucadellalib/focalcodec).
 ---------------------------------------------------------------------------------------------------------