nielsr HF Staff committed on
Commit 9de7942 · verified · 1 Parent(s): f60218e

Add usage example to model card

This PR embeds a Python code snippet directly in the model card for quick model usage. It shows how to load and run the `FocalCodec-Stream` model from the Hugging Face Hub, so users no longer need to visit the external GitHub repository for basic instructions.

The `config` argument in the usage example has been updated to `lucadellalib/focalcodec_50hz_4k_causal` to specifically load this model, and a publicly accessible audio file URL from Hugging Face datasets is provided for immediate execution.
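As a rough sanity check on the token shape that the snippet prints, the expected frame count and nominal bitrate can be estimated from the checkpoint name alone. This is a minimal sketch, not part of the FocalCodec API: the helper names are hypothetical, the 50 Hz token rate is read off the model ID, and the codebook size of 4096 is an assumption based on the "4k" in the name. The real model may pad or trim frames, so treat the frame count as approximate.

```python
import math

# Assumptions (not confirmed by the model card): values inferred from the
# checkpoint name "focalcodec_50hz_4k_causal".
TOKEN_RATE_HZ = 50.0          # "50hz" in the model ID
ASSUMED_CODEBOOK_SIZE = 4096  # "4k" interpreted as 2**12 entries


def approx_num_frames(duration_s: float, token_rate_hz: float = TOKEN_RATE_HZ) -> int:
    """Approximate number of token frames for a clip of the given duration."""
    return math.floor(duration_s * token_rate_hz)


def nominal_bitrate_bps(token_rate_hz: float, codebook_size: int) -> float:
    """Nominal bitrate: tokens per second times bits per token."""
    return token_rate_hz * math.log2(codebook_size)


if __name__ == "__main__":
    print(f"Frames for a 2 s clip: ~{approx_num_frames(2.0)}")
    print(f"Nominal bitrate: {nominal_bitrate_bps(TOKEN_RATE_HZ, ASSUMED_CODEBOOK_SIZE)} bps")
```

If the `num_frames` dimension printed by the README snippet differs greatly from this estimate, the input was probably not resampled to the model's input sample rate.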

Files changed (1)
  1. README.md +47 -3
README.md CHANGED
@@ -1,8 +1,8 @@
1
  ---
2
- license: apache-2.0
3
- library_name: torch
4
  base_model:
5
  - microsoft/wavlm-large
 
 
6
  pipeline_tag: audio-to-audio
7
  ---
8
 
@@ -28,7 +28,51 @@ This repository contains the **50 Hz causal checkpoint with a codebook size of 4
28
 
29
  ## ▶️ Quickstart
30
 
31
- See the readme at: https://github.com/lucadellalib/focalcodec
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
32
 
33
  ---------------------------------------------------------------------------------------------------------
34
 
 
1
  ---
 
 
2
  base_model:
3
  - microsoft/wavlm-large
4
+ library_name: torch
5
+ license: apache-2.0
6
  pipeline_tag: audio-to-audio
7
  ---
8
 
 
28
 
29
  ## ▶️ Quickstart
30
 
31
+ Here's a quick example of how to load the FocalCodec-Stream model and perform speech resynthesis (encode an audio file into tokens and decode it back into a waveform):
32
+
33
+ ```python
34
+ import torch
35
+ import torchaudio
36
+
37
+ # Load FocalCodec model
38
+ # This loads the model's code from the FocalCodec GitHub repository via torch.hub
39
+ # and the configuration/weights for this specific checkpoint from the Hugging Face Hub.
40
+ codec = torch.hub.load(
41
+ repo_or_dir="lucadellalib/focalcodec",
42
+ model="focalcodec",
43
+ config="lucadellalib/focalcodec_50hz_4k_causal", # This model's ID on Hugging Face
44
+ force_reload=True, # Set to True to ensure the latest FocalCodec version is fetched
45
+ )
46
+ codec.eval().requires_grad_(False)
47
+
48
+ # Load and preprocess an input audio file
49
+ # Example audio from FocalCodec's dataset on Hugging Face:
50
+ audio_file_url = "https://huggingface.co/datasets/lucadellalib/focalcodec_audios/resolve/main/librispeech-dev-clean/251-118436-0003.wav"
51
+ sig, sample_rate = torchaudio.load(audio_file_url)
52
+ # Resample the audio to the model's expected input sample rate (e.g., 16 kHz)
53
+ sig = torchaudio.functional.resample(sig, sample_rate, codec.sample_rate_input)
54
+
55
+ # Encode audio into binary tokens
56
+ toks = codec.sig_to_toks(sig) # Shape: (batch_size, num_frames)
57
+ print(f"Encoded tokens shape: {toks.shape}")
58
+ print(f"First few tokens: {toks[:, :5]}")
59
+
60
+ # Decode tokens back into a waveform
61
+ rec_sig = codec.toks_to_toks(toks)
62
+
63
+ # Save the reconstructed audio
64
+ output_sample_rate = codec.sample_rate_output # Use model's specified output sample rate
65
+ # Resample back to original sample rate for saving if needed
66
+ rec_sig = torchaudio.functional.resample(rec_sig, output_sample_rate, sample_rate)
67
+ torchaudio.save("focalcodec_reconstruction.wav", rec_sig.cpu(), sample_rate)
68
+ print("Speech resynthesis complete. Output saved to focalcodec_reconstruction.wav")
69
+ ```
70
+
71
+ ---------------------------------------------------------------------------------------------------------
72
+
73
+ ## ▶️ Further Quickstart Info
74
+
75
+ For more detailed examples, including streaming inference from microphone/file and voice conversion, please refer to the extensive documentation and demo scripts in the [FocalCodec GitHub repository](https://github.com/lucadellalib/focalcodec).
76
 
77
  ---------------------------------------------------------------------------------------------------------
78