A Whisper-base model fine-tuned with LoRA for Korean automatic speech recognition (ASR).
This Korean speech recognition model was trained for a welfare counseling system serving elderly people living alone and other vulnerable groups.
| Model | Category | WER | CER |
|---|---|---|---|
| Baseline | ALL | 0.4236 | 0.1588 |
| LoRA Fine-tuned | ALL | 0.2592 | 0.0584 |
| Baseline | Mental health welfare | 0.3540 | 0.1315 |
| LoRA Fine-tuned | Mental health welfare | 0.2280 | 0.0571 |
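
The evaluation script is not included on this card. The snippet below is only a sketch of how WER and CER can be recomputed with the Hugging Face `evaluate` library; the reference/prediction pair is a placeholder, not the actual test set.

```python
# Hedged sketch: recomputing WER/CER with the `evaluate` library.
# The reference/prediction pair below is a placeholder, not the real test split.
import evaluate

wer_metric = evaluate.load("wer")
cer_metric = evaluate.load("cer")

references = ["안녕하세요 무엇을 도와드릴까요"]   # ground-truth transcript (example)
predictions = ["안녕하세요 무엇을 도와드릴까요"]  # model output (example)

wer = wer_metric.compute(predictions=predictions, references=references)
cer = cer_metric.compute(predictions=predictions, references=references)
print(f"WER: {wer:.4f}, CER: {cer:.4f}")
```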
This model is used for ASR tasks that convert Korean speech into text.
```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
from peft import PeftModel
import torch
import librosa

# Load base model and processor
processor = WhisperProcessor.from_pretrained("openai/whisper-base")
base_model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-base")

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "jaehyeono/whisper-base-korean-lora")
model = model.merge_and_unload()  # Merge for faster inference
model.eval()

# Inference
audio, sr = librosa.load("audio.wav", sr=16000)
input_features = processor(audio, sampling_rate=16000, return_tensors="pt").input_features

with torch.no_grad():
    predicted_ids = model.generate(input_features, language="ko", task="transcribe")

transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(transcription)
```
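
`merge_and_unload()` folds the adapter weights into the base model for faster inference. If you would rather keep the adapter attached, for example to switch it off and compare against the un-tuned baseline, a possible variant is sketched below; this is not part of the card's original usage instructions.

```python
# Hedged sketch: an alternative to merge_and_unload() that keeps the LoRA adapter
# attached, so it can be switched off to compare against the un-tuned baseline.
base_model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-base")
model = PeftModel.from_pretrained(base_model, "jaehyeono/whisper-base-korean-lora")
model.eval()

with torch.no_grad():
    lora_ids = model.generate(input_features, language="ko", task="transcribe")

# disable_adapter() is a PEFT context manager that bypasses the LoRA weights.
with model.disable_adapter(), torch.no_grad():
    baseline_ids = model.generate(input_features, language="ko", task="transcribe")

print(processor.batch_decode(lora_ids, skip_special_tokens=True)[0])      # fine-tuned output
print(processor.batch_decode(baseline_ids, skip_special_tokens=True)[0])  # baseline output
```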
| Dataset | Description |
|---|---|
| AIHub 186 | Korean speech data (general conversation) |
| Zeroth Korean | Open Korean speech dataset |
| AIHub 134 | Emotion / mental health related speech data |
Korean ASR performance was improved by applying a LoRA adapter to the Whisper-base model.
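
The training hyperparameters are not listed on this card. As a purely illustrative sketch, a LoRA adapter is typically attached to Whisper with PEFT along these lines; the rank, alpha, dropout, and target modules below are assumptions, not the configuration actually used for this checkpoint.

```python
# Purely illustrative: attaching a LoRA adapter to Whisper with PEFT.
# r, lora_alpha, lora_dropout, and target_modules are assumptions, not this model's config.
from transformers import WhisperForConditionalGeneration
from peft import LoraConfig, get_peft_model

base = WhisperForConditionalGeneration.from_pretrained("openai/whisper-base")

lora_config = LoraConfig(
    r=16,                                 # assumed rank
    lora_alpha=32,                        # assumed scaling
    lora_dropout=0.05,                    # assumed dropout
    target_modules=["q_proj", "v_proj"],  # attention projections commonly targeted in Whisper
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the adapter parameters are trainable
```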
```bibtex
@misc{whisper-korean-lora-2026,
  title={Whisper-Base Korean LoRA for Welfare Call Center},
  author={Jaehyeon},
  year={2026},
  publisher={HuggingFace}
}
```
Base model: openai/whisper-base