FormoSpeech

non-profit

FormoSpeech (Speech in Formosa)

Advancing Speech AI for the Linguistic Diversity of Taiwan

About Us

Welcome to FormoSpeech (Speech in Formosa), an open-source initiative and community hub hosted on Hugging Face. We are dedicated to the development and promotion of speech and language technologies for the languages of Taiwan.

This organization was founded by Dr. Hung-Shin Lee (李鴻欣), with support from Professor Chen-Chi Chang (張陳基) of National United University (國立聯合大學) and assistance from ÌTHUÂN KHOKI (意傳科技), led by Sîng-hông Sih (薛丞宏). Another key contributor is Li-Wei Chen (陳力瑋), a master's student at National Tsing Hua University (國立清華大學). The organization aims to bring together researchers, developers, and language enthusiasts to build and share high-quality, accessible resources for the Taiwanese linguistic context.

Our Mission

Our primary goal is to create and curate foundational models, datasets, and tools to support the rich linguistic tapestry of Taiwan. We believe that advancing AI capabilities for these languages is crucial for digital inclusion, cultural preservation, and future innovation.

We are specifically focused on:

Taiwanese Mandarin (國語): Developing models that capture the unique accent, lexicon, and nuances of Mandarin spoken in Taiwan.
Taiwanese Hokkien (臺灣台語): Building resources for a vital language with a rich oral tradition but historically fewer digital resources.
Taiwanese Hakka (臺灣客語): Supporting the various dialects of Hakka spoken across Taiwan.
Formosan Languages (臺灣原住民語): Contributing to the digital preservation and revitalization of Taiwan's Formosan languages.

What You'll Find Here

This organization will host a growing collection of:

Models: Pre-trained models for tasks such as Automatic Speech Recognition (ASR), Text-to-Speech (TTS), speaker identification, and more, all trained or fine-tuned on data from the languages of Taiwan.
Datasets: Curated and pre-processed speech corpora suitable for training and evaluating speech AI models.
Demos & Tools: Interactive demos (Spaces) and tools to facilitate research and application development.

Citation

If you use any models, datasets, or demos from FormoSpeech in your research or projects, we would be incredibly grateful if you could cite our organization.

You can use the following BibTeX entry:

@misc{formospeech,
  author       = {Hung-Shin Lee and Li-Wei Chen},
  title        = {FormoSpeech: Advancing Speech AI for the Linguistic Diversity of Taiwan},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/FormoSpeech}}
}

Get Involved

We believe in the power of community collaboration. Whether you are a researcher, a developer, a linguist, or a native speaker, there are many ways to contribute:

Use our models and datasets: Integrate them into your projects and provide feedback.
Contribute your own work: Share your models, datasets, or code with the community.
Report issues: Help us improve the quality and reliability of our resources.
Collaborate: We are always open to new ideas and partnerships.

Let's work together to build a vibrant and inclusive future for Taiwan's languages in the age of AI.