Open-source speech understanding toolkit

FunASR

Production-ready ASR, VAD, punctuation, speaker diarization, emotion detection, and audio event recognition with one unified Python interface.

funasr-pipeline.py
from funasr import AutoModel

model = AutoModel(
    model="paraformer-zh",
    vad_model="fsmn-vad",
    punc_model="ct-punc",
    spk_model="cam++",
)
res = model.generate(input="meeting.wav")
print(res[0]["sentence_info"])
50+ languages
15x faster
1 API pipeline

Capabilities

Everything needed for speech understanding, from raw audio segmentation to speaker-aware transcripts.

ASR

Speech Recognition

Streaming and offline ASR with VAD segmentation. Process long-form audio with a single API call.

50+

Multilingual Models

Fun-ASR-Nano covers 31 languages and Qwen3-ASR covers 52 languages with language detection.

SPK

Speaker Diarization

Identify who spoke when, then attach speaker labels to sentence-level ASR output.
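
Sentence-level speaker labels are often easier to read once consecutive sentences from the same speaker are merged into turns. The following is a minimal sketch, not part of FunASR itself: `merge_turns` is a hypothetical helper, and the sample list only imitates the `sentence_info` shape used in the examples on this page (the `spk` and `text` keys come from that example, the values are invented).

```python
from itertools import groupby

# Hypothetical sample shaped like res[0]["sentence_info"]; values are made up.
sentence_info = [
    {"spk": 0, "text": "Let's get started."},
    {"spk": 0, "text": "First item is the release plan."},
    {"spk": 1, "text": "I have an update on that."},
    {"spk": 0, "text": "Go ahead."},
]


def merge_turns(sentences):
    """Join consecutive sentences from the same speaker into one turn."""
    turns = []
    # groupby only groups adjacent items, which is exactly what a "turn" is.
    for spk, group in groupby(sentences, key=lambda s: s["spk"]):
        text = " ".join(s["text"] for s in group)
        turns.append({"spk": spk, "text": text})
    return turns


turns = merge_turns(sentence_info)
for turn in turns:
    print(f"[Speaker {turn['spk']}] {turn['text']}")
```

Because only adjacent sentences are merged, a speaker who talks again later gets a new turn rather than being folded into an earlier one.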

SFX

Emotion & Events

SenseVoice detects emotion and audio events including background music, applause, laughter, and crying.

RTF

Low Latency

Non-autoregressive models support fast batch and realtime workloads across common deployment targets.
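
The RTF label refers to the real-time factor: processing time divided by the duration of the audio processed, so values below 1 mean faster than real time. As a rough sketch, the helper below (a hypothetical function, not a FunASR API) applies that definition to the 70ms/10s figure listed for SenseVoice on this page.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """RTF = time spent processing / duration of the audio processed."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# Figure quoted on this page for SenseVoice: 70 ms to process 10 s of audio.
rtf = real_time_factor(0.070, 10.0)
speedup = 1 / rtf  # how many times faster than real time

print(f"RTF = {rtf:.3f}, about {speedup:.0f}x faster than real time")
```

Measured RTF depends on hardware, batch size, and audio length, so treat a single number as a reference point rather than a guarantee.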

OPS

Train & Deploy

Fine-tune with DeepSpeed, export to ONNX, and deploy through Docker runtime or the Python SDK.

Models

Pre-trained industrial models ready for recognition, segmentation, and speech understanding workflows.

Fun-ASR-Nano

End-to-end ASR trained on tens of millions of hours of data. Supports 31 languages, dialects, accents, and lyrics, with timestamps and hotwords.

800M, 31 Languages, Timestamps, Speaker Diarization, Hotwords

Paraformer

Non-autoregressive Chinese and English ASR with streaming and offline variants for production systems.

Streaming, Offline, Hotwords, ONNX

SenseVoice

Multi-task speech understanding for ASR, language ID, emotion, and audio events across five languages.

Emotion, Audio Events, Speaker Diarization, 70ms/10s

Qwen3-ASR

LLM-based ASR with 52 languages, contextual understanding, and automatic language detection.

52 Languages, 0.6B / 1.7B, Context-aware

Quick Start

Install the package, compose the pipeline, and run recognition from Python.

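# Install the latest release from PyPI
pip install funasr
# Or latest: pip install git+https://github.com/modelscope/FunASR.git

from funasr import AutoModel

# Compose recognition, segmentation, punctuation, and speaker models in one pipeline
model = AutoModel(
    model="paraformer-zh",   # speech recognition
    vad_model="fsmn-vad",    # voice activity detection
    punc_model="ct-punc",    # punctuation restoration
    spk_model="cam++",       # speaker model for diarization
)
# batch_size_s sets the dynamic batch size as total audio duration in seconds
res = model.generate(input="meeting.wav", batch_size_s=300)

# Print each recognized sentence with its speaker label
for sent in res[0]["sentence_info"]:
    print(f"[Speaker {sent['spk']}] {sent['text']}")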
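
To turn the sentence list into a timestamped transcript, each entry can be rendered with its time span. This is a sketch under stated assumptions: it assumes each sentence carries `start` and `end` offsets in milliseconds alongside `spk` and `text` (check the actual output of your model), and `format_ms` and `transcript_lines` are hypothetical helpers, not FunASR functions. The sample data is invented.

```python
def format_ms(ms):
    """Render a millisecond offset as HH:MM:SS.mmm."""
    seconds, millis = divmod(int(ms), 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def transcript_lines(sentences):
    """Build one 'start - end [Speaker N] text' line per sentence."""
    return [
        f"{format_ms(s['start'])} - {format_ms(s['end'])} "
        f"[Speaker {s['spk']}] {s['text']}"
        for s in sentences
    ]


# Hypothetical sample; "start" and "end" in milliseconds are an assumption.
sample = [
    {"spk": 0, "text": "Welcome, everyone.", "start": 0, "end": 1840},
    {"spk": 1, "text": "Thanks for having me.", "start": 2100, "end": 3950},
]

lines = transcript_lines(sample)
for line in lines:
    print(line)
```

With real output, the same call would be `transcript_lines(res[0]["sentence_info"])`, provided the assumed keys are present.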

Ecosystem

Related projects around ASR, speech understanding, video clipping, and voice generation.

Fun-ASR-Nano

The latest large ASR model with multilingual recognition, timestamps, speaker diarization, and hotwords.

SenseVoice

Multi-task speech understanding for ASR, emotion detection, and audio event recognition.

FunClip

AI video clipping powered by FunASR and LLM-assisted editing workflows.

CosyVoice

Natural speech generation with multilingual support and control over timbre and emotion.