Open-source speech understanding toolkit

FunASR

Production-ready ASR, VAD, punctuation, speaker diarization, emotion detection, and audio event recognition with one unified Python interface.


funasr-pipeline.py

from funasr import AutoModel

model = AutoModel(
    model="paraformer-zh",
    vad_model="fsmn-vad",
    punc_model="ct-punc",
    spk_model="cam++",
)
res = model.generate(input="meeting.wav")
print(res[0]["sentence_info"])

50+ languages

170x realtime

1 API pipeline

What's New

The public docs now track the latest README and main-branch capabilities.

vLLM Inference Engine

2-3x faster LLM decoding for Fun-ASR-Nano, with tensor-parallel batch inference and a real-time WebSocket service.


Agent Infrastructure

funasr-server exposes OpenAI-compatible transcription APIs; MCP and voice-input examples connect local ASR to AI tools.


Benchmark Report

Long-form benchmark results cover SenseVoice, Paraformer, Fun-ASR-Nano, GLM-ASR, and Whisper variants on GPU and CPU.


Private Speech API in Minutes

Run an OpenAI-compatible transcription endpoint locally, then plug it into agents, apps, and batch pipelines without sending audio to a cloud ASR provider.

Start the server

Use SenseVoice for a fast first test, or switch models after the endpoint is running.

pip install funasr fastapi uvicorn python-multipart
funasr-server --model sensevoice --device cuda

Verify with curl

Download a public sample and call the same route used by OpenAI-compatible clients.

curl -L https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/BAC009S0764W0121.wav -o sample.wav
curl http://localhost:8000/v1/audio/transcriptions \
  -F file=@sample.wav \
  -F model=sensevoice \
  -F response_format=verbose_json
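For scripts that cannot shell out to curl, the same request can be assembled with Python's standard library. This is a minimal sketch that mirrors the curl fields above (`file`, `model`, `response_format`); the default URL and model alias are copied from that example, and `build_transcription_request` is a hypothetical helper name. It only builds the request, so it can be inspected before anything is sent.

```python
import urllib.request
import uuid
from pathlib import Path


def build_transcription_request(
    audio_path: str,
    url: str = "http://localhost:8000/v1/audio/transcriptions",
    model: str = "sensevoice",
    response_format: str = "verbose_json",
) -> urllib.request.Request:
    """Build (without sending) a multipart/form-data POST equivalent to the curl call."""
    boundary = uuid.uuid4().hex
    path = Path(audio_path)
    body = bytearray()

    # Plain text form fields, one part each (curl's -F model=... and -F response_format=...).
    for name, value in (("model", model), ("response_format", response_format)):
        body += f"--{boundary}\r\n".encode()
        body += f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode()
        body += f"{value}\r\n".encode()

    # The audio file part, equivalent to curl's -F file=@sample.wav.
    body += f"--{boundary}\r\n".encode()
    body += (
        f'Content-Disposition: form-data; name="file"; filename="{path.name}"\r\n'
        "Content-Type: audio/wav\r\n\r\n"
    ).encode()
    body += path.read_bytes() + b"\r\n"

    # The closing boundary terminates the multipart body.
    body += f"--{boundary}--\r\n".encode()

    return urllib.request.Request(
        url,
        data=bytes(body),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

Once `funasr-server` is running, `urllib.request.urlopen(build_transcription_request("sample.wav"))` sends the request. With `verbose_json`, we assume (following the OpenAI response shape) that the JSON body has a top-level `text` field; check the server's actual output before relying on other keys.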


Documentation

Start with examples, tune on your own data, extend the registry, or jump into source-linked API docs.

Choose

Model Selection

Pick SenseVoice, Paraformer, Fun-ASR-Nano, streaming runtime, or OpenAI API aliases for your workload.

Deploy

Deployment Matrix

Choose between Python API, OpenAI API, Docker Compose, Kubernetes, WebSocket runtime, vLLM, MCP, batch, subtitles, and Triton.

Compare

Migration Guide

Evaluate FunASR against Whisper or cloud ASR with feature mapping, representative benchmarks, and rollout checks.

Choose

Use Cases

Find the fastest path for private APIs, agents, streaming, vLLM, subtitles, batch jobs, and benchmarks.

Learn

Tutorial

Install FunASR, choose a model, and run common ASR, VAD, diarization, and export flows.

Tune

Training

Prepare JSONL data, fine-tune Paraformer, SenseVoice, and Fun-ASR-Nano, then monitor runs.
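The pairing step behind JSONL preparation can be sketched as follows, assuming Kaldi-style inputs (`wav.scp` lines of `<utt_id> <audio_path>` and `text` lines of `<utt_id> <transcript>`) and the field names `key`, `source`, and `target`. Those field names are an assumption here; the training guide defines the exact schema, and FunASR's own tooling may add further fields such as sequence lengths.

```python
import json
from typing import Dict, Iterable, Iterator


def parse_kaldi_style(lines: Iterable[str]) -> Dict[str, str]:
    """Map '<utt_id> <value>' lines to {utt_id: value}; blank or value-less lines are skipped."""
    table: Dict[str, str] = {}
    for line in lines:
        parts = line.strip().split(maxsplit=1)  # split once so transcripts keep inner spaces
        if len(parts) == 2:
            table[parts[0]] = parts[1]
    return table


def to_jsonl(wav_scp: Iterable[str], text: Iterable[str]) -> Iterator[str]:
    """Yield one JSON line per utterance present in both inputs (field names are assumed)."""
    wavs = parse_kaldi_style(wav_scp)
    texts = parse_kaldi_style(text)
    for key in sorted(wavs.keys() & texts.keys()):
        record = {"key": key, "source": wavs[key], "target": texts[key]}
        yield json.dumps(record, ensure_ascii=False)  # keep non-ASCII transcripts readable
```

Utterances missing from either file are dropped rather than written with empty fields. To produce a file, open `wav.scp` and `text`, then write each yielded line plus `"\n"` to a UTF-8 output such as `train.jsonl`.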

Extend

Developer Guide

Understand the registry, add a model, package remote code, and avoid integration pitfalls.
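The registry idea, where a name in a config resolves to a Python class registered by a decorator, can be illustrated with a self-contained toy. This is not FunASR's registry code: the `Registry` class, the `model_classes` table name, and `MyToyASR` are hypothetical stand-ins that only show the pattern.

```python
from typing import Any, Callable, Dict


class Registry:
    """Toy table -> {name -> class} registry (illustrative, not FunASR's implementation)."""

    def __init__(self) -> None:
        self._tables: Dict[str, Dict[str, type]] = {}

    def register(self, table: str, name: str) -> Callable[[type], type]:
        """Return a class decorator that records the class under (table, name)."""
        def decorator(cls: type) -> type:
            entries = self._tables.setdefault(table, {})
            if name in entries:
                raise KeyError(f"{name!r} is already registered in {table!r}")
            entries[name] = cls
            return cls  # the class itself is returned unchanged
        return decorator

    def build(self, table: str, name: str, **kwargs: Any) -> Any:
        """Instantiate the class registered under (table, name) with config kwargs."""
        try:
            cls = self._tables[table][name]
        except KeyError:
            raise KeyError(f"{name!r} not found in {table!r}") from None
        return cls(**kwargs)


tables = Registry()


@tables.register("model_classes", "MyToyASR")  # hypothetical model name
class MyToyASR:
    def __init__(self, vocab_size: int = 10) -> None:
        self.vocab_size = vocab_size
```

In this toy, registration runs when the module defining the class is imported, so that module has to be imported before `tables.build("model_classes", "MyToyASR", vocab_size=...)` can find it. Rejecting duplicate names surfaces accidental collisions early instead of silently replacing a model.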

Accelerate

vLLM Inference

Run LLM-based ASR with vLLM, tensor-parallel batch decoding, a streaming SDK, and a WebSocket service.

Integrate

Agent & API

Expose FunASR as an OpenAI-compatible endpoint, low-code workflow node, MCP tool, voice input, or subtitle generator.
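A subtitle generator ultimately turns sentence-level results into SRT cues. The sketch below assumes each `sentence_info` entry carries `start` and `end` in milliseconds, a `text` field, and an optional `spk` label; the Quick Start loop reads `spk` and `text`, while the millisecond timestamps are our assumption. It is an illustration, not the subtitle tool these docs describe.

```python
from typing import Iterable, Mapping


def ms_to_srt_time(ms: int) -> str:
    """Format milliseconds as an SRT timestamp HH:MM:SS,mmm."""
    hours, rem = divmod(int(ms), 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def sentences_to_srt(sentences: Iterable[Mapping]) -> str:
    """Render sentence dicts as numbered SRT cues separated by blank lines."""
    blocks = []
    for index, sent in enumerate(sentences, start=1):
        speaker = sent.get("spk")
        # Prefix the speaker label only when diarization produced one (0 is a valid label).
        text = f"[Speaker {speaker}] {sent['text']}" if speaker is not None else sent["text"]
        start, end = ms_to_srt_time(sent["start"]), ms_to_srt_time(sent["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    return "\n".join(blocks)
```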

Measure

Benchmark

Compare FunASR and Whisper on long-form audio, with speed and CER (character error rate) numbers on GPU and CPU.
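CER, the accuracy metric in these comparisons, is the character-level edit distance between hypothesis and reference divided by the reference length: (S + D + I) / N for substitutions, deletions, and insertions over N reference characters. A minimal sketch follows; stripping spaces is an illustrative normalization choice, and real benchmark scripts usually apply their own normalization (punctuation, casing), which changes the numbers.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance over characters, using one rolling row of the DP table."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix to each hyp prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of r
                curr[j - 1] + 1,         # insertion of h
                prev[j - 1] + (r != h),  # substitution, free when characters match
            )
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str, ignore_spaces: bool = True) -> float:
    """Character error rate = edit distance / reference length."""
    if ignore_spaces:
        ref, hyp = ref.replace(" ", ""), hyp.replace(" ", "")
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)
```

For example, one wrong character in a five-character reference gives a CER of 0.2. CER can exceed 1.0 when the hypothesis contains many insertions.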

Reference

API

Browse generated classes, methods, source previews, and GitHub line links.

Capabilities

Everything needed for speech understanding, from raw audio segmentation to speaker-aware transcripts.

ASR

Speech Recognition

Streaming and offline ASR with VAD segmentation. Process long-form audio with a single API call.

50+

Multilingual Models

Fun-ASR-Nano covers 31 languages and Qwen3-ASR covers 52 languages with language detection.

SPK

Speaker Diarization

Identify who spoke when, then attach speaker labels to sentence-level ASR output.
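Speaker-labelled sentences are often easier to read as turns. This sketch merges consecutive `sentence_info` entries that share a `spk` value; it assumes the same `spk` and `text` keys as the Quick Start loop plus `start`/`end` timestamps, and `merge_speaker_turns` is a hypothetical helper, not part of FunASR.

```python
from typing import Dict, List


def merge_speaker_turns(sentence_info: List[Dict], sep: str = " ") -> List[Dict]:
    """Collapse consecutive sentences with the same 'spk' into one turn (input is not modified)."""
    turns: List[Dict] = []
    for sent in sentence_info:
        if turns and turns[-1]["spk"] == sent["spk"]:
            # Same speaker continues: extend the current turn's text and end time.
            turns[-1]["text"] += sep + sent["text"]
            turns[-1]["end"] = sent["end"]
        else:
            # Speaker changed (or first sentence): start a new turn from a copy of the fields.
            turns.append({k: sent[k] for k in ("spk", "start", "end", "text")})
    return turns
```

For Chinese transcripts, passing `sep=""` avoids inserting spaces between sentences.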

SFX

Emotion & Events

SenseVoice detects emotion and audio events including background music, applause, laughter, and crying.
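SenseVoice places language, emotion, and event information in special tags in front of the transcript, for example `<|en|><|HAPPY|><|Speech|><|woitn|>hello world`. The exact tag vocabulary in that example is an assumption based on typical output; the SenseVoice examples use `rich_transcription_postprocess` from `funasr.utils.postprocess_utils` to produce clean text. This sketch only separates the tags from the text so they can be inspected or logged.

```python
import re
from typing import List, Tuple

# One special tag such as <|en|> or <|NEUTRAL|>; the <|...|> shape is assumed from SenseVoice output.
_TAG = re.compile(r"<\|([^|<>]+)\|>")


def split_tags(raw: str) -> Tuple[List[str], str]:
    """Return (tags in order of appearance, text with every tag removed)."""
    tags = _TAG.findall(raw)
    text = _TAG.sub("", raw).strip()
    return tags, text
```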

RTF

Low Latency

Non-autoregressive models support fast batch and real-time workloads across common deployment targets.
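Speed figures on this page use two equivalent conventions: the real-time factor (RTF, processing time divided by audio duration) and its reciprocal, the speedup over real time. As a worked example with the SenseVoice tag `70ms/10s`: RTF = 0.07 / 10 = 0.007, or roughly 143x real time; likewise, 170x real time corresponds to an RTF of about 0.0059. Both figures depend on hardware and batch settings.

```python
def real_time_factor(processing_s: float, audio_s: float) -> float:
    """RTF = processing time / audio duration; values below 1.0 are faster than real time."""
    if audio_s <= 0:
        raise ValueError("audio duration must be positive")
    return processing_s / audio_s


def speedup(processing_s: float, audio_s: float) -> float:
    """How many times faster than real time: the reciprocal of the RTF."""
    return 1.0 / real_time_factor(processing_s, audio_s)
```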

OPS

Train & Deploy

Fine-tune with DeepSpeed, export to ONNX, and deploy through the Docker runtime or the Python SDK.

Models

Pre-trained industrial models ready for recognition, segmentation, and speech understanding workflows.

Fun-ASR-Nano

End-to-end ASR trained on tens of millions of hours of audio, covering 31 languages plus dialects, accents, and lyrics, with timestamps and hotwords.

800M · 31 Languages · Timestamps · Speaker Diarization · Hotwords

Paraformer

Non-autoregressive Chinese and English ASR with streaming and offline variants for production systems.

Streaming · Offline · Hotwords · ONNX

SenseVoice

Multi-task speech understanding for ASR, language ID, emotion, and audio events across five languages.

Emotion · Audio Events · Speaker Diarization · 70ms/10s

Qwen3-ASR

LLM-based ASR with 52 languages, contextual understanding, and automatic language detection.

52 Languages · 0.6B / 1.7B · Context-aware

Quick Start

Install locally, or run the Colab quickstart first to transcribe a sample in your browser.

pip install funasr
# Or latest: pip install git+https://github.com/modelscope/FunASR.git

from funasr import AutoModel

model = AutoModel(
    model="paraformer-zh",
    vad_model="fsmn-vad",
    punc_model="ct-punc",
    spk_model="cam++",
)
res = model.generate(input="meeting.wav", batch_size_s=300)

for sent in res[0]["sentence_info"]:
    print(f"[Speaker {sent['spk']}] {sent['text']}")
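`batch_size_s=300` enables dynamic batching: segments are grouped by their total audio duration in seconds rather than by a fixed count. The greedy packing below is a sketch of that idea under an assumed `(start_ms, end_ms)` segment format from the VAD step; FunASR's actual batching (for example, any sorting by length) may differ.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # (start_ms, end_ms) from a VAD step; this format is an assumption


def pack_by_duration(segments: List[Segment], batch_size_s: float) -> List[List[Segment]]:
    """Greedily group segments so each batch's total audio stays within batch_size_s seconds."""
    budget_ms = batch_size_s * 1000
    batches: List[List[Segment]] = []
    current: List[Segment] = []
    used_ms = 0
    for start, end in segments:
        dur = end - start
        # Close the current batch if adding this segment would exceed the budget.
        if current and used_ms + dur > budget_ms:
            batches.append(current)
            current, used_ms = [], 0
        # A single segment longer than the budget still gets a batch of its own.
        current.append((start, end))
        used_ms += dur
    if current:
        batches.append(current)
    return batches
```

Larger budgets mean fewer, fuller batches and better GPU utilization, at the cost of more memory per forward pass.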


Ecosystem

Related projects around ASR, speech understanding, video clipping, and voice generation.

Fun-ASR-Nano

SenseVoice

FunClip

CosyVoice