Use Cases
Pick the shortest path from evaluation to production. FunASR covers local transcription, private OpenAI-compatible APIs, Kubernetes services, agent voice input, streaming services, vLLM acceleration, subtitles, and batch processing.
Choose the right path
| Goal | Start here | Why it matters |
|---|---|---|
| Try FunASR in a browser | Colab quickstart | Run a public sample and upload your own audio before setting up a local environment. |
| Transcribe one file locally | Tutorial · Model selection | Verify install, model choice, and model download in minutes. |
| Compare accuracy and speed | Benchmark report | Review long-audio speed and CER before choosing a model. |
| Migrate from Whisper/cloud ASR | Migration guide · Benchmark example | Map existing pipelines to FunASR, benchmark representative audio, and plan a safe rollout. |
| Build a private speech API | OpenAI-compatible API · Kubernetes template · JS/TS recipes · Gradio demo · Security guide · Workflow recipes | Reuse OpenAI-style clients, Dify, n8n, HTTP workflow nodes, and a Gradio browser UI while planning TLS, auth, upload limits, and logs at the service boundary. |
| Add speech input to agents | MCP server | Connect local ASR to Claude, Cursor, desktop tools, and internal assistants. |
| Choose a deployment path | Deployment matrix | Compare Python API, OpenAI API, Docker Compose, Kubernetes, WebSocket, vLLM, MCP, subtitles, batch jobs, and Triton. |
| Serve streaming ASR | Realtime examples | Handle live captioning, meetings, and call-center-style workloads. |
| Accelerate LLM-based ASR | vLLM guide | Use tensor-parallel decoding and streaming service support for Fun-ASR-Nano. |
| Generate subtitles | Subtitle generator | Create SRT/VTT files from audio or video, with speaker labels when needed. |
| Process many recordings | Batch ASR example | Build repeatable offline jobs for archives, meetings, and datasets. |
Production recipes
Private transcription API
Use this path when an application already speaks OpenAI-style APIs or when audio cannot leave your environment.
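Because the endpoint follows the OpenAI transcription shape, a client can be a plain multipart POST. The sketch below uses only the Python standard library and assumes a server from this section is listening on http://localhost:8000 and accepts the `file`, `model`, and `response_format` fields. The helper names `build_multipart` and `transcribe` are illustrative and not part of FunASR, and the shape of the JSON response is an assumption to check against your server.

```python
import json
import uuid
import urllib.request
from pathlib import Path


def build_multipart(fields, file_field, filename, file_bytes):
    """Encode text fields plus one file as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode("utf-8")
        )
    parts.append(
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{file_field}"; '
            f'filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode("utf-8")
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path, base_url="http://localhost:8000", model="sensevoice"):
    """POST an audio file to the assumed /v1/audio/transcriptions endpoint."""
    audio = Path(path)
    body, content_type = build_multipart(
        {"model": model, "response_format": "verbose_json"},
        "file",
        audio.name,
        audio.read_bytes(),
    )
    request = urllib.request.Request(
        f"{base_url}/v1/audio/transcriptions",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    # Long recordings can take a while, so the timeout is generous.
    with urllib.request.urlopen(request, timeout=300) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `transcribe("sample.wav")` against a running server should return the parsed JSON body. An existing OpenAI SDK client pointed at the same base URL is an alternative when the application already depends on one.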
pip install funasr fastapi uvicorn python-multipart
funasr-server --model sensevoice --device cuda
curl http://localhost:8000/v1/audio/transcriptions \
-F file=@sample.wav \
-F model=sensevoice \
-F response_format=verbose_json
Agent speech input
Start from the MCP server when you want to talk to coding agents, internal assistants, or workflow tools.
pip install funasr
python examples/mcp_server/funasr_mcp.py
# Set FUNASR_DEVICE=cuda for GPU inference
Streaming workloads
Pair ASR with VAD, punctuation, and speaker diarization when partial transcripts need to be readable by humans.
Validate with real audio: background noise, long silence, overlapping speakers, and different microphone quality.
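One way to exercise a streaming service with real recordings is to replay files as if they were live, in fixed-size chunks. The sketch below assumes 16 kHz, 16-bit mono PCM and a 600 ms chunk; both values are illustrative and should match whatever your streaming service expects. It does not implement any FunASR wire protocol; `pcm_chunks` and `read_wav_pcm` are hypothetical helper names.

```python
import wave


def pcm_chunks(pcm, sample_rate=16000, sample_width=2, chunk_ms=600):
    """Yield consecutive chunk_ms slices of raw mono PCM; the last may be shorter."""
    chunk_bytes = sample_rate * sample_width * chunk_ms // 1000
    # Keep every chunk aligned to whole samples.
    chunk_bytes -= chunk_bytes % sample_width
    if chunk_bytes <= 0:
        raise ValueError("chunk_ms is too small for this sample rate")
    for offset in range(0, len(pcm), chunk_bytes):
        yield pcm[offset:offset + chunk_bytes]


def read_wav_pcm(path):
    """Return (pcm_bytes, sample_rate, sample_width) for a mono WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1:
            raise ValueError("expected mono audio")
        return wav.readframes(wav.getnframes()), wav.getframerate(), wav.getsampwidth()
```

Sending each chunk to the service and sleeping for `chunk_ms` between sends approximates live traffic, which makes it easier to see how long silences and overlapping speakers affect partial transcripts.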
Benchmark before migration
Compare FunASR against Whisper or cloud ASR using your own sample set. Track throughput, CPU viability, download size, and deployment complexity together.
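Accuracy and speed can be scored the same way for every system under comparison. The sketch below computes character error rate (CER) with a plain Levenshtein distance and a real-time factor from wall-clock time. Ignoring whitespace in CER is an assumption made here; a real benchmark usually also normalizes punctuation and casing, and all function names are illustrative.

```python
import time


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, using two DP rows."""
    previous = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        current = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            current.append(
                min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost)
            )
        previous = current
    return previous[-1]


def cer(reference, hypothesis):
    """Character error rate: edit distance divided by reference length."""
    ref = [c for c in reference if not c.isspace()]
    hyp = [c for c in hypothesis if not c.isspace()]
    if not ref:
        raise ValueError("reference is empty")
    return edit_distance(ref, hyp) / len(ref)


def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def real_time_factor(processing_seconds, audio_seconds):
    """Values below 1.0 mean faster than real time."""
    return processing_seconds / audio_seconds
```

Wrapping each system's transcription call in `timed` and scoring its output with `cer` against the same reference transcripts keeps the comparison consistent across FunASR, Whisper, and cloud services.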
Model selection hints
For a deeper comparison of SenseVoice, Paraformer, Fun-ASR-Nano, streaming runtime, and OpenAI API aliases, use the model selection guide.
| Need | Good first choice | Notes |
|---|---|---|
| Fast multilingual transcription | SenseVoice-Small | Strong default for local demos and private APIs. |
| Mandarin production ASR | Paraformer-Large | Mature choice for Chinese speech recognition. |
| LLM-based ASR experiments | Fun-ASR-Nano | Pair with vLLM when throughput matters. |
| Speaker-aware transcripts | SenseVoice or Paraformer with spk_model="cam++" | Useful for meetings, interviews, and customer calls. |
| Live audio | Runtime WebSocket service | Validate chunking, VAD, and endpointing with real traffic. |
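For the speaker-aware row, a minimal sketch with the FunASR Python API might look like the following. The model aliases `paraformer-zh`, `fsmn-vad`, and `ct-punc` follow the FunASR README and are assumptions here, as is the `sentence_info` result field with `spk` and `text` keys; verify both against the version you install. `format_speaker_turns` and `transcribe_with_speakers` are hypothetical helper names.

```python
def format_speaker_turns(sentences, joiner=" "):
    """Merge consecutive sentences from the same speaker into readable turns."""
    turns = []
    for sentence in sentences:
        speaker = sentence.get("spk")
        text = sentence.get("text", "").strip()
        if not text:
            continue
        if turns and turns[-1][0] == speaker:
            turns[-1][1].append(text)
        else:
            turns.append((speaker, [text]))
    return [f"Speaker {speaker}: {joiner.join(parts)}" for speaker, parts in turns]


def transcribe_with_speakers(audio_path, device="cpu"):
    """Run ASR with VAD, punctuation, and speaker labels, then format turns."""
    # Imported lazily so the formatting helper works without FunASR installed.
    from funasr import AutoModel

    model = AutoModel(
        model="paraformer-zh",
        vad_model="fsmn-vad",
        punc_model="ct-punc",
        spk_model="cam++",
        device=device,
    )
    result = model.generate(input=audio_path)
    # Assumption: each result carries a "sentence_info" list with "spk" and "text".
    return format_speaker_turns(result[0].get("sentence_info", []))
```

For Chinese transcripts, passing `joiner=""` avoids inserting spaces between sentences.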