One command turns FunASR into a speech backend for any AI framework: an OpenAI-compatible API and an MCP server, self-hosted, at up to 170x realtime on GPU.
Two lines to add speech recognition to your agent stack:
Drop-in replacement for `/v1/audio/transcriptions`. Works with LangChain, AutoGen, CrewAI, Dify, Flowise, Open WebUI, and any other framework that uses the OpenAI audio API.
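As a rough sketch of what "drop-in" means at the HTTP level, the snippet below builds the multipart request an OpenAI-style client sends to the transcription endpoint, using only the Python standard library. The base URL `http://localhost:8000/v1`, the `model` value `sensevoice`, and the helper names are assumptions for illustration; substitute your own server address and model name.

```python
import json
import os
import urllib.request
import uuid


def build_transcription_request(base_url, audio_bytes, filename="audio.wav",
                                model="sensevoice"):
    """Build a POST request shaped like an OpenAI audio transcription call.

    base_url and model are assumed values, not confirmed server defaults.
    """
    boundary = uuid.uuid4().hex
    safe_name = os.path.basename(filename)

    # Form field: model name.
    model_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
    ).encode()

    # Form field: the audio file itself.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{safe_name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()

    closing = f"--{boundary}--\r\n".encode()
    body = model_part + file_header + audio_bytes + b"\r\n" + closing

    return urllib.request.Request(
        url=base_url.rstrip("/") + "/audio/transcriptions",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def transcribe(base_url, path, model="sensevoice"):
    """Send the file at `path` and return the `text` field of the JSON reply."""
    with open(path, "rb") as f:
        request = build_transcription_request(base_url, f.read(), path, model)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["text"]
```

With a server running, `transcribe("http://localhost:8000/v1", "meeting.wav")` would return the recognized text. Frameworks built on the OpenAI SDK should only need their `base_url` pointed at the same address.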
Add FunASR as an MCP tool in Claude Code, Cursor, or Windsurf. Your AI assistant can then transcribe any audio file directly.
Real-time voice input via WebSocket. 2Pass mode returns instant partial results, then a high-accuracy final correction. Built for voice agents that need low latency.
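On the client side, 2Pass output can be handled by treating partial results as provisional and replacing them once the final correction arrives. The sketch below assumes each server message is JSON with a `text` field and a `mode` field set to `2pass-online` for partials and `2pass-offline` for final corrections; verify these names against the server you run. `TwoPassTranscript` is a hypothetical helper, and the WebSocket connection itself is left out.

```python
import json


class TwoPassTranscript:
    """Merge streaming partials with final corrections (hypothetical helper)."""

    def __init__(self):
        self.confirmed = ""  # text already corrected by the final pass
        self.partial = ""    # provisional text for the utterance in progress

    def feed(self, raw_message):
        """Consume one JSON message and return the text to display."""
        message = json.loads(raw_message)
        text = message.get("text", "")
        if message.get("mode") == "2pass-offline":
            # Assumed final-pass marker: replace the provisional text.
            self.confirmed += text
            self.partial = ""
        else:
            # Assumed partial result: append to the provisional text.
            self.partial += text
        return self.confirmed + self.partial


if __name__ == "__main__":
    transcript = TwoPassTranscript()
    for raw in (
        '{"mode": "2pass-online", "text": "helo"}',
        '{"mode": "2pass-online", "text": " wrld"}',
        '{"mode": "2pass-offline", "text": "Hello, world."}',
    ):
        print(transcript.feed(raw))
```

A voice agent can show the return value of `feed` immediately for responsiveness and act only on `confirmed` when accuracy matters.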
Use FunASR directly as a Python function in your agent code. No server needed for single-process applications.
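A minimal sketch of the in-process route, assuming the `funasr` package is installed and that `AutoModel(...).generate(input=...)` returns a list of dicts with a `text` key. The model name `paraformer-zh` and both helper names are illustrative choices, not requirements. The recognizer is injected so the tool function can be tested without loading a model.

```python
from typing import Callable


def load_funasr_recognizer(model_name: str = "paraformer-zh") -> Callable[[str], str]:
    """Load a FunASR model once and return a path -> text function.

    The import is deferred so this module loads even without funasr installed.
    """
    from funasr import AutoModel  # requires `pip install funasr`

    model = AutoModel(model=model_name)

    def recognize(path: str) -> str:
        result = model.generate(input=path)
        # Assumed result shape: [{"key": ..., "text": ...}]
        return result[0]["text"]

    return recognize


def make_transcribe_tool(recognize: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap a recognizer as a plain function an agent can call as a tool."""

    def transcribe_audio(path: str) -> str:
        """Transcribe the audio file at `path` and return plain text."""
        return recognize(path).strip()

    return transcribe_audio
```

In an agent, `make_transcribe_tool(load_funasr_recognizer())` would be built once at startup and the returned function registered as a tool, so the model stays loaded between calls.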
| Framework | Stars | Integration | Method |
|---|---|---|---|
| LangChain | 137K | OpenAI audio tool | Change base_url |
| Dify | 142K | STT provider | OpenAI-compatible endpoint |
| Open WebUI | 138K | Speech-to-text | OpenAI-compatible endpoint |
| AutoGen | 47K | Agent tool function | OpenAI SDK |
| Flowise | 30K | STT node | OpenAI-compatible endpoint |
| Claude / Cursor | — | Audio transcription tool | MCP Server |
| Pipecat | 12K | STT service | WebSocket / OpenAI |
| LiveKit Agents | — | STT plugin | WebSocket streaming |

| Model | Speed (GPU) | Speed (CPU) | Languages | Best For |
|---|---|---|---|---|
| sensevoice | 170x realtime | 17x realtime | zh/en/ja/ko/yue | General + emotion |
| paraformer | 120x realtime | 15x realtime | zh/en | Chinese production |
| fun-asr-nano | 17x realtime | 3.6x realtime | 31 languages | Multilingual + LLM |
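The "x realtime" figures translate into wall-clock time as audio duration divided by the speed factor. The sketch below uses the numbers from the table; the function name is illustrative, and real throughput will vary with hardware and audio.

```python
# Speed factors copied from the table above (multiples of realtime).
SPEED_FACTORS = {
    "sensevoice": {"gpu": 170.0, "cpu": 17.0},
    "paraformer": {"gpu": 120.0, "cpu": 15.0},
    "fun-asr-nano": {"gpu": 17.0, "cpu": 3.6},
}


def processing_seconds(audio_seconds, model, device="gpu"):
    """Estimate transcription time: audio duration / realtime factor."""
    return audio_seconds / SPEED_FACTORS[model][device]


if __name__ == "__main__":
    one_hour = 3600
    for model, speeds in SPEED_FACTORS.items():
        for device in speeds:
            estimate = processing_seconds(one_hour, model, device)
            print(f"{model:13s} {device}: {estimate:7.1f} s per hour of audio")
```

At 170x realtime, for example, an hour of audio takes roughly 21 seconds to transcribe.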
Two commands. No config files. No Docker required.