Migrate from Whisper or Cloud ASR to FunASR

Use this guide when you already run Whisper, the OpenAI audio API, another cloud ASR service, or a custom speech pipeline and want to decide whether switching to FunASR is worthwhile. Compare quality, speed, cost, and deployment fit on audio that resembles your real workload.


When FunASR is a good fit

Private or self-hosted transcription where audio should stay inside your environment.
High-throughput long-form transcription for meetings, archives, media, or call recordings.
Speaker-aware transcripts with VAD, punctuation, timestamps, and diarization in one pipeline.
An OpenAI-compatible audio endpoint for agents, Dify, LangChain, AutoGen, or internal apps.
Streaming ASR or live captions with WebSocket/runtime service support.
Smoke tests that run on CPU before you commit to a GPU deployment.

Stay on your current pipeline if you need a fully managed service or a vendor SLA, or if your own benchmark shows that FunASR does not yet handle your language or domain well enough.

Fast evaluation plan

Pick 20-50 representative audio files, including short clips, long recordings, noisy samples, different speakers, and target languages or dialects.
Run your current Whisper or cloud ASR pipeline exactly as you use it in production. Save transcripts, latency, cost, and failure cases.
Run FunASR locally with the tutorial, then choose a deployment path from the deployment matrix.
Compare output with human review or your normal WER/CER process. Do not base the decision on a single clean demo file.
Run the migration benchmark example to write JSONL and Markdown summaries for your own audio folder.
Run the OpenAI-compatible API Python smoke test if your application already uses OpenAI-style clients.
Record warmup time, model download time, device, GPU/CPU type, batch size, and audio duration separately from steady-state throughput.
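
If you do not already have a WER/CER process, the comparison step above can be approximated with a small scorer. The following is a minimal sketch, not part of FunASR: the function names are illustrative, and the normalization (lowercasing and whitespace splitting only) is an assumption that you should replace with whatever text normalization your team already applies to both systems.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference token missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra token in hypothesis
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    return edit_distance(ref, hyp) / len(ref)


def cer(reference, hypothesis):
    """Character error rate, ignoring spaces; useful for languages without word spacing."""
    ref = list(reference.lower().replace(" ", ""))
    hyp = list(hypothesis.lower().replace(" ", ""))
    if not ref:
        raise ValueError("reference transcript is empty")
    return edit_distance(ref, hyp) / len(ref)
```

Score the baseline and the FunASR transcripts against the same human reference with the same normalization, so that differences reflect the recognizers rather than punctuation or casing conventions.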

Feature mapping

Existing workflow	FunASR path	What to validate
Whisper file transcription	Tutorial · Model selection	Transcript quality, timestamps, speed, model download, CPU/GPU behavior.
Whisper plus pyannote	`spk_model="cam++"` with VAD and punctuation	Speaker labels, speaker changes, overlapping speech, long silences.
OpenAI audio API or cloud batch ASR	OpenAI-compatible API · Kubernetes template · JS/TS recipes	`/v1/audio/transcriptions`, response format, client compatibility, upload limits.
Dify/LangChain/AutoGen agent audio	Agent and API recipes or MCP server	Tool latency, file handling, auth boundary, error reporting.
Live captions or call-center streaming	Realtime examples	Chunking, endpointing, reconnects, backpressure, partial/final result behavior.
Subtitle generation	Subtitle generator	Segment readability, line length, speaker labels, SRT/VTT compatibility.
Offline archive processing	Batch ASR example	Manifest handling, retries, progress logs, throughput, failed-file recovery.

Minimal local comparison

Install FunASR and run the same file you used for your baseline. For a folder-level evaluation, use benchmark_funasr.py to generate results.jsonl and summary.md.

pip install funasr

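from funasr import AutoModel

# Load the ASR model together with voice activity detection and a speaker model.
model = AutoModel(
    model="iic/SenseVoiceSmall",
    vad_model="fsmn-vad",  # segments the audio into speech regions
    spk_model="cam++",     # speaker labels
    device="cuda",  # use "cpu" for a portable smoke test
)
# Transcribe the same file you ran through your baseline pipeline.
result = model.generate(input="sample.wav")
print(result)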

For a repeatable folder benchmark:

python examples/migration/benchmark_funasr.py \
  --input /path/to/audio_samples \
  --recursive \
  --model iic/SenseVoiceSmall \
  --device cuda \
  --spk-model cam++ \
  --output-dir outputs/funasr_migration_eval

For an API-style comparison:

pip install funasr fastapi uvicorn python-multipart
funasr-server --model sensevoice --device cuda

curl http://localhost:8000/v1/audio/transcriptions \
  -F file=@sample.wav \
  -F model=sensevoice \
  -F response_format=verbose_json
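
The same request can be sent from Python without extra dependencies, which is convenient inside a benchmark loop. This is a hedged sketch using only the standard library: it mirrors the curl call above (same URL, `file`, `model`, and `response_format` fields), the helper names `build_multipart` and `transcribe` are illustrative, and it assumes the server replies with a JSON body.

```python
import json
import os
import urllib.request
import uuid


def build_multipart(fields, file_field, filename, file_bytes):
    """Encode text fields plus one file as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode("utf-8")
        )
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'.encode("utf-8")
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path,
               url="http://localhost:8000/v1/audio/transcriptions",
               model="sensevoice",
               response_format="verbose_json",
               timeout=300):
    """POST one audio file and return the parsed JSON response (assumed JSON)."""
    with open(path, "rb") as f:
        audio = f.read()
    body, content_type = build_multipart(
        {"model": model, "response_format": response_format},
        "file",
        os.path.basename(path),
        audio,
    )
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server from the previous step running, `transcribe("sample.wav")` returns the parsed response. Inspect its keys before relying on any particular field, because the response shape is one of the things this migration needs to validate.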

Quality and speed checklist

Audio duration, language, domain, sample rate, channel count, and speaker count.
Model name, model version, FunASR version, Python/PyTorch/CUDA versions, and Docker image tag if used.
Hardware, device mode, batch size, streaming chunk size, and whether warmup/model download is excluded.
WER/CER or human review notes for names, numbers, punctuation, diarization, timestamps, and domain terms.
Latency, throughput, GPU/CPU memory, cost per hour of audio, and failed-file rate.
Operational requirements: authentication, upload limits, TLS, logs, monitoring, retries, and retention rules.
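
To keep warmup out of the speed numbers, one option is to time every file and summarize only the runs after the first few. The sketch below is an assumption about how you might structure this, not something FunASR provides: `timed` and `summarize` are illustrative names, and the real-time factor (RTF) is computed as processing time divided by audio duration, so values below 1.0 mean faster than real time.

```python
import time
from statistics import median


def timed(fn, *args, **kwargs):
    """Call fn and return (result, wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def summarize(runs, warmup=1):
    """Summarize (audio_seconds, wall_seconds) pairs given in execution order.

    The first `warmup` runs are reported separately and excluded from the
    steady-state figures.
    """
    steady = runs[warmup:]
    if not steady:
        raise ValueError("no runs left after excluding warmup")
    audio = sum(a for a, _ in steady)
    wall = sum(w for _, w in steady)
    return {
        "warmup_seconds": sum(w for _, w in runs[:warmup]),
        "files": len(steady),
        "audio_seconds": audio,
        "wall_seconds": wall,
        "rtf": wall / audio,  # real-time factor over steady-state runs
        "median_latency_seconds": median(w for _, w in steady),
    }
```

Apply the same procedure to the baseline and to FunASR on the same hardware class where possible, and record the remaining checklist fields (device, batch size, versions) next to each summary.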

Rollout checklist

Keep the old pipeline available until FunASR passes your representative benchmark.
Start with an internal endpoint or batch job before exposing a public API.
Add request IDs and log audio duration, model, device, latency, and error type for every request.
Pin the model alias and deployment command in your runbook.
Test noisy audio, silence, overlapping speakers, long files, non-UTF-8 filenames, and network interruptions.
Open a Deployment Help issue with your command, logs, model, device, and sample characteristics if you hit a blocker.
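
The request-logging item in the checklist above could be implemented as a thin wrapper around whatever function performs the transcription. This is a minimal sketch under stated assumptions: `run_logged` and its parameters are hypothetical names, `transcribe_fn` stands in for your own call into FunASR or its HTTP endpoint, and the log format is one JSON object per line.

```python
import json
import time
import uuid


def run_logged(transcribe_fn, audio_path, audio_seconds, model, device, emit=print):
    """Run one transcription and emit a JSON log line, whether it succeeds or fails."""
    request_id = uuid.uuid4().hex
    start = time.perf_counter()
    error_type = None
    try:
        return transcribe_fn(audio_path)
    except Exception as exc:
        # Record the exception class only; the caller still sees the error.
        error_type = type(exc).__name__
        raise
    finally:
        emit(json.dumps({
            "request_id": request_id,
            "audio_path": audio_path,
            "audio_seconds": audio_seconds,
            "model": model,
            "device": device,
            "latency_seconds": round(time.perf_counter() - start, 4),
            "error_type": error_type,
        }))
```

Passing `emit=logger.info` (or any callable that accepts a string) routes the records into your existing logging setup, and the request ID gives you a handle to quote when you report a failure.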