Benchmark

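Speed and accuracy measurements for a long-form ASR workload. Key result: in a production transcription pipeline, FunASR inference on CPU can be faster than Whisper inference on GPU. SenseVoice-Small (17.2x realtime) and Paraformer-Large (15.6x) on CPU both outpace OpenAI Whisper-large-v3 on GPU (13.4x).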

Summary

Item	Result
Dataset	184 long-form Chinese audio files, 11,539 seconds (192.3 minutes) in total.
GPU	NVIDIA H100 80GB HBM3.
Best GPU speed	SenseVoice-Small: 169.6x realtime in the full benchmark, 211.8x in the initial run.
Best CPU speed	SenseVoice-Small: 17.2x realtime; Paraformer-Large: 15.6x realtime.
Baseline	OpenAI Whisper-large-v3: 13.4x realtime on GPU.

Results

Model	Device	RTF	Speed	CER	Notes
SenseVoice-Small	GPU	0.005896	169.6x	7.81%	ASR + language / emotion / event tags; CER computed after removing tags.
Paraformer-Large	GPU	0.008359	119.6x	10.18%	Fast non-autoregressive Chinese ASR that pairs well with a VAD/punctuation pipeline.
Fun-ASR-Nano	GPU	0.058803	17.0x	8.06%	LLM-based ASR supporting Chinese, English, and Japanese, 7 Chinese dialect groups, and 26 regional accents. Supports hotwords. Reliable checkpoint-native timestamps are not supported (#106).
GLM-ASR-Nano	GPU	0.026974	37.1x	31.07%	LLM-based multilingual ASR.
Whisper-large-v3-turbo (OpenAI)	GPU	0.021708	46.1x	21.71%	OpenAI Whisper implementation.
Whisper-large-v3 (OpenAI)	GPU	0.074694	13.4x	20.02%	Baseline for large Whisper quality.
SenseVoice-Small	CPU	0.057988	17.2x	7.81%	CPU run collected with the remaining benchmark script.
Paraformer-Large	CPU	0.064056	15.6x	10.18%	Also usable for CPU batch jobs.
Fun-ASR-Nano	CPU	0.274318	3.6x	8.06%	LLM-based models are heavier, but this one still runs faster than realtime.

Method

Measured on 184 audio files with the benchmark scripts in the workspace. RTF is total inference time / total audio duration, and speed is 1 / RTF. CER is computed after cleaning up model-specific output such as SenseVoice tags.
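The RTF and speed definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the code in the benchmark scripts: `FileTiming`, `aggregate_rtf`, `realtime_speed`, and `estimated_seconds` are hypothetical names, and the sketch assumes per-file timings are summed before dividing (as the "total / total" definition suggests) rather than averaging per-file RTFs.

```python
from dataclasses import dataclass


@dataclass
class FileTiming:
    """Hypothetical per-file record; not a structure from the benchmark scripts."""
    audio_seconds: float      # duration of the audio clip
    inference_seconds: float  # wall-clock time spent transcribing it


def aggregate_rtf(timings):
    """RTF = total inference time / total audio duration."""
    total_audio = sum(t.audio_seconds for t in timings)
    total_inference = sum(t.inference_seconds for t in timings)
    if total_audio <= 0:
        raise ValueError("total audio duration must be positive")
    return total_inference / total_audio


def realtime_speed(rtf):
    """Speed = 1 / RTF, i.e. how many seconds of audio are processed per second."""
    if rtf <= 0:
        raise ValueError("RTF must be positive")
    return 1.0 / rtf


def estimated_seconds(rtf, audio_seconds):
    """Wall-clock time implied by an RTF for a given amount of audio."""
    return rtf * audio_seconds


if __name__ == "__main__":
    # RTF values taken from the results table; 11,539 s is the dataset length.
    for name, rtf in [("SenseVoice-Small GPU", 0.005896),
                      ("Whisper-large-v3 GPU", 0.074694)]:
        print(f"{name}: {realtime_speed(rtf):.1f}x, "
              f"{estimated_seconds(rtf, 11539):.0f} s for the full dataset")
```

Read this way, the table's RTF values imply roughly 68 seconds of inference for the whole 192.3-minute dataset with SenseVoice-Small on GPU, against roughly 862 seconds for Whisper-large-v3 on GPU.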

python benchmark/run_full_benchmark.py
python benchmark/run_remaining.py
python benchmark/fix_sensevoice_cer.py

These numbers are practical guidance, not a universal leaderboard. Results vary with hardware, batch size, audio length, decoding options, and text normalization.
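To make the text-normalization caveat concrete, here is a hedged sketch of a CER calculation that strips model-specific tags before scoring. It is not the logic of `benchmark/fix_sensevoice_cer.py`, which is not shown here. The sketch assumes SenseVoice emits tags in the `<|...|>` form (for example `<|zh|>`), and the normalization choices (NFKC folding, dropping punctuation and whitespace) are illustrative assumptions; `normalize`, `edit_distance`, and `cer` are hypothetical helper names.

```python
import re
import unicodedata

# Assumption: model-specific tags look like <|zh|>, <|NEUTRAL|>, <|Speech|>.
TAG_PATTERN = re.compile(r"<\|[^|]*\|>")


def normalize(text):
    """Strip tags, fold full-width forms, and drop punctuation and whitespace."""
    text = TAG_PATTERN.sub("", text)
    text = unicodedata.normalize("NFKC", text)
    return "".join(
        ch for ch in text
        if not ch.isspace()
        and not unicodedata.category(ch).startswith(("P", "Z"))
    )


def edit_distance(ref, hyp):
    """Character-level Levenshtein distance using a rolling row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def cer(reference, hypothesis):
    """CER = edit distance / reference length, after normalizing both sides."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference is empty after normalization")
    return edit_distance(ref, hyp) / len(ref)
```

Without the tag-stripping step, every tag character would count as an insertion error, which is why scores are only comparable across models when the same cleanup is applied to each of them.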

Selection criteria

Need	Recommended model
Fastest production transcription	SenseVoice-Small or Paraformer-Large.
CPU batch transcription	SenseVoice-Small first; Paraformer-Large for Chinese production pipelines.
LLM-style recognition for Chinese, English, and Japanese, plus Chinese dialects and accents	Fun-ASR-Nano. For 31 languages, use the separate Fun-ASR-MLT-Nano checkpoint; use vLLM when LLM decoding throughput matters.
OpenAI-compatible local endpoint	funasr-server with the model aliases `sensevoice`, `paraformer`, `fun-asr-nano`.