Agent Integration

Use speech recognition as a local foundation for agents, products, and desktop workflows. It supports an OpenAI-compatible API, an MCP server, voice input, and subtitle generation.

OpenAI-Compatible API Server

funasr-server exposes /v1/audio/transcriptions, /v1/models, /health, and Swagger docs at /docs. It works with frameworks that already know the OpenAI audio API.

pip install funasr fastapi uvicorn python-multipart
funasr-server --device cuda --port 8000

# CPU fallback
funasr-server --device cpu --model sensevoice --port 8000

Model alias	Backend	Best for
`sensevoice`	`iic/SenseVoiceSmall`	Fast multilingual ASR with language/emotion/event tags.
`paraformer`	`paraformer-zh` + VAD + punctuation	Chinese production transcription.
`paraformer-en`	`paraformer-en` + VAD	English transcription.
`fun-asr-nano`	`FunAudioLLM/Fun-ASR-Nano-2512`	31-language LLM-based ASR with timestamps.

For setup instructions in Japanese, see the OpenAI API Japanese Quickstart. To verify a local setup, a Python smoke test is also available.
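As a rough illustration of what a local check can look like, the sketch below polls /health and lists the models from /v1/models using only the standard library. It is not the repository's smoke test. The helper names (`endpoint`, `model_ids`, `smoke_check`) are hypothetical, and the /v1/models response is assumed to follow the OpenAI list shape (`{"data": [{"id": ...}]}`).

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: matches --port 8000 above


def endpoint(base_url: str, path: str) -> str:
    """Join the server base URL and an endpoint path without doubling slashes."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def model_ids(payload: dict) -> list[str]:
    """Extract model ids, assuming the OpenAI list shape {"data": [{"id": ...}]}."""
    return [item["id"] for item in payload.get("data", [])]


def smoke_check(base_url: str = BASE_URL, timeout: float = 5.0) -> list[str]:
    """Return the advertised model ids; raises if the server is unreachable."""
    # urlopen raises HTTPError/URLError on failure, so reaching the body means success.
    with urllib.request.urlopen(endpoint(base_url, "/health"), timeout=timeout):
        pass
    with urllib.request.urlopen(endpoint(base_url, "/v1/models"), timeout=timeout) as resp:
        return model_ids(json.load(resp))
```

With the server running, `smoke_check()` should return the model aliases the server advertises; a connection error usually means the server has not finished loading or is listening on a different port.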

OpenAI SDK and curl

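from openai import OpenAI

# api_key is a placeholder; the SDK requires a non-empty value.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

# Plain transcription: the result exposes the recognized text as .text.
with open("meeting.wav", "rb") as audio:
    result = client.audio.transcriptions.create(
        model="sensevoice",
        file=audio,
    )
print(result.text)

# verbose_json additionally returns per-segment details.
with open("meeting.wav", "rb") as audio:
    verbose = client.audio.transcriptions.create(
        model="sensevoice",
        file=audio,
        response_format="verbose_json",
    )
print(verbose.segments)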

curl http://localhost:8000/v1/audio/transcriptions \
  -F file=@audio.wav \
  -F model=sensevoice \
  -F response_format=verbose_json
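The verbose_json response can be post-processed into readable lines. The sketch below is a minimal example that assumes each segment in the parsed JSON is a dict with `start`, `end` (seconds), and `text` keys, as in OpenAI's verbose_json format; the exact fields returned by funasr-server may differ, and `format_segments` is a hypothetical helper.

```python
def format_segments(segments: list[dict]) -> list[str]:
    """Render segments as '[start - end] text' lines, skipping empty text."""
    lines = []
    for seg in segments:
        # Assumed keys: "start", "end", "text" (OpenAI verbose_json shape).
        start = float(seg.get("start", 0.0))
        end = float(seg.get("end", start))
        text = str(seg.get("text", "")).strip()
        if text:
            lines.append(f"[{start:.2f}s - {end:.2f}s] {text}")
    return lines


# Example with hand-written data standing in for a parsed curl response.
sample = {"text": "hello world", "segments": [
    {"start": 0.0, "end": 1.5, "text": " hello "},
    {"start": 1.5, "end": 2.75, "text": "world"},
]}
for line in format_segments(sample["segments"]):
    print(line)
```

When using the OpenAI SDK instead of curl, segments are returned as objects rather than dicts, so attribute access (`seg.start`) would replace the `seg.get(...)` calls.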

MCP Server

The MCP server provides a local transcribe_audio tool to MCP clients such as Claude Code, Claude Desktop, Cursor, and Windsurf.

pip install funasr
python examples/mcp_server/funasr_mcp.py

{
  "mcpServers": {
    "funasr": {
      "command": "python",
      "args": ["/path/to/examples/mcp_server/funasr_mcp.py"],
      "env": {"FUNASR_DEVICE": "cuda"}
    }
  }
}

Variable	Default	Description
`FUNASR_DEVICE`	`cpu`	`cuda`, `cpu`, or `mps`.
`FUNASR_MODEL`	`iic/SenseVoiceSmall`	ASR model used by the MCP tool.
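If the client configuration is generated by a script, the JSON entry and the two variables above can be assembled as follows. This is a sketch under assumptions: `build_mcp_config` is a hypothetical helper, the defaults mirror the table, and where the resulting JSON is stored depends on the MCP client in use.

```python
import json

VALID_DEVICES = ("cuda", "cpu", "mps")  # values listed for FUNASR_DEVICE


def build_mcp_config(script_path: str,
                     device: str = "cpu",
                     model: str = "iic/SenseVoiceSmall") -> dict:
    """Build an mcpServers entry that launches funasr_mcp.py with env overrides."""
    if device not in VALID_DEVICES:
        raise ValueError(f"device must be one of {VALID_DEVICES}, got {device!r}")
    return {
        "mcpServers": {
            "funasr": {
                "command": "python",
                "args": [script_path],
                "env": {"FUNASR_DEVICE": device, "FUNASR_MODEL": model},
            }
        }
    }


config = build_mcp_config("/path/to/examples/mcp_server/funasr_mcp.py", device="cuda")
print(json.dumps(config, indent=2))
```

Validating the device before writing the file catches typos that would otherwise only surface when the client tries to start the server.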

Desktop Voice Input

The voice input sample records from the microphone, sends the audio to funasr-server, and pastes the recognized text at the current cursor position.

pip install funasr sounddevice numpy pyperclip openai pynput
funasr-server --device cuda

cd examples/voice_input
python funasr_input.py --server http://localhost:8000/v1 --model sensevoice

Platform	Recording	Paste
macOS	Yes	AppleScript
Linux	Yes	xdotool
Windows	Yes	Manual Ctrl+V if needed
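The per-platform paste behavior in the table can be pictured as a small dispatch on `sys.platform`. The sketch below is illustrative only and is not taken from funasr_input.py: `paste_command` is a hypothetical helper, and the exact AppleScript and xdotool invocations are assumptions about one reasonable way to trigger a paste.

```python
import sys
from typing import Optional


def paste_command(platform: str = sys.platform) -> Optional[list[str]]:
    """Return the argv that triggers a paste keystroke, or None for manual paste."""
    if platform == "darwin":
        # macOS: ask System Events to press Cmd+V via AppleScript.
        script = 'tell application "System Events" to keystroke "v" using command down'
        return ["osascript", "-e", script]
    if platform.startswith("linux"):
        # Linux (X11): xdotool synthesizes Ctrl+V in the focused window.
        return ["xdotool", "key", "ctrl+v"]
    # Windows and others: leave the text on the clipboard for a manual Ctrl+V.
    return None


print(paste_command("linux"))
```

In a full flow, the recognized text would first be copied to the clipboard (for example with pyperclip, which is in the install list above) and the returned command run with `subprocess.run(cmd, check=True)` when it is not `None`.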

Subtitle Generation

The subtitle sample converts an audio or video file to SRT or VTT and can optionally add speaker labels.

cd examples/subtitle
python generate_subtitle.py video.mp4
python generate_subtitle.py meeting.wav --spk
python generate_subtitle.py podcast.mp3 --format vtt
python generate_subtitle.py audio.wav --device cpu

Option	Default	Description
`--format`	`srt`	`srt` or `vtt`.
`--model`	`iic/SenseVoiceSmall`	ASR model.
`--spk`	off	Add speaker labels.
`--lang`	`auto`	Language hint.
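The practical difference between the two `--format` values is mostly in the cue syntax: SRT numbers each cue and uses a comma before the milliseconds, while VTT uses a period. The sketch below shows that difference; it is not the code in generate_subtitle.py, the helper names are hypothetical, and the `[speaker]` prefix is an assumed label style.

```python
from typing import Optional


def format_timestamp(seconds: float, fmt: str = "srt") -> str:
    """Format seconds as HH:MM:SS,mmm (SRT) or HH:MM:SS.mmm (VTT)."""
    if fmt not in ("srt", "vtt"):
        raise ValueError(f"unsupported format: {fmt!r}")
    total_ms = max(0, round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    sep = "," if fmt == "srt" else "."
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{sep}{ms:03d}"


def format_cue(index: int, start: float, end: float, text: str,
               fmt: str = "srt", speaker: Optional[str] = None) -> str:
    """Build one subtitle cue; SRT cues start with a numeric index line."""
    label = f"[{speaker}] " if speaker else ""  # assumed speaker label style
    timing = f"{format_timestamp(start, fmt)} --> {format_timestamp(end, fmt)}"
    header = f"{index}\n" if fmt == "srt" else ""
    return f"{header}{timing}\n{label}{text}\n"


print(format_cue(1, 0.0, 2.0, "Hello", speaker="spk0"))
```

A complete VTT file also needs a `WEBVTT` line at the top, and cues in both formats are separated by a blank line.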