Model Registration Guide

How to add new models to FunASR using the registry system.

Architecture Overview

FunASR uses a registry pattern to decouple model discovery from model implementation. The core flow is:

User: AutoModel(model="ModelName")
  → download_model(): fetch from ModelScope/HuggingFace, read config.yaml
  → tables.model_classes["ModelName"]: lookup registered class
  → model_class(**config): instantiate model
  → load_pretrained_model(): load weights from model.pt
  → model.eval().to(device): ready for inference
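
The lookup step in this flow can be illustrated with a toy registry. The sketch below is not FunASR's implementation: `ToyTables`, `toy_tables`, `ToyModel`, and `build_model` are hypothetical names, used only to show how a decorator-filled table lets a string from config.yaml select a class.

```python
from collections import defaultdict


class ToyTables:
    """Hypothetical stand-in for a registry; not FunASR's tables object."""

    def __init__(self):
        # category name -> {registered name -> class}
        self._tables = defaultdict(dict)

    def register(self, category, name):
        def decorator(cls):
            self._tables[category][name] = cls
            return cls  # return the class unchanged so it stays usable directly
        return decorator

    def lookup(self, category, name):
        try:
            return self._tables[category][name]
        except KeyError:
            raise KeyError(f"{name} is not registered in {category}") from None


toy_tables = ToyTables()


@toy_tables.register("model_classes", "ToyModel")
class ToyModel:
    def __init__(self, hidden_size=512, **kwargs):
        self.hidden_size = hidden_size


def build_model(config):
    """Mirror the flow above: look up the class by name, then instantiate it."""
    model_class = toy_tables.lookup("model_classes", config["model"])
    return model_class(**config.get("model_conf", {}))


model = build_model({"model": "ToyModel", "model_conf": {"hidden_size": 256}})
print(type(model).__name__, model.hidden_size)
```

Because the class is found by name at run time, the code that calls `build_model` never imports `ToyModel` directly. The only requirement is that the module containing the decorated class has been imported before the lookup happens.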

Registry Categories

Category              | Purpose                   | Example
model_classes         | ASR/VAD/PUNC/SPK models   | Paraformer, SenseVoiceSmall, FsmnVADStreaming
frontend_classes      | Audio feature extraction  | WavFrontend, WavFrontendOnline
tokenizer_classes     | Text tokenization         | SentencepiecesTokenizer, CharTokenizer
encoder_classes       | Encoder modules           | SANMEncoder, ConformerEncoder
decoder_classes       | Decoder modules           | ParaformerSANMDecoder
dataset_classes       | Training datasets         | AudioDataset, SenseVoiceCTCDataset
batch_sampler_classes | Batch sampling strategies | DynamicBatchLocalShuffleSampler

Viewing the Registry

from funasr.register import tables

# Print all registered classes
tables.print()

# Print specific category
tables.print("model")

Step-by-Step: Register a New Model

1. Create Model Directory

Create a new directory, funasr/models/your_model/:

funasr/models/your_model/
├── __init__.py      # empty
├── model.py         # main model class
└── (other files)    # encoder, decoder, utils, etc.
Each model must be self-contained: do NOT import from other model directories, and do NOT inherit from classes defined in other models.
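
The self-containment rule can be checked mechanically. The following is a minimal sketch, assuming the rule means that a file in funasr/models/your_model/ must not import from funasr.models.<other_model>. The helper name `cross_model_imports` is hypothetical and not part of FunASR, and it only catches absolute dotted imports.

```python
import ast


def cross_model_imports(source, own_model):
    """Return imported module paths that point into another model directory.

    Assumption: model packages live at funasr.models.<name>, so any import of
    funasr.models.<other> with other != own_model breaks self-containment.
    """
    prefix = "funasr.models."
    offenders = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            modules = [node.module]
        elif isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        else:
            continue
        for module in modules:
            if module.startswith(prefix):
                model_name = module[len(prefix):].split(".")[0]
                if model_name != own_model:
                    offenders.append(module)
    return offenders


# The source is only parsed, never executed, so torch need not be installed.
example = """
import torch
from funasr.register import tables
from funasr.models.your_model.encoder import YourEncoder
from funasr.models.paraformer.model import Paraformer
"""
print(cross_model_imports(example, "your_model"))
```

Running a check like this over every .py file in the new directory flags cross-model imports before review. Inheritance across models would show up the same way, since the base class has to be imported first.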

2. Implement the Model Class

Your model class must implement three methods: __init__, forward, and inference.

import torch.nn as nn
from funasr.register import tables

@tables.register("model_classes", "YourModelName")
class YourModel(nn.Module):

    def __init__(self, **kwargs):
        super().__init__()
        # Build your model architecture
        # kwargs contains everything from config.yaml + runtime params

    def forward(self, speech, speech_lengths, text, text_lengths, **kwargs):
        # Training forward pass
        # Return: (loss, stats_dict, weight)
        ...

    def inference(self, data_in, data_lengths=None, key=None,
                  tokenizer=None, frontend=None, **kwargs):
        # Inference: process audio, return results
        # data_in: list of audio (numpy/tensor/path)
        # Return: ([{"key": ..., "text": ..., ...}], meta_data_dict)
        ...
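
Before registering a class, it can help to confirm that it defines all three required methods. This is a sketch using only the standard library. `missing_methods` and `Incomplete` are hypothetical names, and the check looks at the class body itself rather than at inherited methods.

```python
REQUIRED_METHODS = ("__init__", "forward", "inference")


def missing_methods(cls):
    """List required methods that cls does not define in its own body."""
    # vars(cls) only contains attributes defined directly on the class, so an
    # inherited __init__ or forward does not count as implemented.
    return [name for name in REQUIRED_METHODS if not callable(vars(cls).get(name))]


class Incomplete:
    def __init__(self, **kwargs):
        pass

    def forward(self, speech, speech_lengths, text, text_lengths, **kwargs):
        raise NotImplementedError


print(missing_methods(Incomplete))
```

Checking the class body instead of using `hasattr` is a deliberate choice here: a base class may already provide placeholder methods, which would hide a method the subclass forgot to write.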

3. Create config.yaml

This file defines the model architecture and all of its components:

# config.yaml
model: YourModelName          # must match @tables.register key
model_conf:
    hidden_size: 512
    num_layers: 6

frontend: WavFrontend          # reuse existing frontend
frontend_conf:
    fs: 16000
    n_mels: 80
    frame_length: 25
    frame_shift: 10
    cmvn_file: null

tokenizer: SentencepiecesTokenizer
tokenizer_conf:
    bpemodel: null

# Training config (optional)
dataset: AudioDataset
dataset_conf:
    batch_size: 32
    batch_type: example
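
The config above follows a naming convention: each component key (model, frontend, tokenizer, dataset) names a registered class, and the matching `_conf` key holds that component's settings. The sketch below shows the pairing on a plain dict that mirrors the YAML. `component_specs` is a hypothetical helper, not a FunASR function, and how FunASR itself consumes these keys is not shown here.

```python
def component_specs(config, components=("model", "frontend", "tokenizer", "dataset")):
    """Pair each component's registered name with its `<component>_conf` dict."""
    specs = {}
    for component in components:
        name = config.get(component)
        if name is None:
            continue  # optional component, e.g. no dataset section for inference
        specs[component] = (name, dict(config.get(f"{component}_conf") or {}))
    return specs


# Plain-dict mirror of the YAML above (training section omitted).
config = {
    "model": "YourModelName",
    "model_conf": {"hidden_size": 512, "num_layers": 6},
    "frontend": "WavFrontend",
    "frontend_conf": {"fs": 16000, "n_mels": 80},
    "tokenizer": "SentencepiecesTokenizer",
    "tokenizer_conf": {"bpemodel": None},
}

for component, (name, conf) in component_specs(config).items():
    print(f"{component}: {name} {conf}")
```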

4. Create configuration.json (for Hub distribution)

This file resolves relative paths when the model is downloaded from the Hub:

{
  "framework": "pytorch",
  "task": "auto-speech-recognition",
  "model": {"type": "funasr"},
  "file_path_metas": {
    "init_param": "model.pt",
    "config": "config.yaml",
    "tokenizer_conf": {"bpemodel": "tokenizer.model"},
    "frontend_conf": {"cmvn_file": "am.mvn"}
  }
}
file_path_metas maps config fields to filenames. AutoModel prepends the model download directory to each path automatically.
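
To make the path handling concrete, here is a sketch of how a nested file_path_metas mapping could be resolved against a download directory. It illustrates the idea only and is not AutoModel's actual code. `resolve_paths` and the directory `/cache/your-model` are hypothetical.

```python
import os


def resolve_paths(metas, model_dir):
    """Recursively prepend model_dir to every filename in a nested mapping."""
    resolved = {}
    for key, value in metas.items():
        if isinstance(value, dict):
            resolved[key] = resolve_paths(value, model_dir)
        else:
            resolved[key] = os.path.join(model_dir, value)
    return resolved


file_path_metas = {
    "init_param": "model.pt",
    "config": "config.yaml",
    "tokenizer_conf": {"bpemodel": "tokenizer.model"},
    "frontend_conf": {"cmvn_file": "am.mvn"},
}
resolved = resolve_paths(file_path_metas, "/cache/your-model")
print(resolved["init_param"])
print(resolved["tokenizer_conf"]["bpemodel"])
```

The nesting mirrors config.yaml: `tokenizer_conf.bpemodel` and `frontend_conf.cmvn_file` are null in the config from step 3, and the resolved paths are what fill them in once the files are on disk.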

5. Upload to Hub

Upload to ModelScope or HuggingFace with this directory structure:

your-model-repo/
├── config.yaml
├── configuration.json
├── model.pt               # trained weights
├── am.mvn                 # CMVN file (if needed)
├── tokenizer.model        # BPE model (if needed)
└── example/
    └── test.wav           # example audio for demo
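
Before uploading, a quick check that every file named in configuration.json actually exists in the repo directory can catch a missing model.pt or am.mvn. This is a minimal sketch with hypothetical helper names (`referenced_files`, `missing_files`). The demo builds a throwaway directory instead of touching a real repo.

```python
import json
import tempfile
from pathlib import Path


def referenced_files(metas):
    """Flatten a nested file_path_metas mapping into a list of filenames."""
    names = []
    for value in metas.values():
        names.extend(referenced_files(value) if isinstance(value, dict) else [value])
    return names


def missing_files(repo_dir):
    """Return files that configuration.json references but the repo lacks."""
    repo = Path(repo_dir)
    config_path = repo / "configuration.json"
    if not config_path.is_file():
        return ["configuration.json"]
    data = json.loads(config_path.read_text(encoding="utf-8"))
    metas = data.get("file_path_metas", {})
    return sorted(name for name in referenced_files(metas) if not (repo / name).is_file())


# Demo on a temporary directory that deliberately lacks model.pt.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    (repo / "config.yaml").write_text("model: YourModelName\n", encoding="utf-8")
    (repo / "configuration.json").write_text(
        json.dumps({"file_path_metas": {"init_param": "model.pt", "config": "config.yaml"}}),
        encoding="utf-8",
    )
    demo_missing = missing_files(repo)

print(demo_missing)
```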

6. Test

from funasr import AutoModel

# From Hub
model = AutoModel(model="your-org/your-model", device="cuda:0")
res = model.generate(input="test.wav")
print(res)

# From local path
model = AutoModel(model="/path/to/your-model-repo", device="cuda:0")
res = model.generate(input="test.wav")
print(res)

Inference Method Contract

The inference() method is called by AutoModel. It must follow this contract:

Input

Parameter    | Type        | Description
data_in      | list        | Batch of audio data (numpy arrays, tensors, or file paths)
data_lengths | tensor/None | Length of each sample (optional)
key          | list        | Identifier for each sample
tokenizer    | object      | Tokenizer instance from config
frontend     | object      | Frontend instance from config
**kwargs     | dict        | All config params + user params from generate()

Output

Must return a tuple: (results_list, meta_data_dict)

# results_list: list of dicts, one per sample
[
    {"key": "sample_id", "text": "recognized text", "timestamp": [[0, 100], ...]},
    ...
]

# meta_data_dict: timing info for RTF calculation
{"batch_data_time": 5.5, "load_data": "0.01", "extract_feat": "0.02"}
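
The output contract is easy to get subtly wrong, for example by returning only the results list. The sketch below checks the shape described above. `check_inference_output` is a hypothetical helper, and treating "key" as the only mandatory field per result is an assumption made for this example.

```python
def check_inference_output(output, expected_keys):
    """Return a list of contract violations; an empty list means none found."""
    if not (isinstance(output, tuple) and len(output) == 2):
        return ["output must be a tuple: (results_list, meta_data_dict)"]
    problems = []
    results, meta_data = output
    if not isinstance(results, list):
        problems.append("results_list must be a list")
    elif len(results) != len(expected_keys):
        problems.append("results_list must have one entry per sample")
    else:
        for index, item in enumerate(results):
            # Assumption: every result dict carries at least the sample key.
            if not isinstance(item, dict) or "key" not in item:
                problems.append(f"result {index} must be a dict with a 'key' field")
    if not isinstance(meta_data, dict):
        problems.append("meta_data_dict must be a dict")
    return problems


good = ([{"key": "utt1", "text": "hello"}], {"batch_data_time": 5.5})
bad = [{"key": "utt1", "text": "hello"}]  # forgot the meta_data dict
print(check_inference_output(good, ["utt1"]))
print(check_inference_output(bad, ["utt1"]))
```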

Standalone Repository (Remote Code)

To keep your code private or to release it independently, your model can live in a separate repository:

from funasr import AutoModel

# trust_remote_code=True loads model class from remote_code path
model = AutoModel(
    model="your-org/your-model",
    trust_remote_code=True,
    remote_code="./model.py",   # local or URL
    hub="hf",
)
res = model.generate(input="audio.wav")
With trust_remote_code=True, FunASR dynamically loads the model class from the specified file. The model does NOT need to be integrated into the FunASR source tree.
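
Dynamic loading of this kind can be done with the standard library's importlib. The sketch below shows the general technique only. It is not FunASR's loader, and `load_class_from_file` and the module name `remote_model_code` are hypothetical.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_class_from_file(path, class_name):
    """Import the Python file at `path` and return the named class from it."""
    spec = importlib.util.spec_from_file_location("remote_model_code", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file, so only load trusted code
    return getattr(module, class_name)


# Demo: write a tiny stand-in for model.py, then load a class from it.
with tempfile.TemporaryDirectory() as tmp:
    model_file = Path(tmp) / "model.py"
    model_file.write_text(
        "class YourModel:\n"
        "    def inference(self, data_in, **kwargs):\n"
        "        return [{'key': 'demo', 'text': ''}], {}\n",
        encoding="utf-8",
    )
    loaded_class = load_class_from_file(model_file, "YourModel")

print(loaded_class.__name__)
```

Executing the file is what makes this convenient, and also why the flag is named trust_remote_code: any top-level statement in the loaded file runs with the caller's permissions.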

Direct Inference (without AutoModel)

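# Assumes the standalone repo's model.py is importable from the working directory
from model import YourModel

# from_pretrained returns the model plus the kwargs that are passed on to inference()
m, kwargs = YourModel.from_pretrained(model="your-org/your-model")
m.eval()
res = m.inference(data_in=["audio.wav"], **kwargs)
print(res)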

Real Examples

Integrated Model (in FunASR source tree)

Model      | Code                                      | Config
Paraformer | funasr/models/paraformer/model.py         | config.yaml in Hub
SenseVoice | funasr/models/sense_voice/model.py        | config.yaml in Hub
FSMN-VAD   | funasr/models/fsmn_vad_streaming/model.py | config.yaml in Hub
CAM++      | funasr/models/campplus/model.py           | config.yaml in Hub
Qwen3-ASR  | funasr/models/qwen3_asr/model.py          | Uses qwen-asr package

Standalone Model (separate repo)

Model                   | Repo                   | Usage
Fun-ASR-Nano            | FunAudioLLM/Fun-ASR    | trust_remote_code=True, remote_code="./model.py"
SenseVoice (standalone) | FunAudioLLM/SenseVoice | trust_remote_code=True, remote_code="./model.py"

Key Rules

Troubleshooting

"ModelName is not registered"

The model file was not imported. Debug by importing directly:

from funasr.models.your_model.model import *
# If this fails, fix the import error first
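
A wildcard import fails with a bare traceback. A small wrapper can capture the error so it can be logged or shown next to the registry lookup failure. This is a sketch using only the standard library. `diagnose_import` is a hypothetical helper, and the module path is the placeholder used throughout this guide.

```python
import importlib
import traceback


def diagnose_import(module_name):
    """Try to import a module; return (True, None) or (False, traceback text)."""
    try:
        importlib.import_module(module_name)
    except Exception:  # ImportError, SyntaxError, or errors raised at import time
        return False, traceback.format_exc()
    return True, None


ok, details = diagnose_import("funasr.models.your_model.model")
print("imported" if ok else details)
```

The last line of the captured traceback usually names the real cause, such as a missing third-party package or a syntax error in the model file.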

"module 'xxx' not found"

A dependency is missing. Install it, and add it to your requirements.

Weights mismatch warnings

The message "Warning, miss key in ckpt: ..." means your model class defines layers that are not in the checkpoint. This is OK if those layers are optional (e.g., a CTC decoder that is not included in the open-source weights).
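
To see which layers are affected, compare the parameter names the model defines with those stored in the checkpoint. The sketch below works on plain lists of names so it runs without PyTorch. `compare_keys` and the parameter names are hypothetical. With a real model, the two lists would come from the keys of the model's state dict and of the loaded checkpoint.

```python
def compare_keys(model_keys, checkpoint_keys):
    """Split parameter names into those missing from, and extra in, a checkpoint."""
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    return {
        "missing_in_ckpt": sorted(model_keys - checkpoint_keys),
        "unused_in_ckpt": sorted(checkpoint_keys - model_keys),
    }


# Hypothetical parameter names for illustration only.
model_keys = ["encoder.weight", "decoder.weight", "ctc.ctc_lo.weight"]
checkpoint_keys = ["encoder.weight", "decoder.weight"]
report = compare_keys(model_keys, checkpoint_keys)
print(report)
```

If every name under "missing_in_ckpt" belongs to an optional component, the warning can be ignored. Names from the core encoder or decoder usually point to a mismatch between config.yaml and the weights.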