Model Registration Guide

How to add new models to FunASR using the registry system.

Architecture Overview

FunASR uses a registry pattern to decouple model discovery from model implementation. The core flow is:

User: AutoModel(model="ModelName")
  → download_model(): fetch from ModelScope/HuggingFace, read config.yaml
  → tables.model_classes["ModelName"]: lookup registered class
  → model_class(**config): instantiate model
  → load_pretrained_model(): load weights from model.pt
  → model.eval().to(device): ready for inference
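
The lookup-then-instantiate flow above can be illustrated with a toy registry. This is a minimal sketch in plain Python, not FunASR's actual implementation: `ToyTables`, `ToyModel`, and `build_model` are hypothetical names, the dict stands in for a parsed config.yaml, and passing only the `model_conf` section to the constructor is a simplifying assumption.

```python
class ToyTables:
    """Hypothetical stand-in for a registry: category -> {key: class}."""

    def __init__(self):
        self.model_classes = {}

    def register(self, category, key):
        # Return a decorator that stores the class under `key` in `category`.
        def decorator(cls):
            getattr(self, category)[key] = cls
            return cls
        return decorator


tables = ToyTables()


@tables.register("model_classes", "ToyModel")
class ToyModel:
    def __init__(self, hidden_size=256, **kwargs):
        self.hidden_size = hidden_size


def build_model(config):
    # Look up the class by the name given in the config, then instantiate it.
    name = config["model"]
    if name not in tables.model_classes:
        raise KeyError(f"{name} is not registered")
    return tables.model_classes[name](**config.get("model_conf", {}))


model = build_model({"model": "ToyModel", "model_conf": {"hidden_size": 512}})
print(type(model).__name__, model.hidden_size)
```

The point of the pattern is that `build_model` never imports `ToyModel` by name; it only needs the string key, which is why the `model` field in the config has to match the registered key exactly.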

Registry Categories

Category	Purpose	Example
`model_classes`	ASR/VAD/PUNC/SPK models	Paraformer, SenseVoiceSmall, FsmnVADStreaming
`frontend_classes`	Audio feature extraction	WavFrontend, WavFrontendOnline
`tokenizer_classes`	Text tokenization	SentencepiecesTokenizer, CharTokenizer
`encoder_classes`	Encoder modules	SANMEncoder, ConformerEncoder
`decoder_classes`	Decoder modules	ParaformerSANMDecoder
`dataset_classes`	Training datasets	AudioDataset, SenseVoiceCTCDataset
`batch_sampler_classes`	Batch sampling strategies	DynamicBatchLocalShuffleSampler

Viewing the Registry

from funasr.register import tables

# Print all registered classes
tables.print()

# Print specific category
tables.print("model")

Step-by-Step: Register a New Model

1. Create Model Directory

Create a new directory for your model under funasr/models/, for example funasr/models/your_model/:

funasr/models/your_model/
├── __init__.py      # empty
├── model.py         # main model class
└── (other files)    # encoder, decoder, utils, etc.

Each model must be self-contained. Do NOT import from other model directories. Do NOT use class inheritance across models.

2. Implement the Model Class

Your model class must implement three methods: __init__, forward, and inference.

import torch.nn as nn
from funasr.register import tables

@tables.register("model_classes", "YourModelName")
class YourModel(nn.Module):

    def __init__(self, **kwargs):
        super().__init__()
        # Build your model architecture
        # kwargs contains everything from config.yaml + runtime params

    def forward(self, speech, speech_lengths, text, text_lengths, **kwargs):
        # Training forward pass
        # Return: (loss, stats_dict, weight)
        ...

    def inference(self, data_in, data_lengths=None, key=None,
                  tokenizer=None, frontend=None, **kwargs):
        # Inference: process audio, return results
        # data_in: list of audio (numpy/tensor/path)
        # Return: ([{"key": ..., "text": ..., ...}], meta_data_dict)
        ...

3. Create config.yaml

This file defines the model architecture and all of its components:

# config.yaml
model: YourModelName          # must match @tables.register key
model_conf:
    hidden_size: 512
    num_layers: 6

frontend: WavFrontend          # reuse existing frontend
frontend_conf:
    fs: 16000
    n_mels: 80
    frame_length: 25
    frame_shift: 10
    cmvn_file: null

tokenizer: SentencepiecesTokenizer
tokenizer_conf:
    bpemodel: null

# Training config (optional)
dataset: AudioDataset
dataset_conf:
    batch_size: 32
    batch_type: example

4. Create configuration.json (for Hub distribution)

This file resolves relative paths when the model is downloaded from the Hub:

{
  "framework": "pytorch",
  "task": "auto-speech-recognition",
  "model": {"type": "funasr"},
  "file_path_metas": {
    "init_param": "model.pt",
    "config": "config.yaml",
    "tokenizer_conf": {"bpemodel": "tokenizer.model"},
    "frontend_conf": {"cmvn_file": "am.mvn"}
  }
}

file_path_metas maps config fields to filenames. AutoModel prepends the model download directory to each path automatically.
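
To make the path resolution concrete, here is a minimal sketch of prepending a download directory to every filename in a `file_path_metas` mapping. `resolve_paths` and the `/cache/your-model` directory are hypothetical, chosen for illustration; the real logic inside AutoModel may differ in detail.

```python
import os


def resolve_paths(metas, model_dir):
    """Return a copy of `metas` with model_dir prepended to every filename."""
    resolved = {}
    for field, value in metas.items():
        if isinstance(value, dict):
            # Nested sections such as tokenizer_conf are resolved recursively.
            resolved[field] = resolve_paths(value, model_dir)
        else:
            resolved[field] = os.path.join(model_dir, value)
    return resolved


file_path_metas = {
    "init_param": "model.pt",
    "config": "config.yaml",
    "tokenizer_conf": {"bpemodel": "tokenizer.model"},
    "frontend_conf": {"cmvn_file": "am.mvn"},
}

paths = resolve_paths(file_path_metas, "/cache/your-model")
print(paths["init_param"])
print(paths["tokenizer_conf"]["bpemodel"])
```

Because the nesting of `file_path_metas` mirrors the nesting of config.yaml, a field such as `tokenizer_conf.bpemodel` can stay `null` in the config and still end up pointing at the downloaded file.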

5. Upload to Hub

Upload to ModelScope or HuggingFace with this directory structure:

your-model-repo/
├── config.yaml
├── configuration.json
├── model.pt               # trained weights
├── am.mvn                 # CMVN file (if needed)
├── tokenizer.model        # BPE model (if needed)
└── example/
    └── test.wav           # example audio for demo

6. Test

from funasr import AutoModel

# From Hub
model = AutoModel(model="your-org/your-model", device="cuda:0")
res = model.generate(input="test.wav")
print(res)

# From local path
model = AutoModel(model="/path/to/your-model-repo", device="cuda:0")
res = model.generate(input="test.wav")
print(res)

Inference Method Contract

The inference() method is called by AutoModel. It must follow this contract:

Input

Parameter	Type	Description
`data_in`	list	Batch of audio data (numpy arrays, tensors, or file paths)
`data_lengths`	tensor/None	Length of each sample (optional)
`key`	list	Identifier for each sample
`tokenizer`	object	Tokenizer instance from config
`frontend`	object	Frontend instance from config
`**kwargs`	dict	All config params + user params from generate()

Output

Must return a tuple: (results_list, meta_data_dict)

# results_list: list of dicts, one per sample
[
    {"key": "sample_id", "text": "recognized text", "timestamp": [[0, 100], ...]},
    ...
]

# meta_data_dict: timing info for RTF calculation
{"batch_data_time": 5.5, "load_data": "0.01", "extract_feat": "0.02"}
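
A small checker can catch contract violations early while you develop a new model. The sketch below is a hypothetical helper, not part of FunASR; requiring a "key" entry in every result dict is an assumption based on the example above.

```python
def check_inference_output(output):
    """Return True if `output` looks like (results_list, meta_data_dict)."""
    if not isinstance(output, tuple) or len(output) != 2:
        raise ValueError("inference() must return a 2-tuple")
    results, meta_data = output
    if not isinstance(results, list):
        raise ValueError("results_list must be a list")
    for i, item in enumerate(results):
        # Assumption: every per-sample dict carries at least a "key" entry.
        if not isinstance(item, dict) or "key" not in item:
            raise ValueError(f"result {i} must be a dict with a 'key' entry")
    if not isinstance(meta_data, dict):
        raise ValueError("meta_data must be a dict")
    return True


good = ([{"key": "utt1", "text": "hello"}], {"batch_data_time": 5.5})
print(check_inference_output(good))
```

Calling this on the return value of your own inference() in a unit test is one way to confirm the shape before wiring the model into AutoModel.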

Standalone Repository (Remote Code)

If you want to keep the model code out of the FunASR source tree or release it independently, the model can live in a separate repo:

from funasr import AutoModel

# trust_remote_code=True loads model class from remote_code path
model = AutoModel(
    model="your-org/your-model",
    trust_remote_code=True,
    remote_code="./model.py",   # local or URL
    hub="hf",
)
res = model.generate(input="audio.wav")

With trust_remote_code=True, FunASR dynamically loads the model class from the specified file. The model does NOT need to be integrated into the FunASR source tree.
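
For intuition, loading a class from an arbitrary Python file can be done with the standard library alone. The sketch below shows one generic way to do this with `importlib`; it is not a copy of FunASR's loader, and `load_module_from_file` and the throwaway model.py are hypothetical.

```python
import importlib.util
import os
import tempfile


def load_module_from_file(path, module_name="remote_model"):
    """Execute the Python file at `path` and return it as a module object."""
    spec = importlib.util.spec_from_file_location(module_name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


# Demo with a throwaway file standing in for a standalone model.py.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.py")
    with open(path, "w") as f:
        f.write("class YourModel:\n    name = 'demo'\n")
    module = load_module_from_file(path)
    print(module.YourModel.name)
```

Executing the file runs any class decorators it contains, so a `@tables.register(...)` line in a standalone model.py would take effect as a side effect of loading it. This is also why remote code should only be enabled for sources you trust.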

Direct Inference (without AutoModel)

from model import YourModel

m, kwargs = YourModel.from_pretrained(model="your-org/your-model")
m.eval()
res = m.inference(data_in=["audio.wav"], **kwargs)
print(res)

Real Examples

Integrated Model (in FunASR source tree)

Model	Code	Config
Paraformer	funasr/models/paraformer/model.py	config.yaml in Hub
SenseVoice	funasr/models/sense_voice/model.py	config.yaml in Hub
FSMN-VAD	funasr/models/fsmn_vad_streaming/model.py	config.yaml in Hub
CAM++	funasr/models/campplus/model.py	config.yaml in Hub
Qwen3-ASR	funasr/models/qwen3_asr/model.py	Uses qwen-asr package

Standalone Model (separate repo)

Model	Repo	Usage
Fun-ASR-Nano	FunAudioLLM/Fun-ASR	`trust_remote_code=True, remote_code="./model.py"`
SenseVoice (standalone)	FunAudioLLM/SenseVoice	`trust_remote_code=True, remote_code="./model.py"`

Key Rules

Model isolation: Each model in its own directory. No cross-model imports.
Reuse shared components: Frontend, tokenizer, dataset — use existing registered ones when possible.
Don't modify existing models: Register new ones instead.
Always call super().__init__(): Required for PyTorch nn.Module.
inference() must return a tuple: (results_list, meta_data)
Support batch_size=1: At minimum, handle single-sample inference.

Troubleshooting

"ModelName is not registered"

The model file was not imported, so its @tables.register decorator never ran. Debug by importing it directly:

from funasr.models.your_model.model import *
# If this fails, fix the import error first
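
If the import fails deep inside a dependency, the full traceback is usually more helpful than the final message. A minimal sketch of a diagnostic helper follows; `try_import` is a hypothetical name and the module path is the placeholder used in this guide.

```python
import importlib
import traceback


def try_import(module_name):
    """Import a module and return (ok, detail) instead of raising."""
    try:
        importlib.import_module(module_name)
    except Exception:
        # Keep the full traceback so the root cause is visible.
        return False, traceback.format_exc()
    return True, f"imported {module_name}"


ok, detail = try_import("funasr.models.your_model.model")
print(ok)
print(detail)
```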

"module 'xxx' not found"

A dependency is missing. Install it, and add it to your model's requirements so other users get it too.

Weights mismatch warnings

A warning of the form "Warning, miss key in ckpt: ..." means your model class defines layers that are not in the checkpoint. This is fine if those layers are optional (for example, a CTC decoder that is not included in the open-source weights).
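
To see exactly which parameters differ, you can compare the two sets of key names. The sketch below uses plain lists with made-up key names; with a real PyTorch model the names would come from `model.state_dict().keys()` and from the loaded checkpoint dict. `compare_keys` is a hypothetical helper.

```python
def compare_keys(model_keys, ckpt_keys):
    """Return (missing, unexpected) parameter names, each sorted."""
    model_keys, ckpt_keys = set(model_keys), set(ckpt_keys)
    missing = sorted(model_keys - ckpt_keys)      # defined in model, absent in ckpt
    unexpected = sorted(ckpt_keys - model_keys)   # present in ckpt, unused by model
    return missing, unexpected


# Hypothetical key names for illustration only.
model_keys = ["encoder.weight", "decoder.weight", "ctc.ctc_lo.weight"]
ckpt_keys = ["encoder.weight", "decoder.weight"]

missing, unexpected = compare_keys(model_keys, ckpt_keys)
print("missing:", missing)
print("unexpected:", unexpected)
```

If every missing key belongs to an optional module, the warning can be ignored. Missing keys in the encoder or decoder usually point to a mismatch between config.yaml and the checkpoint.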