MultiLoraTransformersModel

This model inherits from TransformersModel. In addition to providing the same functions, it can time-share multiple LoRA adapters on one base model, which is mainly useful for multi-tenant training.

class MultiLoraTransformersModel:

    def __init__(self,  # noqa
                 model_cls = AutoModelForCausalLM,
                 model_id: Optional[str] = None,
                 config: Optional[PretrainedConfig] = None,
                 device_mesh: Optional[DeviceMesh] = None,
                 mixed_precision: Literal['no', 'fp8', 'fp16', 'bf16'] = 'bf16',
                 grad_scaler_config: Optional[Dict[str, Any]] = None,
                 max_loras: int = 5,
                 max_r: int = 32,
                 max_length: int = 8192,
                 **kwargs):
        ...

    ...

Besides the base-class parameters, this class adds several parameters for multi-LoRA configuration:

max_loras: Maximum number of LoRA slots (default 5)
max_r: Maximum LoRA rank (default 32)
max_length: Maximum supported training length (default 8192)

The max_loras and max_r parameters exist because Twinkle's multi-LoRA approach creates all max_loras adapters before the DDP wrap; a LoRA added afterwards could not be managed by DDP. Because the adapters are allocated up front, the user's r must be less than or equal to max_r. During actual training, only part of each LoRA's rank is used in the computation.
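
The idea of using only part of a pre-allocated rank can be illustrated with a small pure-Python sketch. It is not Twinkle's implementation: the function name lora_delta, the choice of the leading r rows and columns, and the alpha / r scaling are assumptions made for illustration only.

```python
from typing import List

Matrix = List[List[float]]


def transpose(m: Matrix) -> Matrix:
    """Return the transpose of a row-major matrix."""
    return [list(col) for col in zip(*m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (m x k) matrix by a (k x n) matrix."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_delta(lora_a: Matrix, lora_b: Matrix, x: Matrix, r: int, alpha: float) -> Matrix:
    """Compute the LoRA update for x using only the first r ranks.

    Shapes: x is (batch, in), lora_a is (max_r, in), lora_b is (out, max_r).
    Slicing the leading r ranks is an assumption of this sketch.
    """
    max_r = len(lora_a)
    if not 0 < r <= max_r:
        raise ValueError(f"r={r} must be in the range 1..{max_r}")
    a_r = lora_a[:r]                    # (r, in): rows beyond r are ignored
    b_r = [row[:r] for row in lora_b]   # (out, r): columns beyond r are ignored
    hidden = matmul(x, transpose(a_r))  # (batch, r)
    out = matmul(hidden, transpose(b_r))  # (batch, out)
    scale = alpha / r
    return [[scale * v for v in row] for row in out]
```

In this sketch the weights stored beyond rank r never enter the result, so one buffer of rank max_r can serve tenants with different ranks.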

MultiLoraTransformersModel supports the @remote_class annotation and device_mesh, which means it can run in Ray workers.

Tenant Lifecycle

Under the hood, MultiLoraTransformersModel uses the MultiLora manager to handle tenant LoRA slots. The key APIs:

acquire_lora

Claim an available LoRA slot for a tenant:

adapter_name = model.multi_lora.acquire_lora('tenant_a', LoraConfig(r=16, lora_alpha=32))

Raises RuntimeError if all slots are in use or if config.r > max_r.

release_lora

Release a tenant's LoRA slot and reset its weights to their initial state:

model.multi_lora.release_lora('tenant_a')
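
The acquire and release semantics described above can be modelled with a toy slot pool. ToySlot and ToySlotPool are hypothetical names, not part of Twinkle, and the first-free-slot policy and the KeyError on an unknown tenant are assumptions; only the two RuntimeError conditions come from the description above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToySlot:
    adapter_name: str             # internal name, e.g. "lora_0"
    tenant: Optional[str] = None  # None while the slot is free
    r: Optional[int] = None       # rank requested by the tenant


class ToySlotPool:
    """Hypothetical stand-in for a fixed pool of pre-allocated LoRA slots."""

    def __init__(self, max_loras: int = 5, max_r: int = 32) -> None:
        self.max_r = max_r
        self.slots: List[ToySlot] = [ToySlot(f"lora_{i}") for i in range(max_loras)]

    def acquire(self, tenant: str, r: int) -> str:
        """Claim the first free slot; mirror the two documented failure cases."""
        if r > self.max_r:
            raise RuntimeError(f"r={r} exceeds max_r={self.max_r}")
        for slot in self.slots:
            if slot.tenant is None:
                slot.tenant, slot.r = tenant, r
                return slot.adapter_name
        raise RuntimeError("all LoRA slots are in use")

    def release(self, tenant: str) -> None:
        """Free the tenant's slot so that another tenant can claim it."""
        for slot in self.slots:
            if slot.tenant == tenant:
                slot.tenant = None
                slot.r = None
                return
        raise KeyError(tenant)  # assumption: unknown tenants are an error
```

Because the pool size is fixed, a caller that may hit a full pool would typically catch RuntimeError from acquire and retry after another tenant has been released.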

Context Manager

Use adapter() for scoped activation:

with model.multi_lora.adapter('tenant_a') as name:
    output = model.forward(inputs)
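
A scoped activation of this kind is commonly built with contextlib.contextmanager and a try/finally block. The sketch below shows that pattern only; ToyAdapterSwitch is a hypothetical class, and restoring the previously active adapter on exit is an assumption, not a documented behaviour of adapter().

```python
from contextlib import contextmanager
from typing import Dict, Iterator, Optional


class ToyAdapterSwitch:
    """Hypothetical helper that tracks which adapter is currently active."""

    def __init__(self, tenant_to_adapter: Dict[str, str]) -> None:
        self._adapters = dict(tenant_to_adapter)
        self.active: Optional[str] = None

    @contextmanager
    def adapter(self, tenant: str) -> Iterator[str]:
        """Activate the tenant's adapter for the duration of the with block."""
        name = self._adapters[tenant]  # KeyError if the tenant holds no slot
        previous = self.active
        self.active = name
        try:
            yield name
        finally:
            # Runs even if the body raises, so the active adapter never leaks.
            self.active = previous
```

The try/finally matters in a time-shared setting: if one tenant's step fails, the next tenant should not inherit a stale active adapter.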

LoraTenant

Each slot is tracked as a LoraTenant dataclass:

@dataclass
class LoraTenant:
    index: int                    # Slot index (0..max_loras-1)
    adapter_name: str             # Internal name (e.g. "lora_0")
    config: LoraConfig            # Pre-allocated config (max_r)
    tenant_adapter_name: Optional[str]   # User-facing tenant name (None if free)
    tenant_config: Optional[LoraConfig]  # Tenant's actual config (None if free)
