MultiLoraTransformersModel
This model inherits from TransformersModel. In addition to the base class's functionality, it can train multiple LoRAs on a shared base model by time-sharing, which is mainly used for multi-tenant training.
class MultiLoraTransformersModel:
    def __init__(self, # noqa
                 model_cls = AutoModelForCausalLM,
                 model_id: Optional[str] = None,
                 config: Optional[PretrainedConfig] = None,
                 device_mesh: Optional[DeviceMesh] = None,
                 mixed_precision: Literal['no', 'fp8', 'fp16', 'bf16'] = 'bf16',
                 grad_scaler_config: Dict[str, Any] = None,
                 max_loras: int = 5,
                 max_r: int = 32,
                 max_length: int = 8192,
                 **kwargs):
        ...
    ...
In addition to the same parameters as the base class, this class provides several additional parameters for multi-lora configuration:
- max_loras: Maximum number of loras
- max_r: Maximum lora rank
- max_length: Maximum supported training length
The max_loras and max_r parameters exist because Twinkle’s multi-LoRA approach pre-allocates max_loras LoRA adapters before the DDP wrap; an adapter added after the wrap could not be managed by DDP.
For the same reason, the user’s r must be less than or equal to max_r. During actual training, only part of the pre-allocated LoRA’s rank is used in the calculation.
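The idea of using only part of a pre-allocated rank can be sketched in plain Python. This is a minimal illustration, not Twinkle's implementation: the helper names (matmul, lora_delta) are hypothetical, and selecting the first r rows of A and the first r columns of B is an assumption about how the partial rank is chosen.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product, enough for a small illustration."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_delta(lora_a: Matrix, lora_b: Matrix, r: int) -> Matrix:
    """Compute the low-rank update B[:, :r] @ A[:r, :].

    lora_a has shape (max_r, in_features) and lora_b has shape
    (out_features, max_r). Only the first r ranks take part (assumed).
    """
    max_r = len(lora_a)
    if r > max_r:
        raise ValueError(f"r={r} exceeds the pre-allocated max_r={max_r}")
    a_active = lora_a[:r]                   # first r rows of A
    b_active = [row[:r] for row in lora_b]  # first r columns of B
    return matmul(b_active, a_active)


# Weights pre-allocated at max_r = 4; the tenant asked for r = 2.
lora_a = [[1.0, 2.0, 3.0], [0.5, 0.0, -1.0], [2.0, 2.0, 2.0], [9.0, 9.0, 9.0]]
lora_b = [[1.0, 0.0, 5.0, 7.0], [0.0, 2.0, 5.0, 7.0]]

delta = lora_delta(lora_a, lora_b, r=2)
print(delta)  # rows 2 and 3 of A (and columns 2 and 3 of B) are ignored
```

Because the unused ranks never enter the product, a tenant with r=2 sees the same update it would get from a dedicated rank-2 adapter, while the parameter tensors keep a fixed shape.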
MultiLoraTransformersModel supports the @remote_class annotation and supports device_mesh, which means it can run in Ray workers.
Tenant Lifecycle
Under the hood, MultiLoraTransformersModel uses the MultiLora manager to handle tenant LoRA slots. The key APIs:
acquire_lora
Claim an available LoRA slot for a tenant:
adapter_name = model.multi_lora.acquire_lora('tenant_a', LoraConfig(r=16, lora_alpha=32))
- Raises RuntimeError if all slots are in use or config.r > max_r
release_lora
Release a tenant’s LoRA slot, resetting weights to initial state:
model.multi_lora.release_lora('tenant_a')
Context Manager
Use adapter() for scoped activation:
with model.multi_lora.adapter('tenant_a') as name:
    output = model.forward(inputs)
LoraTenant
Each slot is tracked as a LoraTenant dataclass:
@dataclass
class LoraTenant:
    index: int                  # Slot index (0..max_loras-1)
    adapter_name: str           # Internal name (e.g. "lora_0")
    config: LoraConfig          # Pre-allocated config (max_r)
    tenant_adapter_name: str    # User-facing tenant name (None if free)
    tenant_config: LoraConfig   # Tenant's actual config (None if free)
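The slot lifecycle described in this section (acquire, scoped activation, release) can be modeled with a small stand-alone pool. This is only a toy sketch: ToySlot and ToySlotPool are hypothetical names, the pool takes a plain integer r instead of a LoraConfig, and it does not touch any model weights. It mirrors the documented behavior that acquiring fails with RuntimeError when every slot is taken or when r exceeds max_r.

```python
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class ToySlot:
    index: int                         # slot index (0..max_loras-1)
    adapter_name: str                  # internal name, e.g. "lora_0"
    tenant_name: Optional[str] = None  # None while the slot is free
    tenant_r: Optional[int] = None     # tenant's requested rank


class ToySlotPool:
    """Hypothetical stand-in for a multi-LoRA slot manager."""

    def __init__(self, max_loras: int = 5, max_r: int = 32) -> None:
        self.max_r = max_r
        self.slots: List[ToySlot] = [ToySlot(i, f"lora_{i}") for i in range(max_loras)]
        self.active: Optional[str] = None  # currently activated adapter

    def _find(self, tenant_name: str) -> ToySlot:
        for slot in self.slots:
            if slot.tenant_name == tenant_name:
                return slot
        raise KeyError(f"no slot held by {tenant_name!r}")

    def acquire_lora(self, tenant_name: str, r: int) -> str:
        if r > self.max_r:
            raise RuntimeError(f"r={r} exceeds max_r={self.max_r}")
        for slot in self.slots:
            if slot.tenant_name is None:  # first free slot wins
                slot.tenant_name, slot.tenant_r = tenant_name, r
                return slot.adapter_name
        raise RuntimeError("all LoRA slots are in use")

    def release_lora(self, tenant_name: str) -> None:
        slot = self._find(tenant_name)
        slot.tenant_name, slot.tenant_r = None, None  # mark the slot free again

    @contextmanager
    def adapter(self, tenant_name: str) -> Iterator[str]:
        slot = self._find(tenant_name)
        previous, self.active = self.active, slot.adapter_name
        try:
            yield slot.adapter_name
        finally:
            self.active = previous  # restore even if the body raises


pool = ToySlotPool(max_loras=2, max_r=32)
name_a = pool.acquire_lora("tenant_a", r=16)
with pool.adapter("tenant_a") as name:
    active_inside = pool.active
pool.release_lora("tenant_a")
```

Pairing acquire_lora with release_lora in a try/finally block in calling code keeps a failed training step from leaking a slot, since the number of slots is fixed at max_loras.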