- **Mainstream Models:** LLM · VLM · MoE
- **Runtime Modes:** Local · Ray · HTTP
- **Multi-Tenancy:** Parallel LoRA training
- **Setup Time:** `pip install` and go
Twinkle is a client-server LLM training framework that separates what you train from how you train.
Write your training logic once with clean Python APIs. Then deploy it anywhere — locally with torchrun,
across Ray clusters, or as serverless Training-as-a-Service.
Built by the ms-swift team at ModelScope.

- Native Twinkle API for full features, or Tinker-compatible API for easy migration
- 15+ components: Dataset, Template, Model, Sampler, Loss, Reward, Metric...
- Transformers or Megatron — switch with one config change
Same code runs on your laptop and on thousand-GPU clusters. Switch from torchrun to Ray to HTTP deployment without changing your training logic.
Train N different LoRAs on one base model simultaneously. Each tenant gets isolated optimizer, data pipeline, and loss function — sharing only compute.
No hidden magic. See and control every forward, backward, and optimizer step. Debug freely, customize completely.
Built for production TaaS deployments with automated cluster management, dynamic scaling, and enterprise multi-tenant isolation.
SFT, pre-training, GRPO, GKD, and more. Dense and MoE architectures. Full support for FSDP, tensor parallelism, and pipeline parallelism.
Qwen 3.5/3/2.5, DeepSeek R1/V2, GLM-4, InternLM2, and more, from both the Hugging Face and ModelScope model hubs.
```python
import twinkle
from peft import LoraConfig
from twinkle import DeviceGroup
from twinkle.dataloader import DataLoader
from twinkle.dataset import Dataset, DatasetMeta
from twinkle.model import TransformersModel

# Choose your runtime: 'local' (torchrun), 'ray', or 'http'
twinkle.initialize(mode='ray', groups=[DeviceGroup(name='default', ranks=8)])

# Prepare data — works with ModelScope and Hugging Face
dataset = Dataset(dataset_meta=DatasetMeta('ms://swift/self-cognition'))
dataset.set_template('Template', model_id='ms://Qwen/Qwen3.5-4B')
dataset.encode()

# Create model with LoRA
model = TransformersModel(model_id='ms://Qwen/Qwen3.5-4B', remote_group='default')
model.add_adapter_to_model('default', LoraConfig(r=8, lora_alpha=32))
model.set_optimizer(optimizer_cls='AdamW', lr=1e-4)

# Train — you control the loop
for batch in DataLoader(dataset=dataset, batch_size=8):
    model.forward_backward(inputs=batch)
    model.clip_grad_and_step()

model.save('my-finetuned-model')
```

Run completely different training jobs on a shared deployment:
| Tenant | Setup | Task |
|---|---|---|
| A | LoRA r=8, private data | Supervised fine-tuning |
| B | LoRA r=32, Hub dataset | Continued pre-training |
| C | GRPO loss + Sampler | Reinforcement learning |
| D | Inference mode | Log-prob computation |
Each tenant is fully isolated — different optimizers, data pipelines, loss functions. They only share the base model’s compute. Checkpoints auto-sync to ModelScope or Hugging Face.
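The isolation model can be sketched in a few lines of plain Python. This is a conceptual illustration, not the Twinkle API: the `Tenant` class, its fields, and the numbers are all hypothetical, chosen only to show that each tenant updates a private adapter and optimizer setting while the shared base weights stay frozen.

```python
# Conceptual sketch of multi-tenant LoRA isolation (hypothetical names,
# not the Twinkle API): tenants share frozen base weights but keep
# private adapter deltas and optimizer settings.
base = [1.0, 2.0, 3.0]  # shared, frozen base-model weights

class Tenant:
    def __init__(self, name, lr):
        self.name = name
        self.lr = lr                    # private optimizer setting
        self.delta = [0.0] * len(base)  # private LoRA-style adapter

    def effective_weights(self):
        # What this tenant's forward pass would see: base + its adapter
        return [b + d for b, d in zip(base, self.delta)]

    def step(self, grads):
        # Only the private adapter is updated; the base stays untouched
        self.delta = [d - self.lr * g for d, g in zip(self.delta, grads)]

a = Tenant("A", lr=0.5)
b = Tenant("B", lr=0.25)

a.step([1.0, 1.0, 1.0])          # tenant A takes a training step...
print(a.effective_weights())     # [0.5, 1.5, 2.5]
print(b.effective_weights())     # ...tenant B is unaffected: [1.0, 2.0, 3.0]
print(base)                      # base weights never change: [1.0, 2.0, 3.0]
```

The design choice this mirrors: per-tenant state (adapters, optimizers, data pipelines) is the unit of isolation, while the base model's parameters and compute are the only shared resources.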
Works with mainstream LLMs · NVIDIA · Ascend NPU · SFT / PT / GRPO / GKD