All Mainstream Models: LLM · VLM · MoE

3 Runtime Modes: Local · Ray · HTTP

Multi-Tenancy: Parallel LoRA Training

<5min Setup Time: pip install & go

What is Twinkle?

Twinkle is a client-server LLM training framework that separates what you train from how you train.

Write your training logic once with clean Python APIs. Then deploy it anywhere — locally with torchrun, across Ray clusters, or as serverless Training-as-a-Service.
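The switch is a single argument at initialization. Here is a sketch based on the quick-start example further down; the `mode` values come from that example, while the `DeviceGroup` arguments and the HTTP variant are illustrative assumptions.

```python
import twinkle
from twinkle import DeviceGroup

# Identical training logic; only the mode string changes.
# 'local' runs under torchrun, 'ray' targets a Ray cluster,
# 'http' talks to a remote Training-as-a-Service endpoint.
twinkle.initialize(mode='local', groups=[DeviceGroup(name='default', ranks=1)])
# twinkle.initialize(mode='ray', groups=[DeviceGroup(name='default', ranks=8)])
# twinkle.initialize(mode='http', ...)  # endpoint configuration omitted
```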

Built by the ms-swift team at ModelScope.

Twinkle Architecture

🔌 Dual API

Native Twinkle API for full features, or Tinker-compatible API for easy migration

🧩 Modular

15+ components: Dataset, Template, Model, Sampler, Loss, Reward, Metric...

🔀 Backend Agnostic

Transformers or Megatron — switch with one config change
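A sketch of what that one-line switch might look like. Only `TransformersModel` appears in the quick-start example below; `MegatronModel` and its import path are assumptions used here for illustration.

```python
from twinkle.model import TransformersModel
# from twinkle.model import MegatronModel  # hypothetical Megatron counterpart

# Swapping the backend is a single-line change; the training loop is untouched.
model = TransformersModel(model_id='ms://Qwen/Qwen3.5-4B', remote_group='default')
# model = MegatronModel(model_id='ms://Qwen/Qwen3.5-4B', remote_group='default')
```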

Why Twinkle?

Scale Without Rewriting

Same code runs on your laptop and on thousand-GPU clusters. Switch from torchrun to Ray to HTTP deployment without changing your training logic.

Multi-Tenancy Built-In

Train N different LoRAs on one base model simultaneously. Each tenant gets isolated optimizer, data pipeline, and loss function — sharing only compute.

You Own the Loop

No hidden magic. See and control every forward, backward, and optimizer step. Debug freely, customize completely.

Training as a Service

Built for production TaaS deployments with automated cluster management, dynamic scaling, and enterprise multi-tenant isolation.

All Training Methods

SFT, pre-training, GRPO, GKD, and more. Dense models and MoE architectures. Full FSDP, tensor parallelism, pipeline parallelism support.

Broad Model Support

Qwen 3.5/3/2.5, DeepSeek R1/V2, GLM-4, InternLM2, and more. Both Hugging Face and ModelScope model hubs.

Train in 20 Lines

import twinkle
from peft import LoraConfig
from twinkle import DeviceGroup
from twinkle.dataloader import DataLoader
from twinkle.dataset import Dataset, DatasetMeta
from twinkle.model import TransformersModel

# Choose your runtime: 'local' (torchrun), 'ray', or 'http'
twinkle.initialize(mode='ray', groups=[DeviceGroup(name='default', ranks=8)])

# Prepare data — works with ModelScope and Hugging Face
dataset = Dataset(dataset_meta=DatasetMeta('ms://swift/self-cognition'))
dataset.set_template('Template', model_id='ms://Qwen/Qwen3.5-4B')
dataset.encode()

# Create model with LoRA
model = TransformersModel(model_id='ms://Qwen/Qwen3.5-4B', remote_group='default')
model.add_adapter_to_model('default', LoraConfig(r=8, lora_alpha=32))
model.set_optimizer(optimizer_cls='AdamW', lr=1e-4)

# Train — you control the loop
for batch in DataLoader(dataset=dataset, batch_size=8):
    model.forward_backward(inputs=batch)
    model.clip_grad_and_step()

model.save('my-finetuned-model')
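In 'local' mode, a script like the one above would typically be launched with torchrun; the script filename here is illustrative.

```shell
# Launch on 8 local GPUs under torchrun ('local' mode).
torchrun --nproc_per_node=8 train.py
```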

Multi-Tenancy: N Jobs, 1 Base Model


Run completely different training jobs on a shared deployment:

Tenant | Setup | Task
------ | ----- | ----
A | LoRA r=8, private data | SFT fine-tuning
B | LoRA r=32, Hub dataset | Continued pre-training
C | GRPO loss + Sampler | Reinforcement learning
D | Inference mode | Log-prob computation

Each tenant is fully isolated — different optimizers, data pipelines, loss functions. They only share the base model’s compute. Checkpoints auto-sync to ModelScope or Hugging Face.
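The table above could be set up roughly as follows. This is a sketch reusing the `add_adapter_to_model` call from the quick-start example; the adapter names and LoRA ranks follow the table, and mapping one adapter per tenant is an assumption about the API.

```python
from peft import LoraConfig
from twinkle.model import TransformersModel

# One shared base-model deployment serves all tenants.
model = TransformersModel(model_id='ms://Qwen/Qwen3.5-4B', remote_group='default')

# Tenant A: small LoRA for SFT fine-tuning.
model.add_adapter_to_model('tenant_a', LoraConfig(r=8, lora_alpha=32))

# Tenant B: larger LoRA for continued pre-training.
model.add_adapter_to_model('tenant_b', LoraConfig(r=32, lora_alpha=64))
```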

Supported Models

Qwen 3.5 Qwen MoE DeepSeek R1 GLM-4 InternLM2

Works with mainstream LLMs · NVIDIA · Ascend NPU · SFT / PT / GRPO / GKD

Ready to make your model glow?

Install Twinkle and start training in under 5 minutes.