Quick Start

Quick Start

Get Twinkle up and running in minutes.

What is Twinkle?

Twinkle is a production-oriented large model training framework with a component-based architecture:

  • Loosely Coupled Architecture · Standardized Interfaces
  • Multiple Runtime Modes · torchrun / Ray / HTTP
  • Multi-Framework Compatible · Transformers / Megatron
  • Multi-Tenant Support · Single Base Model Deployment

When to Choose Twinkle

  • You want to understand model mechanisms and training methods
  • You are a researcher who wants to customize models or training algorithms
  • You prefer writing explicit training loops
  • You need to build enterprise-level training platforms

When to Choose ms-swift

  • You don’t care about the training process, just want to provide data
  • You need more model support and dataset varieties
  • You need inference, deployment, quantization capabilities
  • You need day-0 support for new models

Prerequisites

ComponentRequirementNotes
Python>= 3.11, < 3.133.11 recommended
PyTorch>= 2.02.7.1 for NPU
GPUA10/A100/H100/RTXT4/V100 limited support
NPUAscend 910 seriesOptional

Installation

Install Twinkle

# Install from PyPI
pip install 'twinkle-kit'

# Or install from source
git clone https://github.com/modelscope/twinkle.git
cd twinkle
pip install -e .

Install Client (Optional)

If you need to use Twinkle’s Client for remote training:

# Mac or Linux
sh INSTALL_CLIENT.sh

# Windows (PowerShell)
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
.\INSTALL_CLIENT.ps1

Install Megatron Dependencies (Optional)

For ultra-large-scale model training with Megatron backend:

sh INSTALL_MEGATRON.sh

Your First Training

Single GPU Training

from peft import LoraConfig
from twinkle import get_device_placement, get_logger
from twinkle.dataloader import DataLoader
from twinkle.dataset import Dataset, DatasetMeta
from twinkle.model import TransformersModel
from twinkle.preprocessor import SelfCognitionProcessor

logger = get_logger()

def train():
    # Load dataset (1000 samples)
    dataset = Dataset(dataset_meta=DatasetMeta(
        'ms://swift/self-cognition', 
        data_slice=range(1000)
    ))
    # Set template for encoding
    dataset.set_template('Qwen3_5Template', model_id='ms://Qwen/Qwen3.5-4B')
    # Preprocess to standard format
    dataset.map(SelfCognitionProcessor('Twinkle LLM', 'ModelScope'))
    # Encode dataset
    dataset.encode()
    
    # Create dataloader (global batch size = 8)
    dataloader = DataLoader(dataset=dataset, batch_size=8)
    
    # Load model
    model = TransformersModel(model_id='ms://Qwen/Qwen3.5-4B')
    
    # Add LoRA adapter
    lora_config = LoraConfig(r=8, lora_alpha=32, target_modules='all-linear')
    model.add_adapter_to_model('default', lora_config, gradient_accumulation_steps=2)
    
    # Set optimizer and scheduler
    model.set_optimizer(optimizer_cls='AdamW', lr=1e-4)
    model.set_lr_scheduler(
        scheduler_cls='CosineWarmupScheduler',
        num_warmup_steps=5,
        num_training_steps=len(dataloader)
    )
    
    # Training loop
    for step, batch in enumerate(dataloader):
        model.forward_backward(inputs=batch)
        model.clip_grad_and_step()
        if step % 20 == 0:
            metric = model.calculate_metric(is_training=True)
            logger.info(f'Step {step}/{len(dataloader)}, metric: {metric}')
    
    model.save('last-checkpoint')

if __name__ == '__main__':
    train()

Multi-GPU Training (8 GPUs)

Just add initialization with DeviceMesh:

import twinkle
from twinkle import DeviceMesh

# Build device_mesh: fsdp=4, dp=2, using 8 GPUs
device_mesh = DeviceMesh.from_sizes(fsdp_size=4, dp_size=2)
twinkle.initialize(mode='local', global_device_mesh=device_mesh)

# ... rest of training code remains the same

Run with torchrun:

torchrun --nproc_per_node=8 train.py

Using Only Partial Components

You can use only a portion of Twinkle’s components with your existing code:

from twinkle.dataset import PackingDataset, DatasetMeta
from twinkle.dataloader import DataLoader
from twinkle.preprocessor import SelfCognitionProcessor

dataset_meta = DatasetMeta(
    dataset_id='ms://swift/self-cognition',
)

dataset = PackingDataset(dataset_meta)
dataset.map(SelfCognitionProcessor(
    model_name='Twinkle Model', 
    model_author='ModelScope Community'
))
dataset.set_template('Qwen3_5Template', model_id='ms://Qwen/Qwen3.5-4B', max_length=512)
dataset.encode()
dataset.pack_dataset()

dataloader = DataLoader(dataset, batch_size=8)

for data in dataloader:
    # Use with your custom training code
    print(data.keys())  # input_ids, position_ids, ...
    break

After Deployment: OpenAI-Compatible API

After deploying your model with Twinkle Server, you get an OpenAI-compatible API out of the box. Any OpenAI SDK or tool can directly call your model for inference:

# Start the server
twinkle-server launch -c server_config.yaml

For details on writing server_config.yaml, see Server & Client Guide.

from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:8000/api/v1',
    api_key='your-token',
)

response = client.chat.completions.create(
    model='Qwen/Qwen3.5-4B',
    messages=[{'role': 'user', 'content': 'Hello!'}],
    temperature=0.7,
    stream=True,
)
for chunk in response:
    print(chunk.choices[0].delta.content, end='')

Supported endpoints:

EndpointMethodDescription
/chat/completionsPOSTChat completions (streaming & non-streaming)
/modelsGETList available models

Features include full streaming support (SSE), sticky session routing for adapter isolation, automatic chat template initialization, and adapter-to-base-model resolution.

Auto Research: Train with Natural Language

Auto Research is Twinkle’s built-in LLM agent terminal that autonomously handles the entire training workflow through natural language — from cluster setup, script generation, and training launch to error diagnosis and auto-fix, without writing any shell commands:

# Install the client
pip install twinkle-client

# Launch Auto Research with a local LLM
twinkle-tui --llm-base-url http://localhost:11434/v1 --llm-model qwen3.5

# Or with a remote API
twinkle-tui --llm-base-url https://api.example.com/v1 --llm-api-key sk-xxx --llm-model gpt-4o

What you can do:

  • “Start a GRPO training with Qwen3.5-4B on gsm8k” — auto-generates scripts and launches training
  • “How is the training going?” — real-time metrics and status monitoring
  • “Show me the reward metrics, zoom into steps 100-200” — interactive chart visualization
  • “Search for math datasets on ModelScope” — model and dataset discovery

Key capabilities:

FeatureDescription
Training lifecycleStart, pause, resume, stop with checkpoint saving
Server managementAuto GPU partitioning, Ray cluster setup, health checks
Auto-fixDetects crashes, diagnoses errors, rewrites scripts, and restarts (up to 3 attempts)
Real-time monitoringASCII metrics charts, log streaming, health checks every 30s
Skills systemExtensible plugin architecture (bundled + local + community)

Auto Research turns ML training into a conversation — describe what you want to train, and the agent handles everything from server setup to troubleshooting.

Supported Hardware

HardwareNotes
GPU A10/A100/H100/RTXFull support
GPU T4/V100No bfloat16, no Flash-Attention
Ascend NPUSome operators not supported
PPUSupported
CPUPartial components (dataset, dataloader)

Next Steps

docs