Architecture

Twinkle features a decoupled Client-Server architecture designed for maximum flexibility and scalability.

System Overview

[Figure: Twinkle client-server architecture diagram]

Core Design

Twinkle’s architecture is built around three fundamental principles:

  1. Decoupled Components — Each component (Dataset, Model, Sampler, Loss) operates independently with standardized interfaces
  2. Unified API — The same training code runs across different backends (torchrun, Ray, HTTP) with minimal changes
  3. Multi-Tenancy — Multiple users can train on a shared base model simultaneously with isolated configurations
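As a rough illustration of the unified-API principle, the sketch below shows one way a call site can stay identical while execution is routed to different backends. All names here (`Backend`, `LocalBackend`, `HTTPBackend`, `train_step`) are hypothetical and simplified, not Twinkle's actual interfaces:

```python
# Illustrative sketch of the "unified API" idea: the same training call is
# dispatched to different execution backends. Hypothetical names throughout.
from typing import Protocol


class Backend(Protocol):
    def run(self, fn, *args): ...


class LocalBackend:
    """Executes the step in-process (single GPU / one torchrun rank)."""
    def run(self, fn, *args):
        return fn(*args)


class HTTPBackend:
    """Would serialize the call and POST it to a remote server."""
    def __init__(self, base_url: str):
        self.base_url = base_url

    def run(self, fn, *args):
        # Real code would issue an HTTP request; here we just run locally.
        return fn(*args)


def train_step(batch):
    # Placeholder for a forward/backward/step over one batch.
    return sum(batch) / len(batch)


# The caller's code is identical regardless of the backend chosen:
for backend in (LocalBackend(), HTTPBackend("http://localhost:8000")):
    loss = backend.run(train_step, [1.0, 2.0, 3.0])
```

The design point is that `train_step` never learns which backend runs it, which is what allows the "minimal changes" claim above.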

Client-Server Model

The client side provides two distinct integration paths:

  • Twinkle Native API — A conforming API that mirrors the server-side interface for seamless end-to-end integration
  • Tinker Compatibility — Full support for the native Tinker API, so developers can drive Twinkle’s backend with the Tinker client

This dual-path design gives developers access to Twinkle’s training services through either API.

Core Components

  • Dataset — Data loading and preprocessing with ModelScope/HuggingFace integration
  • Template — Encoding and decoding for different model architectures
  • DataLoader — Data distribution and batching with device mesh awareness
  • Preprocessor — Data ETL transformations to standard format
  • Model — Large model wrapper supporting Transformers and Megatron
  • Sampler — Sampling logic (e.g., vLLM-based) for RL training
  • Loss — Customizable loss functions (CrossEntropy, GRPO, etc.)
  • Reward — Reward functions for reinforcement learning
  • Advantage — Advantage estimation for policy optimization

DeviceGroup and DeviceMesh

DeviceGroup and DeviceMesh are the core of Twinkle’s distributed architecture:

import twinkle
from twinkle import DeviceMesh, DeviceGroup

# Define resource groups
device_group = [
    DeviceGroup(name='model', ranks=4, device_type='cuda'),
    DeviceGroup(name='sampler', ranks=4, device_type='cuda'),
]

# Define parallel topology
device_mesh = DeviceMesh.from_sizes(pp_size=2, tp_size=2, dp_size=2)

# Initialize
twinkle.initialize(mode='ray', nproc_per_node=8, groups=device_group)

DeviceGroup

Each DeviceGroup defines a named group of devices (ranks) available for training. Components can run remotely by selecting a resource group:

model = TransformersModel(
    model_id='Qwen/Qwen3.5-4B', 
    remote_group='model', 
    device_mesh=device_mesh
)

DeviceMesh

Specifies the parallel topology of components within a resource group:

  • pp_size — Pipeline parallelism
  • tp_size — Tensor parallelism
  • dp_size — Data parallelism
  • fsdp_size — Fully sharded data parallelism
  • cp_size — Context parallelism
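The total number of ranks a mesh occupies is the product of its dimension sizes. The helper below is a hypothetical illustration of that arithmetic, not part of Twinkle's API:

```python
# Hypothetical helper showing how parallel sizes compose: the number of
# ranks a mesh requires is the product of all its dimension sizes.
# This is not Twinkle's API -- just the arithmetic behind DeviceMesh.from_sizes.
from math import prod


def mesh_world_size(**sizes: int) -> int:
    """Number of ranks needed for the given parallel topology."""
    return prod(sizes.values())


# The example mesh from above: pp=2, tp=2, dp=2 -> 8 ranks,
# matching nproc_per_node=8 passed to twinkle.initialize.
ranks = mesh_world_size(pp_size=2, tp_size=2, dp_size=2)
```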

Multi-Tenancy Architecture

Twinkle supports simultaneous multi-tenant training on a shared base model:

[Figure: multi-LoRA training on a shared base model]

Twinkle uses a LoRA Pool plus per-tenant application architecture:

  • Tenant A: Local private dataset, LoRA rank=8, SFT training
  • Tenant B: Open-source dataset, LoRA rank=32, PT training
  • Tenant C: GRPO loss calculation with sampling
  • Tenant D: Log probability inference

All four workloads execute concurrently because the Model and Sampler are integrated as task-agnostic components.
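A minimal sketch of the LoRA Pool idea, using hypothetical names (`LoRAPool`, `TenantConfig`) that are not Twinkle's actual classes: the base model is referenced once, and each tenant contributes only an isolated adapter configuration:

```python
# Illustrative sketch of a "LoRA Pool + tenant" layout: one shared base
# model, per-tenant adapter configs. Hypothetical names, not Twinkle's API.
from dataclasses import dataclass


@dataclass
class TenantConfig:
    name: str
    lora_rank: int
    task: str  # e.g. 'sft', 'pt', 'grpo', 'logprob'


class LoRAPool:
    def __init__(self, base_model_id: str):
        # The base weights are loaded once and shared by all tenants.
        self.base_model_id = base_model_id
        self.adapters: dict[str, TenantConfig] = {}

    def register(self, cfg: TenantConfig) -> None:
        # Each tenant gets an isolated adapter over the shared base weights.
        self.adapters[cfg.name] = cfg


pool = LoRAPool("Qwen/Qwen3.5-4B")
pool.register(TenantConfig("tenant_a", lora_rank=8, task="sft"))
pool.register(TenantConfig("tenant_b", lora_rank=32, task="pt"))
```

Because tenants differ only in adapter configuration, adding one never requires reloading or duplicating the base model.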

Runtime Modes

Single GPU

Direct training without distributed setup:

model = TransformersModel(model_id='ms://Qwen/Qwen3.5-4B')
model.forward_backward(inputs=batch)

torchrun Mode

Distributed training with PyTorch’s torchrun:

# train.py
twinkle.initialize(mode='local', global_device_mesh=device_mesh)

# Launch with torchrun:
torchrun --nproc_per_node=8 train.py

Ray Mode

Distributed training across Ray clusters:

twinkle.initialize(mode='ray', nproc_per_node=8, groups=device_group)

HTTP Mode

Training as a Service deployment:

# Server
twinkle.initialize(mode='http', ...)

# Client
client = init_twinkle_client(base_url='http://localhost:8000')

Customizable Components

  • Loss (twinkle.loss.Loss) — Training loss functions
  • Metric (twinkle.metric.Metric) — Evaluation metrics
  • Patch (twinkle.patch.Patch) — Model training fixes
  • Preprocessor (twinkle.preprocessor.Preprocessor) — Data ETL
  • Filter (twinkle.preprocessor.Filter) — Data validation
  • InputProcessor (twinkle.processor.InputProcessor) — Task-specific input handling
  • Model (twinkle.model.TwinkleModel) — Large model wrapper
  • Sampler (twinkle.sampler.Sampler) — Sampling strategies
  • Reward (twinkle.reward.Reward) — RL reward functions
  • Advantage (twinkle.advantage.Advantage) — Advantage estimation
  • Template (twinkle.template.Template) — Tokenization templates
  • CheckpointEngine (twinkle.checkpoint_engine.CheckpointEngine) — Weight synchronization
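As a hedged sketch of what extending a component might look like, the example below implements a standalone weighted negative-log-likelihood loss as a plain callable. The actual extension contract lives in the `twinkle.loss.Loss` base class; the interface shown here is assumed for illustration only:

```python
# Hedged sketch of a custom loss component as a plain callable class.
# The real base-class contract is twinkle.loss.Loss -- this is illustrative.
import math


class WeightedNLLLoss:
    """Weighted mean negative log-likelihood (illustrative only)."""

    def __init__(self, weight: float = 1.0):
        self.weight = weight

    def __call__(self, target_probs: list[float]) -> float:
        # Mean negative log-probability of the target tokens, scaled.
        return -self.weight * sum(math.log(p) for p in target_probs) / len(target_probs)


loss_fn = WeightedNLLLoss(weight=1.0)
# Perfectly confident predictions (probability 1.0) give zero loss.
zero = loss_fn([1.0, 1.0])
```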

Design Principles

Twinkle adheres to these core principles:

  • Retain Training Loop Control — Developers can clearly see and control forward, backward, and step operations
  • Highly Cohesive Components — Each component has clear responsibilities and works independently
  • Hidden Distributed Complexity — Same code runs on single GPU, torchrun, or Ray clusters
  • Production-Grade Deployment — Built-in multi-tenancy, HTTP services, and weight synchronization
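The first principle can be illustrated with a framework-free toy: the developer writes forward, backward, and step explicitly instead of delegating them to a Trainer abstraction. Everything below is illustrative, not Twinkle code:

```python
# "Retain training loop control": the forward/backward/step sequence is
# written out explicitly. Toy 1-D least-squares model, no real framework.

def forward(w: float, x: float) -> float:
    return w * x


def backward(w: float, x: float, target: float) -> float:
    # d/dw of 0.5 * (w*x - target)^2
    return (forward(w, x) - target) * x


def step(w: float, grad: float, lr: float = 0.1) -> float:
    return w - lr * grad


w = 0.0
for _ in range(100):
    grad = backward(w, x=1.0, target=3.0)
    w = step(w, grad)
# w converges toward the target weight 3.0
```

The loop body is the whole point: each stage is visible and replaceable, which is the same property Twinkle's `forward_backward` style API aims to preserve at scale.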