Blog

Sequence Parallel & Ring Attention: Training with Ultra-Long Contexts

Jun 22, 2026

Modern LLMs demand ever-longer context windows: 128K, 256K, even 1M tokens. A single GPU cannot hold such long sequences in memory. Twinkle’s Sequence Parallel module solves this by splitting the sequence dimension across multiple devices, combining Ulysses-style All-to-All parallelism with ZigZag Ring Attention to achieve near-linear scaling.

Ascend NPU Support: Fused Operators and Flash Linear Attention

Jun 5, 2026

Twinkle provides first-class support for Huawei Ascend NPU through a comprehensive monkey-patching system that replaces standard CUDA operators with NPU-optimized fused kernels. This post covers the kernel architecture and the optimizations it enables.

Two Execution Modes: torchrun (Local) vs Ray (Distributed)

Jun 3, 2026

Twinkle’s infra module provides a unified programming model that runs seamlessly in two modes: local (single-node via torchrun) and ray (multi-node via a Ray cluster). This post explains the architecture, the decorator-based API, and when to use each mode.

TUI & Auto-Research: An AI Agent for Training Control

Jun 1, 2026

Twinkle ships a terminal-based UI (TUI) powered by an embedded LLM agent that can autonomously start, monitor, pause, and debug ML training runs. This post covers the architecture of the TUI, the agent loop, and the tool system that makes “auto-research” possible.

Multi-LoRA: Concurrent Multi-Tenant Training on Shared GPUs

Jun 1, 2026

Twinkle’s Multi-LoRA architecture enables multiple tenants to train independent LoRA adapters on a single shared model simultaneously. This post explains the technical design, covering both the Transformers and Megatron backends.

OpenEnv Integration: Connecting External Environments to RL Training

May 30, 2026

Twinkle’s envs module bridges the gap between asynchronous external environments (code sandboxes, web browsers, game engines) and synchronous RL training loops. This post explains the Env abstraction, the EnvTool adapter, and the OpenEnv WebSocket client.

Free LLM Training on ModelScope: Twinkle Training-as-a-Service

Mar 15, 2026

We’re excited to announce that Twinkle Training-as-a-Service (TaaS) is now available on ModelScope! Developers can try Twinkle’s training API for free, with no GPU cluster required.
