TUI & Auto-Research: An AI Agent for Training Control

Jun 1, 2026 · 4 min read
blog Technical Deep Dive

Twinkle ships a terminal-based UI (TUI) powered by an embedded LLM agent that can autonomously start, monitor, pause, and debug ML training runs. This post covers the architecture of the TUI, the agent loop, and the tool system that makes “auto-research” possible.

Architecture Overview

The TUI is built on Textual and consists of four panels in a 2x3 grid layout:

PanelPositionPurpose
StatusBarTop, full widthRun ID, model, step counter, training state
MetricsPanelMiddle leftReal-time loss/reward/grad_norm charts
LogPanelRight, spanning 2 rowsStreaming stdout from training process
ChatPanelBottom leftNatural language interaction with the agent
Screen {
    layout: grid;
    grid-size: 2 3;
    grid-rows: auto 2fr 3fr;
    grid-columns: 2fr 1fr;
}

The Agent Loop

At the heart of the TUI is AgentLoop — an async tool-calling agent that uses any OpenAI-compatible API (local Ollama, cloud API, etc.):

agent = AgentLoop(
    connection=connection,
    llm_base_url='http://localhost:11434/v1',
    llm_model='qwen3.5',
    llm_api_key='not-needed',
)

The loop follows a standard ReAct pattern:

  1. User sends a message via ChatPanel
  2. Agent calls LLM with conversation history + tool schemas
  3. LLM either responds directly or generates tool calls
  4. Tools are executed, results fed back to LLM
  5. Repeat until LLM produces a final text response (max 10 rounds)

Key design decisions:

  • Streaming: Tokens are streamed to the UI in real-time. If tool calls are detected mid-stream, on_stream_reset discards partial output
  • History pruning: Conversation is capped at 50 messages (excluding system prompt), with cuts always at user message boundaries to avoid breaking tool-call sequences
  • Async skills loading: Skills are loaded in the background — the agent is usable immediately, skills are injected via inject_skills() when ready

Tool System

The agent has access to 15+ tools organized into categories:

Training Lifecycle

ToolDescription
start_serverLaunch Ray cluster + Twinkle Server (GPU partition, config generation)
shutdown_serverStop server and release GPU resources
start_trainingWrite training script, launch process, begin monitoring
pause_trainingSIGKILL client process (server retains state)
resume_trainingRe-launch client script from saved state
stop_trainingGraceful stop with checkpoint saving
update_scriptArchive current script, write new version
ToolDescription
list_training_runsList active and historical runs
get_training_statusGet run state + recent metrics
search_modelsSearch ModelScope Hub for models
search_datasetsSearch ModelScope Hub for datasets
list_supported_modelsQuery server for available models
get_cluster_infoDetect GPU resources (Ray or nvidia-smi)

Visualization

ToolDescription
zoom_metricsPan/zoom the metrics chart
select_metricsChoose which metrics to display (max 4)
select_runSwitch monitoring to a different run

Server Startup Pipeline

The start_server tool orchestrates a complete server deployment:

  1. Hardware detectionnvidia-smi GPU count
  2. GPU allocation — Partition GPUs between training model and sampler/teacher models
  3. Config generation — Auto-generate server_config.yaml with Ray Serve applications
  4. Ray cluster start — Multi-node GPU partitioning with separate raylets per role
  5. Server launchpython -m twinkle.server launch --config ...
  6. Health check — Poll /api/v1/healthz + sampler engine readiness

The config generator supports multi-model topology: one training model + N sampler/teacher models, with GPU sorting by size (largest PG deploys first to avoid scheduling deadlock).

Skills System

The TUI supports extensible skills — pluggable capabilities loaded from three sources:

  1. Bundled skills — shipped with the twinkle_client package
  2. Local skills — user-defined in ~/.cache/twinkle/tui/skills/local/
  3. Community skills — fetched from ModelScope (with 10s timeout)

Skills are loaded asynchronously after the agent starts, so the TUI is interactive immediately.

TrainingRuntime: Script-Side Integration

Training scripts integrate with the TUI via TrainingRuntime:

from twinkle_client.tui.runtime import TrainingRuntime

rt = TrainingRuntime(run_id='grpo-gsm8k')
rt.start(model_id='Qwen/Qwen3.5-4B', config={...})
rt.register_graceful_shutdown(model, dataloader)

for step, batch in enumerate(dataloader):
    loss = train(batch)
    rt.log_metrics(step=step, loss=loss, reward=reward)
    rt.log(f'Step {step}, loss={loss:.4f}')

rt.finish()

Key features:

  • metrics.jsonl — structured metrics with auto-timestamp, streamed to TUI in real-time
  • Graceful shutdown — SIGTERM handler saves checkpoint (LoRA weights + optimizer state + dataloader position)
  • Auto-resumeget_resume_info() reads last saved step from meta.json
  • Script archival — each update_script call archives train.py as train_v{N}.py

Getting Started

# Start TUI with local LLM
twinkle tui --llm-base-url http://localhost:11434/v1 --llm-model qwen3.5

# Or with a specific run
twinkle tui --run-id my-grpo-run

The TUI turns ML training into a conversation — describe what you want to train, and the agent handles server setup, script writing, monitoring, and troubleshooting.