TUI & Auto-Research: An AI Agent for Training Control
Twinkle ships a terminal-based UI (TUI) powered by an embedded LLM agent that can autonomously start, monitor, pause, and debug ML training runs. This post covers the architecture of the TUI, the agent loop, and the tool system that makes “auto-research” possible.
Architecture Overview
The TUI is built on Textual and consists of four panels in a 2x3 grid layout:
| Panel | Position | Purpose |
|---|---|---|
| StatusBar | Top, full width | Run ID, model, step counter, training state |
| MetricsPanel | Middle left | Real-time loss/reward/grad_norm charts |
| LogPanel | Right, spanning 2 rows | Streaming stdout from training process |
| ChatPanel | Bottom left | Natural language interaction with the agent |
Screen {
layout: grid;
grid-size: 2 3;
grid-rows: auto 2fr 3fr;
grid-columns: 2fr 1fr;
}
The Agent Loop
At the heart of the TUI is AgentLoop — an async tool-calling agent that uses any OpenAI-compatible API (local Ollama, cloud API, etc.):
agent = AgentLoop(
connection=connection,
llm_base_url='http://localhost:11434/v1',
llm_model='qwen3.5',
llm_api_key='not-needed',
)
The loop follows a standard ReAct pattern:
- User sends a message via ChatPanel
- Agent calls LLM with conversation history + tool schemas
- LLM either responds directly or generates tool calls
- Tools are executed, results fed back to LLM
- Repeat until LLM produces a final text response (max 10 rounds)
Key design decisions:
- Streaming: Tokens are streamed to the UI in real-time. If tool calls are detected mid-stream,
on_stream_resetdiscards partial output - History pruning: Conversation is capped at 50 messages (excluding system prompt), with cuts always at
usermessage boundaries to avoid breaking tool-call sequences - Async skills loading: Skills are loaded in the background — the agent is usable immediately, skills are injected via
inject_skills()when ready
Tool System
The agent has access to 15+ tools organized into categories:
Training Lifecycle
| Tool | Description |
|---|---|
start_server | Launch Ray cluster + Twinkle Server (GPU partition, config generation) |
shutdown_server | Stop server and release GPU resources |
start_training | Write training script, launch process, begin monitoring |
pause_training | SIGKILL client process (server retains state) |
resume_training | Re-launch client script from saved state |
stop_training | Graceful stop with checkpoint saving |
update_script | Archive current script, write new version |
Discovery & Search
| Tool | Description |
|---|---|
list_training_runs | List active and historical runs |
get_training_status | Get run state + recent metrics |
search_models | Search ModelScope Hub for models |
search_datasets | Search ModelScope Hub for datasets |
list_supported_models | Query server for available models |
get_cluster_info | Detect GPU resources (Ray or nvidia-smi) |
Visualization
| Tool | Description |
|---|---|
zoom_metrics | Pan/zoom the metrics chart |
select_metrics | Choose which metrics to display (max 4) |
select_run | Switch monitoring to a different run |
Server Startup Pipeline
The start_server tool orchestrates a complete server deployment:
- Hardware detection —
nvidia-smiGPU count - GPU allocation — Partition GPUs between training model and sampler/teacher models
- Config generation — Auto-generate
server_config.yamlwith Ray Serve applications - Ray cluster start — Multi-node GPU partitioning with separate raylets per role
- Server launch —
python -m twinkle.server launch --config ... - Health check — Poll
/api/v1/healthz+ sampler engine readiness
The config generator supports multi-model topology: one training model + N sampler/teacher models, with GPU sorting by size (largest PG deploys first to avoid scheduling deadlock).
Skills System
The TUI supports extensible skills — pluggable capabilities loaded from three sources:
- Bundled skills — shipped with the
twinkle_clientpackage - Local skills — user-defined in
~/.cache/twinkle/tui/skills/local/ - Community skills — fetched from ModelScope (with 10s timeout)
Skills are loaded asynchronously after the agent starts, so the TUI is interactive immediately.
TrainingRuntime: Script-Side Integration
Training scripts integrate with the TUI via TrainingRuntime:
from twinkle_client.tui.runtime import TrainingRuntime
rt = TrainingRuntime(run_id='grpo-gsm8k')
rt.start(model_id='Qwen/Qwen3.5-4B', config={...})
rt.register_graceful_shutdown(model, dataloader)
for step, batch in enumerate(dataloader):
loss = train(batch)
rt.log_metrics(step=step, loss=loss, reward=reward)
rt.log(f'Step {step}, loss={loss:.4f}')
rt.finish()
Key features:
- metrics.jsonl — structured metrics with auto-timestamp, streamed to TUI in real-time
- Graceful shutdown — SIGTERM handler saves checkpoint (LoRA weights + optimizer state + dataloader position)
- Auto-resume —
get_resume_info()reads last saved step frommeta.json - Script archival — each
update_scriptcall archivestrain.pyastrain_v{N}.py
Getting Started
# Start TUI with local LLM
twinkle tui --llm-base-url http://localhost:11434/v1 --llm-model qwen3.5
# Or with a specific run
twinkle tui --run-id my-grpo-run
The TUI turns ML training into a conversation — describe what you want to train, and the agent handles server setup, script writing, monitoring, and troubleshooting.