This page provides a detailed description of the configuration files for AgentJet.
Overview
AgentJet uses YAML-format configuration files to set up data, algorithms, rewards, logging, and other runtime behaviors.
Default Configuration
The default config is located at `ajet/default_config/ajet_default.yaml`.
At a high level, a typical config contains a single root section `ajet`, which is divided into several logical parts:

- Basic Metadata — Project name, experiment name, experiment directory, and backbone selection
  - `project_name`, `experiment_name`, `experiment_dir`
  - `backbone`: Select training backend (`debug`, `trinity`, or `verl`)
- Data & Reward — How to load data and evaluate agents
  - `task_reader`: Load training/validation samples
  - `task_judge`: Evaluate agents and compute rewards
  - `data`: Prompt/response length and batch sizes
- Model & Rollout — Model configuration and agent interaction
  - `model`: Base model to train
  - `rollout`: Agent-environment interaction settings
  - `context_tracker`: Conversation/history management
Model Configuration
Specifying the Model
| Source Type | Example |
|---|---|
| Local file | `/mnt/data/models/Qwen2.5-14B-Instruct` |
| HuggingFace repo | `Qwen/Qwen2.5-14B-Instruct` (auto-downloaded) |
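For illustration, a sketch of pointing the `model` section at either source. The `model_path` key name is an assumption made for this sketch, not a confirmed option; check `ajet/default_config/ajet_default.yaml` for the actual field name.

```yaml
ajet:
  model:
    # 'model_path' is an assumed key name for this sketch
    model_path: Qwen/Qwen2.5-14B-Instruct
    # or a local checkpoint:
    # model_path: /mnt/data/models/Qwen2.5-14B-Instruct
```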
Environment Variables for LLM-as-a-Judge
If using LLM-as-a-Judge, configure these environment variables:
```bash
# DashScope API key for remote LLM calling
export DASHSCOPE_API_KEY='sk-xxxxxx|sk-yyyyyy'
export DASHSCOPE_API_KEY_BACKUP='sk-zzzzzz'
```
Data Configuration
Task Reader
`task_reader` defines how to read training and validation data.
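This page does not spell out the `task_reader` schema, so the following is a hypothetical sketch only; the `type` and `file_path` keys are illustrative assumptions, not confirmed options. Consult `ajet_default.yaml` for the real fields.

```yaml
ajet:
  task_reader:
    # Both keys below are placeholders for illustration
    type: jsonl                      # assumed: format of the task file
    file_path: /path/to/tasks.jsonl  # assumed: where to load tasks from
```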
Task Judge
`task_judge` evaluates agent performance and calculates rewards.
```yaml
ajet:
  task_judge:
    judge_type: customized_protocol  # or 'rubrics_auto_grader'
    judge_protocol: ajet.task_judge.env_service_as_judge->EnvServiceJudge
    alien_llm_model: qwen3-235b-a22b-instruct-2507
    alien_llm_response_length: 512
```
| Option | Description |
|---|---|
| `customized_protocol` | Use a custom Python class for scoring |
| `rubrics_auto_grader` | Use LLM-based automatic grading |
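For example, switching from the custom protocol to LLM-based grading is a one-key change. This sketch reuses only keys from the example above and assumes they remain valid when `judge_type` is `rubrics_auto_grader`:

```yaml
ajet:
  task_judge:
    judge_type: rubrics_auto_grader
    alien_llm_model: qwen3-235b-a22b-instruct-2507  # LLM used for grading
    alien_llm_response_length: 512
```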
Training Configuration
Backend Selection
AgentJet supports three training backends:
| Backend | Description |
|---|---|
| `trinity` | Default. Flexible and scalable framework for RL fine-tuning |
| `verl` | Volcano Engine reinforcement learning for LLMs |
| `debug` | Allows breakpoint debugging in IDEs |
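The backend is chosen with the top-level `backbone` key described under Basic Metadata:

```yaml
ajet:
  backbone: trinity  # or: verl, debug
```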
Rollout Configuration
Controls agent behavior during environment interaction:
```yaml
ajet:
  rollout:
    user_workflow: tutorial.example_appworld.appworld->ExampleAgentScopeWorkflow
    max_env_worker: 128
    temperature: 0.9
    top_p: 1.0
    name: vllm
    n_vllm_engine: 2
    num_repeat: 4
```
| Parameter | Description |
|---|---|
| `user_workflow` | Path to workflow implementation class |
| `temperature` / `top_p` | Sampling parameters |
| `name` | Inference engine (e.g., `vllm`) |
| `n_vllm_engine` | Number of vLLM engines (Trinity only) |
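Since `n_vllm_engine` applies only to Trinity, a rollout block for the `verl` backend would simply omit it. A minimal sketch under that assumption, reusing only keys from the example above:

```yaml
ajet:
  backbone: verl
  rollout:
    user_workflow: tutorial.example_appworld.appworld->ExampleAgentScopeWorkflow
    name: vllm       # inference engine
    temperature: 0.9
    top_p: 1.0
```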
Common Training Parameters
```yaml
ajet:
  trainer_common:
    total_epochs: 50
    save_freq: 20
    test_freq: 20
    val_before_train: False
    val_pass_n: 4
    nnodes: 1
    n_gpus_per_node: 8
    mini_batch_num: 1
    fsdp_config:
      param_offload: True
      optimizer_offload: True
```
| Parameter | Description |
|---|---|
| `total_epochs` | Total training epochs |
| `save_freq` | Checkpoint save frequency (steps) |
| `test_freq` | Validation frequency (steps) |
| `nnodes` / `n_gpus_per_node` | Distributed training setup |
| `fsdp_config` | FSDP memory optimization |
Optimization Algorithms
```yaml
ajet:
  trainer_common:
    algorithm:
      adv_estimator: grpo
      use_kl_in_reward: False
      optim:
        lr: 1e-6
      use_kl_loss: True
      kl_loss_coef: 0.002
      kl_loss_type: low_var_kl
```
| Parameter | Description |
|---|---|
| `adv_estimator` | Advantage estimator (e.g., `grpo`) |
| `lr` | Learning rate |
| `use_kl_loss` | Include KL divergence in loss |
| `kl_loss_coef` | KL loss coefficient |
Debug Mode
When `backbone: debug` is set, additional settings are available:

```yaml
ajet:
  debug:
    debug_max_parallel: 16
    debug_first_n_tasks: 2
    debug_vllm_port: 18000
    debug_vllm_seed: 12345
    debug_tensor_parallel_size: 4
```
Debug Mode Use Cases
- Limiting tasks: Quickly verify the pipeline on a few tasks
- Fixing randomness: `debug_vllm_seed` helps reproduce issues
- Reduced parallelism: Easier to debug with smaller concurrency
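Putting these together, a minimal debug setup using only keys shown above might look like:

```yaml
ajet:
  backbone: debug
  debug:
    debug_first_n_tasks: 2   # verify the pipeline on just two tasks
    debug_vllm_seed: 12345   # fix randomness to reproduce issues
    debug_max_parallel: 4    # smaller concurrency, easier to step through
```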
Logging & Monitoring
Logger Selection
| Logger | Description |
|---|---|
| `console` | Standard output for quick progress checking |
| `wandb` | Weights & Biases experiment tracking |
| `swanlab` | SwanLab logging |
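This page does not show where the logger is configured; the placement under `trainer_common` in the sketch below is an assumption, not a confirmed key location.

```yaml
ajet:
  trainer_common:
    logger: wandb  # assumed placement; value is one of console / wandb / swanlab
```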
Output Structure
All experiment outputs are saved in `./launcher_record/{experiment_name}`:
| Directory | Contents |
|---|---|
| Logs | Logs and error messages |
| Metrics | Training metrics (depends on logger) |
| Checkpoint | Model checkpoints |