Shell Launch (torchrun)
·
1 min read
The standard way to launch local multi-GPU training with torchrun:
#!/usr/bin/env bash
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
torchrun --nproc_per_node=8 fsdp2.py \
--model-id ms://Qwen/Qwen3.5-4B \
--dataset-id ms://swift/self-cognition \
--template-cls Qwen3_5Template \
--fsdp-size 2 \
--dp-size 4 \
--batch-size 8 \
--lr 1e-4 \
--gradient-accumulation-steps 2 \
--output-dir ./output/fsdp2 \
--adapter-name default \
--scheduler-cls CosineWarmupScheduler \
--num-warmup-steps 5 \
--train-samples 1000