GRPO multi-node, multi-GPU training: cannot connect to the rollout server #6810

**Describe the bug**
What the bug is, and how to reproduce it, ideally with screenshots
Rollout server:

<img width="3802" height="436" alt="Image" src="https://github.com/user-attachments/assets/ae982115-436d-4308-8f6a-ec01e2899167" />

The connectivity test passes, but requests to the rollout server do not get through.

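To separate "the port is reachable" from "the server answers requests", one option is to open a plain TCP connection from the training node to the rollout server. The sketch below is a hypothetical helper (`check_tcp` is not part of ms-swift) that assumes bash with `/dev/tcp` support and the coreutils `timeout` command are installed; the host and port in the usage line are taken from the training command.

```shell
# check_tcp HOST PORT [SECONDS]
# Hypothetical helper (not part of ms-swift): returns 0 if a TCP connection
# to HOST:PORT can be opened within SECONDS (default 3), non-zero otherwise.
# Uses bash's built-in /dev/tcp redirection, so nc/telnet are not needed.
check_tcp() {
  local host="$1" port="$2" secs="${3:-3}"
  # Open fd 3 in a child bash; the child exits right away, closing the socket.
  timeout "$secs" bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Usage from the training node, with the values from the training command:
# check_tcp 10.178.143.49 8002 && echo "port open" || echo "port unreachable"
```

If the TCP connection succeeds while training still gets no response, the network path itself is open and the problem is more likely above TCP (the HTTP request or the server process).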
**Your hardware and system info**
Write your system info like CUDA version/system/GPU/torch version here
A100, 8 GPUs per machine × 2 machines

**Additional context**
Add any other context about the problem here

Two machines: one runs the rollout server on 4 GPUs, the other runs training on 8 GPUs.

Rollout launch command used for this test:

```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 swift rollout --model /home/search/houchangjian/cephfs-houchangjian/ms-swift-main/output/SFT/SFT-lora/v0-20251127-155534/checkpoint-199-merged --port 8002 --model_type qwen2_5_vl --vllm_tensor_parallel_size 4
```
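For the training node to reach the rollout server, the server has to listen on an address other than loopback. Whether `swift rollout` binds to all interfaces by default may depend on the ms-swift version, so it is worth checking the listening socket on the rollout node with `ss -ltn`. The sketch below is a hypothetical helper (`is_loopback_only`, not part of ms-swift) that takes a local-address field as printed by `ss` and reports whether it is a loopback-only bind; it inspects the socket only and does not query ms-swift.

```shell
# Hypothetical helper (not part of ms-swift): given a local address as printed
# by `ss -ltn` (e.g. "127.0.0.1:8002", "0.0.0.0:8002", "[::1]:8002", "*:8002"),
# return 0 if it is a loopback-only bind that other machines cannot reach.
is_loopback_only() {
  local addr="${1%:*}"   # drop the trailing ":PORT"
  case "$addr" in
    127.*|"[::1]"|"::1"|localhost) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage on the rollout node. Column 4 of `ss -ltn` is the local address;
# -H (no header) needs a recent iproute2, and the layout may vary by version:
# ss -Hltn 'sport = :8002' | awk '{print $4}' | while read -r a; do
#   if is_loopback_only "$a"; then echo "$a: loopback only"; else echo "$a: reachable from other hosts"; fi
# done
```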

Training command:

```shell
# NPROC_PER_NODE must equal the number of GPUs listed in CUDA_VISIBLE_DEVICES (all 8 on this node).
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True \
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
MAX_PIXELS=691200 \
NPROC_PER_NODE=8 \
swift rlhf \
    --rlhf_type grpo \
    --model /home/search/houchangjian/cephfs-houchangjian/ms-swift-main/output/SFT/SFT-lora/v0-20251127-155534/checkpoint-199-merged \
    --external_plugins /home/search/houchangjian/cephfs-houchangjian/ms-swift/examples/train/grpo/plugin/plugin.py \
    --reward_funcs table_edit_distance_strict \
    --use_vllm true \
    --vllm_mode server \
    --vllm_server_host 10.178.143.49 \
    --vllm_server_port 8002 \
    --train_type full \
    --torch_dtype bfloat16 \
    --dataset /home/search/houchangjian/cephfs-houchangjian/data/tablerecg/train/0217_TSR_Real_all_converted.json \
    --load_from_cache_file false \
    --max_completion_length 2048 \
    --num_train_epochs 1 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 4 \
    --learning_rate 1e-6 \
    --gradient_accumulation_steps 1 \
    --save_strategy steps \
    --eval_strategy steps \
    --eval_steps 500 \
    --save_steps 500 \
    --save_total_limit 10 \
    --logging_steps 1 \
    --output_dir output/DAPO_CLEVR_COUNTDOWN_test \
    --warmup_ratio 0.01 \
    --dataloader_num_workers 4 \
    --num_generations 1 \
    --generation_batch_size 20 \
    --temperature 1.0 \
    --deepspeed zero3 \
    --log_completions false \
    --num_iterations 1 \
    --async_generate False \
    --beta 0.001 \
    --model_type qwen2_5_vl \
    --max_pixels 691200 \
    --loss_type dapo \
    --dynamic_sample true \
    --max_resample_times 3
```
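In the training command, `CUDA_VISIBLE_DEVICES` and `NPROC_PER_NODE` have to agree, since each of the `NPROC_PER_NODE` processes needs its own visible GPU. The sketch below uses hypothetical helpers (`count_visible_gpus` and `check_nproc` are not part of ms-swift) to compare the two before launching; it assumes `CUDA_VISIBLE_DEVICES` is set explicitly as a comma-separated list of IDs.

```shell
# Hypothetical helpers (not part of ms-swift).

# count_visible_gpus LIST -> prints how many comma-separated IDs LIST contains
# (0 for an empty string). Only meaningful when CUDA_VISIBLE_DEVICES is set.
count_visible_gpus() {
  printf '%s\n' "${1-}" | awk -F',' '{ print NF }'
}

# check_nproc NPROC LIST -> returns 0 if NPROC equals the number of IDs in LIST,
# otherwise prints a message to stderr and returns 1.
check_nproc() {
  local nproc="$1" n
  n="$(count_visible_gpus "$2")"
  if [ "$n" -ne "$nproc" ]; then
    echo "NPROC_PER_NODE=$nproc but $n GPU(s) listed in CUDA_VISIBLE_DEVICES" >&2
    return 1
  fi
}

# Usage, with the same variables the training command sets:
# check_nproc "$NPROC_PER_NODE" "$CUDA_VISIBLE_DEVICES" || exit 1
```

The same pairing holds for the rollout command, where `--vllm_tensor_parallel_size 4` matches the four GPUs in `CUDA_VISIBLE_DEVICES=0,1,2,3`.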

At the moment, the rollout server shows no response at all.
