Description
Describe the bug
What the bug is and how to reproduce it, preferably with screenshots
[image]
The rollout connectivity test passes, but requests to the rollout server do not get through.
**Your hardware and system info**
Write your system info here, such as CUDA version, OS, GPU model, and torch version
A100, 8 GPUs × 2 machines
Additional context
Add any other context about the problem here
Two machines: one launches rollout on 4 GPUs, the other launches training on 8 GPUs.
The test launch script for rollout is as follows:
CUDA_VISIBLE_DEVICES=0,1,2,3 swift rollout --model /home/search/houchangjian/cephfs-houchangjian/ms-swift-main/output/SFT/SFT-lora/v0-20251127-155534/checkpoint-199-merged --port 8002 --model_type qwen2_5_vl --vllm_tensor_parallel_size 4
The training script is as follows:
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True \
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,7 \
MAX_PIXELS=691200 \
NPROC_PER_NODE=8 \
swift rlhf \
    --rlhf_type grpo \
    --model /home/search/houchangjian/cephfs-houchangjian/ms-swift-main/output/SFT/SFT-lora/v0-20251127-155534/checkpoint-199-merged \
    --external_plugins /home/search/houchangjian/cephfs-houchangjian/ms-swift/examples/train/grpo/plugin/plugin.py \
    --reward_funcs table_edit_distance_strict \
    --use_vllm true \
    --vllm_mode server \
    --vllm_server_host 10.178.143.49 \
    --vllm_server_port 8002 \
    --train_type full \
    --torch_dtype bfloat16 \
    --dataset /home/search/houchangjian/cephfs-houchangjian/data/tablerecg/train/0217_TSR_Real_all_converted.json \
    --load_from_cache_file false \
    --max_completion_length 2048 \
    --num_train_epochs 1 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 4 \
    --learning_rate 1e-6 \
    --gradient_accumulation_steps 1 \
    --save_strategy steps \
    --eval_strategy steps \
    --eval_steps 500 \
    --save_steps 500 \
    --save_total_limit 10 \
    --logging_steps 1 \
    --output_dir output/DAPO_CLEVR_COUNTDOWN_test \
    --warmup_ratio 0.01 \
    --dataloader_num_workers 4 \
    --num_generations 1 \
    --generation_batch_size 20 \
    --temperature 1.0 \
    --deepspeed zero3 \
    --log_completions false \
    --num_iterations 1 \
    --async_generate False \
    --beta 0.001 \
    --model_type qwen2_5_vl \
    --max_pixels 691200 \
    --loss_type dapo \
    --dynamic_sample true \
    --max_resample_times 3
Currently, the rollout server does not respond at all.
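
To help narrow this down, here is a minimal sketch of a TCP reachability check that could be run on the training machine against the rollout host and port used above. It assumes that plain TCP reachability is the relevant condition; `can_connect` is a hypothetical helper written for illustration and is not part of ms-swift or vLLM.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened.

    Hypothetical helper for illustration only. It checks network
    reachability, not whether the rollout service handles requests.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, calling `can_connect("10.178.143.49", 8002)` on the training machine would show whether the port is reachable from there. A `True` result only means the port accepts TCP connections; it does not confirm that the rollout server is processing generation requests.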