I'm training on four GPUs and have explicitly set the GPU IDs with export CUDA_VISIBLE_DEVICES=4,5,6,7. The run currently hangs at this point and makes no progress:
[rank0]:[W513 02:56:19.001482657 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 0] using GPU 0 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
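The warning itself names two options: pass `device_id` to `init_process_group()`, or pass `device_ids` to `barrier()`. Below is a minimal sketch of both. It assumes the job is launched with `torchrun` (which sets `LOCAL_RANK` for each process) and that the installed PyTorch version accepts the `device_id` argument. The helper names `local_rank_from_env`, `physical_gpu_for`, and `init_distributed` are hypothetical and not taken from my training script. Note that with CUDA_VISIBLE_DEVICES=4,5,6,7 the process sees the cards renumbered as 0 to 3, so "GPU 0" in the log should correspond to physical GPU 4. This may only silence the warning; the hang could still have a different cause.

```python
import os


def local_rank_from_env(env=None):
    """Read LOCAL_RANK (set per process by torchrun); default to 0 if unset."""
    env = os.environ if env is None else env
    return int(env.get("LOCAL_RANK", "0"))


def physical_gpu_for(local_rank, env=None):
    """Map a process-local CUDA index to the physical GPU ID listed in
    CUDA_VISIBLE_DEVICES. Returns None if the variable is unset or empty."""
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES", "")
    visible = [v.strip() for v in raw.split(",") if v.strip()]
    return visible[local_rank] if visible else None


def init_distributed():
    """Bind this rank to its GPU before any collective runs (assumed setup)."""
    # Imported lazily so the helpers above work without PyTorch installed.
    import torch
    import torch.distributed as dist

    local_rank = local_rank_from_env()
    torch.cuda.set_device(local_rank)
    device = torch.device("cuda", local_rank)

    # Option 1 from the warning: tell the process group which device to use.
    dist.init_process_group(backend="nccl", device_id=device)

    # Option 2 from the warning: name the device explicitly in the barrier.
    dist.barrier(device_ids=[local_rank])
    return device
```

With the setting above, `physical_gpu_for(0)` would report physical GPU "4" for local rank 0, which is a quick way to check that each rank is bound to the card I expect.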