Two errors: "The server socket has failed to listen on any local network address." and "subprocess.CalledProcessError: Command '['wget', '-P', '/root/.cache/vbench/amt_model', 'https://huggingface.co/lalala125/AMT/resolve/main/amt-s.pth']' returned non-zero exit status 4."
#97
First, I ran `vbench evaluate --dimension motion_smoothness --videos_path /user-fs/dataset/videos_mini/ --mode=custom_input` and got this error:
[W socket.cpp:401] [c10d] The server socket has failed to bind to [::]:50001 (errno: 98 - Address already in use).
[W socket.cpp:401] [c10d] The server socket has failed to bind to 0.0.0.0:50001 (errno: 98 - Address already in use).
[E socket.cpp:435] [c10d] The server socket has failed to listen on any local network address.
Traceback (most recent call last):
File "/opt/conda/envs/openmmlab/lib/python3.8/site-packages/vbench/cli/../launch/evaluate.py", line 153, in <module>
main()
File "/opt/conda/envs/openmmlab/lib/python3.8/site-packages/vbench/cli/../launch/evaluate.py", line 109, in main
dist_init()
File "/opt/conda/envs/openmmlab/lib/python3.8/site-packages/vbench/distributed.py", line 37, in dist_init
torch.distributed.init_process_group(backend=backend, init_method='env://')
File "/opt/conda/envs/openmmlab/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 595, in init_process_group
store, rank, world_size = next(rendezvous_iterator)
File "/opt/conda/envs/openmmlab/lib/python3.8/site-packages/torch/distributed/rendezvous.py", line 257, in _env_rendezvous_handler
store = _create_c10d_store(master_addr, master_port, rank, world_size, timeout)
File "/opt/conda/envs/openmmlab/lib/python3.8/site-packages/torch/distributed/rendezvous.py", line 188, in _create_c10d_store
return TCPStore(
RuntimeError: The server socket has failed to listen on any local network address. The server socket has failed to bind to [::]:50001 (errno: 98 - Address already in use). The server socket has failed to bind to 0.0.0.0:50001 (errno: 98 - Address already in use).
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 73806) of binary: /opt/conda/envs/openmmlab/bin/python
Traceback (most recent call last):
File "/opt/conda/envs/openmmlab/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/opt/conda/envs/openmmlab/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/opt/conda/envs/openmmlab/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in <module>
main()
File "/opt/conda/envs/openmmlab/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
return f(*args, **kwargs)
File "/opt/conda/envs/openmmlab/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main
run(args)
File "/opt/conda/envs/openmmlab/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run
elastic_launch(
File "/opt/conda/envs/openmmlab/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/opt/conda/envs/openmmlab/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
/opt/conda/envs/openmmlab/lib/python3.8/site-packages/vbench/cli/../launch/evaluate.py FAILED
------------------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2025-01-07_11:56:55
host : 47vevg2begmd2-0
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 73806)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
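The log above shows the bind failing because port 50001 was already in use. As a minimal sketch (not part of vbench), the OS can be asked for a port that is free at the moment of the call, and that value could then be passed as `MASTER_PORT`. `find_free_port` is a hypothetical helper name introduced here for illustration.

```python
import socket


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for a TCP port that is unused right now (hypothetical helper)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # Binding to port 0 lets the OS choose any available port.
        s.bind((host, 0))
        return s.getsockname()[1]


# Usage: print(find_free_port()) and prefix the command with MASTER_PORT=<that value>.
```

The port is released when the socket closes, so another process could still take it before the evaluation starts; this narrows the chance of a collision but does not eliminate it.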
Then I added MASTER_PORT=29501 in front of the command: `MASTER_PORT=29501 vbench evaluate --dimension motion_smoothness --videos_path /user-fs/dataset/videos_mini/ --mode=custom_input`, but it still failed, this time with a different error:
2025-01-07 11:57:19,779 - torch.distributed.distributed_c10d - INFO - Added key: store_based_barrier_key:1 to store for rank: 0
2025-01-07 11:57:19,779 - torch.distributed.distributed_c10d - INFO - Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes.
args: Namespace(category=None, dimension=['motion_smoothness'], full_json_dir='/opt/conda/envs/openmmlab/lib/python3.8/site-packages/vbench/cli/../VBench_full_info.json', imaging_quality_preprocessing_mode='longer', load_ckpt_from_local=None, mode='custom_input', output_path='./evaluation_results/', prompt='None', prompt_file=None, read_frame=None, videos_path='/user-fs/dataset/videos_mini/')
start evaluation
File /root/.cache/vbench/amt_model/amt-s.pth does not exist. Downloading...
--2025-01-07 11:57:19-- https://huggingface.co/lalala125/AMT/resolve/main/amt-s.pth
Resolving huggingface.co (huggingface.co)... 157.240.15.8, 2a03:2880:f127:283:face:b00c:0:25de
Connecting to huggingface.co (huggingface.co)|157.240.15.8|:443... failed: Connection timed out.
Connecting to huggingface.co (huggingface.co)|2a03:2880:f127:283:face:b00c:0:25de|:443... failed: Network is unreachable.
Traceback (most recent call last):
File "/opt/conda/envs/openmmlab/lib/python3.8/site-packages/vbench/cli/../launch/evaluate.py", line 153, in <module>
main()
File "/opt/conda/envs/openmmlab/lib/python3.8/site-packages/vbench/cli/../launch/evaluate.py", line 139, in main
my_VBench.evaluate(
File "/opt/conda/envs/openmmlab/lib/python3.8/site-packages/vbench/__init__.py", line 141, in evaluate
submodules_dict = init_submodules(dimension_list, local=local, read_frame=read_frame)
File "/opt/conda/envs/openmmlab/lib/python3.8/site-packages/vbench/utils.py", line 275, in init_submodules
subprocess.run(wget_command, check=True)
File "/opt/conda/envs/openmmlab/lib/python3.8/subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['wget', '-P', '/root/.cache/vbench/amt_model', 'https://huggingface.co/lalala125/AMT/resolve/main/amt-s.pth']' returned non-zero exit status 4.
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 74247) of binary: /opt/conda/envs/openmmlab/bin/python
Traceback (most recent call last):
File "/opt/conda/envs/openmmlab/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/opt/conda/envs/openmmlab/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/opt/conda/envs/openmmlab/lib/python3.8/site-packages/torch/distributed/run.py", line 765, in <module>
main()
File "/opt/conda/envs/openmmlab/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
return f(*args, **kwargs)
File "/opt/conda/envs/openmmlab/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main
run(args)
File "/opt/conda/envs/openmmlab/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run
elastic_launch(
File "/opt/conda/envs/openmmlab/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/opt/conda/envs/openmmlab/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
/opt/conda/envs/openmmlab/lib/python3.8/site-packages/vbench/cli/../launch/evaluate.py FAILED
------------------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2025-01-07_11:59:28
host : 47vevg2begmd2-0
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 74247)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
I also tried running `export HF_ENDPOINT=https://hf-mirror.com`, but still got the errors above.
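The traceback shows `wget` being called with the full `https://huggingface.co/...` URL, so the `HF_ENDPOINT` variable (which is read by the `huggingface_hub` library, not by `wget`) would likely have no effect on this download. One possible workaround, sketched under two assumptions, is to fetch the checkpoint through the mirror by hand and place it at the path named in the log. The assumptions are that hf-mirror.com serves the same repository path, and that vbench skips the download when the file already exists, as the "does not exist. Downloading..." message suggests. `to_mirror` and `fetch_checkpoint` are hypothetical helper names, not part of vbench.

```python
import os
import urllib.request
from urllib.parse import urlsplit, urlunsplit

# URL and cache directory copied from the log above.
AMT_URL = "https://huggingface.co/lalala125/AMT/resolve/main/amt-s.pth"
CACHE_DIR = "/root/.cache/vbench/amt_model"


def to_mirror(url: str, mirror_host: str = "hf-mirror.com") -> str:
    """Swap only the host of a URL, keeping scheme, path, and query."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(netloc=mirror_host))


def fetch_checkpoint(url: str = AMT_URL, cache_dir: str = CACHE_DIR) -> str:
    """Download the file via the mirror unless it is already in cache_dir."""
    os.makedirs(cache_dir, exist_ok=True)
    target = os.path.join(cache_dir, os.path.basename(urlsplit(url).path))
    if not os.path.exists(target):
        # Assumption: the mirror exposes the same path as huggingface.co.
        urllib.request.urlretrieve(to_mirror(url), target)
    return target


# Usage (needs network access to the mirror): fetch_checkpoint()
```

Keeping the URL rewrite separate from the download makes it easy to check the resulting address before any network request is made.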