invalid argument in path #30

Open
Addalin opened this issue May 24, 2021 · 1 comment

Comments

@Addalin
Owner

Addalin commented May 24, 2021

Below is the error message from a run of main_lightning.py:


Failure # 1 (occurred at 2021-05-23_21-45-03)
Traceback (most recent call last):
File "C:\Users\addalin.conda\envs\lidar\lib\site-packages\ray\tune\trial_runner.py", line 880, in _process_trial_save
results = self.trial_executor.fetch_result(trial)
File "C:\Users\addalin.conda\envs\lidar\lib\site-packages\ray\tune\ray_trial_executor.py", line 686, in fetch_result
result = ray.get(trial_future[0], timeout=DEFAULT_GET_TIMEOUT)
File "C:\Users\addalin.conda\envs\lidar\lib\site-packages\ray_private\client_mode_hook.py", line 47, in wrapper
return func(*args, **kwargs)
File "C:\Users\addalin.conda\envs\lidar\lib\site-packages\ray\worker.py", line 1481, in get
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(OSError): ray::ImplicitFunc.save() (pid=22632, ip=132.68.58.209)
File "python\ray_raylet.pyx", line 505, in ray._raylet.execute_task
File "python\ray_raylet.pyx", line 449, in ray._raylet.execute_task.function_executor
File "C:\Users\addalin.conda\envs\lidar\lib\site-packages\ray_private\function_manager.py", line 556, in actor_method_executor
return method(__ray_actor, *args, **kwargs)
File "C:\Users\addalin.conda\envs\lidar\lib\site-packages\ray\tune\function_runner.py", line 434, in save
checkpoint_path = TrainableUtil.process_checkpoint(
File "C:\Users\addalin.conda\envs\lidar\lib\site-packages\ray\tune\utils\trainable.py", line 46, in process_checkpoint
with open(checkpoint_path + ".tune_metadata", "wb") as f:
OSError: [Errno 22] Invalid argument: 'C:\Users\addalin\Dropbox\Lidar\lidar_learning\results\main_2021-05-23_19-35-00\main_5831d016_3_bsize=32,dfilter=None,dnorm=False,fc_size=[32],hsizes=[4, 4, 4, 4],lr=0.001,ltype=MAELoss,source=signal_p,use_bg=F_2021-05-23_21-28-18\checkpoint_epoch=3-step=703\.tune_metadata'


This is strange, since the failure occurred in the last epoch. It has also happened in other experiments.
Resuming the run with 'ERRORED_ONLY' fixes it.
But why does it happen in the first place?
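To narrow down whether the path itself is the problem, a small check like the one below could be run on the path from the traceback. This is only a diagnostic sketch: the helper name `diagnose_windows_path` is made up for illustration, and the traceback does not confirm that path length or file-name characters are the cause. It checks three things that commonly make `open()` fail on Windows: a total length at or above the classic 260-character limit, reserved characters in a component, and components ending in a space or a dot.

```python
from pathlib import PureWindowsPath

MAX_PATH = 260  # classic Win32 limit; it includes the terminating NUL
RESERVED_CHARS = set('<>:"/\\|?*')  # not allowed inside a file or directory name


def diagnose_windows_path(path: str) -> dict:
    """Report properties of a Windows path that commonly make open() fail.

    Hypothetical helper for illustration; it only inspects the string and
    never touches the file system, so it also runs on non-Windows machines.
    """
    win_path = PureWindowsPath(path)
    parts = win_path.parts
    # Skip the drive/root part (e.g. "C:\\"), which legitimately holds ':' and '\\'.
    components = parts[1:] if win_path.anchor else parts

    reserved = {
        part: sorted(set(part) & RESERVED_CHARS)
        for part in components
        if set(part) & RESERVED_CHARS
    }
    # Windows silently strips trailing spaces and dots, which can confuse tools.
    trailing = [part for part in components if part != part.rstrip(" .")]

    return {
        "length": len(path),
        "exceeds_max_path": len(path) >= MAX_PATH,
        "reserved_chars": reserved,
        "trailing_space_or_dot": trailing,
    }
```

Calling `diagnose_windows_path(...)` with the full '.tune_metadata' path from the error message shows whether the long, auto-generated trial directory name pushes the path over the limit. If none of the checks fire, the cause is more likely elsewhere, for example another process holding the file.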

@Addalin
Owner Author

Addalin commented Jul 28, 2021

A similar error keeps showing up across runs.
Usually, running the resume option with 'ERRORED_ONLY' fixes it.
However, this time it didn't help, and only restarting the computer solved it.
This occurred in the last experiment in 'main_2021-07-27_18-22-37'; the trial name starts with 'main_798e6_00015_15....'
See the error file below:
error.txt

Is this error related to the tune module, or to the Windows file system?
We should also check whether there is any relation between #27, #28, and this issue.
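One way to test the file-system hypothesis would be to shorten the trial directory names and see whether the error disappears. `tune.run` accepts a `trial_dirname_creator` callable that receives the trial and returns the directory name. The sketch below assumes the trial object exposes `trainable_name` and `trial_id`; the function name `short_trial_dirname` is hypothetical, and a stub object stands in for a real trial so the snippet runs without Ray installed.

```python
from types import SimpleNamespace


def short_trial_dirname(trial) -> str:
    """Build a compact directory name from the trainable name and trial id.

    Leaves out the hyperparameter string (bsize=..., hsizes=[...], ...),
    which is what makes the default directory names so long.
    """
    return f"{trial.trainable_name}_{trial.trial_id}"


# Stub standing in for a Ray Tune trial (assumption: a real trial has these attributes).
stub_trial = SimpleNamespace(trainable_name="main", trial_id="5831d016")
print(short_trial_dirname(stub_trial))  # main_5831d016

# Assumed usage with Ray Tune, not run here:
# tune.run(main, config=..., trial_dirname_creator=short_trial_dirname)
```

The hyperparameters remain recoverable from the files Tune writes inside each trial directory, so dropping them from the directory name should not lose information. If the error still occurs with short names, that would point away from path length and toward the tune module or file locking.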
