ERROR | yolox.core.launch:90 - An error has been caught in function 'launch', process 'MainProcess' #431

Open
lida2003 opened this issue Dec 11, 2024 · 0 comments


I got the error below on a Jetson Orin Nano board. Has anyone run into this before?

daniel@daniel-nvidia:~/Work/ByteTrack$ python3 tools/train.py -f exps/example/mot/yolox_x_ablation.py -d 1 -b 48 --fp16 -o -c pretrained/yolox_x.pth
2024-12-11 11:37:59 | INFO     | yolox.core.trainer:126 - args: Namespace(batch_size=48, ckpt='pretrained/yolox_x.pth', devices=1, dist_backend='nccl', dist_url=None, exp_file='exps/example/mot/yolox_x_ablation.py', experiment_name='yolox_x_ablation', fp16=True, local_rank=0, machine_rank=0, name=None, num_machines=1, occupy=True, opts=[], resume=False, start_epoch=None)
2024-12-11 11:37:59 | INFO     | yolox.core.trainer:127 - exp value:
╒══════════════════╤════════════════════╕
│ keys             │ values             │
╞══════════════════╪════════════════════╡
│ seed             │ None               │
├──────────────────┼────────────────────┤
│ output_dir       │ './YOLOX_outputs'  │
├──────────────────┼────────────────────┤
│ print_interval   │ 20                 │
├──────────────────┼────────────────────┤
│ eval_interval    │ 5                  │
├──────────────────┼────────────────────┤
│ num_classes      │ 1                  │
├──────────────────┼────────────────────┤
│ depth            │ 1.33               │
├──────────────────┼────────────────────┤
│ width            │ 1.25               │
├──────────────────┼────────────────────┤
│ data_num_workers │ 4                  │
├──────────────────┼────────────────────┤
│ input_size       │ (800, 1440)        │
├──────────────────┼────────────────────┤
│ random_size      │ (18, 32)           │
├──────────────────┼────────────────────┤
│ train_ann        │ 'train.json'       │
├──────────────────┼────────────────────┤
│ val_ann          │ 'val_half.json'    │
├──────────────────┼────────────────────┤
│ degrees          │ 10.0               │
├──────────────────┼────────────────────┤
│ translate        │ 0.1                │
├──────────────────┼────────────────────┤
│ scale            │ (0.1, 2)           │
├──────────────────┼────────────────────┤
│ mscale           │ (0.8, 1.6)         │
├──────────────────┼────────────────────┤
│ shear            │ 2.0                │
├──────────────────┼────────────────────┤
│ perspective      │ 0.0                │
├──────────────────┼────────────────────┤
│ enable_mixup     │ True               │
├──────────────────┼────────────────────┤
│ warmup_epochs    │ 1                  │
├──────────────────┼────────────────────┤
│ max_epoch        │ 80                 │
├──────────────────┼────────────────────┤
│ warmup_lr        │ 0                  │
├──────────────────┼────────────────────┤
│ basic_lr_per_img │ 1.5625e-05         │
├──────────────────┼────────────────────┤
│ scheduler        │ 'yoloxwarmcos'     │
├──────────────────┼────────────────────┤
│ no_aug_epochs    │ 10                 │
├──────────────────┼────────────────────┤
│ min_lr_ratio     │ 0.05               │
├──────────────────┼────────────────────┤
│ ema              │ True               │
├──────────────────┼────────────────────┤
│ weight_decay     │ 0.0005             │
├──────────────────┼────────────────────┤
│ momentum         │ 0.9                │
├──────────────────┼────────────────────┤
│ exp_name         │ 'yolox_x_ablation' │
├──────────────────┼────────────────────┤
│ test_size        │ (800, 1440)        │
├──────────────────┼────────────────────┤
│ test_conf        │ 0.1                │
├──────────────────┼────────────────────┤
│ nmsthre          │ 0.7                │
╘══════════════════╧════════════════════╛
/home/daniel/.local/lib/python3.8/site-packages/torch/functional.py:505: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/pytorch/pytorch/aten/src/ATen/native/TensorShape.cpp:3490.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
2024-12-11 11:38:02 | INFO     | yolox.core.trainer:132 - Model Summary: Params: 99.00M, Gflops: 793.21
2024-12-11 11:38:06 | INFO     | yolox.core.trainer:291 - loading checkpoint for fine tuning
2024-12-11 11:38:09 | WARNING  | yolox.utils.checkpoint:25 - Shape of head.cls_preds.0.weight in checkpoint is torch.Size([80, 320, 1, 1]), while shape of head.cls_preds.0.weight in model is torch.Size([1, 320, 1, 1]).
2024-12-11 11:38:09 | WARNING  | yolox.utils.checkpoint:25 - Shape of head.cls_preds.0.bias in checkpoint is torch.Size([80]), while shape of head.cls_preds.0.bias in model is torch.Size([1]).
2024-12-11 11:38:09 | WARNING  | yolox.utils.checkpoint:25 - Shape of head.cls_preds.1.weight in checkpoint is torch.Size([80, 320, 1, 1]), while shape of head.cls_preds.1.weight in model is torch.Size([1, 320, 1, 1]).
2024-12-11 11:38:09 | WARNING  | yolox.utils.checkpoint:25 - Shape of head.cls_preds.1.bias in checkpoint is torch.Size([80]), while shape of head.cls_preds.1.bias in model is torch.Size([1]).
2024-12-11 11:38:09 | WARNING  | yolox.utils.checkpoint:25 - Shape of head.cls_preds.2.weight in checkpoint is torch.Size([80, 320, 1, 1]), while shape of head.cls_preds.2.weight in model is torch.Size([1, 320, 1, 1]).
2024-12-11 11:38:09 | WARNING  | yolox.utils.checkpoint:25 - Shape of head.cls_preds.2.bias in checkpoint is torch.Size([80]), while shape of head.cls_preds.2.bias in model is torch.Size([1]).
2024-12-11 11:38:09 | INFO     | yolox.data.datasets.mot:39 - loading annotations into memory...
2024-12-11 11:38:16 | INFO     | yolox.data.datasets.mot:39 - Done (t=6.44s)
2024-12-11 11:38:16 | INFO     | pycocotools.coco:88 - creating index...
2024-12-11 11:38:16 | INFO     | pycocotools.coco:88 - index created!
2024-12-11 11:38:20 | INFO     | yolox.core.trainer:150 - init prefetcher, this might take one minute or less...
2024-12-11 11:38:51 | ERROR    | yolox.core.launch:90 - An error has been caught in function 'launch', process 'MainProcess' (78890), thread 'MainThread' (281473143934992):
Traceback (most recent call last):

  File "/home/daniel/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1135, in _try_get_data
    data = self._data_queue.get(timeout=timeout)
           │    │           │           └ 5.0
           │    │           └ <function Queue.get at 0xffff2415faf0>
           │    └ <queue.Queue object at 0xfffea80c62e0>
           └ <torch.utils.data.dataloader._MultiProcessingDataLoaderIter object at 0xfffea8091d30>
  File "/usr/lib/python3.8/queue.py", line 179, in get
    self.not_empty.wait(remaining)
    │    │         │    └ 4.999996639999154
    │    │         └ <function Condition.wait at 0xffff91dd10d0>
    │    └ <Condition(<unlocked _thread.lock object at 0xfffea80c6420>, 0)>
    └ <queue.Queue object at 0xfffea80c62e0>
  File "/usr/lib/python3.8/threading.py", line 306, in wait
    gotit = waiter.acquire(True, timeout)
            │      │             └ 4.999996639999154
            │      └ <method 'acquire' of '_thread.lock' objects>
            └ <locked _thread.lock object at 0xfffed003a390>
  File "/home/daniel/.local/lib/python3.8/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
    _error_if_any_worker_fails()
    └ <built-in function _error_if_any_worker_fails>

RuntimeError: DataLoader worker (pid 79134) is killed by signal: Killed.


The above exception was the direct cause of the following exception:


Traceback (most recent call last):

  File "tools/train.py", line 114, in <module>
    launch(
    └ <function launch at 0xffff21e639d0>

> File "/home/daniel/Work/ByteTrack/yolox/core/launch.py", line 90, in launch
    main_func(*args)
    │          └ (╒══════════════════╤════════════════════════════════════════════════════════════════════════════════════════════════════════...
    └ <function main at 0xffff22f4d160>

  File "tools/train.py", line 100, in main
    trainer.train()
    │       └ <function Trainer.train at 0xfffed70d8940>
    └ <yolox.core.trainer.Trainer object at 0xffff22f4efa0>

  File "/home/daniel/Work/ByteTrack/yolox/core/trainer.py", line 70, in train
    self.before_train()
    │    └ <function Trainer.before_train at 0xffff22f21e50>
    └ <yolox.core.trainer.Trainer object at 0xffff22f4efa0>

  File "/home/daniel/Work/ByteTrack/yolox/core/trainer.py", line 151, in before_train
    self.prefetcher = DataPrefetcher(self.train_loader)
    │                 │              │    └ <yolox.data.dataloading.DataLoader object at 0xfffea8091dc0>
    │                 │              └ <yolox.core.trainer.Trainer object at 0xffff22f4efa0>
    │                 └ <class 'yolox.data.data_prefetcher.DataPrefetcher'>
    └ <yolox.core.trainer.Trainer object at 0xffff22f4efa0>

  File "/home/daniel/Work/ByteTrack/yolox/data/data_prefetcher.py", line 26, in __init__
    self.preload()
    │    └ <function DataPrefetcher.preload at 0xfffed70d1e50>
    └ <yolox.data.data_prefetcher.DataPrefetcher object at 0xfffea80917f0>

  File "/home/daniel/Work/ByteTrack/yolox/data/data_prefetcher.py", line 30, in preload
    self.next_input, self.next_target, _, _ = next(self.loader)
    │                │                             │    └ <torch.utils.data.dataloader._MultiProcessingDataLoaderIter object at 0xfffea8091d30>
    │                │                             └ <yolox.data.data_prefetcher.DataPrefetcher object at 0xfffea80917f0>
    │                └ <yolox.data.data_prefetcher.DataPrefetcher object at 0xfffea80917f0>
    └ <yolox.data.data_prefetcher.DataPrefetcher object at 0xfffea80917f0>

  File "/home/daniel/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 633, in __next__
    data = self._next_data()
           │    └ <function _MultiProcessingDataLoaderIter._next_data at 0xffff240741f0>
           └ <torch.utils.data.dataloader._MultiProcessingDataLoaderIter object at 0xfffea8091d30>
  File "/home/daniel/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1331, in _next_data
    idx, data = self._get_data()
                │    └ <function _MultiProcessingDataLoaderIter._get_data at 0xffff24074160>
                └ <torch.utils.data.dataloader._MultiProcessingDataLoaderIter object at 0xfffea8091d30>
  File "/home/daniel/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1287, in _get_data
    success, data = self._try_get_data()
    │               │    └ <function _MultiProcessingDataLoaderIter._try_get_data at 0xffff240740d0>
    │               └ <torch.utils.data.dataloader._MultiProcessingDataLoaderIter object at 0xfffea8091d30>
    └ False
  File "/home/daniel/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1148, in _try_get_data
    raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str)) from e
                                                                                  └ '79134'

RuntimeError: DataLoader worker (pid(s) 79134) exited unexpectedly
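
A note on reading the trace: "killed by signal: Killed" means the DataLoader worker received SIGKILL from outside the Python process. On Linux this is often the kernel's out-of-memory killer, though the log above does not confirm that; checking `dmesg` for out-of-memory messages right after the crash would be one way to tell. If memory is the cause, a smaller `-b` value or a lower `data_num_workers` would be the obvious things to try. The sketch below is a rough way to compare available memory against the number of workers. The helper names and the per-worker memory figure are hypothetical, not part of ByteTrack or YOLOX, and the per-worker figure has to be measured on the board.

```python
import re
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field name: size in kiB}.

    Lines without a 'kB' unit (for example HugePages_Total) are skipped.
    """
    info: Dict[str, int] = {}
    for line in text.splitlines():
        match = re.match(r"^(\w+):\s+(\d+)\s*kB", line)
        if match:
            info[match.group(1)] = int(match.group(2))
    return info


def suggest_num_workers(available_kib: int, per_worker_kib: int, max_workers: int) -> int:
    """Return how many workers fit in the available memory, capped at max_workers.

    per_worker_kib is an assumed, user-supplied estimate of the memory one
    DataLoader worker needs. A result of 0 means loading in the main process.
    """
    if per_worker_kib <= 0:
        raise ValueError("per_worker_kib must be positive")
    return max(0, min(max_workers, available_kib // per_worker_kib))


def main() -> None:
    try:
        with open("/proc/meminfo") as handle:
            info = parse_meminfo(handle.read())
    except OSError:
        print("/proc/meminfo is not readable on this system")
        return
    available = info.get("MemAvailable", 0)
    # 1 GiB per worker is a placeholder assumption, not a measured value.
    workers = suggest_num_workers(available, per_worker_kib=1024 * 1024, max_workers=4)
    print(f"MemAvailable: {available} kiB, suggested workers: {workers}")


if __name__ == "__main__":
    main()
```

The cap of 4 mirrors the `data_num_workers` value printed in the exp table above. The suggestion is only a starting point, since the model and the batch also need room in the same memory.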