The code stops after showing ---------- Networks initialized ------------ #6

Open
CharisWg opened this issue Jul 1, 2024 · 5 comments

Comments

@CharisWg
CharisWg commented Jul 1, 2024

After running ./train_track1.sh in the terminal, the code stops after displaying '----Networks initialized....'. I've included the settings from my train_track1.sh script. It appears that it is not progressing to the train.py stage. I am attempting to train CRNet on my personal dataset. Can you suggest where the steps or settings might be wrong?

#!/bin/bash
echo "Start to train the model...."
dataroot="./data/NTIRE_Val/" # including 'Train' and 'NTIRE_Val' folders
device='0'
name="coca"
build_dir="./ckpt/"$name

if [ ! -d "$build_dir" ]; then
    mkdir -p "$build_dir"
fi

#LOG=./ckpt/$name/date +%Y-%m-%d-%H-%M-%S.txt
LOG="./ckpt/$name/$(date +%Y-%m-%d-%H-%M-%S).txt"
#echo "Using GPU with ID: $device"

python train.py \
    --dataset_name bracketire \
    --model cat \
    --name "$name" \
    --lr_policy step \
    --patch_size 128 \
    --niter 400 \
    --save_imgs True \
    --lr 1e-4 \
    --dataroot "$dataroot" \
    --batch_size 36 \
    --print_freq 500 \
    --calc_metrics True \
    --weight_decay 0.01 \
    --gpu_ids "$device" \
    -j 8 \
    --lr_decay_iters 27 \
    --block Convnext \
    --load_optimizers False \
    | tee "$LOG"

@CalvinYang0
Owner

First, you may need to change ‘dataroot’ to your own data path. Second, check whether there is any issue with the format of your dataset. Another possibility is a CPU or GPU bottleneck: training may be running so slowly that it has not yet completed enough iterations to print any training information. You can check your CPU and GPU usage. If it’s a CPU bottleneck, you could augment your dataset manually in advance instead of doing it during training. If it’s a GPU bottleneck, you could reduce the number of parameters in your model.
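
The suggestion to check CPU and GPU usage can be sketched as a small helper, run in a second terminal while train.py is running. This is only an illustration: the function name `show_usage` is hypothetical, and it assumes an NVIDIA GPU with `nvidia-smi` installed and a system that provides `uptime`.

```bash
#!/bin/bash
# Sketch: print a one-off snapshot of GPU and CPU load.
# show_usage is a hypothetical helper, not part of the CRNet repository.
show_usage() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        # Per-GPU utilization and memory use, printed as CSV
        nvidia-smi --query-gpu=index,utilization.gpu,memory.used,memory.total --format=csv
    else
        echo "nvidia-smi not found; cannot report GPU usage"
    fi
    if command -v uptime >/dev/null 2>&1; then
        # Load averages over the last 1, 5 and 15 minutes
        uptime
    else
        echo "uptime not found; cannot report CPU load"
    fi
}
```

Calling `show_usage` a few times during the apparent hang gives a rough picture. A GPU that stays near 0% utilization while the load average is high may point to data loading rather than the model, though that reading is an assumption and not something the thread confirms.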

@CharisWg
Author

CharisWg commented Jul 2, 2024

Thank you for your reply. I am using a GPU and trying to replicate your code with the dataset from your shared link, which is BracketIRE. The code stops after displaying '----Networks initialized....' and does not proceed to the training step. I am wondering if there might be some steps I did wrong when I tried to replicate your code.

@CalvinYang0
Owner

I noticed that the last level of your ‘dataroot’ directory is ‘NTIRE_Val’. I suggest you try changing it to the parent directory, ‘./data’. Under the ‘dataroot’ directory, there should be two subfolders: ‘Train’ and ‘NTIRE_Val’. Since you didn’t receive any error messages and the process stopped at network initialization, it’s hard to diagnose the potential issue. Perhaps you could try using the BracketIRE code framework and then migrate our CRNet to that framework (although my framework should be consistent with theirs). Lastly, I apologize for any confusion. This is my first open-source code, so there are many imperfections in various aspects.
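
The directory layout described above can be checked before launching training. The following is a minimal sketch, assuming only what the thread states: that ‘dataroot’ must contain the subfolders ‘Train’ and ‘NTIRE_Val’. The function name `check_dataroot` is hypothetical and is not part of the CRNet code.

```bash
#!/bin/bash
# Sketch: verify that a dataroot contains the two subfolders named in the
# thread ('Train' and 'NTIRE_Val'). check_dataroot is a hypothetical helper.
check_dataroot() {
    local root="$1"
    local missing=0
    local sub
    for sub in Train NTIRE_Val; do
        if [ ! -d "$root/$sub" ]; then
            # Report each missing subfolder on stderr
            echo "missing: $root/$sub" >&2
            missing=1
        fi
    done
    # Returns 0 when both subfolders exist, 1 otherwise
    return "$missing"
}
```

For example, `check_dataroot ./data || exit 1` placed before the `python train.py` call would stop the script early when the path points one level too deep, as with `./data/NTIRE_Val/`.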

@CharisWg
Author

CharisWg commented Jul 2, 2024

Thank you for your answer. I fixed the issue. You are doing well and explain things kindly and carefully

@maopengcheng924

Thank you for your answer. I fixed the issue. You are doing well and explain things kindly and carefully

Hello, I have encountered the same problem as you. May I ask how you solved it?
