EOFError: Ran out of input
Hello,
Thank you for your prompt assistance in solving the previous issue.
Now I get an error during training. Attached is the traceback. Thanks. traceback_train.txt
Please check out the commit a1179c8. Then edit the script train.py: find the line for sample in self.train_loader: (it should be line 220) and insert 5 lines right above it.
print(f"train_loader length = {len(self.train_loader)}")
print("print_dbg_info_dataloader() call:")
print_dbg_info_dataloader(self.train_loader)
print("print_dbg_info_dataloader() exited")
e()
for sample in self.train_loader:
    gt_cls0_label, gt_cls1_label, gt_cls2_label = sample['labels_gt']
    ...
Then launch the script by
python train.py dataset=ShanghaiTech_part_B > train_loader.txt 2>&1
and send me the generated file train_loader.txt.
I wonder if the issue is specific to your Windows environment. Do you have a Linux installation or a Linux machine at hand (maybe a virtual machine)? Could you install the required packages simply by pip3 install <package_name>, download the data, git clone my repo and run my scripts? (Python 3.6 or above is required.)
I have successfully run the scripts on a few Linux servers (have not encountered any issues).
Yeah, that's why. I don't have a Linux machine. Yes, I did install the packages using pip install <package_name> with Python 3.6. Thank you for helping me out.
@darissa Please discard all changes and check out the commit a53b52d. Then open the script train.py, find all occurrences of num_workers=4 and replace them with num_workers=0 (there are 3 occurrences). Launch the script again:
python train.py dataset=ShanghaiTech_part_B
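Instead of editing the three occurrences by hand each time, the value could be chosen per platform. The following is only a sketch: pick_num_workers is a hypothetical helper that does not exist in the repo, and it assumes the result would then be passed as num_workers= wherever train.py builds its data loaders.

```python
import sys


def pick_num_workers(requested=4, platform=sys.platform):
    """Return 0 on Windows, otherwise the requested worker count.

    Hypothetical helper (not part of the repo). With num_workers=0 the
    data is loaded in the main process, so no worker processes are started.
    """
    if platform.startswith("win"):
        return 0
    return requested


# The value that would be passed as num_workers=... on this machine.
print(pick_num_workers())
```

On Linux this keeps the original 4 workers, and on Windows it falls back to loading in the main process.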
Thanks a lot. I was able to run train.py (following your latest instructions), now using a virtual Ubuntu machine. I'll report back if any bug or error occurs. Thank you.
OK.
After setting num_workers=0, the script should work on Windows, too (I hope).
(The issue was likely related to this one: https://discuss.pytorch.org/t/dataloader-multiprocessing-error-cant-pickle-odict-keys-objects-when-num-workers-0/43951)
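For background, here is a minimal sketch of the kind of failure described in that thread, using only the standard library. On Windows, worker processes are started with the spawn method, so the objects handed to them have to be pickled first; a dict keys view cannot be pickled, while a plain list can. Whether this is exactly what happens in train.py is an assumption, and the names cfg and is_picklable are made up for the illustration.

```python
import pickle
from collections import OrderedDict


def is_picklable(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, AttributeError, pickle.PicklingError):
        return False


# Hypothetical configuration object, only for illustration.
cfg = OrderedDict(lr=1e-4, batch_size=8)

keys_view = cfg.keys()        # an odict_keys view, not picklable
keys_list = list(cfg.keys())  # a plain list of the same keys, picklable

print(is_picklable(keys_view))
print(is_picklable(keys_list))
```

If a dataset object stores such a view, converting it to a list is one way to make it picklable; with num_workers=0 no worker process is started, so nothing needs to be pickled at all.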
I tried on Windows with epochs=3. No error appeared, but the epoch counter stays at 0/3 and then the script stops without displaying an error. Also, there is no expected output, only an event file and a log file.
What hardware are you trying to train on (when you tried on Windows)?
I am using a GeForce RTX 2070 GPU.