EOFError: Ran out of input

Open darissa opened this issue 5 years ago • 10 comments

Hello,

Thank you for your prompt assistance in solving the previous issue.

Now I get an error during training. The traceback is attached: traceback_train.txt. Thanks.

darissa avatar May 29 '20 17:05 darissa

Please check out the commit a1179c8. Then edit the script train.py: find the line for sample in self.train_loader: (should be line 220) and insert the following 5 lines right above it.

            print(f"train_loader length = {len(self.train_loader)}")
            print("print_dbg_info_dataloader() call:")
            print_dbg_info_dataloader(self.train_loader)
            print("print_dbg_info_dataloader() exited")
            e()
            for sample in self.train_loader:
                gt_cls0_label, gt_cls1_label, gt_cls2_label = sample['labels_gt']
                ...

Then launch the script by

python train.py dataset=ShanghaiTech_part_B > train_loader.txt 2>&1

and send me the generated file train_loader.txt.

dmburd avatar May 29 '20 21:05 dmburd

As attached.

train_loader.txt

darissa avatar May 30 '20 03:05 darissa

I wonder if the issue is specific to your Windows environment. Do you have a Linux installation or a Linux machine at hand (maybe a virtual machine)? Could you install the required packages simply by pip3 install <package_name>, download the data, git clone my repo and run my scripts? (Python 3.6 or above is required.) I have successfully run the scripts on a few Linux servers and have not encountered any issues.

dmburd avatar May 30 '20 10:05 dmburd

Yes, that may be why; I don't have a Linux machine. I did install the packages using pip install <package_name> with Python 3.6. Thank you for helping me out.

darissa avatar May 30 '20 14:05 darissa

@darissa Please discard all changes and check out the commit a53b52d. Then open the script train.py, find all occurrences of num_workers=4 and replace them with num_workers=0 (there are 3 occurrences). Launch the script again:

python train.py dataset=ShanghaiTech_part_B
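
Instead of editing the three occurrences by hand each time, the worker count could be chosen once from the platform. The following is only a minimal sketch: pick_num_workers is a hypothetical helper that does not exist in train.py, and the default of 4 simply mirrors the value mentioned above.

```python
import platform


def pick_num_workers(requested=4, system=None):
    """Return a DataLoader worker count that avoids worker subprocesses on Windows.

    Hypothetical helper (not part of this repo). `system` can be passed
    explicitly for testing; by default it is read from platform.system().
    """
    system = platform.system() if system is None else system
    # num_workers=0 means the data is loaded in the main process,
    # so nothing has to be pickled and sent to worker processes.
    return 0 if system == "Windows" else requested


if __name__ == "__main__":
    print(f"num_workers = {pick_num_workers()}")
```

The returned value would then be passed wherever train.py currently hard-codes num_workers=4.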

dmburd avatar May 31 '20 13:05 dmburd

Thanks a lot. I was able to run train.py (following your latest instructions), now using an Ubuntu virtual machine. I'll report back if any bug or error occurs. Thank you.

darissa avatar Jun 01 '20 05:06 darissa

OK. With num_workers=0, the script should work on Windows, too (I hope). (The issue was likely related to this one: https://discuss.pytorch.org/t/dataloader-multiprocessing-error-cant-pickle-odict-keys-objects-when-num-workers-0/43951)
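
For context, on Windows the worker processes are started with the "spawn" method, so the dataset object has to be pickled and sent to each worker; an attribute that cannot be pickled (such as the odict_keys view named in the linked thread) makes this fail. The sketch below uses only the standard library to show how such attributes could be located. ToyDataset and find_unpicklable_attrs are hypothetical names for illustration, not code from this repo, and whether the same attribute type was the culprit here is an assumption.

```python
import pickle
from collections import OrderedDict


def find_unpicklable_attrs(obj):
    """Return the names of instance attributes that pickle cannot serialize."""
    bad = []
    for name, value in vars(obj).items():
        try:
            pickle.dumps(value)
        except Exception:
            bad.append(name)
    return bad


class ToyDataset:
    """Hypothetical stand-in for a dataset object (not the repo's class)."""

    def __init__(self):
        table = OrderedDict(img_1=[1, 2], img_2=[3, 4])
        self.table = table             # an OrderedDict pickles fine
        self.names = table.keys()      # odict_keys view: cannot be pickled
        self.names_list = list(table)  # plain list of keys: pickles fine


if __name__ == "__main__":
    print(find_unpicklable_attrs(ToyDataset()))  # prints ['names']
```

Converting such a view to a list (as names_list does) is one way to keep num_workers > 0 usable with "spawn".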

dmburd avatar Jun 01 '20 07:06 dmburd

I tried on Windows with epochs=3. No error is reported, but the epoch counter stays at 0/3 and then the script stops without displaying an error. There is also none of the expected output, only an event file and a log file.

darissa avatar Jun 03 '20 04:06 darissa

What hardware are you trying to train on (when you tried on Windows)?

dmburd avatar Jun 03 '20 05:06 dmburd

I am using a GeForce RTX 2070 GPU.

darissa avatar Jun 09 '20 02:06 darissa