Error in load("data/2020plus_10k.Rdata") : error reading from connection
Hi KarchinLab:
This error occurred just as 2020plus was about to run to completion.
The file 2020plus_10k.Rdata is already located in the `data/` directory: `../../2020plus/2020plus-1.2.3/data/2020plus_10k.Rdata`
```
[Sat Dec 11 05:30:02 2021]
rule predict_test:
    input: data/2020plus_10k.Rdata, 2021.12.10_2020plus_1/features.txt, 2021.12.10_2020plus_1/simulated_summary/simulated_features.txt
    output: 2021.12.10_2020plus_1/pretrained_output/results/r_random_forest_prediction.txt
    jobid: 1
    resources: tmpdir=/tmp

python `which 2020plus.py` --log-level=INFO classify --trained-classifier data/2020plus_10k.Rdata --null-distribution 2021.12.10_2020plus_1/simulated_null_dist.txt --fe>
python `which 2020plus.py` --out-dir 2021.12.10_2020plus_1/pretrained_output --log-level=INFO classify -n 200 --trained-classifier data/2020plus_10k.Rdata -d .7 -o 1.0 >
Version: 1.2.3
Command: /home/data/vip13t22/wes_cancer/biosoft/2020plus/2020plus-1.2.3//2020plus.py --log-level=INFO classify --trained-classifier data/2020plus_10k.Rdata --null-distribution >
Running Random forest . . .
Type: <class 'rpy2.rinterface.RRuntimeError'>
Exception: Error in load("data/2020plus_10k.Rdata") : error reading from connection
Traceback:
File "/home/data/vip13t22/wes_cancer/biosoft/2020plus/2020plus-1.2.3//2020plus.py", line 275, in
AN ERROR HAS OCCURRED: check the log file
[Sat Dec 11 05:30:05 2021]
Error in rule predict_test:
    jobid: 1
    output: 2021.12.10_2020plus_1/pretrained_output/results/r_random_forest_prediction.txt
    shell:
        python `which 2020plus.py` --log-level=INFO classify --trained-classifier data/2020plus_10k.Rdata --null-distribution 2021.12.10_2020plus_1/simulated_null_dist.txt --fe>
        python `which 2020plus.py` --out-dir 2021.12.10_2020plus_1/pretrained_output --log-level=INFO classify -n 200 --trained-classifier data/2020plus_10k.Rdata -d .7 -o 1.0 >
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /home/data/vip13t22/wes_cancer/biosoft/2020plus/2020plus-1.2.3/.snakemake/log/2021-12-10T142639.085265.snakemake.log
```
This looks like it may be a path problem: R cannot find 2020plus_10k.Rdata where it expects it, and therefore throws an error. I suspect that putting the full path to the data directory in the config file will fix it. Specifically, in `config.yaml`, change the line `data_dir: data/` to `data_dir: /your/full/path/data/`. If you used `..` in the path in the config file, it is possible that Python resolves that path correctly while R does not (as your error message above suggests).
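If you want to double-check which absolute path to put in `config.yaml`, here is a small Python sketch that resolves a relative `data_dir` (including any `..` components) against the directory you launch the pipeline from, and confirms that `2020plus_10k.Rdata` is actually there. The helper names `resolve_data_dir` and `check_trained_classifier` are hypothetical and not part of 2020plus; the sketch also assumes the launch directory is the one containing `config.yaml`.

```python
import os
from pathlib import Path


def resolve_data_dir(data_dir: str, base_dir: str) -> Path:
    """Turn a possibly relative data_dir (e.g. "data/" or "../x/data/")
    into an absolute, canonical path.

    Hypothetical helper: base_dir is assumed to be the directory the
    pipeline is launched from (where config.yaml lives).
    """
    path = Path(data_dir).expanduser()
    if not path.is_absolute():
        path = Path(base_dir) / path
    # resolve() collapses ".." components and symlinks
    return path.resolve()


def check_trained_classifier(data_dir: str, base_dir: str,
                             filename: str = "2020plus_10k.Rdata") -> Path:
    """Return the absolute path to the trained classifier, or raise if it is missing."""
    rdata = resolve_data_dir(data_dir, base_dir) / filename
    if not rdata.is_file():
        raise FileNotFoundError(f"trained classifier not found: {rdata}")
    return rdata


if __name__ == "__main__":
    # Print the absolute path you could use as data_dir in config.yaml
    print(resolve_data_dir("data/", os.getcwd()))
```

Run it from the 2020plus directory; the printed path (with a trailing `/` added) is what would go after `data_dir:`.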