make_lastz_chains

setup_genome_sequences fails if rerunning from the "cat" step

Open ning-y opened this issue 2 years ago • 3 comments

This is a low importance issue, because the use case is rare and the workaround is easy.

If make_lastz_chains is called with --continue_from_step cat, but the "cat" step has been run before, make_lastz_chains fails with an error such as:

### Trying to continue from step: cat
Making chains for /data/wanglf/home/e0175719/runs/toga-pipeline-1.2/inter/200-genomes/hg38.2bit and /data/wanglf/home/e0175719/runs/toga-pipeline-1.2/inter/210-repeatmasked/GCA_004027375.1_MacSob_v1_BIUU_genomic.2bit files, saving results to /data/wanglf/home/e0175719/runs/toga-pipeline-1.2/inter/300-chain/hg38/GCA_004027375.1_MacSob_v1_BIUU_genomic/out
Pipeline started at 2024-02-28 12:56:25.148054
 * Setting up genome sequences for target
genomeID: hg38
input sequence file: /data/wanglf/home/e0175719/runs/toga-pipeline-1.2/inter/200-genomes/hg38.2bit
is 2bit: True
planned genome dir location: /data/wanglf/home/e0175719/runs/toga-pipeline-1.2/inter/300-chain/hg38/GCA_004027375.1_MacSob_v1_BIUU_genomic/out/target.2bit
Traceback (most recent call last):
  File "/data/wanglf/home/e0175719/runs/toga-pipeline-1.2/inter/100-install/make_lastz_chains/./make_chains.py", line 261, in <module>
    main()
  File "/data/wanglf/home/e0175719/runs/toga-pipeline-1.2/inter/100-install/make_lastz_chains/./make_chains.py", line 257, in main
    run_pipeline(args)
  File "/data/wanglf/home/e0175719/runs/toga-pipeline-1.2/inter/100-install/make_lastz_chains/./make_chains.py", line 233, in run_pipeline
    setup_genome_sequences(args.target_genome,
  File "/data/wanglf/home/e0175719/runs/toga-pipeline-1.2/inter/100-install/make_lastz_chains/modules/project_setup_procedures.py", line 172, in setup_genome_sequences
    os.symlink(arg_input_2bit, seq_dir)
FileExistsError: [Errno 17] File exists: '/data/wanglf/home/e0175719/runs/toga-pipeline-1.2/inter/200-genomes/hg38.2bit' -> '/data/wanglf/home/e0175719/runs/toga-pipeline-1.2/inter/300-chain/hg38/GCA_004027375.1_MacSob_v1_BIUU_genomic/out/target.2bit'

This happens because the "target.2bit" file was already symlinked during the earlier run, and os.symlink fails if the destination already exists.

The user workaround is to delete "target.2bit" before rerunning, which resolves the error.

The developer fix is to check for and remove the destination if it exists, or to use a symbolic linking function which tolerates already existing destinations. From a brief check, the os.symlink function does not have such an option.
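A minimal sketch of the first option, assuming a small wrapper around os.symlink. The name symlink_tolerant is hypothetical and is not part of make_lastz_chains; it only replaces an existing symlink and refuses to touch a regular file or directory:

```python
import os


def symlink_tolerant(src, dst):
    """Create dst as a symlink to src, tolerating a link left by an earlier run.

    Hypothetical helper for illustration, not part of make_lastz_chains.
    Returns True if a link was created, False if a correct one already existed.
    """
    if os.path.islink(dst):
        # A link is already there: keep it if it resolves to the same file.
        if os.path.realpath(dst) == os.path.realpath(src):
            return False
        # Otherwise it is stale (points elsewhere), so remove only the link.
        os.remove(dst)
    elif os.path.exists(dst):
        # A real file or directory: do not delete anything silently.
        raise FileExistsError(f"{dst} exists and is not a symlink")
    os.symlink(src, dst)
    return True
```

Calling symlink_tolerant(arg_input_2bit, seq_dir) a second time with the same arguments would then be a no-op instead of raising FileExistsError.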

ning-y avatar Feb 28 '24 06:02 ning-y

Thanks for reporting this. The correct behavior should be to NOT overwrite results that already exist (to prevent somebody from accidentally executing the pipeline again). If the cat step was already run (whether successfully or not), then the user should clean it up and then continue with cat.

However, I agree that the script should output a proper warning and tell the user what to do. And symlinking the 2bit files is likely not a stable solution. Bogdan, can you pls have a look at some point?
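One way the "never overwrite, but explain" behavior could look is sketched below. The names LeftoverLinkError and link_or_explain are hypothetical, and the wording of the message is only a suggestion; nothing here reflects the actual make_lastz_chains code:

```python
import os


class LeftoverLinkError(RuntimeError):
    """Raised when a file or link from an earlier run blocks the setup step."""


def link_or_explain(src, dst):
    """Symlink src to dst, or fail with instructions if dst is already there.

    Hypothetical helper for illustration. It never overwrites anything.
    """
    # lexists is also True for broken symlinks, which exists() would miss.
    if os.path.lexists(dst):
        raise LeftoverLinkError(
            f"{dst} already exists, probably from an earlier run. "
            f"Remove it and rerun to continue from this step."
        )
    os.symlink(src, dst)
```

This keeps existing results untouched while replacing the bare FileExistsError traceback with a message that names the offending path and the cleanup step.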

Thx a lot

MichaelHiller avatar Feb 29 '24 21:02 MichaelHiller

This error also appears when running --continue_from_step fill_chains if either the target.2bit or query.2bit symlink is still present in the project folder from the earlier run.

npaulat avatar Apr 28 '25 13:04 npaulat

Thanks for the info. Things behave a little differently on my side, and I'm not sure whether this is intended behavior.

One of my runs failed at the chain_run step, so I removed both the target.2bit and query.2bit files from the project folder and then started a rerun with the flag: --continue_from_step chain_run

The rerun started normally, but it began from the lastz step. Is this intended, or did I mess something up?

NilaBlueshirt avatar May 08 '25 01:05 NilaBlueshirt