spladder local variable 'genes2' referenced before assignment

spladder version:3.0.4
Python version:3.8.15
Operating System:CentOS

Description

I was trying to optimize a run on a large cohort by using the "--chunked-merge" mechanism. I first generated the individual graphs. I then tried to merge them using "--chunked-merge", which failed with the error listed below. The merge runs OK if I omit the "--chunked-merge" parameters, but the cohort is too large for me to get away with that.

What I Did

spladder build --parallel 1 -o /pathTo/spladder -a /pathTo/genome.gtf -b /pathTo/alignments.txt --merge-strat merge_graphs --no-extract-ase --no-quantify-graph --chunked-merge 0 1 0 2
merging level 0 chunk 0 to 0
Traceback (most recent call last):
  File "/pathTo/bin/spladder", line 8, in <module>
    sys.exit(main())
  File "/pathTo/lib/python3.8/site-packages/spladder/spladder.py", line 229, in main
    options.func(options)
  File "/pathTo/lib/python3.8/site-packages/spladder/spladder_build.py", line 88, in spladder
    run_merge(options.samples, options)
  File "/pathTo/lib/python3.8/site-packages/spladder/merge.py", line 253, in run_merge
    merge_genes_by_splicegraph(options, merge_list=merge_list[chunk_start:chunk_end], fn_out=fn)
  File "/pathTo/lib/python3.8/site-packages/spladder/merge.py", line 203, in merge_genes_by_splicegraph
    genes = genes2.copy()
UnboundLocalError: local variable 'genes2' referenced before assignment

Apr 01 '24 17:04 ls233

Dear @ls233 ,

Thanks for reaching out. How many samples are you working with? I have written a short description of how to use the chunked merge in #136. The issue might stem from the fact that levels are 1-based. For example, if we assume you have 16 samples, you could merge them in two levels with chunksize 4. Thus, you would call SplAdder merge 5 times in total:

... --chunksize 4 --chunked-merge 1 2 0 4 ...
... --chunksize 4 --chunked-merge 1 2 4 8 ...
... --chunksize 4 --chunked-merge 1 2 8 12 ...
... --chunksize 4 --chunked-merge 1 2 12 16 ...
... --chunksize 4 --chunked-merge 2 2 0 4 ...

The second level can only be started once the first level runs are completed. I omitted all other options that are irrelevant for chunking.
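For larger cohorts, the calls can be enumerated programmatically. The following is only a minimal sketch that extrapolates from the 16-sample example above. It assumes that the four values are the current level, the total number of levels, the start index, and the end index, and that at each level the indices run over the outputs of the previous level. The function name `chunked_merge_plan` is hypothetical and not part of SplAdder.

```python
def chunked_merge_plan(n_samples, chunksize):
    """Return (level, max_level, start, end) tuples, one per merge call.

    Assumption: levels are 1-based and each level merges the outputs of the
    previous level in consecutive chunks of `chunksize` items.
    """
    if n_samples < 1 or chunksize < 2:
        raise ValueError("need n_samples >= 1 and chunksize >= 2")

    # Number of items to merge at each level (samples first, then chunk outputs).
    items_per_level = []
    items = n_samples
    while True:
        items_per_level.append(items)
        items = -(-items // chunksize)  # ceiling division
        if items <= 1:
            break

    max_level = len(items_per_level)
    plan = []
    for level, n_items in enumerate(items_per_level, start=1):
        for start in range(0, n_items, chunksize):
            end = min(start + chunksize, n_items)
            plan.append((level, max_level, start, end))
    return plan


# Reproduces the five calls listed above for 16 samples and chunksize 4.
for level, max_level, start, end in chunked_merge_plan(16, 4):
    print(f"... --chunksize 4 --chunked-merge {level} {max_level} {start} {end} ...")
```

All calls of one level need to finish before any call of the next level is started.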

Let me know how it goes.

Cheers,

Andre

Apr 03 '24 17:04 akahles

Thanks Andre, moving from 0-based to 1-based levels does indeed help. On the other thread you seemed to suggest that the levels were 0-based, so thanks for the clarification.

I'm working with about 1800 samples and now, with the merge step in the rear-view mirror, I'm struggling with the last step, which is calling the AS events. After running for a few days, the runs just die off without producing any error message.

May 12 '24 17:05 ls233

Thanks for getting back. If the graphs get very complex, calling events can be a bit of a struggle. We had a few ideas on how to improve this, but have not yet had a chance to try them out. In the meantime, I can suggest a few workarounds:

call event types independently via --event-types (some are easier to call than others and you can run the event types in parallel, rather than sequentially)
skip overly complex genes via --ase-edge-limit (the default will skip any gene with more than 500 edges; you could lower this further)
make sure numba is installed (this will be much faster for event detection)
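
As a rough illustration of the first two workarounds, here is a minimal sketch that builds one command line per event type. The event type names are an assumption on my side and should be checked against `spladder build --help` for your version; the helper `event_commands` is hypothetical, the paths are the placeholders from the original post, and the edge limit of 250 is an arbitrary placeholder, not a recommendation.

```python
import shlex

# Assumed event type names; verify them against your SplAdder version.
EVENT_TYPES = (
    "exon_skip",
    "intron_retention",
    "alt_3prime",
    "alt_5prime",
    "mult_exon_skip",
    "mutex_exons",
)


def event_commands(base_args, event_types=EVENT_TYPES, edge_limit=None):
    """Build one `spladder build` argument list per event type."""
    commands = []
    for event_type in event_types:
        cmd = ["spladder", "build", *base_args, "--event-types", event_type]
        if edge_limit is not None:
            # Genes with more edges than this limit are skipped.
            cmd += ["--ase-edge-limit", str(edge_limit)]
        commands.append(cmd)
    return commands


base = [
    "-o", "/pathTo/spladder",
    "-a", "/pathTo/genome.gtf",
    "-b", "/pathTo/alignments.txt",
    "--merge-strat", "merge_graphs",
]

# Print the commands; each one could be submitted as a separate job.
for cmd in event_commands(base, edge_limit=250):
    print(shlex.join(cmd))
```

The commands are only printed here, so they can be handed to whatever scheduler or workflow manager is already in use.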

May 13 '24 09:05 akahles

thanks Andre

call event types independently via --event-types and run in parallel - we already have this implemented as part of our Snakemake pipeline
skip overly complex genes via --ase-edge-limit - what lower value would you recommend we use?
make sure numba is installed - we already have it installed

May 13 '24 17:05 ls233