MIDAS

TypeError: cannot pickle '_io.TextIOWrapper' object

Open Ivan-vechetti opened this issue 5 years ago • 7 comments

Hello, running

run_midas.py genes

It runs fine, but at the end I get: E::idx_fin_and_load Could not retrieve index file for 'midas_output//genes/temp/pangenomes.bam'

And then when I run:

run_midas.py snps

It also runs fine, but at the end I get: TypeError: cannot pickle '_io.TextIOWrapper' object

Can someone help me with that?

Python 3.8.5

Ivan-vechetti, Dec 17 '20

It seems to be caused by:

```python
import bz2
import gzip
import io
import sys

def iopen(inpath, mode='r'):
        """ Open input file for reading regardless of compression [gzip, bzip] or python version """
        ext = inpath.split('.')[-1]
        # Python2
        if sys.version_info[0] == 2:
                if ext == 'gz': return gzip.open(inpath, mode)
                elif ext == 'bz2': return bz2.BZ2File(inpath, mode)
                else: return open(inpath, mode)
        # Python3
        elif sys.version_info[0] == 3:
                if ext == 'gz': return io.TextIOWrapper(gzip.open(inpath, mode))
                elif ext == 'bz2': return bz2.BZ2File(inpath, mode)  # binary stream: yields bytes, not str, on Python 3
                else: return open(inpath, mode)
```

It is called by species_pileup() within pysam_pileup(). My guess is that the file handle is not actually closed in the subprocess, which causes the serialization error.

nick-youngblut, Dec 19 '20

Actually, it seems to be caused by passing the file handle in the args['log'] variable to species_pileup() via utility.parallel(). A file handle can't be serialized.

Changing:

```python
def pysam_pileup(args, species, contigs):
        start = time()
        print("\nCounting alleles")
        args['log'].write("\nCounting alleles\n")

        # run pileups per species in parallel
        argument_list = []
```

to:

```python
def pysam_pileup(args, species, contigs):
        start = time()
        print("\nCounting alleles")
        args['log'].write("\nCounting alleles\n")
        args['log'].close()      # new line

        # run pileups per species in parallel
        argument_list = []
```

That fixes the issue. It appears that species_pileup() doesn't actually write to the log file anyway. I'll submit a PR.

nick-youngblut, Dec 19 '20
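The failure described in the comment above can be reproduced without MIDAS. This is a minimal sketch, assuming utility.parallel() hands its argument list to worker processes through Python's multiprocessing (which pickles every argument); the `args` contents, the log path, and the `can_pickle` helper are hypothetical stand-ins, not MIDAS code.

```python
import os
import pickle
import tempfile

def can_pickle(obj):
    """Return True if obj can be pickled, as multiprocessing must do with worker arguments."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError) as err:
        print(f"{type(err).__name__}: {err}")
        return False

# Hypothetical stand-in for MIDAS's args dict, holding an open log file handle.
log_path = os.path.join(tempfile.mkdtemp(), "log.txt")
args = {"outdir": "midas_output", "log": open(log_path, "w")}

with_log = can_pickle(args)  # fails: the dict contains an _io.TextIOWrapper
without_log = can_pickle({k: v for k, v in args.items() if k != "log"})  # succeeds
args["log"].close()
print(with_log, without_log)
```

On Python 3.8, the first call reports the same `cannot pickle '_io.TextIOWrapper' object` message seen in this issue, because `open(path, "w")` returns an `_io.TextIOWrapper`.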

Thank you so much for your reply. Where should I change that line?

Thanks again,

Ivan


Ivan-vechetti, Dec 19 '20

Check out the PR edits: https://github.com/snayfach/MIDAS/pull/113

nick-youngblut, Dec 19 '20

Hi Nick,

thanks for the input, but adding the line as you suggested made the run exit early with this message: IndentationError: unindent does not match any outer indentation level

Regarding the genes run, is the message below normal?

Computing coverage of pangenomes E::idx_fin_and_load Could not retrieve index file for 'midas_output//genes/temp/pangenomes.bam'


Ivan-vechetti, Dec 19 '20

MIDAS is indented entirely with tabs, but my editor defaults to spaces; that mismatch caused the IndentationError. I've fixed it. I also added a pop() of the 'log' entry from args, since closing the file handle didn't actually fix the serialization error. It should work now; at least, it works for me. There's no CI for the PRs, so the change hasn't been tested across a broader set of environments (e.g., different Ubuntu versions), but it should work.

nick-youngblut, Dec 19 '20
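A small illustration of how a tab/space mismatch produces the IndentationError reported earlier in this thread. The snippet strings are hypothetical (the real MIDAS source is not reproduced, and the 4-space indent is an assumed editor setting); `compile()` is used so nothing is executed. When a body is indented with a tab and a pasted line uses 4 spaces, CPython's tokenizer cannot match the new indentation to any enclosing level and reports "unindent does not match any outer indentation level".

```python
def indentation_error(src):
    """Return the IndentationError message for src, or None if it compiles cleanly."""
    try:
        compile(src, "<snippet>", "exec")
    except IndentationError as err:  # TabError subclasses IndentationError, so it is caught too
        return err.msg
    return None

# Hypothetical snippet: the function body uses a tab (as MIDAS does),
# but the pasted line was indented with 4 spaces by an editor.
mixed = (
    "def pysam_pileup(args):\n"
    "\targs['log'].write('Counting alleles')\n"
    "    args['log'].close()\n"
)
# The same snippet indented consistently with tabs.
tabs_only = mixed.replace("    args", "\targs")

print(indentation_error(mixed))      # an IndentationError message
print(indentation_error(tabs_only))  # None
```

Matching the file's existing indentation character (tabs here) is enough to make the added line compile.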

I tried the PR's approach (using del instead of pop), and it works for me.

Aiswarya-prasad, Oct 08 '22
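A minimal sketch of the workaround from the PR and the comment above. It assumes, as before, that the args dict is pickled when it is sent to worker processes; here `pickle.dumps` stands in for that step so the example runs without spawning processes. The `prepare_for_workers` and `is_picklable` helpers and the dict contents are hypothetical, not MIDAS code. The sketch also shows why closing alone was not enough: a closed file object still can't be pickled, so the entry has to be removed, with either `pop()` or `del`.

```python
import os
import pickle
import tempfile

def prepare_for_workers(args, use_pop=True):
    """Write to and close the log, then remove it so args can be pickled for workers.

    Hypothetical helper: use_pop switches between the PR's pop() and the del variant.
    """
    if use_pop:
        log = args.pop("log")  # pop() removes the key and returns the handle
    else:
        log = args["log"]      # keep a reference before deleting the key
        del args["log"]
    log.write("\nCounting alleles\n")
    log.close()
    return args

def is_picklable(obj):
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False

tmpdir = tempfile.mkdtemp()

# Closing is not enough: the closed file object is still in the dict.
closed_only = {"outdir": "midas_output", "log": open(os.path.join(tmpdir, "a.txt"), "w")}
closed_only["log"].close()
print(is_picklable(closed_only))

# Removing the entry (pop or del) leaves a dict that pickles cleanly.
for use_pop in (True, False):
    args = {"outdir": "midas_output", "log": open(os.path.join(tmpdir, "b.txt"), "w")}
    print(is_picklable(prepare_for_workers(args, use_pop)))
```

`pop()` is convenient because it removes the key and returns the handle in one step; `del` works equally well as long as a reference to the handle is kept if it still needs to be written to or closed.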