scenicplus ValueError: Cannot convert float NaN to integer

Hi, I am trying to run the QC step and it raises an error.

Cell In[45], line 14
     11 from pycisTopic.qc import *
     12 path_to_regions = {'D59':os.path.join(work_dir, 'scATAC/consensus_peak_calling/consensus_regions.bed')}
---> 14 metadata_bc, profile_data_dict = compute_qc_stats(
     15     fragments_dict = fragments_dict,
     16     tss_annotation = annot,
     17     stats=['barcode_rank_plot', 'duplicate_rate', 'insert_size_distribution', 'profile_tss', 'frip'],
     18     label_list = None,
     19     path_to_regions = path_to_regions,
     20     n_cpu = 5,
     21     valid_bc = None,
     22     n_frag = 100,
     23     n_bc = None,
     24     tss_flank_window = 1000,
     25     tss_window = 10,
     26     tss_minimum_signal_window = 100,
     27     tss_rolling_window = 10,
     28     remove_duplicates = True,
     29     _temp_dir = os.path.join(tmp_dir + 'ray_spill'))
     31 if not os.path.exists(os.path.join(work_dir, 'scATAC/quality_control')):
     32     os.makedirs(os.path.join(work_dir, 'scATAC/quality_control'))

File ~/.local/lib/python3.8/site-packages/pycisTopic/qc.py:1033, in compute_qc_stats(fragments_dict, tss_annotation, stats, label_list, path_to_regions, n_cpu, partition, valid_bc, n_frag, n_bc, tss_flank_window, tss_window, tss_minimum_signal_window, tss_rolling_window, min_norm, check_for_duplicates, remove_duplicates, use_polars, **kwargs)
...
    543     dtype=dtype,
    544     copy=copy,
    545 )

ValueError: Cannot convert float NaN to integer

Any help regarding this?

Thanks

Jul 10 '23 16:07 Ajeet1699

I am facing the same issue.

Please let me know if anyone was able to solve it.

Jul 18 '23 19:07 Citugulia40

@Ajeet1699 and @Citugulia40

This issue seems the same as this one: https://github.com/aertslab/pycisTopic/issues/81.

Could you run the code that I posted as a comment on that issue and report back?

https://github.com/aertslab/pycisTopic/issues/81#issuecomment-1641916325

Best,

Seppe

Jul 19 '23 12:07 SeppeDeWinter

I have the same error, but only with 'profile_tss'; the other metrics run without problems. @SeppeDeWinter, I ran your code and here is the output:

annot

        Chromosome              Start     Strand  Gene      Transcript_type
90      chrHG1342_HG2282_PATCH  12923     -1      PRAMEF11  protein_coding
92      chrHG1342_HG2282_PATCH  30238     -1      HNRNPCL1  protein_coding
93      chrHG1342_HG2282_PATCH  38599     1       PRAMEF2   protein_coding
94      chrHG1342_HG2282_PATCH  67714     -1      PRAMEF4   protein_coding
95      chrHG1342_HG2282_PATCH  79783     -1      PRAMEF10  protein_coding
...     ...                     ...       ...     ...       ...
249304  chr1                    15617458  1       DDI2      protein_coding
249310  chr1                    15659713  1       RSC1A1    protein_coding
249312  chr1                    15684320  1       PLEKHM2   protein_coding
249313  chr1                    15684390  1       PLEKHM2   protein_coding
249315  chr1                    15684556  1       PLEKHM2   protein_coding

set(annot["Strand"])

{-1, 1}
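The same check can be wrapped in a small helper and run before `compute_qc_stats`. This is only a minimal sketch: it assumes the annotation is a pandas DataFrame with the columns shown above and the -1/1 strand encoding seen in this output. `check_tss_annotation` is a hypothetical name, not part of pycisTopic.

```python
import pandas as pd

# Columns used by the diagnostic code in this thread (assumption: these are the ones that matter).
REQUIRED_COLUMNS = ["Chromosome", "Start", "Strand"]


def check_tss_annotation(annot: pd.DataFrame) -> list:
    """Hypothetical helper: return a list of problems found in a TSS annotation table."""
    problems = []
    missing = [c for c in REQUIRED_COLUMNS if c not in annot.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
        return problems
    if annot["Strand"].isna().any():
        problems.append("Strand contains missing values")
    # Assumes the -1/1 encoding shown in the output above.
    unexpected = set(annot["Strand"].dropna().unique()) - {-1, 1}
    if unexpected:
        problems.append(f"unexpected Strand values: {sorted(unexpected, key=str)}")
    return problems


# Toy rows copied from the annotation printed above.
toy_annot = pd.DataFrame({
    "Chromosome": ["chr1", "chr1"],
    "Start": [15617458, 15659713],
    "Strand": [1, -1],
    "Gene": ["DDI2", "RSC1A1"],
    "Transcript_type": ["protein_coding", "protein_coding"],
})
print(check_tss_annotation(toy_annot))  # an empty list means no problem was found
```

An empty list here only says the annotation itself looks fine, which matches the `{-1, 1}` result above; it does not rule out problems introduced later in the pipeline.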

from pycisTopic.utils import read_fragments_from_file

fragments=read_fragments_from_file(fragments_dict["wt1"])

fragments

        Chromosome              Start     Strand  Gene      Transcript_type
90      chrHG1342_HG2282_PATCH  12923     -1      PRAMEF11  protein_coding
92      chrHG1342_HG2282_PATCH  30238     -1      HNRNPCL1  protein_coding
93      chrHG1342_HG2282_PATCH  38599     1       PRAMEF2   protein_coding
94      chrHG1342_HG2282_PATCH  67714     -1      PRAMEF4   protein_coding
95      chrHG1342_HG2282_PATCH  79783     -1      PRAMEF10  protein_coding
...     ...                     ...       ...     ...       ...
249304  chr1                    15617458  1       DDI2      protein_coding
249310  chr1                    15659713  1       RSC1A1    protein_coding
249312  chr1                    15684320  1       PLEKHM2   protein_coding
249313  chr1                    15684390  1       PLEKHM2   protein_coding
249315  chr1                    15684556  1       PLEKHM2   protein_coding

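import pyranges as pr

annotation = annot
flank_window = 1000

# Copy the selection so the column assignments below do not act on a view of `annotation`.
tss_space_annotation = annotation[["Chromosome", "Start", "Strand"]].copy()
# Build a window of +/- flank_window around each TSS (End first, while Start is still the TSS).
tss_space_annotation["End"] = tss_space_annotation["Start"] + flank_window
tss_space_annotation["Start"] = tss_space_annotation["Start"] - flank_window
tss_space_annotation = tss_space_annotation[["Chromosome", "Start", "End", "Strand"]]
tss_space_annotation = pr.PyRanges(tss_space_annotation)

# Join fragments with the TSS windows and convert the result to a DataFrame.
overlap_with_TSS = fragments.join(tss_space_annotation, nb_cpu=1).df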

overlap_with_TSS

          Chromosome  Start     End       Name                Score  Start_b   End_b     Strand
0         chr1        922601    922941    GAGGTCCAGGCGCTTC-1  1      922923    924923    NaN
1         chr1        922650    922999    GCGGTGTTCGTAGCGC-1  1      922923    924923    NaN
2         chr1        922655    923037    AAATGAGGTGGGTAGT-1  1      922923    924923    NaN
3         chr1        922682    922934    AGACAAATCGTGATAC-1  1      922923    924923    NaN
4         chr1        922728    923031    CTTGAAGAGACGCCAA-1  1      922923    924923    NaN
...       ...         ...       ...       ...                 ...    ...       ...       ...
62558513  chrY        20756019  20756181  AGCCTGGTCACACGTA-1  1      20755108  20757108  NaN
62558514  chrY        20756029  20756092  GCATGATCAATGATGA-1  1      20755108  20757108  NaN
62558515  chrY        20756212  20756381  ACAAGCTCAGGTGGTA-1  1      20755108  20757108  NaN
62558516  chrY        20756323  20756488  TTATGTCAGTCACGCC-1  1      20755108  20757108  NaN
62558517  chrY        20756460  20756493  AATGGCTCATAGTCCA-1  1      20755108  20757108  NaN

set(overlap_with_TSS["Strand"])

{nan}
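An all-NaN `Strand` column after the join is consistent with the reported error, since a NaN cannot be converted to an integer. The sketch below reproduces that symptom on a toy table with pandas only. The data and the helper name `strand_is_lost` are made up for illustration, and no pycisTopic or PyRanges call is involved, so it shows the symptom rather than its cause.

```python
import math

import pandas as pd


def strand_is_lost(joined: pd.DataFrame) -> bool:
    """Hypothetical check: True if the Strand column is absent or entirely NaN."""
    return "Strand" not in joined.columns or bool(joined["Strand"].isna().all())


# Toy stand-in for the joined table above: coordinates survive, Strand is NaN everywhere.
toy_join = pd.DataFrame({
    "Chromosome": ["chr1", "chr1"],
    "Start": [922601, 922650],
    "End": [922941, 922999],
    "Strand": [math.nan, math.nan],
})

print(strand_is_lost(toy_join))

# Converting one of these NaN values to an integer fails with a ValueError.
try:
    int(toy_join["Strand"].iloc[0])
except ValueError as err:
    print(f"ValueError: {err}")
```

Running a check like this on the joined table before any integer conversion would surface the lost strand information earlier than the traceback does.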

Jul 20 '23 07:07 MariaRosariaNucera

This issue might be solved now. Can you check this out? https://github.com/aertslab/pycisTopic/issues/81#issuecomment-1643460879

Best,

Seppe

Jul 20 '23 08:07 SeppeDeWinter

Yes, I followed this and it is fixed for me now. Thank you!

Jul 20 '23 08:07 MariaRosariaNucera