
ValueError: A gene signature must have at least one gene.

Open Citugulia40 opened this issue 2 years ago • 5 comments

Hi,

Thanks for Scenic+.

I am running Scenic+ and getting an error on my own data as well as on the PBMC data after running:

from scenicplus.wrappers.run_pycistarget import run_pycistarget
run_pycistarget(
    region_sets = region_sets,
    species = 'homo_sapiens',
    save_path = os.path.join(work_dir, 'motifs'),
    ctx_db_path = rankings_db,
    dem_db_path = scores_db,
    path_to_motif_annotations = motif_annotation,
    run_without_promoters = True,
    n_cpu = 1,
    #_temp_dir = os.path.join(tmp_dir, 'ray_spill'),
    annotation_version = 'v10nr_clust',
    )

2023-07-24 20:07:12,591 pycisTarget_wrapper INFO pbmc_tutorial/motifs/DEM_topics_top_3_No_promoters folder already exists.
2023-07-24 20:07:12,772 pycisTarget_wrapper INFO Loading cisTarget database for DARs
2023-07-24 20:07:12,773 cisTarget INFO Reading cisTarget database

`ValueError                                Traceback (most recent call last)
Cell In[36], line 2
      1 from scenicplus.wrappers.run_pycistarget import run_pycistarget
----> 2 run_pycistarget(
      3     region_sets = region_sets,
      4     species = 'homo_sapiens',
      5     save_path = os.path.join(work_dir, 'motifs'),
      6     ctx_db_path = rankings_db,
      7     dem_db_path = scores_db,
      8     path_to_motif_annotations = motif_annotation,
      9     run_without_promoters = True,
     10     n_cpu = 1,
     11     #_temp_dir = os.path.join(tmp_dir, 'ray_spill'),
     12     annotation_version = 'v10nr_clust',
     13     )

File /data2/ccitu/software/scenicplus/src/scenicplus/wrappers/run_pycistarget.py:182, in run_pycistarget(region_sets, species, save_path, custom_annot, save_partial, ctx_db_path, dem_db_path, run_without_promoters, biomart_host, promoter_space, ctx_auc_threshold, ctx_nes_threshold, ctx_rank_threshold, dem_log2fc_thr, dem_motif_hit_thr, dem_max_bg_regions, annotation, motif_similarity_fdr, path_to_motif_annotations, annotation_version, n_cpu, _temp_dir, exclude_motifs, exclude_collection, **kwargs)
    180 ## CISTARGET
    181 regions = region_sets[key]
--> 182 ctx_db = cisTargetDatabase(ctx_db_path, regions)  
    183 if exclude_motifs is not None:
    184     out = pd.read_csv(exclude_motifs, header=None).iloc[:,0].tolist()

File ~/miniconda3/envs/scenicplus/lib/python3.8/site-packages/pycistarget/motif_enrichment_cistarget.py:67, in cisTargetDatabase.__init__(self, fname, region_sets, name, fraction_overlap)
     48 def __init__(self, 
     49             fname: str,
     50             region_sets: Union[Dict[str, pr.PyRanges], pr.PyRanges] = None,
     51             name: str = None,
     52             fraction_overlap: float = 0.4):
     53     """
     54     Initialize cisTargetDatabase
     55     
   (...)
     65         Minimal overlap between query and regions in the database for the mapping.     
     66     """
---> 67     self.regions_to_db, self.db_rankings, self.total_regions = self.load_db(fname,
     68                                                       region_sets,
     69                                                       name,
     70                                                       fraction_overlap)

File ~/miniconda3/envs/scenicplus/lib/python3.8/site-packages/pycistarget/motif_enrichment_cistarget.py:131, in cisTargetDatabase.load_db(self, fname, region_sets, name, fraction_overlap)
    129 if prefix is not None:
    130     target_regions_in_db = [prefix + '__' + x for x in target_regions_in_db]
--> 131 target_regions_in_db = GeneSignature(name=name, gene2weight=target_regions_in_db)
    132 db_rankings = db.load(target_regions_in_db)
    133 if prefix is not None:

File <attrs generated init ctxcore.genesig.GeneSignature>:7, in __init__(self, name, gene2weight)
      5 if _config._run_validators is True:
      6     __attr_validator_name(self, __attr_name, self.name)
----> 7     __attr_validator_gene2weight(self, __attr_gene2weight, self.gene2weight)

File ~/miniconda3/envs/scenicplus/lib/python3.8/site-packages/ctxcore/genesig.py:172, in GeneSignature.gene2weight_validator(self, attribute, value)
    169 @gene2weight.validator
    170 def gene2weight_validator(self, attribute, value) -> None:
    171     if len(value) == 0:
--> 172         raise ValueError("A gene signature must have at least one gene.")

ValueError: A gene signature must have at least one gene.`

Please help me in solving this.

Thanks in advance.

Citugulia40, Jul 25 '23 02:07

Hi @Citugulia40

Thanks for opening an issue. It looks like the regions in your region sets are not overlapping with the cistarget database you are using.

Which database are you using? Did you create a custom one based on your dataset?

Can you show the output of the following?

region_sets

Best,

Seppe

SeppeDeWinter, Jul 27 '23 06:07
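
Editor's note: printing the full `region_sets` object can be hard to scan. A minimal sketch of a more compact check follows, assuming `region_sets` is a dict of dicts whose values support `len()`. The helper name `summarize_region_sets` and the toy data are hypothetical and are not part of scenicplus or pycisTarget.

```python
from typing import Dict, Mapping, Sized


def summarize_region_sets(
    region_sets: Mapping[str, Mapping[str, Sized]]
) -> Dict[str, Dict[str, int]]:
    """Count the regions in every region set, grouped by top-level key."""
    return {
        group: {name: len(regions) for name, regions in sets.items()}
        for group, sets in region_sets.items()
    }


# Toy stand-in for real data: lists of region names instead of genomic ranges.
example = {
    "topics_top_3": {"Topic1": ["chr1:100-600", "chr1:900-1400"]},
    "DARs": {},
}

for group, counts in summarize_region_sets(example).items():
    # An empty inner dict means the group holds no region sets at all.
    print(group, counts if counts else "EMPTY: no region sets")
```

A group that reports no region sets, or a set with zero regions, is a candidate cause for an empty signature.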

The region_sets output:

{'topics_otsu': {'Topic1':
+--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int64)   | (int64)   |
|--------------+-----------+-----------|
| chr1         | 35192543  | 35193043  |
| chr1         | 73997502  | 73998002  |
| chr1         | 9690509   | 9691009   |
| chr1         | 108149602 | 108150102 |
| ...          | ...       | ...       |
| chrX         | 51341387  | 51341887  |
| chrX         | 63760345  | 63760845  |
| chrX         | 148602031 | 148602531 |
| chrX         | 129577059 | 129577559 |
| chrY         | 19701689  | 19702189  |
+--------------+-----------+-----------+
Unstranded PyRanges object has 256 rows and 3 columns from 24 chromosomes.
For printing, the PyRanges was sorted on Chromosome.,
'Topic2':
+--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int64)   | (int64)   |
|--------------+-----------+-----------|
| chr1         | 15524141  | 15524641  |
| chr1         | 58783879  | 58784379  |
| chr1         | 161389869 | 161390369 |
| chr1         | 110338851 | 110339351 |
| ...          | ...       | ...       |
| chrX         | 17736981  | 17737481  |
| chrX         | 23743058  | 23743558  |
| chrX         | 46836672  | 46837172  |
| chrX         | 47217776  | 47218276  |
| chrY         | 13479857  | 13480357  |
| chrY         | 19567021  | 19567521  |
| chrY         | 2935735   | 2936235   |
+--------------+-----------+-----------+
Unstranded PyRanges object has 6,465 rows and 3 columns from 24 chromosomes.
For printing, the PyRanges was sorted on Chromosome.},
'topics_top_3': {'Topic1':
+--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int64)   | (int64)   |
|--------------+-----------+-----------|
| chr1         | 35192543  | 35193043  |
| chr1         | 73997502  | 73998002  |
| chr1         | 9690509   | 9691009   |
| chr1         | 108149602 | 108150102 |
| ...          | ...       | ...       |
| chrY         | 14839383  | 14839883  |
| chrY         | 14518523  | 14519023  |
| chrY         | 6601166   | 6601666   |
| chrY         | 20575611  | 20576111  |
+--------------+-----------+-----------+
Unstranded PyRanges object has 2,677 rows and 3 columns from 24 chromosomes.
For printing, the PyRanges was sorted on Chromosome.,
'Topic2':
+--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int64)   | (int64)   |
|--------------+-----------+-----------|
| chr1         | 15524141  | 15524641  |
| chr1         | 58783879  | 58784379  |
| chr1         | 161389869 | 161390369 |
| chr1         | 110338851 | 110339351 |
| ...          | ...       | ...       |
| chrX         | 16786133  | 16786633  |
| chrX         | 10014954  | 10015454  |
| chrX         | 18425128  | 18425628  |
| chrX         | 149505021 | 149505521 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 2,829 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome.},
'DARs': {}}

I am using the databases from

https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/screen/mc_v10_clust/region_based/

and the .tbl file from

https://resources.aertslab.org/cistarget/motif2tf/

Thank you so much for your help.

Citugulia40, Jul 28 '23 03:07

Hi @Citugulia40

The reason for the error is that you don't have any regions for "DARs", see:

'DARs': {}

Best,

Seppe

SeppeDeWinter, Jul 28 '23 08:07
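
Editor's note: since the empty `'DARs': {}` entry is identified as the cause, one possible workaround is to drop empty groups before calling `run_pycistarget`. The sketch below assumes `region_sets` is a dict of dicts whose values support `len()`. The helper `drop_empty_region_sets` is hypothetical and not part of scenicplus, and it does not address why no DARs were found in the first place.

```python
from typing import Any, Dict, Mapping


def drop_empty_region_sets(
    region_sets: Mapping[str, Mapping[str, Any]]
) -> Dict[str, Dict[str, Any]]:
    """Return a copy without empty region sets and without empty groups."""
    cleaned: Dict[str, Dict[str, Any]] = {}
    for group, sets in region_sets.items():
        # Keep only region sets that contain at least one region.
        non_empty = {name: r for name, r in sets.items() if len(r) > 0}
        # Skip groups such as 'DARs': {} that end up with nothing in them.
        if non_empty:
            cleaned[group] = non_empty
    return cleaned


# Toy stand-in for real data: lists of region names instead of genomic ranges.
example = {
    "topics_otsu": {"Topic1": ["chr1:100-600"], "Topic2": []},
    "DARs": {},
}
cleaned = drop_empty_region_sets(example)
print(sorted(cleaned))  # group names that still contain regions
```

The cleaned dict could then be passed as `region_sets` in the original call. Whether that is enough depends on the rest of the workflow, so treat it as a starting point.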

Hi Seppe, I have the same issue with the dataset I'm currently analyzing. I can't detect any DARs. I tweaked some of the QC filters to get some signal, but still nothing. Does that simply mean that the landscape is not different? Do you have any thoughts on why this can happen?

skoturan, Mar 05 '24 17:03

Hi @skoturan

It's difficult to answer this question in general without more information. I might be able to help if you provide some more context (with example outputs etc.).

All the best,

Seppe

SeppeDeWinter, Mar 07 '24 14:03