ValueError: A gene signature must have at least one gene.
Hi,
Thanks for Scenic+.
I am running Scenic+ and get the error below, both on my own data and on the PBMC data, after running
from scenicplus.wrappers.run_pycistarget import run_pycistarget
run_pycistarget(
    region_sets = region_sets,
    species = 'homo_sapiens',
    save_path = os.path.join(work_dir, 'motifs'),
    ctx_db_path = rankings_db,
    dem_db_path = scores_db,
    path_to_motif_annotations = motif_annotation,
    run_without_promoters = True,
    n_cpu = 1,
    #_temp_dir = os.path.join(tmp_dir, 'ray_spill'),
    annotation_version = 'v10nr_clust',
)
2023-07-24 20:07:12,591 pycisTarget_wrapper INFO pbmc_tutorial/motifs/DEM_topics_top_3_No_promoters folder already exists.
2023-07-24 20:07:12,772 pycisTarget_wrapper INFO Loading cisTarget database for DARs
2023-07-24 20:07:12,773 cisTarget INFO Reading cisTarget database
`ValueError Traceback (most recent call last)
Cell In[36], line 2
1 from scenicplus.wrappers.run_pycistarget import run_pycistarget
----> 2 run_pycistarget(
3 region_sets = region_sets,
4 species = 'homo_sapiens',
5 save_path = os.path.join(work_dir, 'motifs'),
6 ctx_db_path = rankings_db,
7 dem_db_path = scores_db,
8 path_to_motif_annotations = motif_annotation,
9 run_without_promoters = True,
10 n_cpu = 1,
11 #_temp_dir = os.path.join(tmp_dir, 'ray_spill'),
12 annotation_version = 'v10nr_clust',
13 )
File /data2/ccitu/software/scenicplus/src/scenicplus/wrappers/run_pycistarget.py:182, in run_pycistarget(region_sets, species, save_path, custom_annot, save_partial, ctx_db_path, dem_db_path, run_without_promoters, biomart_host, promoter_space, ctx_auc_threshold, ctx_nes_threshold, ctx_rank_threshold, dem_log2fc_thr, dem_motif_hit_thr, dem_max_bg_regions, annotation, motif_similarity_fdr, path_to_motif_annotations, annotation_version, n_cpu, _temp_dir, exclude_motifs, exclude_collection, **kwargs)
180 ## CISTARGET
181 regions = region_sets[key]
--> 182 ctx_db = cisTargetDatabase(ctx_db_path, regions)
183 if exclude_motifs is not None:
184 out = pd.read_csv(exclude_motifs, header=None).iloc[:,0].tolist()
File ~/miniconda3/envs/scenicplus/lib/python3.8/site-packages/pycistarget/motif_enrichment_cistarget.py:67, in cisTargetDatabase.__init__(self, fname, region_sets, name, fraction_overlap)
48 def __init__(self,
49 fname: str,
50 region_sets: Union[Dict[str, pr.PyRanges], pr.PyRanges] = None,
51 name: str = None,
52 fraction_overlap: float = 0.4):
53 """
54 Initialize cisTargetDatabase
55
(...)
65 Minimal overlap between query and regions in the database for the mapping.
66 """
---> 67 self.regions_to_db, self.db_rankings, self.total_regions = self.load_db(fname,
68 region_sets,
69 name,
70 fraction_overlap)
File ~/miniconda3/envs/scenicplus/lib/python3.8/site-packages/pycistarget/motif_enrichment_cistarget.py:131, in cisTargetDatabase.load_db(self, fname, region_sets, name, fraction_overlap)
129 if prefix is not None:
130 target_regions_in_db = [prefix + '__' + x for x in target_regions_in_db]
--> 131 target_regions_in_db = GeneSignature(name=name, gene2weight=target_regions_in_db)
132 db_rankings = db.load(target_regions_in_db)
133 if prefix is not None:
File <attrs generated init ctxcore.genesig.GeneSignature>:7, in __init__(self, name, gene2weight)
5 if _config._run_validators is True:
6 __attr_validator_name(self, __attr_name, self.name)
----> 7 __attr_validator_gene2weight(self, __attr_gene2weight, self.gene2weight)
File ~/miniconda3/envs/scenicplus/lib/python3.8/site-packages/ctxcore/genesig.py:172, in GeneSignature.gene2weight_validator(self, attribute, value)
169 @gene2weight.validator
170 def gene2weight_validator(self, attribute, value) -> None:
171 if len(value) == 0:
--> 172 raise ValueError("A gene signature must have at least one gene.")
ValueError: A gene signature must have at least one gene.`
Could you please help me solve this?
Thanks in advance.
Hi @Citugulia40
Thanks for opening an issue. It looks like the regions in your region sets are not overlapping with the cistarget database you are using.
Which database are you using? Did you create a custom one based on your dataset?
Can you show the output of the following?
region_sets
Best,
Seppe
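When the full `region_sets` printout is long, a per-set region count can be easier to scan. The sketch below assumes `region_sets` is a dictionary of dictionaries of PyRanges objects, as in the output posted in this thread; the helper name `summarize_region_sets` is hypothetical and not part of scenicplus or pycistarget. Plain lists stand in for PyRanges so the example runs on its own, relying on `len()` returning the number of regions for both.

```python
def summarize_region_sets(region_sets):
    """Map each group name to {set name: number of regions}."""
    return {
        group: {name: len(regions) for name, regions in sets.items()}
        for group, sets in region_sets.items()
    }


# Stand-in data: plain lists replace the PyRanges objects so the sketch runs
# without pyranges installed; len() is assumed to behave the same on a PyRanges.
example = {
    "topics_top_3": {"Topic1": ["chr1:100-600", "chr1:900-1400"]},
    "DARs": {},
}

summary = summarize_region_sets(example)
# Groups with no region sets at all are the ones worth a closer look.
empty_groups = [group for group, counts in summary.items() if not counts]

print(summary)
print(empty_groups)
```

A group that appears in `empty_groups`, or a set whose count is 0, has nothing that could be matched against the cisTarget database.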
The region_sets output:
{'topics_otsu': {'Topic1':
+--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int64)   | (int64)   |
|--------------+-----------+-----------|
| chr1         | 35192543  | 35193043  |
| chr1         | 73997502  | 73998002  |
| chr1         | 9690509   | 9691009   |
| chr1         | 108149602 | 108150102 |
| ...          | ...       | ...       |
| chrX         | 51341387  | 51341887  |
| chrX         | 63760345  | 63760845  |
| chrX         | 148602031 | 148602531 |
| chrX         | 129577059 | 129577559 |
| chrY         | 19701689  | 19702189  |
+--------------+-----------+-----------+
Unstranded PyRanges object has 256 rows and 3 columns from 24 chromosomes.
For printing, the PyRanges was sorted on Chromosome.,
'Topic2':
+--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int64)   | (int64)   |
|--------------+-----------+-----------|
| chr1         | 15524141  | 15524641  |
| chr1         | 58783879  | 58784379  |
| chr1         | 161389869 | 161390369 |
| chr1         | 110338851 | 110339351 |
| ...          | ...       | ...       |
| chrX         | 17736981  | 17737481  |
| chrX         | 23743058  | 23743558  |
| chrX         | 46836672  | 46837172  |
| chrX         | 47217776  | 47218276  |
| chrY         | 13479857  | 13480357  |
| chrY         | 19567021  | 19567521  |
| chrY         | 2935735   | 2936235   |
+--------------+-----------+-----------+
Unstranded PyRanges object has 6,465 rows and 3 columns from 24 chromosomes.
For printing, the PyRanges was sorted on Chromosome.},
'topics_top_3': {'Topic1':
+--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int64)   | (int64)   |
|--------------+-----------+-----------|
| chr1         | 35192543  | 35193043  |
| chr1         | 73997502  | 73998002  |
| chr1         | 9690509   | 9691009   |
| chr1         | 108149602 | 108150102 |
| ...          | ...       | ...       |
| chrY         | 14839383  | 14839883  |
| chrY         | 14518523  | 14519023  |
| chrY         | 6601166   | 6601666   |
| chrY         | 20575611  | 20576111  |
+--------------+-----------+-----------+
Unstranded PyRanges object has 2,677 rows and 3 columns from 24 chromosomes.
For printing, the PyRanges was sorted on Chromosome.,
'Topic2':
+--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int64)   | (int64)   |
|--------------+-----------+-----------|
| chr1         | 15524141  | 15524641  |
| chr1         | 58783879  | 58784379  |
| chr1         | 161389869 | 161390369 |
| chr1         | 110338851 | 110339351 |
| ...          | ...       | ...       |
| chrX         | 16786133  | 16786633  |
| chrX         | 10014954  | 10015454  |
| chrX         | 18425128  | 18425628  |
| chrX         | 149505021 | 149505521 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 2,829 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome.},
'DARs': {}}
I am using the databases from
https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/screen/mc_v10_clust/region_based/
and the .tbl motif annotation file from
https://resources.aertslab.org/cistarget/motif2tf/
Thank you so much for your help.
Hi @Citugulia40
The error occurs because your region sets contain no regions for "DARs". That entry is an empty dictionary, so cisTarget has no regions to look up in the database:
'DARs': {}
Best,
Seppe
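One possible workaround, sketched here under assumptions, is to drop empty groups such as `'DARs': {}` from `region_sets` before passing it to `run_pycistarget`. The helper name `drop_empty_region_sets` is hypothetical, and this is not a fix confirmed in this thread; whether it is acceptable to skip the DARs depends on the analysis. Plain lists stand in for PyRanges objects so the example is self-contained.

```python
def drop_empty_region_sets(region_sets):
    """Return a copy of region_sets without empty sets or empty groups."""
    cleaned = {}
    for group, sets in region_sets.items():
        # Keep only sets that contain at least one region.
        non_empty = {name: regions for name, regions in sets.items() if len(regions) > 0}
        # Skip the whole group if nothing is left, e.g. 'DARs': {}.
        if non_empty:
            cleaned[group] = non_empty
    return cleaned


# Stand-in data: lists replace PyRanges objects (assumption for illustration).
example = {
    "topics_otsu": {"Topic1": ["chr1:100-600"], "Topic2": []},
    "DARs": {},
}

cleaned = drop_empty_region_sets(example)
print(cleaned)
```

The cleaned dictionary can then be passed as `region_sets` in place of the original. The input dictionary is left unmodified, so the empty groups remain available for inspection.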
Hi Seppe, I have the same issue with the dataset I am currently analyzing. I can't detect any DARs. I tweaked some of the QC filters to get some signal, but still nothing. Does that simply mean that the landscape does not differ? Do you have any thoughts on why this can happen?
Hi @skoturan
It's difficult to answer this question in general without more information. I might be able to help if you provide some more context (example outputs, etc.).
All the best,
Seppe