Panda preprocessing expression
In PANDA preprocessing there was a problem with indices. Using gene2idx.get(x, 0) always gives you index 0 if x is missing from gene2idx (for example, a gene that is in the expression data but not in the motif, since gene2idx is built on the intersection of expression and motif). Now we use gene_names both to create the indices for self.expression and to access the expression data frame self.expression_data with .loc[].
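To make the failure mode concrete, here is a minimal, self-contained sketch of the pattern described above. It uses plain Python containers and toy gene names that are purely hypothetical; it is not the netZooPy code, only an illustration of how a `.get(x, 0)` fallback can overwrite the entry at index 0 and how restricting the loop to the intersection avoids it.

```python
# Toy data: all names and values are hypothetical, not taken from netZooPy.
expression = {"G1": 1.0, "G2": 2.0, "G3": 3.0, "G4": 4.0}  # gene -> expression value
motif_genes = ["G1", "G3"]                                  # genes present in the motif

# gene_names is the intersection of expression and motif genes, in a fixed order.
gene_names = sorted(set(expression) & set(motif_genes))
gene2idx = {gene: i for i, gene in enumerate(gene_names)}

# Buggy pattern: genes missing from gene2idx silently fall back to index 0,
# so G2 and G4 overwrite the slot that belongs to the first gene (G1).
buggy = [0.0] * len(gene_names)
for gene, value in expression.items():
    buggy[gene2idx.get(gene, 0)] = value

# Fixed pattern: iterate over gene_names only, so genes outside the
# intersection are discarded instead of being mapped to index 0.
fixed = [expression[gene] for gene in gene_names]

print(buggy)  # the first entry no longer holds the value of G1
print(fixed)  # values of G1 and G3, in gene_names order
```

In this toy case only the first slot is corrupted, which matches the observation in this thread that the first gene is the one affected.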
Hi @michelegentili93, thanks! I just rebased the PR onto the devel branch.
Thanks @michelegentili93, that's a great catch. So this affects cases where genes are in expression but not in motif, and it sets them all to the expression of the first gene.
Correct :) Thank you for the Python implementation and maintenance!
On Tue, 25 Oct 2022 at 13:46, Marouen wrote:
> Thanks @michelegentili93, that's a great catch, so this affects cases where genes are in expression but not in motif and sets them all to the expression of the first gene.
@michelegentili93 How did you find out about this bug, did you get an error while running panda?
I was running PUMA with df_correlation_matrix as input, and I noticed the values weren't the same.
Codecov Report
Base: 54.50% // Head: 54.74% // Increases project coverage by +0.23% :tada:
Coverage data is based on head (4a86b8d) compared to base (793d88f). Patch coverage: 100.00% of modified lines in pull request are covered.
Additional details and impacted files
Coverage Diff

| | devel | #275 | +/- |
|---|---|---|---|
| Coverage | 54.50% | 54.74% | +0.23% |
| Files | 37 | 37 | |
| Lines | 2343 | 2351 | +8 |
| Hits | 1277 | 1287 | +10 |
| Misses | 1066 | 1064 | -2 |

| Impacted Files | Coverage Δ | |
|---|---|---|
| netZooPy/panda/panda.py | 76.04% <100.00%> (+1.39%) | :arrow_up: |
@violafanfani this is good to go. When aligning TFs or genes using processing mode 'intersection', if gene expression contains genes outside the overlap, those genes were assigned index 0 instead of being discarded. I fixed it by restricting the indices to those of the intersection. In MATLAB, this is equivalent to https://github.com/netZoo/netZooM/blob/master/netZooM/tools/processData.m#L120
This affects the first gene and first TF of PANDA and PUMA networks when gene expression has genes not present in motif and when they're run with 'intersection'. I've also added new MATLAB ground truth results, and everything passes to 12 decimal digits in relative tolerance.
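For readers unfamiliar with relative-tolerance checks, the comparison against ground truth can be sketched roughly as follows. This is only an illustration using the standard library's `math.isclose`; the helper name `all_close_rel` and the sample values are hypothetical, and the actual netZooPy tests may use a different comparison function.

```python
import math

def all_close_rel(computed, reference, rel_tol=1e-12):
    """Return True if every pair of values agrees within the relative tolerance."""
    return len(computed) == len(reference) and all(
        math.isclose(c, r, rel_tol=rel_tol) for c, r in zip(computed, reference)
    )

# Hypothetical values standing in for network edge weights.
reference = [0.123456789012345, -2.5]
tiny_drift = [0.123456789012345 * (1 + 1e-14), -2.5]  # relative error around 1e-14
large_drift = [0.1234567891, -2.5]                     # relative error around 1e-9

print(all_close_rel(tiny_drift, reference))   # agreement within 1e-12
print(all_close_rel(large_drift, reference))  # disagreement at 1e-12
```

A relative tolerance scales with the magnitude of the values being compared, which is why it suits network weights that can span several orders of magnitude.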
Please make this as a separate release after you release the GPU fix.
Ok, great job! Thanks for helping on this.