There is no output, not even a warning, when I compute gene-gene distances with the function cal_ot_mat_from_numpy.
Hi!
I installed the gene_trajectory module with pip in a conda env. I can compute the gene-gene distances with the Seurat data from the GeneTrajectory tutorial, and the progress bar is shown on screen. But when I run it on my own Seurat data (36077 features across 482 samples), nothing appears on screen. The number of genes used to compute gene-gene distances is 481 and the number of meta-cells is 50. I ran "gene.dist.mat <- cal_ot_mat_from_numpy(ot_cost = cg_output[["graph.dist"]], gene_expr = cg_output[["gene.expression"]], num_iter_max = 50000, show_progress_bar = TRUE)" in R for at least 8 hours with no output, not even a progress bar. Is there something I missed?
Hope to receive a reply~
Hi @Fufu-Hu,
I am not sure about what it could be.
- Can you check if anything is still running (e.g. using top or the Task Manager)?
- Can you let me know the size of the objects (e.g. dim(cg_output[["graph.dist"]]), dim(cg_output[["gene.expression"]]))? I don't think it should be that slow if the size is 481x50, but it may be if you are using the full matrix.
- Do you get any error or notifications when you start the cal_ot_mat_from_numpy function?
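For anyone checking the same thing from Python, here is a minimal sketch of the size check. describe_ot_inputs is a hypothetical helper, not part of gene_trajectory, and it assumes the expression matrix is laid out as meta-cells x genes with a square meta-cell x meta-cell cost matrix (the layout suggested by the shapes reported in this thread). The zero-filled arrays only stand in for real data.

```python
import numpy as np


def describe_ot_inputs(ot_cost, gene_expr):
    """Report the shapes of the two inputs and whether they are consistent.

    Hypothetical helper for illustration only. Assumes gene_expr is
    (meta-cells x genes) and ot_cost is (meta-cells x meta-cells).
    """
    ot_cost = np.asarray(ot_cost)
    gene_expr = np.asarray(gene_expr)
    n_cells = gene_expr.shape[0]
    return {
        "ot_cost_shape": ot_cost.shape,
        "gene_expr_shape": gene_expr.shape,
        # The cost matrix should be square (one row/column per meta-cell).
        "cost_is_square": ot_cost.ndim == 2 and ot_cost.shape[0] == ot_cost.shape[1],
        # Both inputs should describe the same number of meta-cells.
        "cells_match": ot_cost.shape[0] == n_cells,
    }


# Placeholder arrays with the sizes mentioned above: 50 meta-cells, 481 genes.
summary = describe_ot_inputs(np.zeros((50, 50)), np.zeros((50, 481)))
print(summary)
```

If cells_match comes back False, or the shapes are much larger than expected, the full (not coarse-grained) matrix is probably being passed in.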
I have encountered a similar problem.
It has been >4000 CPU hours without a progress bar, in either Python or R. Machine info: Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz
The program seems to work on another machine with the same data; the progress bar appeared after ~30 CPU hours. Machine info: AMD Opteron(tm) Processor 6344
Hi @panyuwen,
It's hard to know what is going wrong in one machine when it works on another.
- Can you run the tutorial on the machine where it doesn't work?
- What kind of machine is it (Linux / Mac / Windows)?
- Can you let me know the size of the input objects?
- Yes, I can run the human data tutorial on both machines. It takes about 10-20 CPU minutes from the beginning to the end of the gene.dist.mat step.
- Linux, CentOS 7.
- About 50k cells x 10k genes, with default parameters.
Using a subset of my original data (17k cells x 10k genes) with default parameters, it takes about 2500 CPU hours from the beginning to the end of the gene.dist.mat step. The progress bar appeared only during the final 6 minutes (so only 6 minutes were recorded on the bar).
Machine info: Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz; CentOS 7
@panyuwen
Did you also select the top genes and coarse-grain the cells? The reference steps in the tutorial are
genes = select_top_genes(adata, layer='counts')
gene_expression_updated, graph_dist_updated = coarse_grain_adata(adata, graph_dist=cell_graph_dist, features=genes, dims=10)
If so, what are the dimensions of gene_expression_updated and graph_dist_updated?
Yes, I manually selected genes.
gene_expression_updated: (1000, 11352), graph_dist_updated: (1000, 1000)
11352 genes is a large number, and calculating the earth mover's distance between that many genes is going to be very slow.
Try reducing to ~2000 genes with select_top_genes or a similar approach.
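To give a rough sense of scale, here is a back-of-the-envelope sketch. It assumes one optimal transport problem is solved per unordered gene pair and that each pair costs about the same, which is a simplification; n_gene_pairs is a hypothetical helper, not a gene_trajectory function.

```python
def n_gene_pairs(n_genes: int) -> int:
    """Number of unordered gene pairs, i.e. n * (n - 1) / 2."""
    return n_genes * (n_genes - 1) // 2


pairs_full = n_gene_pairs(11352)    # gene count reported in this thread
pairs_reduced = n_gene_pairs(2000)  # suggested gene count

# Ratio of pairwise problems, a rough proxy for the relative runtime.
ratio = pairs_full / pairs_reduced

print(pairs_full)       # 64428276
print(pairs_reduced)    # 1999000
print(round(ratio, 1))  # 32.2
```

Under those assumptions, going from 11352 to 2000 genes cuts the number of pairwise distance computations by roughly a factor of 32.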