xqtl-protocol icon indicating copy to clipboard operation
xqtl-protocol copied to clipboard

Different result from available results

Open rfeng2023 opened this issue 2 years ago • 2 comments

I am reproducing the MASH pipeline, and I found the result of mashr is different from the available results (produced by Hao), the code in this step is

library(mashr) dat = readRDS('/mnt/vast/hpc/csg/rf2872/Work/MASH_csg.q/output/Ast_Exc_Inh_Mic_OPC_Oli.rds') set.seed(1) random.subset = sample(1:nrow(dat$random.b), min(6000, nrow(dat$random.b))) random.subset = mash_set_data(dat$random.b[random.subset,], dat$random.s[random.subset,], alpha=1, zero_Bhat_Shat_reset = 1E3) vhatprior = mash_estimate_corr_em(random.subset, readRDS('/mnt/vast/hpc/csg/rf2872/Work/MASH_csg.q/MASH_6_celltypes2/Ast_Exc_Inh_Mic_OPC_Oli.EZ.prior.rds'), max_iter = 6) vhat = vhatprior$V saveRDS(vhat, '/mnt/vast/hpc/csg/rf2872/Work/MASH_csg.q/MASH_6_celltypes/Ast_Exc_Inh_Mic_OPC_Oli.EZ.V_mle.rds')

Here is the result of my job, image

and below is the result of Hao's job image

The random subset may cause a tiny difference from others. But there are some of the values reversed. Is that normal?

rfeng2023 avatar Mar 10 '23 19:03 rfeng2023

@rfeng2023 The result should be identical if you set seed ... did you use the same container? It could make a diffeerence if major R version changes also changed behavior of seed such that set.seed(1) is not the same between different versions R or related libraries?

To solve the problem: since you are worried of the small differences particularly the sign differences, my suggestion is to rerun but using a much larger random.b sample and see how it works ? for example, take 4 random SNPs per gene so you get many such SNPs.

gaow avatar Mar 14 '23 21:03 gaow

To solve the problem: since you are worried of the small differences particularly the sign differences, my suggestion is to rerun but using a much larger random.b sample and see how it works ? for example, take 4 random SNPs per gene so you get many such SNPs.

We can hold on to this. Because picking these SNPs should be built int othe updated pipeline with fine-mapping CS in mind. We can revisit at that point. But it is good for you (and important) that you keep documenting these problems. @rfeng2023

gaow avatar Mar 14 '23 21:03 gaow