Different result from available results
I am reproducing the MASH pipeline, and I found the result of mashr is different from the available results (produced by Hao),
the code in this step is
library(mashr) dat = readRDS('/mnt/vast/hpc/csg/rf2872/Work/MASH_csg.q/output/Ast_Exc_Inh_Mic_OPC_Oli.rds') set.seed(1) random.subset = sample(1:nrow(dat$random.b), min(6000, nrow(dat$random.b))) random.subset = mash_set_data(dat$random.b[random.subset,], dat$random.s[random.subset,], alpha=1, zero_Bhat_Shat_reset = 1E3) vhatprior = mash_estimate_corr_em(random.subset, readRDS('/mnt/vast/hpc/csg/rf2872/Work/MASH_csg.q/MASH_6_celltypes2/Ast_Exc_Inh_Mic_OPC_Oli.EZ.prior.rds'), max_iter = 6) vhat = vhatprior$V saveRDS(vhat, '/mnt/vast/hpc/csg/rf2872/Work/MASH_csg.q/MASH_6_celltypes/Ast_Exc_Inh_Mic_OPC_Oli.EZ.V_mle.rds')
Here is the result of my job,

and below is the result of Hao's job

The random subset may cause a tiny difference from others. But there are some of the values reversed. Is that normal?
@rfeng2023 The result should be identical if you set seed ... did you use the same container? It could make a diffeerence if major R version changes also changed behavior of seed such that set.seed(1) is not the same between different versions R or related libraries?
To solve the problem: since you are worried of the small differences particularly the sign differences, my suggestion is to rerun but using a much larger random.b sample and see how it works ? for example, take 4 random SNPs per gene so you get many such SNPs.
To solve the problem: since you are worried of the small differences particularly the sign differences, my suggestion is to rerun but using a much larger random.b sample and see how it works ? for example, take 4 random SNPs per gene so you get many such SNPs.
We can hold on to this. Because picking these SNPs should be built int othe updated pipeline with fine-mapping CS in mind. We can revisit at that point. But it is good for you (and important) that you keep documenting these problems. @rfeng2023