CoGAPS does not learn the specified nPatterns when running in distributed mode
I am running CoGAPS on a small single-cell data set: 11623 genes x 900 cells. I have noticed that when I run CoGAPS in distributed mode, it does not produce the number of patterns I specified in nPatterns. Here are the full params stored in the result object:
cogapsresult@metadata$params
-- Standard Parameters --
nPatterns 6
nIterations 500
seed 1234
sparseOptimization TRUE
distributed genome-wide
-- Sparsity Parameters --
alpha 0.01
maxGibbsMass 100
-- Distributed CoGAPS Parameters --
nSets 7
cut 6
minNS 4
maxNS 11
However, as you can see, only 4 patterns were learned:
cogapsresult
[1] "CogapsResult object with 11623 features and 900 samples"
[1] "4 patterns were learned"
Now, if I do not run in distributed mode, it takes longer, but I get the number of patterns I asked for. Here are the parameters for this run:
cogapsresult@metadata$params
-- Standard Parameters --
nPatterns 6
nIterations 500
seed 1234
sparseOptimization TRUE
-- Sparsity Parameters --
alpha 0.01
maxGibbsMass 100
And the object itself:
cogapsresult
[1] "CogapsResult object with 11623 features and 900 samples"
[1] "6 patterns were learned"
I don't know why this is happening. I assumed I was overwriting some parameters when I created the distributed params object, but as you can see, the intended number of patterns is indeed being passed on to the CoGAPS function.
This data set is small, so I can afford to run in standard mode, but that is not scalable without the ability to run distributed and still generate the intended number of patterns. Could you please help me understand what's going on here? I'm hoping there's something simple I'm overlooking. Thanks!
This occurs because there’s a consensus step that seeks common patterns across the random sets used for parallel analysis. The number of patterns can change when one of the sets contains a pattern that isn’t correlated with a pattern from another set, and is therefore added. It can indicate that you need a higher number of dimensions to capture the variation in your data.
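To make the consensus idea concrete, here is a toy sketch in plain Python. It is not the CoGAPS implementation: the greedy correlation grouping, the `min_corr` threshold, the function names, and the data are all assumptions made for illustration. The `min_ns` argument only loosely mirrors the `minNS` setting shown in the params above, and that correspondence is also an assumption. The point is simply that when patterns from separate subsets are matched by correlation, the number of consensus patterns need not equal the number requested per subset.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def consensus_patterns(sets, min_corr=0.9, min_ns=2):
    """Hypothetical consensus step: group patterns across sets by correlation.

    Each pattern joins the first cluster whose first member it correlates
    with (>= min_corr); otherwise it starts a new cluster. Clusters supported
    by fewer than min_ns patterns are dropped, and the rest are averaged.
    """
    clusters = []
    for patterns in sets:
        for pattern in patterns:
            for cluster in clusters:
                if pearson(pattern, cluster[0]) >= min_corr:
                    cluster.append(pattern)
                    break
            else:
                clusters.append([pattern])
    kept = [c for c in clusters if len(c) >= min_ns]
    return [[sum(vals) / len(c) for vals in zip(*c)] for c in kept]

# Three subsets, each asked for 3 patterns over 6 samples (made-up data).
# Two patterns recur in every subset; the third is different in each one.
sets = [
    [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [6.0, 5.0, 4.0, 3.0, 2.0, 1.0], [1, 0, 0, 1, 0, 0]],
    [[1.1, 2.0, 2.9, 4.2, 5.0, 5.9], [5.9, 5.1, 4.0, 3.1, 1.9, 1.0], [0, 1, 0, 0, 1, 0]],
    [[0.9, 2.1, 3.0, 3.9, 5.1, 6.0], [6.1, 4.9, 4.1, 3.0, 2.0, 0.9], [0, 0, 1, 0, 0, 1]],
]

strict = consensus_patterns(sets, min_ns=2)  # drop clusters seen in only one set
loose = consensus_patterns(sets, min_ns=1)   # keep every uncorrelated pattern

print(len(strict))  # 2: fewer than the 3 requested per set
print(len(loose))   # 5: more than the 3 requested per set
```

In this toy, the same inputs yield either fewer or more consensus patterns than requested, depending only on how unmatched patterns are handled.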
In reply to dtatarak, May 2, 2024, at 11:29 AM: https://github.com/FertigLab/CoGAPS/issues/100
Ok that makes sense. So that being the case, it sounds like while using distributed mode, it isn't possible to enforce a hard number of patterns in the final result. Is that correct?
Would you expect this behavior to be different between genome-wide and single-cell modes of distribution?
Thanks very much for the information!
Correct - the number can’t be locked in unless you manually fix the number of patterns in the pattern matching step. This would happen in both genome-wide and single-cell modes, but you may get different results, since one parallelizes across the genes and the other across the cells.
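As a rough picture of that last difference, the sketch below partitions either the gene indices or the cell indices of an 11623 x 900 matrix into 7 random subsets. It is a hypothetical illustration in plain Python, not how CoGAPS actually samples its subsets; `split_into_sets` is a made-up helper, and the sizes and seed are simply taken from the params quoted earlier in the thread.

```python
import random

def split_into_sets(n_items, n_sets, seed):
    """Randomly partition the indices 0..n_items-1 into n_sets disjoint subsets."""
    rng = random.Random(seed)
    indices = list(range(n_items))
    rng.shuffle(indices)
    # Deal the shuffled indices out round-robin so subset sizes differ by at most 1.
    return [sorted(indices[i::n_sets]) for i in range(n_sets)]

n_genes, n_cells, n_sets = 11623, 900, 7

# Genome-wide style: each subset holds some of the genes and all of the cells.
gene_sets = split_into_sets(n_genes, n_sets, seed=1234)

# Single-cell style: each subset holds some of the cells and all of the genes.
cell_sets = split_into_sets(n_cells, n_sets, seed=1234)

print([len(s) for s in gene_sets])  # [1661, 1661, 1661, 1660, 1660, 1660, 1660]
print([len(s) for s in cell_sets])  # [129, 129, 129, 129, 128, 128, 128]
```

Because each subset sees a different slice of the data in the two modes, the patterns learned per subset, and therefore the consensus across subsets, can differ between them.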
In reply to dtatarak, May 2, 2024, at 11:56 AM: https://github.com/FertigLab/CoGAPS/issues/100#issuecomment-2090879886
Closing as answered.