SequentialSamplingModels.jl icon indicating copy to clipboard operation
SequentialSamplingModels.jl copied to clipboard

Review for SequentialSamplingModels.jl

Open rsenne opened this issue 6 months ago • 12 comments

Hi all!

I'm one of your reviewers for the The Proceedings of the JuliaCon Conferences. Overall, I think the package is a great addition to the ecosystem, but I have comments on the following items that I would like to hear your rebuttal or see changes before signing off.

Comments About the Code

  • The does not have error handling of invalid parameter combinations. For example, if I specified the following DDM model: DDM(ν = 1.0, α = -0.8, z = -473.89, τ = -1.0) no error is thrown. I think using either ArgCheck.jl
  • Many of the simulate functions could benefit from better memory allocation approaches. One possibility is to use something like sizehint!
  • More of a personal matter, but i think restructuring the src directory to have subdirs like you do on the documentation would be great.
  • Using specific seeds for the Random module is discouraged. 1.) Using randomness can make tests more reliable 2.) If you want to use RNGs, use StableRNGs
  • Personal choice, but I would suggest making the testing folder its own julia "project" with its own Project.toml file. I think this will make things easier to debug and you can remove all of the repeated library imports.
  • Instead of overloading Random.rand, you could define a custom sampler for your models which could simplify the API for some of the models. Something like this perhaps (I wrote this quick so might need to iron out some details...):
# Abstract type for all samplers
abstract type AbstractSSMSampler end

# Generic sampler struct that holds model + auxiliary data
struct SSMSampler{M<:Union{SSM1D,SSM2D}, D} <: AbstractSSMSampler
    model::M
    data::D
end

# Single rand method that dispatches on sampler type
function rand(rng::AbstractRNG, s::SSMSampler)
    _rand(rng, s.model, s.data)
end
  • Some of the ways "invalid" RTs are handled is inconsistent. For example, in the DDM:
    if τ ≥ t
        return T(NaN)
    end

but in LBA:

rt < τ ? (return 1e-10) : nothing

I think choosing a small positive constant makes more sense, and I would make this consistent across the package. Unless there is a reason I am missing?

Documentation

Overall very few comments.

  • I would make sure you have a stable documentation not just a dev documentation.
  • I feel the contribution guidelines are a little light. I think it would also be great if they were more front and center? Maybe on the home page?
  • I would turn your examples in the docstrings to doctests.
  • Would you consider adding codecov? It would be nice to know how much code is tested.

Comments on the paper

  • There are some plot rendering issues on the plots. For example, there are overlapping labels in figure 4.
  • Mentioning again here, you link to the dev documentation. I would change this to stable.

rsenne avatar Jul 24 '25 23:07 rsenne

Accidentally hit enter before finishing gonna continue editing :)

rsenne avatar Jul 24 '25 23:07 rsenne

Thanks @rsenne ! Can you post your final review here: https://github.com/JuliaCon/proceedings-review/issues/186#issuecomment-2997967561 That way we have both reviews in the same place and the editor can more easily keep track of them.

kiante-fernandez avatar Jul 25 '25 04:07 kiante-fernandez

@rsenne, thank you for your feedback. I think we can make most of these changes without too much effort and it would improve the code base.

@kiante-fernandez here is a table of the issues. I assigned myself to a few of the issues. Please coordinate with me on any issues you would like to fix.

issue Complete Assignee Comments
The does not have error handling of invalid parameter combinations. For example, if I specified the following DDM model: DDM(ν = 1.0, α = -0.8, z = -473.89, τ = -1.0) no error is thrown. I think using either ArgCheck.jl itsdfish Argument checking is implemented via ArgCheck.jl.
Many of the simulate functions could benefit from better memory allocation approaches. One possibility is to use something like sizehint! itsdfish One challenge is that the size is a random variable. We might be able to define a heuristic based on a multiple of expected number of steps. It would be wise to benchmark efficiency gain first to assess cost benefit ratio. Update: sizehint! seems to make performance slower. Marked as complete unless others identify other performance improvements.
More of a personal matter, but i think restructuring the src directory to have subdirs like you do on the documentation would be great. itsdfish Now that the code base has increased, this is a good suggestion. I also did the same thing in the test folder.
Using specific seeds for the Random module is discouraged. 1.) Using randomness can make tests more reliable 2.) If you want to use RNGs, use StableRNGs itsdfish Most tests with with a seed test consistency between simulated distribution and pdf. If a seed is not used, there is a high chance of at least one spurious failure because test value is greater than tolerance by chance. I will remove seeds where not necessary, and use StableRNGs in necessary cases to enforce consistency across versions of Julia
Personal choice, but I would suggest making the testing folder its own julia "project" with its own Project.toml file. I think this will make things easier to debug and you can remove all of the repeated library imports. itsdfish Added a test specific environment, but SafeTestsets requires repeated imports because each test is encapsulated in its own module.
Instead of overloading Random.rand, you could define a custom sampler for your models which could simplify the API for some of the models. itsdfish Didn't implement this change because it only provided a small improvement, but it might be worth reconsidering if the code base grows.
I would make sure you have a stable documentation not just a dev documentation itsdfish Permissions in documentation.yml should now allow stable documentation to be generated.
I feel the contribution guidelines are a little light. I think it would also be great if they were more front and center? Maybe on the home page? itsdfish The front page is primarily for users to navigate the package. My concern is that putting contribution details on the front page would hinder easy navigation of the documentation with information that is not relevant for most people. On the main page, I added a section for "Contributing" which points to the development guidelines. In the guidelines, I added more information about the API and type system. I also added a page for filing issues and asking for help. In terms of adding more detail, I am open to specific suggestions or good examples to which we can aspire.
I would turn your examples in the docstrings to doctests. itsdfish I like the suggestion to use doctests to ensure the example run. However, it requires comparing the output of the last line to an expected output, which doesn't make sense in this particular case. Someday this requirement might be relaxed, but its been an issue for about 8 years.
Would you consider adding codecov? It would be nice to know how much code is tested. itsdfish codecov now reported in the README
Some of the ways "invalid" RTs are handled is inconsistent itsdfish After considering different options, it seems like @argcheck is the best option because it has an informative error message, but does not have a noticeable performance penalty. I will commit a fix this week.
There are some inconsistencies in how Julia packages are referenced. For example, in Section 6 you write "Turing" but in Code 3 you write "Turing.jl". I recommend including the ".jl" suffix in all references to Julia packages. itsdfish References to packages now include the .jl extension
There are some grammatical issues. For instance: "By doing so, SSMs provide a generative model for response time (RT) distributions of actions made by organisms capturing both the speed of the response and the response itself." should be “By doing so, SSMs provide a generative model for response time (RT) distributions of actions made by organisms capturing both the speed of the response and the response itself” Similarly, “SSMs popularity” should be “SSMs’ popularity” itsdfish Simplified to "By doing so, SSMs provide a generative model of joint choice-response time (RT) distributions. " and fixed other grammatical error
In figure 4, the X-axis tick labels are squished together and hard to read. Increasing the spacing or rotating the labels would improve readability. itsdfish I reduced the number of ticks to improve readability of the plot
I like that you give many examples of the code's functionality in the paper. I know these appear in the documentation, but it would be helpful to state what the parameters represent in the examples given in the paper to improve readability. itsdfish For examples 1 and 2, I defined the parameters in a list. For example 3, I noted that the LBA and RDM in example 2 have the same parameters, but differ in the source of stochasticity in the drift rate
The MCMC acronym is used without definition itsdfish In example 3, I defined the acronym MCMC and noted that we used the No U-Turn sampler in the example.
You write "stochastically (i.e., randomly)". It might be clearer to include this clarification the first time you use "stochastic". itsdfish In the abstract, I clarified that stochastic refers to a random process
To be consistent with your example, perhaps the figure one labels can be changed from "Choose left" and "Choose right" to "Choose Margherita" and "Choose pineapple". itsdfish Switched to Margherita and Pinapple.
Please ensure that the citation formatting is consistent and that all references include DOIs where available. itsdfish I enabled URLs and DOIs by changing \bibliographystyle{authordate1} to \bibliographystyle{plainnat}. I added missing DOIs.
In the affiliations I see that Christopher Fisher has been listed as an Independent Researcher. If you don't mind, can you clarify this choice? itsdfish The company I worked for did not clarify whether they are OK with me using their name on an unrelated project, so I decided to indicate independent researcher
I see that you have a clear section for those who wish to contribute to the software. It could also be helpful to include instructions for those wishing to report bugs or ask for help (e.g. directing them to open an issue or providing contact information). Otherwise, the docs look great to me. itsdfish I added a section for reporting issues here. We also have a developer guide section
I see that there are many code tests written to ensure everything is working as intended. It would be helpful to integrate something like CodeCov (https://about.codecov.io/) to show what percentage of the code is covered by tests and where there may be blind spots. You cite other software packages outside of Julia for working with SSMs. Do you have a sense about how SequentialSamplingModels.jl stacks up against them in terms of performance? I don't think this is necessary for acceptance, but I'd be curious to see any benchmarks or comparisons if you've explored that. itsdfish I added a codecov badge to the readme. Coverage is 89%. We don't have performance benchmarks, but that would be a good feature to add in the future

itsdfish avatar Jul 25 '25 07:07 itsdfish

Another question I realize i forgot to ask: how do you feel about extending some of these models to have distributions over parameters? I feel some of these models are only used when that is the case (the most prominent being the Extended DDM ala the Ratcliff homogeneous noise process)?

Also, it looks like you had a few issues with my suggestions. I'll do my best to provide help on these points:

I enabled codecov, but it recently provided an error about the coverage report not being uploaded. I'm not sure how to fix it.

Did you set up a CODECOV_TOKEN? See my CI here.

As far as I can tell, stable versions should generate automatically. I'm not sure how to fix it.

It doesn't look like you have the right permissions set up in your CI.yml file. I recommend using the default one supplied in the link.

rsenne avatar Jul 31 '25 20:07 rsenne

Thank you. I did not realize the token was missing. The codecov docs pointed me in the wrong direction. Right now it reports 88% coverage. I'll gradually increase the coverage.

I'll see if I can fix the permissions for stable documentation>

Another question I realize i forgot to ask: how do you feel about extending some of these models to have distributions over parameters? I feel some of these models are only used when that is the case (the most prominent being the Extended DDM ala the Ratcliff homogeneous noise process)?

Good question. Kiante started working on adding inter-trial variability in drift rate and non-decision time to the DDM, but run into some issues. I haven't had time to debug, but it will be a good addition to the package.

itsdfish avatar Jul 31 '25 23:07 itsdfish

Just wanted to check in on this. Hope the reviews are going well :)

rsenne avatar Oct 16 '25 17:10 rsenne

@rsenne thanks for the follow up. I think I was able to address most of the comments. Is there any action I need to take in order to proceed?

itsdfish avatar Oct 19 '25 10:10 itsdfish

Hi @itsdfish . Could you provide a quick write-up of any changes you couldn't make and just give me a quick 1-2 sentence blurb on why they weren't accomplished (Feel free to just copy past anything you've already written in the table above and then whatever else)? Overall, I think you've done a phenomenal job addressing my comments so I think is will be pretty quick. I just to keep mental tabs on what has been done and not.

I also want to check back on my other suggestion of making this work with distributions over the parameters? This seems pretty central to many aspects of these models (at least in my experience with variants of the DDM), maybe just an example in Turing how one could accomplish this (does this exist already)? Or at the least a discussion/open issue for extending the current model classes to support this. I myself, before I knew of the existence of this package, have written my own WFPT code and used FastGaussQuadrature.jl to integrate out the nuisance variables and get good model fits. I myself could even do this if you were interested.

After all this, I will let the editor know my review is complete. Then you will just have to address all the other comments from @ZachLoschin

rsenne avatar Oct 20 '25 12:10 rsenne

@rsenne, the main item I did not address was this:

Instead of overloading Random.rand, you could define a custom sampler for your models which could simplify the API for some of the models.

If I understood correctly, this change would only provide a small benefit and would not change the user experience.

Regarding your second question, the LBA, RDM, and Wald models have distributions over some parameters which are marginalized out analytically. Kiante has an PR in which he implementing the full DDM with a uniform distribution over non-decision time and a normal distribution for the drift rate across trials. Kiante made good progress, but ran into some issues. I haven't had time to inspect the code in detail. Your idea with FastGaussQuadrature.jl sounds interesting. At some point, we tried QuadGK.jl, but it didn't work well with automatic differentiation (AD). The problem is that AD propagated through the integration routine and slowed things down to a crawl. Perhaps there is a custom rule that can be written to prevent AD from propagating through the numeric integrator, but AD is a bit outside of my expertise. We would definitely appreciate a PR to the full DDM working. Before getting into the details, I recommend creating a simple model to test whether FastGaussQuadrature.jl plays well with ForwardDiff and Mooncake ADs.

This weekend, I will have time to address comments from ZachLoschin. I will incorporate them into the table above and ping you when it is complete. Thanks again!

itsdfish avatar Oct 20 '25 18:10 itsdfish

@kiante-fernandez, I was wondering whether you can address the final issue below? Thanks!

To be consistent with your example, perhaps the figure one labels can be changed from "Choose left" and "Choose right" to "Choose Margherita" and "Choose pineapple".

Update I was able to change the labels without a noticeable decrease in image quality. I was concerned that not having the original files would have resulted in loss in quality.

itsdfish avatar Oct 24 '25 12:10 itsdfish

@rsenne, I was able to address the remaining issues including the labeling of Figure 1. Please let me know if I further action is needed to proceed.

itsdfish avatar Oct 26 '25 09:10 itsdfish

I would revive the review thread and let the editor know that you are done with the reviews. You'll need a sign off from myself and @zachloschin . Should be fairly quick after that. I will probably try getting a working eDDM this week and see if at the least I can get a working example--though I won't let this stand in the way of publishing.

rsenne avatar Oct 27 '25 13:10 rsenne

https://zenodo.org/records/17932370?token=eyJhbGciOiJIUzUxMiJ9.eyJpZCI6IjkzZjFlODJkLWZkMzItNDJhYi04NzlkLTkzODIzNzg0MjZjYyIsImRhdGEiOnt9LCJyYW5kb20iOiJlYTAyNzEyM2M3ZDRlYWIzOTc1MWQ4OGQ3NzJkMmJjYyJ9.nZjcH-dLv-u34WHWhIGBJNZSXNgZg5D7sRkkSWBHY1Zmh3qq2DHaa7_-UPYOGlfidbCPtFfBxIqnwWeSllHvcA

itsdfish avatar Dec 14 '25 23:12 itsdfish