reproducibility - sample new levels, testing data

Open bachlaw opened this issue 10 months ago • 0 comments

I find that even when a seed is set, predictions extracted for the testing data are not reproducible if the "sample new levels" option is left at its default of TRUE, at least when the test set contains new group levels and the dataset is large (50k plus). If "sample new levels" is set to FALSE, the predictions are reproducible, as confirmed with the identical() function. In the sample problem, which admittedly involves a very small dataset, it doesn't seem to matter which value is used.
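
For anyone who wants to run the same kind of check, here is a minimal base R sketch of the comparison described above. It does not reproduce the problem itself. `toy_predict` and `check_reproducible` are hypothetical names made up for illustration, not part of the package, and the toy function only loosely imitates what sampling new levels might involve (a random draw for any group not seen before).

```r
# Hypothetical stand-in for a prediction call; NOT the package's API.
# Known groups get a fixed value; unseen groups get a random draw when
# sample_new_levels is TRUE, and a constant 0 otherwise.
toy_predict <- function(groups, known = c(a = 1, b = 2),
                        sample_new_levels = TRUE) {
  vapply(groups, function(g) {
    if (g %in% names(known)) {
      known[[g]]
    } else if (sample_new_levels) {
      rnorm(1)
    } else {
      0
    }
  }, numeric(1), USE.NAMES = FALSE)
}

# Run the same prediction twice under the same seed and compare the
# results exactly with identical().
check_reproducible <- function(predict_fn, ..., seed = 123) {
  set.seed(seed)
  first <- predict_fn(...)
  set.seed(seed)
  second <- predict_fn(...)
  identical(first, second)
}

# "c" and "d" play the role of new group levels in the test set.
test_groups <- c("a", "b", "c", "d")
result_sampled <- check_reproducible(toy_predict, test_groups,
                                     sample_new_levels = TRUE)
result_fixed <- check_reproducible(toy_predict, test_groups,
                                   sample_new_levels = FALSE)
```

In this toy case both checks pass, because the seed is reset immediately before each call. To apply the same check to real predictions, swap `toy_predict` for the actual prediction call.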

It makes more sense to me for the testing data predictions to be identical either way, but that may not be what was intended. In any event, I wanted to alert you to it. Thanks again for the great package.

Mar 20 '25 21:03 bachlaw