SurvPath

Label Discretize Function Question

Open GYDDHPY opened this issue 7 months ago • 0 comments

Thank you for your contribution.

I have a question about the discretize function. The quantile binning step appears to be applied to the entire dataset, covering both the training and validation splits, so event times from the validation set also help determine the quantile boundaries. Could this lead to data leakage, since information from the validation set influences the label discretization used during training?

https://github.com/mahmoodlab/SurvPath/blob/3f73ddd6705ec67d643020c5bb04fb13f9f382cc/datasets/dataset_survival.py#L238-L261
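To make the concern concrete, here is a minimal sketch of the alternative I have in mind: compute the quantile edges from the training split only, then reuse those fixed edges for the validation split. This is not the repository's code. The helper names `fit_bins` and `apply_bins`, the `n_bins=4` default, and the convention that a censorship value of 0 means the event was observed are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def fit_bins(train_times, train_censorship, n_bins=4):
    """Compute quantile bin edges from uncensored TRAINING samples only.

    Assumption: censorship == 0 marks an observed (uncensored) event.
    """
    train_times = np.asarray(train_times, dtype=float)
    uncensored = train_times[np.asarray(train_censorship) == 0]
    # retbins=True returns the quantile edges alongside the bin codes.
    # Note: qcut raises if the quantile edges are not unique.
    _, edges = pd.qcut(uncensored, q=n_bins, retbins=True, labels=False)
    edges = edges.copy()
    # Open both ends so times outside the training range still get a bin.
    edges[0] = -np.inf
    edges[-1] = np.inf
    return edges


def apply_bins(times, edges):
    """Assign each time to a bin using edges fitted elsewhere."""
    times = np.asarray(times, dtype=float)
    # right=False gives half-open intervals [left, right).
    return pd.cut(times, bins=edges, labels=False, right=False).astype(int)


# Hypothetical toy data: eight uncensored training times.
train_times = [1, 2, 3, 4, 5, 6, 7, 8]
train_censorship = [0] * 8
edges = fit_bins(train_times, train_censorship)

# Validation labels reuse the training edges; no validation time
# contributes to where the boundaries sit.
val_labels = apply_bins([0.5, 3, 100], edges)
print(val_labels.tolist())  # [0, 1, 3]
```

With this split, the validation event times only get mapped into existing bins, so they cannot shift the boundaries that define the training labels.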

GYDDHPY Jun 09 '25 04:06