SurvPath
Label Discretize Function Question
Thank you for your contribution.
I have a question about the discretize function. The quantile binning step in the discretization appears to be applied to the entire dataset, covering both the training and validation splits, so event times from the validation set also contribute to the quantile boundaries. Could this lead to data leakage, since information from the validation set influences the label bins used during training?
https://github.com/mahmoodlab/SurvPath/blob/3f73ddd6705ec67d643020c5bb04fb13f9f382cc/datasets/dataset_survival.py#L238-L261
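
To make the question concrete, here is a minimal sketch of the alternative I have in mind: compute the bin edges from the training split only, then reuse those edges for the validation split. This is not the repository's code. The function names `fit_time_bins` and `apply_time_bins`, the `n_bins` parameter, the choice to fit on uncensored samples, and the open-ended outer edges are all my own assumptions for illustration.

```python
import numpy as np
import pandas as pd


def fit_time_bins(train_times, train_events, n_bins=4):
    """Compute quantile bin edges from the training split only.

    Hypothetical helper: assumes train_events == 1 marks an observed event
    and that edges are fit on those uncensored samples.
    """
    uncensored = np.asarray(train_times)[np.asarray(train_events) == 1]
    # retbins=True returns the quantile edges alongside the bin codes.
    _, edges = pd.qcut(uncensored, q=n_bins, retbins=True, labels=False)
    edges = edges.copy()
    # Assumption: open the outer edges so that times outside the training
    # range still receive a valid label.
    edges[0] = -np.inf
    edges[-1] = np.inf
    return edges


def apply_time_bins(times, edges):
    """Map survival times of any split to labels using pre-fitted edges."""
    # right=False gives half-open intervals [left, right).
    return pd.cut(np.asarray(times), bins=edges, labels=False, right=False)


# Synthetic data purely for illustration.
rng = np.random.default_rng(0)
train_times = rng.exponential(10.0, size=200)
train_events = rng.integers(0, 2, size=200)
val_times = rng.exponential(10.0, size=50)

edges = fit_time_bins(train_times, train_events)  # validation times unused
train_labels = apply_time_bins(train_times, edges)
val_labels = apply_time_bins(val_times, edges)
```

With this split, the validation event times never affect the boundaries. Whether the difference matters in practice probably depends on the dataset size and on how many bins are used.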