TimeMixer
Why is input normalization (mean/std) needed after scaling? Performance drops if removed
Hi, I noticed that in the code, there are two normalization steps for the input data:
- First, the whole dataset is scaled (e.g., with MinMaxScaler or a similar scaler fit on the training split).
- Then, each input sequence is normalized over the input length: the per-window mean is subtracted and the result is divided by the per-window standard deviation.
I’m curious why the second normalization step (subtracting the mean and dividing by the standard deviation) is applied on top of the initial scaling. When I remove it, I observe a noticeable drop in model performance. Could you please explain why this additional normalization is necessary? What is the intuition or theoretical reason behind it?
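For reference, here is a minimal sketch of the two steps as I understand them (this is my own simplified version, not the repo's exact code; the function names and the RevIN-style per-window normalization are my assumptions):

```python
# Sketch of the two normalization steps (my paraphrase, not the repo's code):
# step 1 uses global statistics fit on the training split, step 2 uses
# per-window statistics computed over the input length for each sequence.
import torch

def dataset_scale(train: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
    """Step 1: global scaling with statistics computed once on the training split.

    train, x: [num_points_or_batch, seq_len, n_vars]
    """
    mean = train.mean(dim=(0, 1), keepdim=True)        # per-variable mean over all training points
    std = train.std(dim=(0, 1), keepdim=True) + 1e-8   # per-variable std over all training points
    return (x - mean) / std

def instance_normalize(x_enc: torch.Tensor, eps: float = 1e-5):
    """Step 2: per-sequence normalization over the input length (dim=1).

    x_enc: [batch, seq_len, n_vars]. Returns the normalized window plus the
    statistics needed to de-normalize the forecast afterwards.
    """
    means = x_enc.mean(dim=1, keepdim=True)                                  # [batch, 1, n_vars]
    stdev = torch.sqrt(x_enc.var(dim=1, keepdim=True, unbiased=False) + eps)
    return (x_enc - means) / stdev, means, stdev

# Usage (hypothetical): the forecast is later de-normalized with the saved
# per-window statistics, e.g. pred = model(x_norm) * stdev + means
```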