[Question, possible BUG] Output of _TCNModule is a function of input_chunk_length, not output_chunk_length
Hello!
I've been working with the _TCNModule of the library, and only just noticed that the output has the following shape:

```python
x = x.view(batch_size, self.input_chunk_length, self.target_size, self.nr_params)
```
This strikes me as a bit odd; why would we want to produce target predictions for the past? For those time steps, the model could simply output the targets it already has access to.
Best regards, Daniel
To be more specific, this is the part of the code I am referencing: https://github.com/unit8co/darts/blob/6096968537da2a6d45591692329fdbdea48b8829/darts/models/forecasting/tcn_model.py#L234-L248
The reason this is working right now is that input_chunk_length is passed as the length parameter in TCNModel:
https://github.com/unit8co/darts/blob/6096968537da2a6d45591692329fdbdea48b8829/darts/models/forecasting/tcn_model.py#L478-L493
But this means that the loss is computed over all input_chunk_length entries of the output, not just the output_chunk_length steps we actually want to forecast.
After reading the documentation I noticed the following comment, which seems to indicate that this is intentional behavior: https://github.com/unit8co/darts/blob/6096968537da2a6d45591692329fdbdea48b8829/darts/models/forecasting/tcn_model.py#L177-L183
It seems different from other implementations of TCNs (https://github.com/locuslab/TCN for example).
The reason this could be a problem is that it results in the training loss being based on targets from previous time steps. As a user, I don't particularly care about those; I just want the model to predict the next output_chunk_length targets accurately.
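Concretely, here's a rough sketch of the difference I mean (plain PyTorch with made-up shapes, not the actual darts code):

```python
import torch
import torch.nn.functional as F

batch_size, input_chunk_length, output_chunk_length = 32, 24, 6
target_size, nr_params = 1, 1

# Stand-ins for the module output and the corresponding (shifted) targets.
y_hat = torch.randn(batch_size, input_chunk_length, target_size, nr_params)
y_true = torch.randn(batch_size, input_chunk_length, target_size, nr_params)

# Current behaviour: the loss covers all input_chunk_length positions.
loss_full = F.mse_loss(y_hat, y_true)

# What I'd expect instead: only the last output_chunk_length positions.
loss_horizon = F.mse_loss(
    y_hat[:, -output_chunk_length:], y_true[:, -output_chunk_length:]
)
```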
I would just like confirmation on whether I've missed something; if not, I'd be happy to create a PR to try and change it.
Hi @DanielBergThomsen, and sorry for the late response.
Thanks for opening this issue. I think you are right, the loss shouldn't be computed on the target values that were fed to the model as input.
I raised PR #2006 to address this. Feel free to review, or suggest changes if you thought of a different way of solving it.
Hi @DanielBergThomsen. While this might seem counter-intuitive, it should make sense given how the TCN model works and how its training data is structured.
The base TCN model (not specifically adapted to the forecasting use case) transforms a sequence of a given length into another sequence of the same length using causal convolutions. The way we adapt this to the forecasting use case is by shifting the output (target) sequence by the number of time steps we want to forecast (for more details, check out our blog post https://unit8.com/resources/temporal-convolutional-networks-and-forecasting/).

Because we use convolutions, the learned weights are the same regardless of which timestamp we are predicting. You can think of the convolutional kernel as learning to model the time series a given number of steps in advance, regardless of position. And because the convolutions are causal, each prediction only depends on data points that precede its time step. The kernel can therefore be trained at any point in the time series, so it is fine to compute the loss on the whole sequence.

There is a caveat: for the points at the beginning of the output sequence, part of their input consists of zero-padding, so less learning happens there. But to use the data to its fullest extent, it still makes sense to train on them. As a side note, to my knowledge this is more or less how a lot of language transformers are trained as well.
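To make the shifting concrete, here is a small sketch of how such training pairs could be constructed (a toy illustration based on the blog post's description, not the actual darts dataset code; the variable names are mine):

```python
import numpy as np

# Toy univariate series.
series = np.arange(100, dtype=np.float32)

length = 24  # sequence length fed to the TCN (input_chunk_length)
shift = 6    # forecast horizon (output_chunk_length)

samples = []
for start in range(len(series) - length - shift + 1):
    x = series[start : start + length]                  # input sequence
    y = series[start + shift : start + shift + length]  # target: same length, shifted
    samples.append((x, y))

# Output position t is trained to predict the value `shift` steps ahead of
# input position t; causality guarantees it only sees inputs up to t.
```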
When it comes to the actual prediction, however, we indeed only care about the last output_chunk_length time steps of the output sequence and disregard the rest.
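In other words (a minimal sketch, with a random tensor standing in for the module's output):

```python
import torch

batch_size, input_chunk_length, output_chunk_length = 8, 24, 6

# Stand-in for the module's output: one prediction per input position,
# with target_size = 1 and nr_params = 1.
out = torch.randn(batch_size, input_chunk_length, 1, 1)

# Only the last output_chunk_length predictions are kept as the forecast.
forecast = out[:, -output_chunk_length:]
print(forecast.shape)  # torch.Size([8, 6, 1, 1])
```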
TLDR: You didn't miss anything, this is intentional!