Additional regressors
- Orion version: Current on pip
- Python version: 3.7.10
- Operating System: Windows 10
Description
Hello there,
I would like to use additional regressors, which I understand is possible. My specific case is demand forecasting: I am trying to see whether TadGAN can be used for difficult time series in which the outliers are not captured by taking the difference from an LSTM-RNN model that uses additional information.
For example, when we are trying to find an outlier in the daily sales of Coke, the model may work better if it could see whether the product was on sale or not. This would be a simple binary flag (0 = off, 1 = on). Of course, other regressors could be added, e.g. price.
My question is: how would I go about adding the additional regressors? I don't see an example in the documented examples/demos. I hope this isn't a stupid question.
All the best
Graeme
Hi @AugustComte! I believe what you are referring to is not supported by Orion at the moment. To better understand your problem: you want to input a multivariate time series with two variables (num_sales, on_sale), for example:
| timestamp | num_sales | on_sale |
|---|---|---|
| 1222819200 | 150 | 1 |
| 1222840800 | 0 | 0 |
| 1222862400 | 128 | 1 |
| 1222884000 | 93 | 1 |
and you expect that when on_sale is zero there are no sales, so that you can use that information to locate anomalies?
Hi @sarahmish,
Almost. I'm looking at data similar to the table below. I've added a few more columns to be closer to the kind of problem I'm looking at; depending on the problem, some are more or less useful.
Typically the outliers are point anomalies that are larger (less often lower) than would be expected given the other columns, such as:
- on_sale flag = 0 but num_sales is 150 when 50 would be expected.
- on_sale flag = 1 but num_sales is 300 when 150 would be expected, because sale_type is A not B.
- on_sale flag = 1 but num_sales is 150 when 300 would be expected, because sale_type is B not A.
While sales patterns are often regular (e.g. every 2 weeks), they can be more erratic, so a univariate input does not pick up the outliers as accurately as an RNN with a low- and high-pass filter or an isolation forest.
In rare cases, data is mislabelled for a period such as a week and should be seen as contextual anomalies, but the lack of additional information makes this difficult for the GAN approach.
I can share examples if you think it would help, as the data is anonymised.
Most of my data is actually panel data, but I tested the GANs on individual cases.
| timestamp | num_sales | price | on_sale | sale_type |
|---|---|---|---|---|
| 1631871570 | 150 | 1.8 | 1 | A |
| 1631957970 | 144 | 1.8 | 1 | A |
| 1632044370 | 128 | 2.5 | 0 | 0 |
| 1632130770 | 37 | 2.5 | 0 | 0 |
| 1632217170 | 40 | 2.5 | 0 | 0 |
| 1632303570 | 39 | 2.5 | 0 | 0 |
| 1632389970 | 198 | 1.2 | 1 | B |
| 1632476370 | 211 | 1.2 | 1 | B |
| 1632562770 | 203 | 1.2 | 1 | B |
| 1632649170 | 350 | 1.2 | 1 | B |
| 1632735570 | 42 | 2.5 | 0 | 0 |
| 1632821970 | 38 | 2.5 | 0 | 0 |
| 1632908370 | 150 | 2.5 | 0 | 0 |
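
To make "larger than would be expected given the other columns" concrete, here is a minimal sketch in plain Python. It is not part of Orion or TadGAN and is only a simple baseline for comparison. The function name `flag_conditional_outliers`, the grouping by `(on_sale, sale_type)`, the median/MAD score and the 3.5 threshold are all assumptions chosen for illustration.

```python
from collections import defaultdict
from statistics import median

# Rows copied from the table above:
# (timestamp, num_sales, price, on_sale, sale_type)
ROWS = [
    (1631871570, 150, 1.8, 1, "A"),
    (1631957970, 144, 1.8, 1, "A"),
    (1632044370, 128, 2.5, 0, "0"),
    (1632130770, 37, 2.5, 0, "0"),
    (1632217170, 40, 2.5, 0, "0"),
    (1632303570, 39, 2.5, 0, "0"),
    (1632389970, 198, 1.2, 1, "B"),
    (1632476370, 211, 1.2, 1, "B"),
    (1632562770, 203, 1.2, 1, "B"),
    (1632649170, 350, 1.2, 1, "B"),
    (1632735570, 42, 2.5, 0, "0"),
    (1632821970, 38, 2.5, 0, "0"),
    (1632908370, 150, 2.5, 0, "0"),
]


def flag_conditional_outliers(rows, threshold=3.5):
    """Hypothetical helper: flag rows whose num_sales is far from the
    median of the rows that share the same (on_sale, sale_type)."""
    # Collect num_sales per regressor combination.
    groups = defaultdict(list)
    for _, num_sales, _, on_sale, sale_type in rows:
        groups[(on_sale, sale_type)].append(num_sales)

    flagged = []
    for timestamp, num_sales, _, on_sale, sale_type in rows:
        values = groups[(on_sale, sale_type)]
        expected = median(values)
        # Median absolute deviation: a spread estimate that is itself
        # robust to the outliers we are trying to find.
        mad = median(abs(v - expected) for v in values)
        if mad == 0:
            continue  # no spread in this group, so no score is computed
        # 0.6745 rescales the MAD so the score is comparable to a z-score
        # under a normal distribution (assumed threshold: 3.5).
        score = 0.6745 * abs(num_sales - expected) / mad
        if score > threshold:
            flagged.append((timestamp, num_sales, expected))
    return flagged


for timestamp, num_sales, expected in flag_conditional_outliers(ROWS):
    print(timestamp, num_sales, expected)
```

A check like this ignores the temporal structure entirely, which is why I would like to feed the same columns into the sequence model instead.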
Thanks for taking the time to respond
August