
Additional regressors

Open AugustComte opened this issue 4 years ago • 2 comments

  • Orion version: Current on pip
  • Python version: 3.7.10
  • Operating System: Windows 10

Description

Hello there,

I would like to use additional regressors; as I understand it, this is possible. In my specific case, demand forecasting, I am trying to see whether TadGAN can be used for difficult time series in which the outliers are not captured by taking the difference from an LSTM-RNN model that uses additional information.

For example, when we are trying to find an outlier in the daily sales of Coke, the model may work better if it could see whether the product was on sale or not. This would be a simple binary flag (0 = off, 1 = on). Of course, other regressors could be added as well, e.g. price.

My question is: how would I go about adding the additional regressors? I don't see an example in the documented examples/demos. I hope this isn't a stupid question.

All the best

Graeme

AugustComte avatar Sep 06 '21 15:09 AugustComte

Hi @AugustComte! I believe what you are referring to is not supported by Orion at the moment. To further understand your problem: you want to input a multivariate time series with two variables (num_sales, on_sale), for example

timestamp   num_sales  on_sale
1222819200  150        1
1222840800  0          0
1222862400  128        1
1222884000  93         1

and expect that when on_sale is zero there are no sales, so that you can then use that information to locate anomalies?

sarahmish avatar Sep 10 '21 22:09 sarahmish

Hi @sarahmish,

Almost. I'm looking at data similar to the table below. I've added a few more columns to be closer to the kind of problem I'm looking at; depending on the problem, some are more or less useful.

Typically the outliers are point anomalies that are larger (less often lower) than would be expected given the other columns, such as:

  1. on_sale flag = 0 but num_sales is 150 when 50 would be expected.
  2. on_sale flag = 1 but num_sales is 300 when 150 would be expected, because sale_type is A not B.
  3. on_sale flag = 1 but num_sales is 150 when 300 would be expected, because sale_type is B not A.

While sales patterns are often regular, e.g. every 2 weeks, they can be more erratic, so a univariate input does not pick up the outliers as accurately as an RNN with a low and high pass filter or an isolation forest.

In rare cases, data is mislabelled for periods such as a week, and these should be seen as contextual anomalies, but the lack of additional information makes this difficult for the GAN approach.

I can share examples if you think it would help, as the data is anonymised.

Most of my data is actually panel data, but I tested the GANs on individual cases.

timestamp   num_sales  price  on_sale  sale_type
1631871570  150        1.8    1        A
1631957970  144        1.8    1        A
1632044370  128        2.5    0        0
1632130770  37         2.5    0        0
1632217170  40         2.5    0        0
1632303570  39         2.5    0        0
1632389970  198        1.2    1        B
1632476370  211        1.2    1        B
1632562770  203        1.2    1        B
1632649170  350        1.2    1        B
1632735570  42         2.5    0        0
1632821970  38         2.5    0        0
1632908370  150        2.5    0        0
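
To make the idea concrete, here is a minimal sketch in plain Python (standard library only) of why the extra columns matter on the rows above. It is not Orion or TadGAN code, and it is not a suggested implementation. It simply compares a univariate robust z-score on num_sales with a check that conditions on (on_sale, sale_type). The function names, the 3.5 z-score cutoff, and the 1.5x ratio cutoff are assumptions chosen for illustration, and the "expected" value is taken to be the median of rows sharing the same regressor values.

```python
from collections import defaultdict
from statistics import median

# Rows copied from the table above:
# (timestamp, num_sales, price, on_sale, sale_type)
ROWS = [
    (1631871570, 150, 1.8, 1, "A"),
    (1631957970, 144, 1.8, 1, "A"),
    (1632044370, 128, 2.5, 0, "0"),
    (1632130770, 37, 2.5, 0, "0"),
    (1632217170, 40, 2.5, 0, "0"),
    (1632303570, 39, 2.5, 0, "0"),
    (1632389970, 198, 1.2, 1, "B"),
    (1632476370, 211, 1.2, 1, "B"),
    (1632562770, 203, 1.2, 1, "B"),
    (1632649170, 350, 1.2, 1, "B"),
    (1632735570, 42, 2.5, 0, "0"),
    (1632821970, 38, 2.5, 0, "0"),
    (1632908370, 150, 2.5, 0, "0"),
]


def univariate_flags(rows, threshold=3.5):
    """Flag timestamps using a robust z-score on num_sales alone.

    The 3.5 cutoff is an illustrative assumption.
    """
    sales = [r[1] for r in rows]
    med = median(sales)
    mad = median([abs(s - med) for s in sales])  # median absolute deviation
    if mad == 0:
        return []
    return [
        r[0] for r in rows
        if abs(0.6745 * (r[1] - med) / mad) > threshold
    ]


def conditional_flags(rows, ratio=1.5):
    """Flag timestamps whose num_sales is far from the median of rows
    that share the same (on_sale, sale_type) values.

    The 1.5x ratio cutoff is an illustrative assumption.
    """
    groups = defaultdict(list)
    for _, num_sales, _, on_sale, sale_type in rows:
        groups[(on_sale, sale_type)].append(num_sales)
    expected = {key: median(vals) for key, vals in groups.items()}

    flagged = []
    for ts, num_sales, _, on_sale, sale_type in rows:
        exp = expected[(on_sale, sale_type)]
        if num_sales > ratio * exp or num_sales < exp / ratio:
            flagged.append(ts)
    return flagged


if __name__ == "__main__":
    print("univariate: ", univariate_flags(ROWS))
    print("conditional:", conditional_flags(ROWS))
```

On these thirteen rows, the univariate check flags nothing at that cutoff, because 150 is an ordinary value for the series as a whole. The conditioned check flags 1632044370 (128 while off sale), 1632649170 (350 during a type B sale), and 1632908370 (150 while off sale).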

Thanks for taking the time to respond

August

AugustComte avatar Sep 17 '21 10:09 AugustComte