Orion Additional regressors

Orion version: Current on pip
Python version: 3.7.10
Operating System: Windows 10

Description

Hello there,

I would like to use additional regressors, which I understand is possible. In my specific case, demand forecasting, I am trying to see whether TadGAN can be used for difficult time series in which the outliers are not captured by taking the difference from an LSTM-RNN model that uses additional information.

For example, when we are trying to find an outlier in the daily sales of Coke, the model may work better if it could see whether the product was on sale or not, which would be a simple binary flag: 0 = off, 1 = on. Of course, other regressors could be added, e.g. price.

My question is: how would I go about adding the additional regressors? I don't see an example in the documented examples/demos. I hope this isn't a stupid question.

All the best

Graeme

Sep 06 '21 15:09 AugustComte

Hi @AugustComte! I believe what you are referring to is not supported at the moment by Orion. To further understand your problem: you want to input a multivariate time series where you have two variables (num_sales, on_sale), for example

timestamp	`num_sales`	`on_sale`
1222819200	150	1
1222840800	0	0
1222862400	128	1
1222884000	93	1

and expect that when on_sale is zero then there are no sales. Then you can use that information to locate anomalies?

Sep 10 '21 22:09 sarahmish

Hi @sarahmish,

Almost. I'm looking at data similar to the table below. I've added a few more columns to be closer to the kind of problem I'm looking at; depending on the problem, some are more or less useful.

Typically the outliers are point anomalies that are larger (less often lower) than would be expected given the other columns, such as:

on_sale flag = 0 but num_sales is 150 when 50 would be expected.
on_sale flag = 1 but num_sales is 300 when 150 would be expected, because sale_type is A not B.
on_sale flag = 1 but num_sales is 150 when 300 would be expected, because sale_type is B not A.
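
To make these cases concrete, here is a rough sketch of what "unexpected given the other columns" could mean. It is not TadGAN and not anything Orion provides: it is a simple stand-in baseline that groups rows by (on_sale, sale_type) and flags values far from the group median using the median absolute deviation (MAD). The function name, the 3.5 threshold and the choice to ignore price are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import median

# Rows copied from the example table in this post:
# (timestamp, num_sales, on_sale, sale_type); price is left out for brevity.
ROWS = [
    (1631871570, 150, 1, "A"),
    (1631957970, 144, 1, "A"),
    (1632044370, 128, 0, "0"),
    (1632130770, 37, 0, "0"),
    (1632217170, 40, 0, "0"),
    (1632303570, 39, 0, "0"),
    (1632389970, 198, 1, "B"),
    (1632476370, 211, 1, "B"),
    (1632562770, 203, 1, "B"),
    (1632649170, 350, 1, "B"),
    (1632735570, 42, 0, "0"),
    (1632821970, 38, 0, "0"),
    (1632908370, 150, 0, "0"),
]


def flag_contextual_outliers(rows, threshold=3.5):
    """Return sorted timestamps whose num_sales is unusual for its context group."""
    # Hypothetical helper: the context is the (on_sale, sale_type) pair.
    groups = defaultdict(list)
    for timestamp, num_sales, on_sale, sale_type in rows:
        groups[(on_sale, sale_type)].append((timestamp, num_sales))

    flagged = []
    for members in groups.values():
        values = [value for _, value in members]
        centre = median(values)
        mad = median(abs(value - centre) for value in values)
        if mad == 0:
            continue  # no spread in this group, nothing to compare against
        for timestamp, value in members:
            # 0.6745 rescales the MAD so the score is comparable to a z-score.
            robust_z = 0.6745 * abs(value - centre) / mad
            if robust_z > threshold:
                flagged.append(timestamp)
    return sorted(flagged)


flagged = flag_contextual_outliers(ROWS)
print(flagged)
```

On the sample rows this flags the 350 during the type B sale, plus the 128 and the 150 on days with no sale. A univariate view of num_sales alone has no way to know that 150 is normal for one context and abnormal for another, which is the gap I'm trying to close.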

While sales patterns are often regular, e.g. every 2 weeks, they can be more erratic, so a univariate input does not pick up the outliers as accurately as an RNN with a low and high pass filter or an isolation forest.

In rare cases, data is mislabelled for a period such as a week and should be seen as contextual anomalies, but the lack of additional information makes this difficult for the GAN approach.

I can share examples if you think it would help, as the data is anonymised.

Most of my data is actually panel data, but I tested the GANs on individual cases.

timestamp	num_sales	price	on_sale	sale_type
1631871570	150	1.8	1	A
1631957970	144	1.8	1	A
1632044370	128	2.5	0	0
1632130770	37	2.5	0	0
1632217170	40	2.5	0	0
1632303570	39	2.5	0	0
1632389970	198	1.2	1	B
1632476370	211	1.2	1	B
1632562770	203	1.2	1	B
1632649170	350	1.2	1	B
1632735570	42	2.5	0	0
1632821970	38	2.5	0	0
1632908370	150	2.5	0	0
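
In case it helps the discussion, this is roughly how I would turn the table above into an all-numeric multivariate frame with pandas, one-hot encoding sale_type. It is only a data-preparation sketch: whether any Orion pipeline can consume the extra columns is exactly the open question here, so nothing below calls Orion. Dropping the sale_type_0 column is my own choice, since on_sale = 0 already carries that information.

```python
import io

import pandas as pd

# The table from this post, whitespace-separated.
RAW = """
timestamp num_sales price on_sale sale_type
1631871570 150 1.8 1 A
1631957970 144 1.8 1 A
1632044370 128 2.5 0 0
1632130770 37 2.5 0 0
1632217170 40 2.5 0 0
1632303570 39 2.5 0 0
1632389970 198 1.2 1 B
1632476370 211 1.2 1 B
1632562770 203 1.2 1 B
1632649170 350 1.2 1 B
1632735570 42 2.5 0 0
1632821970 38 2.5 0 0
1632908370 150 2.5 0 0
"""

# Read sale_type as text so that "0", "A" and "B" are treated as categories.
df = pd.read_csv(io.StringIO(RAW.strip()), sep=r"\s+", dtype={"sale_type": str})

# One-hot encode the sale type; "0" (no sale) is redundant with on_sale == 0,
# so its indicator column is dropped (an assumption, not a requirement).
features = pd.get_dummies(df, columns=["sale_type"], dtype=int)
features = features.drop(columns=["sale_type_0"])

print(features.head())
```

The result keeps timestamp, num_sales, price and on_sale, and adds sale_type_A and sale_type_B as 0/1 indicators, so every column is numeric.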

Thanks for taking the time to respond

August

Sep 17 '21 10:09 AugustComte