Rolling predictions

Open AlexanderGeiger opened this issue 6 years ago • 0 comments

Description

We want to be able to train the model and predict with it repeatedly over a moving window: train a model and predict anomalies for the first n timestamps, shift the window by a number of steps, and repeat the procedure. Anomalies found in earlier steps therefore need to be removed from the data before we train on it in a later step.

Suggestion

We will introduce a new primitive that allows specifying intervals to be dropped from the data. We can then take the anomalies found so far and exclude them from the signal as we iterate over it.
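As a rough illustration of what such a primitive could do, here is a minimal sketch using pandas. The function name drop_intervals and its time_column parameter are hypothetical and not part of Orion; the sketch also assumes the signal is a DataFrame with a timestamp column and that intervals are inclusive (start, end) pairs.

```python
import pandas as pd


def drop_intervals(X, intervals, time_column="timestamp"):
    """Return a copy of X without rows whose timestamp lies in any interval.

    Hypothetical helper: `intervals` is a list of (start, end) pairs,
    both ends inclusive.
    """
    keep = pd.Series(True, index=X.index)
    for start, end in intervals:
        # Series.between is inclusive on both ends by default.
        keep &= ~X[time_column].between(start, end)
    return X[keep].reset_index(drop=True)


signal = pd.DataFrame({"timestamp": range(10), "value": range(10)})
cleaned = drop_intervals(signal, [(2, 4), (8, 8)])
```

Passing an empty list of intervals leaves the signal unchanged, which matches the first iteration, where no anomalies have been found yet.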

In Orion we could just specify the window_size that we want to use and modify the analyze method to iterate over the signal:

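import numpy as np
import pandas as pd


def analyze(pipeline, X):
    # _load_pipeline is assumed to be available in this module.
    pipeline = _load_pipeline(pipeline)

    found_intervals = []  # (start, end) of every anomaly found so far
    found_events = []     # full events: (start, end, score)

    start = 0
    training_size = 2000
    testing_size = 2000
    overlap = 250  # rows at the end of the training window that the test window repeats

    # The loop runs while start is below this bound.
    limit = len(X) - training_size - testing_size

    while start < limit:
        train_window = X[start:start + training_size]

        test_start = start + training_size - overlap
        if start + testing_size < limit:
            test_window = X[test_start:start + training_size + testing_size]
        else:
            # Last iteration: extend the test window to the end of the signal.
            test_window = X[test_start:]

        # Pass the anomalies found so far so they can be dropped from the data.
        pipeline.fit(train_window, train_ind=True, intervals=found_intervals)
        events = pipeline.predict(test_window, train_ind=False, intervals=found_intervals)
        for event in events:
            found_events.append(event)
            found_intervals.append((event[0], event[1]))

        # Move the window forward by one testing window.
        start = start + testing_size

    if len(found_events) == 0:
        # Nothing found: return a single placeholder event with score 0.
        first_timestamp = X.iloc[0]['timestamp']
        found_events.append([first_timestamp, first_timestamp, 0])

    found_events = pd.DataFrame(np.vstack(found_events), columns=['start', 'end', 'score'])
    found_events['start'] = found_events['start'].astype(int)
    found_events['end'] = found_events['end'].astype(int)

    return found_events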
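To make the window arithmetic easier to check in isolation, the iteration can be factored into a small generator. This is only a sketch: rolling_windows and its overlap parameter are hypothetical names, and it reproduces the index logic of the loop above without touching the pipeline.

```python
def rolling_windows(n_rows, training_size, testing_size, overlap=0):
    """Yield (train_slice, test_slice) pairs using the same bounds as the loop above."""
    start = 0
    limit = n_rows - training_size - testing_size
    while start < limit:
        train = slice(start, start + training_size)
        test_start = start + training_size - overlap
        if start + testing_size < limit:
            test = slice(test_start, start + training_size + testing_size)
        else:
            # Last iteration: the test window runs to the end of the signal.
            test = slice(test_start, n_rows)
        yield train, test
        start += testing_size


windows = list(rolling_windows(10, training_size=3, testing_size=2, overlap=1))
```

Note that with the strict comparison, a signal of exactly training_size + testing_size rows produces no window at all, so the size of the input is worth checking before relying on this loop.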

AlexanderGeiger, May 14 '19 22:05