Add Tutorial for meter-swapping detection

Open Attol8 opened this issue 5 years ago • 4 comments

A tutorial that reproduces section 3.6 of this paper.

The supporting page with code and data for the paper is not available. I have downloaded the original dataset from here. @seanlaw, do you think we should email the authors and ask for the data and cleaning code they used? It would save us many time-consuming tasks and ensure 100% reproducibility.

As usual, I am going to submit a pull request with the notebook, so we can discuss the technical details in there.

Attol8 avatar Sep 11 '20 13:09 Attol8

@Attol8 Thank you for diving into this. For efficiency's sake, I think we may want to take a two-pronged approach and:

  1. Reach out to Eamonn Keogh directly, explain that the accompanying website for the paper is empty, and see if he is able to help. Unfortunately, I've tried to reach out to the first author, Yan Zhu, before but haven't had much luck.
  2. Simultaneously, we should try to dig into it ourselves (I can help with this) and see how far we can get. I've often uncovered other issues when doing this and so it's a good sanity check.

What do you think?

seanlaw avatar Sep 11 '20 14:09 seanlaw

Hi @seanlaw. I agree that trying to dig into it ourselves may be the best approach. But, as you will see in the pull request here, the data they use is simply different from the original dataset, which implies that they took some preprocessing steps which would be interesting to uncover. However, the method works anyway so we can post a tutorial without reproducing the exact steps taken in the paper, which also proves that the methods actually work on slightly different datasets.

Also, the paper is not meant to be as detailed as possible (they often do not list key parameters and are vague on the code implementations), thus a slightly different implementation may still fit our use-case.

Let me know if you agree with me.

Attol8 avatar Sep 11 '20 14:09 Attol8

> But, as you will see in the pull request here, the data they use is simply different from the original dataset, which implies that they took some preprocessing steps which would be interesting to uncover. However, the method works anyway so we can post a tutorial without reproducing the exact steps taken in the paper, which also proves that the methods actually work on slightly different datasets.

I understand and agree. The most important thing is to convey the core idea (i.e., that one can actually detect a swap) using a clear example. To be pragmatic, I'll timebox the data reproducibility part to Monday, so that we can move forward with what your PR proposes if we can't figure it out by then. I'll try to play around with the data a little and see what is going on.

seanlaw avatar Sep 11 '20 14:09 seanlaw

@seanlaw, can you assign me to this issue?

jaydurant avatar Jun 25 '21 21:06 jaydurant