Gracefully handle source regressions

Open pnoll1 opened this issue 5 years ago • 3 comments

Machine accepts bad data from a source, so the bad run becomes the last successful run and is what gets packaged in the data downloads.

Example: Franklin County, WA went from 28,619 addresses to 37.

Marking the run as failed and returning the previously cached result would be much better for data consumers.
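
A minimal sketch of that fallback, assuming each run is recorded with a status and a row count. The `Run` class, its fields, and `run_to_package` are hypothetical names for illustration, not Machine's actual data model:

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Run:
    # Hypothetical record of one processing run; not Machine's real schema.
    run_id: int
    status: str  # "success" or "failed"
    row_count: int


def run_to_package(runs: Sequence[Run]) -> Optional[Run]:
    """Return the most recent successful run, skipping failed ones.

    Assumes `runs` is ordered oldest to newest.
    """
    for run in reversed(runs):
        if run.status == "success":
            return run
    return None  # no successful run has been recorded yet
```

With the Franklin County example, if the 37-address run were marked failed, the packaged data would come from the earlier 28,619-address run.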

Jul 26 '20 19:07 pnoll1

Machine doesn't know that the data is bad, but it could check whether the row count changed significantly and flag the run as an error. I think the system that @ingalls is working on to replace Machine should provide for this.
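
As a rough sketch of that kind of check, something like the following could compare the new row count against the previous successful run. The function name and the 50% threshold are assumptions for illustration, not anything in Machine or its replacement, and this version only looks at drops, not increases:

```python
def is_row_count_regression(
    previous_count: int,
    new_count: int,
    max_drop_fraction: float = 0.5,  # assumed threshold; would need tuning
) -> bool:
    """Return True if the row count fell by more than max_drop_fraction."""
    if previous_count <= 0:
        # Nothing to compare against, so don't flag the run.
        return False
    drop = (previous_count - new_count) / previous_count
    return drop > max_drop_fraction
```

For the Franklin County case, `is_row_count_regression(28619, 37)` would flag the run, while a small fluctuation such as 28,619 to 28,000 would pass.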

Jul 26 '20 20:07 iandees

Is there a roadmap or public code to track progress on this?

Jul 27 '20 20:07 pnoll1

Yes, you can see the new processing system here: https://github.com/openaddresses/batch

Jul 27 '20 21:07 ingalls