Gracefully handle source regressions
Machine accepts bad data from a source and records it as the last successful run, so the bad data is what gets packaged in the data downloads.
Example: Franklin County, WA went from 28,619 addresses to 37.
Marking the run as failed and returning the previously cached result would be much better for data consumers.
Machine doesn't know that the data is bad, but it could check whether the row count changed significantly and flag the run as an error. I think the system that @ingalls is working on to replace machine should provide for this.
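To make the row count idea concrete, here is a rough sketch of what such a check might look like. It is not code from machine or from the replacement system; the function names, the dict shape for a run, and the 50% threshold are all assumptions chosen for illustration.

```python
# Hypothetical sketch: none of these names come from machine or its replacement.

def is_row_count_regression(previous_count, new_count, max_drop_fraction=0.5):
    """Return True if new_count fell by more than max_drop_fraction of previous_count.

    The 0.5 default is an arbitrary illustrative threshold, not a tuned value.
    """
    if previous_count <= 0:
        # No usable baseline (first run or empty previous run): nothing to compare.
        return False
    drop_fraction = (previous_count - new_count) / previous_count
    return drop_fraction > max_drop_fraction


def select_run_to_publish(previous_run, new_run, max_drop_fraction=0.5):
    """Pick which run should be packaged for download.

    Each run is assumed to be a dict with a "row_count" key. If the new run
    looks like a regression, it is marked as failed and the previously cached
    run is returned instead.
    """
    if is_row_count_regression(
        previous_run["row_count"], new_run["row_count"], max_drop_fraction
    ):
        new_run["status"] = "failed"
        return previous_run
    new_run["status"] = "success"
    return new_run
```

With the Franklin County numbers above, `is_row_count_regression(28619, 37)` flags the run, so the cached 28,619-row result would stay in the downloads. A real check would probably need a per-source override for cases where a large drop is legitimate.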
Is there a roadmap or public code to track progress on this?
Yes, you can see the new processing system here: https://github.com/openaddresses/batch