Data sanity rules
What is the issue and why is it an issue?
Following the discussion in https://github.com/MobilityData/gbfs-validator/pull/106, consensus was reached to open a separate issue to discuss whether rules that verify data sanity (i.e., validation rules that are not part of the spec) should be added to the validator.
Examples of data sanity rules can be:
- station `capacity` should not be lower than `num_vehicles_available` + `num_docks_available`
- `reservation_price_flat_rate` should be in cents
- etc.
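As an illustration of the first example, here is a minimal Python sketch of how such a sanity check might look. It is not the validator's actual code; it assumes `station_information` and `station_status` have already been parsed into dicts shaped like `{"data": {"stations": [...]}}` and uses the field names from the rule above. The function name `check_station_capacity` is hypothetical.

```python
from typing import List


def check_station_capacity(station_information: dict, station_status: dict) -> List[str]:
    """Hypothetical sanity rule: flag stations whose capacity is lower than
    num_vehicles_available + num_docks_available.

    Assumes both feeds are already parsed into dicts shaped like
    {"data": {"stations": [...]}}; field names follow the rule in this issue.
    """
    # capacity is optional, so only stations that declare it can be checked
    capacities = {
        s["station_id"]: s["capacity"]
        for s in station_information["data"]["stations"]
        if "capacity" in s
    }
    warnings = []
    for status in station_status["data"]["stations"]:
        station_id = status["station_id"]
        if station_id not in capacities:
            continue
        # Missing counts are treated as 0 (an assumption made for this sketch)
        reported = status.get("num_vehicles_available", 0) + status.get("num_docks_available", 0)
        if capacities[station_id] < reported:
            warnings.append(
                f"station {station_id}: capacity {capacities[station_id]} < {reported}"
            )
    return warnings
```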
Open questions are:
- Should the validator remain a canonical validator (only validate rules that are part of the spec) or should it also help producers detect errors in the content of their data?
- What process or rationale should determine these rules and their threshold values?
- Who shall bear the burden of maintaining these rules over time?
Please describe some potential solutions you have considered (even if they aren’t related to GBFS).
The code for a dozen data sanity rules can be found in the commits of https://github.com/MobilityData/gbfs-validator/pull/106. This code will be removed from that PR to separate data sanity rules (not part of the spec) and data compliance rules (part of the spec).
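One possible way to keep the two families apart, sketched here under the assumption that every rule is tagged as either "compliance" or "sanity", is to report sanity failures in their own bucket (for example as warnings) so that compliance results stay canonical. The `Rule` class and `run_rules` function are hypothetical and not part of the validator.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    """Hypothetical rule descriptor, not part of gbfs-validator."""
    name: str
    kind: str  # either "compliance" (part of the spec) or "sanity" (not part of the spec)
    check: Callable[[dict], bool]  # returns True when the feed passes the rule


def run_rules(feed: dict, rules: List[Rule]) -> Dict[str, List[str]]:
    """Run every rule and group the names of failing rules by kind."""
    report: Dict[str, List[str]] = {"compliance": [], "sanity": []}
    for rule in rules:
        if not rule.check(feed):
            # A rule tagged with any other kind raises KeyError when it fails
            report[rule.kind].append(rule.name)
    return report
```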
- the `last_updated` timestamp should not be in the future
Since the specification defines `last_updated` as "Indicates the last time data in the feed was updated.", I would argue that a `last_updated` in the future does break the specification.
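A minimal sketch of that check, assuming `last_updated` is an integer POSIX timestamp in seconds (as in GBFS 2.x feeds) and allowing a small, arbitrarily chosen tolerance for clock skew between producer and validator. `is_last_updated_in_future` and `tolerance_s` are hypothetical names.

```python
import time
from typing import Optional


def is_last_updated_in_future(feed: dict, now: Optional[float] = None, tolerance_s: int = 60) -> bool:
    """Return True if the feed's last_updated is later than now + tolerance_s.

    Assumes last_updated is an integer POSIX timestamp in seconds (GBFS 2.x style).
    The 60-second default tolerance for clock skew is an arbitrary choice.
    """
    if now is None:
        now = time.time()
    return feed["last_updated"] > now + tolerance_s
```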
A clearer example of "validation rules that are not part of the spec" could be "the unlock price is < 100€": that rule is not part of the specification, but could detect systems where "€ cents" and € were confused.
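The unlock-price heuristic could be sketched the same way. This assumes a `system_pricing_plans`-style payload whose `data.plans` entries carry `plan_id` and `price`, treats `price` as the unlock price, and uses the threshold of 100 from the example above without looking at the currency; all of these are assumptions for illustration, and `suspicious_unlock_prices` is a hypothetical name.

```python
from typing import List


def suspicious_unlock_prices(pricing_plans: dict, threshold: float = 100.0) -> List[str]:
    """Return the plan_ids whose price is >= threshold, i.e. that fail the
    "unlock price is < 100" heuristic and may have been entered in cents.

    Assumes a system_pricing_plans-style payload {"data": {"plans": [...]}}
    and treats each plan's price as its unlock price.
    """
    return [
        plan["plan_id"]
        for plan in pricing_plans["data"]["plans"]
        # float() accepts both numeric prices and decimal strings such as "1.50"
        if float(plan["price"]) >= threshold
    ]
```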
I would argue that a `last_updated` in the future does break the specification.
You are right @tdelmas. I have updated the issue description to keep this rule in https://github.com/MobilityData/gbfs-validator/pull/106. Thank you!