
📄🚀 – options for defining requirements for end-uses of TIDES data

Open botanize opened this issue 3 years ago • 4 comments

Describe the feature you want and how it meets your needs or solves a problem

People want a way of standardizing which optional fields are required for various end-uses of TIDES data. How do I tell a vendor I need data in TIDES format, with at least the fields required for NTD service supplied reporting?

Describe the solution you'd like

My order of preference is Fork Repo first, then Require Everything, then Feature Flags.

Describe alternatives you've considered

  • Fork Repo: fork the TIDES repo, change the spec to make the fields you need required.
    • Pros:
      • forker maintains control over requirements
      • validation is easy with existing tools
    • Cons:
      • requirements not standardized across agencies
  • Feature Flags: add a features property to each field in the table specs. features is an array of strings naming the features that require the field, e.g., "features": [ "Playback", "NTDServiceSupplied" ].
    • Pros:
      • standardizes requirements for common end-uses
    • Cons:
      • more difficult to add or remove fields from the spec
      • requires building a validator that supports the feature flags
      • must know the requirements a priori
      • tools that produce the same output (e.g., NTD service supplied) could have different requirements depending on methods, or their own optional features (e.g., a departure prediction engine that predicts dwell time from APC data has very different requirements from one that predicts dwell time from historical dwells)
  • Require Everything: require all tables and fields unless the vendor can demonstrate that they are not applicable to the system.
    • Pros:
      • simple
    • Cons:
      • self-certification of compliance can be problematic
      • validation would require forking the TIDES spec and setting the required constraint based on the vendor-negotiated requirements
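As a rough illustration of the Feature Flags alternative above, the sketch below attaches a features array to a few field descriptors and derives the fields a given end-use would require. The field names appear elsewhere in this issue, but the feature assignments and the required_fields helper are hypothetical and are not part of the TIDES spec or any existing validator.

```python
# Hypothetical field descriptors carrying the proposed "features" array.
# The feature assignments are invented for illustration only.
FIELDS = [
    {"name": "vehicle_id", "features": ["Playback", "NTDServiceSupplied"]},
    {"name": "event_timestamp", "features": ["Playback"]},
    {"name": "trip_id_performed", "features": ["NTDServiceSupplied"]},
    {"name": "latitude", "features": []},
]


def required_fields(fields, wanted_features):
    """Return names of fields required by at least one of the wanted features."""
    wanted = set(wanted_features)
    return [f["name"] for f in fields if wanted & set(f.get("features", []))]
```

A validator built on this idea would treat the returned names as required for the requested features and leave every other field optional.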

Additional context and sample data

Describing the features required for a playback tool is a good example of the pitfalls of setting requirements based on features.

A Playback tool can use every field of the vehicle_locations, passenger_events and fare_transactions tables, as well as additional event data that aren't (yet) part of the TIDES spec, and it doesn't require some of the required fields, like trip_id_performed. The only absolutely required fields of vehicle_locations are probably timestamp and vehicle_id, since vehicle position may not always be available (position is optional in GTFS-realtime VehiclePositions).

You may want a field to be required but still allow nulls when the information isn't available. For example, you might want to require latitude and longitude but allow them to be null when GPS is unavailable. Frictionless doesn't allow this: nulls/missing values are not allowed in required fields.
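One way to picture the distinction is a check that separates "the column must be present" from "every row must have a value". The sketch below uses only the Python standard library, not Frictionless; the check_rows function and its two parameters are hypothetical names chosen for this example.

```python
import csv
import io


def check_rows(csv_text, must_exist, must_have_value):
    """Return a list of problems found in csv_text.

    must_exist: columns that must appear in the header (values may be empty).
    must_have_value: columns that must also be non-empty in every row.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    header = set(reader.fieldnames or [])
    expected = sorted(set(must_exist) | set(must_have_value))
    problems = [f"missing column: {c}" for c in expected if c not in header]
    for row_number, row in enumerate(reader, start=1):
        for column in must_have_value:
            # An empty or whitespace-only cell is treated as null here.
            if column in header and not (row[column] or "").strip():
                problems.append(f"row {row_number}: empty {column}")
    return problems
```

With this split, latitude and longitude could be listed under must_exist only, so a row recorded without a GPS fix would still pass.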

Finally, adding feature flags complicates changes to the spec. If a field has a feature flag and we decide it should be removed, does that mean the feature will break? If we want to add a field, do we need to figure out which features would require it? How do feature flags interact with versioning? There's a desire for a stable document for RFP requirements, but what happens when you discover that an optional field is required for a feature? Do you have to update the version?

botanize avatar Dec 15 '22 21:12 botanize

Another option is to have Feature Flags defined inside Field Profile files.

The name of the file would reflect the feature flag name, e.g., NTDServiceSupplied.csv (single column). The file would contain a list of the required field names (materialized paths could be used to name tree node locations). When multiple features are needed, a merge-sort would be used to combine the files.
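A minimal sketch of combining such files follows. It assumes each profile is a headerless single-column CSV whose entries are materialized paths of the form table.field; that layout, the sample paths in the usage note, and the function names are assumptions, since the format is not pinned down here. A sorted set union stands in for the merge-sort step.

```python
import csv
import io


def read_profile(text):
    """Read a single-column profile: one field path per line, blank lines ignored."""
    rows = csv.reader(io.StringIO(text))
    return [row[0].strip() for row in rows if row and row[0].strip()]


def merge_profiles(*profiles):
    """Combine profiles into one sorted list with duplicates removed."""
    return sorted(set().union(*profiles))
```

For instance, merging a profile that lists vehicle_locations.vehicle_id with another that lists the same path plus vehicle_locations.event_timestamp yields each path once, in sorted order.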

  • Pros: a vendor or team can create their own files, so their needs don't mix with the standard; the files can also be used alongside Spec Feature Flags to override spec flag properties.
  • Cons: mostly the same as Feature Flags.

mpaine-act avatar Dec 19 '22 22:12 mpaine-act

Another option to consider is separately listing the required files and fields in a file that defines a TIDES "profile". This could take the form of a JSON or CSV file. (Maybe CSV would be better for less technical users to define and read, such as writers of an RFP requiring data in TIDES format.) The absence of a file or a field would imply that the file or field is not required (but could still be optionally included). For example, in tabular format, one could have something like this in an RFP for a basic AVL system that isn't connected to doors, APC, or AFC, in a bus-only transit agency:

| File | Field | Notes |
| --- | --- | --- |
| stop_visits | service_date | |
| stop_visits | trip_id_performed | |
| stop_visits | stop_sequence | |
| stop_visits | vehicle_id | |
| stop_visits | pattern_id | can be null when unknown |
| stop_visits | stop_id | can be null if the vehicle stops at an undefined location |
| stop_visits | actual_arrival_time | |
| stop_visits | actual_departure_time | |
| stop_visits | schedule_relationship | |
| vehicle_locations | location_ping_id | |
| vehicle_locations | service_date | |
| vehicle_locations | event_timestamp | records at least every 10 seconds when unit is on and vehicle is in motion |
| vehicle_locations | trip_id_performed | can be null when not in a defined trip |
| vehicle_locations | stop_sequence | can be null when not at a stop |
| vehicle_locations | vehicle_id | |
| vehicle_locations | pattern_id | can be null when unknown or not in a trip |
| vehicle_locations | stop_id | only defined when serving a stop |
| vehicle_locations | latitude | |
| vehicle_locations | longitude | |
| vehicle_locations | in_service | |
| vehicle_locations | schedule_relationship | |
| trips_performed | service_date | |
| trips_performed | trip_id_performed | |
| trips_performed | vehicle_id | |
| trips_performed | trip_id_scheduled | |
| trips_performed | route_id | |
| trips_performed | pattern_id | |
| trips_performed | direction_id | |
| trips_performed | block_id | |
| trips_performed | schedule_relationship | |

If we standardize the format in which requirements for apps or vendors are specified, then it will be possible to define a collection of TIDES "profiles" that people can refer to and that can be loaded into a program that checks a dataset against the profile (well, not the notes, but at least the presence of files and fields).
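A checker along those lines might look like the sketch below. It assumes the profile is a CSV with File, Field, and Notes columns, as in the example above, and that the dataset is summarized as a mapping from file name to its column names; the function names are hypothetical. The Notes column is ignored, since those requirements still need a human reader.

```python
import csv
import io


def load_profile(profile_csv):
    """Map each file name to the set of fields the profile requires in it."""
    required = {}
    for row in csv.DictReader(io.StringIO(profile_csv)):
        required.setdefault(row["File"], set()).add(row["Field"])
    return required


def check_dataset(required, dataset_headers):
    """Compare a profile against {file name: list of column names}.

    Returns (missing_files, missing_fields), both sorted.
    """
    missing_files = sorted(f for f in required if f not in dataset_headers)
    missing_fields = sorted(
        (file_name, field)
        for file_name, fields in required.items()
        if file_name in dataset_headers
        for field in fields - set(dataset_headers[file_name])
    )
    return missing_files, missing_fields
```

In practice the dataset_headers mapping could be built by reading the header row of each delivered file, and extra files or fields not named in the profile would simply be ignored.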

@e-lo, @jlstpaul, what do you think?

gabriel-korbato avatar Jun 28 '23 20:06 gabriel-korbato

@gabriel-korbato is the profile you describe similar to the one @mpaine-act described above (on Dec 19)? They seem similar, but if not I'd like to understand the difference.

Overall I think this is a reasonable approach in the short term. In the long term, if we get a lot of different uses of TIDES data, it may become difficult to centrally manage these profiles, but for now it provides a good framework to develop the concept.

jlstpaul avatar Jun 28 '23 20:06 jlstpaul

@jlstpaul Yes, very similar. I suggested a 3-column table instead of a single column table, but that's just a formatting difference and either would work. My third column with notes adds the possibility of defining extra requirements or clarifications that humans have to read, analyze, and check.

gabriel-korbato avatar Jun 28 '23 21:06 gabriel-korbato