
828 MCAP specification for osi tracefiles

Open TimmRuppert opened this issue 1 year ago • 14 comments

Reference to a related issue in the repository

Closes https://github.com/OpenSimulationInterface/open-simulation-interface/issues/828

Add a description

  • Add initial support for mcap files as an alternative to traditional .osi and .txth files
  • Made the .osi and .txth spec more precise
  • Removed example code as this here is a spec and not the implementation
  • Mentioning of a companion repo for utils which implement this spec (separation of spec and tutorials/examples/code)

Some questions to ask:

What is this change?

  • Spec change to support mcap

Is this a bug fix or a feature? Does it break any existing functionality or force me to update to a new version? How has it been tested?

  • Feature. Breaking changes for naming convention are currently in discussion.

Take this checklist as orientation for yourself, if this PR is ready for the Change Control Board:

If you can’t check all of them, please explain why. If all boxes are checked or commented and you have achieved at least one positive review, you can assign the label ReadyForCCBReview!

TimmRuppert avatar Oct 14 '24 07:10 TimmRuppert

For metadata discussion: timestamp definitions in OSI:

  • Common Timestamp: https://opensimulationinterface.github.io/osi-antora-generator/asamosi/latest/gen/structosi3_1_1Timestamp.html
  • SensorView: https://opensimulationinterface.github.io/osi-antora-generator/asamosi/latest/gen/structosi3_1_1SensorView.html#ad30bef352de1b64dd211718c18c540f1
  • SensorData::timestamp: https://opensimulationinterface.github.io/osi-antora-generator/asamosi/latest/gen/structosi3_1_1SensorData.html#a82adb4a7724d1e4835ae0921fab4080c
  • SensorData::last_measurement_time: https://opensimulationinterface.github.io/osi-antora-generator/asamosi/latest/gen/structosi3_1_1SensorData.html#a9baaaa40491c5f241c117a7dacbe0d9d
  • SensorData::system_time: https://opensimulationinterface.github.io/osi-antora-generator/asamosi/latest/gen/structosi3_1_1SensorData.html#ae64496913ffa0c41f154260cf48725e6
  • SensorDetectionHeader::measurement_time: https://opensimulationinterface.github.io/osi-antora-generator/asamosi/latest/gen/structosi3_1_1SensorDetectionHeader.html#a7276bbdf40b2da4fed4397df62f167aa

In the OSI documentation, the wording "sending time" is used. This would correspond to the publish_time in MCAP.
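
As a rough illustration of what filling publish_time involves: MCAP stores publish_time as an unsigned nanosecond count, while an OSI Timestamp carries separate seconds and nanos fields. A minimal Python sketch of that conversion follows. `OsiTimestamp` and `to_mcap_time` are hypothetical stand-ins introduced here for illustration; they are not part of OSI or of any MCAP library.

```python
from dataclasses import dataclass

NANOS_PER_SECOND = 1_000_000_000


@dataclass(frozen=True)
class OsiTimestamp:
    """Hypothetical stand-in for osi3.Timestamp (fields: seconds, nanos)."""
    seconds: int
    nanos: int


def to_mcap_time(ts: OsiTimestamp) -> int:
    """Convert an OSI-style timestamp to a single nanosecond integer."""
    # MCAP timestamps are unsigned, so negative seconds cannot be stored.
    if ts.seconds < 0 or not 0 <= ts.nanos < NANOS_PER_SECOND:
        raise ValueError(f"timestamp out of range: {ts}")
    return ts.seconds * NANOS_PER_SECOND + ts.nanos


# Example: a message sent at simulation time 12.5 s
publish_time = to_mcap_time(OsiTimestamp(seconds=12, nanos=500_000_000))
```

Which OSI timestamp field feeds this conversion for each top-level message is exactly the open question listed above; the sketch only covers the unit conversion.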

ClemensLinnhoff avatar Oct 14 '24 07:10 ClemensLinnhoff

I would (rather quickly) need an example MCAP file for OSI v3.7.0 with (at best all) objects filled according to standard.

jdsika avatar Oct 15 '24 12:10 jdsika

I would (rather quickly) need an example MCAP file for OSI v3.7.0 with (at best all) objects filled according to standard.

Here are three example files: Example_OSI_MCAP.zip

  • Only_SD_v1.mcap contains multiple sensor data on one channel (topic)
  • Two_SD_different_rate_v1.mcap contains multiple sensor data on two channels
  • All_Top-Level_with_Timestamp_v1.mcap contains all top-level messages which have a timestamp field

I quickly tried to adapt most outcomes of the last meeting. The messages are basically empty except that varying timestamps are set.

TimmRuppert avatar Oct 16 '24 07:10 TimmRuppert

Thanks! Some ground truth data from esmini with objects etc from as many types as possible would be great :))

jdsika avatar Oct 16 '24 13:10 jdsika

Thanks! Some ground truth data from esmini with objects etc from as many types as possible would be great :))

I just updated esmini and it seems like there is an issue with the FMU. Anyway, for starters, here is one of our highway examples, where I converted a native tracefile SensorView to an mcap file (the ground truth is therefore a submessage).

demo_SV_onramp_V1.zip

I can provide a GT top level message as well but I am a bit tight on schedule for today and tomorrow.

TimmRuppert avatar Oct 17 '24 08:10 TimmRuppert

No, thank you. It was just important for me to have a "third set of eyes" creating a file in order to debug something. Otherwise everyone is always blaming in circles :)

jdsika avatar Oct 17 '24 08:10 jdsika

FYI: OSI tracefile writer in openPASS

jdsika avatar Oct 18 '24 06:10 jdsika

I have updated the spec based on our last discussion. In the next meeting we need to address the following things :

  • File naming convention (see https://github.com/OpenSimulationInterface/open-simulation-interface/pull/833#discussion_r1803044372)
  • Requirement of the protobuf version in the meta and/or the naming convention
  • Clarify timestamp details based on https://github.com/OpenSimulationInterface/open-simulation-interface/issues/834
    • timestamps in the file name according to the naming convention
    • in the publish_time field of the MCAP message
  • General review and potential new things somebody would like to be considered

TimmRuppert avatar Oct 18 '24 10:10 TimmRuppert

Documenting today's meeting concerning the points mentioned above:

  • File naming convention: we agreed on
    • <opt prefix> To help sort by maneuver or similar
    • <opt timestamp> To help sort by recording time (if no prefix) or quickly identify a recording. Should represent the absolute timestamp of the zero-time of the top-level messages (synchronized global time)
    • <required type> As for .txth and .osi, but additionally multi
    • <opt suffix> Further details you might want to provide without having to inspect the .mcap metadata (e.g. if applicable a min. required OSI version)
  • Requirement of the protobuf version in the meta and/or the naming convention
    • move to the channel metadata
    • might not be used by anyone except for some debugging edge-cases but does not hurt either
  • Clarify timestamp details based on Documentation page for timing #834
    • timestamps in the file name according to the naming convention
    • in the publish_time field of the MCAP message
    • log_time = publish_time = timestamp of top-level message
  • General review and potential new things somebody would like to be considered
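
To make the agreed name parts (optional prefix, optional timestamp, required type, optional suffix) a bit more tangible, here is a small Python sketch that assembles such a file name. Several things are assumptions for illustration only and were not decided in the meeting: the underscore separator, the compact UTC timestamp format, and the set of type codes (`sv`, `gt`, `sd` taken from the existing .osi convention as an illustrative subset, plus `multi`). `build_trace_name` is a hypothetical helper.

```python
from datetime import datetime, timezone
from typing import Optional

# Assumed, illustrative subset of type codes plus the new "multi".
ALLOWED_TYPES = {"sv", "gt", "sd", "multi"}


def build_trace_name(
    trace_type: str,
    prefix: Optional[str] = None,
    zero_time: Optional[datetime] = None,
    suffix: Optional[str] = None,
) -> str:
    """Join the optional and required name parts into one .mcap file name."""
    if trace_type not in ALLOWED_TYPES:
        raise ValueError(f"unknown trace type: {trace_type!r}")
    parts = []
    if prefix:
        parts.append(prefix)
    if zero_time is not None:
        # Assumed format: compact UTC timestamp of the zero-time.
        utc = zero_time.astimezone(timezone.utc)
        parts.append(utc.strftime("%Y%m%dT%H%M%SZ"))
    parts.append(trace_type)
    if suffix:
        parts.append(suffix)
    return "_".join(parts) + ".mcap"


name = build_trace_name(
    "multi",
    prefix="cut-in",
    zero_time=datetime(2024, 10, 21, 9, 10, 0, tzinfo=timezone.utc),
    suffix="osi3.7.0",
)
```

Only the type is mandatory here, so `build_trace_name("sd")` yields just the type code and the extension.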

TimmRuppert avatar Oct 21 '24 09:10 TimmRuppert

Consider the following case: We will have one mcap trace with a lot of information and traces and a user wants to add osi traces to it. This has as a consequence that as much information as possible must be placed in the trace/channel meta data and not

jdsika avatar Oct 21 '24 09:10 jdsika

List of potential optional metadata fields that mainly emerged from the Gaia-X project (focus on measurement data):

  • [ ] (1) Start/stop timestamp / duration (include/exclude depending on mcap capabilities of extracting those from the mcap timestamps)
  • [ ] (2) Frame rate (@ClemensLinnhoff You mentioned frame rate as an interesting metadata field. I thought about how to handle variable frame rates: Maybe define it to be an approximated frame rate in the case of variable frame rates?)
  • [ ] (3) Content granularity: Level of granularity of contained data, e.g. trace contains object lists / detection lists (could also be extended to flag if a trace contains (detected) lane network, traffic lights, traffic signs, environmental conditions)
  • [ ] (4) Traffic direction: Left-hand/right-hand
  • [ ] (5) Location: E.g. country, state, city, etc. (could also be put into trace description)
  • [ ] (6) Contained road types: Indicate that road types like motorway/rural/city/traffic calmed zone etc. are contained in the trace
  • [ ] (7) Contained lane types: Indicate that biking/walking/parking lanes are contained in the trace
  • [ ] (8) Scenario identifier: User-/company-/project-specific scenario identifiers
  • [ ] (9) Host moving object description, e.g. measurement vehicle details -> Could also be put in "Used data sources" (@ClemensLinnhoff You mentioned, this could be useful to you; Do you have an example use case for synthetic data?)
  • [ ] (10) Target moving objects: OSI-id(s)+description of object(s) of interest (vehicle/pedestrian performing certain actions of interest)
  • [ ] (11) Events: Tag certain events of interest with timestamp+description, e.g. cut-out action of vehicle x at timestamp y, sensor fault at timestamp z
  • [ ] (12) Used data sources/tools: How/when was the trace generated/processed (repeated field to record tool/processing chain), e.g.
    • 2024-10-17 15:54:08+02:00: Captured on measurement vehicle x with sensor y (firmware version 1.0)
    • 2024-10-17 15:55:20+02:00: Processed with tool x version 1.0
    • 2024-10-17 15:54:08+02:00: Synthetically generated with tool x version 1.0/sensor model x version 1.0
  • [ ] (13) Creator information: Contact/name/company/license

@TimmRuppert @ClemensLinnhoff @jdsika Feel free to add your opinion on which we should include.

In case we keep a lot of the fields above I would list the less important metadata definitions with less normative priority:

  1. Required ("must"): osi_version, protobuf
  2. Recommended ("should"): e.g. description, creator information, data sources, frame rate
  3. Metadata hints ("optionally"): We suggest to use the proposed structure if the respective information is available and the trace creator wants to include it (e.g. location, granularity, contained road/lane types, events, traffic direction).

thomassedlmayer avatar Oct 21 '24 13:10 thomassedlmayer

2: This is also something that mcap natively supports, as far as I know. @TimmRuppert how does it handle varying frame rates? Does it take the mean?

5: This is probably only valid for measurements which is not the standard case for OSI, so I would not put this in the standard specification.

9: My example use-case would be for re-simulating a measurement for model validation. Then you know, which kind of measurement vehicle is simulated.

ClemensLinnhoff avatar Oct 21 '24 15:10 ClemensLinnhoff

  1. Start and stop timestamps are present in the "summary" section and thus easily accessible (either via API or CLI). Therefore, I would recommend to not store them in the metadata again.
  2. The average framerate is given by the message-count and start/stop timestamps. This is for example supported by mcap CLI. Therefore, I would recommend to not store this in the metadata again.
  3. I think this is somewhat covered by the shacls and hard to unify without further spec. I would agree with Clemens and leave that out for the moment
  4-9. Sure, why not
  10. This might work for simple scenarios. But if we think about more complex city scenarios, deciding which traffic participant might be relevant is sometimes up to the function consuming this data.
  11. Strongly agree
  12. We thought about something similar as well. Instead of a repeated field (where the "field" would need to be defined by us as only string values are accepted) we could also do it per channel and add it to the channel metadata?
  13. Good idea, how about splitting this into "license" and "contact person" (or similar). Storing a license in the file might help sharing/reusing data.

In any case, everything needs a good key name and category name.
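
To illustrate the point about the frame rate being derivable from the message count and the start/stop timestamps, here is a minimal Python sketch. The inputs are plain integers as one would read them from the summary statistics; `average_frame_rate` is a hypothetical helper, and dividing by the number of intervals (count minus one) rather than by the count is a choice made here, not necessarily what any particular tool reports.

```python
NANOS_PER_SECOND = 1_000_000_000


def average_frame_rate(message_count: int, start_ns: int, end_ns: int) -> float:
    """Approximate frames per second from count and first/last timestamps."""
    if message_count < 2 or end_ns <= start_ns:
        raise ValueError("need at least two messages spanning a positive duration")
    duration_s = (end_ns - start_ns) / NANOS_PER_SECOND
    # N messages span N - 1 frame intervals.
    return (message_count - 1) / duration_s


# 101 messages over 10 s of simulation time
rate = average_frame_rate(101, 0, 10 * NANOS_PER_SECOND)
```

For variable frame rates this is only a mean value, which matches the "approximated frame rate" idea raised earlier in the thread.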

TimmRuppert avatar Oct 22 '24 06:10 TimmRuppert

General:

  • Can only contain one scenario with a unique global time
  • one mcap file is a dataset

Metadata:

  • Strongly recommended additional detailed metadata, category asam_osi
    • description e.g. cut-in
    • creator e.g. person or company (not tool) csv
    • license csv of spdx identifiers
    • data_sources e.g. csv of model, recorder, scenario player
  • ~~Channel-wise: zero_time : ISO 8601 YYYYMMDDThhmmss.f representing the wall clock time~~

Solved by "Can only contain one scenario with a unique global time "

  • Optional if you want to use more detailed metadata which follows some kind of public schema or standard

    • Add a file wide metadata with the name context . This category of metadata should contain prefixes as keys and details about the used metadata specification in form of links/names as values. This prefix should be an identifier for other metadata used on a file or channel level. Here are some examples:
    • gaiax_hdmap : https://github.com/GAIA-X4PLC-AAD/ontology-management-base/tree/main/hdmap/
      • Usage in channel gaiax_hdmaps_HdMapShape : value 123
      • So everybody seeing this channel metadata has the chance to get the document specifying what that data means
    • Add another example for the left-hand / right-hand traffic rule of OpenDRIVE
    • Add another example for some random iso

    See also https://github.com/OpenSimulationInterface/osi-utilities/pull/2#issuecomment-2441047465
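
A small Python sketch of how the metadata described above could be laid out as string-to-string maps. The helper names (`join_csv`, `check_string_map`, `undeclared_keys`) and the example values for creator and license are assumptions for illustration; matching a channel key to a declared prefix via a simple "starts with" check is also just one possible reading of the notes.

```python
from typing import Dict, Iterable, List


def join_csv(values: Iterable[str]) -> str:
    """Store several values in one metadata string, comma separated."""
    return ",".join(v.strip() for v in values)


def check_string_map(record: Dict[str, str]) -> Dict[str, str]:
    """Metadata values are plain strings, so reject anything else early."""
    for key, value in record.items():
        if not isinstance(key, str) or not isinstance(value, str):
            raise TypeError(f"non-string metadata entry: {key!r}")
    return record


# File-wide metadata, category "asam_osi" (example values are made up)
asam_osi = check_string_map({
    "description": "cut-in",
    "creator": join_csv(["Example Company"]),
    "license": join_csv(["MPL-2.0", "CC-BY-4.0"]),
    "data_sources": join_csv(["model", "recorder", "scenario player"]),
})

# File-wide metadata "context": prefix -> where the prefixed keys are specified
context = check_string_map({
    "gaiax_hdmap": "https://github.com/GAIA-X4PLC-AAD/ontology-management-base/tree/main/hdmap/",
})


def undeclared_keys(channel_metadata: Dict[str, str], ctx: Dict[str, str]) -> List[str]:
    """Return channel metadata keys that match no prefix declared in ctx."""
    return [k for k in channel_metadata if not any(k.startswith(p) for p in ctx)]


channel_metadata = {"gaiax_hdmaps_HdMapShape": "123", "foo_bar": "1"}
missing = undeclared_keys(channel_metadata, context)
```

A reader of the file could run a check like `undeclared_keys` to spot channel metadata whose meaning is not documented anywhere in the context record.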

TimmRuppert avatar Oct 25 '24 13:10 TimmRuppert

I would like to suggest to add a comment to "Nested Top Level Messages" as e.g. the SensorView in SensorData like the following:

"When using ASAM OSI MCAP as a container for OSI traces, the user is allowed to remove nested OSI top-level messages and add them as separate channels into the MCAP container."

This comment should be added above the top-level messages in the .proto files as well.

What is your opinion about this? I think the nested messages were created as the collection of traces in one container was not defined at the time with .osi files?

jdsika avatar Nov 08 '24 13:11 jdsika

What is your opinion about this? I think the nested messages were created as the collection of traces in one container was not defined at the time with .osi files?

I think the nesting is not only for trace files, but also for the messages between FMUs directly. I think it is just simpler to send one SensorData instead of a SensorData, a SensorView and a GroundTruth. So I would not add this to the proto files but just in the MCAP trace file documentation.

ClemensLinnhoff avatar Nov 08 '24 14:11 ClemensLinnhoff

What is your opinion about this? I think the nested messages were created as the collection of traces in one container was not defined at the time with .osi files?

Besides the points @ClemensLinnhoff mentioned, this might result in a good portion of looking back and forth in the file in order to retrieve the corresponding messages. Especially as the SensorData::timestamp is not required to (but should) be the GroundTruth::timestamp. So there is no enforcement of a real identifier to match messages. Furthermore, SensorView is a repeated field in the SensorData. How would one specify which ones are meant?

Nonetheless, I understand and totally share your motivation. Maybe something for OSI4. It would just require a lot of breaking changes.

TimmRuppert avatar Nov 08 '24 14:11 TimmRuppert

What is your opinion about this? I think the nested messages were created as the collection of traces in one container was not defined at the time with .osi files?

I think the nesting is not only for trace files, but also for the messages between FMUs directly. I think it is just simpler to send one SensorData instead of a SensorData, a SensorView and a GroundTruth. So I would not add this to the proto files but just in the MCAP trace file documentation.

I don't think this is even at all related to trace files. I expect a trace file to contain the messages as they are being sent/received. No one is forcing anyone to use the nested messages, if they don't want to (I personally think they are usually a mistake in the case of SensorData->SensorView just for traceability purposes, but that's just me). But if they are used then they should be stored as is.

Now if someone wants to be creative they can do all kinds of things as they like. It's not the standard's job to say how to use it. So I think this needs no mention anywhere, since it is purely up to the user and use case.

pmai avatar Nov 08 '24 15:11 pmai

I have taken the liberty to morph the current state into something that tries to be a more precise and normative specification while trying to be more minimalist in what it touches, and leaves other topics (like changes in the naming convention, the specific mapping of non-OSI meta-data into meta-data records) for either separate PRs or other layered specifications: #841.

It also tries to make the spec more robust, by specifying more explicitly how to use MCAP elements (e.g. placement of records, chunking, ...).

The other major change is that it now recommends making log_time and publish_time identical, unless there is a specific reason for them to differ, as this enables much better use of MCAP indexing facilities for random access to traces.

People who want to replay while keeping the jitter of their middleware (due to asynchronous communication) intact can still do so, but more sane use cases that abstract away middleware jitter (or are synchronous in nature) can still reap the benefits of the MCAP format index machinery (one might suggest to MCAP that they might like to add indexing on publish_time to enable both use cases at the same time, but that's a different story).
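
To sketch why identical timestamps help with random access: MCAP indexes messages by log_time, so if log_time carries the simulation timestamp, a query for a simulation time window reduces to a binary search on sorted log_time values. The following Python snippet is a simplified model of that idea and not the MCAP index format itself; `offsets_in_range` and the parallel lists are hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import List


def offsets_in_range(
    log_times: List[int], offsets: List[int], start_ns: int, end_ns: int
) -> List[int]:
    """Return offsets of messages with start_ns <= log_time <= end_ns.

    log_times must be sorted ascending; offsets[i] belongs to log_times[i].
    """
    lo = bisect_left(log_times, start_ns)
    hi = bisect_right(log_times, end_ns)
    return offsets[lo:hi]


# Four messages at 0, 100, 200 and 300 ns of simulation time
log_times = [0, 100, 200, 300]
offsets = [10, 20, 30, 40]
hits = offsets_in_range(log_times, offsets, 100, 250)
```

If log_time instead held a middleware receive time, the same lookup would answer a different question (when a message arrived), and finding a given simulation time would require scanning the messages.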

pmai avatar Nov 24 '24 18:11 pmai