Add a handler that appends a static date field to all outbound messages

Open salsferrazza opened this issue 2 years ago • 2 comments

Some feeds only provide timestamps as duration past midnight, assuming prior context of what day the processed messages are from. Since the messages themselves don't have this context, it would be useful to be able to append a statically specified date, convert that to Unix milliseconds, and append it as a manufactured int field to outbound messages. This would reduce friction for downstream SQL analytics on messages ingested from these feeds.

e.g., --message_handlers AppendDateHandler:date=20291102

would turn that date into its UNIX seconds equivalent and append that as a column to all outbound messages.
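A rough sketch of the conversion described above, in plain Python. The function names and the `date_epoch_seconds` field name are hypothetical and not part of the transcoder's handler API, and midnight is assumed to be UTC, which may not match the feed's own time zone.

```python
from datetime import datetime, timezone


def date_to_epoch_seconds(yyyymmdd: str) -> int:
    """Return UNIX seconds for midnight of a YYYYMMDD date (UTC is an assumption)."""
    day = datetime.strptime(yyyymmdd, "%Y%m%d").replace(tzinfo=timezone.utc)
    return int(day.timestamp())


def append_static_date(message: dict, yyyymmdd: str,
                       field: str = "date_epoch_seconds") -> dict:
    """Return a copy of the message with the manufactured int field added.

    The field name is a placeholder chosen for this sketch.
    """
    out = dict(message)
    out[field] = date_to_epoch_seconds(yyyymmdd)
    return out
```

Calling `append_static_date(msg, "20291102")` on each outbound message would give downstream SQL a plain integer column to join or filter on.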

salsferrazza avatar Apr 11 '23 15:04 salsferrazza

This could be an additional option to the timestamp pull forward handler.

https://github.com/GoogleCloudPlatform/market-data-transcoder/blob/main/transcoder/message/handler/TimestampPullForwardHandler.py

mservidio avatar Apr 12 '23 13:04 mservidio

Good point, now with handlers being somewhat configurable, this could possibly just become a TimestampHandler with several modes: e.g.: pull forward from Seconds message, append static date, manufacture single timestamp column from a nanos timestamp + date, etc.

Algorithmically, normalizing dates from low-context streams (e.g., messages providing only nanos past midnight) might look something like:

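import time
from datetime import datetime

day = datetime.strptime('20191230', '%Y%m%d')       # YYYYMMDD from MessageHandler params
day_epoch = int(time.mktime(day.timetuple()))       # UNIX seconds at midnight in the host's local time zone
midnight_in_nanos = day_epoch * 1_000_000_000       # integer math avoids float rounding at nanosecond scale
epoch_nanos = midnight_in_nanos + msg['nanos_past_midnight']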

Then the handler can manufacture any combination of those values as fields, or unify them into a single field (like ts_epoch_nanos).
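As a sketch of how the two output styles could sit behind one switch: the function below either unifies everything into `ts_epoch_nanos` or emits the intermediate values as separate fields. The function name, the `unify` flag, and the `date_epoch_seconds` / `midnight_epoch_nanos` field names are made up for illustration, and UTC midnight is assumed rather than the feed's local midnight.

```python
from datetime import datetime, timezone

NANOS_PER_SECOND = 1_000_000_000


def manufacture_timestamp_fields(msg: dict, yyyymmdd: str, unify: bool = True) -> dict:
    """Return a copy of msg with date-derived timestamp fields added.

    unify=True adds a single ts_epoch_nanos field; unify=False adds the
    intermediate values as separate fields instead.
    """
    day = datetime.strptime(yyyymmdd, "%Y%m%d").replace(tzinfo=timezone.utc)
    day_epoch = int(day.timestamp())               # UNIX seconds at UTC midnight (assumption)
    midnight_nanos = day_epoch * NANOS_PER_SECOND  # stays an int, so no precision loss
    out = dict(msg)
    if unify:
        out["ts_epoch_nanos"] = midnight_nanos + msg["nanos_past_midnight"]
    else:
        out["date_epoch_seconds"] = day_epoch
        out["midnight_epoch_nanos"] = midnight_nanos
    return out
```

Keeping the arithmetic in integers matters here: epoch nanoseconds are around 10^18, which is beyond what a float can represent to the nanosecond.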

salsferrazza avatar Apr 12 '23 14:04 salsferrazza