
[Feature Request]: Spanner to cloud storage, custom output format

Open darkmakukudo opened this issue 1 year ago • 0 comments

Related Template(s)

Spanner change events to cloud storage

What feature(s) are you requesting?

I set up a Dataflow job that listens to Spanner change events and dumps them to Cloud Storage. It works fine; however, each record carries a lot of unneeded fields that make the output files large. Example record:

{"partitionToken":"token here","isLastRecordInTransactionInPartition":true,"recordSequence":"00000000","tableName":"table","rowType":[],"mods":[],"modType":"INSERT","valueCaptureType":"OLD_AND_NEW_VALUES","numberOfRecordsInTransaction":1,"numberOfPartitionsInTransaction":1,"metadata":{"com.google.cloud.teleport.v2.ChangeStreamRecordMetadata":{"partitionToken":"-","recordTimestamp":1727773949740589,"partitionStartTimestamp":1727772327871000,"partitionEndTimestamp":253402300799999999,"partitionCreatedAt":1727772505654159,"partitionScheduledAt":1727772506852883,"partitionRunningAt":1727772506981536,"queryStartedAt":-62135596800000000,"recordStreamStartedAt":1727773949836000,"recordStreamEndedAt":1727773949836000,"recordReadAt":1727773949836000,"totalStreamTimeMillis":2034,"numberOfRecordsRead":114}},"spannerDatabaseId":null,"spannerInstanceId":null,"outputMessageMetadata":null}

Is there a way to override the output format so that only the newValuesJson of each mod is written?
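For illustration, the desired trimming could be approximated by post-processing the files the job already writes. The sketch below is only that: it runs outside the template, and it assumes each output line is one JSON record shaped like the example above, with each entry of `mods` carrying a `newValuesJson` string (the example shows `mods` as empty, so that inner shape is an assumption). The function name `extract_new_values` is hypothetical.

```python
import json


def extract_new_values(record_line):
    """Return the parsed newValuesJson of every mod in one change record line.

    Assumes (not confirmed by the example in this issue) that each mod is an
    object with a "newValuesJson" field holding a JSON-encoded string.
    """
    record = json.loads(record_line)
    new_values = []
    # "mods" may be missing, null, or empty; treat all three as "no mods".
    for mod in record.get("mods") or []:
        raw = mod.get("newValuesJson")
        if raw is None:
            continue
        # Decode the nested JSON string; pass through if it is already parsed.
        new_values.append(json.loads(raw) if isinstance(raw, str) else raw)
    return new_values


# Hypothetical record with one mod, used only to exercise the function.
sample_line = json.dumps({
    "tableName": "table",
    "modType": "INSERT",
    "mods": [{
        "keysJson": "{\"id\":\"1\"}",
        "oldValuesJson": "{}",
        "newValuesJson": "{\"name\":\"alice\"}",
    }],
})

if __name__ == "__main__":
    for values in extract_new_values(sample_line):
        print(json.dumps(values))
```

This reduces file size only after the fact; it would still be preferable for the template to write the reduced record directly.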

darkmakukudo, Oct 01 '24 10:10