[Feature Request]: Spanner to Cloud Storage, custom output format
Related Template(s)
Spanner change events to cloud storage
What feature(s) are you requesting?
I set up a Dataflow job that listens to Spanner change events and writes them to Cloud Storage. It works fine; however, each record carries a lot of fields I don't need, which makes the output files large. A sample output record:
{"partitionToken":"token here","isLastRecordInTransactionInPartition":true,"recordSequence":"00000000","tableName":"table","rowType":[],"mods":[],"modType":"INSERT","valueCaptureType":"OLD_AND_NEW_VALUES","numberOfRecordsInTransaction":1,"numberOfPartitionsInTransaction":1,"metadata":{"com.google.cloud.teleport.v2.ChangeStreamRecordMetadata":{"partitionToken":"-","recordTimestamp":1727773949740589,"partitionStartTimestamp":1727772327871000,"partitionEndTimestamp":253402300799999999,"partitionCreatedAt":1727772505654159,"partitionScheduledAt":1727772506852883,"partitionRunningAt":1727772506981536,"queryStartedAt":-62135596800000000,"recordStreamStartedAt":1727773949836000,"recordStreamEndedAt":1727773949836000,"recordReadAt":1727773949836000,"totalStreamTimeMillis":2034,"numberOfRecordsRead":114}},"spannerDatabaseId":null,"spannerInstanceId":null,"outputMessageMetadata":null}
Is there a way to override the output format so that only the newValuesJson of each mod is written?
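To make the desired output concrete, here is a minimal sketch of the reduction I have in mind, written as a post-processing step over one line of the current output. It is not part of the template. It assumes each entry in "mods" has a "newValuesJson" field holding a JSON-encoded string (the sample above has an empty "mods" array, so the entry shape in the test data is an assumption), and `extract_new_values` is a hypothetical helper name.

```python
import json


def extract_new_values(record_line):
    """Return the decoded newValuesJson payloads from one change-record line."""
    record = json.loads(record_line)
    values = []
    for mod in record.get("mods", []):
        new_values = mod.get("newValuesJson")
        if new_values in (None, ""):
            # Nothing to emit for this mod.
            continue
        # Assumption: newValuesJson is a JSON-encoded string; decode it if so.
        if isinstance(new_values, str):
            new_values = json.loads(new_values)
        values.append(new_values)
    return values


# Hypothetical record; the mod entry shape is assumed, not taken from real output.
sample = json.dumps({
    "tableName": "table",
    "modType": "INSERT",
    "mods": [
        {
            "keysJson": "{\"id\":\"1\"}",
            "oldValuesJson": "{}",
            "newValuesJson": "{\"name\":\"a\"}",
        }
    ],
})

for row in extract_new_values(sample):
    print(json.dumps(row))
```

Ideally the template would do this reduction itself before writing, so the metadata never reaches Cloud Storage in the first place.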