DataflowTemplates
Hive Partitioning isn't working, the DD part of YYYY-MM-DD is day of year (133)
Related Template(s)
Cloud_PubSub_to_Avro
What happened?
This is using Google's Dataflow.
Here are some slices of the relevant Terraform code. This is the template being used.
template_gcs_path = "gs://dataflow-templates/latest/Cloud_PubSub_to_Avro"
And here is the date formatting.
outputDirectory = "gs://${var.gcs_path}/${var.topic_name}/dt=YYYY-MM-DD"
Instead of getting dt=YYYY-MM-DD, I'm seeing this: dt=2022-05-133
Note the day of year at the end of the date string.
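To make the symptom concrete, here is a small sketch of the arithmetic. It assumes the file was written on 13 May 2022, which is the date in 2022 whose day of year is 133, and it uses the JDK's java.time formatter purely as an illustration, not the template's own code. The class name DayOfYearDemo and its format helper are hypothetical. In Java-style date patterns, lowercase d is day of month and uppercase D is day of year, which would produce exactly this kind of output if DD ends up being interpreted as a raw pattern letter.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class DayOfYearDemo {
    // Hypothetical helper: formats a date with a java.time pattern,
    // used here only to compare the 'd' and 'D' pattern letters.
    static String format(LocalDate date, String pattern) {
        return date.format(DateTimeFormatter.ofPattern(pattern));
    }

    public static void main(String[] args) {
        // Assumed write date, inferred from "dt=2022-05-133" in the report.
        LocalDate date = LocalDate.of(2022, 5, 13);

        System.out.println(date.getDayOfYear());         // 133
        System.out.println(format(date, "yyyy-MM-dd"));  // 2022-05-13  (day of month)
        System.out.println(format(date, "yyyy-MM-D"));   // 2022-05-133 (day of year)
    }
}
```

The expected partition directory for that date would be dt=2022-05-13.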
Beam Version
2.35.0
Relevant log output
No response
I ran into the same problem. It seems to be caused by an error in WindowedFilenamePolicy in the common package. I created a PR to fix it.
Fixed