PARQUET-1292 Adding constructors to ProtoParquetWriter with writeSpecsCompliant flag
This PR adds constructors to ProtoParquetWriter with flags for specs compliant writer (based on #411). Now a specs compliant ProtoParquetWriter can be created as
new ProtoParquetWriter(file, protoMessageClass, true);
Thanks @chawlakunal for the commit, it looks good to me. I believe there is a slight mistake in a comment (see my review above). You may want to open a Jira ticket for this commit and add the ticket name in the message ;)
@BenoitHanotte Happy to contribute!!1
I have created a JIRA for this but I am unable to assign it to myself.
@chawlakunal, I added you the list of contributors so that you can assign JIRA-s to yourself. I assigned this one to you to make sure it works.
@BenoitHanotte Can this be approved and merged?
Hi, Thank you for your work. I looked at it briefly, I will send you detailed opinion after weekend. I am not sure if we need so many new constructors. We should publish the Configuration parameter from superclass.
Hi @lukasnalezenec Waiting to hear from you.
+1 on finding a way to pass the writeSpecsCompliant flag directly to the ProtoParquetWriter and simplify things.
Right now we have to do something like:
Configuration conf = new Configuration();
ProtoWriteSupport.setWriteSpecsCompliant(conf, true);
ParquetWriter<MyMessage> writer
= new ParquetWriter<>(file, new ProtoWriteSupport<MyMessage>(cls), CompressionCodecName.GZIP, 256 * 1024 * 1024, 1 * 1024 * 1024, 1 * 1024 * 1024, true, false, ParquetWriter.DEFAULT_WRITER_VERSION, conf);
I am still waiting for comments from @lukasnalezenec
I think that problem is that there was no ProtoParquetWriter constructor with conf parameter. I would recommend adding the conf parameter and not writeSpecsCompliant parameter. AFAIK there is no elegant solution how we can accept writeSpecsCompliant parameter, set its value to configuration and pass it super constructor. (see getConfigWithWriteSpecsCompliant())
I would take existing constructors and add those two new constructors:
ProtoParquetWriter(Path file, Class<? extends Message> protoMessage, Configuration conf);
ProtoParquetWriter(Path file, Class<? extends Message> protoMessage, ParquetFileWriter.Mode mode, CompressionCodecName compressionCodecName, int blockSize, int pageSize, int dictionaryPageSize, boolean enableDictionary, boolean validating, WriterVersion writerVersion, Configuration conf
We might also extend some other existing constructors with conf parameter. (I am not sure if it is necessary)
After this change we might turn on the parameter like this:
Configuration conf = new Configuration();
ProtoWriteSupport.setWriteSpecsCompliant(conf, true);
ParquetWriter<MyMessage> writer = new ProtoParquetWriter<>(file, cls, conf);`
If you disagree with me, I'm open to another solution.
@lukasnalezenec From my perspective, it looks great!
Hello @costimuraru @chawlakunal As we might again be adding a new flag if https://github.com/apache/parquet-mr/pull/410 passes, I think we may want to follow @lukasnalezenec suggestion and try to have constructors that accept the conf object, which would allow to avoid duplicating all the existing constructors for each new flag.
@BenoitHanotte I'll work on that sometime this week.
@chawlakunal any luck?
@chawlakunal Any news on this?