parquet-java icon indicating copy to clipboard operation
parquet-java copied to clipboard

PARQUET-1292 Adding constructors to ProtoParquetWriter with writeSpecsCompliant flag

Open chawlakunal opened this issue 7 years ago • 14 comments

This PR adds constructors to ProtoParquetWriter with flags for specs compliant writer (based on #411). Now a specs compliant ProtoParquetWriter can be created as

new ProtoParquetWriter(file, protoMessageClass, true);

chawlakunal avatar May 01 '18 20:05 chawlakunal

Thanks @chawlakunal for the commit, it looks good to me. I believe there is a slight mistake in a comment (see my review above). You may want to open a Jira ticket for this commit and add the ticket name in the message ;)

BenoitHanotte avatar May 03 '18 09:05 BenoitHanotte

@BenoitHanotte Happy to contribute!!1

I have created a JIRA for this but I am unable to assign it to myself.

chawlakunal avatar May 03 '18 15:05 chawlakunal

@chawlakunal, I added you the list of contributors so that you can assign JIRA-s to yourself. I assigned this one to you to make sure it works.

zivanfi avatar May 04 '18 09:05 zivanfi

@BenoitHanotte Can this be approved and merged?

chawlakunal avatar May 04 '18 23:05 chawlakunal

Hi, Thank you for your work. I looked at it briefly, I will send you detailed opinion after weekend. I am not sure if we need so many new constructors. We should publish the Configuration parameter from superclass.

lukasnalezenec avatar May 05 '18 07:05 lukasnalezenec

Hi @lukasnalezenec Waiting to hear from you.

chawlakunal avatar May 07 '18 17:05 chawlakunal

+1 on finding a way to pass the writeSpecsCompliant flag directly to the ProtoParquetWriter and simplify things.

Right now we have to do something like:

Configuration conf = new Configuration();
ProtoWriteSupport.setWriteSpecsCompliant(conf, true);

ParquetWriter<MyMessage> writer 
   = new ParquetWriter<>(file, new ProtoWriteSupport<MyMessage>(cls), CompressionCodecName.GZIP, 256 * 1024 * 1024, 1 * 1024 * 1024,  1 * 1024 * 1024, true, false, ParquetWriter.DEFAULT_WRITER_VERSION, conf);

costimuraru avatar May 15 '18 12:05 costimuraru

I am still waiting for comments from @lukasnalezenec

chawlakunal avatar May 15 '18 14:05 chawlakunal

I think that problem is that there was no ProtoParquetWriter constructor with conf parameter. I would recommend adding the conf parameter and not writeSpecsCompliant parameter. AFAIK there is no elegant solution how we can accept writeSpecsCompliant parameter, set its value to configuration and pass it super constructor. (see getConfigWithWriteSpecsCompliant())

I would take existing constructors and add those two new constructors:

ProtoParquetWriter(Path file, Class<? extends Message> protoMessage, Configuration conf);

ProtoParquetWriter(Path file, Class<? extends Message> protoMessage, ParquetFileWriter.Mode mode, CompressionCodecName compressionCodecName, int blockSize, int pageSize, int dictionaryPageSize, boolean enableDictionary, boolean validating, WriterVersion writerVersion, Configuration conf

We might also extend some other existing constructors with conf parameter. (I am not sure if it is necessary)

After this change we might turn on the parameter like this:

Configuration conf = new Configuration();
ProtoWriteSupport.setWriteSpecsCompliant(conf, true);

ParquetWriter<MyMessage> writer  = new ProtoParquetWriter<>(file, cls, conf);`

If you disagree with me, I'm open to another solution.

lukasnalezenec avatar May 18 '18 15:05 lukasnalezenec

@lukasnalezenec From my perspective, it looks great!

costimuraru avatar May 19 '18 11:05 costimuraru

Hello @costimuraru @chawlakunal As we might again be adding a new flag if https://github.com/apache/parquet-mr/pull/410 passes, I think we may want to follow @lukasnalezenec suggestion and try to have constructors that accept the conf object, which would allow to avoid duplicating all the existing constructors for each new flag.

BenoitHanotte avatar May 29 '18 17:05 BenoitHanotte

@BenoitHanotte I'll work on that sometime this week.

chawlakunal avatar May 29 '18 17:05 chawlakunal

@chawlakunal any luck?

costimuraru avatar Jun 11 '18 21:06 costimuraru

@chawlakunal Any news on this?

georgehdd avatar Sep 23 '19 04:09 georgehdd