HIVE-28377: Add support for hive.output.file.extension to HCatStorer
What changes were proposed in this pull request?
HCatStorer now respects hive.output.file.extension for the output files it writes.
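The behavior described above amounts to "append the configured extension to each output file name, and change nothing when the property is unset." Below is a minimal, dependency-free sketch of that rule. It is not the PR's code. The `outputFileName` helper is hypothetical, a plain `Map` stands in for the Hadoop job configuration, and `part-m-00000` is used only as an example base name.

```java
import java.util.HashMap;
import java.util.Map;

class OutputExtensionSketch {
    // Property name as given in the PR title.
    static final String OUTPUT_FILE_EXTENSION = "hive.output.file.extension";

    // Hypothetical helper: appends the configured extension (if any) to a base file name.
    static String outputFileName(Map<String, String> conf, String baseName) {
        String ext = conf.get(OUTPUT_FILE_EXTENSION);
        if (ext == null || ext.isEmpty()) {
            return baseName; // property unset: keep the previous naming
        }
        return baseName + ext;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(outputFileName(conf, "part-m-00000")); // part-m-00000
        conf.put(OUTPUT_FILE_EXTENSION, ".csv");
        System.out.println(outputFileName(conf, "part-m-00000")); // part-m-00000.csv
    }
}
```

Treating an empty value the same as an unset one keeps existing jobs unaffected unless they opt in.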
Why are the changes needed?
Brings HCatStorer's feature set more in line with Hive's.
Does this PR introduce any user-facing change?
Yes. HCatStorer now honors a property that it previously ignored.
Is the change a dependency upgrade?
No
How was this patch tested?
TestHCatExtension was added.
Quality Gate passed
Issues
1 New issue
0 Accepted issues
Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code
Can the file extension be detected instead of the user setting it explicitly? An explicit setting may not work well if a single job writes to different file formats. For example, a user sets hive.output.file.extension to '.parquet', but the same data is then loaded into both Parquet and ORC tables.
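The reviewer's concern can be illustrated by comparing a single job-wide value with per-table detection. The sketch below is illustrative only. The format-to-extension mapping is an assumption, not a Hive or HCatalog default, and both helper methods are hypothetical.

```java
import java.util.Locale;
import java.util.Map;

class ExtensionDetectionSketch {
    // Hypothetical format-to-extension table; the values are illustrative assumptions.
    static final Map<String, String> DEFAULT_EXTENSIONS = Map.of(
            "parquet", ".parquet",
            "orc", ".orc");

    // Job-wide approach: every table written by the job gets the same configured value.
    static String fromJobSetting(String configuredExtension, String tableFormat) {
        return configuredExtension; // tableFormat is ignored, which is the concern raised above
    }

    // Detection approach: derive the extension from each target table's storage format.
    static String detectExtension(String tableFormat) {
        return DEFAULT_EXTENSIONS.getOrDefault(tableFormat.toLowerCase(Locale.ROOT), "");
    }

    public static void main(String[] args) {
        for (String format : new String[] {"parquet", "orc"}) {
            System.out.println(format + ": job setting -> " + fromJobSetting(".parquet", format)
                    + ", detected -> " + detectExtension(format));
        }
    }
}
```

With a job-wide `.parquet`, the ORC table's files would also end in `.parquet`. Per-table detection avoids that, at the cost of maintaining a mapping for every supported format.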
@yigress For my implementation, I tried to match what Hive's existing implementation does. If a sequential job needs different extensions for different tables, it can adjust the setting between HCat queries. Does Hive do anything else to support that case that I missed?
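The workaround above relies on each store reading the property when it runs, so changing the value between sequential stores gives each table its own extension. The sketch below simulates that, assuming that read-at-store-time behavior. The `store` method and the table names are hypothetical, and a plain `Map` stands in for the job configuration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SequentialStoresSketch {
    static final String OUTPUT_FILE_EXTENSION = "hive.output.file.extension";

    // Hypothetical stand-in for one store: it reads the extension at the moment it runs.
    static String store(Map<String, String> conf, String table) {
        return table + "/part-m-00000" + conf.getOrDefault(OUTPUT_FILE_EXTENSION, "");
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        List<String> written = new ArrayList<>();

        conf.put(OUTPUT_FILE_EXTENSION, ".csv");
        written.add(store(conf, "sales_csv"));

        // Adjust the setting before the next, separate store.
        conf.put(OUTPUT_FILE_EXTENSION, ".tsv");
        written.add(store(conf, "sales_tsv"));

        written.forEach(System.out::println);
    }
}
```

This only works when the stores run one after another. Stores that share one configuration at the same time would all see whichever value is set.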
In Hive, the property is only used for the text file format: https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L916
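The reviewer points out that Hive applies the property only to text output. A minimal sketch of that gating could look like the following. It is not Hive's code from Utilities.java. The `StorageFormat` enum and the `extensionFor` helper are hypothetical stand-ins for however HCatStorer would learn the target table's format.

```java
import java.util.HashMap;
import java.util.Map;

class TextOnlyExtensionSketch {
    static final String OUTPUT_FILE_EXTENSION = "hive.output.file.extension";

    // Hypothetical stand-in for the target table's storage format.
    enum StorageFormat { TEXT, ORC, PARQUET }

    // Apply the configured extension only when the target table is stored as text.
    static String extensionFor(Map<String, String> conf, StorageFormat format) {
        if (format != StorageFormat.TEXT) {
            return "";
        }
        return conf.getOrDefault(OUTPUT_FILE_EXTENSION, "");
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(OUTPUT_FILE_EXTENSION, ".csv");
        for (StorageFormat f : StorageFormat.values()) {
            System.out.println(f + " -> '" + extensionFor(conf, f) + "'");
        }
    }
}
```

Under this gating, a job that writes to both a text table and an ORC table would only rename the text table's files.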
+1 for this useful feature. If a single job only loads into one table and the table format is known, the user can switch the property's value between jobs. I am not an expert in Pig, but I have the impression that one Pig job can load into multiple tables simultaneously; in that case, the user needs to be careful about setting it.