Hive Metastore Export: Add Support for Legacy Hive SerDe-backed Tables in Spark 3.x
Spark 3.x no longer returns Hive DDL from SHOW CREATE TABLE for legacy Hive SerDe table definitions, as documented in the SQL migration guide: https://spark.apache.org/docs/latest/sql-migration-guide.html
Specifically, users need to add an extra keyword to generate the DDL for Hive SerDe tables:
In Spark 3.0, SHOW CREATE TABLE table_identifier always returns Spark DDL, even when the given table is a Hive SerDe table. For generating Hive DDL, use SHOW CREATE TABLE table_identifier AS SERDE command instead.
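For example, the Hive DDL can still be retrieved explicitly (a minimal PySpark sketch, assuming a Hive SerDe table foo.bar like the one reproduced below):

# AS SERDE makes Spark 3.x emit the original Hive DDL instead of Spark DDL
ddl = spark.sql("SHOW CREATE TABLE foo.bar AS SERDE").first()[0]
print(ddl)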
Workaround: users who run into this can use a cluster running a Spark 2.x runtime until we add support for this.
Example DDL to reproduce this:
CREATE EXTERNAL TABLE `foo`.`bar`
(`a` STRING, `b` STRING, `c` TIMESTAMP, `d` INT)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
'field.delim' = '1',
'serialization.format' = '1'
)
STORED AS
INPUTFORMAT 'org.apache.hadoop.mapred.SequenceFileInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.mapred.SequenceFileOutputFormat'
LOCATION 'dbfs:/tmp/foo.db/bar/'
TBLPROPERTIES (
'transient_lastDdlTime' = '123456789'
)
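Running SHOW CREATE TABLE against this table from PySpark on a Spark 3.x runtime then fails:

# Raises the AnalysisException below; this legacy SerDe configuration
# cannot be rendered as Spark DDL
spark.sql("SHOW CREATE TABLE foo.bar")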
Error message:
AnalysisException: Failed to execute SHOW CREATE TABLE against table `foo`.`bar`, which is created by Hive and uses the following unsupported serde configuration
@mrchristine how should we do the proper fix? Is it something that can be fixed within the workspace migration script, or do we need help from the Spark team?
For example, is it as simple as having the migration script populate the DDL with the proper keyword (AS SERDE)?
We don't need the Spark team's help.
We can catch the AnalysisException, look for the "created by Hive and uses the following unsupported serde configuration" message, and then append AS SERDE to the command to get the DDL.
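A minimal sketch of that fallback, assuming a PySpark session (get_table_ddl is a hypothetical helper, not the actual migration-script function):

from pyspark.sql.utils import AnalysisException

def get_table_ddl(spark, table):
    # Hypothetical helper: try the plain Spark 3.x statement first.
    try:
        return spark.sql(f"SHOW CREATE TABLE {table}").first()[0]
    except AnalysisException as e:
        # Legacy Hive SerDe tables fail with this marker; retry with
        # AS SERDE to get the original Hive DDL instead.
        if "unsupported serde configuration" in str(e):
            return spark.sql(f"SHOW CREATE TABLE {table} AS SERDE").first()[0]
        raise

On Spark 2.x runtimes the first statement should already return the Hive DDL, so the fallback branch is simply never reached.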