Incoming batch schema is not compatible with the table's one
I got the exception below when ingesting data from SQL Server into Hudi:

```
org.apache.hudi.exception.SchemaCompatibilityException: Incoming batch schema is not compatible with the table's one
	at org.apache.hudi.HoodieSparkSqlWriter$.deduceWriterSchema(HoodieSparkSqlWriter.scala:496)
	at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:314)
	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:150)
	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:47)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:98)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:109)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:169)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:95)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:98)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:94)
	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:584)
	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:176)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:584)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
```
Source table DDL is:

```sql
-- auto-generated definition
create table Address
(
    Id         int identity
        constraint [xxxx] primary key,
    Line1      nvarchar(128),
    Line2      nvarchar(128),
    ccode      nvarchar(2) not null
        constraint [xxxx] references Country,
    XEID       int not null,
    cbUser     nvarchar(48),
    MuUser     int not null,
    MyUser     nvarchar(48),
    CreateDate datetime not null,
    Latitude   decimal(12, 9),
    Longitude  decimal(12, 9)
)
```
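For context, an ingest like this goes through the Spark datasource with a set of Hudi writer options. The sketch below shows them as a plain dict so the option keys are easy to check; the table and database names are placeholders, not values taken from this issue:

```python
# Hedged sketch of typical Hudi writer options for ingesting the Address table.
# "testing" / "address" are placeholder names, not from this issue.
hudi_options = {
    "hoodie.table.name": "address",
    "hoodie.database.name": "testing",  # optional, and easy to forget
    "hoodie.datasource.write.recordkey.field": "Id",
    "hoodie.datasource.write.precombine.field": "CreateDate",
    "hoodie.datasource.write.operation": "upsert",
}

# In a real job these would be passed as:
#   df.write.format("hudi").options(**hudi_options).mode("append").save(path)
print(sorted(hudi_options))
```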
**Environment Description**

* Hudi version : 0.9
* Spark version : 3.0.1
* Hive version : 3.1
* Hadoop version : 3.2.2
* Storage (HDFS/S3/GCS..) :
* Running on Docker? (yes/no) : no
@njalan This happens when the source schema is not backward compatible with the Hudi table schema. Can you give us more insight into what schema changes you are making?
@ad1happy2go Why is there an "Incoming schema (canonicalized)"? Also, the Table's schema does not match the source table schema; only the Incoming schema matches the source table. Below are the schema details:

**Incoming schema**

```json
{
  "type" : "record",
  "name" : "address_record",
  "namespace" : "hoodie.address",
  "fields" : [
    { "name" : "Id", "type" : [ "null", "int" ], "default" : null },
    { "name" : "Line1", "type" : [ "null", "string" ], "default" : null },
    { "name" : "Line2", "type" : [ "null", "string" ], "default" : null },
    { "name" : "City", "type" : [ "null", "string" ], "default" : null },
    { "name" : "State", "type" : [ "null", "string" ], "default" : null },
    { "name" : "Zip", "type" : [ "null", "string" ], "default" : null },
    { "name" : "CountryCode", "type" : [ "null", "string" ], "default" : null },
    { "name" : "CreateByUserID", "type" : [ "null", "int" ], "default" : null },
    { "name" : "CreateByUser", "type" : [ "null", "string" ], "default" : null },
    { "name" : "ModifyByUserID", "type" : [ "null", "int" ], "default" : null },
    { "name" : "ModifyByUser", "type" : [ "null", "string" ], "default" : null },
    { "name" : "CreateDate", "type" : [ "null", { "type" : "long", "logicalType" : "timestamp-micros" } ], "default" : null },
    { "name" : "ModifyDate", "type" : [ "null", { "type" : "long", "logicalType" : "timestamp-micros" } ], "default" : null },
    { "name" : "Latitude", "type" : [ "null", { "type" : "fixed", "name" : "fixed", "namespace" : "hoodie.address.address_record.Latitude", "size" : 6, "logicalType" : "decimal", "precision" : 12, "scale" : 9 } ], "default" : null },
    { "name" : "Longitude", "type" : [ "null", { "type" : "fixed", "name" : "fixed", "namespace" : "hoodie.address.address_record.Longitude", "size" : 6, "logicalType" : "decimal", "precision" : 12, "scale" : 9 } ], "default" : null }
  ]
}
```

**Incoming schema (canonicalized)** — byte-for-byte identical to the incoming schema above, so it is not repeated here.

**Table's schema**

```json
{
  "type" : "record",
  "name" : "address_record",
  "namespace" : "hoodie.address",
  "fields" : [
    { "name" : "addressid", "type" : [ "null", "string" ], "doc" : "from deserializer", "default" : null },
    { "name" : "addressline1", "type" : [ "null", "string" ], "doc" : "from deserializer", "default" : null },
    { "name" : "addressline2", "type" : [ "null", "string" ], "doc" : "from deserializer", "default" : null },
    { "name" : "countrycode", "type" : [ "null", "string" ], "doc" : "from deserializer", "default" : null },
    { "name" : "admin1code", "type" : [ "null", "string" ], "doc" : "from deserializer", "default" : null },
    { "name" : "admin2code", "type" : [ "null", "string" ], "doc" : "from deserializer", "default" : null },
    { "name" : "admin3code", "type" : [ "null", "string" ], "doc" : "from deserializer", "default" : null },
    { "name" : "postalcode", "type" : [ "null", "string" ], "doc" : "from deserializer", "default" : null },
    { "name" : "transactiondate", "type" : [ "null", "string" ], "doc" : "from deserializer", "default" : null },
    { "name" : "createdby", "type" : [ "null", "string" ], "doc" : "from deserializer", "default" : null },
    { "name" : "createdate", "type" : [ "null", "string" ], "doc" : "from deserializer", "default" : null },
    { "name" : "modifydate", "type" : [ "null", "string" ], "doc" : "from deserializer", "default" : null },
    { "name" : "modifiedby", "type" : [ "null", "string" ], "doc" : "from deserializer", "default" : null }
  ]
}
```
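A quick way to see why Hudi rejects this batch is to diff the field names of the two schemas: they share no fields at all. This is a simplification of Hudi's actual check (which also compares types and nullability, and Avro field names are case-sensitive), but it makes the mismatch obvious:

```python
# Field names from the incoming and table Avro schemas shown above.
incoming_fields = {
    "Id", "Line1", "Line2", "City", "State", "Zip", "CountryCode",
    "CreateByUserID", "CreateByUser", "ModifyByUserID", "ModifyByUser",
    "CreateDate", "ModifyDate", "Latitude", "Longitude",
}
table_fields = {
    "addressid", "addressline1", "addressline2", "countrycode",
    "admin1code", "admin2code", "admin3code", "postalcode",
    "transactiondate", "createdby", "createdate", "modifydate", "modifiedby",
}

common = incoming_fields & table_fields
missing_in_batch = table_fields - incoming_fields

# No overlap at all, even though both records are named "address_record":
print(f"fields in common: {len(common)}")                      # 0
print(f"table fields absent from batch: {len(missing_in_batch)}")  # 13
```

Zero common fields is a strong hint that the writer is comparing against a schema that never described this data, rather than a legitimately evolved version of it.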
@njalan I can see the table schema is completely different from the incoming schema, and the canonicalized schema is identical to the incoming schema.
Is your incoming schema supposed to differ from the table schema? You may need to transform the schema before upserting to Hudi.
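If the incoming data genuinely needed to match an existing table schema, a rename step before the write could bridge the two. A minimal stdlib sketch; the `COLUMN_MAP` below is a hypothetical correspondence for illustration, not one established anywhere in this thread:

```python
# Sketch: rename incoming record fields to the target table's field names
# before writing. COLUMN_MAP is hypothetical -- build it from your real schemas.
COLUMN_MAP = {
    "Id": "addressid",
    "Line1": "addressline1",
    "Line2": "addressline2",
    "CountryCode": "countrycode",
}

def remap_record(record: dict) -> dict:
    """Return a new record keyed by the target table's column names,
    dropping fields with no mapping."""
    return {COLUMN_MAP[k]: v for k, v in record.items() if k in COLUMN_MAP}

incoming = {"Id": 1, "Line1": "10 Main St", "Line2": None, "CountryCode": "US"}
print(remap_record(incoming))
# {'addressid': 1, 'addressline1': '10 Main St', 'addressline2': None, 'countrycode': 'US'}
```

In Spark the same idea would be a chain of `withColumnRenamed` (or a `select` with aliases) on the DataFrame before the Hudi write.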
@ad1happy2go I dropped the Hudi table and also removed all of its files, but I still get the same error. However, if I rename the target table from address to address_1, the Spark job runs successfully.
@njalan That suggests the old data under the name 'address' was not getting deleted properly, I guess. Can you confirm?
@ad1happy2go I am sure I completely removed all the data files; I tested this many times. It is weird how this Table's schema was generated, since it is totally different from the source table.
@njalan If it works when you change the table name to address_1, then there is most likely some residue from an old run of address. If you are able to reproduce this with a sample scenario, please let us know. Thanks.
@ad1happy2go I have another table named address in a different schema with the same issue. It works fine with Hudi 0.9 but not with Hudi 0.13. I think there is a bug in Hudi 0.13.
@njalan Interesting, thanks for all the effort, although I can't think of a reason for it. It would be really helpful if you could provide some dummy data or sample code that I can use to reproduce this.
@ad1happy2go I debugged the source code and found the reason. My target table is testing.address, and I did not set hoodie.database.name, but there is an existing table default.address. In my case Hudi picks up the default.address schema as the Table's schema, even though they are two totally different tables. Do I need to set hoodie.database.name for every table? Why not use the sync database name as the Hudi database name? This works fine in Hudi 0.9. Is this a bug, or can I raise a PR to use the sync database name as the Hudi database name when it is not specified?
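The failure mode described above can be sketched as a table-identifier lookup. This is an illustration of the behavior being reported, not Hudi's actual code; the catalog contents and database names are stand-ins:

```python
# Illustration of the reported failure mode (not Hudi's actual code):
# when the writer's database name is unset and falls back to "default",
# an unrelated "address" table in another database supplies the wrong schema.
SYNC_DATABASE = "testing"  # the database the table is actually synced to

catalog = {
    ("default", "address"): "old, unrelated schema",
    ("testing", "address"): "schema matching the SQL Server source",
}

def resolve_schema(table, hoodie_database_name):
    # Buggy fallback: use "default" instead of the sync database name.
    db = hoodie_database_name if hoodie_database_name else "default"
    return catalog[(db, table)]

# Without hoodie.database.name, the wrong table's schema is picked up:
print(resolve_schema("address", None))           # old, unrelated schema
# With it set (or falling back to the sync database name, as the thread
# proposes), the right schema is found:
print(resolve_schema("address", SYNC_DATABASE))  # schema matching the source
```

This also explains why renaming the target table to address_1 "fixed" it: there was no default.address_1 to collide with.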
@njalan Great, nice catch! Actually it should go the other way around: it should set the sync database name when hoodie.database.name is not set. I guess we can raise that PR to fix it. Thanks for the contribution.
@ad1happy2go I just raised PR https://github.com/apache/hudi/pull/10308. Can you please review it? It is my first PR for Hudi, so I am not sure whether it can be merged.
I checked; @danny is following up on this.