Liz Hurley
@KnightChess I added `spark.sql.parquet.mergeSchema true` to the Spark properties file, reconnected to the hudi-cli, and re-ran the repair command. The result was the same: org.apache.spark.sql.AnalysisException: cannot resolve '_hoodie_record_key'...
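For reference, a minimal sketch of the properties-file entry described above, assuming the standard `spark-defaults.conf` key-value syntax:

```properties
# Enables Parquet schema merging on read (assumed spark-defaults.conf syntax)
spark.sql.parquet.mergeSchema true
```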
Confirming that the list of files is empty: `List of files under partition: () => ` I did some debugging and this list seems to be empty because the `timeline`...
@KnightChess , yes there is duplicate data when running a query on the data:

```
val path = "s3:///tables/events"
val events = spark.read.format("hudi").option("hoodie.datasource.query.type", "read_optimized").load(path)
events.createOrReplaceTempView("events")
val dupeQuery = """select env_id, event_id, user_id,...
```
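The full SQL is truncated above; as a hypothetical, self-contained illustration of the kind of check it performs, this groups rows by their key columns and keeps any group that occurs more than once (the `Event` class and sample rows here are made up, not from the actual table):

```scala
// Hypothetical stand-in for the duplicate-detection query: group by the
// key columns and retain keys with more than one row.
case class Event(envId: Int, eventId: Long, userId: Long)

val events = Seq(
  Event(1, 100L, 7L), // duplicated pair
  Event(1, 100L, 7L),
  Event(1, 101L, 8L)
)

// Map from (env_id, event_id, user_id) to the number of rows sharing that key
val duplicates = events
  .groupBy(e => (e.envId, e.eventId, e.userId))
  .collect { case (key, rows) if rows.size > 1 => (key, rows.size) }
```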
Sorry for the confusion @nsivabalan. I reviewed the commands you specified to verify they were the same as what I tried. The main differences between what you did and...
Thanks @nsivabalan. (I must have flubbed the partition path in my copy/pasting - sorry about that.) I'll close this ticket and follow the jira.
Reopening: @nsivabalan what is the link for the jira? The link posted above is for this ticket.
Yes. I can write a dataframe to the same table, for example:

```
data.write
  .format("org.apache.hudi.Spark32PlusDefaultSource")
  .options(writeWithLocking)
  .mode("append")
  .save(tablePath)
```

where the writeWithLocking options are:

```
(hoodie.bulkinsert.shuffle.parallelism,2)
(hoodie.bulkinsert.sort.mode,NONE)
(hoodie.clean.async,false)
(hoodie.clean.automatic,false)
(hoodie.cleaner.policy.failed.writes,LAZY)
(hoodie.combine.before.insert,false)
...
```
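As a sketch, the options quoted above can be assembled as the `Map[String, String]` that Spark's `DataFrameWriter.options(...)` accepts. Only the keys visible in the comment are included; the list there is truncated, so this is not the complete configuration:

```scala
// Write options from the comment above, as a Map suitable for
// DataFrameWriter.options(...). The original list is truncated, so this
// subset is illustrative only.
val writeWithLocking: Map[String, String] = Map(
  "hoodie.bulkinsert.shuffle.parallelism" -> "2",
  "hoodie.bulkinsert.sort.mode"           -> "NONE",
  "hoodie.clean.async"                    -> "false",
  "hoodie.clean.automatic"                -> "false",
  "hoodie.cleaner.policy.failed.writes"   -> "LAZY",
  "hoodie.combine.before.insert"          -> "false"
)
```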
```
object UUIDRecordKeyDeleter {
  private val log = loggerForClass(UUIDRecordKeyDeleter.getClass)

  def query(tablePath: String, predicate: Column, queryType: String)(implicit
      spark: SparkSession
  ): DataFrame = {
    spark.read
      .format("hudi")
      .option(QUERY_TYPE.key(), queryType)
      .option(HoodieMetadataConfig.ENABLE.key(), "false")
      .load(tablePath)
      .where(predicate)
...
```
@ad1happy2go - thanks for the update. We will take another look. I have an integration test that reproduces this, but I'll need to extract it from our codebase and repackage it...
@ad1happy2go here is a small [repo](https://github.com/ehurheap/hudisupport) with code and instructions on how to reproduce this problem. It would be great if you could try it out and let me know...