
Compact data files failed: org.apache.spark.sql.AnalysisException: spark_catalog requires a single-part namespace, but got [hive, test]

lordk911 opened this issue 3 years ago · 5 comments

Version information: Spark 3.2.1; Iceberg org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:0.13.0; Java 1.8

```scala
import org.apache.iceberg.catalog.TableIdentifier
import org.apache.iceberg.expressions.Expressions
import org.apache.iceberg.hive.HiveCatalog
import org.apache.iceberg.spark.actions.SparkActions
import scala.collection.JavaConverters._

val catalog = new HiveCatalog()
catalog.setConf(spark.sparkContext.hadoopConfiguration)
val properties = Map("warehouse" -> "hdfs://xx", "uri" -> "thrift://xx:9083,thrift://yy:9083").asJava
catalog.initialize("hive", properties)
val table = catalog.loadTable(TableIdentifier.of("test", "aplog"))
SparkActions.get().rewriteDataFiles(table).filter(Expressions.lessThan("dtime", lasteHour)).option("target-file-size-bytes", String.valueOf(500 * 1024 * 1024)).execute()
```
[screenshot: AnalysisException stack trace]

lordk911 avatar Feb 10 '22 14:02 lordk911

You may be missing the Spark catalog configuration in your Spark conf. `TestSpark3Util.testLoadIcebergTable()` shows how to load an Iceberg table.
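
For example, a catalog named `hive` could be registered on the session like this (a minimal sketch using the standard Iceberg Spark catalog properties; the warehouse path and metastore URIs are the placeholders from the report):

```scala
import org.apache.spark.sql.SparkSession

// Register an Iceberg SparkCatalog named "hive" so identifiers like
// hive.test.aplog resolve through Spark's catalog machinery.
val spark = SparkSession.builder()
  .config("spark.sql.catalog.hive", "org.apache.iceberg.spark.SparkCatalog")
  .config("spark.sql.catalog.hive.type", "hive")
  .config("spark.sql.catalog.hive.uri", "thrift://xx:9083,thrift://yy:9083")
  .config("spark.sql.catalog.hive.warehouse", "hdfs://xx")
  .getOrCreate()
```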

Zhangg7723 avatar Feb 10 '22 15:02 Zhangg7723

@lordk911 the issue here is that the `HiveCatalog` you are creating in the code does not have the information required for Spark to actually locate the table. Instead, get the table reference through Spark by using:

https://github.com/apache/iceberg/blob/4d7837093bd19a42237446a9b656a248e106b789/spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/Spark3Util.java#L634-L649
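
A minimal sketch of that approach (assuming a Spark catalog named `hive` is configured as in the previous comment; the table name comes from the original snippet):

```scala
import org.apache.iceberg.spark.Spark3Util
import org.apache.iceberg.spark.actions.SparkActions

// Resolve the table through Spark so the returned Table retains the
// catalog context that the rewrite action's read/write path needs.
val table = Spark3Util.loadIcebergTable(spark, "hive.test.aplog")

SparkActions
  .get()
  .rewriteDataFiles(table)
  .option("target-file-size-bytes", String.valueOf(500 * 1024 * 1024))
  .execute()
```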

RussellSpitzer avatar Feb 10 '22 17:02 RussellSpitzer

Yeah, I hit this too; maybe we should update the docs. The existing docs seem to indicate you can load the table from a new `HiveCatalog` and expect it to work with `SparkActions`, which seemed pretty logical at the time.

https://iceberg.apache.org/docs/latest/maintenance/ https://iceberg.apache.org/docs/latest/java-api-quickstart/#create-a-table

Also just wondering, there's no way to strip the catalog before loading the table in the Spark3BinPackStrategy and have it work, right?

szehon-ho avatar Feb 10 '22 17:02 szehon-ho

> Also just wondering, there's no way to strip the catalog before loading the table in the Spark3BinPackStrategy and have it work, right?

Yeah, unfortunately the way we have it set up now is that we work directly on the `table` object passed in, and if it is missing catalog information there is no way to get it back.

Then when we actually do our read, we use the Spark API, which means we need the catalog so that all the Spark machinery works correctly. We could probably re-implement this read so that we pass in a manually constructed `SparkTable` object, but that seems like a bit of work.

See https://github.com/apache/iceberg/blob/1e5abcece00d835235dccf0b902ffd988cbded0d/spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/actions/Spark3BinPackStrategy.java#L64-L77
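
To make the dependency concrete, here is a rough Scala paraphrase of the linked Java (option strings as in the 0.13 Spark module; `spark`, `table`, and `groupID` are assumed to be in scope):

```scala
// Rough paraphrase of the linked Spark3BinPackStrategy#rewriteFiles logic.
// Both the read and the write address the table by table.name(), e.g.
// "hive.test.aplog", so Spark must have a catalog named "hive" configured;
// otherwise it falls back to spark_catalog and fails as in this issue.
val scanDF = spark.read
  .format("iceberg")
  .option("file-scan-task-set-id", groupID) // read only the staged file group
  .load(table.name())

scanDF.write
  .format("iceberg")
  .option("rewritten-file-scan-task-set-id", groupID)
  .mode("append")
  .save(table.name())
```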

RussellSpitzer avatar Feb 10 '22 17:02 RussellSpitzer

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale'; commenting on the issue is preferred when possible.

github-actions[bot] avatar Aug 10 '22 00:08 github-actions[bot]

This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'.

github-actions[bot] avatar Aug 24 '22 00:08 github-actions[bot]