
Cannot query iceberg with 1.20.1

Open meyergin opened this issue 3 years ago • 5 comments

After upgrading to 1.20.1, I added the Iceberg configuration following the tutorial (https://drill.apache.org/docs/iceberg-format-plugin/), but it doesn't seem to work. The query returns the following error: SYSTEM ERROR: NoSuchTableException: Table does not exist at location. SQL:

 select * from dfs.`/db/test`

test is the folder of an Iceberg table, and it contains the metadata and data folders. I suspect the problem may be the catalog type: does Drill support Hive-based catalogs, or only Hadoop-based catalogs?

meyergin avatar Sep 08 '22 07:09 meyergin

@meyergin This looks like an issue with your query. The path to your file needs to be in backticks.

For example:

SELECT * 
FROM dfs.`/db/test`

You can also define workspaces which are shortcuts to file paths. (https://drill.apache.org/docs/workspaces/)

cgivre avatar Sep 08 '22 12:09 cgivre

This looks like an issue with your query. The path to your file needs to be in backticks.

The gray block in his comment makes me think that GitHub has interpreted the backticks he did use as marking up a code block.

jnturton avatar Sep 08 '22 12:09 jnturton

@jnturton I missed that. Thanks!

In any event, it looks like Drill is not seeing the files in that folder. Here's what I'd do.

  1. Verify that the iceberg configuration is present in your dfs storage plugin config.
  2. Next I'd run a SHOW FILES IN dfs to verify that Drill can in fact see any files in that path.

If all that checks out, I think we need to do some digging.
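
The first check can be sketched in Python. This is only an illustration, assuming the storage plugin config has been exported as JSON (for example, copied from the Drill web UI); the helper name `has_iceberg_format` is made up for this example and is not part of Drill.

```python
import json


def has_iceberg_format(plugin_config: dict) -> bool:
    """Return True if the plugin config declares a format of type 'iceberg'.

    Hypothetical helper for illustration only; it inspects the parsed JSON
    of a file storage plugin and does not talk to Drill.
    """
    formats = plugin_config.get("formats") or {}
    return any(
        isinstance(fmt, dict) and fmt.get("type") == "iceberg"
        for fmt in formats.values()
    )


# Trimmed-down config in the same shape as a file storage plugin.
sample = json.loads("""
{
  "type": "file",
  "connection": "file:///",
  "formats": {
    "iceberg": {"type": "iceberg"}
  },
  "enabled": true
}
""")

print(has_iceberg_format(sample))
```

If this reports that the format is missing, the query would not be expected to treat the folder as an Iceberg table at all.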

cgivre avatar Sep 08 '22 13:09 cgivre

Sorry for the poor formatting; I've fixed it. I have tested both dfs and s3 storage, and neither works.

  1. dfs storage config:
{
  "type": "file",
  "connection": "file:///",
  "workspaces": {
    "tmp": {
      "location": "/tmp",
      "writable": true,
      "defaultInputFormat": null,
      "allowAccessOutsideWorkspace": false
    },
    "root": {
      "location": "/",
      "writable": false,
      "defaultInputFormat": null,
      "allowAccessOutsideWorkspace": false
    }
  },
  "formats": {
    "iceberg": {
      "type": "iceberg",
      "properties": {
        "read.split.target-size": "134217728",
        "read.split.metadata-target-size": "33554432"
      },
      "caseSensitive": true,
      "includeColumnStats": null,
      "ignoreResiduals": null,
      "snapshotId": null,
      "snapshotAsOfTime": null,
      "fromSnapshotId": null,
      "toSnapshotId": null
    }
  },
  "enabled": true
}
  2. SHOW FILES returns the expected data and metadata folders.

  3. error log:

org.apache.iceberg.exceptions.NoSuchTableException: Table does not exist at location: /db/test
at org.apache.iceberg.hadoop.HadoopTables.load(HadoopTables.java:89)
at org.apache.drill.exec.store.iceberg.IcebergGroupScan.initTableScan(IcebergGroupScan.java:123)
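
The trace goes through HadoopTables.load, which suggests the table is being loaded by path rather than through a catalog service. A rough way to check whether the folder looks like a path-based table is sketched below. It rests on an assumption that should be verified against the Iceberg documentation: that path-based (Hadoop-style) tables keep a `version-hint.text` file or `v<N>.metadata.json` files under `metadata/`, while tables committed through other catalogs (such as Hive) use differently named `*.metadata.json` files. The function name and return labels are invented for this example, and compressed metadata files are ignored.

```python
import re
from pathlib import Path

# Assumption: Hadoop-style tables name their metadata files v<N>.metadata.json.
HADOOP_METADATA = re.compile(r"^v\d+\.metadata\.json$")


def describe_metadata_layout(table_dir: str) -> str:
    """Classify the metadata folder of a local Iceberg table directory.

    Hypothetical diagnostic helper; it only looks at file names.
    """
    metadata_dir = Path(table_dir) / "metadata"
    if not metadata_dir.is_dir():
        return "no-metadata-dir"

    names = [p.name for p in metadata_dir.iterdir() if p.is_file()]

    # A version hint or v<N>.metadata.json file points to a path-based table.
    if "version-hint.text" in names or any(HADOOP_METADATA.match(n) for n in names):
        return "hadoop-style"

    # Metadata files exist, but not under the naming assumed above.
    if any(n.endswith(".metadata.json") for n in names):
        return "other-catalog-style"

    return "no-table-metadata"
```

For the table in this issue, one would call `describe_metadata_layout("/db/test")`. A result of "other-catalog-style" would be consistent with the suspicion that the table was written through a non-Hadoop catalog.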

meyergin avatar Sep 08 '22 13:09 meyergin

Hi, I have problems reading an Apache Iceberg table hosted in S3 (I raised this problem on Slack, but I am sharing it here too). My Drill version is 1.21.1 (updated recently).

The storage plugin configuration is saved as "S3_iceberg"; this is the JSON:

{
  "type": "file",
  "connection": "s3a://xxx-data-lake",
  "config": {
    "fs.s3a.secret.key": "1111",
    "fs.s3a.access.key": "111"
  },
  "workspaces": {
    "default": {
      "location": "/iceberg-data-lake",
      "writable": false,
      "defaultInputFormat": null,
      "allowAccessOutsideWorkspace": false
    },
    "root": {
      "location": "/iceberg-data-lake",
      "writable": false,
      "defaultInputFormat": null,
      "allowAccessOutsideWorkspace": false
    }
  },
  "formats": {
    "iceberg": {
      "type": "iceberg",
      "properties": {
        "read.split.target-size": "536870912",
        "read.split.metadata-target-size": "33554432"
      },
      "caseSensitive": true,
      "includeColumnStats": true,
      "ignoreResiduals": null,
      "snapshotId": null,
      "snapshotAsOfTime": null,
      "fromSnapshotId": null,
      "toSnapshotId": null
    }
  },

  "authMode": "SHARED_USER",
  "enabled": true
}

The command "SHOW FILES IN s3_iceberg" returns: [image]

"SHOW FILES IN s3_iceberg.iceberg_accumulated_exposure" returns: [image]

When I run the query "ANALYZE TABLE s3_iceberg.iceberg_accumulated_exposure REFRESH METADATA;" I get: [image]

The query "select * from s3_iceberg.iceberg_accumulated_exposure" returns the same error: [image]

Can you share a configuration that you know works?

Thank you.

Maegor avatar May 30 '23 13:05 Maegor