Update table rest API is not working

Open aditya-sjsu opened this issue 1 year ago • 2 comments

Willingness to contribute

Yes. I would be willing to contribute a fix for this bug with guidance from the OpenHouse community.

OpenHouse version

latest

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 20.0): WSL (Windows)
  • JDK version: 1.8.0_402

Describe the problem

I am trying to set up OpenHouse locally, but the update table REST API call returns an error. I am following the steps in https://github.com/linkedin/openhouse/blob/main/SETUP.md

  1. Create table request:
curl "${curlArgs[@]}" -XPOST http://localhost:8000/v1/databases/d3/tables/ \
--data-raw '{
  "tableId": "t1",
  "databaseId": "d3",
  "baseTableVersion": "INITIAL_VERSION",
  "clusterId": "LocalFSCluster",
  "schema": "{\"type\": \"struct\", \"fields\": [{\"id\": 1,\"required\": true,\"name\": \"id\",\"type\": \"string\"},{\"id\": 2,\"required\": true,\"name\": \"name\",\"type\": \"string\"},{\"id\": 3,\"required\": true,\"name\": \"ts\",\"type\": \"timestamp\"}]}",
  "timePartitioning": {
    "columnName": "ts",
    "granularity": "HOUR"
  },
  "clustering": [
    {
      "columnName": "name"
    }
  ],
  "tableProperties": {
    "key": "value"
  }
}'
  1. Create table response
{
    "tableId": "t1",
    "databaseId": "d3",
    "clusterId": "LocalFSCluster",
    "tableUri": "LocalFSCluster.d3.t1",
    "tableUUID": "e307fe92-56af-403d-983b-6cb0da61ef82",
    "tableLocation": "file:/tmp/openhouse/d3/t1-e307fe92-56af-403d-983b-6cb0da61ef82/00000-1eba8f50-4d83-49fe-b968-b59c5f77c6e7.metadata.json",
    "tableVersion": "INITIAL_VERSION",
    "tableCreator": "DUMMY_ANONYMOUS_USER",
    "schema": "{\"type\":\"struct\",\"schema-id\":0,\"fields\":[{\"id\":1,\"name\":\"id\",\"required\":true,\"type\":\"string\"},{\"id\":2,\"name\":\"name\",\"required\":true,\"type\":\"string\"},{\"id\":3,\"name\":\"ts\",\"required\":true,\"type\":\"timestamp\"}]}",
    "lastModifiedTime": 1715285373822,
    "creationTime": 1715285373822,
    "tableProperties": {
        "policies": "",
        "write.metadata.delete-after-commit.enabled": "true",
        "openhouse.tableId": "t1",
        "openhouse.clusterId": "LocalFSCluster",
        "openhouse.lastModifiedTime": "1715285373822",
        "openhouse.tableVersion": "INITIAL_VERSION",
        "openhouse.creationTime": "1715285373822",
        "openhouse.tableUri": "LocalFSCluster.d3.t1",
        "write.format.default": "orc",
        "write.metadata.previous-versions-max": "28",
        "openhouse.databaseId": "d3",
        "openhouse.tableType": "PRIMARY_TABLE",
        "openhouse.tableLocation": "/tmp/openhouse/d3/t1-e307fe92-56af-403d-983b-6cb0da61ef82/00000-1eba8f50-4d83-49fe-b968-b59c5f77c6e7.metadata.json",
        "openhouse.tableUUID": "e307fe92-56af-403d-983b-6cb0da61ef82",
        "key": "value",
        "openhouse.tableCreator": "DUMMY_ANONYMOUS_USER"
    },
    "timePartitioning": {
        "columnName": "ts",
        "granularity": "HOUR"
    },
    "clustering": [
        {
            "columnName": "name",
            "transform": null
        }
    ],
    "policies": null,
    "tableType": "PRIMARY_TABLE"
}
  1. GET table request
curl "${curlArgs[@]}" -XGET http://localhost:8000/v1/databases/d3/tables/t1
  1. GET table response
{
    "tableId": "t1",
    "databaseId": "d3",
    "clusterId": "LocalFSCluster",
    "tableUri": "LocalFSCluster.d3.t1",
    "tableUUID": "e307fe92-56af-403d-983b-6cb0da61ef82",
    "tableLocation": "file:/tmp/openhouse/d3/t1-e307fe92-56af-403d-983b-6cb0da61ef82/00000-1eba8f50-4d83-49fe-b968-b59c5f77c6e7.metadata.json",
    "tableVersion": "INITIAL_VERSION",
    "tableCreator": "DUMMY_ANONYMOUS_USER",
    "schema": "{\"type\":\"struct\",\"schema-id\":0,\"fields\":[{\"id\":1,\"name\":\"id\",\"required\":true,\"type\":\"string\"},{\"id\":2,\"name\":\"name\",\"required\":true,\"type\":\"string\"},{\"id\":3,\"name\":\"ts\",\"required\":true,\"type\":\"timestamp\"}]}",
    "lastModifiedTime": 1715285373822,
    "creationTime": 1715285373822,
    "tableProperties": {
        "policies": "",
        "write.metadata.delete-after-commit.enabled": "true",
        "openhouse.tableId": "t1",
        "openhouse.clusterId": "LocalFSCluster",
        "openhouse.lastModifiedTime": "1715285373822",
        "openhouse.tableVersion": "INITIAL_VERSION",
        "openhouse.creationTime": "1715285373822",
        "openhouse.tableUri": "LocalFSCluster.d3.t1",
        "write.format.default": "orc",
        "write.metadata.previous-versions-max": "28",
        "openhouse.databaseId": "d3",
        "openhouse.tableType": "PRIMARY_TABLE",
        "openhouse.tableLocation": "/tmp/openhouse/d3/t1-e307fe92-56af-403d-983b-6cb0da61ef82/00000-1eba8f50-4d83-49fe-b968-b59c5f77c6e7.metadata.json",
        "openhouse.tableUUID": "e307fe92-56af-403d-983b-6cb0da61ef82",
        "key": "value",
        "openhouse.tableCreator": "DUMMY_ANONYMOUS_USER"
    },
    "timePartitioning": {
        "columnName": "ts",
        "granularity": "HOUR"
    },
    "clustering": [
        {
            "columnName": "name",
            "transform": null
        }
    ],
    "policies": null,
    "tableType": "PRIMARY_TABLE"
}
  1. Update table request
curl "${curlArgs[@]}" -XPUT http://localhost:8000/v1/databases/d3/tables/t1 \
--data-raw '{
  "tableId": "t1",
  "databaseId": "d3",
  "baseTableVersion":"INITIAL_VERSION",
  "clusterId": "LocalFSCluster",
  "schema": "{\"type\": \"struct\", \"fields\": [{\"id\": 1,\"required\": true,\"name\": \"id\",\"type\": \"string\"},{\"id\": 2,\"required\": true,\"name\": \"name\",\"type\": \"string\"},{\"id\": 3,\"required\": true,\"name\": \"ts\",\"type\": \"timestamp\"}, {\"id\": 4,\"required\": true,\"name\": \"country\",\"type\": \"string\"}]}",
  "timePartitioning": {
    "columnName": "ts",
    "granularity": "HOUR"
  },
  "clustering": [
    {
      "columnName": "name"
    }
  ],
  "tableProperties": {
    "key": "value"
  }
}'
  1. Update table response
{
  "status": "CONFLICT",
  "error": "Conflict",
  "message": "Entity with key[LocalFSCluster.d3.t1] is modified by another process already, nested exception message: Conflict detected for databaseId: d3, tableId: t1, expected version: /tmp/openhouse/d3/t1-e307fe92-56af-403d-983b-6cb0da61ef82/00000-1eba8f50-4d83-49fe-b968-b59c5f77c6e7.metadata.json actual version INITIAL_VERSION: The requested user table has been modified/created by other processes.",
  "stacktrace": null,
  "cause": "Conflict detected for databaseId: d3, tableId: t1, expected version: /tmp/openhouse/d3/t1-e307fe92-56af-403d-983b-6cb0da61ef82/00000-1eba8f50-4d83-49fe-b968-b59c5f77c6e7.metadata.json actual version INITIAL_VERSION: The requested user table has been modified/created by other processes."
}

Stacktrace, metrics and logs

No response

Code to reproduce bug

No response

What component does this bug affect?

  • [X] Table Service: This is the RESTful catalog service that stores table metadata. :services:tables
  • [ ] Jobs Service: This is the job orchestrator that submits data services for table maintenance. :services:jobs
  • [ ] Data Services: These are the jobs that perform table maintenance. apps:spark
  • [ ] Iceberg internal catalog: This is the internal Iceberg catalog for OpenHouse Catalog Service. :iceberg:openhouse
  • [ ] Spark Client Integration: This is the Apache Spark integration for OpenHouse catalog. :integration:spark
  • [ ] Documentation: This is the documentation for OpenHouse. docs
  • [X] Local Docker: This is the local Docker environment for OpenHouse. infra/recipes/docker-compose
  • [ ] Other: Please specify the component.

aditya-sjsu May 09 '24 23:05

I created a table called 'table10' and tried to update it with "baseTableVersion": "INITIAL_VERSION" and "clusterId": "LocalFSCluster", which I got from the successful table creation response. The update failed due to a conflict, suggesting the table had been modified by another process. However, when I retried after about 4 hours, the update worked.

divyamsavsaviya May 10 '24 00:05

Hi @aditya-sjsu, in your update request the baseTableVersion still points to INITIAL_VERSION. Can you change it to /tmp/openhouse/d3/t1-e307fe92-56af-403d-983b-6cb0da61ef82/00000-1eba8f50-4d83-49fe-b968-b59c5f77c6e7.metadata.json and try again?

OH table versions are used for atomic updates. Each change/update targets a specific version. If the version in HTS has moved on from the version specified in the request, you'll see the error "Entity with <> is modified by another process already".

An example update scenario is as follows:

# Action          # targetVersion     # versionAfterUpdate
CREATE_TABLE      INITIAL_VERSION     TBL_LOC_1
UPDATE_TABLE_1    TBL_LOC_1           TBL_LOC_2
INSERT_DATA       TBL_LOC_2           TBL_LOC_3
and so on.
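
The version check described above behaves like a compare-and-swap. The following is a minimal in-memory sketch of that idea, not OpenHouse code: `TableStore`, `VersionConflict`, and `commit` are hypothetical names, and the `TBL_LOC_n` values stand in for real metadata file paths.

```python
class VersionConflict(Exception):
    """Raised when the caller's base version is not the current one."""


class TableStore:
    """Toy stand-in for the catalog's version bookkeeping (hypothetical)."""

    def __init__(self):
        self._current = {}  # table key -> current version string
        self._commits = {}  # table key -> number of successful commits

    def current_version(self, key):
        # A table that was never committed is at INITIAL_VERSION.
        return self._current.get(key, "INITIAL_VERSION")

    def commit(self, key, base_version):
        """Apply a change only if base_version matches the stored version."""
        stored = self.current_version(key)
        if base_version != stored:
            raise VersionConflict(
                f"expected version: {stored} actual version {base_version}"
            )
        count = self._commits.get(key, 0) + 1
        self._commits[key] = count
        new_version = f"TBL_LOC_{count}"  # placeholder for a metadata path
        self._current[key] = new_version
        return new_version


store = TableStore()
v1 = store.commit("d3.t1", "INITIAL_VERSION")  # CREATE_TABLE
v2 = store.commit("d3.t1", v1)                 # UPDATE_TABLE_1
try:
    # Reusing INITIAL_VERSION after the table exists is rejected.
    store.commit("d3.t1", "INITIAL_VERSION")
    conflict = None
except VersionConflict as exc:
    conflict = str(exc)
print(v1, v2, conflict)
```

In this sketch every successful commit advances the stored version, so a client has to read the latest version before each update instead of reusing the one from table creation.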

Let me know if you still face this issue.

HotSushi May 15 '24 22:05

Hi @HotSushi, thanks, it worked. I also had to replace the tableProperties field in the update request with the one returned by the GET request.

The update response I got:

{
    "tableId": "t1",
    "databaseId": "d3",
    "clusterId": "LocalFSCluster",
    "tableUri": "LocalFSCluster.d3.t1",
    "tableUUID": "782cbf27-314a-4fd7-871f-9ce8eefced09",
    "tableLocation": "file:/tmp/d3/t1-782cbf27-314a-4fd7-871f-9ce8eefced09/00001-dd5968a4-813f-4fd3-9e62-bdfa146450bd.metadata.json",
    "tableVersion": "/tmp/d3/t1-782cbf27-314a-4fd7-871f-9ce8eefced09/00000-aeca28ec-2bf2-44a4-a398-f1f88dd69ca0.metadata.json",
    "tableCreator": "DUMMY_ANONYMOUS_USER",
    "schema": "{\"type\":\"struct\",\"schema-id\":1,\"fields\":[{\"id\":1,\"name\":\"id\",\"required\":true,\"type\":\"string\"},{\"id\":2,\"name\":\"name\",\"required\":true,\"type\":\"string\"},{\"id\":3,\"name\":\"ts\",\"required\":true,\"type\":\"timestamp\"},{\"id\":4,\"name\":\"country\",\"required\":false,\"type\":\"string\"}]}",
    "lastModifiedTime": 1716766531096,
    "creationTime": 1716765289450,
    "tableProperties": {
        "policies": "",
        "write.metadata.delete-after-commit.enabled": "true",
        "openhouse.tableId": "t1",
        "openhouse.clusterId": "LocalFSCluster",
        "openhouse.lastModifiedTime": "1716766531096",
        "openhouse.tableVersion": "/tmp/d3/t1-782cbf27-314a-4fd7-871f-9ce8eefced09/00000-aeca28ec-2bf2-44a4-a398-f1f88dd69ca0.metadata.json",
        "openhouse.creationTime": "1716765289450",
        "openhouse.tableUri": "LocalFSCluster.d3.t1",
        "write.format.default": "orc",
        "write.metadata.previous-versions-max": "28",
        "openhouse.databaseId": "d3",
        "openhouse.tableType": "PRIMARY_TABLE",
        "openhouse.tableLocation": "/tmp/d3/t1-782cbf27-314a-4fd7-871f-9ce8eefced09/00001-dd5968a4-813f-4fd3-9e62-bdfa146450bd.metadata.json",
        "openhouse.tableUUID": "782cbf27-314a-4fd7-871f-9ce8eefced09",
        "key": "value",
        "openhouse.tableCreator": "DUMMY_ANONYMOUS_USER"
    },
    "timePartitioning": {
        "columnName": "ts",
        "granularity": "HOUR"
    },
    "clustering": [
        {
            "columnName": "name",
            "transform": null
        }
    ],
    "policies": null,
    "tableType": "PRIMARY_TABLE"
}
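
The fix described in this comment can be sketched as a small helper that derives the update body from the latest GET response. This is only an illustration under stated assumptions: `build_update_body` is a hypothetical name, it assumes the `openhouse.tableLocation` property holds the value expected in `baseTableVersion` (as the error message and the suggestion above indicate), and the sample values below are placeholders rather than real output.

```python
import json


def build_update_body(get_response: dict, new_schema: dict) -> dict:
    """Build a PUT body from the latest GET response (hypothetical helper)."""
    props = get_response["tableProperties"]
    return {
        "tableId": get_response["tableId"],
        "databaseId": get_response["databaseId"],
        "clusterId": get_response["clusterId"],
        # Assumption: the current version is the metadata path stored in
        # openhouse.tableLocation (no "file:" prefix).
        "baseTableVersion": props["openhouse.tableLocation"],
        # The API takes the schema as a JSON-encoded string.
        "schema": json.dumps(new_schema),
        "timePartitioning": get_response["timePartitioning"],
        "clustering": [
            {"columnName": c["columnName"]} for c in get_response["clustering"]
        ],
        # Send back the properties exactly as the GET request returned them.
        "tableProperties": props,
    }


# Trimmed stand-in for a GET response; all values are placeholders.
sample_get = {
    "tableId": "t1",
    "databaseId": "d3",
    "clusterId": "LocalFSCluster",
    "timePartitioning": {"columnName": "ts", "granularity": "HOUR"},
    "clustering": [{"columnName": "name", "transform": None}],
    "tableProperties": {
        "openhouse.tableLocation": "/tmp/example/00000.metadata.json",
        "key": "value",
    },
}
body = build_update_body(sample_get, {"type": "struct", "fields": []})
print(json.dumps(body, indent=2))
```

The printed JSON could then be passed to the PUT request via `--data-raw`, with the real GET response and the intended schema in place of the placeholders.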

aditya-sjsu May 26 '24 23:05

Thanks @aditya-sjsu

HotSushi May 29 '24 17:05