
Orchestrator error: Table already exists | relation does not exist

kevinleeTCA opened this issue 1 year ago

Problem

1. The cube pre-aggregation fails when a client requests cube data via the API.

Error:

Error: Error during create table: CREATE TABLE ... already exists

Detailed error:

Orchestrator error {"duration":343,"error":"Error: Error during create table: CREATE TABLE dev_pre_aggregations.p_m__teams_p_m_teams_rollup_mgcnktaq_xztohjkr_1jb5je3 (\"p_m__teams__legal_entity_id\" uuid, \"p_m__teams__management_ailorn\" text, \"p_m__teams__management_end_date\" date, \"p_m__teams__management_id\" text, \"p_m__teams__management_start_date\" date, \"p_m__teams__organisation_id\" uuid, \"p_m__teams__property_address\" text, \"p_m__teams__property_type\" text, \"p_m__teams__team_id\" text, \"p_m__teams__team_name\" text): relation \"p_m__teams_p_m_teams_rollup_mgcnktaq_xztohjkr_1jb5je3\" already exists"

2. Error: relation ... does not exist

This also happens sometimes when a client requests cube data via the API; it looks like Cube tries to read a cached pre-aggregation table that has already been dropped by the cube worker.

Detailed error:

Orchestrator error {"duration":269,"error":"Error: relation \"dev_pre_aggregations.p_m__teams_p_m_teams_rollup_wdzwa5eo_wiy3pkps_1jb5m97\" does not exist"

We found out that the does not exist error is probably related to the env var CUBEJS_DROP_PRE_AGG_WITHOUT_TOUCH being enabled (true). When we disable it, the does not exist issue disappears, but that has a side effect: the cube API then uses deprecated (stale) caches when resolving clients' API requests, causing missing-data issues.

Current setup:

CUBEJS_DOCKER_IMAGE_VERSION=v0.34.62
CUBEJS_DB_QUERY_TIMEOUT=60m
CUBEJS_CACHE_AND_QUEUE_DRIVER=memory
CUBEJS_DROP_PRE_AGG_WITHOUT_TOUCH=true
CUBEJS_DEV_MODE=true
CUBEJS_DB_TYPE=redshift
CUBEJS_REFRESH_WORKER=true

Current cube and preaggregation:

cube(`PM_Teams`, {
  sql: `select 
      COALESCE(t.id::text, MD5(leo.organisation_id || '-unallocated')) as team_id, 
      COALESCE(t.name, 'Unassigned') as team_name, 
      leo.organisation_id, 
      leo.legal_entity_id,
      'ailo:propertymanagement:management:' || m.id as management_ailorn,
      m.id::text as management_id,
      concat_ws(', ', ap.unit_street_number || ' ' || ap.street_name, ap.suburb) as property_address,
      m.start_date as management_start_date, 
      m.end_date as management_end_date,
      ap.primary_type as property_type
    from legal_entity_organisations leo 
    join managements m on leo.legal_entity_id = m.managing_entity_id and m.is_draft = false
    join agency_properties ap on ap.id = m.agency_property_id
    left join teams_managements tm on m.id = tm.management_id
    left join teams t on t.id = tm.team_id
    where ${SECURITY_CONTEXT.organisationId.filter('leo.organisation_id')}
    UNION
    select 
      MD5(leo.organisation_id || '-unallocated') as team_id, 
      'Unassigned' as team_name, 
      leo.organisation_id, 
      leo.legal_entity_id,
      'ailo:authz:legalentity:' || leo.legal_entity_id as management_ailorn,
      leo.legal_entity_id::text as management_id,
      null as property_address,
      null as management_start_date, 
      null as management_end_date,
      null as property_type
    from legal_entity_organisations leo 
    where ${SECURITY_CONTEXT.organisationId.filter('leo.organisation_id')}
    `,
  preAggregations: {
    pMTeamsRollup: {
      dimensions: [
        CUBE.teamId,
        CUBE.teamName,
        CUBE.organisationId,
        CUBE.legalEntityId,
        CUBE.managementAilorn,
        CUBE.propertyAddress,
        CUBE.propertyType,
        CUBE.managementId,
        CUBE.managementStartDate,
        CUBE.managementEndDate,
      ],
      indexes: {
        mgIdx: {
          columns: [CUBE.managementAilorn],
        },
      },
      refreshKey: {
        every: `1 hour`,
      }
    }
  },
  joins: {
    GeneralLedger_Income: {
      relationship: `hasMany`,
      sql: `${GeneralLedger_Income.accountOwner} = 'ailo:authz:legalentity:' || ${PM_Teams.legalEntityId} AND 
      (
        ${GeneralLedger_Income.generalLedgerManagement} = ${PM_Teams.managementAilorn}  
      )
      `,
    },
    GeneralLedger_Rent: {
      relationship: `hasMany`,
      sql: `${GeneralLedger_Rent.generalLedgerManagement} = ${PM_Teams.managementAilorn}`,
    },
    Tenancies: {
      relationship: `hasMany`,
      sql: `${Tenancies.managementId} = ${PM_Teams.managementId}`,
    },
  },
  dimensions: {
    teamId: {
      sql: `team_id`,
      type: `string`,
      primaryKey: true,
      shown: true,
    },
    teamName: {
      sql: `team_name`,
      type: `string`,
    },
    legalEntityId: {
      sql: `legal_entity_id`,
      type: `string`,
    },
    organisationId: {
      sql: `organisation_id`,
      type: `string`,
    },
    managementAilorn: {
      sql: `management_ailorn`,
      type: `string`,
    },
    managementId: {
      sql: `management_id`,
      type: `string`,
      primaryKey: true,
      shown: true,
    },
    managementStartDate: {
      sql: `management_start_date`,
      type: `time`,
    },
    managementEndDate: {
      sql: `management_end_date`,
      type: `time`,
    },
    propertyAddress: {
      sql: `property_address`,
      type: `string`,
    },
    propertyType: {
      sql: `property_type`,
      type: `string`,
    },
  },

  refreshKey: {
    every: `1 hour`,
    sql: `SELECT MAX(created_at) FROM managements`,
  },

  dataSource: `propertyManagement`,
});

kevinleeTCA avatar Aug 07 '24 02:08 kevinleeTCA

A similar issue was reported against earlier Cube versions.

kevinleeTCA avatar Aug 07 '24 02:08 kevinleeTCA

Hi @kevinleeTCA 👋

Let's clarify a few bits about your setup. I see that you use pre-aggregations and, at the same time, I see that you set CUBEJS_CACHE_AND_QUEUE_DRIVER=memory. Why would you not use Cube Store? Are you trying to use something else for pre-aggregation storage?

I suggest that you set up a Cube Store cluster (router node, at least two worker nodes), use it both for cache/queue and pre-aggregations; also, please upgrade to the latest version of Cube. I believe that would resolve the issue.
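
For reference, pointing Cube at a Cube Store router is a matter of a couple of environment variables (the host name below is an assumption for your deployment, and the port shown is the default):

```
CUBEJS_CACHE_AND_QUEUE_DRIVER=cubestore
CUBEJS_CUBESTORE_HOST=cubestore-router
CUBEJS_CUBESTORE_PORT=3030
```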

igorlukanin avatar Aug 12 '24 10:08 igorlukanin

Hi @igorlukanin thank you for your reply.

We upgraded to the latest version (v0.35) a few weeks ago, and this problem still happens; see the latest logs below:
(screenshots of the error logs, 2024-09-03)

Regarding why we have to use memory for CUBEJS_CACHE_AND_QUEUE_DRIVER: we are blocked by this, i.e., Cube Store does not support timestamptz (which is used frequently in our Postgres DB).

@paveltiunov Where should I cast the value? Should I create a new dimension with the value cast to timestamp, or is casting the existing time dimension fine (assuming it won't affect the timezone)?

kevinleeTCA avatar Sep 03 '24 03:09 kevinleeTCA

@kevinleeTCA How exactly are you blocked? Can you please explain?

igorlukanin avatar Sep 03 '24 09:09 igorlukanin

@igorlukanin I shared the link above; basically, we encountered an issue when running a pre-aggregation with a timestamptz (native Postgres type) time dimension:

ERROR [cubestore::http] <pid:1> Error processing HTTP command: Custom type 'timestamptz' is not supported 

I will try your proposal mentioned here:

dimensions:
  # It's OK to have a TIMESTAMPTZ value as a time dimension, it will work just fine
  - name: created_at
    sql: created_at
    type: time

  # You have to cast it to `TIMESTAMP` if you'd like to use it in a non-`time` dimension
  - name: created_at_raw
    sql: "created_at::TIMESTAMP"
    type: string
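
Since my model is written in the JavaScript syntax shown earlier in this issue, the equivalent cast would look roughly like this (a sketch, not tested against our schema):

```javascript
dimensions: {
  // A timestamptz column is fine as a `time` dimension
  createdAt: {
    sql: `created_at`,
    type: `time`,
  },
  // Cast to TIMESTAMP to use the value in a non-`time` dimension
  createdAtRaw: {
    sql: `created_at::TIMESTAMP`,
    type: `string`,
  },
},
```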

I will update here on how it goes.

kevinleeTCA avatar Sep 05 '24 02:09 kevinleeTCA

@igorlukanin It is still blocking us. The current Cube and Cube Store version we are using is 1.0.1 (the latest, published a few days ago).

(screenshot of the version info, 2024-10-21)

We still encounter this: Custom type 'timestamptz' is not supported

error:        error="Error: Error during create table: CREATE TABLE dev_pre_aggregations.gl_r_b_t_rent_tenancy_rollup_quemn0no_teykpbng_1jhb9ur (`gl_r_b_t__first_rent_created_at` timestamptz, `gl_r_b_t__general_ledger_management` varchar(255), `gl_r_b_t__last_rent_created_at` timestamptz, `gl_r_b_t__payment_method_id` varchar(255), `gl_r_b_t__rent_amount` bigint, `gl_r_b_t__tenancy_id` varchar(255)) INDEX gl_r_b_t_rent_tenancy_rollup_mg_idx_quemn0no_teykpbng_1jhb9ur (\"gl_r_b_t__general_ledger_management\") LOCATION ?: Custom type 'timestamptz' is not supported\n    at WebSocket.<anonymous> (/cube/node_modules/@cubejs-backend/cubestore-driver/src/WebSocketConnection.ts:121:30)\n    at WebSocket.emit (node:events:519:28)\n    at WebSocket.emit (node:domain:488:12)\n    at Receiver.receiverOnMessage (/cube/node_modules/ws/lib/websocket.js:1008:20)\n    at Receiver.emit (node:events:519:28)\n    at Receiver.emit (node:domain:488:12)\n    at Receiver.dataMessage (/cube/node_modules/ws/lib/receiver.js:502:14)\n    at Receiver.getData (/cube/node_modules/ws/lib/receiver.js:435:17)\n    at Receiver.startLoop (/cube/node_modules/ws/lib/receiver.js:143:22)\n    at Receiver._write (/cube/node_modules/ws/lib/receiver.js:78:10)\n    at writeOrBuffer (node:internal/streams/writable:570:12)\n    at _write (node:internal/streams/writable:499:10)\n    at Receiver.Writable.write (node:internal/streams/writable:508:10)\n    at Socket.socketOnData (/cube/node_modules/ws/lib/websocket.js:1102:35)\n    at Socket.emit (node:events:519:28)\n    at Socket.emit (node:domain:488:12)\n    at addChunk (node:internal/streams/readable:559:12)\n    at readableAddChunkPushByteMode (node:internal/streams/readable:510:3)\n    at Socket.Readable.push (node:internal/streams/readable:390:5)\n    at TCP.onStreamRead (node:internal/stream_base_commons:191:23)\n    at TCP.callbackTrampoline (node:internal/async_hooks:130:17)"

I am using your proposed version

dimensions:
  # It's OK to have a TIMESTAMPTZ value as a time dimension, it will work just fine
  - name: created_at
    sql: created_at
    type: time

Here is the data: (screenshots attached, 2024-10-21)

kevinleeTCA avatar Oct 21 '24 01:10 kevinleeTCA

So I have to explicitly cast it even when I use type: `time`.

(screenshots attached, 2024-10-23)

But I need to double-check whether it will cause issues/side effects for data coming from different timezones.
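
As a sketch of that explicit cast in the model above (assuming management_start_date is one of the timestamptz columns in question):

```javascript
managementStartDate: {
  // Explicit cast so Cube Store receives a plain TIMESTAMP instead of timestamptz.
  // Note: this drops the offset, so timezone handling for cross-timezone data
  // still needs to be verified.
  sql: `management_start_date::timestamp`,
  type: `time`,
},
```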

kevinleeTCA avatar Oct 23 '24 00:10 kevinleeTCA

@kevinleeTCA FYI, the min measure type is supposed to work with numeric values only. It is not correct to use it with time values: https://cube.dev/docs/reference/data-model/types-and-formats#min

Your minDate should be of type time, defined as MIN({createdAt}).
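
In other words, a sketch based on that suggestion (assuming a createdAt time dimension exists on the cube):

```javascript
measures: {
  minDate: {
    // A `time`-typed measure supplies its own aggregate function in `sql`
    sql: `MIN(${CUBE.createdAt})`,
    type: `time`,
  },
},
```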

igorlukanin avatar Nov 01 '24 13:11 igorlukanin

I see @igorlukanin

Type of measure min is calculated as a minimum of values defined in sql. The sql parameter is required and must include any valid SQL expression of the numeric type (without an aggregate function).

That is why the explicit conversion works with min: casting the time value of created_at to TIMESTAMP turns it into a value that min can aggregate.

kevinleeTCA avatar Nov 06 '24 01:11 kevinleeTCA

Given that this helped us solve the issues when switching to Cube Store (as the cache and queue driver), I will close this now.

kevinleeTCA avatar Nov 13 '24 11:11 kevinleeTCA