
[cubestore] corrupt data error: File **.parquet doesn't exist in remote file system

Open kevinleeTCA opened this issue 1 year ago • 2 comments

Problem

We keep getting this error message on a daily basis in our cubestore logs:

...

2025-01-30 15:17:46.715 | 2025-01-30T04:17:46.715Z ERROR [cubestore::queryplanner::query_executor] <pid:1> Error Query (126.880301ms):
2025-01-30 15:17:46.715 | 2025-01-30T04:17:46.715Z INFO  [cubestore::metastore] <pid:1> Deactivating table prod_pre_aggregations.gl_income_rollup20240101_kla32oge_di0ibpro_1jkd7n6 (#2427) due to corrupt data error: File 9165-ncfolk1k.parquet doesn't exist in remote file system

...

Related Cube.js schema

cube(`GeneralLedger_Income`, {
  sql: `${incomeSQL}
    where ${incomeSQLFilter}
    `,
  extends: GeneralLedger_BaseIncome,
  sqlAlias: 'glIncome',
  preAggregations: {
    rollup: {
      measures: [CUBE.incomeAmount, CUBE.count, CUBE.managementFeeIncomeAmount, CUBE.minDate],
      dimensions: [CUBE.transactionTaxCategoryId, CUBE.generalLedgerManagement, CUBE.accountOwner, CUBE.accountBookType],
      timeDimension: CUBE.createdAt,
      granularity: `day`,
      indexes: {
        idx: {
          columns: [ CUBE.accountOwner, CUBE.generalLedgerManagement],
        },
      },
      refresh_key: {
        every: `30 6 * * *`,
        timezone: `Australia/Sydney`,
        incremental: true,
        update_window: `60 days`,
      },
      partition_granularity: `month`,
      build_range_start: {
        sql: `SELECT '2020-08-11'::timestamp AT TIME ZONE 'utc'`,
      },
      build_range_end: {
        sql: `SELECT NOW()`,
      },
    },
  },
  measures: {},
  dimensions: {},
  segments: {},
  dataSource: `generalLedger`,
});
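The rollup above combines `partition_granularity: month`, a `build_range_start` of 2020-08-11, `incremental: true` and `update_window: 60 days`. To reason about which partitions a scheduled refresh touches, here is a minimal sketch of our reading of those options. It is a simplified model, not Cube's implementation. It assumes calendar-month partitions in UTC labelled by their first day (as in the `gl_income_rollup20240101` table names), ignores the `Australia/Sydney` timezone, and assumes that only partitions overlapping the last 60 days are rebuilt. `monthlyPartitions` and `partitionsInUpdateWindow` are hypothetical helpers.

```javascript
// Hypothetical model of the GeneralLedger_Income rollup partitions.
// ASSUMPTIONS (not confirmed against Cube internals): partitions are calendar
// months in UTC, labelled by their first day as YYYYMMDD, and with
// incremental: true a scheduled refresh rebuilds only the partitions that
// overlap the last `updateWindowDays` days.
const DAY_MS = 24 * 60 * 60 * 1000;

// List monthly partitions from the month containing rangeStart up to rangeEnd.
function monthlyPartitions(rangeStart, rangeEnd) {
  const partitions = [];
  let year = rangeStart.getUTCFullYear();
  let month = rangeStart.getUTCMonth(); // 0-based month
  while (Date.UTC(year, month, 1) <= rangeEnd.getTime()) {
    partitions.push({
      key: `${year}${String(month + 1).padStart(2, "0")}01`,
      start: Date.UTC(year, month, 1),
      end: Date.UTC(year, month + 1, 1), // Date.UTC rolls month 12 into next January
    });
    month += 1;
    if (month === 12) {
      month = 0;
      year += 1;
    }
  }
  return partitions;
}

// Keys of partitions whose span overlaps [now - updateWindowDays, now].
// Every partition returned by monthlyPartitions already starts on or before now.
function partitionsInUpdateWindow(rangeStart, now, updateWindowDays) {
  const windowStart = now.getTime() - updateWindowDays * DAY_MS;
  return monthlyPartitions(rangeStart, now)
    .filter((p) => p.end > windowStart)
    .map((p) => p.key);
}

// build_range_start from the schema and the timestamp of the error in the logs.
const rangeStart = new Date("2020-08-11T00:00:00Z");
const errorTime = new Date("2025-01-30T04:17:46Z");
console.log(monthlyPartitions(rangeStart, errorTime).length); // 54
console.log(partitionsInUpdateWindow(rangeStart, errorTime, 60)); // [ '20241201', '20250101' ]
```

Under this model, as of the 2025-01-30 error only the `20241201` and `20250101` partitions fall inside the update window, so the deactivated `20240101` partition would not be rebuilt by the scheduled incremental refresh.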

Related Query


2025-01-30 15:17:46.716 | 2025-01-30T04:17:46.715Z ERROR [cubestore::queryplanner::query_executor] <pid:1> Error Query Physical Plan (126.880301ms): GlobalLimit, n: 3000
Limit
Sort
Projection, [gl_income__transaction_tax_category_id, p_m__fee_tax_categories__fee_tax_category_name, gl_income__income_amount, gl_income__management_fee_income_amount, gl_income__count]
Aggregate
ClusterSend, indices: [[16771], [16775, 16787, 16777, 16783, 16779, 16781, 4657, 4629, 4619, 16740, 16736, 16738], [16773], [1346]]
Join on: [#gl_income__rollup.gl_income__transaction_tax_category_id = #p_m__fee_tax_categories__rollup.p_m__fee_tax_categories__tax_category_id]
Join on: [#p_m__teams__p_m_teams_rollup.p_m__teams__management_ailorn = #groups_by_ma__g_b_m_rollup.groups_by_ma__management_ailorn]
Filter
Join on: [#p_m__teams__p_m_teams_rollup.p_m__teams__legal_entity_ailorn = #gl_income__rollup.gl_income__account_owner, #p_m__teams__p_m_teams_rollup.p_m__teams__management_ailorn = #gl_income__rollup.gl_income__general_ledger_management]
Filter
Scan p_m__teams__p_m_teams_rollup, source: CubeTable(index: p_m__teams_p_m_teams_rollup_mg_idx_lpogm2te_gablnqqs_1jpls5i:16771:[32796, 32813]), fields: [p_m__teams__legal_entity_ailorn, p_m__teams__management_ailorn, p_m__teams__organisation_id, p_m__teams__property_type]
Union
Scan prod_pre_aggregations.gl_income_rollup20230701_zgkopi5w_wflmgv1y_1jpls7t, source: CubeTable(index: gl_income_rollup_idx_zgkopi5w_wflmgv1y_1jpls7t:16775:[32804, 32816]:sort_on[gl_income__account_owner, gl_income__general_ledger_management]), fields: [gl_income__account_book_type, gl_income__account_owner, gl_income__general_ledger_management, gl_income__transaction_tax_category_id, gl_income__created_at_day, gl_income__count, gl_income__income_amount, gl_income__management_fee_income_amount]
Scan prod_pre_aggregations.gl_income_rollup20230801_30cmavvy_zpgdclsh_1jplt4k, source: CubeTable(index: gl_income_rollup_idx_30cmavvy_zpgdclsh_1jplt4k:16787:[32826, 32833]:sort_on[gl_income__account_owner, gl_income__general_ledger_management]), fields: [gl_income__account_book_type, gl_income__account_owner, gl_income__general_ledger_management, gl_income__transaction_tax_category_id, gl_income__created_at_day, gl_income__count, gl_income__income_amount, gl_income__management_fee_income_amount]
Scan prod_pre_aggregations.gl_income_rollup20230901_ans5yvsu_05ydwpmt_1jpls7t, source: CubeTable(index: gl_income_rollup_idx_ans5yvsu_05ydwpmt_1jpls7t:16777:[32806, 32818]:sort_on[gl_income__account_owner, gl_income__general_ledger_management]), fields: [gl_income__account_book_type, gl_income__account_owner, gl_income__general_ledger_management, gl_income__transaction_tax_category_id, gl_income__created_at_day, gl_income__count, gl_income__income_amount, gl_income__management_fee_income_amount]
Scan prod_pre_aggregations.gl_income_rollup20231001_y4xyjrqr_jg1k5lim_1jplslq, source: CubeTable(index: gl_income_rollup_idx_y4xyjrqr_jg1k5lim_1jplslq:16783:[32814, 32828]:sort_on[gl_income__account_owner, gl_income__general_ledger_management]), fields: [gl_income__account_book_type, gl_income__account_owner, gl_income__general_ledger_management, gl_income__transaction_tax_category_id, gl_income__created_at_day, gl_income__count, gl_income__income_amount, gl_income__management_fee_income_amount]
Scan prod_pre_aggregations.gl_income_rollup20231101_m0i0v5sg_na1ecvvu_1jpls6k, source: CubeTable(index: gl_income_rollup_idx_m0i0v5sg_na1ecvvu_1jpls6k:16779:[32808, 32823]:sort_on[gl_income__account_owner, gl_income__general_ledger_management]), fields: [gl_income__account_book_type, gl_income__account_owner, gl_income__general_ledger_management, gl_income__transaction_tax_category_id, gl_income__created_at_day, gl_income__count, gl_income__income_amount, gl_income__management_fee_income_amount]
Scan prod_pre_aggregations.gl_income_rollup20231201_eevk4axq_angupusd_1jpls6k, source: CubeTable(index: gl_income_rollup_idx_eevk4axq_angupusd_1jpls6k:16781:[32810, 32822]:sort_on[gl_income__account_owner, gl_income__general_ledger_management]), fields: [gl_income__account_book_type, gl_income__account_owner, gl_income__general_ledger_management, gl_income__transaction_tax_category_id, gl_income__created_at_day, gl_income__count, gl_income__income_amount, gl_income__management_fee_income_amount]
Scan prod_pre_aggregations.gl_income_rollup20240101_kla32oge_di0ibpro_1jkd7n6, source: CubeTable(index: gl_income_rollup_idx_kla32oge_di0ibpro_1jkd7n6:4657:[9143, 9165]:sort_on[gl_income__account_owner, gl_income__general_ledger_management]), fields: [gl_income__account_book_type, gl_income__account_owner, gl_income__general_ledger_management, gl_income__transaction_tax_category_id, gl_income__created_at_day, gl_income__count, gl_income__income_amount, gl_income__management_fee_income_amount]
Scan prod_pre_aggregations.gl_income_rollup20240201_tjjj3bbi_aujhlb1e_1jkd63j, source: CubeTable(index: gl_income_rollup_idx_tjjj3bbi_aujhlb1e_1jkd63j:4629:[9091, 9112]:sort_on[gl_income__account_owner, gl_income__general_ledger_management]), fields: [gl_income__account_book_type, gl_income__account_owner, gl_income__general_ledger_management, gl_income__transaction_tax_category_id, gl_income__created_at_day, gl_income__count, gl_income__income_amount, gl_income__management_fee_income_amount]
Scan prod_pre_aggregations.gl_income_rollup20240301_msq2aw0n_ykwidr0u_1jkd5ic, source: CubeTable(index: gl_income_rollup_idx_msq2aw0n_ykwidr0u_1jkd5ic:4619:[9073, 9096]:sort_on[gl_income__account_owner, gl_income__general_ledger_management]), fields: [gl_income__account_book_type, gl_income__account_owner, gl_income__general_ledger_management, gl_income__transaction_tax_category_id, gl_income__created_at_day, gl_income__count, gl_income__income_amount, gl_income__management_fee_income_amount]
Scan prod_pre_aggregations.gl_income_rollup20240401_4rr0etsz_50d11yiv_1jplesf, source: CubeTable(index: gl_income_rollup_idx_4rr0etsz_50d11yiv_1jplesf:16740:[32740, 32751]:sort_on[gl_income__account_owner, gl_income__general_ledger_management]), fields: [gl_income__account_book_type, gl_income__account_owner, gl_income__general_ledger_management, gl_income__transaction_tax_category_id, gl_income__created_at_day, gl_income__count, gl_income__income_amount, gl_income__management_fee_income_amount]
Scan prod_pre_aggregations.gl_income_rollup20240501_agnnisxe_is1zsk0r_1jplegd, source: CubeTable(index: gl_income_rollup_idx_agnnisxe_is1zsk0r_1jplegd:16736:[32732, 32748]:sort_on[gl_income__account_owner, gl_income__general_ledger_management]), fields: [gl_income__account_book_type, gl_income__account_owner, gl_income__general_ledger_management, gl_income__transaction_tax_category_id, gl_income__created_at_day, gl_income__count, gl_income__income_amount, gl_income__management_fee_income_amount]
Scan prod_pre_aggregations.gl_income_rollup20240601_y1vcbtr3_tvbyjwvf_1jplegd, source: CubeTable(index: gl_income_rollup_idx_y1vcbtr3_tvbyjwvf_1jplegd:16738:[32734, 32749]:sort_on[gl_income__account_owner, gl_income__general_ledger_management]), fields: [gl_income__account_book_type, gl_income__account_owner, gl_income__general_ledger_management, gl_income__transaction_tax_category_id, gl_income__created_at_day, gl_income__count, gl_income__income_amount, gl_income__management_fee_income_amount]
Scan groups_by_ma__g_b_m_rollup, source: CubeTable(index: groups_by_ma_g_b_m_rollup_mg_idx_xde2o1oi_sdfcdeov_1jpls5j:16773:[32798]:sort_on[groups_by_ma__management_ailorn]), fields: [groups_by_ma__management_ailorn]
Scan p_m__fee_tax_categories__rollup, source: CubeTable(index: p_m__fee_tax_categories_rollup_idx_h4fwcpsc_4l4cgk5j_1jjaiga:1346:[2648, 2684]:sort_on[p_m__fee_tax_categories__tax_category_id]), fields: *
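The `gl_income` scans in this plan mix partitions whose table names end in `1jkd…` (January to March 2024, index ids 4619 to 4657) with partitions ending in `1jpl…` (index ids 16736 to 16787). A minimal sketch for comparing them follows. It assumes the trailing segment encodes the build time as Unix seconds in base 32; that is an inference from these log lines, not documented Cube behaviour, and `parsePreAggTableName` is a hypothetical helper.

```javascript
// Hypothetical helper: split a pre-aggregation table name, as printed in the
// Cube Store logs, into its visible parts. The layout is inferred from these
// log lines only. ASSUMPTION: the last segment is a base-32 Unix timestamp
// (in seconds) of when that table version was built.
function parsePreAggTableName(fullName) {
  const match = /^([^.]+)\.(.+)_([a-z0-9]+)_([a-z0-9]+)_([a-z0-9]+)$/.exec(fullName);
  if (match === null) return null;
  const [, schema, base, hashA, hashB, suffix] = match;
  // Partitioned rollups end in the partition start date, e.g. "20240101".
  const partition = /(\d{8})$/.exec(base);
  return {
    schema,
    base,
    partitionStart: partition ? partition[1] : null,
    hashes: [hashA, hashB],
    builtAt: new Date(parseInt(suffix, 32) * 1000).toISOString(),
  };
}

// The deactivated January 2024 partition and the June 2024 partition from the same plan.
for (const name of [
  "prod_pre_aggregations.gl_income_rollup20240101_kla32oge_di0ibpro_1jkd7n6",
  "prod_pre_aggregations.gl_income_rollup20240601_y1vcbtr3_tvbyjwvf_1jplegd",
]) {
  const parts = parsePreAggTableName(name);
  console.log(parts.partitionStart, parts.builtAt);
}
// 20240101 2024-11-27T04:24:06.000Z
// 20240601 2025-01-29T23:30:21.000Z
```

If the assumption holds, the deactivated January 2024 partition was last built in late November 2024, about two months before the error, while the June 2024 partition in the same query was built less than five hours before it.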


kevinleeTCA, Jan 30 '25 04:01

Another cube also hits this issue constantly, and the client query ends up timing out.

error: Error while querying queueId="398" queueSize="0" duration="97" queryKey="[\"SELECT `chat__m_r_t_by_agency__median_response_time` `chat__m_r_t_by_agency__median_response_time` FROM prod_pre_aggregations.chat__m_r_t_by_agency_rollup_bgmxzqjp_hvtni333_1kjespr AS `chat__m_r_t_by_agency__rollup` GROUP BY 1 ORDER BY 1 ASC LIMIT 100\",[]]" queuePrefix="SQL_QUERY_EXT_STANDALONE" requestId="3878f8db-9a0e-4af7-a8e2-482ab8fb0bd0-span-1" timeInQueue="0" error="Error: Internal: Execution error: CorruptData: File 356210-6fxdlqjv.parquet doesn't exist in remote file system at WebSocket.<anonymous> (/cube/node_modules/@cubejs-backend/cubestore-driver/src/WebSocketConnection.ts:132:32) at WebSocket.emit (node:events:518:28) at WebSocket.emit (node:domain:489:12) at Receiver.receiverOnMessage (/cube/node_modules/ws/lib/websocket.js:1070:20) at Receiver.emit (node:events:518:28) at Receiver.emit (node:domain:489:12) at Receiver.dataMessage (/cube/node_modules/ws/lib/receiver.js:502:14) at Receiver.getData (/cube/node_modules/ws/lib/receiver.js:435:17) at Receiver.startLoop (/cube/node_modules/ws/lib/receiver.js:143:22) at Receiver._write (/cube/node_modules/ws/lib/receiver.js:78:10)" level="error"
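Both quoted errors carry the same `File … doesn't exist in remote file system` text: one is logged by Cube Store's metastore when it deactivates a table, the other surfaces through the Cube API to the client. To track how often this happens per day, a small log filter could look like the sketch below. It is a hypothetical monitoring helper, not part of Cube; `parseMissingFileError` and `countMissingFilesPerDay` are illustrative names, and the patterns only cover the two message shapes quoted in this issue.

```javascript
// Hypothetical monitoring helper (not part of Cube): extract the missing
// Parquet file, and the deactivated table when logged, from lines shaped like
// the Cube Store and Cube API errors quoted in this issue.
const MISSING_FILE_RE = /File (\S+\.parquet) doesn't exist in remote file system/;
const DEACTIVATED_RE = /Deactivating table (\S+) \(#(\d+)\)/;
const DATE_RE = /\d{4}-\d{2}-\d{2}/;

function parseMissingFileError(line) {
  const file = MISSING_FILE_RE.exec(line);
  if (file === null) return null;
  const table = DEACTIVATED_RE.exec(line);
  return {
    file: file[1],
    table: table ? table[1] : null,
    tableId: table ? Number(table[2]) : null,
  };
}

// Tally missing-file errors per day, keyed by the first YYYY-MM-DD on the line;
// lines without a date (such as the Cube API error) are counted as "unknown".
function countMissingFilesPerDay(lines) {
  const counts = new Map();
  for (const line of lines) {
    if (parseMissingFileError(line) === null) continue;
    const date = DATE_RE.exec(line);
    const day = date ? date[0] : "unknown";
    counts.set(day, (counts.get(day) || 0) + 1);
  }
  return counts;
}

const sample = [
  "2025-01-30T04:17:46.715Z INFO  [cubestore::metastore] <pid:1> Deactivating table prod_pre_aggregations.gl_income_rollup20240101_kla32oge_di0ibpro_1jkd7n6 (#2427) due to corrupt data error: File 9165-ncfolk1k.parquet doesn't exist in remote file system",
  `error="Error: Internal: Execution error: CorruptData: File 356210-6fxdlqjv.parquet doesn't exist in remote file system at WebSocket.<anonymous>"`,
];
console.log(countMissingFilesPerDay(sample)); // Map(2) { '2025-01-30' => 1, 'unknown' => 1 }
```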

Cube model:

cube('Chat_MRTByAgency', {
    sql: `
       with response_time_by_organisation as (
            select 
                split_part(  organisation_ailorn, ':', '4') as organisation_id,
                median_response_time_last_thirty_days
            from data_export.organisation_median_response_time_report
        )
       select 
           organisation_id,
           median_response_time_last_thirty_days
       from response_time_by_organisation
       where ${SECURITY_CONTEXT.organisationId.filter('organisation_id')}`,

    preAggregations: {
        rollup: {
            dimensions: [
                CUBE.medianResponseTime,
                CUBE.organisationId
            ],
            indexes: {
                idx: {
                    columns: [CUBE.organisationId],
                }
            },

            refresh_key: {
                every: "1 hour",
            }
        },
    },

    dimensions: {
        organisationId: {
            sql: `organisation_id`,
            type: `string`
        },
        medianResponseTime: {
            sql: `median_response_time_last_thirty_days`,
            type: `number`
        }
    },

    dataSource: `chat`
});

I feel like this kind of stale-cache issue will always introduce short windows in which our client queries break.

kevinleeTCA, Dec 09 '25 04:12

@igorlukanin this could be a bug in how pre-aggregations are updated; ideally, a refresh should cause zero downtime. Could you please provide some insight?

Cube version: 1.3.26

kevinleeTCA, Dec 09 '25 04:12