
Refresh Worker Configuration - Docker Compose

Open zeokz opened this issue 3 years ago • 4 comments

I followed the steps documented here to create my own docker-compose.yml file for my CubeJS environment and ended up with the following file:

version: '2.2'

services:
  cube_api:
    restart: always
    image: cubejs/cube
    ports:
      - 4000:4000
    environment:
      - CUBEJS_DB_HOST=<IP>
      - CUBEJS_DB_PORT=<PORT>
      - CUBEJS_DB_NAME=<DB_NAME>
      - CUBEJS_DB_USER=<USERNAME>
      - CUBEJS_DB_PASS=<PASSWORD>
      - CUBEJS_DB_SCHEMA=<SCHEMA>
      - CUBEJS_DB_TYPE=postgres
      - CUBEJS_CUBESTORE_HOST=cubestore_router
      - CUBEJS_REDIS_URL=redis://redis:6379
      - CUBEJS_API_SECRET=<SECRET>
    volumes:
      - C:/Users/<USER>/Desktop/cubejs/cube:/cube/conf
    depends_on:
      - cubestore_worker_1
      - cubestore_worker_2
      - cube_refresh_worker
      - redis

  cube_refresh_worker:
    restart: always
    image: cubejs/cube
    environment:
      - CUBEJS_DB_HOST=<IP>
      - CUBEJS_DB_PORT=<PORT>
      - CUBEJS_DB_NAME=<DB_NAME>
      - CUBEJS_DB_USER=<USERNAME>
      - CUBEJS_DB_PASS=<PASSWORD>
      - CUBEJS_DB_SCHEMA=<SCHEMA>
      - CUBEJS_DB_TYPE=postgres
      - CUBEJS_CUBESTORE_HOST=cubestore_router
      - CUBEJS_REDIS_URL=redis://redis:6379
      - CUBEJS_API_SECRET=<SECRET>
      - CUBEJS_REFRESH_WORKER=true
    volumes:
      - C:/Users/<USER>/Desktop/cubejs/cube:/cube/conf

  cubestore_router:
    restart: always
    image: cubejs/cubestore:latest
    environment:
      - CUBESTORE_WORKERS=cubestore_worker_1:10001,cubestore_worker_2:10002
      - CUBESTORE_REMOTE_DIR=/cube/data
      - CUBESTORE_META_PORT=9999
      - CUBESTORE_SERVER_NAME=cubestore_router:9999
    volumes:
      - C:/Users/<USER>/Desktop/cubejs/store:/cube/data

  cubestore_worker_1:
    restart: always
    image: cubejs/cubestore:latest
    environment:
      - CUBESTORE_WORKERS=cubestore_worker_1:10001,cubestore_worker_2:10002
      - CUBESTORE_SERVER_NAME=cubestore_worker_1:10001
      - CUBESTORE_WORKER_PORT=10001
      - CUBESTORE_REMOTE_DIR=/cube/data
      - CUBESTORE_META_ADDR=cubestore_router:9999
    volumes:
      - C:/Users/<USER>/Desktop/cubejs/store:/cube/data
    depends_on:
      - cubestore_router

  cubestore_worker_2:
    restart: always
    image: cubejs/cubestore:latest
    environment:
      - CUBESTORE_WORKERS=cubestore_worker_1:10001,cubestore_worker_2:10002
      - CUBESTORE_SERVER_NAME=cubestore_worker_2:10002
      - CUBESTORE_WORKER_PORT=10002
      - CUBESTORE_REMOTE_DIR=/cube/data
      - CUBESTORE_META_ADDR=cubestore_router:9999
    volumes:
      - C:/Users/<USER>/Desktop/cubejs/store:/cube/data
    depends_on:
      - cubestore_router

  redis:
    image: bitnami/redis:latest
    environment:
      - ALLOW_EMPTY_PASSWORD=yes
    logging:
      driver: none

However, although it appears to connect to the DB and attempt to create the partitioned tables in the prod_pre_aggregations schema, I don't see them when I check the DB.

Moreover, when I am trying to consume the CubeJS API from my FE, it is giving me the following error:

Error: Your configuration restricts query requests to only be served from pre-aggregations, and required pre-aggregation partitions were not built yet. Please make sure your refresh worker is configured correctly and running.

Can someone advise what I am missing, please?

zeokz avatar Aug 30 '22 06:08 zeokz

Also note that the CubeJS Store generated the Parquet files.

image

zeokz avatar Aug 30 '22 09:08 zeokz

I managed to store the cache data in the source database by setting the external property of my pre-aggregation to false. However, Cube.js v0.30.60 still does not recognize that I have pre-aggregations. I'm still receiving the following error:

{
  "error":"Error: Your configuration restricts query requests to only be served from pre-aggregations, and required pre-aggregation partitions were not built yet. Please make sure your refresh worker is configured correctly and running."
}

I cannot replicate this on v0.29.57. However, for some reason, when I use v0.29.57 the refresh worker doesn't generate the pre-aggregations unless I request the data myself from the FE.

zeokz avatar Aug 31 '22 05:08 zeokz

The only way I am able to get data is by setting both rollupOnlyMode and externalRefresh to false (as far as I know, these should be false by default). When I do this, however, the pre-aggregated tables are recreated, as one can see from the screenshot below:

image

zeokz avatar Aug 31 '22 05:08 zeokz
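For readers following along, the two options named in the comment above are set under orchestratorOptions in the cube.js configuration file. The following is only a minimal sketch, assuming the file sits in the directory mounted at /cube/conf; the intermediate `config` constant is just for illustration, and the values shown are the ones the comment describes, not a recommendation.

```javascript
// cube.js (sketch). Assumption: this file lives in the directory mounted at /cube/conf.
const config = {
  orchestratorOptions: {
    // When true, the API instance only answers queries from pre-aggregations.
    rollupOnlyMode: false,
    preAggregationsOptions: {
      // When true, this instance does not build or refresh pre-aggregations
      // itself and relies on a separate refresh worker to do so.
      externalRefresh: false,
    },
  },
};

module.exports = config;
```

With both values set to false, the API instance may build pre-aggregations on demand, which would match the table recreation described above.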

@zeokz Could you please share your Cube data schema? Logs from your refresh worker instance would also be helpful.

paveltiunov avatar Sep 04 '22 17:09 paveltiunov

I have a similar problem. I upgraded the cube Docker setup from v0.27.7 to v0.30.75 and now get these error messages:

Your configuration restricts query requests to only be served from pre-aggregations, and required pre-aggregation partitions were not built yet. Please make sure your refresh worker is configured correctly and running.

Here is the docker-compose.yml:

version: '3'
services:
  cube:
    image: 'cubejs/cube:v0.30.75'
    restart: unless-stopped
    ports:
      - 4000:4000
    volumes:
      - ./assets/files/analytics:/cube/conf
    environment:
      - CUBEJS_API_SECRET=
      - CUBEJS_CUBESTORE_HOST=cubestore
      - CUBEJS_DB_HOST=mysql
      - CUBEJS_DB_NAME=${MYSQL_DATABASE}
      - CUBEJS_DB_PASS=${MYSQL_PASSWORD}
      - CUBEJS_DB_PORT=${MYSQL_PORT}
      - CUBEJS_DB_TYPE=mysql
      - CUBEJS_DB_USER=${MYSQL_USER}
      - CUBEJS_REDIS_URL=redis://redis:6379
      - CUBEJS_LOG_LEVEL=trace
    links:
      - cubestore:cubestore
      - mysql:mysql
      - redis:redis
    depends_on:
      - cube-refresh-worker
      - cubestore
      - redis
  cube-refresh-worker:
    image: 'cubejs/cube:v0.30.75'
    restart: unless-stopped
    volumes:
      - ./assets/files/analytics:/cube/conf
    environment:
      - CUBEJS_API_SECRET=
      - CUBEJS_CUBESTORE_HOST=cubestore
      - CUBEJS_DB_HOST=mysql
      - CUBEJS_DB_NAME=${MYSQL_DATABASE}
      - CUBEJS_DB_PASS=${MYSQL_PASSWORD}
      - CUBEJS_DB_PORT=${MYSQL_PORT}
      - CUBEJS_DB_TYPE=mysql
      - CUBEJS_DB_USER=${MYSQL_USER}
      - CUBEJS_LOG_LEVEL=trace
      - CUBEJS_REDIS_URL=redis://redis:6379
      - CUBEJS_REFRESH_WORKER=true
    links:
      - cubestore:cubestore
      - mysql:mysql
      - redis:redis
    depends_on:
      - cubestore
      - redis
  cubestore:
    image: cubejs/cubestore:v0.30.75
    environment:
      - CUBESTORE_REMOTE_DIR=/cube/data
    volumes:
      - .cubestore:/cube/data
  mysql:
    image: 'mysql:5.7'
    restart: unless-stopped
    ports:
      - 3306:3306
    volumes:
      - mysql-volume:/var/lib/mysql
    env_file:
      - .env
  redis:
    image: 'redis:6.0'
    restart: unless-stopped
    ports:
      - 6379:6379
volumes:
  mysql-volume:

The cube configuration looks like this:

/* jshint esversion: 6 */

import { environment } from '../environment';

cube('xstatistics', {
  sql: 'SELECT * FROM ._xStatistics',

  // Refresh Cube.js every hour (production) / second (development)
  refreshKey: {
    every: `${environment() === 'production' ? '1 hour' : '1 second'}`
  },

  joins: {},

  measures: {
    average: {
      sql: 'value',
      type: 'avg'
    },
    count: {
      type: 'count',
      drillMembers: [
        id,
        organizationid,
        locationid,
        xid,
        xcreatedat,
        createdat,
        updatedat
      ]
    }
  },

  dimensions: {
    id: {
      sql: 'id',
      type: 'string',
      primaryKey: true
    },

    organizationid: {
      sql: `${CUBE}.\`organizationId\``,
      type: 'string'
    },

    locationid: {
      sql: `${CUBE}.\`locationId\``,
      type: 'string'
    },

    xid: {
      sql: `${CUBE}.\`xId\``,
      type: 'string'
    },

    key: {
      sql: 'key',
      type: 'string'
    },

    value: {
      sql: 'value',
      type: 'string'
    },

    xcreatedat: {
      sql: `${CUBE}.\`xCreatedAt\``,
      type: 'time'
    },

    createdat: {
      sql: `${CUBE}.\`createdAt\``,
      type: 'time'
    },

    updatedat: {
      sql: `${CUBE}.\`updatedAt\``,
      type: 'time'
    }
  },

  preAggregations: {
    day: {
      dimensionReferences: [
        xstatistics.key,
        xstatistics.locationid,
        xstatistics.organizationid,
        xstatistics.value
      ],
      granularity: `day`,
      measureReferences: [count],
      timeDimensionReference: xcreatedat,
      type: `rollup`
    }
  }
});

The log of the refresh worker container looks unsuspicious to me:

🚀 Cube.js server (0.30.75) is listening on 4000

Refresh Scheduler Run: {"securityContext":{},"requestId":"scheduler-60c14b17-18a1-4a7b-b6bc-eee69352579e"}

Compiling schema: {"version":"default_schema_version"}

Compiling schema: {"version":"default_schema_version"}

Query started: {"requestId":"scheduler-60c14b17-18a1-4a7b-b6bc-eee69352579e"}

Query started: {"requestId":"scheduler-60c14b17-18a1-4a7b-b6bc-eee69352579e"}

Query started: {"requestId":"scheduler-60c14b17-18a1-4a7b-b6bc-eee69352579e"}

Query started: {"requestId":"scheduler-60c14b17-18a1-4a7b-b6bc-eee69352579e"}

Query started: {"requestId":"scheduler-60c14b17-18a1-4a7b-b6bc-eee69352579e"}

Query started: {"requestId":"scheduler-60c14b17-18a1-4a7b-b6bc-eee69352579e"}

Query started: {"requestId":"scheduler-60c14b17-18a1-4a7b-b6bc-eee69352579e"}

Query started: {"requestId":"scheduler-60c14b17-18a1-4a7b-b6bc-eee69352579e"}

Found cache entry: {"cacheKey":["SELECT FLOOR((UNIX_TIMESTAMP()) / 3600) as refresh_key",[]],"time":1664205002450,"renewedAgo":21801,"renewalKey":"SQL_QUERY_RESULT_STANDALONE_a0a6a7624db72ec715e14085e463a1e1","newRenewalKey":"SQL_QUERY_RESULT_STANDALONE_a0a6a7624db72ec715e14085e463a1e1","renewalThreshold":300,"requestId":"scheduler-60c14b17-18a1-4a7b-b6bc-eee69352579e"}

Using cache for: {"cacheKey":["SELECT FLOOR((UNIX_TIMESTAMP()) / 3600) as refresh_key",[]],"requestId":"scheduler-60c14b17-18a1-4a7b-b6bc-eee69352579e"}

Query completed: {"duration":27,"requestId":"scheduler-60c14b17-18a1-4a7b-b6bc-eee69352579e"}

Found cache entry: {"cacheKey":["SELECT FLOOR((UNIX_TIMESTAMP()) / 3600) as refresh_key",[]],"time":1664205002450,"renewedAgo":21802,"renewalKey":"SQL_QUERY_RESULT_STANDALONE_a0a6a7624db72ec715e14085e463a1e1","newRenewalKey":"SQL_QUERY_RESULT_STANDALONE_a0a6a7624db72ec715e14085e463a1e1","renewalThreshold":300,"requestId":"scheduler-60c14b17-18a1-4a7b-b6bc-eee69352579e"}

Using cache for: {"cacheKey":["SELECT FLOOR((UNIX_TIMESTAMP()) / 3600) as refresh_key",[]],"requestId":"scheduler-60c14b17-18a1-4a7b-b6bc-eee69352579e"}

Query completed: {"duration":12,"requestId":"scheduler-60c14b17-18a1-4a7b-b6bc-eee69352579e"}

Query started: {"requestId":"scheduler-60c14b17-18a1-4a7b-b6bc-eee69352579e"}

Found cache entry: {"cacheKey":["SELECT FLOOR((UNIX_TIMESTAMP()) / 3600) as refresh_key",[]],"time":1664205002450,"renewedAgo":21937,"renewalKey":"SQL_QUERY_RESULT_STANDALONE_a0a6a7624db72ec715e14085e463a1e1","newRenewalKey":"SQL_QUERY_RESULT_STANDALONE_a0a6a7624db72ec715e14085e463a1e1","renewalThreshold":300,"requestId":"scheduler-60c14b17-18a1-4a7b-b6bc-eee69352579e"}

Using cache for: {"cacheKey":["SELECT FLOOR((UNIX_TIMESTAMP()) / 3600) as refresh_key",[]],"requestId":"scheduler-60c14b17-18a1-4a7b-b6bc-eee69352579e"}

gizmodus avatar Sep 26 '22 15:09 gizmodus

By the way, when setting CUBEJS_DEV_MODE=true it works.

gizmodus avatar Sep 27 '22 06:09 gizmodus

@zeokz @gizmodus The error means that the pre-aggregation tables selected for querying can't be found among the pre-aggregation tables that have been built. It usually indicates a refresh worker misconfiguration. For example, you're querying with timezones that aren't set up in CUBEJS_SCHEDULED_REFRESH_TIMEZONES: https://cube.dev/docs/reference/environment-variables.

paveltiunov avatar Nov 27 '22 22:11 paveltiunov
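To illustrate the timezone mismatch described in the comment above, here is a small sketch that compares the timezones used in queries against the value of CUBEJS_SCHEDULED_REFRESH_TIMEZONES. The helper names are hypothetical and not part of Cube. It assumes the variable holds a comma-separated list, that only UTC is refreshed when it is unset, and that timezone names must match exactly.

```javascript
// Hypothetical helpers, not part of Cube's API.

// Parse a comma-separated CUBEJS_SCHEDULED_REFRESH_TIMEZONES value.
// Assumption: when the variable is unset or empty, only UTC is refreshed.
function scheduledTimezones(envValue) {
  if (!envValue || envValue.trim() === '') return ['UTC'];
  return envValue.split(',').map((tz) => tz.trim()).filter(Boolean);
}

// Return the query timezones that the refresh worker is not scheduled for.
// Assumption: names are compared as exact strings.
function uncoveredTimezones(queryTimezones, envValue) {
  const scheduled = new Set(scheduledTimezones(envValue));
  return queryTimezones.filter((tz) => !scheduled.has(tz));
}

// Example: check the timezone a frontend sends with its queries.
console.log(
  uncoveredTimezones(['Europe/Zurich'], process.env.CUBEJS_SCHEDULED_REFRESH_TIMEZONES)
);
```

A non-empty result would suggest that queries in those timezones look for partitions the refresh worker never builds.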

@gizmodus did you get a chance to resolve the issue? I have the same issue here. I tried CUBEJS_SCHEDULED_REFRESH_TIMEZONES, but without luck.

zhjuncai avatar Nov 30 '22 03:11 zhjuncai

did you get a chance to resolve the issue?

Hi @zhjuncai

Yesterday I was working on @gizmodus' ticket.

Setting CUBEJS_SCHEDULED_REFRESH_TIMEZONES=Europe/Zurich in docker-compose.yml for cube and cube-refresh-worker seems to have a positive effect (while CUBEJS_SCHEDULED_REFRESH_TIMEZONES=CET does not have any effect).

The problem is hard to analyze. At some point I found out that it takes about 1 minute until the cubestore is ready after a restart (?). During that period I get "Your configuration restricts query requests to only be served from pre-aggregations, and required pre-aggregation partitions were not built yet. Please make sure your refresh worker is configured correctly and running." Only after that period are results provided.

dtslvr avatar Nov 30 '22 07:11 dtslvr
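If the error only appears during the warm-up period described above, one possible client-side workaround is to retry the request for a short while. This is a sketch, not a fix for the underlying configuration: `loadWithRetry` and `isNotBuiltYetError` are hypothetical helpers, the matched substring is taken from the error text quoted in this thread, and the default of 6 retries at 10-second intervals is an assumption chosen to cover roughly one minute.

```javascript
// Substring of the error message quoted in this thread.
const NOT_BUILT_YET = 'pre-aggregation partitions were not built yet';

// True when the error looks like the "not built yet" error.
function isNotBuiltYetError(err) {
  return Boolean(err) && String(err.message || err).includes(NOT_BUILT_YET);
}

// Call `load` (any function returning a promise) and retry it while it
// fails with the "not built yet" error. Other errors are rethrown at once.
async function loadWithRetry(load, { retries = 6, delayMs = 10000 } = {}) {
  for (let attempt = 0; ; attempt += 1) {
    try {
      return await load();
    } catch (err) {
      if (!isNotBuiltYetError(err) || attempt >= retries) throw err;
      // Wait before the next attempt.
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```

A frontend could wrap its request as `loadWithRetry(() => cubejsApi.load(query))`, assuming `cubejsApi` is an already configured Cube.js client instance.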

The error message was improved in the latest version.

paveltiunov avatar Dec 02 '22 19:12 paveltiunov