influxdb-python icon indicating copy to clipboard operation
influxdb-python copied to clipboard

DataFrame client takes >50s to get a response for 9k rows with 12 columns

Open raokrutarth opened this issue 5 years ago • 0 comments

  • InfluxDB version: influxdb:1.7-alpine
  • InfluxDB-python version: 5.3.1
  • Python version: 3.8.5
  • Operating system version: $ uname -a Linux 5.4.0-56-generic #62-Ubuntu SMP Mon Nov 23 19:20:19 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux $ cat /proc/cpuinfo | head -n 10 processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 62 model name : Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz stepping : 4 microcode : 0x42e cpu MHz : 2190.312 cache size : 15360 KB

SnakeViz profile snippet: image

Query made:

 SELECT
        FIRST(volume) AS volume_start,
        FIRST(trade_price) AS trade_price_start,
        FIRST(ask_price) AS ask_price_start,

        LAST(volume) AS volume_end,
        LAST(trade_price) AS trade_price_end,
        LAST(ask_price) AS ask_price_end,
        LAST(day_change_percentage) AS day_change_percentage
    FROM full_market_snapshot
    WHERE
        time > now() - 48h30m
    GROUP BY ticker

Database entries/scale:

> select count(*) from full_market_snapshot
name: full_market_snapshot
time                 count_ask_price count_bid_price count_day_change_percentage count_last_trade_size count_trade_price count_volume count_weighted_volume count_yesterday_max
----                 --------------- --------------- --------------------------- --------------------- ----------------- ------------ --------------------- -------------------
1970-01-01T00:00:00Z 15054207        15054207        15054207                    15054207              15054207          15054207     15054207              15054207
> select count(*) from full_market_snapshot where time > now() - 48h30m
name: full_market_snapshot
time                           count_ask_price count_bid_price count_day_change_percentage count_last_trade_size count_trade_price count_volume count_weighted_volume count_yesterday_max
----                           --------------- --------------- --------------------------- --------------------- ----------------- ------------ --------------------- -------------------
2020-12-18T09:58:42.300308784Z 3949856         3949856         3949856                     3949856               3949856           3949856      3949856               3949856
> 

Returned dataframe dimensions: [9203 rows x 8 columns]

What can I do to improve the response time? The InfluxDB container uses <10% of the allotted CPU and main memory and storage is SSD with a docker volume mount.

raokrutarth avatar Dec 20 '20 10:12 raokrutarth