Dynamic endpoints
Fixes #370
Description
Adds support for dynamic queries by periodically re-fetching the given query.
Associates each dynamic query with a UUID and uses a static ref map to associate each UUID with its latest result.
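A minimal sketch of the "static ref map" idea, using only the standard library. The names (`dynamic_results`, `QueryResult`, string keys standing in for UUIDs) are illustrative, not the PR's actual code:

```rust
// Hypothetical sketch: a process-wide map from query id to latest result.
use std::collections::HashMap;
use std::sync::{OnceLock, RwLock};

#[derive(Clone, Debug, PartialEq)]
struct QueryResult {
    rows: Vec<String>, // placeholder for real record batches
}

// Lazily initialized global map; real code might use a `lazy_static`/`once_cell` static ref.
fn dynamic_results() -> &'static RwLock<HashMap<String, QueryResult>> {
    static MAP: OnceLock<RwLock<HashMap<String, QueryResult>>> = OnceLock::new();
    MAP.get_or_init(|| RwLock::new(HashMap::new()))
}

// Overwrite the stored result for a uuid with the latest fetch.
fn store_result(uuid: &str, result: QueryResult) {
    dynamic_results().write().unwrap().insert(uuid.to_string(), result);
}

// Read back whatever the periodic refresh last stored.
fn latest_result(uuid: &str) -> Option<QueryResult> {
    dynamic_results().read().unwrap().get(uuid).cloned()
}
```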
This PR has:
- [x] been tested to ensure log ingestion and log query work.
- [x] added comments explaining the "why" and the intent of the code wherever it would not be obvious to an unfamiliar reader.
- [x] added documentation for new or modified features or behaviors.
/claim #370
Can you look at the CI failures, @TomBebb?
Done
@TomBebb we need to add a few validations and update a few things in this PR:
- max cache duration should be 60 mins
- max number of unique query URLs the server stores should be 10
- results should be cached on disk, not in memory; for this, expose an env var where the user can provide a directory path, and use this path to store the results in parquet with the file name `uuid.parquet`
- the server should refresh the data every minute, deleting the old parquet and writing the new parquet to disk, so that whenever the client does a GET /{uuid} call the server has the preprocessed data and can return it from disk
Do let me know if you need further clarification on the points mentioned here.
Thanks!
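The validations and the refresh-to-disk step above could be sketched roughly as follows. The constants, function names, and the use of raw bytes in place of real parquet serialization are all assumptions for illustration, not the PR's implementation:

```rust
// Hedged sketch of the requested validations and the minute-by-minute refresh.
use std::fs;
use std::path::{Path, PathBuf};
use std::time::Duration;

const MAX_CACHE_DURATION: Duration = Duration::from_secs(60 * 60); // 60-minute cap
const MAX_UNIQUE_QUERIES: usize = 10; // at most 10 stored query urls

fn validate(cache_duration: Duration, current_count: usize) -> Result<(), String> {
    if cache_duration > MAX_CACHE_DURATION {
        return Err("cache duration must be <= 60 minutes".into());
    }
    if current_count >= MAX_UNIQUE_QUERIES {
        return Err("server already stores the maximum of 10 unique query urls".into());
    }
    Ok(())
}

// Called every minute: delete the stale file, then write a fresh `<uuid>.parquet`.
// Real code would serialize record batches with the `parquet` crate; plain bytes
// stand in for that here.
fn refresh_on_disk(cache_dir: &Path, uuid: &str, data: &[u8]) -> std::io::Result<PathBuf> {
    let path = cache_dir.join(format!("{uuid}.parquet"));
    let _ = fs::remove_file(&path); // ignore "not found" on the first write
    fs::write(&path, data)?;
    Ok(path)
}
```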
@nikhilsinhaparseable Cool, working on that now
What should the difference be between hitting the cache duration and waiting a minute?
@TomBebb
The cache duration sets how much data is stored for a particular uuid (e.g. 5 mins worth of data), but this 5 min range is not fixed; it is relative to the current timestamp.
Say I have made a POST /dynamic_query call with "cache-duration":"5m". You will generate a hash for this query, but you will also have to create a separate thread that runs every minute and updates the parquet on disk by fetching the latest 5 mins worth of data.
When a user calls GET /dynamic_query/{uuid} at any point in time, the server should return the latest 5 mins of preprocessed data available on disk.
We should also expose another endpoint, DELETE /dynamic_query/{uuid}, to delete the uuid and the corresponding parquet from disk.
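The rolling-window semantics described above can be sketched in a few lines. The `"<minutes>m"` duration format and the function names are assumptions based on the `"cache-duration":"5m"` example, not confirmed API:

```rust
use std::time::{Duration, SystemTime};

// Parse a duration like "5m"; the "<minutes>m" format is an assumption.
fn parse_cache_duration(s: &str) -> Option<Duration> {
    let mins: u64 = s.strip_suffix('m')?.parse().ok()?;
    Some(Duration::from_secs(mins * 60))
}

// The window is relative to "now": each minute-by-minute refresh re-queries
// the latest `cache_duration` worth of data rather than a fixed range.
fn query_window(now: SystemTime, cache_duration: Duration) -> (SystemTime, SystemTime) {
    (now - cache_duration, now)
}
```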
Implemented using advised notes, please can someone re-review?
@TomBebb below are the review comments:
- the env `DYNAMIC_QUERY_RESULTS_CACHE_PATH_ENV` should be optional; if the user provides it, the dynamic query endpoints work, otherwise they should return an error that the env is not set
- return the uuid as the response first and let a separate thread process the query and write the parquet to disk; the handler should not wait for the query to be processed before returning the uuid
- the query fails when I use an aggregate query like `Select count(*) from app2000 order by p_timestamp` (please test with other aggregate queries as well):
  thread 'actix-rt|system:0|arbiter:1' panicked at server/src/dynamic_query.rs:114:69:
  called `Result::unwrap()` on an `Err` value: Datafusion(SchemaError(FieldNotFound { field: Column { relation: Some(Bare { table: "app2000" }), name: "p_timestamp" }, valid_fields: [Column { relation: None, name: "count(*)" }] }, Some("")))
  note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
- `BTreeMap<Ulid, DynamicQuery>`: store this on disk in the root path of `DYNAMIC_QUERY_RESULTS_CACHE_PATH_ENV` so that you can load it into memory at server start
I will update if I find anything else.
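The "return the uuid first" review item above could be sketched like this, using a plain `std::thread` in place of the server's actual actix runtime; the function name and the uuid stand-in are hypothetical:

```rust
use std::thread;

// Sketch: the handler responds with the uuid immediately and lets a
// background worker run the query and write `<uuid>.parquet` later.
fn submit_dynamic_query(query: String) -> String {
    // Stand-in for generating a real Ulid/Uuid for the query.
    let uuid = format!("uuid-{}", query.len());
    let id = uuid.clone();
    thread::spawn(move || {
        // Real code would execute `query` here and persist the result
        // to disk under `<id>.parquet`.
        let _ = (id, query);
    });
    uuid // returned before the query is processed
}
```

The point is that the spawn is fire-and-forget from the handler's perspective; the client polls GET /dynamic_query/{uuid} for the preprocessed result.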
@nikhilsinhaparseable I cannot reproduce the aggregate query issue on my commit from before the master merge, but I can in commits since, and on master.
@nikhilsinhaparseable There is a DataFusion byte-serialization crate, `datafusion_proto`, but for now it only supports converting expressions to/from bytes. I can work around that, or just store the raw text query in the parquet file.
@TomBebb please update this thread when the PR is ready for review.