Parseable feature: dynamic endpoints based on pre-defined SQL queries

Implement an endpoint /api/v1/query/dynamic with payload

{
    "query":"Select * from {{stream_name}} order by p_timestamp limit 10",
    "cache-duration":"5m"
}

This API will respond with a unique URL that serves a plain JSON response to this query. The data will be refreshed every 5 minutes, as configured by "cache-duration" in the payload.
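To make the proposal concrete, here is a minimal in-memory sketch of the register-then-fetch flow. It is only an illustration, not Parseable's implementation: the `DynamicQueryRegistry` class, the `run_query` callback, and the `/api/v1/query/dynamic/<id>` URL shape are all assumptions made for this example. It also refreshes lazily (on the first read after the cache duration has elapsed) rather than on a background timer.

```python
import re
import time
import uuid

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_duration(text):
    """Convert strings such as '30s', '5m' or '1h' into seconds."""
    match = re.fullmatch(r"(\d+)([smh])", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


class DynamicQueryRegistry:
    """Hypothetical registry mapping generated URLs to cached query results."""

    def __init__(self, run_query, clock=time.monotonic):
        self._run_query = run_query  # callable(query_text) -> JSON-serializable rows
        self._clock = clock          # injectable so the refresh logic can be tested
        self._entries = {}

    def register(self, payload):
        """Store the query and return a unique URL (the URL shape is an assumption)."""
        ttl = parse_duration(payload["cache-duration"])
        query_id = uuid.uuid4().hex
        self._entries[query_id] = {
            "query": payload["query"],
            "ttl": ttl,
            "fetched_at": None,
            "rows": None,
        }
        return f"/api/v1/query/dynamic/{query_id}"

    def fetch(self, url):
        """Return cached rows, re-running the query once the TTL has elapsed."""
        entry = self._entries[url.rsplit("/", 1)[-1]]
        now = self._clock()
        stale = entry["fetched_at"] is None or now - entry["fetched_at"] >= entry["ttl"]
        if stale:
            entry["rows"] = self._run_query(entry["query"])
            entry["fetched_at"] = now
        return entry["rows"]
```

A caller would pass the payload above to `register`, keep the returned URL, and call `fetch` with it; within the 5-minute window the stored rows are returned without touching the query engine.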

We'd love to get feedback from the community if they think this is useful. Please add a 👍🏽 in the issue to show your interest.

Apr 20 '23 09:04 nitisht

@nitisht can I pick this issue?

Oct 10 '23 08:10 sudharsangs

hey @sudharsangs you can, for sure

Oct 10 '23 08:10 nitisht

Any update on this issue?

Mar 07 '24 03:03 mrchypark

Right now there are no plans to work on this, @mrchypark. Is this something you think would be useful to you?

Mar 07 '24 05:03 nitisht

We can leverage DataFusion's CREATE FUNCTION support: https://github.com/apache/arrow-datafusion/blob/main/datafusion-examples/examples/function_factory.rs

Mar 10 '24 15:03 nitisht

@nitisht Thank you for the information. I had initially understood this feature as something similar to a materialized view. I thought it would be a useful feature if it could proactively perform calculations and cache the results in advance, although it might also execute upon request depending on the configuration. I appreciate you sharing the details, and I will definitely look into the aspects you mentioned. Thank you for your help and insights.

Apr 13 '24 07:04 mrchypark

/bounty 300

Sep 16 '24 14:09 nitisht

/attempt #370

Algora profile	Completed bounties	Tech	Active attempts
@ssddOnTop	76 bounties from 1 project	Rust, Java, C & more	

Sep 16 '24 17:09 ssddOnTop

/attempt #370

Oct 20 '24 01:10 TomBebb

This feature could be valuable for us as well, so it's great to see a PR already in progress!

Below, I’d like to share some feedback from a user perspective.

We use Parseable not only for storing but also for retrieving log data. While response times vary depending on several factors, we wouldn’t describe the responses as instant when querying streams with a high volume of events (we’re already using partitions). With carefully selected start and end dates, the response times are fast enough for internal use cases (up to 10–20 seconds), but this isn’t sufficient when we need to retrieve logs for display to end users, who expect loading times within a few hundred milliseconds to a few seconds at most. For this reason, we currently use Parseable primarily for experimenting with internal features.

A solution to further improve query response times would be fantastic. For our use cases, stale data is an acceptable trade-off for increased speed.

I'm not sure if I understood this PR comment by @nikhilsinhaparseable correctly, but our use case would likely need support for more than 10 dynamic streams at a time. We could potentially have over 100,000 distinct queries, each with varying parameters (e.g., different IDs for filtering logs). Of course, not all of these queries would need to be dynamic/cached - just the most frequent/active ones. So, while I’m not certain this feature would fully meet our needs, it does address a similar concern around improving response times.
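One way to picture "only the most frequent/active queries are cached" is a size-bounded cache with least-recently-used eviction, keyed by the query template plus its parameters. The sketch below is purely illustrative and is not how the PR works: `BoundedQueryCache`, `run_query`, and the `capacity` parameter are hypothetical names, and expiry of stale entries is left out to keep it short.

```python
from collections import OrderedDict


class BoundedQueryCache:
    """Hypothetical LRU cache holding results for the most recently used queries."""

    def __init__(self, run_query, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._run_query = run_query  # callable(template, params) -> rows
        self._capacity = capacity
        self._results = OrderedDict()

    def get(self, template, params):
        # Parameter values must be hashable (e.g. IDs given as ints or strings).
        key = (template, tuple(sorted(params.items())))
        if key in self._results:
            self._results.move_to_end(key)  # mark as most recently used
            return self._results[key]
        rows = self._run_query(template, params)
        self._results[key] = rows
        if len(self._results) > self._capacity:
            self._results.popitem(last=False)  # evict the least recently used entry
        return rows
```

With 100,000 distinct parameter combinations and a capacity of, say, a few thousand, rarely used combinations would fall through to a normal query while the active ones stay cached.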

Nov 11 '24 16:11 davidwlhlm

Thanks for the detailed feedback, @davidwlhlm. Just curious: have you tried https://www.parseable.com/docs/features/tiering? And if so, do you still see slow responses?

Nov 11 '24 16:11 nitisht

Thanks for pointing that out, we will take a look and run some experiments :)

Nov 11 '24 16:11 davidwlhlm