feature: dynamic endpoints based on pre-defined sql queries

Open nitisht opened this issue 2 years ago • 12 comments

Implement an endpoint /api/v1/query/dynamic with payload

{
    "query":"Select * from {{stream_name}} order by p_timestamp limit 10",
    "cache-duration":"5m"
}

This API will respond with a unique URL that serves a plain JSON response to this query. The data will be refreshed every 5m, as configured in the payload.
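To make the proposal concrete, here is a minimal in-memory sketch of the behaviour described above. It is not Parseable's implementation: `DynamicQueryRegistry`, `parse_duration`, the `run_query` callback, and the shape of the returned URL are all hypothetical names chosen for illustration. It also refreshes lazily on the first request after the cache duration has elapsed, rather than on a background timer, and it does not expand the `{{stream_name}}` placeholder.

```python
import hashlib
import re
import time

# Assumed duration syntax: an integer followed by s, m, or h (e.g. "5m").
_UNITS = {"s": 1, "m": 60, "h": 3600}


def parse_duration(text):
    """Convert a duration string such as "5m" into seconds."""
    match = re.fullmatch(r"(\d+)([smh])", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


class DynamicQueryRegistry:
    """Hypothetical registry mapping a generated URL to a cached query result."""

    def __init__(self, run_query, clock=time.monotonic):
        self._run_query = run_query  # callback that executes SQL and returns rows
        self._clock = clock          # injectable clock, handy for testing
        self._entries = {}

    def register(self, payload):
        """Store the query and return a URL derived from a hash of its text."""
        query = payload["query"]
        ttl = parse_duration(payload["cache-duration"])
        query_id = hashlib.sha256(query.encode("utf-8")).hexdigest()[:16]
        self._entries[query_id] = {
            "query": query,
            "ttl": ttl,
            "fetched_at": None,
            "rows": None,
        }
        # The URL layout below is an assumption, not part of the proposal.
        return f"/api/v1/query/dynamic/{query_id}"

    def fetch(self, url):
        """Return cached rows, re-running the query once the TTL has elapsed."""
        entry = self._entries[url.rsplit("/", 1)[-1]]
        now = self._clock()
        stale = entry["fetched_at"] is None or now - entry["fetched_at"] >= entry["ttl"]
        if stale:
            entry["rows"] = self._run_query(entry["query"])
            entry["fetched_at"] = now
        return entry["rows"]
```

A caller would `register` the payload once, then serve `fetch(url)` on each GET to the returned URL; within the cache duration, repeated requests do not touch the query engine.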

We'd love to get feedback from the community if they think this is useful. Please add a 👍🏽 in the issue to show your interest.

nitisht avatar Apr 20 '23 09:04 nitisht

@nitisht can I pick this issue?

sudharsangs avatar Oct 10 '23 08:10 sudharsangs

hey @sudharsangs you can, for sure

nitisht avatar Oct 10 '23 08:10 nitisht

Any update on this issue?

mrchypark avatar Mar 07 '24 03:03 mrchypark

Right now there are no plans to work on this, @mrchypark. Is this something you think would be useful to you?

nitisht avatar Mar 07 '24 05:03 nitisht

We can leverage DataFusion's CREATE FUNCTION support, as shown in the function_factory example: https://github.com/apache/arrow-datafusion/blob/main/datafusion-examples/examples/function_factory.rs

nitisht avatar Mar 10 '24 15:03 nitisht

@nitisht Thank you for the information. I had initially understood this feature as something similar to a materialized view. I thought it would be a useful feature if it could proactively perform calculations and cache the results in advance, although it might also execute upon request depending on the configuration. I appreciate you sharing the details, and I will definitely look into the aspects you mentioned. Thank you for your help and insights.

mrchypark avatar Apr 13 '24 07:04 mrchypark

/bounty 300

nitisht avatar Sep 16 '24 14:09 nitisht

/attempt #370

ssddOnTop avatar Sep 16 '24 17:09 ssddOnTop

/attempt #370

TomBebb avatar Oct 20 '24 01:10 TomBebb

This feature could be valuable for us as well, so it's great to see a PR already in progress!

Below, I’d like to share some feedback from a user perspective.

We use Parseable not only for storing but also for retrieving log data. While response times vary depending on several factors, we wouldn’t describe the responses as instant when querying streams with a high volume of events (we’re already using partitions). With carefully selected start and end dates, the response times are fast enough for internal use cases (up to 10–20 seconds), but this isn’t sufficient when we need to retrieve logs for display to end users, who expect loading times within a few hundred milliseconds to a few seconds at most. For this reason, we currently use Parseable primarily for experimenting with internal features.

A solution to further improve query response times would be fantastic. For our use cases, stale data is an acceptable trade-off for increased speed.

I'm not sure if I understood this PR comment by @nikhilsinhaparseable correctly, but our use case would likely need support for more than 10 dynamic streams at a time. We could potentially have over 100,000 distinct queries, each with varying parameters (e.g., different IDs for filtering logs). Of course, not all of these queries would need to be dynamic/cached - just the most frequent/active ones. So, while I’m not certain this feature would fully meet our needs, it does address a similar concern around improving response times.

davidwlhlm avatar Nov 11 '24 16:11 davidwlhlm

Thanks for the detailed feedback, @davidwlhlm. Just curious: have you tried https://www.parseable.com/docs/features/tiering? And if so, do you still see slow responses?

nitisht avatar Nov 11 '24 16:11 nitisht

Thanks for pointing that out, we will take a look and run some experiments :)

davidwlhlm avatar Nov 11 '24 16:11 davidwlhlm