materialize icon indicating copy to clipboard operation
materialize copied to clipboard

Create CLOCK load generator source

Open rjnn opened this issue 5 years ago • 12 comments

Given that the behavior of now() captures the time the dataflow was created and not the time as of when a view is queried, this leaves users without a way to extract time (see #2111 for user request). This is the correct behavior for now(), but there should be a different way for users to find out what time it is.

I believe that the correct answer here is for Materialize to expose a few elegantly chosen "system level sources", and I will begin by defining a single one: a system source that ticks once per second, emitting the OS wallclock time. The source is implemented by a trivial operator which emits the timestamp as its output, then goes to sleep for a full second.

Users can create views over this source, just as with normal sources. They can join against it, they can store the last n values of this, and otherwise declaratively do things against this source.

I propose we start with a source that's once a second - if only to keep things manageable to start.

rjnn avatar Mar 27 '20 00:03 rjnn

Seems like a duplicate of #1080.

benesch avatar Mar 27 '20 00:03 benesch

@benesch I'd like to scope this issue out soon.

rjnn avatar May 26 '20 14:05 rjnn

The question in my mind is whether we want users to explicitly type something like this:

CREATE SOURCE clock_1s FROM TIMER '1s'

Or whether we should have a couple premade sources like mz_timer.timer_1s and mz_timer.timer_10ms. These could even be virtual and initialized lazily.

The point we want to drive home is that timers are expensive and you don't want to go creating a bunch of them. You probably don't want that 10ms timer either, unless you really know what you're doing.

benesch avatar May 27 '20 03:05 benesch

There is at least one cheap source we should create: The common ask is the 'current_date' source (only ticks once a day at midnight!).

The "once a second" source feels like one we should also create, but perhaps with footgun protection (its easy to cross join against it!).

One related problem that needs solving is having some notion of monotonicity to the source that allows joins against it to cleverly conclude that this source only ever goes upwards, so that we cleverly merge join rather than cross join. I don't even have a clear specification of what I'm saying, although I suspect @justinj might.

I would be reluctant to allow the creation of arbitrary time sources (except the current_date one which feels cheap and an easy way to derisk the UX details of adding system sources without gigantic footguns) until we have a clearer handle on monotonically joining.

rjnn avatar May 27 '20 15:05 rjnn

I think this is actually a pretty easy source to implement, as long as what we want is

(0, 0, +1)
...
(time, time+1, -1)
(time+1, time+1, +1)
...

As in, if we are just promoting the logical timestamp to data, we can pretty easily run this forward with no hard work. We can even do ranges of time in the future. In particular, the hard part about this source is knowing where to start it from and how much to tick it. Like, the source has no idea that it shouldn't produce all output from now until the end of time from the get-go; we need to slow it down to track ... something.

frankmcsherry avatar May 27 '20 16:05 frankmcsherry

Isn't that where wall clock time saves us? Start at the current wall clock time; never produce time until the system clock is past time.

benesch avatar May 27 '20 17:05 benesch

This has the potential to rule out some use cases that might be relevant. For example, folks loading in historical data. It seems reasonable to start at the as_of bound as long as this is meant to be real-time, but it's all a bit fidgety (does BYO stuff have to line up with system wall-clock timestamps).

frankmcsherry avatar May 27 '20 17:05 frankmcsherry

Yeah, maybe that's an argument for creating your own timer sources, where you could theoretically specify the start time. But I think we can mostly ignore that for the 99% case, which is "I just want access to the current date/time."

I'm far more nervous about how we'll plan queries that look like this:

SELECT * FROM orders, timer_1s WHERE orders.timestamp > (timer_1s.time - interval '24 hours')

benesch avatar May 27 '20 17:05 benesch

Oh yeah that will probably just be a cross join. I'm not myself worried about mediocre performance for folks who want to use them (we should be 100% clear that if they want performance they should reconsider their queries, or hold on until we become a temporal data processor).

frankmcsherry avatar May 27 '20 17:05 frankmcsherry

If folks need to experiment with this before we've implemented it, you can get something similar to this behavior with this hack.

First create a file that gets updated at some frequency that you want materialize to update at:

$ frequency=1
$ while true ; do date +%s >> /tmp/current_time ; sleep $frequency; done

Tell materialize to keep a materialized view up to date with the current time:

CREATE SOURCE current_time (c) from file '/tmp/current_time' with (tail = true) FORMAT text;
CREATE MATERIALIZED VIEW current_time_v as select max(to_timestamp(cast(c as int))) from current_time;

quodlibetor avatar Jun 17 '20 19:06 quodlibetor

I think with the advent of load generator sources, there's an opportunity here to fish this one out of the icebox. We would benefit from having a supported CLOCK load generator, that ticked at some frequency.

rjnn avatar Sep 21 '22 15:09 rjnn

We'd probably want to pull from https://aws.amazon.com/about-aws/whats-new/2017/11/introducing-the-amazon-time-sync-service/ and be careful about it, though.

rjnn avatar Sep 21 '22 15:09 rjnn

MAX CARDINALITY has enabled this since Feb.

CREATE SOURCE ticker FROM LOAD GENERATOR COUNTER (MAX CARDINALITY 1) WITH (size = '3xsmall');

umanwizard avatar Aug 22 '23 19:08 umanwizard

Well, maybe not, as it just makes a counter that increases every second, rather than one based on system time, but that covers at least some nonempty subset of the use cases envisioned here.

umanwizard avatar Aug 22 '23 19:08 umanwizard

@rjnn , do you still feel strongly that we should wire up an AWS service and be careful to have perfect accuracy, rather than just YOLOing the current system time? I think the latter would already unlock a big chunk of the value.

umanwizard avatar Aug 22 '23 20:08 umanwizard

The current system time is fine. It can be upgraded with the same syntax down the line to use AWS Time Sync or TrueTime, or whatever.

rjnn avatar Aug 24 '23 13:08 rjnn

Reopening because #21373 got reverted.

benesch avatar Oct 29 '23 15:10 benesch

Just flagging we have a customer who wants this with the clocking ticking every hour on the hour to check a condition related to timezone

chuck-alt-delete avatar Oct 29 '23 19:10 chuck-alt-delete