Create CLOCK load generator source
Given that the behavior of now() captures the time the dataflow was created and not the time as of when a view is queried, this leaves users without a way to extract time (see #2111 for user request). This is the correct behavior for now(), but there should be a different way for users to find out what time it is.
I believe that the correct answer here is for Materialize to expose a few elegantly chosen "system level sources", and I will begin by defining a single one: a system source that ticks once per second, emitting the OS wallclock time. The source is implemented by a trivial operator which emits the timestamp as its output, then goes to sleep for a full second.
Users can create views over this source, just as with normal sources. They can join against it, they can store the last n values of this, and otherwise declaratively do things against this source.
I propose we start with a source that's once a second - if only to keep things manageable to start.
Seems like a duplicate of #1080.
@benesch I'd like to scope this issue out soon.
The question in my mind is whether we want users to explicitly type something like this:
CREATE SOURCE clock_1s FROM TIMER '1s'
Or whether we should have a couple premade sources like mz_timer.timer_1s and mz_timer.timer_10ms. These could even be virtual and initialized lazily.
The point we want to drive home is that timers are expensive and you don't want to go creating a bunch of them. You probably don't want that 10ms timer either, unless you really know what you're doing.
There is at least one cheap source we should create: The common ask is the 'current_date' source (only ticks once a day at midnight!).
The "once a second" source feels like one we should also create, but perhaps with footgun protection (its easy to cross join against it!).
One related problem that needs solving is having some notion of monotonicity to the source that allows joins against it to cleverly conclude that this source only ever goes upwards, so that we cleverly merge join rather than cross join. I don't even have a clear specification of what I'm saying, although I suspect @justinj might.
I would be reluctant to allow the creation of arbitrary time sources (except the current_date one which feels cheap and an easy way to derisk the UX details of adding system sources without gigantic footguns) until we have a clearer handle on monotonically joining.
I think this is actually a pretty easy source to implement, as long as what we want is
(0, 0, +1)
...
(time, time+1, -1)
(time+1, time+1, +1)
...
As in, if we are just promoting the logical timestamp to data, we can pretty easily run this forward with no hard work. We can even do ranges of time in the future. In particular, the hard part about this source is knowing where to start it from and how much to tick it. Like, the source has no idea that it shouldn't produce all output from now until the end of time from the get-go; we need to slow it down to track ... something.
Isn't that where wall clock time saves us? Start at the current wall clock time; never produce time until the system clock is past time.
This has the potential to rule out some use cases that might be relevant. For example, folks loading in historical data. It seems reasonable to start at the as_of bound as long as this is meant to be real-time, but it's all a bit fidgety (does BYO stuff have to line up with system wall-clock timestamps).
Yeah, maybe that's an argument for creating your own timer sources, where you could theoretically specify the start time. But I think we can mostly ignore that for the 99% case, which is "I just want access to the current date/time."
I'm far more nervous about how we'll plan queries that look like this:
SELECT * FROM orders, timer_1s WHERE orders.timestamp > (timer_1s.time - interval '24 hours')
Oh yeah that will probably just be a cross join. I'm not myself worried about mediocre performance for folks who want to use them (we should be 100% clear that if they want performance they should reconsider their queries, or hold on until we become a temporal data processor).
If folks need to experiment with this before we've implemented it, you can get something similar to this behavior with this hack.
First create a file that gets updated at some frequency that you want materialize to update at:
$ frequency=1
$ while true ; do date +%s >> /tmp/current_time ; sleep $frequency; done
Tell materialize to keep a materialized view up to date with the current time:
CREATE SOURCE current_time (c) from file '/tmp/current_time' with (tail = true) FORMAT text;
CREATE MATERIALIZED VIEW current_time_v as select max(to_timestamp(cast(c as int))) from current_time;
I think with the advent of load generator sources, there's an opportunity here to fish this one out of the icebox. We would benefit from having a supported CLOCK load generator, that ticked at some frequency.
We'd probably want to pull from https://aws.amazon.com/about-aws/whats-new/2017/11/introducing-the-amazon-time-sync-service/ and be careful about it, though.
MAX CARDINALITY has enabled this since Feb.
CREATE SOURCE ticker FROM LOAD GENERATOR COUNTER (MAX CARDINALITY 1) WITH (size = '3xsmall');
Well, maybe not, as it just makes a counter that increases every second, rather than one based on system time, but that covers at least some nonempty subset of the use cases envisioned here.
@rjnn , do you still feel strongly that we should wire up an AWS service and be careful to have perfect accuracy, rather than just YOLOing the current system time? I think the latter would already unlock a big chunk of the value.
The current system time is fine. It can be upgraded with the same syntax down the line to use AWS Time Sync or TrueTime, or whatever.
Reopening because #21373 got reverted.
Just flagging we have a customer who wants this with the clocking ticking every hour on the hour to check a condition related to timezone