Logging source misnamed
The logging source mz_dataflow_operator_addresses is misnamed, in that it contains addresses for both operators and for channels.
materialize=> select distinct id from mz_dataflow_operators where id < 170 order by id;
id
-----
167
168
(2 rows)
materialize=> select distinct id from mz_dataflow_operator_addresses where id < 170 order by id;
id
-----
167
168
169
(3 rows)
Some of the identifiers, above 169, correspond to channels instead
materialize=> select distinct id from mz_dataflow_channels where id < 170 order by id;
id
-----
169
(1 row)
No clue what the impact of naming is on the catalog, and whether it is easy to just swap the name around without harming other things.
That log is only used in a couple testdrive tests, it should be safe to rename and then do the string substitution in test/testdrive/{show,basic}.td
Hi, I'm willing to take this task up if it still exists. @quodlibetor @frankmcsherry
In the doc, mz_dataflow_operator_addresses contains addresses for both operators and channels.
It is indeed misleading.
It may be better to replace it with mz_dataflow_operator_channel_addresses.
mz_dataflow_operator_addresses is in system-catalog.md, diagnosing-using-sql.md, buildin.rs, hierarchical.jsx, memory.jsx, catalog.td, render-delta-join.td.
Do all need to be replaced? And the doc should also be corrected.
I think we should remove the channel addresses from mz_dataflow_operator_addresses and add a new mz_dataflow_channel_addresses that only has addresses of channels.
@jackwener This would require code change to catalog/builtin.rs and logging/timely.rs. Would you be up for that? I could review the change. If now, I will take a stab at it myself.
Counter-point: partitioning the addresses is non-trivial (a semijoin with the channels or operators relations). What about renaming the source something more generic, and then providing two views (for operators and channels) instead?
Hmm, that could be. I thought it would be easy because in logging/timely.rs we get operator events and channel events separately. It just happens that we then send the addresses from both of them into the one addresses log.
Could be that I misunderstood what you meant. :sweat_smile:
Whoever finds this in the future: this is decidedly more complicated than it looks. See Ben's excellent summary on an attempted PR: https://github.com/MaterializeInc/materialize/pull/14022#issuecomment-1206856063