materialize icon indicating copy to clipboard operation
materialize copied to clipboard

Logging source misnamed

Open frankmcsherry opened this issue 5 years ago • 9 comments

The logging source mz_dataflow_operator_addresses is misnamed, in that it contains addresses for both operators and for channels.

materialize=> select distinct id from  mz_dataflow_operators where id < 170 order by id;
 id  
-----
 167
 168
(2 rows)

materialize=> select distinct id from  mz_dataflow_operator_addresses where id < 170 order by id;
 id  
-----
 167
 168
 169
(3 rows)

Some of the identifiers, above 169, correspond to channels instead

materialize=> select distinct id from  mz_dataflow_channels where id < 170 order by id;
 id  
-----
 169
(1 row)

No clue what the impact of naming is on the catalog, and whether it is easy to just swap the name around without harming other things.

frankmcsherry avatar Mar 03 '20 20:03 frankmcsherry

That log is only used in a couple testdrive tests, it should be safe to rename and then do the string substitution in test/testdrive/{show,basic}.td

quodlibetor avatar Mar 03 '20 20:03 quodlibetor

Hi, I'm willing to take this task up if it still exists. @quodlibetor @frankmcsherry

jackwener avatar Aug 18 '21 07:08 jackwener

In the doc, mz_dataflow_operator_addresses contains addresses for both operators and channels. It is indeed misleading. It may be better to replace it with mz_dataflow_operator_channel_addresses.

jackwener avatar Aug 18 '21 07:08 jackwener

mz_dataflow_operator_addresses is in system-catalog.md, diagnosing-using-sql.md, buildin.rs, hierarchical.jsx, memory.jsx, catalog.td, render-delta-join.td. Do all need to be replaced? And the doc should also be corrected.

jackwener avatar Aug 18 '21 07:08 jackwener

I think we should remove the channel addresses from mz_dataflow_operator_addresses and add a new mz_dataflow_channel_addresses that only has addresses of channels.

@jackwener This would require code change to catalog/builtin.rs and logging/timely.rs. Would you be up for that? I could review the change. If now, I will take a stab at it myself.

aljoscha avatar Aug 20 '21 12:08 aljoscha

Counter-point: partitioning the addresses is non-trivial (a semijoin with the channels or operators relations). What about renaming the source something more generic, and then providing two views (for operators and channels) instead?

frankmcsherry avatar Aug 20 '21 13:08 frankmcsherry

Hmm, that could be. I thought it would be easy because in logging/timely.rs we get operator events and channel events separately. It just happens that we then send the addresses from both of them into the one addresses log.

aljoscha avatar Aug 20 '21 13:08 aljoscha

Could be that I misunderstood what you meant. :sweat_smile:

aljoscha avatar Aug 20 '21 14:08 aljoscha

Whoever finds this in the future: this is decidedly more complicated than it looks. See Ben's excellent summary on an attempted PR: https://github.com/MaterializeInc/materialize/pull/14022#issuecomment-1206856063

aljoscha avatar Aug 08 '22 06:08 aljoscha