Log compacted topic testing
We need to think about how we would properly test that we support log compacted topics.
To me, what's tricky about this is balancing the realism of testing vs having a real bug appear in a reasonable amount of time.
With default broker settings, log compaction is not even observable until a Kafka topic contains on the order of a million records. There is also the issue that if we are performing well and keeping up with the latest updates from the stream, we would never run into log compaction at all, because only old records get removed by it.
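One way around the million-record problem is to create the test topic with aggressive compaction settings so the cleaner fires on tiny segments. A rough sketch (the broker address and topic name are placeholders; `segment.ms`, `min.cleanable.dirty.ratio`, and `delete.retention.ms` are real Kafka topic-level configs, but the exact values here are just a guess at what makes compaction observable quickly):

```shell
kafka-topics.sh --create \
  --bootstrap-server localhost:9092 \
  --topic compaction-test \
  --partitions 1 --replication-factor 1 \
  --config cleanup.policy=compact \
  --config segment.ms=100 \
  --config min.cleanable.dirty.ratio=0.01 \
  --config delete.retention.ms=100
```

Small `segment.ms` forces frequent segment rolls (only closed segments are eligible for compaction), and a tiny `min.cleanable.dirty.ratio` lets the cleaner run almost immediately, so a topic with a few thousand keys should show compaction rather than needing a million records.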
I think there are two times when the log-compactedness of a topic matters:
- User creates a source on a long-running stream.
- Materialize disconnects and reconnects to Kafka. As a result, it has fallen behind the stream, and some log compaction may have happened in the meantime. This probably ties into #1130
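For either scenario, the test needs a precise statement of what "correct" looks like. A minimal oracle (a hypothetical sketch, not Materialize code): model compaction as keeping the latest value per key, with `None` as a tombstone, and assert that a source reading the compacted log converges to the same final state as one that read the full log.

```python
def compacted_view(records):
    """Model of Kafka log compaction semantics: retain only the latest
    value for each key; a None value is a tombstone that deletes the key.
    A correct consumer of a compacted topic should converge to this state
    whether it read the full log or a compacted one."""
    latest = {}
    for key, value in records:
        if value is None:
            latest.pop(key, None)  # tombstone: key is removed
        else:
            latest[key] = value
    return latest
```

Usage in a test would be something like: generate a keyed record stream, feed it through Kafka with compaction enabled, and compare the state Materialize materializes against `compacted_view(stream)`.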
@quodlibetor suggested creating images of compacted Kafka topics, so that a long data-generation and ingest process would not be needed every time the log-compacted topic test runs. https://docs.docker.com/storage/volumes/#backup-a-container
One thought I had was that we could artificially reduce the batch size we process at a time, so that the effects of log compaction become noticeable on smaller topics.