MINOR: Adds KRaft versions of most streams system tests
Adds the annotation @matrix(metadata_quorum=quorum.all_non_upgrade) to many existing tests, so that each one runs once with zookeeper and once with remote_kraft nodes.
This skips tests which use various forms of Kafka versioning since those seem to have issues with KRaft at the moment. Running these tests with KRaft will require a followup PR.
Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)
In addition to addressing the review comments, can you post a link to the system test results with this change? It would be good to verify the impact before merging.
+1. I'd also love to learn how much this change would increase system test time.
My run with my changes is here: https://jenkins.confluent.io/job/system-test-kafka-branch-builder/5067/
I'm trying to run the same tests on trunk to compare, but there are unrelated failures. Will update when I get a run.
Thanks @AlanConfluent ! A few meta questions:
- Seems the streams_cooperative_rebalance_upgrade_test.py is not included in this PR?
I didn't update any of the upgrade tests just yet. I was having a hard time running against different versions of Kafka and using KRaft -- this seems like a reasonable followup, so I didn't do it yet.
- I think for streams_application_upgrade_test.py we should also consider enabling kraft on the servers, to make sure that kraft works when streams itself upgrades.
I agree. This will be a followup.
- In streams_broker_compatibility_test.py when we test for broker versions > 3.1 we should also allow it to be kraft.
Maybe this was the main issue I was running into. I thought KRaft was available in earlier versions, but saw odd failures. I'll talk to you about what it would take to get this working.
- This is not related to this PR, but it seems the test coverage for streams_application_upgrade_test and streams_upgrade_test overlaps heavily. @vvcephei could you chime in here, since you have a lot of experience with the former class file? Could we dedup their coverage and hence reduce our test time?
- If the current e2e time is too high, I feel maybe we can skip adding the kraft model for the following suite:
  a. shutdown_deadlock
  b. relational_smoke, since its dependency on kraft is exactly covered by smoke -- i.e. if there's an issue with relational_smoke due to kraft, then smoke itself should fail as well.
  c. named_repartition
Good to consider. Let's evaluate after I get a successful run of trunk.
Re-triggered the jenkins build.
I made the change to switch some of these tests to just run with remote_kraft to minimize test run time.
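As a rough sketch of why narrowing the quorum list reduces run time: each test runs once per quorum, so dropping zookeeper from a test removes one run. The assignment below is hypothetical. The suite names come from the discussion above, but which tests were actually narrowed to remote_kraft is an assumption, as is the `total_runs` helper.

```python
# Hypothetical quorum lists; the string values are placeholders.
BOTH = ["zookeeper", "remote_kraft"]
KRAFT_ONLY = ["remote_kraft"]


def total_runs(quorums_by_test):
    """Count test runs, given one run per (test, quorum) pair."""
    return sum(len(quorums) for quorums in quorums_by_test.values())


# Every suite runs against both quorum types.
before = {
    "smoke": BOTH,
    "relational_smoke": BOTH,
    "shutdown_deadlock": BOTH,
    "named_repartition": BOTH,
}

# Assumed narrowing: keep both quorums only for smoke.
after = dict(
    before,
    relational_smoke=KRAFT_ONLY,
    shutdown_deadlock=KRAFT_ONLY,
    named_repartition=KRAFT_ONLY,
)

runs_before = total_runs(before)
runs_after = total_runs(after)
```

Under these assumptions the run count drops from 8 to 5; the real saving depends on which tests were narrowed and how long each one takes.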
It looks like you have a green run: http://confluent-kafka-branch-builder-system-test-results.s3-us-west-2.amazonaws.com/system-test-kafka-branch-builder--1661474350--AlanConfluent--updates_tests_kraft--9be74f3d1/2022-08-25--001./2022-08-25--001./report.html
I think this was the ultimate run which covered everything under streams: http://confluent-kafka-branch-builder-system-test-results.s3-us-west-2.amazonaws.com/system-test-kafka-branch-builder--1661556003--AlanConfluent--updates_tests_kraft--9be74f3d1/2022-08-26--001./2022-08-26--001./report.html