[Fleet] Test full upgrade migration scenarios
Fleet has many moving parts, and it is tricky to test all upgrade scenarios manually. The most common upgrade scenario likely looks as follows:
- Upgrade Elasticsearch / Kibana
- Upgrade Elastic Agent with Fleet Server
- Upgrade packages
- Upgrade Elastic Agents
But it could well be that the packages are upgraded before Fleet Server or Elastic Agent is upgraded. It would be nice to have multiple upgrade scenarios where the exact versions of the different components before and after the migration can be passed in, to see if everything works end to end. I'll use the system package as the example below, as it is the one I think we should test first.
Scenario
- Spin up Elastic Stack with Elastic Agent and System package in version 7.13
- Check if data is shipping
- Upgrade Elasticsearch and Kibana to 7.14
- Check that data is still shipping and that Elastic Agents / Fleet Server are healthy
- Upgrade Elastic Agent with Fleet Server to 7.14 by triggering the upgrade through an API call to Kibana
- Check that agents are healthy and data is shipping
- Upgrade Elastic Agents to 7.14 by triggering the upgrade through an API call to Kibana
- Check that agents are healthy and data is shipping
- Upgrade System package to latest version
- Check healthy and data shipping
Every version above could be replaced by a variable. It would be nice if the setup allowed these variables to be passed in for each run. This would make it very easy to reproduce a specific scenario for which a bug has been reported.
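To make the idea of passing versions in as variables more concrete, here is a minimal sketch of how the scenario could be described as data. It only builds the ordered list of steps and does not talk to Kibana or Elasticsearch. The names `UpgradeScenario`, `build_steps`, and all field names are hypothetical and do not come from any existing framework.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class UpgradeScenario:
    # Hypothetical parameter names; the defaults mirror the scenario above.
    start_version: str = "7.13"
    stack_target: str = "7.14"
    agent_target: str = "7.14"
    package: str = "system"
    package_target: str = "latest"


def build_steps(s: UpgradeScenario) -> List[str]:
    """Return the ordered steps of the scenario as plain descriptions."""
    check = "check that agents are healthy and data is shipping"
    return [
        f"spin up Elastic Stack with Elastic Agent and {s.package} package at {s.start_version}",
        "check that data is shipping",
        f"upgrade Elasticsearch and Kibana to {s.stack_target}",
        check,
        f"upgrade Elastic Agent with Fleet Server to {s.agent_target}",
        check,
        f"upgrade Elastic Agents to {s.agent_target}",
        check,
        f"upgrade {s.package} package to {s.package_target}",
        check,
    ]


if __name__ == "__main__":
    # Reproduce a reported bug by overriding only the versions that differ.
    for step in build_steps(UpgradeScenario(start_version="7.12")):
        print(step)
```

A real runner would map each description to an action and a verification. Keeping the scenario as plain data means a bug report only needs to include the versions that were involved.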
There are also variations to the above scenario:
- Skip a minor release for the Elastic Stack upgrade
- Upgrade packages first, before Elastic Agents are upgraded
- Upgrade Elastic Agents manually instead of through Fleet
- Try error scenarios for upgrade order that should not work, like upgrading the Elastic Agent first
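The ordering variations could be enumerated rather than written out by hand. The sketch below lists every upgrade order of the four components and splits them into orders expected to work and orders expected to fail. The component names and the single rule in `expected_to_work` (the stack must be upgraded before Fleet Server and before the other Elastic Agents) are assumptions based on the error scenario mentioned above, not a confirmed compatibility matrix. Skipping a minor release and manual agent upgrades are not modeled here.

```python
from itertools import permutations
from typing import Tuple

# Hypothetical component labels for the four upgrade steps.
COMPONENTS = ("stack", "fleet_server", "agents", "packages")


def expected_to_work(order: Tuple[str, ...]) -> bool:
    """Assumed rule: agents (including Fleet Server) must not be upgraded before the stack."""
    stack_pos = order.index("stack")
    return all(order.index(c) > stack_pos for c in ("fleet_server", "agents"))


all_orders = list(permutations(COMPONENTS))
should_pass = [o for o in all_orders if expected_to_work(o)]
should_fail = [o for o in all_orders if not expected_to_work(o)]

if __name__ == "__main__":
    for order in should_pass:
        print("expect success:", " -> ".join(order))
    for order in should_fail:
        print("expect failure:", " -> ".join(order))
```

Each order in `should_fail` would be a negative test: the run passes only if the upgrade is rejected or reported as unhealthy.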
@cachedout I am not sure if we should move this issue to another repo
@kuisathaverat Which repo were you thinking of?
@cachedout I am not sure. The issue is related to the e2e framework, which we have deprecated, so if we plan to do something about upgrade testing we should move it to the robots repo or oblt clusters. An upgrade test is pretty straightforward; the only part left to implement is the verifications we want to put in place.