Don't override relatedImages in base CSV. #5763
Closes #5763, #5000
Description of the change: Regression: relatedImages in base CSV were being overridden.
Motivation for the change: Do not want relatedImages overridden from baseCSV.
Checklist
If the pull request includes user-facing changes, extra documentation is required:
- [x] Add a new changelog fragment in
changelog/fragments(seechangelog/fragments/00-template.yaml) - [ ] Add or update relevant sections of the docs website in
website/content/en/docs
@ryantking What's the point of appending containerRef on name collision if you are doing it on all collisions anyway? To make it obvious which are colliding?
https://github.com/operator-framework/operator-sdk/blob/fb15b62e9c0defee208086ccab79112c68843454/internal/cmd/operator-sdk/generate/internal/relatedimages.go#L120-L123
ie. I've just tried in my change putting this in base csv.
relatedImages:
- image: quay.io/konveyor/velero-restic-restore-helper:latest
name: velero_restic_restore_helper
- image: quay.io/konveyor/velero-restic-restore-helpser:latest
name: velero_restic_restore_helper
And got back this generated csv.
relatedImages:
- image: quay.io/konveyor/velero-restic-restore-helper:latest
name: baseCSV-velero_restic_restore_helper
- image: quay.io/konveyor/velero-restic-restore-helpser:latest
name: baseCSV-velero_restic_restore_helper
Result is still two relatedImages with name: baseCSV-velero_restic_restore_helper
warning: multiple related images with the same image ref, "openshift_plugin", and different names found.The image will only be listed once with an empty name.It is recmmended to either remove the duplicate or use the exact same name.
Is it a good idea to change image name to empty? Is this to make a user notice what they need to fix? Is it okay to leave as is or will this cause failures with mirroring later on?
- image: quay.io/konveyor/openshift-velero-plugin:latest
name: ""
@kaovilai
What's the point of appending containerRef on name collision if you are doing it on all collisions anyway? To make it obvious which are colliding?
In the sample CSV you provided you have a typo in the second image name, "helpser" instead of "helper" I assume. If you fix that and rerun if the code is working as intended the output should be one relatedImage entry with no prefix since its a collision on both name and image. The name is only prefixed with there are two images with the same name, but different images.
Is it a good idea to change image name to empty? Is this to make a user notice what they need to fix? Is it okay to leave as is or will this cause failures with mirroring later on?
We changed the name to empty since we designed to logic to only list each image one time. We did not test what happens with image mirroring with an empty name. We provided the warning message to make it obvious what was happening then let users decide to update the relatedImage name so they match or bring back an issue to us if there is an alternative to blanking out the name when applying our de-dup logic. If you feel that this is not good behavior for then open a new ticket and tag myself and @joelanford in it for further discussion.
Hope this answers both your questions.
@ryantking not typo. Was testing out the different conditional branches.
The name is only prefixed with there are two images with the same name, but different images.
This does not fix the collision still. Is it just a notification mechanism?
I think a more obvious is to name something in all caps.. like OPERATOR_SDK_NAME_CONFLICT_originalname
You can't Ctrl+F your way to find "" as easily in the yaml.
Sounds like both "" and containerRef appending are notification mechanism.. but not a very consistent one.. since one adds to the name, and one removes the name.
@kaovilai Okay I think I have a better idea of what you're pointing out now. The collision-resolution logic might not correctly resolve colissions when one or both related images are defined in the base CSV.
The logic operates under the assumption that collisions will occur across different containers and deployments. We made this assumption originally since a colission within a container would require two environment variables with the same key in a single container environment, which I believe Kubernetes would not allow. The edge case introduced here makes it so two relatedImages can have the same name, different image, but the same containerRef that war previously assumed to be different if at least one was hardcoded into the base CSV.
I'm not entirely sure what the best resolution is since the relatedImage collector logic assumes we have knowledge of the source container, which we don't for hardcoded relatd images. To get more understanding of the usecase, what is the reasoning behind defining relatedImages in the base CSV spec instead of as environment variables in one or more containers, which will then get picked up when building the output CSV?
Sounds like both "" and containerRef appending are notification mechanism.. but not a very consistent one.. since one adds to the name, and one removes the name.
Neither were defined to be notification mechanisms. Prepending the containerRef was at the time a way to actually resolve conflicts since we didn't have this edge case yet, and empty string for image collisions is because the program cannot determine which name to use if the same image has two different names. The warning that gets printed is the notification mechanism. Maybe we should also warn on name collisions, but that's an aside.
@ryantking same usecase as previously stated in https://github.com/operator-framework/operator-sdk/issues/5000 which links to https://docs.openshift.com/container-platform/4.7/operators/operator_sdk/osdk-generating-csvs.html#olm-enabling-operator-for-restricted-network_osdk-generating-csvs that says to update
Update spec.relatedImages
directly. They made no mention of RELATED_IMAGE_ in deployment.
This was how it was always done before your relatedImage discovery feature.
The specified relatedImage are potential images required in disconnected environments of supported plugins that user may choose to enable after.
@kaovilai The correct procedure is defined in the latest documentation. It instructs you to update the environment in manager.yaml and explains how to use them in the operator code.
Let me know if that solves your usecase. We still probably want to have a discussion on what to do with any relatedImages in the base CSV, but before that we need to establish a usecase where they must be defined in the base CSV and cannot be defined as environment variables the way we officially support.
If the official method works for you then I'm going to hold this PR and have a longer discussion with the team on how to handle existing relatedImage values in the base CSV to decide how to move forward.
Please add a note to the doc with info about prepending env with RELATED_IMAGE_ I don't think that is clear in the doc as is.
For our team, maybe but if you look at original issues linked in first comment, it seems like more people expected the original way to work.. they have hidden expectation that newer operator-sdk will fix bugs AND not break their current generated CSV. Worst case is there is at least a flag you can enable for "legacy relatedImage" to work which may skip current overriding logic.
Resolving our team's relatedImages would definitely require more than dev team code changes since our release team have scripts which expects certain relatedImage names today. Per https://github.com/openshift/oadp-operator/pull/694 there are sideeffects from migrating to this new "supported" relatedImage specification which include changing relatedImage name to - instead of _
What I am doing today (if this PR weren't merged) is to manually revert the removed relatedImages in relatedCSV so that I don't commit those into git. Not elegant.. not terrible, but we would rather wait for the final solution than migrate to this "supported" way if it's not too long a wait. It only make my dev process with make bundle today more complex but I manually revert the undesired changes as to not affect my teammates. I guess I can also run my own custom operator-sdk build now since I made this PR.
@kaovilai Okay so it looks like the related image dicovery feature introduced a breaking change that I didn't notice (and therefore document correctly). Short term will likely be a documentation update and recommending that users define their related images in manager.yaml as the latest docs say. I will investigate the changes required to introduce backwards compatibility with directly defining relatedImages in the base CSV. Ideally I'd like to "unbreak" this change, but losing the container/deployment info is going to make the dedup logic a little trickier.
Since this PR cannot move forward with the existing de-dup logic, I am holding this for now. If you'd still like to move this PR forward then I recommend joining our SDK community meeting Wednesday at 2PM EST to discuss it in more detail. Join the Google Group to get the invites on your google calendar if you don't have them already. If you can't make that meeting and still want to move this PR forward then contact me privately via email or Slack and I can get something together with us and anyone else with context to figure out a solution for the de-dup logic.
If you're fine updating to the official manager.yaml method and don't want to personally drive this fix then I recommend closing the PR.
/hold
I'll try to make the meeting on Wednesday. Sounds like it maybe a good learning experience for me :)
IIUC you're referring to
- Backlog Grooming Meeting: Wednesdays (every 3 weeks) at 11:00 Western Time. Convert to your timezone. * [Meeting Link][meeting-link]: Password: 77777 * Agenda * Meeting recordings.
https://github.com/operator-framework/community/blob/06ba4337c79560448242eca36758b6db3f75927b/README.md?plain=1#L67-L70
@kaovilai Yup! Let's also move all discussion to the issue at #5763. I think what I propose there should work. If the team like the approach and you want to take a crack at implementing it then you can update this PR, and I'll unhold it.
In order to ensure total backwards compatiblity, the base CSV related images need to be preserved entirely so they should take precedence over discovered related images when resolving name conflicts.
Summarized dedupe precedence that will be implemented. https://github.com/operator-framework/operator-sdk/issues/5763#issuecomment-1128358142
Additionally add warnings.
issue a warning about the statically defined related images. If we want to be more vague about why you shouldn't do that, we could just say that specifying static related images is deprecated, that you should start using RELATED_IMAGE_ envs, and we'll remove persistence of static related images is OSDK v2.
Issues go stale after 90d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle stale
just fyi that our workload are no longer affected by this issue. Happy to let others takeover.
Stale issues rot after 30d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle rotten /remove-lifecycle stale
Rotten issues close after 30d of inactivity.
Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.
/close
@openshift-bot: Closed this PR.
In response to this:
Rotten issues close after 30d of inactivity.
Reopen the issue by commenting
/reopen. Mark the issue as fresh by commenting/remove-lifecycle rotten. Exclude this issue from closing again by commenting/lifecycle frozen./close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.