cannot unconditionally set status in `__init__`
A charm cannot unconditionally set status in `__init__`.
The reason is that, if the unit is being torn down, `status_set` will raise an error (a `ModelError` reporting "unit not found").
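For concreteness, here is a minimal sketch of the pattern in question (the charm name and status message are illustrative, not the actual charm that hit this):

```python
#!/usr/bin/env python3
import ops


class UnconditionalStatusCharm(ops.CharmBase):
    def __init__(self, *args):
        super().__init__(*args)
        # __init__ runs on every event dispatch, including during teardown,
        # where (per this issue) Juju may report "unit not found" for status-set.
        self.unit.status = ops.ActiveStatus("ready")


if __name__ == "__main__":
    ops.main(UnconditionalStatusCharm)
```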

Interestingly enough, it looks like the last events the charm received did NOT include the teardown sequence:
| timestamp | tcp-requirer-mock/0 |
|-----------|---------------------|
| 10:50:15 | ingress-per-unit-relation-changed |
| 10:50:15 | ingress-per-unit-relation-joined |
| 10:50:10 | ingress-per-unit-relation-created |
| 10:49:59 | update-status |
| 10:48:41 | ingress-per-unit-relation-departed |
| 10:45:55 | tcp-server-pebble-ready |
| 10:45:53 | ingress-per-unit-relation-changed |
| 10:45:52 | ingress-per-unit-relation-changed |
| 10:45:52 | ingress-per-unit-relation-joined |
| 10:45:52 | ingress-per-unit-relation-created |
I'm not sure if this is related.
The point remains: should we warn against setting status in `__init__`, or in other places where the unit is not guaranteed to exist? Should we raise a more informative error message? Or should we simply fail silently when someone tries to set a status on a dying unit? After all, we don't really care; it's just clutter in the debug-log.
We do many things in `__init__` as part of charm library instantiation (prometheus is one example), and a lot can happen inside those charm libraries.
Not setting status inside charm libraries is only a convention.
It seems to me that silently ignoring it, or emitting a log.warning, would be better than a traceback.
For this particular case, it might be worth just catching the exception and putting an entry in the debug log, especially since charms have no way of knowing in advance that the status-set will fail.
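A minimal sketch of what that charm-side guard could look like, assuming the failure surfaces as an `ops.ModelError` (the charm name, status message, and log message are illustrative):

```python
#!/usr/bin/env python3
import logging

import ops

logger = logging.getLogger(__name__)


class GuardedStatusCharm(ops.CharmBase):
    def __init__(self, *args):
        super().__init__(*args)
        try:
            # The unconditional status set, guarded against teardown.
            self.unit.status = ops.MaintenanceStatus("initialising")
        except ops.ModelError:
            # Most likely the unit is dying; a debug-log entry is enough.
            logger.debug("could not set unit status (unit may be dying)", exc_info=True)


if __name__ == "__main__":
    ops.main(GuardedStatusCharm)
```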
- It has been a while, so we should probably try to reproduce this on Juju 3 and see what events we get.
- Probably the least invasive change would be to have ops catch the ModelError in status-set and, if it includes "unit not found", re-raise a more specific exception subclass that helps the charmer debug: "Juju reported unit not found -- is the unit being torn down?" or similar. A rough sketch of that idea follows.
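In this sketch, `UnitNotFoundError` and the `set_status_with_hint` helper are hypothetical, not existing ops API; they only illustrate the shape of the change:

```python
import ops


class UnitNotFoundError(ops.ModelError):
    """Hypothetical subclass: Juju reported the unit does not exist (e.g. during teardown)."""


def set_status_with_hint(unit: ops.Unit, status: ops.StatusBase) -> None:
    """Sketch of the behaviour ops could adopt around its status-set call."""
    try:
        unit.status = status
    except ops.ModelError as e:
        if "unit not found" in str(e):
            raise UnitNotFoundError(
                "Juju reported unit not found -- is the unit being torn down?"
            ) from e
        raise
```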
@PietroPasotti What's the actual concern here: noise in the debug log? The unit blipping into error state for a second during teardown? Or something else? To me the current behaviour seems okay, but just making sure I'm not missing something.
I'm struggling to reproduce it; I need some help here @PietroPasotti
I tried setting the status in `__init__` and then removing a unit. I even tried adding a for loop to set it many times, then removing units/killing pods/etc., but still failed to reproduce it...
Simple test charm used:
```python
#!/usr/bin/env python3
import ops


class SampleK8SCharm(ops.CharmBase):
    def __init__(self, *args):
        super().__init__(*args)
        # here also tried in a for loop
        self.unit.status = ops.BlockedStatus("test")
        self.framework.observe(
            self.on['httpbin'].pebble_ready, self._on_httpbin_pebble_ready
        )
        # also tried setting status in a handler for `self.on.remove`

    def _on_httpbin_pebble_ready(self, event: ops.PebbleReadyEvent):
        container = event.workload
        container.add_layer("httpbin", self._pebble_layer, combine=True)
        container.replan()
        self.unit.status = ops.ActiveStatus()

    @property
    def _pebble_layer(self) -> ops.pebble.LayerDict:
        return {
            "summary": "httpbin layer",
            "description": "pebble config layer for httpbin",
            "services": {
                "httpbin": {
                    "override": "replace",
                    "summary": "httpbin",
                    "command": "gunicorn -b 0.0.0.0:80 httpbin:app -k gevent",
                    "startup": "enabled",
                    "environment": {
                        "GUNICORN_CMD_ARGS": f"--log-level {self.model.config['log-level']}"
                    },
                }
            },
        }


if __name__ == "__main__":
    ops.main(SampleK8SCharm)
```
The charm that this error showed up in originally was https://github.com/canonical/traefik-k8s-operator/blob/main/tests/integration/testers/tcp/src/charm.py
I haven't tried it out since; possibly the issue is gone in the latest Juju.
As you can see, that charm sets active status in `__init__`. I recall it errored out during teardown, but I can't remember precisely at which stage.
Hi Pietro,
I tried with the traefik-k8s charm (latest/stable, revision 169) on Juju 3.1.7-genericlinux-amd64 and 3.4.0-genericlinux-amd64, but could not reproduce it.
I searched for the error message: it comes from Juju, not from ops, and it seems it could happen in theory: https://github.com/juju/juju/blob/3a43e3202323021dd3f9ee70ef3f28b7691cc153/state/status_unit_test.go#L133.
I tried adding and removing units, but the status transitions look fine: ready -> maintenance -> terminated.
Screenshot:
@benhoyt shall we close this issue for now?
Yeah, let's close this for now -- presuming the issue has gone away in recent versions of Juju. If we get a clear repro in the future, happy to reopen.