software-engineering-quality-framework

On-call rotas etc.

Open • arctangent opened this issue 5 years ago • 3 comments

Regarding https://github.com/NHSDigital/software-engineering-quality-framework/blob/master/practices/service-reliability.md

Do we need a sub-bullet under "Understand reliability requirements" covering what level of out-of-hours (OOH) support should be made available? (Software developers are not routinely on call OOH in all parts of Product Development.)

Example:

  • "Gold-level" services must have at least two software engineers and one infrastructure engineer available 24/7/365, all of whom must have experience in operating the system. Additional support/escalation must be available if the service cannot be restored within 1 hour. ...
  • "Bronze-level" services are supported through normal office hours on a "reasonable efforts" basis. Outside these hours, no guarantees of availability are made.

arctangent, Sep 16 '20 12:09

@arctangent Interesting point, but I would consider, for example, "services must have at least two software engineers and one infrastructure engineer available" to be product- or team-specific, depending on the needs of the business. Usually that conversation follows the "Agree an incident severity classification and the response" activity.

stefaniuk, Sep 16 '20 12:09

This is an area we're looking to expand, @arctangent, but I'm not sure we'd want to include quite that level of detail here. Anyway, it's a good point. Let's leave this one open until we make changes to this section.

ivorc, Sep 16 '20 14:09

I can't see a meaningful change since 2020 that touches on this, although it might have happened elsewhere.

What's the current thinking here? Is there a problem today in the teams that clarifying this point would address? Or can we close this issue?

regularfry, Oct 12 '23 09:10