Feature request: enhanced outputs for `create_rank()`
Problem
The current `create_rank()` approach risks surfacing many very small groups that are outliers on the metric but, because of their size, might not be meaningful for a stakeholder.
Solution
To get around this, we could add an option that ranks subgroups on a new calculated "delta", where delta is how much the population average would change if the subgroup were excluded. That way, big subgroups with moderately outlying metric values would be prioritized in the ranking over tiny subgroups with extremely outlying metric values.
Weighting by population size could let a stakeholder or a change management executive target change programs based on population size. For example, it could be helpful to know that a selected group is ranked 5th but has a larger population.
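One way to see why larger subgroups are favored is to simplify the delta algebraically. The notation here is introduced only for illustration: $\bar{x}$ is the population average, $\bar{x}_g$ is the subgroup average, $N$ is the population size, and $n$ is the subgroup size. Assuming delta is taken as the observed average minus the average without the subgroup:

$$
\Delta = \bar{x} - \frac{\bar{x} N - \bar{x}_g n}{N - n} = \frac{n}{N - n}\,(\bar{x}_g - \bar{x})
$$

So delta is the subgroup's deviation from the population average scaled by $n / (N - n)$, and for the same deviation a larger subgroup gets a larger delta in absolute terms.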
Notes
The above issue is abridged from a discussion with Jessalyn Uchacz and Carlos Shrimpton.
This issue is linked with the feature request in #102.
Here is an example of how this could work:
| Group | Collab Hours | N | Vs Mean | Rank (Vs Mean) | Mean without | Delta | Rank (Delta) |
|---|---|---|---|---|---|---|---|
| Team 1 | 20.0 | 50 | 0.8x | 3 | 32.1 | -7.1 | 3 |
| Team 2 | 30.0 | 30 | 1.2x | 2 | 22.3 | 2.7 | 1 |
| Team 3 | 45.0 | 5 | 1.8x | 1 | 23.8 | 1.3 | 2 |
| Total | 25.0 | 85 | | | | | |
The method is simple:
- Calculate what the population average would be if the group were excluded: `(TotalHours * N - GroupHours * n) / (N - n)`, where:
  - `TotalHours` is the average for the population
  - `GroupHours` is the average for the group in scope
  - `N` is the population size
  - `n` is the group size
- Calculate the delta as the real (observed) average minus the average calculated without that group.
- Finally, rank the groups from the highest to the lowest value of delta.
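The steps above can be sketched in a few lines. This is only an illustration in Python, not a proposal for the actual `create_rank()` implementation: the names `Group` and `rank_by_delta` are hypothetical, and the sketch assumes delta is the observed average minus the average without the group, as in the example table.

```python
from dataclasses import dataclass


@dataclass
class Group:
    name: str
    mean: float  # average of the metric for the group (GroupHours)
    n: int       # group size


def rank_by_delta(groups):
    """Rank groups by how much the population average changes without them.

    Hypothetical helper for illustration; not part of any existing package.
    """
    total_n = sum(g.n for g in groups)                        # N
    total_mean = sum(g.mean * g.n for g in groups) / total_n  # TotalHours

    rows = []
    for g in groups:
        if g.n == total_n:
            # With a single group there is no "rest of population" to average.
            raise ValueError("cannot exclude a group that is the whole population")
        # Population average with this group excluded.
        mean_without = (total_mean * total_n - g.mean * g.n) / (total_n - g.n)
        # Observed average minus the average without the group.
        delta = total_mean - mean_without
        rows.append({"name": g.name, "mean_without": mean_without, "delta": delta})

    # Rank from the highest to the lowest value of delta.
    rows.sort(key=lambda r: r["delta"], reverse=True)
    for rank, row in enumerate(rows, start=1):
        row["rank"] = rank
    return rows


# Data from the example table.
teams = [Group("Team 1", 20.0, 50), Group("Team 2", 30.0, 30), Group("Team 3", 45.0, 5)]
ranked = rank_by_delta(teams)
for row in ranked:
    print(f'{row["name"]}: mean without = {row["mean_without"]:.2f}, '
          f'delta = {row["delta"]:+.2f}, rank = {row["rank"]}')
```

With the example data this orders the groups as Team 2, Team 3, Team 1, matching the delta ranking in the table. Note that the ranking uses the signed delta, so a large group far below the average (Team 1) ends up last; ranking on the absolute value would be a different design choice.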