Experiment with "stat distance"
This is an idea I had in response to the idea of "stat clusters". For comparing armor against a specific piece, we could use euclidean distance (vector distance) to judge the "similarity" of stat rolls within the 6-dimensional stat space. This could then be used in Compare or Triage to show "similar stat armor" given some cutoff. It works pretty well at lower distances, though it should be noted that it can't say anything about "better" or "worse".
Of course, this would be a challenge to communicate / visualize.
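A minimal sketch of the distance-with-cutoff idea, in Python for brevity. The stat order, the item names, the stat values and the cutoff of 5 are all made up for illustration; none of this is DIM code.

```python
import math

# Assumed stat order (hypothetical): MOB, RES, REC, DIS, INT, STR
def euclidean_distance(a, b):
    """Straight-line distance between two stat vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def similar_armor(reference, candidates, cutoff):
    """Return (name, distance) pairs within `cutoff` of `reference`, nearest first."""
    scored = [(name, euclidean_distance(reference, stats))
              for name, stats in candidates.items()]
    return sorted((pair for pair in scored if pair[1] <= cutoff),
                  key=lambda pair: pair[1])

# Made-up stat rolls, purely illustrative.
reference = [2, 20, 10, 12, 6, 12]
candidates = {
    "helmet_a": [2, 18, 12, 12, 6, 12],   # differs by 2 in two stats
    "helmet_b": [10, 10, 12, 6, 20, 6],   # very different spread
    "helmet_c": [2, 20, 10, 10, 10, 12],  # differs in DIS and INT
}

for name, dist in similar_armor(reference, candidates, cutoff=5):
    print(f"{name}: {dist:.2f}")
```

Note that the result only says "close" or "far"; nothing in it says which piece is better.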
Unrefined thoughts:
- I get why euclidean distance is the better choice for "similarity", but perhaps Manhattan distance / L^1 would be simpler and easier to explain?
- I feel like this is a good way to find similar armor in response to a new armor drop (especially when included in Triage), but doesn't really address the "clustering" part for looking at your whole vault since the hard part is finding a centroid / comparison piece.
- Maybe some integration with custom stats? Would be difficult in Compare but maybe simpler in Triage.
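To make the Euclidean vs. Manhattan trade-off from the first bullet concrete, here is a small sketch with made-up stat vectors. Manhattan distance reads as "total stat points that differ", which is arguably easier to explain, but it cannot tell one big difference apart from several small ones.

```python
import math

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan_distance(a, b):
    """L1 distance: the sum of absolute per-stat differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

# Hypothetical 6-stat vectors, for illustration only.
base       = [10, 10, 10, 10, 10, 10]
one_big    = [14, 10, 10, 10, 10, 10]  # one stat off by 4
many_small = [11, 11, 11, 11, 10, 10]  # four stats off by 1

for label, other in (("one_big", one_big), ("many_small", many_small)):
    print(label,
          "manhattan =", manhattan_distance(base, other),
          "euclidean =", euclidean_distance(base, other))
```

Both pieces are 4 points away under Manhattan distance, while Euclidean distance rates the single large difference as twice as far (4.0 vs 2.0).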
Not really related to "stat distance" in particular but still
- I think what people really want is an answer to "which armor pieces should I get rid of while still having access to the best builds?", and outside of is:statlower, this is just not something that DIM can answer without looking at all 5 slots and the loadouts the user cares about. Two pieces can be very similar, but that doesn't mean you can substitute one for the other. At the end of the day you do want to keep a bunch of armor around, perhaps even similar armor, so that the law of large numbers can do its thing and some combinations end up producing nice round stat numbers.
If I may, though it may sound odd coming from a biostatistician: from my understanding, Euclidean distance won't meet your expectations here, as useful as it is in regression matters. A vector with a rather similar profile (= armor stat set, say -2 on each stat) can have the same distance to your control vector (the item you want to compare against) as a very different one (say with random ±2). Example: control: {10,10,10}, vector1: {8,8,8}, vector2: {12,8,12}. Both vectors sit at exactly the same distance from the control, √12 ≈ 3.46.
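The example is easy to check by hand or in a few lines of Python (variable names are just for illustration):

```python
import math

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

control = [10, 10, 10]
vector1 = [8, 8, 8]     # same profile, uniformly 2 lower
vector2 = [12, 8, 12]   # different profile, each stat off by ±2

d1 = euclidean_distance(control, vector1)
d2 = euclidean_distance(control, vector2)

# Every per-stat difference is ±2, so both squared sums are 4 + 4 + 4 = 12.
print(d1, d2)
```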
Since you're into similarity and geometry, I'd rather look at the "orientation" of the vectors, which makes the scalar/dot product relevant, or rather cos(control, vector). Have it close to 1, your armor items are similar; close to 0, their strong spots are opposite (stats are never negative, so the cosine can't go below 0). If numbers were real instead of integers, it could even be collinear with 2 different totals; but those are integers, so in practice you'll never reach 1 (except if all the stats are the same, of course). However, this does not address "in which stats are they similar"; that would need another parameter.
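As a rough sketch of what cos(control, vector) gives for the three example vectors {10,10,10}, {8,8,8} and {12,8,12} (plain Python, names chosen for illustration):

```python
import math

def cosine_similarity(a, b):
    """Dot product divided by the product of the vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

control = [10, 10, 10]
vector1 = [8, 8, 8]     # same direction as control, lower total
vector2 = [12, 8, 12]   # same Euclidean distance from control, different shape

cos1 = cosine_similarity(control, vector1)
cos2 = cosine_similarity(control, vector2)
print(f"{cos1:.4f} {cos2:.4f}")
```

Here vector1 scores 1 (up to floating-point rounding) because it points in exactly the same direction as the control, while vector2 scores roughly 0.985. The two vectors that Euclidean distance could not separate now get different scores.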
I understand how one could intuitively be seduced by correlation approaches regarding similarity here (see Principal Component Analysis, for example), the problem being that the stats are ABSOLUTELY uncorrelated to one another. If you plot a PCA, the axes will be defined by the specifics of your vault; they won't have any intelligible meaning.
Then again, I'm a biologist before a mathematician.
The dot product argument makes sense if you assume that you'll be comparing armor pieces with very different stat totals, but I feel like this feature is meant for people who have a lot of armor that's good on paper and can't decide what to get rid of, after already dismantling all the low-stat armor (<60 is considered VERY low by the community). In @bhollis' screenshot all the relevant armor is in the [61, 64] bracket, and I have a vault policy of only keeping legendaries in the [65, 68] bracket, so a dot product and a distance metric would probably still serve a very similar purpose. And at that point the better computation might be the one that's easier to explain.
I agree that PCA is not useful here, but not because armor stats are uncorrelated. In fact, some armor stats are heavily correlated, but we don't need a PCA to rediscover that for a base 68 legendary piece, MOB+RES+REC = DIS+INT+STR = 34.
Cosine similarity both hits the intention of the feature better and is easier to display as a "similarity percentage" (even though it's not really a percentage). With a carefully chosen threshold we could present a "Similar Stat Profile" button in Compare/Triage. Euclidean distance can be mixed in a bit to get some more of an order.
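One possible way to combine the two, sketched in Python under loud assumptions: the 0.98 threshold, the item names and the stat values are invented, and "filter by cosine, then order by Euclidean distance" is only one reading of "mixed in a bit". It is not how DIM implements anything.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def similar_stat_profile(reference, candidates, threshold=0.98):
    """Keep items whose cosine similarity meets `threshold` (a hypothetical
    value), then order the survivors by Euclidean distance, nearest first."""
    kept = []
    for name, stats in candidates.items():
        cos = cosine_similarity(reference, stats)
        if cos >= threshold:
            kept.append((name, cos, euclidean_distance(reference, stats)))
    return sorted(kept, key=lambda item: item[2])

# Made-up stat rolls in an assumed MOB, RES, REC, DIS, INT, STR order.
reference = [2, 20, 10, 12, 6, 12]
candidates = {
    "a": [2, 18, 12, 12, 6, 12],
    "b": [10, 10, 12, 6, 20, 6],
    "c": [4, 22, 8, 10, 8, 14],
}

for name, cos, dist in similar_stat_profile(reference, candidates):
    # Shown as a "similarity percentage", even though it is not really one.
    print(f"{name}: {round(cos * 100)}% similar, distance {dist:.2f}")
```

With these numbers "b" is filtered out by the threshold, and "a" is listed ahead of "c" because it is closer to the reference.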