interop Change dashboard to support showing scores as of arbitrary wpt.fyi run (or latest run) even if not all browsers have results for that run

Right now, https://wpt.fyi/interop-2023 seems to show scores as of the latest run (from https://wpt.fyi/runs ) that has results for all browsers, I think. This often means that it's showing you results that are a day or so out of date, since these results-for-all-browsers runs are relatively infrequent.

It would be really useful to be able to request that the dashboard show me results for the most recent run (even if it's missing data for some browser(s)), or for a particular run based on the SHA value shown on https://wpt.fyi/runs (included as a parameter in the URL, for example).

wpt.fyi already supports this for its regular presentation of test results, e.g. here's a run that I have in my history (which happened to only have results for Chrome and Firefox): https://wpt.fyi/results/css/motion/animation/offset-interpolation.html?sha=ad172b15c6&label=master

So it would be great to support this on the interop-2023 dashboard as well.

Jun 28 '23 20:06 dholbert

wpt.fyi already supports this for its regular presentation of test results, e.g. here's a run that I have in my history (which happened to only have results for Chrome and Firefox): https://wpt.fyi/results/css/motion/animation/offset-interpolation.html?sha=ad172b15c6&label=master

So for example, it'd be nice if I could use a link with those^ same ?sha=ad172b15c6&label=master URL parameters, like https://wpt.fyi/interop-2023?sha=ad172b15c6&label=master , to show the top-line score breakdown, as of this particular historical wpt.fyi run (with per-focus-area scores and the top level scores at the left, possibly with missing values for whatever browsers didn't complete a run with that github revision).
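To make the request concrete, here is a minimal Python sketch of how such links are put together. The helper name `wpt_fyi_url` is made up for illustration. The first URL uses the `sha` and `label` parameters that wpt.fyi already accepts on results pages. The second is the requested `/interop-2023` form, which the dashboard does not support today.

```python
from urllib.parse import urlencode

def wpt_fyi_url(path: str, sha: str, labels: list[str]) -> str:
    """Hypothetical helper: append sha and label parameters to a wpt.fyi path."""
    query = urlencode([("sha", sha)] + [("label", label) for label in labels])
    return f"https://wpt.fyi{path}?{query}"

# Existing behavior: results for one test as of a specific run.
print(wpt_fyi_url("/results/css/motion/animation/offset-interpolation.html",
                  "ad172b15c6", ["master"]))

# Requested behavior (not supported by the dashboard as of this issue).
print(wpt_fyi_url("/interop-2023", "ad172b15c6", ["master"]))
```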

Jun 28 '23 20:06 dholbert

@DanielRyanSmith do you think this would be straightforward technically?

Sep 15 '23 12:09 foolip

This would likely not be as straightforward to implement, given the way we currently aggregate the results. The results for the dashboard are aggregated in a separate script and made available as CSV files every few hours via a GitHub Action. This means that making arbitrary runs visible on the dashboard is not as simple as specifying which runs to display: those runs would need to be aggregated through the same process as the aligned runs that already make up the data visible on the dashboard.

@dholbert An option you can try is selecting two recent runs from https://wpt.fyi/runs?label=master&label=experimental (master and experimental labels) and selecting "VIEW DIFF" at the bottom left of the screen. While this will not show you the overall score change, it should show you which test results have changed. You can narrow your view further by specifying interop labels in the search bar (like label:interop-2023-has). This should give you views that look like this for example, which are two Firefox runs at different times, narrowed down to showing the deltas for the interop-2023-has label. Maybe something like this can help?

Sep 15 '23 19:09 DanielRyanSmith

@DanielRyanSmith Thanks, your "this for example" link does indeed seem like a handy way to visualize/understand changes here at least.

I need a bit more guidance on how to arrive at that sort of link on my own, though. You mentioned:

selecting two recent runs from https://wpt.fyi/runs?label=master&label=experimental (master and experimental labels) and selecting "VIEW DIFF" at the bottom left of the screen.

I'm not sure how to get to a spot where it shows me this "VIEW DIFF" button/link. I don't see any sort of "VIEW DIFF" link/button on that^ /runs URL. And regarding "selecting two recent runs", which I imagine might be the prerequisite for that UI showing up -- I'm not sure if I'm seeing the correct way to select multiple runs. If I click a run's SHA hash or the "bubble" icon to its left, it just takes me to a per-run page like https://wpt.fyi/results/?sha=9da63dba01&label=master&label=experimental&max-count=1 (which also doesn't show "VIEW DIFF").

I did notice that https://wpt.fyi/runs?label=master&label=experimental&max-count=100 has an "Edit" button at the top-right, and that has a "Show diff" checkbox, but it's grayed out and I'm not sure how to make it clickable.

So I think I'm missing some step about how to get to the "view diff" UI -- or perhaps some UI is missing for me (though I am signed in to https://wpt.fyi/ with my github account, if that matters).

Thanks in advance!

Sep 19 '23 20:09 dholbert

@dholbert I'll go through a step-by-step process here, because I agree that it is confusing and difficult to get to this view if you don't already know the process. 😅

At the top of the /runs page (Recent Runs), there is an "Edit" button that lets you refine your search for specific browsers.
Select the specific product(s) you're interested in viewing. Note that, for interop results, we use experimental/stable aligned runs from the master branch, so be sure to:

Select the "stable" or "experimental" channel.
Check the box for "Aligned runs only".
Check the box for "Only master branch".

After clicking "SUBMIT", this should open a list of runs that can be used for interop scoring.
From here, you select the actual icons of the test runs you want to view (this is where I think the unintuitive nature was especially problematic). Once you have selected exactly two runs, the "VIEW DIFF" prompt will be available in the bottom left.

This should take you to the diff view. Note that the comparisons are done based on the order you selected the runs, so it might be useful to first select the icon of the older run that you want to compare, and then the newer run.

Also possibly useful: if you already have the run IDs of the test results you would like to compare, you can fill them in manually as the run_id params of the query string:

https://wpt.fyi/results/?diff&filter=ADC&run_id={{RUN}}&run_id={{RUN}}
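As a rough sketch of filling in that template, the Python below builds the diff URL from two run IDs. The function name `diff_url` and the sample IDs 111 and 222 are placeholders made up for illustration. Only the base URL and the `diff`, `filter=ADC`, and `run_id` parameters come from the template above.

```python
from urllib.parse import urlencode

BASE = "https://wpt.fyi/results/"

def diff_url(older_run_id: int, newer_run_id: int) -> str:
    """Hypothetical helper: build a wpt.fyi diff link for two run IDs.

    The older run goes first, matching the ordering advice for the UI.
    """
    query = urlencode([
        ("filter", "ADC"),
        ("run_id", older_run_id),
        ("run_id", newer_run_id),
    ])
    # "diff" is a bare flag with no value, so it is added by hand.
    return f"{BASE}?diff&{query}"

# 111 and 222 are placeholder IDs, not real runs.
print(diff_url(111, 222))
```

Substitute the real run IDs, with the older run first.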

Thanks for the UI insight here. This is something we could improve upon moving forward. 🙂

Sep 20 '23 17:09 DanielRyanSmith

@jgraham does your Rust scoring code address this?

Feb 15 '24 05:02 foolip