[STAL-2019] ci: add action to test for regressions
What problem are you trying to solve?
Currently, if someone upgrades a piece of code that could change the violation results, we would not know until the change has impacted someone and they report it to us. This is a problem because we cannot be expected to manually test every grammar upgrade, tree-sitter library upgrade, or any other change that could affect the output.
What is your solution?
I've added a CI workflow that tests for possible regressions. Not every detected change is an actual regression, but an automated notice that the output changed is valuable on its own: it lets us quickly determine whether a change is an improvement or a regression without manually running the analyzer and parsing the output ourselves.
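For reference, here is a minimal TypeScript sketch of the kind of before/after comparison the workflow performs. The file names (results-main.json, results-pr.json) and the violation shape (rule/file/line) are assumptions for illustration only, not the analyzer's actual output format.

```typescript
import * as fs from "fs";

// Hypothetical shape of a single violation in the analyzer's JSON output.
interface Violation {
  rule: string;
  file: string;
  line: number;
}

// Serialize a violation into a stable key so the two result sets can be
// diffed as sets rather than compared positionally.
const key = (v: Violation): string => `${v.rule}:${v.file}:${v.line}`;

// Placeholder file names for the analyzer output produced on main's SHA
// and on the PR's SHA.
const baseline: Violation[] = JSON.parse(fs.readFileSync("results-main.json", "utf8"));
const candidate: Violation[] = JSON.parse(fs.readFileSync("results-pr.json", "utf8"));

const baselineKeys = new Set(baseline.map(key));
const candidateKeys = new Set(candidate.map(key));

// Violations present on only one side mean the output changed.
const added = candidate.filter((v) => !baselineKeys.has(key(v)));
const removed = baseline.filter((v) => !candidateKeys.has(key(v)));

if (added.length > 0 || removed.length > 0) {
  console.log(`Output changed: +${added.length} / -${removed.length} violations`);
  // Fail the job so the change is surfaced, even if it turns out to be an improvement.
  process.exitCode = 1;
}
```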
Alternatives considered
Continue to manually run the analyzer and parse the output ourselves to check for regressions, which is far from ideal.
What the reviewer should know
This PR tests against a reasonably large corpus for each of the major languages we support and computes the diff in JavaScript, since GitHub's API makes it convenient to write annotations and summaries from there. Additionally, the affected files are uploaded to the workflow summary, so anyone who wants to grab them and test locally can do so without cloning a massive repo. This should meaningfully increase developer velocity when upgrading tree-sitter and/or tree-sitter grammars.
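As a rough illustration of how the diff could be surfaced through GitHub's tooling, here is a TypeScript sketch using @actions/core. The reportDiff helper and its added/removed inputs are hypothetical and are not the exact script used in this PR.

```typescript
import * as core from "@actions/core";

// added/removed would come from the comparison step: one string per changed
// violation (e.g. "rule:file:line").
export async function reportDiff(added: string[], removed: string[]): Promise<void> {
  // Write a table and a list of the changed violations to the job summary.
  await core.summary
    .addHeading("Analyzer output changed")
    .addTable([
      [
        { data: "Kind", header: true },
        { data: "Count", header: true },
      ],
      ["New violations", String(added.length)],
      ["Violations no longer reported", String(removed.length)],
    ])
    .addList([...added.map((v) => `+ ${v}`), ...removed.map((v) => `- ${v}`)])
    .write();

  // Also emit a warning annotation so the change is visible on the PR itself.
  core.warning(`Analyzer output changed: +${added.length} / -${removed.length} violations`);
}
```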
To demonstrate what the output looks like, this PR also updates the Go grammar. The update causes a slight change in results: a false positive no longer shows up, and a false negative now does, both due to a parsing error in the old version. It is only a +4/-5 diff, and all of the changes are the same kind of violation, so in terms of unique changes it is a +1/-1 diff. That makes it an ideal demonstration: it is the minimal case where we can see both a result that is no longer present after the update and a result that is not present on HEAD/main.
Also, a failing run of this workflow does not necessarily indicate an outright failure. It is a warning to the author/reviewer that the analyzer's output differs before and after the PR, and it may not always require action.
Some improvements to consider
Currently, the workflow builds the analyzer with cargo run --release, even though we could probably reuse a build from another workflow. The complication is that it does this on both the PR's SHA and main's SHA, so we would need to reuse both builds from somewhere else. That would be nice to have, but it isn't necessary since the CI runtime isn't very high anyway (~10 minutes max in my experience).