Go: Support Go 1.23
This PR makes minimal changes for us to tolerate Go 1.23. Go 1.23 makes changes to how aliases are represented by the compiler. This PR modifies the extractor to extract them as a new kind of type.
This is just a first stab at extracting the alias types and does not include parameters or arguments, yet.
Things to check:
- Is it sensible for us to extract the alias types or just resolve them? I think it makes sense to extract them to mirror how the compiler represents the type AST.
- In light of that, do we need to make any adjustments (beyond implementing
getUnderlyingType) to account for this so that queries which look for certain types don't get fooled by aliases? - Should we extract the type object for aliases?
- Should we extract the names of aliases using the existing table for named types or a separate table?
- Some existing tests fail with the current changes where
interface{}types change toany. I am not entirely sure why this is.
- Is it sensible for us to extract the alias types or just resolve them? I think it makes sense to extract them to mirror how the compiler represents the type AST.
I agree that it makes sense to extract them. I believe this compiler change is paving the way for aliases of generic types where some of the type parameters have been specified. We may find it hard to support that if we go too far from how the compiler represents things.
- In light of that, do we need to make any adjustments (beyond implementing getUnderlyingType) to account for this so that queries which look for certain types don't get fooled by aliases?
If we're lucky then we've already put getUnderlyingType in all the places that need it. We may well find some more things we need to do.
- Should we extract the type object for aliases?
I think that we should, for the same reason as 1. (Assuming the compiler creates objects for them.)
Taken a look over the code; no additional review points from me. Agree with Owen's take on Qs 1-3 above. To Q4:
Should we extract the names of aliases using the existing table for named types or a separate table?
I'd say use the existing table. It's redundant w.r.t. getEntity().getName() anyway, but this is consistent with other named types.
- In light of that, do we need to make any adjustments (beyond implementing getUnderlyingType) to account for this so that queries which look for certain types don't get fooled by aliases?
I've added a predicate type unalias(Type t) which looks through any number of aliases till it gets to a non-alias type. (This is copied from the Go library.) Currently this is only needed for one experimental query. We may find we need to sprinkle it in a few more places, probably where we're trying to say is "is t this particular type" and we actually want to say "is t this particular type, or an alias of it, or an alias of an alias of it, ...". I'll run DCA and then QA to try to identify anywhere this happens.
- Some existing tests fail with the current changes where interface{} types change to any. I am not entirely sure why this is.
Having looked into it, I think this is what I'd expect. any is an alias for interface{}. When any is used as the type for anything, and we try to get information about it, it used to give the information about the thing it was an alias for, but now it gives information about itself.
The DCA results are fine. In one project the number of extractor errors changed, and we lost 5 FPs in unimportant queries.
Do you know why we saw result changes? Were they the same project as that with extraction errors?
Yes, same project. I didn't dig into it deeply. My guess is that we were able to extract a little bit more. All the FPs involved mistakenly thinking that two very similar expressions were actually the same. (None of them involved alias types.)
OK, that's reassuring. QA to confirm we're in a good state, then ready to merge?
Evaluation 1 had two key problems:
- It was contaminated by known problems with the 2.18.3 release (MaD conversion introducing unintentional changes)
- At least the
Type.implements(Type)predicate was broken due to failing to account for identical types being sufficient for interface implementation -- so in particular,F(IntAlias)can implementF(int), and similarlyF([]IntAlias)vsF([]int)and so on. Branch https://github.com/smowton/codeql/tree/smowton/admin/go1.23-wip has a crude first stab at implementing the needed deep-unalias routine (including getting the extractor to ensure the deep-unaliased type actually exists in the DB).
https://github.com/github/codeql-qa-ops/issues/190 is a new experiment to check if the combination of a tighter diff (i.e., not including the MaD changes) and fixing Type.implements is sufficient to fix alert differences, or if there are more missing unalias calls out there.
Note the working branch linked above also includes a couple of other small changes needed to make this work, in particular extracting struct-field tags, which were previously represented in trap labels but not in the database, making some types impossible to distinguish.
Things remaining to do:
- The implementation is noticeably a bit slow -- the deep-unalias operation could probably be better implemented, particularly the forall statements expressing deep-unaliasing w.r.t. function, struct and tuple types.
- Against at least repo https://github.com/cosmos/cosmos-sdk I noticed a concerning sign, that there existed aliases with more than one RHS, and consequently a small number of types with multiple deep-unaliases.
- We still need to do something about the new
packages.Loadcrashes, ideally file an upstream bug.
Evaluation 2 reveals one interesting finding: 30 or so of our test repositories are using https://github.com/crypto-com/cosmos-sdk-codeql. Unless that pack is recompiled and the users upgrade, that means our database downgrade script will be used in anger to execute their queries, since the newer CodeQL will build a database with type-aliases in it, then the CLI will downgrade it so we can execute their queries that expect the old dbscheme.
I note the current downgrade script isn't very faithful -- rather than delete rows that refer to an alias type, we should instead replace the alias with its RHS, thus making it look as if aliases are transparent.
There are also a small number of alert changes, which I'll look into now.
Checklist of things to take care of before this can be merged:
- [x] The
deepUnaliasimplementation is currently quite time-consuming. A recommended optimisation is to replaceforallin the recursive method with recursion over component indices. Fixed by implementing a fast path that unpacks the first few elements of signatures and struct types. Extraction performance was also fixed by checking if alias types are involved before either requesting or extracting a transparent-alias version of a type. - [x] Update the upgrade and downgrade scripts to (a) account for the dbscheme gaining the
component_tagstable and (b) on a downgrade, replace alias types withunalias(ty)instead of just dropping them as is done currently. - [x] Investigate, report upstream and perhaps manage to work around the
packages.Loadpanic seen on ~10 repositories in both QA experiments. Answer: this is https://github.com/golang/go/issues/68877 which will be fixed in go1.23.1. We will wait for its expected release on Tuesday to start the release. (We can merge this PR before then) - [x] Investigate why on
cosmos-sdktheunaliasandAliasType.getRhspredicates are not functional (a small (~5) number of types have more than one alias RHS) @mbg established this is down to multiple versions of the same package being used in the context of a single extraction, a more general problem that isn't currently addressed for Go. The same issue also causes database inconsistencies in Java, and perhaps in other languages. - [x] Investigate outstanding alert differences seen in the round 2 QA run, either fixing or change-noting as appropriate. Complete, see comment below.
- [ ] Add consistency queries checking that unaliasing and deep-unaliasing are single-valued and total
- [x] Verify that all actions in
extractTypeare idempotent now that they can be hit twice, once in the explicit and once in the transparent aliases path. They weren't; omitted them from the extract-type-with-transparent-aliases path.
Residual changes:
-
bhirbec/bitbot,twitchyliquid64/bob-the-builder: positive changes (new TPs) due to apparent changes in how part-broken projects are represented (function targets are successfully extracted that weren't before). Needs a change note. - At least some string literals seem to have started getting their
Types[...].Valuepopulated, leading to population of theconstvaluestable and therefore a result fromgetExactValue. This has fixed FPs from various redundancy / identical-expression quality queries, which were conflating all string literals because they relied ongetExactValueto distinguish them. I suggest we keep the new behaviour (but change-note it). - Found several bugs with transparent-alias type extraction now fixed at https://github.com/smowton/codeql/commits/smowton/admin/go1.23-wip/
Started https://github.com/github/codeql-dca-main/issues/23315 to determine which projects still have alert differences.
DCA results to investigate:
- [x] MainfluxLabs/mainflux: loss of SQL injection. Cause: MongoDB models needed an
unalias - [x] casdoor/casdoor: loss of request-forgery. Cause: pointer-content read and store steps need to match types, which requires us to do a deep-unalias
- [x] go-gitea/gitea: loss of uncontrolled-allocation-size Same cause as
casdoor - [x] jackluo2012/datacenter: loss of incorrect-integer-conversion Same cause as
casdoor - [x] google/gapid: gain of useless-assignment Probable side-effect of compiler changes; database is highly broken
- [x] spotlightpa/almanack: gain of log-injection Correct new results due to project requiring Go 1.23 ==> incompletely extracted under 1.22