codeql icon indicating copy to clipboard operation
codeql copied to clipboard

Go: Support Go 1.23

Open mbg opened this issue 1 year ago • 11 comments

This PR makes minimal changes for us to tolerate Go 1.23. Go 1.23 makes changes to how aliases are represented by the compiler. This PR modifies the extractor to extract them as a new kind of type.

This is just a first stab at extracting the alias types and does not include parameters or arguments, yet.

Things to check:

  • Is it sensible for us to extract the alias types or just resolve them? I think it makes sense to extract them to mirror how the compiler represents the type AST.
  • In light of that, do we need to make any adjustments (beyond implementing getUnderlyingType) to account for this so that queries which look for certain types don't get fooled by aliases?
  • Should we extract the type object for aliases?
  • Should we extract the names of aliases using the existing table for named types or a separate table?
  • Some existing tests fail with the current changes where interface{} types change to any. I am not entirely sure why this is.

mbg avatar Jul 24 '24 10:07 mbg

  1. Is it sensible for us to extract the alias types or just resolve them? I think it makes sense to extract them to mirror how the compiler represents the type AST.

I agree that it makes sense to extract them. I believe this compiler change is paving the way for aliases of generic types where some of the type parameters have been specified. We may find it hard to support that if we go too far from how the compiler represents things.

  1. In light of that, do we need to make any adjustments (beyond implementing getUnderlyingType) to account for this so that queries which look for certain types don't get fooled by aliases?

If we're lucky then we've already put getUnderlyingType in all the places that need it. We may well find some more things we need to do.

  1. Should we extract the type object for aliases?

I think that we should, for the same reason as 1. (Assuming the compiler creates objects for them.)

owen-mc avatar Aug 20 '24 11:08 owen-mc

Taken a look over the code; no additional review points from me. Agree with Owen's take on Qs 1-3 above. To Q4:

Should we extract the names of aliases using the existing table for named types or a separate table?

I'd say use the existing table. It's redundant w.r.t. getEntity().getName() anyway, but this is consistent with other named types.

smowton avatar Aug 20 '24 11:08 smowton

  1. In light of that, do we need to make any adjustments (beyond implementing getUnderlyingType) to account for this so that queries which look for certain types don't get fooled by aliases?

I've added a predicate type unalias(Type t) which looks through any number of aliases till it gets to a non-alias type. (This is copied from the Go library.) Currently this is only needed for one experimental query. We may find we need to sprinkle it in a few more places, probably where we're trying to say is "is t this particular type" and we actually want to say "is t this particular type, or an alias of it, or an alias of an alias of it, ...". I'll run DCA and then QA to try to identify anywhere this happens.

  1. Some existing tests fail with the current changes where interface{} types change to any. I am not entirely sure why this is.

Having looked into it, I think this is what I'd expect. any is an alias for interface{}. When any is used as the type for anything, and we try to get information about it, it used to give the information about the thing it was an alias for, but now it gives information about itself.

owen-mc avatar Aug 21 '24 11:08 owen-mc

The DCA results are fine. In one project the number of extractor errors changed, and we lost 5 FPs in unimportant queries.

owen-mc avatar Aug 22 '24 13:08 owen-mc

Do you know why we saw result changes? Were they the same project as that with extraction errors?

smowton avatar Aug 22 '24 13:08 smowton

Yes, same project. I didn't dig into it deeply. My guess is that we were able to extract a little bit more. All the FPs involved mistakenly thinking that two very similar expressions were actually the same. (None of them involved alias types.)

owen-mc avatar Aug 22 '24 13:08 owen-mc

OK, that's reassuring. QA to confirm we're in a good state, then ready to merge?

smowton avatar Aug 22 '24 14:08 smowton

Evaluation 1 had two key problems:

  1. It was contaminated by known problems with the 2.18.3 release (MaD conversion introducing unintentional changes)
  2. At least the Type.implements(Type) predicate was broken due to failing to account for identical types being sufficient for interface implementation -- so in particular, F(IntAlias) can implement F(int), and similarly F([]IntAlias) vs F([]int) and so on. Branch https://github.com/smowton/codeql/tree/smowton/admin/go1.23-wip has a crude first stab at implementing the needed deep-unalias routine (including getting the extractor to ensure the deep-unaliased type actually exists in the DB).

https://github.com/github/codeql-qa-ops/issues/190 is a new experiment to check if the combination of a tighter diff (i.e., not including the MaD changes) and fixing Type.implements is sufficient to fix alert differences, or if there are more missing unalias calls out there.

Note the working branch linked above also includes a couple of other small changes needed to make this work, in particular extracting struct-field tags, which were previously represented in trap labels but not in the database, making some types impossible to distinguish.

Things remaining to do:

  1. The implementation is noticeably a bit slow -- the deep-unalias operation could probably be better implemented, particularly the forall statements expressing deep-unaliasing w.r.t. function, struct and tuple types.
  2. Against at least repo https://github.com/cosmos/cosmos-sdk I noticed a concerning sign, that there existed aliases with more than one RHS, and consequently a small number of types with multiple deep-unaliases.
  3. We still need to do something about the new packages.Load crashes, ideally file an upstream bug.

smowton avatar Aug 26 '24 23:08 smowton

Evaluation 2 reveals one interesting finding: 30 or so of our test repositories are using https://github.com/crypto-com/cosmos-sdk-codeql. Unless that pack is recompiled and the users upgrade, that means our database downgrade script will be used in anger to execute their queries, since the newer CodeQL will build a database with type-aliases in it, then the CLI will downgrade it so we can execute their queries that expect the old dbscheme.

I note the current downgrade script isn't very faithful -- rather than delete rows that refer to an alias type, we should instead replace the alias with its RHS, thus making it look as if aliases are transparent.

There are also a small number of alert changes, which I'll look into now.

smowton avatar Aug 27 '24 13:08 smowton

Checklist of things to take care of before this can be merged:

  • [x] The deepUnalias implementation is currently quite time-consuming. A recommended optimisation is to replace forall in the recursive method with recursion over component indices. Fixed by implementing a fast path that unpacks the first few elements of signatures and struct types. Extraction performance was also fixed by checking if alias types are involved before either requesting or extracting a transparent-alias version of a type.
  • [x] Update the upgrade and downgrade scripts to (a) account for the dbscheme gaining the component_tags table and (b) on a downgrade, replace alias types with unalias(ty) instead of just dropping them as is done currently.
  • [x] Investigate, report upstream and perhaps manage to work around the packages.Load panic seen on ~10 repositories in both QA experiments. Answer: this is https://github.com/golang/go/issues/68877 which will be fixed in go1.23.1. We will wait for its expected release on Tuesday to start the release. (We can merge this PR before then)
  • [x] Investigate why on cosmos-sdk the unalias and AliasType.getRhs predicates are not functional (a small (~5) number of types have more than one alias RHS) @mbg established this is down to multiple versions of the same package being used in the context of a single extraction, a more general problem that isn't currently addressed for Go. The same issue also causes database inconsistencies in Java, and perhaps in other languages.
  • [x] Investigate outstanding alert differences seen in the round 2 QA run, either fixing or change-noting as appropriate. Complete, see comment below.
  • [ ] Add consistency queries checking that unaliasing and deep-unaliasing are single-valued and total
  • [x] Verify that all actions in extractType are idempotent now that they can be hit twice, once in the explicit and once in the transparent aliases path. They weren't; omitted them from the extract-type-with-transparent-aliases path.

smowton avatar Aug 27 '24 18:08 smowton

Residual changes:

  • bhirbec/bitbot, twitchyliquid64/bob-the-builder: positive changes (new TPs) due to apparent changes in how part-broken projects are represented (function targets are successfully extracted that weren't before). Needs a change note.
  • At least some string literals seem to have started getting their Types[...].Value populated, leading to population of the constvalues table and therefore a result from getExactValue. This has fixed FPs from various redundancy / identical-expression quality queries, which were conflating all string literals because they relied on getExactValue to distinguish them. I suggest we keep the new behaviour (but change-note it).
  • Found several bugs with transparent-alias type extraction now fixed at https://github.com/smowton/codeql/commits/smowton/admin/go1.23-wip/

Started https://github.com/github/codeql-dca-main/issues/23315 to determine which projects still have alert differences.

DCA results to investigate:

  • [x] MainfluxLabs/mainflux: loss of SQL injection. Cause: MongoDB models needed an unalias
  • [x] casdoor/casdoor: loss of request-forgery. Cause: pointer-content read and store steps need to match types, which requires us to do a deep-unalias
  • [x] go-gitea/gitea: loss of uncontrolled-allocation-size Same cause as casdoor
  • [x] jackluo2012/datacenter: loss of incorrect-integer-conversion Same cause as casdoor
  • [x] google/gapid: gain of useless-assignment Probable side-effect of compiler changes; database is highly broken
  • [x] spotlightpa/almanack: gain of log-injection Correct new results due to project requiring Go 1.23 ==> incompletely extracted under 1.22

smowton avatar Aug 28 '24 13:08 smowton