antimony icon indicating copy to clipboard operation
antimony copied to clipboard

WIP: Type checker

Open YerinAlexey opened this issue 2 years ago • 3 comments

Fixes #50

This took around two evenings of work, but there it is.

Currently not integrated in build system and generators because everything is going to break horribly in that case.

A quick design overview:

There are 3 main components to this:

  • Checked AST
  • Type table
  • Scope table

Checked AST

The checked AST is quite similar to the regular parsed AST with a few additions:

  • All expressions have a result type attached to them
  • Types in all structures are mandatory and inferred if needed. For functions that don't have a return type, void is used
  • Variables and types are referenced by IDs in their respective tables

Type table

The type table is actually a tree structure that stores and deduplicates types. Each type is given a unique Id value which can then be used to reference it anywhere in the type checker. The Type structure stores some metadata about the type like size and alignment (necessary for QBE/LLVM codegen) and a Repr (standing for representation) which is what "type" usually refers to.

Basic types are:

  • any (any type can be converted to it, currently has no defined size; TODO)
  • int
  • string
  • bool
  • void

The base types are inserted into the type table during its construction and their IDs are available as fields of the types::Table struct.

Those can be combined into composite types:

  • T[n] - fixed arrays
  • T[] - dynamic arrays
  • struct{...} - structures

There's also a special named type, which just relays information about its inner type but can be referred to by name. This is currently only used for structs, but can allow for more generic type aliases like C's typedef or Rust's type a = b;.

Though, I'm currently not very happy with the design of named types in particular right now

Scope table

Scope table is a tree-like structure, implemented similarly to type table, that tracks scopes and their relations, as well as variables in those scopes. A scope is established by either a function declaration, a loop or a block statement. Scopes also store a loop flag which makes tracking loops for break/continue much easier.

A scope stores mappings of variables to their type IDs, it could possibly also store additional information for codegen.

TODO

  • [x] Function prototype syntax: fn declared_elsewhere(x: int): string;
  • [ ] Write tests
  • [ ] Check for duplicate variable and top-level declarations
  • [ ] Type hints for array types. Currently an empty array literal doesn't work because the type checker doesn't tell apart let x: int[] = [] and let x: string[] = [] when checking the [] expression
  • [ ] Add a system for signalling implicit conversions (such as passing a value to a parameter with any type) to code generators
  • [x] Implement a better system for error reporting. Currently any stages after parsing have no access to position information, would be useful to add that to the syntax tree as well as returning something more meaningful than String for errors (hints should also fit here)
    • [ ] Fix up checker error messages and add more context (such as affected types)
  • [ ] Make naming consistent across parsed and checked AST
  • [ ] Refactor code generation to use the checked AST
  • [ ] Add documentation for type checker internals

YerinAlexey avatar Mar 22 '24 16:03 YerinAlexey

Oh wow, nice work!

This check phase is quite complex, but I assume there's no way around a second AST structure if we want proper typing.

(Please don't forget to add tests and fix up the dev docs afterwards, but I'm sure you're aware of that)

garritfra avatar Mar 22 '24 17:03 garritfra

This check phase is quite complex, but I assume there's no way around a second AST structure if we want proper typing.

It's not actually that complex, there's just a lot of boilerplate code (especially function parameters that get split onto multiple lines). The logic is really simple if you get to it

(Please don't forget to add tests and fix up the dev docs afterwards, but I'm sure you're aware of that)

I'll probably add tests after refactoring error handling so failure tests can explicitly specify which error should be produced

YerinAlexey avatar Mar 22 '24 17:03 YerinAlexey

The type checker finally supports all language features! I also went through a few rounds of refactoring and I think the new internal API turned out to be a bit better. There are some improvements for parser/AST here, which are independent of the type checker. Most notable change is disallowing uninitialized bindings without an explicit type (let x).

I also relaxed type restrictions for +, it can also concatenate strings now. I'm not sure what the exact semantics should be, but I ended up allowing string + (int|string|bool). The code also turned out to be a bit convoluted, I feel like it's better to move concatenation into a separate operator.

A few last things to be solved:

  • There needs to be a way of declaring a prototype for a function so that the type checker knows types of its arguments and return type
  • Handle duplicate objects in a better way. Currently I expect a lot of things to break if something is declared twice
  • Figure out the semantics of any for non-JS backends

My next goal is going to be adding tests for all this, refactoring error handling (it's inconsistent and lacks useful context), and trying to integrate it with the build system and generators

YerinAlexey avatar Mar 23 '24 21:03 YerinAlexey