Fault status decoding
These changes add functions to decode the fault status register(s) and update the hard fault handler with more detailed error messages. They were motivated by a bus error that was reported as a stack overflow, as well as by this TODO found in the hard fault handler:
https://github.com/tinygo-org/tinygo/blob/1d913a62bc0ceef841953058c2c6fb1dd2a40f6a/src/runtime/runtime_cortexm.go#L106-L109
Each new error string adds to the binary size, but the fault decoding itself adds almost nothing: GetFaultStatus() compiles to a register load, and each call such as fault.Mem().InstructionAccessViolation() compiles to a single instruction (usually a logical shift left, lsls).
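To give a sense of what these accessors look like, here is a minimal sketch in the spirit of the PR (GetFaultStatus and InstructionAccessViolation are the names mentioned above; the concrete types and the IACCVIOL bit position are assumptions based on the ARMv7-M CFSR layout, and arm.SCB is used as elsewhere in this thread):

// FaultStatus wraps the raw CFSR (Configurable Fault Status Register) value.
type FaultStatus uint32

// MemFaultStatus is the MemManage (lowest) byte of the CFSR.
type MemFaultStatus uint32

// GetFaultStatus reads the CFSR; this is the single register load mentioned above.
func GetFaultStatus() FaultStatus {
    return FaultStatus(arm.SCB.CFSR.Get())
}

// Mem extracts the MemManage fault bits.
func (f FaultStatus) Mem() MemFaultStatus {
    return MemFaultStatus(f)
}

// InstructionAccessViolation reports the IACCVIOL bit (bit 0 of the CFSR on
// ARMv7-M); after inlining this is typically a single shift or bit-test instruction.
func (m MemFaultStatus) InstructionAccessViolation() bool {
    return m&(1<<0) != 0
}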
I discovered the incorrectly reported bus error when I tried to write to a register on a disabled peripheral. The Kinetis K66 uses clock gating to reduce power draw from unused peripherals. If a register on a peripheral is read or written before the peripheral's clock is enabled via the appropriate System Clock Gating Control register (SCGCx), the CPU raises a bus error and calls the hard fault handler. At least for the K66, this results in an "imprecise data bus error", which means that the BusFault Address Register and the stack pointer are invalid. This invalid stack pointer is interpreted by the existing handler as a stack overflow.
At a minimum, TinyGo should check the fault status to determine whether the stack pointer can be considered valid. This should prevent the hard fault handler from incorrectly reporting a bus error as a stack overflow. Writing to a register of a disabled peripheral is definitely something an end user of TinyGo might do if they are not a microcontroller expert and are trying to use a peripheral directly.
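A minimal sketch of that check, assuming accessor names in the same style as above (Bus() and ImpreciseDataBusError() are placeholders here, not necessarily the final API):

fault := GetFaultStatus()
if fault.Bus().ImpreciseDataBusError() {
    // Imprecise bus fault: BFAR and the stacked exception frame are not
    // valid, so the stack-pointer-based overflow heuristic must be skipped.
    print("fatal error: imprecise data bus fault")
} else {
    // Only in the other cases is it safe to fall back to the existing
    // stack overflow check.
}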
Quick testing shows that this increases code size by almost 1kB, which is huge. It also tries to access the fault status registers on Cortex-M0, which is a problem because they don't exist there. I think the code should stay more or less the same for Cortex-M0, as that core doesn't offer many ways to discover what went wrong anyway.
What I would suggest is the following:
- Provide a way (in the machine package?) to detect the CPU type, exposed as a constant string (for example machine.CPU). The information can be obtained from SVD files (mostly; AFAIK stm32 doesn't have it), but it may be easier to put it directly in the relevant CPU family files (like src/machine/machine_atsamd51.go). Comparisons between two constant strings are trivially optimized away, so they can be used for CPU detection. (Sidenote: providing the CPU type in the machine package would also be useful for the ws2812 driver.)
- Use this information in the fault handler to have roughly two code paths: one for Cortex-M0, which simply checks whether the stack pointer looks reasonable, and one for Cortex-M3 and up, which has a proper check for a data bus error (a sketch follows after this list).
- Instead of printing each fault as a string, print the raw fault bits, for example: print(" CFSR=", uintptr(arm.SCB.CFSR.Get())). A quick google for "arm cfsr" finds the official manual, from which it is not hard to figure out the fault if you know how the bits are laid out in the hexadecimal value.
This is far less informative than printing the fault as a human-readable string, but it does provide the tools to debug this further without costing so much in terms of code size. Remember that this is included in every build and while 1kB might not sound like much, many of these features together add up quickly.
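A rough sketch of how the two suggestions could combine inside the handler (machine.CPU is the proposed constant; the string value is a placeholder, and the dead branch is expected to be optimized away):

if machine.CPU == "cortex-m0" {
    // Cortex-M0 has no fault status registers, so keep only the simple
    // stack pointer check and a generic message here.
    print("fatal error: hard fault")
} else {
    // Cortex-M3 and up: print the raw fault bits and let the user decode
    // them against the ARM manual.
    print("fatal error: hard fault, CFSR=", uintptr(arm.SCB.CFSR.Get()))
}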
Sidenote: the tests fail because src/device/arm/fault.go is not formatted correctly (run make fmt to fix).
What about a normally-false constant flag to print out the full messages? In the usual case the user gets a minimal error and the binary size barely increases, while flipping the flag makes debugging easier.
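Something along these lines, where the constant stays false by default so the verbose branch is dropped from normal builds (a sketch; only verboseFaultMessage and the accessors above are names from this thread):

const verboseFaultMessage = false

...and then in the handler:

fault := GetFaultStatus()
if verboseFaultMessage {
    // Full human-readable decoding, only compiled in when the constant is
    // flipped to true for debugging.
    if fault.Mem().InstructionAccessViolation() {
        print(" instruction access violation")
    }
} else {
    // Default: just the raw register value, which costs very little.
    print(" CFSR=", uintptr(arm.SCB.CFSR.Get()))
}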
My IDE (VSCode + vscode-golang + gopls) really doesn’t like TinyGo, so the automatic go fmt does not work consistently. I’ll probably add a local pre-commit hook to run make fmt.
I expected a smaller size difference with verboseFaultMessage = false. I'll work on that tomorrow.
main-old0.hex 25992 bytes # baseline for feather-m0
main-m0.hex 26080 bytes # feather-m0 with new fault handler
main-old.hex 21768 bytes # baseline for feather-m4
main-new.hex 22480 bytes # feather-m4 with new fault handler
main-full.hex 24680 bytes # feather-m4 with new fault handler in verbose mode
As of the latest commit, there's no change for M0, and for M4:
- Default: +184 bytes
- With showFaultAddresses: +360 bytes
- With verboseFaultMessage: +2472 bytes
- With both flags: +2648 bytes
I discovered a couple of interesting things. First, if-statements comparing constant strings are optimized away, but switch statements are not. Second, given the three cases below, cases 1 and 2 reduce to effectively the same thing, but case 3 adds an extra 44 bytes and the generated code becomes noticeably more complex. Switching which if-statement uses Address() makes no difference.
Case 1:
if fault.Mem().ValidAddress() {
    print(" mmar=", uintptr(arm.SCB.MMAR.Get()))
}
if fault.Bus().ValidAddress() {
    print(" bfar=", uintptr(arm.SCB.BFAR.Get()))
}
Case 2:
if addr, ok := fault.Mem().Address(); ok {
    print(" mmar=", addr)
}
if fault.Bus().ValidAddress() {
    print(" bfar=", uintptr(arm.SCB.BFAR.Get()))
}
Case 3:
if addr, ok := fault.Mem().Address(); ok {
    print(" mmar=", addr)
}
if addr, ok := fault.Bus().Address(); ok {
    print(" bfar=", addr)
}
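For reference, the constant-string comparison in question looks roughly like this (machine.CPU as proposed earlier; the strings are placeholders):

// This condition folds to a compile-time constant, so the whole branch is
// removed when it doesn't match.
if machine.CPU == "cortex-m4" {
    print("running on a Cortex-M4")
}

// This switch carries the same information but currently does not get
// optimized away.
switch machine.CPU {
case "cortex-m0":
    print("running on a Cortex-M0")
case "cortex-m4":
    print("running on a Cortex-M4")
}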
I discovered a couple of interesting things: if-statements comparing constant strings will optimize away, but switch statements will not
Huh, that's interesting; I guess this optimization is simply missing in the golang.org/x/tools/go/ssa package. However, it should not be very difficult to implement it in TinyGo. Note that once the code is in SSA form there is no distinction between if and switch: everything is done with gotos, as that makes optimizations easier to implement.
given the three cases below, cases 1 and 2 reduce to effectively the same thing, but case 3 adds an extra 44 bytes and the generated code complexity increases noticeably. Switching which if-statement uses Address() makes no difference.
This is likely related to some optimization triggering or not (I'd guess inlining).
Well, clearly I misunderstood the CPU field. That said, I did manage to implement compiler-defined constants, though it's not pretty. I'd much rather manipulate the SSA or IR than inject a file, but all my attempts at the former failed miserably: I tried both inserting a new constant and modifying an existing one, and everything either failed or did nothing.