v80.c: v80 assembler in c89
started work on https://github.com/Kroc/v80/issues/4
@Kroc please feel free to scribble any feedback or suggestions all over this PR, it's far from ready to merge at the moment!
- [x] Next task is to rewrite the grammar comment to be line oriented to see if I can reduce the amount of lexical book-keeping compared to the token based grammar I've half implemented so far...
Pasting my questions and @Kroc's answers here for easy reference:
- I have a 32 byte static token buffer for everything right now (label names, const names, numbers etc) to help enforce the token length limit, but presumably we want to handle strings of arbitrary length?
There are strings in v80 and they are 'arbitrary' in length, but line-length in v80 is hard capped at 127 cols to limit memory usage on 8-bit systems, and the C implementation should enforce this too so that source code written on PC will assemble on Z80.
When v80 encounters a string, it simply writes the bytes to the code-segment one by one so the string is never stored anywhere whole -- with one exception: the file-name of an include .i statement is captured whole, but because CP/M doesn't have subfolders, the length of this is known to be limited. At the moment expressions are not allowed in include file-names, but this might be supported in the future.
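A rough sketch of how the C side might enforce the 127-column cap while splitting source text into lines. The names `next_line` and `V80_MAX_COLS` are made up for illustration, and it assumes lines end in a plain `\n`; it is not taken from v80.c.

```c
#include <assert.h>
#include <stddef.h>

#define V80_MAX_COLS 127    /* hard cap on line length, as stated above */

/* Copy the next line from *src into out (which must hold at least
   V80_MAX_COLS + 1 bytes) and advance *src past the newline.
   Returns 1 on success, 0 at end of input, -1 if the line is too long.
   Hypothetical helper: name and return codes are assumptions. */
int next_line(const char **src, char *out)
{
    const char *p;
    size_t n = 0;

    assert(src != NULL && *src != NULL && out != NULL);
    p = *src;
    if (*p == '\0')
        return 0;
    while (*p != '\0' && *p != '\n') {
        if (n == V80_MAX_COLS)
            return -1;      /* a 128th column exists: reject the line */
        out[n++] = *p++;
    }
    out[n] = '\0';
    if (*p == '\n')
        p++;                /* step over the line terminator */
    *src = p;
    return 1;
}
```

Because only one line is ever held in the buffer, the same loop works for a source file of any size.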
- When parsing expressions following .b, and the results don't fit in one byte, do you mask off the low eight bits? mask and right shift (but then that's the same as .w)? write big-endian order bytes? bail out with an error? something else?
It's an error -- when v80 encounters .b it sets a 'parameter size' variable for how many bytes (1, in this case) that expressions must fit into. If an expr > $ff then it's an error. Note that with .w using a string is an error, you can't have an ASCII string expanded to words.
"errors.txt" contains all possible errors in v80 and an explanation of what causes them, so it's a good source of detail on parsing behaviour.
- Are values (literal and/or resulting from expressions) limited to 16bits by the assembler? Or in principle could I configure an ISA for a 32bit machine?
Yes, v80 is limited to a 16-bit number internally for everything. Considering that v80 can only output bytes or words to the code-segment, 32-bit results don't actually have a practical use! Note that v80 allows underflow but errors on overflow! This is so that the negate operator can work, because a number like -7 is a negate unary operator followed by the positive number 7.
- Seems like the parser should be line oriented? Or can, say, an incomplete expression continue on a new line?
For memory and parsing-simplification reasons, expressions are limited to one line; the entire parser is line-orientated to allow for parsing a file larger than memory allows. v80 is 335KB of code which obviously doesn't fit into 64 KB of RAM :P
But you have to understand that v80 is purposefully limited to fit into 8-bit hardware and that a C89 version shouldn't be assembling code that can't be assembled on real 8-bit hardware otherwise that defeats the point!
- Would you be interested in discussing using a context free grammar to simplify the implementation, so we don't have to track indentation levels for conditionals, whether tokens are the first on a new line or not for constants and labels etc?
v80 is not trying to be an ideal assembler; it's trying to be minimal so that it can support many systems. Things like context-free grammars, macros etc. are features for a better, more language-orientated assembler (hopefully written in v80) -- v80 exists to bootstrap 8-bit software on 8-bit machines instead of relying on PC-only toolchains. Ergo, it has no goal to be anything more than a brutally simple assembler that acts as the bedrock of a broader range of 8-bit software. If an 8-bit computer can't modify and assemble its own software then it might as well be proprietary. An 8-bit computer that can only run software that has to be compiled on a PC is not a real computer, and v80 aims to break that cycle by allowing code on a PC to also assemble on 8-bit hardware.
@Kroc 'nother question about local labels (possibly leading to reducing heap usage quite a bit):
- Do you have documented support for jumping to local labels from outside of the non-local to which they apply?
In my fantasyvm assembler I have gone back and forth on supporting that, but currently keep all the local labels in their own table without using the non-local prefix. The local labels table is reset every time a new non-local label is defined, and unresolved local label references throw an error at that point. The downside is that if you really do need to jump into a local label from outside the current non-local label's scope, you end up having to promote some of the locals to non-local and there can be a cascade of promotions around that area as a result. I'm thinking about adding persistent locals that are recorded in the non-local label table if I find it problematic later.
Local labels are simply appended to the last non-local label defined forming a complete label-name. "release/readme.txt" documents each feature, are you referring to that?
1.4 Local Labels:
--------------------------------------------------------------------------------
Local labels can be "reused", as they automatically append themselves to
the last defined, non-local, label:
| _local ; error: local label without label!
|
| :label1
| _first ; defines :label1_first
| _second jr _first ; defines :label1_second, jumps to :label1_first
|
| :label2
| _first ; defines :label2_first
| _second jr _first ; defines :label2_second, jumps to :label2_first
Note that the combined length of the local label name and its parent must not
exceed 31 characters, including label sigil:
| :2345678901234567890 ; 20 chars
| _234567890 ; 30 chars - OK
| _23456789012 ; 32 chars - invalid symbol error!
It was done this way for ease of implementation, but I would like to add anonymous labels in the future or change the way local labels are implemented so that they don't take up so much heap space.
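For reference, the "append to the last non-local label" rule and the 31-character limit from the readme excerpt could look roughly like this in C. `qualify_local` and its error codes are hypothetical, not the names used in v80.c.

```c
#include <assert.h>
#include <string.h>

#define V80_MAX_SYMBOL 31   /* combined length limit, including the sigil */

/* Build the full name of a local label by appending it to the last
   non-local label. `out` must hold V80_MAX_SYMBOL + 1 bytes.
   Returns 0 on success, -1 if no non-local label has been defined yet,
   -2 if the combined name is too long. Hypothetical helper. */
int qualify_local(const char *parent, const char *local, char *out)
{
    size_t plen, llen;

    assert(local != NULL && out != NULL);
    if (parent == NULL || parent[0] == '\0')
        return -1;                          /* local label without label */
    plen = strlen(parent);
    llen = strlen(local);
    if (plen + llen > V80_MAX_SYMBOL)
        return -2;                          /* invalid symbol: too long */
    memcpy(out, parent, plen);
    memcpy(out + plen, local, llen + 1);    /* +1 copies the terminator */
    return 0;
}
```

With the readme's own example, `:2345678901234567890` plus `_234567890` gives 30 characters and is accepted, while adding `_23456789012` gives 32 and is rejected.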
Sort of. I wondered whether you want to be able to rely on, eg:
:nonlocal1
_local1
:nonlocal2
_local1 jr :nonlocal1_local1
And if that's not an explicit goal, I think there's some low hanging fruit in heap size savings with segregating local labels into a short-lived table that gets reset at every non-local label boundary. (and allowing local labels a full 31 characters since there's no longer any need to prepend the non-local label)
The heap in v80 cannot deallocate anything, ever! If a label gets added, it cannot ever be removed, because once something else gets added to the heap (like a deferred expression, a new constant), the heap cannot shrink without deleting something else important. The space cannot be reused because that creates a fragmentation problem that would take hundreds of bytes of code to work with. The heap is append only.
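For anyone following along in C, an append-only heap is essentially a bump allocator with no `free`. A minimal sketch, with an arbitrary `HEAP_SIZE` and invented names (v80 itself is Z80 assembly, so this only mirrors the idea):

```c
#include <assert.h>
#include <stddef.h>

#define HEAP_SIZE 1024              /* arbitrary size for the sketch */

static unsigned char heap[HEAP_SIZE];
static size_t heap_top = 0;         /* only ever grows */

/* Reserve `size` bytes at the end of the heap.
   Returns NULL when the heap is full. There is deliberately no way to
   release memory: records are byte-packed and stay until the run ends. */
void *heap_alloc(size_t size)
{
    void *p;

    assert(heap_top <= HEAP_SIZE);
    if (size > HEAP_SIZE - heap_top)
        return NULL;
    p = heap + heap_top;
    heap_top += size;
    return p;
}
```

Note that the sketch hands out raw bytes with no alignment padding, which suits byte-packed records but not arbitrary C structs.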
Hope is not lost however; we could have label records include a sub-label linked list on the end of it so that only the local labels names are stored attached to the parent label by a linked list. The downside to this would be greater complexity and code size in label searches.
@Kroc Heap limitations make sense. For v80.c, I'll use the same "append local to non-local name" for symbol table entries as you, effectively supporting jumping to local labels from another non-local block.
Largely rewrote v80.c today to take into account your earlier answers. Any other feedback welcome as I make progress...
Hmm.. just occurred to me that you could have local symbols in their own linked list, and as long as each entry is the same size (32 bytes for the label name, and 4 bytes for the next entry pointer) and a zero length name marks the end of the list when searching, then there's no need to deallocate anything. When a new non-local label is encountered, we can error out for unresolved local label references, and then put a 0x0 tombstone at the head of the list. New local labels would then overwrite the entries from the local label list in place starting at the head (making sure that if the next entry was allocated, it gets a 0x0 tombstone) and reusing following entries until they are all used up, and then additional local labels get pushed onto the head of the list as before.
The size of the local labels list would only ever be 36 bytes * largest-number-of-locals-in-a-single-scope. Surely much better for very large programs, which are the ones most likely to overflow the heap?
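Here's a quick C sketch of that tombstone idea so it's easier to poke holes in. It simplifies a little: new entries are appended at the tail instead of pushed onto the head, each entry also carries a `value` field (so it is larger than the 36 bytes mentioned), and `malloc` stands in for the append-only heap. All names are invented for the sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define LOCAL_NAME_MAX 31

struct local {
    char name[LOCAL_NAME_MAX + 1];  /* empty name = tombstone (end of live list) */
    unsigned value;
    struct local *next;
};

static struct local *locals_head = NULL;
static int locals_allocated = 0;    /* entries ever allocated; never shrinks */

/* Call at every non-local label: forget all locals without freeing. */
void locals_reset(void)
{
    if (locals_head != NULL)
        locals_head->name[0] = '\0';
}

struct local *locals_find(const char *name)
{
    struct local *p;

    for (p = locals_head; p != NULL && p->name[0] != '\0'; p = p->next)
        if (strcmp(p->name, name) == 0)
            return p;
    return NULL;
}

/* Returns 0 on success, -1 on bad/duplicate name or allocation failure. */
int locals_define(const char *name, unsigned value)
{
    struct local *p, *prev = NULL;

    assert(name != NULL);
    if (name[0] == '\0' || strlen(name) > LOCAL_NAME_MAX)
        return -1;
    /* Walk the live entries; stop at the tombstone or the end. */
    for (p = locals_head; p != NULL && p->name[0] != '\0'; p = p->next) {
        if (strcmp(p->name, name) == 0)
            return -1;
        prev = p;
    }
    if (p != NULL) {
        /* Reuse the tombstoned entry and move the tombstone along. */
        if (p->next != NULL)
            p->next->name[0] = '\0';
    } else {
        p = (struct local *)malloc(sizeof *p);   /* stands in for heap append */
        if (p == NULL)
            return -1;
        p->next = NULL;
        if (prev != NULL) prev->next = p; else locals_head = p;
        locals_allocated++;
    }
    strcpy(p->name, name);
    p->value = value;
    return 0;
}
```

The number of entries allocated never exceeds the largest number of locals live in any one scope, which is the saving being argued for.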
What remains:
- [x] fix any errors from cc -std=c89 -pedantic -D_POSIX_C_SOURCE=1
- [x] diagnose unprintable character literals as an error
- [x] don't rely on host library strtol availability
- [x] numbers at start of line set PC
- [x] write to code-segment
- [x] constant assignment and references
- [x] label assignment and references
- [x] local label assignment and references
- [x] .f keyword
- [x] .w keyword
- [x] .b keyword
- [x] .b strings
- [x] conditions
- [x] flush code-segment to output file
- [x] command line arguments
- [ ] ~deferred expressions~
- [x] forward references (separate pass ~? or back-patch recorded holes in single pass?~)
- [ ] testsuite
Did I miss anything?
Thinking about it, what I'm trying to get at is that changes to v80's design in Z80 code can take weeks, even months -- it took six months of meticulous crafting, instruction by instruction, and I'm not the fastest developer to begin with. Given that the assembler is now self-assembling, I don't want to break it without careful consideration, and rewriting what already works is equally time consuming, so there have to be clear net wins.
This brings me on to instructions; I hadn't thought far enough ahead about a C version (I didn't actually think anybody would take up the offer), but the C version should reuse the instruction table binary so that this work isn't duplicated for every ISA -- v80 is unique in that support for different CPU instruction sets requires minimal code changes. The instruction set is encoded as a binary tree (see "isa_z80.v80") with a small amount of CPU-specific code to handle parameters ("v80_z80.v80"). However, I'm in the process of rewriting this table logic (see branch "v2") to both greatly simplify the instruction tables (see "is2_6502.v80" in branch "v2" for just how much simpler) and hopefully save more bytes, so you'll want to hold off on parsing instructions for the moment.
Oh, I didn't mean to imply you should change the algorithm, but I think it's definitely worth throwing an error when attempting to jump into the middle of a local label from another scope so that some space optimizations are still on the table in case you want to do that one day 😁
In the unlikely event that the C version catches up, I might bug you for some specs for the v2 tables then. I secretly want to add support for my fantasy vm ISA after all!
I should add tests to flush out bugs in another PR, and I don't have any code to read the opcode tables yet - but the parser handles v1/isa_6502.v80 and v1/isa_z80.v80 and produces plausible looking binary output files, so it seems to be minimally functional.
What's the usual way of building an assembler that does opcode lookup in the tables? And do you have a spec for v2 tables I can implement?
Sorry for the slow response, I'm rather busy at home whilst my son is off school over summer. The process of parsing the instruction tables is covered by parseMnemonic in "v80_asm.v80" (https://github.com/Kroc/v80/blob/cebb0494b72cc78276c63111ceeb2bf74d7222b9/v1/v80_asm.v80#L1234-L1377). Sorry that I don't have it better described somewhere, but it's a small amount of code; the tables themselves describe and demonstrate the structure so it's possible to use that alone as a guide. I'm getting near the end of the v2 instruction parser but have been struggling a lot with focus. The v2 parser is only guaranteed to make the instruction tables easier to read and write; performance is an unknown factor at the moment until I complete my prototype, so there's a small possibility v2 might be abandoned.
The "build.bat" script does some testing by building samples of the entire Z80/6502 instruction set and comparing against the same produced with WLA-DX; maybe this would be a starting point? I haven't examined the PR enough to know what the build requirements of your C version are and if/how this would work as part of the current, rather crude, system. I use a batch file only so that v80 can be built out-of-the-box without having to install any dependencies or deal with high up-front demands like requiring knowledge of Docker -- remember that whatever is required to build v80 is itself a dependency of the 8-bit software at the end of the pipeline, and the goal is to get away from gigabytes of constantly evolving build infrastructure :P
No apologies necessary. I'm setting off on a 2-3 week road trip tomorrow, so any free time I would have had for coding will probably be spent on driving instead. Absolutely no hurry on anything from my perspective.
Build requirements for v80.c are a c89 C-compiler toolchain and a libc with support for stdio FILE* streams and a selection of c89 *printf calls (these could be coded around if it needs to build and run in an environment without stdio, but I'd rather not -- it's a lot of boring code) as well as stdlib.h for malloc, free and exit calls (could probably write a custom allocator if malloc and free are missing, managing without exit is probably a bit harder). If sys/param.h is available, it'll use the proper values for some constants, but has sensible fallbacks if not. If sys/stat.h is available, it'll check inode types when opening files for reading.
I was looking at your build.bat, and even though I enforce CP/M compatible filenames for .I arguments, you can pass any path to the compiled v80.c on the command line... I should probably take any directory prefix from the command line input file and prepend it to any filenames that come from .I args so you don't have to run it from the directory with the sources inside to find the include files.
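Roughly what I have in mind for the directory prefix, as a sketch: `include_path` is a hypothetical name, and it only treats `/` as a separator (a DOS/Windows build would also need to handle `\`).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Build the path of an include file relative to the directory of the
   input file given on the command line.
   Returns 0 on success, -1 if the result does not fit in out_size bytes.
   Hypothetical helper: only '/' is treated as a directory separator. */
int include_path(const char *input_file, const char *include_name,
                 char *out, size_t out_size)
{
    const char *slash;
    size_t dir_len, name_len;

    assert(input_file != NULL && include_name != NULL && out != NULL);
    slash = strrchr(input_file, '/');
    dir_len = (slash != NULL) ? (size_t)(slash - input_file) + 1 : 0;
    name_len = strlen(include_name);
    if (dir_len + name_len + 1 > out_size)
        return -1;
    memcpy(out, input_file, dir_len);               /* "dir/" part, if any */
    memcpy(out + dir_len, include_name, name_len + 1);
    return 0;
}
```

So an input of `../v1/cpm_z80.v80` that includes `v80.v80` would open `../v1/v80.v80`, while the include name itself stays CP/M compatible.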
I haven't tried building anything with WLA-DX or runcpm yet, so that's probably a good thing for me to get going to decide how to proceed, but I'd also like to write specific tests to exercise the tokenizer and parser in v80.c which probably needs a custom test harness anyway... which is why I don't want to pile that all on top of this PR.
I'm still not clear on how to assemble the *.v80 files to end up with a working assembler that contains the instruction lookup tables and the code that uses them to assemble instruction op-codes. It appears that that assembler needs to exist before it's possible to assemble the table lookup code?!?
And finally (for now ;-) ) -- I was thinking it might be easier to share the instruction opcode to binary mappings between v80.c and v80 proper if we define the instruction set separately somewhere that v80.c can load directly into a hash table, and I also provide some code to generate the lookup table sources (for v80 sources) rather than you hand coding them. That will let you tune the format for speed/space efficiency without the work of hand coding the tables too. WDYT?
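To illustrate the hash table half of that suggestion, a small chained hash table keyed by mnemonic might look like this. The struct layout and names are invented and say nothing about the real table format; the five sample entries are single-byte Z80 opcodes with no operands, which is the only case this sketch covers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define OP_BUCKETS 64

struct op {
    const char *mnemonic;
    unsigned char opcode;
    struct op *next;            /* chain for colliding mnemonics */
};

/* A few operand-less Z80 instructions, purely as sample data. */
static struct op z80_sample[] = {
    { "nop",  0x00, NULL },
    { "halt", 0x76, NULL },
    { "ret",  0xC9, NULL },
    { "di",   0xF3, NULL },
    { "ei",   0xFB, NULL }
};

static struct op *op_table[OP_BUCKETS];

static unsigned op_hash(const char *s)
{
    unsigned long h = 5381UL;

    while (*s != '\0')
        h = (h * 33UL + (unsigned char)*s++) & 0xFFFFFFFFUL;
    return (unsigned)(h % OP_BUCKETS);
}

/* Clear the table and insert every sample entry. */
void op_load(void)
{
    size_t i;

    for (i = 0; i < OP_BUCKETS; i++)
        op_table[i] = NULL;
    for (i = 0; i < sizeof z80_sample / sizeof z80_sample[0]; i++) {
        unsigned h = op_hash(z80_sample[i].mnemonic);
        z80_sample[i].next = op_table[h];
        op_table[h] = &z80_sample[i];
    }
}

/* Returns the entry for a mnemonic, or NULL if it is unknown. */
const struct op *op_find(const char *mnemonic)
{
    const struct op *p;

    assert(mnemonic != NULL);
    for (p = op_table[op_hash(mnemonic)]; p != NULL; p = p->next)
        if (strcmp(p->mnemonic, mnemonic) == 0)
            return p;
    return NULL;
}
```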
Had a couple of unexpected evenings to finish the code!
This implements the instruction tables for v80.c, as well as loading and parsing. It produces a sensible looking (but untested) cpm_z80.com binary from the assembly sources, so can now serve as a bootstrap mechanism.
I need to write some code to generate the isa_*.v80 tables for the v80 assembler from the tbl_*.v80 tables for the C assembler, and validate that the binary it generates runs and regenerates bit-identical content from itself when reassembling itself.
QQ: v80.c is becoming hard to navigate at this size when editing it, but also having everything in a single file makes it easier to compile. I'm tempted to pull the polyfills (for missing libc APIs) and maybe some of the data structures (linked lists, hash tables, perhaps the tokenizer) into individual pseudo-headers. That would mean adding -I$PWD/v1 to the compiler invocation to pull all that code back in (but still a single compilation unit), but would make editing and navigating the code a lot easier for me. Do you have a preference? I could be nudged either way quite easily...
Thank you for the hard work! Yes, you should split the code where you are essentially "patching" the base C-functionality; I fully expect that additional replacement functions may be needed for certain combinations of operating system and compiler -- C89 compatibility was very variable in compilers even late into the 90s! Such monkey-patching and non-portable considerations shouldn't factor into the code of v80 itself, so that others may have an easier time fixing for their choice of compiler/OS.
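As an illustration of the kind of split being discussed, a polyfill pseudo-header might look something like the following. The file name and the `v80_` wrapper names are hypothetical; `NO_CTYPE_H` follows the `-DNO_*` naming style of the build flags used in this PR, and the fallbacks assume an ASCII host.

```c
/* polyfill_ctype.h -- hypothetical file name, included by the main source */
#ifndef POLYFILL_CTYPE_H
#define POLYFILL_CTYPE_H

#ifdef NO_CTYPE_H
/* Host has no usable <ctype.h>: ASCII-only replacements. */
static int v80_isdigit(int c) { return c >= '0' && c <= '9'; }
static int v80_isspace(int c) { return c == ' ' || (c >= '\t' && c <= '\r'); }
#else
#include <ctype.h>
/* Thin wrappers so the assembler proper never calls libc directly. */
static int v80_isdigit(int c) { return isdigit((unsigned char)c); }
static int v80_isspace(int c) { return isspace((unsigned char)c); }
#endif

#endif /* POLYFILL_CTYPE_H */
```

The assembler code then only ever calls the `v80_` wrappers, so porting to an awkward compiler means touching the pseudo-header and nothing else.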
Okay, all done @Kroc!
If I compile main.c from bootstrap to make a v80 executable on my machine:
$ cd bootstrap
$ cc -std=c89 -pedantic -ggdb3 -D_POSIX_C_SOURCE=1 -DNO_STRING_H -DNO_SYS_STAT_H -DNO_CTYPE_H -DNO_LIBGEN_H -DNO_SIZE_T -DNDEBUG -I. -o ./v80 main.c
And then use that to make a cpm_z80.com file for CP/M (note the use of the simplified tbl_z80.v80 table to populate the instruction lookup table):
$ ./v80 -i tbl_z80.v80 ../v1/cpm_z80.v80 v80c.com
It produces identical bytes after recompiling itself with ntvcm (according to vbindiff):
$ ntvcm -l ../bootstrap/v80c.com cpm_z80.v80
And also identical bytes to recompiling sources with your most recent v80.com release:
$ ntvcm -l ../release/v80.com cpm_z80.v80
Incidentally the byte encodings for the set* instructions are the same as the res* instructions in your v1/is2_z80.v80 file. I discovered and corrected those in my bootstrap/tbl_z80.com file when comparing binaries, but I haven't done a full audit to see if there are other typos in there.
If you like and merge this PR, I'll be happy to work on generating the is2_*.v80 files from the simpler tbl_*.v80 tables when you've finalized the format. Or to isa_*.v80 if you decide to abandon the v2 format.
Also, feel free to let me know if you have any suggestions for changes or improvements to what is already here.
Incidentally the byte encodings for the set* instructions are the same as the res* instructions in your v1/is2_z80.v80 file. I discovered and corrected those in my bootstrap/tbl_z80.com file when comparing binaries, but I haven't done a full audit to see if there are other typos in there.
I had seen this and fixed it, but maybe that was only on the v2 branch :/ I can't remember things straight. My son will be back to school next week and I'll focus on integrating your C version then. I think we should merge it in the current state to a separate branch; are you able to update the PR to use a different branch (or is this something I need to do?)
Cool! I can definitely do it if you tell me what branch you'd like me to retarget to. I think you might also be able to do it with the edit button near the very top of the PR page? Let me know whenever you're ready!