Coroutines and resuming function execution.
There are a lot of things that can be programmed very nicely using coroutines (which for this purpose are basically the same thing as fibers). In particular they would be a nice way to write some high level meataxe functionality and very possible matrix group recognition stuff. A coroutine can start a big matrix computation running using a bunch of threads, and then yield control until it is done. Since all the real work is being done in the underlying multi-threaded matrix library it's no problem for the higher level stuff to be limited to a single thread (meaning it doesn't need to think about locking and so on) but awkward to structure it as a single programme.
One could do this using threads in HPC-GAP and a big lock to ensure only one of them ran at once, but that is rather heavyweight.
I've always assumed that adding coroutines to GAP would be very hard to do because quite a bit of the GAP execution context is stored on the C stack, but thinking more closely, I wonder how much of an obstacle that really is.
Consider a GAP function in mid-execution and suppose that you want to suspend it while you run another function, and eventually resume executing it at the point where you left off. Suppose you know the address within the function body bag of the statement after the yield, and you preserve the local variables bag. How much more do you need? The C stack holds information about the loops, conditionals, etc. which are currently executing, but I think that can all be reconstructed by analyzing the body bag. This would be a bit slow as things stand, but adding a fairly small amount of adidtional information (parent pointers for statements, more or less) would make that efficient.
If you want to do stackfull coroutines you would also need to do a similar reconstruction for calling functions up to a point outside all coroutines, which might be a bit harder because the function calls could be buried in expressions, not just statements, but stackless coroutines should be possible.
Firstly, am I right? have I missed a huge complication? Secondly, would this be useful?
One big problem would be when a GAP function calls into a C function which calls back into a Gap function, and that. C function has interesting local state. A simple example is sort, which is written in C but takes a Gap comparator.
One option would be to take the Python model - - use (part or all of) HPC GAP, but only allow one thread to be executing at a time, and only switch at explicit yield points. This shouldn't require any lock functionality for correctness. We could even still use gasman, once it was taught to scan all the stacks.
There also exist some coroutine C libraries I've used, which would achieve the same basic effect.
I actually thought quite a bit about this some time ago, and I think it is rather tricky to be done (at least without "cheating", i.e., (ab)using threads as Chris describes). Because the statement to be executed is not quite enough, because of loops: for a statement nested inside of one or multiple layers of "compound" statements (loops, ifs), you also need to keep track of those; and e.g. the loop state is not preserved within the lvars.
That said, it is not impossible to do this; I've done something like this before in ScummVM, using a variation of Simon Tatham's Coroutines in C, beefed up with some C++ tricks to make it easier to use. That would still require changing virtually all functions in the executor to receive a new argument to carry the "execution state" -- but that's something we'll want to do eventually anyway (well, at least I have been planning on this, to get more state out of the global GAPState). It's doable, but not exactly trivial or "minor" work.
A completely different way to resolve this is to switch GAP from a syntax-tree based interpreter to a bytecode interpreter. That would be desirable for other reasons, too (e.g. it would probably speed up GAP code quite a bit, and simplify implementing various other interesting syntax extensions). But it's also a bigger project. I am hoping that one of us will perhaps one day find a student qualified to tackle that... But I won't hold my breath on that :-).
@ChrisJefferson I was thinking of what seem to be called "stackless" coroutines, where yielding is limited to the "top-level" coroutine, and can't be done in subroutines. That is limiting, of course, but does make things a lot easier.
@fingolfin you're right about loops. I had thought you weren't, since you have both the lvars and the location of the yield point, but in a for loop like
for i in l do
where l is not a range literal, then you need the "hidden" loop counter, since l may change, or contain repeats. :-(