
Remote calls

Open cardillan opened this issue 11 months ago • 8 comments

Remote calls and remote variables

The bleeding-edge version of Mindustry Logic now allows reading and writing variables in other processors by name, including @counter. This opens up many new ways for processors to interact; a straightforward one is calling functions remotely. A prototype will be implemented as soon as possible (i.e. in Mindcode 3.2).

Status

Grammar/AST changes

  • [x] remote keyword
  • [x] Remote functions and variables
  • [x] Remote arrays
  • [x] Fully qualified names of remote function output variables

Code generation

  • [x] Creating module dependency graph
  • [x] Recognizing remote functions when building the call graph
  • [x] Function prefixes derived directly from function names
  • [x] Remote functions
  • [x] Remote variables
  • [x] Synchronous remote call
  • [x] Asynchronous remote call
  • [x] Built-in functions: async, finished, await.
  • [x] read and write methods for arbitrary remote variables.
  • [x] Remote arrays (remote processor's arrays access from main processor)

Code generation: modules

  • [x] Initialization code, including remote variables
  • [x] Generating code for remote functions

Schemacode

  • [x] String value handling

Documentation

  • [x] Readme, syntax, tutorial
  • [ ] Create sample code

Architecture

A program will be run in a main processor. This processor will be able to remotely call functions stored in one or more other processors, probably linked ones (remote processors). Functions in each remote processor will be called by exactly one main processor.

A remote processor can act as a main processor to another remote processor. Processors can therefore be arranged into an oriented graph - a tree (in the sense of graph theory - the root is the main processor, which starts and controls the execution over the entire processor tree).

The function prefix of a remote function will be identical to the function name. A compiler option will be created to use prefixes derived from function names for all functions.

Remote processors code

Code belonging to a remote processor must be placed in a module (new module keyword will be created). Modules will be named, although at this moment there is no use for the name.

It will be possible to declare a remote variable or array in a module. Remote variables are implicitly volatile, and are available through remote access in code which imports the module. Remote variables/arrays need not be initialized. When some remote variables or array elements are unused (and the corresponding mlog variables would therefore not be created when the logic processor parses the code), the variables will be explicitly created using draw triangle instruction(s). These instructions are placed at the end of the generated code and are never executed.

A function to be called remotely must be declared using a new remote keyword (e.g. remote def foo(x, y, z)). Function parameters may be input or output (as usual). As the function cannot be declared inline, varargs aren't supported. A remote function cannot be overloaded (although it will be possible to overload a remote function with non-remote ones). Function parameters will be mapped to mlog variables :functionName:parameterName, and the return address/value will be :functionName*retaddr/:functionName*retval.
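
The naming scheme can be sketched in a few lines of Python. This is only an illustration of the variable names mentioned in this thread (parameters, plus the retval, finished and address variables); the helper names are made up for the example and are not part of Mindcode.

```python
def parameter_variable(function: str, parameter: str) -> str:
    """mlog variable holding one parameter of a remote function."""
    return f":{function}:{parameter}"


def internal_variable(function: str, kind: str) -> str:
    """mlog variable used for call bookkeeping, e.g. kind = "retval"."""
    return f":{function}*{kind}"


def remote_function_variables(function: str, parameters: list[str]) -> list[str]:
    """Variable names this thread associates with one remote function."""
    names = [parameter_variable(function, p) for p in parameters]
    names += [internal_variable(function, k) for k in ("retval", "finished", "address")]
    return names


# Names for: remote def foo(in a, out count)
foo_variables = remote_function_variables("foo", ["a", "count"])
print(foo_variables)
```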

All remote functions will be active entry points for the compiler and thus will be included in the compiled code.

The compiler will generate initialization code. This code will create variables holding the address of each remote function, named :functionName*address. Furthermore, a *mainProcessor variable will be created and initialized to the processor itself (@this). After executing the initialization, the code will enter an endless loop.
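
How the :functionName*address values come about can be modelled roughly as follows. The layout rule is an assumption inferred from the single compiled listing posted later in this thread (initialization code first, then each function body followed by wait and end); the real compiler may lay out code differently. The function name and its parameters are hypothetical.

```python
def remote_function_addresses(preamble_length, functions):
    """Return {function name: address} for the remote functions of a module.

    preamble_length: instructions emitted before the `set :f*address` lines
    functions: list of (name, body_length) in emission order; body_length
               excludes the trailing `wait` and `end` instructions
    """
    # Initialization code: the preamble, one `set :f*address` per function,
    # then `set *mainProcessor @this`, `wait` and `end`.
    address = preamble_length + len(functions) + 3
    addresses = {}
    for name, body_length in functions:
        addresses[name] = address
        address += body_length + 2   # the body, then `wait` and `end`
    return addresses


# The `test` module from this thread: 4 preamble instructions,
# foo has 6 body instructions, bar has 3.
addresses = remote_function_addresses(4, [("foo", 6), ("bar", 3)])
print(addresses)
```

With these inputs the sketch yields 9 for foo and 17 for bar, the same values the listing in this thread assigns to :foo*address and :bar*address.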

Remote functions must not be recursive, and they must not be called from the module which declares them. It would be possible to call a recursive function from within a remote function, though.

Note: there's no specific designation of remote modules. Any module may contain remote functions/variables. When such a module is imported locally through plain require, the remote function cannot be called.

Main processor code

The require statement will be enhanced with the remote keyword, which specifies which remote processor contains remote functions:

require "library" remote processor1;

When a module is requested for remote access from two different processors, or is requested both locally and remotely, a compilation error occurs.

Modules imported remotely won't be compiled into the code, only remote call implementations will be generated. The processor can be specified using a linked block name, a parameter, or a variable. The variable needs to be initialized, e.g. by a function.

If at least one remote function is called from a processor, guard code for the processor will be generated. The guard code will wait for the processor to be created and its *mainProcessor variable initialized. When this state is reached, the following happens:

  • The *mainProcessor variable in the remote processor is set to @this
  • :functionName*address is copied from the remote processor to the local processor for each function called remotely.

Remote call mechanism

Local side - synchronous call

  • The function parameters are set up in the remote processor.
  • :functionName*finished is set to false in the main processor.
  • @counter is set to :functionName*address in the remote processor.
  • Waiting for call completion: loop while :functionName*finished equals false
  • The output values (:functionName*retval, plus all output parameters) are not copied from the remote processor to the variables in the main processor: the remote processor writes these values to the main processor.

Local side - asynchronous call

  • The function parameters are set up in the remote processor.
  • :functionName*finished is set to false in the main processor.
  • @counter is set to :functionName*address in the remote processor.
  • The code continues in the main processor
  • The main processor can test the call completion using :functionName*finished
  • A special syntax will be provided to access function output values, decoupled from the function call.

Remote side

The call mechanism on the remote side is the same for both synchronous and asynchronous calls.

  • Remote call leads to code which immediately starts executing the function, as all parameters have been set by the caller.
  • Function return
    • All output values (:functionName*retval, plus all output parameters) are set up in the main processor.
    • :functionName*finished in the main processor is set to true
    • Endless loop is entered

If the same or a different remote function gets called while a previous call is active, the previous call is terminated. The corresponding finished flag won't be set. There's no way to define a cleanup routine in case a remote call is terminated, and Mindcode doesn't guard against possible corruption of the remote processor state.
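
The protocol above can be played through in a toy Python model. Everything here is a simplification for illustration: the Remote and Main classes are hypothetical, processors are plain dictionaries of variables, writing @counter is modelled by setting a counter attribute, and the main side drives the remote processor explicitly, whereas in the game both run on their own. The bar function is a stand-in that just doubles its argument.

```python
class Remote:
    """Toy remote processor: named variables plus remote functions."""

    def __init__(self, functions):
        self.functions = functions      # name -> callable(vars) -> output values
        self.vars = {"*mainProcessor": None}
        self.counter = None             # active function; None models the endless loop

    def initialize(self):
        self.vars["*mainProcessor"] = self   # models: set *mainProcessor @this

    def run(self):
        """Finish the active call and report its results to the main processor."""
        if self.counter is None:
            return
        name = self.counter
        outputs = self.functions[name](self.vars)
        main = self.vars["*mainProcessor"]
        for variable, value in outputs.items():
            main.vars[variable] = value          # remote side writes the outputs
        main.vars[f":{name}*finished"] = True
        self.counter = None                      # back to the endless loop


class Main:
    """Toy main processor."""

    def __init__(self):
        self.vars = {}

    def guard(self, remote):
        # The real guard code loops until the remote is initialized.
        if remote.vars["*mainProcessor"] is None:
            raise RuntimeError("remote processor not initialized yet")
        remote.vars["*mainProcessor"] = self

    def start(self, remote, name, **arguments):
        for parameter, value in arguments.items():
            remote.vars[f":{name}:{parameter}"] = value
        self.vars[f":{name}*finished"] = False
        remote.counter = name            # models: write :name*address to @counter

    def call(self, remote, name, **arguments):
        self.start(remote, name, **arguments)
        while not self.vars[f":{name}*finished"]:
            remote.run()
        return self.vars[f":{name}*retval"]


def foo(v):
    v[".cnt"] += 1
    return {":foo*retval": v[":foo:a"] + v[".x"], ":foo:count": v[".cnt"]}


def bar(v):
    return {":bar:b": v[":bar:a"] * 2}


remote = Remote({"foo": foo, "bar": bar})
remote.vars.update({".cnt": 0, ".x": 5})
remote.initialize()
main = Main()
main.guard(remote)

result = main.call(remote, "foo", a=10)          # synchronous call
count = main.vars[":foo:count"]

main.start(remote, "foo", a=1)                   # call started ...
main.start(remote, "bar", a=2)                   # ... and terminated by a new call
remote.run()
foo_finished = main.vars[":foo*finished"]
bar_finished = main.vars[":bar*finished"]
```

In the second half of the usage, foo never reports completion because bar replaced it before it ran, which mirrors the termination behaviour described above.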

Remote variables

All variables declared remote in remotely imported modules are accessible in the main processor. The mechanism to access them uses a complex ValueStore implementation, almost identical to that of external variables. It's not required to declare the variable in the main processor again.

To access arbitrary remote processor variables using the read and write instructions, read(variable) and write(variable, value) methods will be created, to be called on processors. The goal is to keep the array access syntax ([]) strictly for numerical indexes.

Remote arrays

An array declared remote in remotely imported modules is accessible in the main processor. The array elements are stored in the remote processor. Access to individual elements happens through the read and write instructions, not through remote calls: appropriately modified jump tables are generated in the main processor code. Random access to remote array elements will be as fast as random access to local array elements. Direct access to remote array elements gets resolved to remote variable access, which is as fast as external variable access (except it supports non-numerical values).
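
The jump-table idea can be illustrated with a small Python model: one table entry per index, where each entry reads or writes exactly one fixed, named variable in the remote processor. The element naming scheme and all function names are hypothetical, since the actual mlog names of array elements are not given in this thread, and the remote processor is modelled as a plain dictionary.

```python
def element_name(array, index):
    # Hypothetical naming scheme, used only for this illustration.
    return f".{array}*{index}"


def build_read_table(array, size):
    """One entry per index; each entry reads a single fixed remote variable."""
    def entry(name):
        return lambda remote_vars: remote_vars[name]     # models a `read` instruction
    return [entry(element_name(array, i)) for i in range(size)]


def build_write_table(array, size):
    """One entry per index; each entry writes a single fixed remote variable."""
    def entry(name):
        def write(remote_vars, value):
            remote_vars[name] = value                    # models a `write` instruction
        return write
    return [entry(element_name(array, i)) for i in range(size)]


# Variables of the remote processor holding a 4-element array called data.
remote_vars = {element_name("data", i): 0 for i in range(4)}
read_table = build_read_table("data", 4)
write_table = build_write_table("data", 4)

index = 2                                  # index known only at run time
write_table[index](remote_vars, 42)        # dispatch through the table
value = read_table[index](remote_vars)
```

Random access costs one table dispatch plus one read or write, which is why the text above expects it to be as fast as random access to a local array.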

Asynchronous remote call syntax

Asynchronous calls are started using the built-in async function (async(foo(x, y, z));). The out modifiers are disallowed here: output-only arguments need to be completely omitted.

Waiting for finish: finished(foo) returns true if the call to foo is finished.

Obtaining the function return value: result = await(foo);. The await function waits for foo to complete, and then returns its return value. (Note: not just wait, as there's currently an mlog wait function.)

Other output parameters are available under their fully qualified names: foo.x, foo.y, foo.z.

Multiple asynchronous remote calls may be active at the same time, provided each of them is executed on a different remote processor.
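
A tick-based Python model can show how async, finished and await interact when two calls run on two different remote processors. This is a sketch under loud assumptions: the classes are hypothetical, each remote call takes a fixed number of ticks, the main program advances all remote processors explicitly, and the methods are named async_ and await_ only because async and await are reserved words in Python.

```python
class RemoteProcessor:
    """Toy remote processor that needs a fixed number of ticks per call."""

    def __init__(self, ticks_per_call):
        self.ticks_per_call = ticks_per_call
        self.remaining = 0
        self.job = None                  # (function name, computation) or None

    def tick(self, main_vars):
        if self.job is None:
            return
        self.remaining -= 1
        if self.remaining == 0:
            name, computation = self.job
            main_vars[f":{name}*retval"] = computation()
            main_vars[f":{name}*finished"] = True
            self.job = None


class MainProgram:
    def __init__(self, remotes):
        self.remotes = remotes           # function name -> RemoteProcessor
        self.vars = {}

    def async_(self, name, computation):
        remote = self.remotes[name]
        self.vars[f":{name}*finished"] = False
        remote.job = (name, computation)     # a previous job on this processor is lost
        remote.remaining = remote.ticks_per_call

    def finished(self, name):
        return self.vars[f":{name}*finished"]

    def tick(self):
        for remote in set(self.remotes.values()):
            remote.tick(self.vars)

    def await_(self, name):
        while not self.finished(name):
            self.tick()
        return self.vars[f":{name}*retval"]


p1 = RemoteProcessor(ticks_per_call=3)
p2 = RemoteProcessor(ticks_per_call=5)
main = MainProgram({"foo": p1, "bar": p2})

main.async_("foo", lambda: 10 + 5)       # both calls are active at the same time
main.async_("bar", lambda: 2 * 21)

counter = 0
while not main.finished("foo"):          # main program keeps working meanwhile
    main.tick()
    counter += 1

foo_result = main.await_("foo")          # already finished, returns immediately
bar_result = main.await_("bar")          # waits for the remaining ticks
```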

Schemacode

All string values defined in a Schemacode source file will be registered as system libraries (except those whose names would clash with existing system libraries). This will make it possible to require source code stored in a string value as a remote module.

Additional notes

  • It is expected that Schemacode will be used to create schematics containing the main processor and all remote processors.
  • Theoretically Mindcode could compile code for all processors in one go, but we're missing a mechanism to place the generated mlog into individual processors without further instructions from the user.
    • In the future Mindcode might split source code among several processors automatically, assuming the previous point is solved in some way.
  • An additional processor might be used to store stack for recursive function calls, including non-numeric values.
  • External events/interrupts are possible: an external processor might read/store @counter, and set @counter to an address of the handler routine. The handler routine would then return to the original counter. All needs to be done in one tick.
  • Ability to read @counter allows creating profilers.

cardillan, Feb 09 '25 21:02

Rules for resolving remote modules:

  1. A remote module may not be instantiated multiple times. Example: module A requires module R remotely, module B requires module R remotely, main program requires modules A and B --> error, R is being instantiated twice.
  2. Cycles in remote modules won't be detected. Example: module A requires module B remotely, module B requires module A remotely, main program requires module A remotely. A is required twice in different contexts, but it won't be detected or reported (Mindcode simply won't look at the structure of dependencies of remote modules).

The first rule might theoretically be loosened:

  • If A and B placed R on the same processor, they might both call its functions, but they would have to make sure not to perform these calls in parallel.
  • If A and B placed R on different processors, they might make calls in parallel, but Mindcode would have to track whether the call originates from A or B and route the call to the corresponding processor. We currently don't have that information, and it might not be completely straightforward anyway. (I envision creating a parallel execution framework which would handle parallel calls to the same function among different processors in a better way.)

The second rule might be improved to actually detect cycles in the dependency graph and report them as errors. Cycles cannot be resolved under the current execution model, where every remote processor may only be called from one main processor: at least one module in the cycle would have to be called by two different processors.
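
Both rules, and the proposed cycle detection, can be sketched in Python over a simple dependency map. The data layout and function names are assumptions made for this illustration and say nothing about how Mindcode represents modules internally. As a simplification, the cycle search also reports cycles consisting only of local requires.

```python
from collections import Counter


def check_remote_requires(requires, main):
    """Apply rule 1 while, like rule 2, not looking inside remote modules.

    requires: module name -> list of (required module, is_remote)
    """
    local, remote, stack = set(), Counter(), [main]
    while stack:
        module = stack.pop()
        if module in local:
            continue
        local.add(module)
        for target, is_remote in requires.get(module, []):
            if is_remote:
                remote[target] += 1      # dependencies of target are not traversed
            else:
                stack.append(target)
    errors = []
    for module, count in sorted(remote.items()):
        if count > 1:
            errors.append(f"{module}: instantiated {count} times")
        if module in local:
            errors.append(f"{module}: required both locally and remotely")
    return errors


def find_cycle(requires, main):
    """Possible improvement: depth-first search over all requires, local and remote."""
    state, path = {}, []

    def visit(module):
        state[module] = "active"
        path.append(module)
        for target, _ in requires.get(module, []):
            if state.get(target) == "active":
                return path[path.index(target):] + [target]
            if target not in state:
                cycle = visit(target)
                if cycle:
                    return cycle
        path.pop()
        state[module] = "done"
        return None

    return visit(main)


# Example for rule 1: A and B both require R remotely.
rule1 = {"main": [("A", False), ("B", False)], "A": [("R", True)], "B": [("R", True)]}
# Example for rule 2: A and B require each other remotely.
rule2 = {"main": [("A", True)], "A": [("B", True)], "B": [("A", True)]}

rule1_errors = check_remote_requires(rule1, "main")
rule2_errors = check_remote_requires(rule2, "main")   # cycle goes unnoticed
rule2_cycle = find_cycle(rule2, "main")
```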

cardillan, Feb 26 '25 08:02

Rules for resolving function calls:

  1. A function is either remote (declared with the remote modifier), or local.
  2. A module is either local (the main module, and any module locally required from another local module), or remote (a module required remotely) - see above.
  3. Local function declared in any local module may be called from any local module.
  4. Remote function declared in a local module may not be called (the function is recognized and attempting to call it results in compile error).
  5. Remote function declared in a remote module may be called remotely from any local module.

Since the dependencies of remote modules aren't traversed (see above), functions (either local or remote) declared by modules required by a remote module aren't known to the compiler.

Consequences:

  • Module A requires module B remotely, and module B requires module C. No function in C (either local or remote) will be available to A.
  • Module A is compiled as a remote module and requires module B. Module B contains remote functions. These functions won't be compiled at all, since they aren't accessible to modules using module A remotely.
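
The rules and the first consequence can be modelled with a short Python function that lists what the main module may call. The dictionary layout, the function name and the module contents are hypothetical; overloading is ignored, and a local function of a remotely required module is treated as invisible, which the rules above imply rather than state.

```python
def visible_functions(modules, main):
    """Return {function name: how it can be used from local modules}.

    modules: name -> {"functions": {function name: is_remote},
                      "requires": [(module name, is_remote)]}
    """
    result, seen, stack = {}, set(), [main]
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        module = modules[name]
        # Rules 3 and 4: functions declared in a local module.
        for function, is_remote in module["functions"].items():
            result[function] = "compile error" if is_remote else "local call"
        for target, target_is_remote in module["requires"]:
            if target_is_remote:
                # Rule 5: only remote functions of the remote module are visible;
                # the modules it requires are not traversed.
                for function, is_remote in modules[target]["functions"].items():
                    if is_remote:
                        result[function] = "remote call"
            else:
                stack.append(target)
    return result


# Module A requires B remotely, B requires C locally.
modules = {
    "A": {"functions": {"run": False}, "requires": [("B", True)]},
    "B": {"functions": {"foo": True, "helper": False}, "requires": [("C", False)]},
    "C": {"functions": {"baz": True, "util": False}, "requires": []},
}

visible = visible_functions(modules, "A")
print(visible)
```

Nothing declared in C shows up in the result, matching the first consequence listed above.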

The expected architecture is this: modules will typically not contain remote functions. When creating a multiprocessor application, code for remote processors will be defined by dedicated modules which will provide a facade for remote calls to the actual logic imported from other modules.

Rules for visibility of remote variables are the same as functions. This is a bit unpleasant: a remote variable declared in a module required by a remote module can't be accessed remotely. While functions can be easily delegated, variables can't. I'll try to target all this when implementing modules and namespaces properly.

cardillan, Feb 27 '25 07:02

Additional consideration: optimizations must not remove remote functions and variable initialization code.

cardillan, Feb 27 '25 21:02

Compilation of remote modules is complete. Example:

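module test;

var cnt = 0;            // local to the remote processor
remote var x = 0;       // accessible from the main processor

// Returns a + x; the number of calls made so far is passed back in count.
remote def foo(in a, out count)
    count = ++cnt;
    return a + x;
end;

// Void remote function: the result is passed back in the output parameter b.
remote void bar(in a, out b)
    b = sin(a);
end;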

Resulting code:

set .cnt 0
set .x 0
packcolor 0 :foo:a :foo*retval :foo*finished :bar:a
packcolor 0 :bar*retval :bar*finished null null
set :foo*address 9
set :bar*address 17
set *mainProcessor @this
wait 1000000000000
end
op add .cnt .cnt 1
set :foo:count .cnt
op add :foo*retval :foo:a .x
write :foo*retval *mainProcessor ":foo*retval"
write :foo:count *mainProcessor ":foo:count"
write true *mainProcessor ":foo*finished"
wait 1000000000000
end
op sin :bar:b :bar:a 0
write :bar:b *mainProcessor ":bar:b"
write true *mainProcessor ":bar*finished"
wait 1000000000000

cardillan, Feb 28 '25 12:02

Synchronous call:

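#set target = 8;

// Import the module remotely: its functions run in the linked processor1.
require "tmp.mnd" remote processor1;

var b;
var x = foo(10, out b);     // synchronous call: waits until foo finishes

println(x);
println(b);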

Resulting code:

read *tmp0 processor1 "*mainProcessor"
jump 0 equal *tmp0 null
write @this processor1 "*mainProcessor"
read :foo*address processor1 ":foo*address"
packcolor 0 :foo*finished :foo*retval :foo:count null
write 10 processor1 ":foo:a"
set :foo*finished false
write :foo*address processor1 "@counter"
jump 8 equal :foo*finished null
print :foo*retval
print "\n{0}\n"
format :foo:count
printflush message1

Note: target 8 will be required for the compilation of remote calls.

cardillan, Mar 01 '25 22:03

The proposed syntax for accessing output variables of an asynchronously called function is function.parameter:

result = await(foo);       // waits for completion, gets the function return value
x = foo.x;                 // gets the value of output parameter x

foo.x is already valid syntax (used in block.enabled = 10, for example). Therefore, when a remote function foo is used by a program, a special compound global variable foo will be created. The variable will represent a structure. Implementing this won't be difficult in current Mindcode, and it will serve as a building block for adding types and records/structures to Mindcode later on.

Since the foo variable is global, it can be shadowed by local variables, and it will collide with global variables. Furthermore, it can be accessed any time, even when the function was not called or hasn't finished yet. This is unfortunate, and when records/structures are implemented, it will be replaced by a better mechanism:

  1. Remote functions won't be able to use output parameters at all. To return more values at once, all return values will have to be packed into a structure/record.
  2. The await function will return the single output value of the function, potentially a record:
    1. The output values obtained via await can be passed around to other functions or assigned to variables/arrays easily.
    2. Results of an asynchronous call can only be obtained via await; not being able to access them in ways disconnected from the await function makes the call mechanism more secure.
    3. The results will be stored in a variable created and named by the user, avoiding pollution of the global namespace with the names of the remote functions and making the connection between the function results and the resulting variable explicit and more understandable.

cardillan, Mar 02 '25 09:03

Example of asynchronous call:

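#set target = 8;

require "tmp.mnd" remote processor1;

async(foo(10));             // starts foo without waiting; the out argument is omitted

var counter = 0;

// Keep working in the main processor until foo reports completion.
while !finished(foo) do
    counter++;
end;

x = await(foo);             // foo has already finished here; gets the return value
println(counter);
println(x);
println(foo.count);         // output parameter, accessed by its fully qualified name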

Results:

read *tmp0 processor1 "*mainProcessor"
jump 0 equal *tmp0 null
write @this processor1 "*mainProcessor"
read :foo*address processor1 ":foo*address"
packcolor 0 :foo*finished :foo*retval :foo:count null
write 10 processor1 ":foo:a"
set :foo*finished false
write :foo*address processor1 "@counter"
set .counter 0
jump 12 notEqual :foo*finished false
op add .counter .counter 1
jump 10 equal :foo*finished false
jump 12 equal :foo*finished false
print .counter
print "\n{0}\n{0}\n"
format :foo*retval
format :foo:count
printflush message1

The while !finished loop followed by await means await wouldn't have to wait - its jump is superfluous. Current optimization doesn't remove it, and I'm not sure I'll try to implement such optimization before a complete optimization engine rewrite.

cardillan, Mar 02 '25 21:03

Somehow I've completely forgotten that variables are not initialized on first access, but on code compilation. Therefore there's no need for explicitly initializing function arguments using packcolor, and I'll remove this initialization. I'll also update the first post to reflect this.

Initializing remote variables is still needed, on the other hand. Remote variables must exist even when not accessed by the module at all, because they might be accessed from the main processor. They also must not be removed by any optimizer, which probably still needs to be ensured in the code.

Lastly, I'll add support for remote arrays, as will be described in the first post.

cardillan, Mar 06 '25 15:03

Included in 3.2.0.

cardillan, Mar 16 '25 14:03