Client Connection Loss to Agent Exception
Specification
When the PK CLI or GUI loses connection to the agent, there should be a domain level exception in src/client/errors.ts and also a good human readable message.
2 existing exceptions are at play here:
class ErrorGRPCClientTimeout extends ErrorGRPC {}
class ErrorGRPCConnection extends ErrorGRPC {}
These should be caught in the src/client somewhere, and then handled by the src/bin commands.
class ErrorClientClientConnection extends ErrorClient {}
That way we can always know that it's a connection error.
Existing bin scripts already catch the above ErrorGRPCClient... but they should be catching ErrorClientClientConnection. I think both grpc timeouts and grpc connection errors lead to the same result from the client domain perspective.
Additional context
- https://github.com/matrixai/js-polykey/issues/224#issuecomment-952554160 - first discussed here
- https://gitlab.com/MatrixAI/Engineering/Polykey/js-polykey/-/merge_requests/198#note_644652543 - older comment that the above issue comment refers to
- Note that this does not involve the
networkdomain at all, this is directly fromsrc/clientandsrc/grpc. - #333 - Need to deal with GRPCClient and derived classes becoming
StartStoppattern
Tasks
- ...
- ...
- ...
@joshuakarp should this part of our feature freeze?
Originally I was seeing this as quite low priority, and so not necessary for the feature freeze. But this might actually be quite useful for our own usage of Polykey when deployed on AWS. Thoughts?
A quick way to solve this is to move all of our comment descriptions of the exceptions into the actual description property and also use the appropriate exit code from sysexits. We can bring in the sysexits package from npm to do this in a more human readable way.
We just need to add a test into our bin code to test this behaviour, the command it should centralise on is agent status.
a shutdownCallback was added to GRPCClient that will be called when a connection failure is detected. otherwise an ErrorGRPCClientTimeout error will be emitted if there is a failure to connect or the connection drops and a call is attempted.
https://github.com/MatrixAI/js-polykey/pull/310#issuecomment-1016118180
Currently NodeConnectionManager wraps the NodeConnection<GRPCClientAgent> and provides withConnection for making calls the the connection/client. if a connection error is emitted by the client then withConnection can re-throw it as a ErrorClientClientConnection error. we may need to provide a similar with pattern for the GRPCClientClient. we should also have the GRPCClientClient make use of the new shutdownCallback to kill itself if the connection fails.
This may look like a with method in GRPCClientClient that exposes all of the GRPC methods in an interface;
@ready(new ErrorClientClientConnection)
public async with<T>(f:(clien: ClientInterface) => Promise<T>) {
try {
return f(this as ClientInterface);
} catch (err) {
if(err typeof someError) {
// Was a connection failure
await this.destroy()
throw new ErrorClientClientConnection;
}
// otherwise
throw err
}
}
I suggest a single GRPC connection error exception that is out into grpc/errors.ts. This is better than creating separate exceptions in client and agent domains. It can be rethrown as NodeConnectionError to be more specific later if need be.
Oh wait I re-read the issue and therefore to might make sense to catch and specialise it to client domain error or agent domain error. Because certain ops will do a client call that does a agent call. So being able to specialise ErrorGRPCConnection to ErrorClientConnection and ErrorAgentConnectiom can be useful. This is just pseudo names btw.
Can you flesh out the task list in this issue based on what we want to end up doing.
#310 enables the ability to resolve this issue, but this would have to be addressed in a later PR following https://github.com/MatrixAI/js-polykey/issues/276#issuecomment-1017027663
The GRPCClient is now exposing the destroyCallback which is being used by NodeConnection. Similarly the GRPCClientClient will be destroyed if the underlying GRPC channel is shutdown.
This means any usage of GRPCClientClient should hit the destruction exception ErrorClientClientDestroyed if it tries to do another call on it.
However we may want to propagate it up all the way to the PolykeyClient class itself, and perform a destruction of that instance too. Which should then propagate to the CLI when it tries to perform a method call on the pkClient.
The original spec relies alot on the way GRPC worked. GRPC isn't relevant anymore as we changed to using js-rpc on https://github.com/MatrixAI/Polykey/tree/4ddd9dd726995e4431a09b26b764f5d53fd5e4ac (October the 6th 2023).
However having good errors is still a good idea, so we should verify this behaviour later. @tegefaulkes
4ddd9dd726995e4431a09b26b764f5d53fd5e4ac