HTTP request hangs with return_route header
In testing the issuance flow (AIP v1) where ACAPy is the issuer and the IBM agent is the holder, the HTTP request which the holder makes to ACAPy in order to deliver the final ACK hangs. In other words, when the IBM agent sends the HTTP reque3st containing the ACK to complete the issuance flow, the ACAPy agent never sends an HTTP response. However, when I change my IBM agent to NOT send the return_route header in the Aries ACK message, then ACAPy immediately sends the HTTP response and all is good.
@ianco can you look at this one? Or at least given an assessment of how quickly a fix is needed?
Thanks!
Wouldn't this be somehow expected?
From the return route RFC:
For HTTP transports, the presence of this message decorator indicates that the receiving agent MAY hold onto the connection and use it to return messages as designated.
Because there is no message to be returned, the connection is kept alive (which is a desirable feature when you want to do e.g. long polling). Another way to solve would be to close the HTTP connection on the IBM holder side, or by not including the return routing when no response is expected.
This will have impact on how mobile agents interact with ACA-Py
I think only messagepickup/1.0/noop should be long polled.
The reason I say this is because otherwise it puts more of a burden on the sender to know when a return_route should be added or not. In some cases, the sender may want to send a message which may or may not terminate the thread. Only the receiver knows for certain if the thread is complete or if/when it will send another message on the thread. Also, I do not want to async send messages because if the message fails to be delivered via HTTP, I want in some cases to return that to the admin controller.

Not sure how legible this is, but it's the basic aca-py flow for receiving and processing a message
To close an inbound return route I suggest one of 2 options:
- add a method to the Responder class to close a return route (i.e. return 200 status) (this would have to be called from the message-specific handler)
- add something in the Conductor's
dispatch_completemethod to close the return route if necessary (this could be called automatically)
I'm not sure of the implications on the Mediator, looking for some input from @TimoGlastra
For fun I made a plantuml version of the drawing above — @ianco please review:

I’m thinking this should go in the ACA-Py docs?
@startuml
participant "Inbound\nMessage\nHandler" as oag
participant "http\nTransport" as ht
participant "Internal\nTransport\nManager" as itm
participant "Inbound\nSession" as is
participant "Message\nHandler" as mh
participant "Conductor" as con
participant "Dispatcher" as disp
participant "Responder" as resp
oag -> ht: "inbound_message_handler()"
ht->itm: "create_session()"
itm -> is: "create"
is --> itm
itm --> ht
ht --> is: "receive()"
is --> is: "parse_inbound()"
is --> is: "receive_inbound()"
is --> is: "process_inbound()"
is --> is: "inbound_inbound()"
is --> con: "inbound_msg_router()"
con --> disp: "queue_msg()"
disp --> disp: "handle_msg()"
disp --> disp: "make_msg()"
disp --> resp: "create()"
disp --> mh: "handle"
mh-->resp: "send_reply()"
mh --> disp: ""
disp --> con: ""
con --> con: "dispatch_complete()"
con --> is
is --> ht
@enduml
A few minor tweaks. Yes I think this should go in aca-py docs.
@startuml
participant "Inbound\nMessage\nHandler" as oag
participant "http\nTransport" as ht
participant "Internal\nTransport\nManager" as itm
participant "Inbound\nSession" as is
participant "Conductor" as con
participant "Dispatcher" as disp
participant "Responder" as resp
participant "Message\nProtocol\nHandler" as mh
oag -> ht: "inbound_message_handler()"
ht->itm: "create_session()"
itm -> is: "create"
is --> itm
itm --> ht
ht --> is: "receive()"
is --> is: "parse_inbound()"
is --> is: "receive_inbound()"
is --> is: "process_inbound()"
is --> is: "inbound_handler()"
is --> con: "inbound_message_router()"
con --> disp: "queue_message()"
disp --> disp: "handle_message()"
disp --> disp: "make_message()"
disp --> resp: "create()"
disp --> mh: "handle()"
mh-->resp: "send_reply()"
mh --> disp: ""
disp --> con: ""
con --> con: "dispatch_complete()"
con --> is
is --> ht
@enduml

OK from a more detailed review of the code, it looks like aca-py should already send an HTTP 200 response even if there is no explicit response message.
This is the method that handles an HTTP request from another agent inbound_message_handler():
https://github.com/hyperledger/aries-cloudagent-python/blob/main/aries_cloudagent/transport/inbound/http.py#L71
You can see that there is logic to return a response message if one is available, however the "fall-through" logic should always return an HTTP 200 response regardless:
return web.Response(status=200)
... unless there is some other error that is crashing aca-py and preventing the response.
@smithbk can you send the aca-py logs for this scenario? --log-level debug
Just ran into this issue myself, so adding some info.
@ianco it will never reach
return web.Response(status=200)
because a few lines above it waits for a message:
https://github.com/hyperledger/aries-cloudagent-python/blob/31659902475de16118d8322f3cbc9c74f0b85ef5/aries_cloudagent/transport/inbound/http.py#L99
This could be useful for long polling, but it also means the http request won't finish until a message should be delivered to the other agent. Currently in AFJ we have a timeout that waits x seconds before aborting the request, which leaves control to the client agent instead of the server agent.
But if we want to change this behaviour on the ACA-Py side we should change the behaviour of that wait line (maybe adding a timeout before it returns, or directly returning and allowing web socket connections to keep the socket open)
Just ran into this issue again and after giving it some more thought, I now 100% agree with @smithbk. ACA-Py shouldn't keep the connection alive as this makes it impossible to know for the mobile client whether the request was successful.
@ianco -- back to you, since you looked at this long ago... :-)
@ianco -- Can you please look at this one again? I'm wondering how this would manifest in mobile scenarios. Could this cause delays in under some conditions -- e.g. intermittent delays, such as we are seeing at some times?
Made a PR to fix this. @ianco could you take a look at the PR? Mabye @smithbk you could verify whether this PR fixes your issue?