Add diagnostic channels
Reimplementation of what @AdriVanHoudt made on the previous PR #4278
What changed since ?
- Node 12 is no longer supported, there is no longer the issue of requiring a polyfill.
- The API
diagnostics_channelis stable since Node 16, The only part of the API we use, is identical in Node 14 but marked as experimental. Node 14 being EOL in less than 2 months, I don't really see a big issue there.
Following what Fastify does on their side https://github.com/fastify/fastify-diagnostics-channel, I added 3 more channels for the route, request and response.
I also added documentation of the exposed channels.
Questions :
- Should we add more channels (
onTimeout,onError) ? - Naming convention of the channel ?
I copied Fastify naming scheme, but I don't really have any strong opinions regarding the naming scheme we pick :
hapi:server,hapi.onServerorhapi.server
Cc'ing @Qard and @vdeturckheim as they are probably interested in the outcome of this PR.
It's been awhile since I've used Hapi so I'll have to take another look at it when I have time to know for sure if this satisfies what APMs need. One thing that does seem to be missing is we'd like an event to track errors.
@rochdev Any other suggestions for events that could help us?
@Qard As you mentioned, definitely an event for errors. I would say also an event for when the route is entered as opposed to just created, unless the route is already available on the request object by the time it's emitted. Events for when plugins/extensions are entered/exited would also be useful, although this could be out of scope of this PR.
Route is available on the request object (see https://github.com/hapijs/hapi/blob/master/API.md#request.route) so I believe the hook on route entered is not necessary.
I should also say, channel naming doesn't really matter as long as the chosen name is documented, and it includes the module name somehow to avoid possible collisions with other modules as diagnostics_channel is a global namespace.
It's also worth mentioning that I have an in-progress API to make the tracing use case easier with diagnostics_channel via https://github.com/nodejs/node/pull/44943. While the API itself will not be available in all the supported versions of hapi for awhile yet, it does document the behaviour as a pattern which can be implemented manually with channels now, so long as you follow the guidelines of how it should behave. It could be worth a look to understand what things we capture and how that could be translated to channels introduced for hapi.
Route is available on the request object (see https://github.com/hapijs/hapi/blob/master/API.md#request.route) so I believe the hook on route entered is not necessary.
request.route won't be available yet when onRequest is triggered. The earliest it can be determined using public API, is with a onPreAuth hook, and it would have to contend with other hooks and not trigger when cookie processing fails.
Regarding channels, I would advocate on limiting it until there is a concrete need. Eg. with just the onServer channel, APM's should be able to hook something that matches hapi.onRoute, hapi.onRequest, and hapi.onResponse.
Exposing just a hapi.onServer channel would also mean that there is zero runtime overhead when channels are not used.
Hapi already contains most of the hooks a basic APM need, provided they get the Server object before the user can call methods on it. I don't see why we need to make it any easier for them.
My main concern is that a
throwin a registered channel handler can propagate to hapi. It seems that the current version guards against this, but a test to verify would be nice.
The diagnostics_channel API has guarded against this since the beginning. It's not possible for a throw in a subscriber to ever impact the publisher.
Exposing just a
hapi.onServerchannel would also mean that there is zero runtime overhead when channels are not used.
It's specifically designed to have zero overhead when a channel is not used. The publish function is a no-op until the channel has subscribers, and if it ever returns to zero subscribers it becomes a no-op again.
Added both hapi.onRequestLifecycle and hapi.onError channels.
I'm not really sure about the name hapi.onRequestLifecycle, if you have some ideas for another name, I'm listening.
I would much prefer that this PR does not add any more channel hooks. Ideally each hook gets a PR to allow simpler reviews. Eg. i have comments regarding hapi.onError, but I don't want to complicate the review further.
Personally, I would just include the bare minimum (ie. hapi.onServer) for an initial PR. As stated in https://github.com/hapijs/hapi/pull/4429#issuecomment-1463799340, it should be enough to get started. For instance, looking at the elastic-apm-node APM it is the only hook it will need. The remaining hooks are handled using the public API.
There's typically greater overhead to needing to hook into some public API like that though, and a greater risk of it changing between versions. APMs would generally prefer that data be published directly. While having only the hapi.onServer event is better than nothing, it's not actually what we want and we wouldn't usually even need the hapi.onServer event if we have the other events.