
HAL lacks filedescriptor-based eventing mechanism

Open ArcEye opened this issue 7 years ago • 5 comments

Issue by mhaberler Sun Jun 12 17:57:34 2016 Originally opened as https://github.com/machinekit/machinekit/issues/963


I'm posting an idea for review and for discussion of its utility: using file descriptors for notification within, and beyond, HAL

it's an idea I've been toying with for a while, and not necessarily a suggested feature to be added. I'm sounding out feedback here. A pilot implementation exists, see below.

how do I come to this:

  • with the external-thread-syncing work, we're experimenting with file descriptors as a notification mechanism, and with blocking thread functions, for now in the posix/rt-preempt context
  • HAL is all polling, cyclic execution, non-blocking, with change detection on shared variables and queues being the only eventing mechanism available (NML is too but then I do not care anymore)
  • in the case of HAL this was mandated by RTAI and in-kernel operations. In the case of NML this was a colossal design fault as it would not have been exposed to the HAL limitations.
  • interfacing to HAL therefore requires cyclic polling which has an inherent tradeoff between latency and overhead
  • few would disagree polling is a suboptimal vehicle for change detection
  • the fact that rt-preempt became the prime contender for RT kernels suggests we might consider taking rt-preempt for granted, and build/use features only available on stock (=non-Xenomai 2) kernels
  • any current event-based toolkit worth its powder - comms, UI, lots of others - uses file descriptor notification and watching as the core mechanism (based on poll(2), epoll(2) or select() for the oldtimers; using libevent, libev, Python asyncio or whatever - all based on fd's)
  • this includes in the vicinity: zeroMQ, gtk (any flavor), Qt (any flavor) and probably a dozen others I overlooked
  • the - rather banal - timing architecture (host-based timing and this is it) shows its limitations, and external syncing shows lots of promise.
  • I conjecture the same for other scenarios, like timing derived from an EtherCAT master as recently discussed.

A typical example is watching HAL pins or signals for changes, and act on any change.

Another example is ringbuffer not-full/not-empty conditions which is also a polling operation.

Haltalk does try to remove some of the pain doing this - for pins and signals, not yet for queues - but it is subject to the same limitation: short polling interval, high overhead; long polling interval, lame reaction. Then there is the scaling aspect - change detection is an O(n) complexity problem, so overhead grows linearly with the number of observed objects.
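To make the O(n) cost concrete, here is a minimal Python sketch of polling-based change detection. The pin names and the dict-based snapshot are made up for illustration and are not how haltalk stores its state; the point is only that every pass touches every watched object, whether or not anything changed.

```python
def scan_for_changes(current, last_seen):
    """One polling pass: compare every watched value with its last seen value."""
    changed = []
    for name, value in current.items():   # cost is O(n) per pass, even if nothing changed
        if last_seen.get(name) != value:
            changed.append(name)
            last_seen[name] = value
    return changed

# Hypothetical watched objects, standing in for HAL pins or signals.
pins = {"axis.0.pos": 0.0, "spindle.on": False, "estop": False}
last_seen = dict(pins)

pins["spindle.on"] = True
first_pass = scan_for_changes(pins, last_seen)    # finds the one change
second_pass = scan_for_changes(pins, last_seen)   # full scan again, finds nothing
```

How quickly the change is noticed depends entirely on how often such a pass runs, which is the latency/overhead tradeoff described above.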

There are two features which make such fd-based eventing possible at very low overhead, and even beyond the RT boundary:

  • the eventfd(2) mechanism which supports notification and semaphore-like operations, while being poll/epoll/select compatible (in fact the underlying vehicle supports any shape and form of file descriptor, it's just that eventfd's are very good at the job)
  • the ability to pass file descriptors between processes which do not share a common ancestor, via Unix Domain sockets (see http://www.normalesup.org/~george/comp/libancillary/ how this is done; the underlying features are in Linux next to forever but it is a rarely used vehicle, so not commonly known).

What we now can do is something like:

  • create an eventfd in HAL. (similar to fd = open("file") - just that it's fd = eventfd(0,flags) for starters).
  • store the fd in a pin so that thing has a name - a HAL_S32 does the job. We make it an OUT pin because changing the fd post-creation makes no sense; other HAL entities may refer to the fd by linking.
  • to wait for a notification, you do a poll/select/epoll/read on the eventfd.
  • to signal an event, you do a write(eventfd, value). This causes a waiter to wake up as the fd becomes readable.

This would give us notification within HAL. For instance, an EtherCAT master could write to an eventfd every cycle, and the HAL thread could do a blocking read, similar to our userland IRQ handler for Mesanet firmware. That would for instance solve the 'let's sync HAL to an EtherCAT master' scenario.

Stage 2 - eventing beyond HAL: we use the Unix Domain socket fd passing capability and do this:

  • rtapi_app becomes a server, listening on a Unix domain socket, waiting for requests for fd's (unfortunately zeroMQ sockets cannot handle this, but it is not much code anyway).
  • a client may connect to this socket, and request a fd to be transferred. It uses the HAL name - the pin name - for that purpose.
  • once transferred, the client process can wait for notifications, or signal events, just as if it were running in the HAL context.
  • or it could integrate this newly obtained fd into an event loop, waiting for several types of events in parallel.

The only restriction here is that eventfd's cannot be shared across hosts, but then zeroMQ - and other eventloop-based stacks - are perfectly capable of dealing with a local fd. So in reality not much of a restriction.

All this sounds more complicated to explain than the code which implements it.

Here is a working preliminary implementation: https://github.com/mhaberler/machinekit/commits/hal-fd-notifiy

it's best to peruse in order:

  • HAL usage: https://github.com/mhaberler/machinekit/blob/hal-fd-notifiy/src/machinetalk/pass-fd-example/example1.hal
  • using an eventfd from Python: https://github.com/mhaberler/machinekit/blob/hal-fd-notifiy/src/machinetalk/pass-fd-example/example.py
  • eventfd creation: https://github.com/mhaberler/machinekit/blob/hal-fd-notifiy/src/hal/i_components/eventfd.icomp
  • thread functs operating on eventfd's: https://github.com/mhaberler/machinekit/blob/hal-fd-notifiy/src/hal/i_components/eventfdops.icomp

Here's a diagram to depict what is going on in example1.hal:

[Diagram: eventfd-hal]

Happy to entertain questions!

status: I think basically reliable. A border case - the reaction to closed fd's - needs to be worked out (kernel code says it works, userland says otherwise ;)

ArcEye avatar Aug 03 '18 15:08 ArcEye

Comment by sirop Mon Jun 13 08:08:55 2016


"That would for instance solve the 'let's sync HAL to an EtherCAT master' scenario."

I do not quite see that syncing HAL to an EtherCAT master has ever been a problem, as the main synchronization that has to take place is the one between the master and the slaves, which you do by means of the EtherLab API.

The only requirement for the HAL thread involved is that this HAL thread is fast enough and that any processing taking place between receiving and sending out EtherCAT datagrams is short/fast enough - or in simple words - should end before the next EtherCAT cycle.


Comment by mhaberler Mon Jun 13 08:59:12 2016


EtherCAT syncing is just one of a wide range of applications of this scheme and would better be discussed in #687 as this issue is about use of fd's in HAL, not timing architecture

that said: AFAICT (and my conjecture is not refuted so far) the HAL thread, and the EtherCAT 'thread' are not synchronized

to see what that can do for noise and aliasing, see the images at https://github.com/machinekit/machinekit/issues/687#issue-88631802 - whether synchronization is achieved by a DPLL or a causal event chain is not relevant for the outcome, and it is not a speed issue, it is an issue of relative timing of actions

Btw the DPLL is essentially a hack around some developers just not understanding the underlying problem, so they got a little 'help' via the firmware

@dkhughes's results with pegging the servo thread to the FPGA timing underline that

just because it's always been done this way, this does not mean there is not a huge potential for improvement


Comment by ArcEye Mon Jun 13 10:02:49 2016


Any kind of event driven system will be a step forwards.

The concept is perhaps analogous to old GUIs vs. modern ones.

The old GUIs polled each component to check if its value had changed, which works but is hugely inefficient.

In modern GUIs such as Qt, components emit signals when changes occur and these are handled through event handlers or signal/slot mechanisms.

Yes at some level there is still polling, but this is handled in the framework and everything is not sitting on the main execution thread.


Comment by mhaberler Mon Jun 13 10:11:32 2016


yeah making things more responsive at lower overhead is an application

for instance, haltalk could watch an eventfd just fine, and trigger a scan immediately if signalled

now any userland comp, including UI's could signal this fd, and updates would be pretty much instantaneous - no harm done if not, just a tad slower

the same thing goes for hardware - just studying this https://yurovsky.github.io/2014/10/10/linux-uio-gpio-interrupt/ ;)


Comment by unseenlaser Mon Jun 13 10:37:03 2016


i'm in favour of one

