[RFE] enable file logging sink that occasionally checks if original inode/device matches the current per-path one
Working on something entirely unrelated, I came across something that has a potential to affect pacemaker+corosync stack (apparently using libqb which can possibly offer the solution) wrt. logging when logrotate program installed and scheduled to run regularly (common case with the former precondition).
The problem is that logrotate.d configuration shipped with corosync:
https://github.com/corosync/corosync/blob/v2.4.2/conf/logrotate/corosync.in#L4
contains copytruncate keyword, which asks for troubles per logrotate(8):
Note that there is a very small time slice between copying the file and truncating it, so some logging data might be lost.
That's definitely undesired effect at any rate.
As that unrelated stuff that lead me here was Python based and related to logging, the solution was quite simple:
https://hg.python.org/cpython/file/3.6/Lib/logging/handlers.py#l416
i.e., using logging.handlers.WatchedFileHandler from standard library.
From there, immediate extrapolation onto libqb feature set can be seen, and hence this RFE.
The Python's implementation is pretty simple, though it's not very
efficient. I can imagine that making the check every X messages
or when the last message was accepted for emitting more than Y
seconds ago, would be a better choice, as it combines the advantages:
- nothing being lost (unless even more extreme circumstances), because at worst, the data are, in the temporary window, written still to the rotated log
- efficiency, because it doesn't
statthe target file every time, especially if there's a burst of messages to proceed
When this is done, it would be wise to have corosync drop copytruncate
directive from logrotate configuration, only relying on implicit rotating
and maybe implicit/explicit create.
When at it, Python offers handlers implementing also their own log rotation, based on:
- size,
RotatingFileHandler: https://hg.python.org/cpython/file/3.6/Lib/logging/handlers.py#l117 - time intervals,
TimedRotatingFileHandler: https://hg.python.org/cpython/file/3.6/Lib/logging/handlers.py#l193
So one of the options would be to mimic these as well/instead.
On the other hand, logrotate is highly configurable, with custom
hooks etc., and is something providing unified interface to log files
handling from administrator's POV.
When speaking of corosync, the risk of losing data in
very small time slice between copying the file and truncating it
is amplified by the fact that corosync + logging thread are running with
real-time priority under normal circumstances, whereas cron daemon
and, in turn, it's logrotate child are running with non-elevated
priority.
If I understand the thing properly, there are possibly two problems
with logrotate + copytruncate:
-
original file
Xis being copied toX'while another logrotation event is started doing the same procedure, possibly silently corruptingX'along -
original file
Xhas been copied toX', but beforeftruncateis called, new data has been written toXthat will get subsequently lost when the truncation kicks in
For 1., it seems solved thanks to O_EXCL to open.
But it doesn't look there's a reliable solution for 2. (there's no
call that would atomically combine file swap (already can be atomic
with renameat2 + RENAME_EXCHANGE) with some kind of freopen
like logic (reassociating existing file descriptors with the newly
swapped file in one go).
Note that pacemaker itself also uses copytruncate option:
https://github.com/ClusterLabs/pacemaker/blob/Pacemaker-1.1.16/extra/logrotate/pacemaker.in#L9
(affects pacemaker prior to switch to corosync's log file in the runtime, or pacemaker_remoted + perhaps more).
One of the things that were brought up in relation this was a question of a backward compatibility, because there's no hard relevance between say pacemaker and its logrotate file and the libqb that is indeed used (which may be short of the new functionality).
One of the possibilities to employ in the updated logrotate files is to
use firstscript option with custom command to check the libqb version
(through *.so version info) and if not high enough, to trigger logrotate
with an out-of-path original configuration (copytruncate) and
subsequently fail to denote the procedure should not continue.
For the new logrotate config itself and regarding what libqb could be
taught to do, it was suggested that the libqb user could opt-in to have
libqb listen for SIGHUP or the like and restart the logging stream
instead of what was originally proposed (i.e. check inode on some
log events actively). Problem is that if the libqb's user is also
dealing with other signals on its own (or via libqb's mainloop),
there may be some undesired interferences.
Another alternative would be to use inotify, but that's limited to
a single platform only.
A compromise could be that libqb will create a named pipe X.pipe for
the target log file X which then will be written to per the logrotate
configuration. This currently seems the best technical solution:
- named pipes should be available everywhere we desire
- custom command in logrotate config would be immediately clear on
whether the libqb supporting the signalling mechanism is in use
thanks to existing
X.pipefile (a latest -p), and thus supporting the sketched fallback scheme for older libqb in use (it could be viceversa: old -> new) - no sign of "bad polling" (i.e., checking whether inode has changes
over and over), just a "good polling" (
pollonX.pipe) - signal-safe
For the new logrotate config itself and regarding what libqb could be taught to do, it was suggested that the libqb user could opt-in to have libqb listen for SIGHUP or the like and restart the logging stream
To clarify, I was thinking that libqb would simply provide a "reopen log files" function, and the library user could call that from its own signal handler for some available signal (e.g. SIGHUP, SIGUSR1 or SIGUSR2).
I don't quite understand the pipe idea. What would write to the pipe, and what would read from it?
Yes, I don't like the pipe idea. libraries have no business providing that sort of unexpected interface. If logs need rotating then i's the application's responsibility to say when and how it's triggered.
On 08/12/16 08:14 -0800, Ken Gaillot wrote:
For the new logrotate config itself and regarding what libqb could be taught to do, it was suggested that the libqb user could opt-in to have libqb listen for SIGHUP or the like and restart the logging stream
To clarify, I was thinking that libqb would simply provide a "reopen log files" function, and the library user could call that from its own signal handler for some available signal (e.g. SIGHUP, SIGUSR1 or SIGUSR2).
Yes, that would be possible, but this is pretty fragile approach:
-
libqb's user would need to be changed and recompiled with new libqb
-
libqb's user would need to be extremely cautious about cases like sigsuspend blocking all the signals but one, etc., opening whole new class of signal handling issues
-
signal is asynchronous, so what if it's delivered just before libqb's writing procedure proceeds to write to the file that has just been rotated?
-
how is the logrotate script supposed to find out what process it should kill if there are more?
-
how is the logrotate script supposed to tell the signal handler has already been installed at the libqb's user at early startup?
-
extra complexity at the side of the libqb's user
I don't quite understand the pipe idea. What would write to the pipe, and what would read from it?
Easy; user's command from logrotate script would echo to it, libqb would read it to know the stream should be reopened.
For extra sanity, there could be a simple protocol:
- logrotate script: prerotate: indicates "be careful" to libqb
- libqb starts checking the inode/idev of the target file upon each message, reopen the stream if has changed
- logrotate: performs the rotation
- libqb switches over
- logrotate: postrotate: indicates "done" to libqb, which is for cases that there was some unexpected failure in logrotate so that libqb doesn't need to be checking inode/idev forever
On 08/12/16 08:20 -0800, Chrissie Caulfield wrote:
Yes, I don't like the pipe idea. libraries have no business providing that sort of unexpected interface. If logs need rotating then i's the application's responsibility to say when and how it's triggered.
Interpolated, libraries have no business in providing any interface, when application needs to do anything, it is its own responsibility :-P
But seriously, libraries are there to encapsulate shared functionality, and the side issue of logrotating is also shared, so why not to tackle the problem at one authoritative place when feasible enough? Why do you think WatchedFileHandler was added to Python's standard library?
-- Jan (Poki)
On 08/12/16 18:57 +0100, Jan Pokorný wrote:
On 08/12/16 08:14 -0800, Ken Gaillot wrote:
For the new logrotate config itself and regarding what libqb could be taught to do, it was suggested that the libqb user could opt-in to have libqb listen for SIGHUP or the like and restart the logging stream
To clarify, I was thinking that libqb would simply provide a "reopen log files" function, and the library user could call that from its own signal handler for some available signal (e.g. SIGHUP, SIGUSR1 or SIGUSR2).
Yes, that would be possible, but this is pretty fragile approach:
libqb's user would need to be changed and recompiled with new libqb
libqb's user would need to be extremely cautious about cases like sigsuspend blocking all the signals but one, etc., opening whole new class of signal handling issues
signal is asynchronous, so what if it's delivered just before libqb's writing procedure proceeds to write to the file that has just been rotated?
how is the logrotate script supposed to find out what process it should kill if there are more?
- remember that there is no 1:1 mapping between the process and the logfiles in question, how would it be decided which one to send signal to if there are more per file (e.g. there are multiple pacemaker daemons possibly sharing /var/log/pacemaker.log)?
What I propose follow 1:1 mapping principle (file - named pipe).
how is the logrotate script supposed to tell the signal handler has already been installed at the libqb's user at early startup?
extra complexity at the side of the libqb's user
Note that the proposed change would be a behind-the-curtain one, actually not imposing any direct change at the library user's side. If required, it could either write to X.pipe on its own as a means to control this feature, or there could be additional API (but the point is it is not strictly required).
-- Poki
My 2cents. Looks like a extremely overengineered solution. Ability to reopen log seems like a viable feature and it's just good enough solution.
On 12/12/16 00:22 -0800, Jan Friesse wrote:
My 2cents. Looks like a extremely overengineered solution. Ability to reopen log seems like a viable feature and it's just good enough solution.
Yes, that would be a subset of what libqb would offer.
In terms of random frobnical package that just want hassle-free logging facility, i.e., push messages through, don't care about the details, if would have to be frobnical that would get overengineered just to control the underlying stream.
Logrotate and the like programs were also "overengineered" to allow to be conveniently offloaded with some funtionality that could otherwise other programs/daemons handle on their own. But maintenance burden many places vs. just a single point was, I believe, the driving factor here. That's also why I don't propose to perform "logrotating" directly in libqb.
-- Jan (Poki)
I don't know why, I've missed there was a pre-existing issue #142.