How lnav deals with "Out-Of-Time-Order" messages ("Received Time" vs "Actual Time")
This is rather a question more than a bug, as it seems to be the way it works by design, but I don't get it.
Have files to process which are not log as such, but lists of files with a timestamp. I am handling these lists with lnav with a custom format using the timestamp (nanoseconds from epoch) as the timestamp, but being lists of files, there is no guarantee all entries in the file will be sorted in time order (which is the expectation for logs, in the general case).
If you are wondering about why the use case, thanks to lnav flexibility I can load up all the files' numerical metrics of interest in SQLite and mine the data from there, filter IN / OUT as necessary, etc. but I fully understand it is not a use case the program is designed for.
Problem comes when handling what the UI itself calls "Out-Of-Time-Order Message" : entries in the input file for which the timestamps are earlier than the timestamp for the previous entry. For example, I have special files with (created) epoch set to "2066-10-24T23:45:52" appearing in random places in the list of files.
On the other hand, the vast majority of the files in the list have more reasonable created epoch (such as for example 2025-03-20T08:58:33), but what the lnav UI shows for a file with an out of order timestamp after one with one in the distant future is the same "Received Time" as for the previous message, even if the "Actual Time" (which can be shown through "p"), is correct.
So I have a file with the correct timestamp in the distant future :
Received Time: 2066-11-02T18:39:50.000 — in the future Format: %6
Then the following message, with a timestamp in 2025, is shown in the details as this :
Out-Of-Time-Order Message -- Received Time: 2066-11-02T18:39:50.000 — in the future; Actual Time: 2025-03-20T08:58:33.705; Diff: 15202d09h41m16s Format: %6
It seems obvious the UI is doing as designed, probably sort of trying to make consecutive lines in the input file to have consecutive timestamps, despite de decoded timestamp value as per the format definition. And this is good I think for log files as such, but for this weird use case it does not look like it.
What's interesting is when consulting the "logline" table, the "log_actual_time" does not match the Actual Time" I can see in the UI when showing each log line details. For this sample file :
/full/path/to/file/file1: mtime: 1742733850073511887 fileid: 16814250 size: 772 type: 9 seg_bytes: 2864 seg_count: 4 redun_seg_count: 0 (0%) pre_lc_size: 2864 post_lc_size: 2712 (95%) mode: 02000100640 start_offset: 0 end_offset: 18446744073709551615
/full/path/to/file/file2: mtime: 3055182352000000000 fileid: 1328 size: 344 type: 9 seg_bytes: 1412 seg_count: 3 redun_seg_count: 0 (0%) pre_lc_size: 1412 post_lc_size: 1339 (95%) mode: 02000100640 start_offset: 0 end_offset: 18446744073709551615
/full/path/to/file/file3: mtime: 3055945190000000000 fileid: 3792 size: 372 type: 9 seg_bytes: 1440 seg_count: 3 redun_seg_count: 0 (0%) pre_lc_size: 1440 post_lc_size: 1219 (85%) mode: 02000100640 start_offset: 0 end_offset: 18446744073709551615
/full/path/to/file/file5: mtime: 1742457513705146153 fileid: 16783289 size: 528 type: 9 seg_bytes: 2620 seg_count: 4 redun_seg_count: 0 (0%) pre_lc_size: 2620 post_lc_size: 2488 (95%) mode: 02000100440 start_offset: 0 end_offset: 18446744073709551615
/full/path/to/file/file6: mtime: 1740992372604264119 fileid: 16575024 size: 912 type: 9 seg_bytes: 3004 seg_count: 4 redun_seg_count: 0 (0%) pre_lc_size: 3004 post_lc_size: 2776 (92%) mode: 02000100640 start_offset: 0 end_offset: 18446744073709551615
What the SQL interface tells me is :
;SELECT log_time,log_actual_time,path,fileid FROM logline
log_time log_actual_time path fileid │
2025-03-23 13:44:10.073│2025-03-23 13:44:10.073│/full/path/to/file/file1│ 16814250 │
2066-10-24 23:45:52.000│2066-10-24 23:45:52.000│/full/path/to/file/file2│ 1328 │
2066-11-02 18:39:50.000│2066-11-02 18:39:50.000│/full/path/to/file/file3│ 3792 │
2066-11-02 18:39:50.000│2066-11-02 18:39:50.000│/full/path/to/file/file5│ 16783289 │
2066-11-02 18:39:50.000│2066-11-02 18:39:50.000│/full/path/to/file/file6│ 16575024
But for the last two lines I would have expected "log_actual_time" to be, as the "Actual Time" reported in the UI :
Received Time: 2066-11-02T18:39:50.000 — in the future; Actual Time: 2025-03-20T08:58:33.705; Diff: 15202d09h41m16s Format: %6
Received Time: 2066-11-02T18:39:50.000 — in the future; Actual Time: 2025-03-03T09:59:32.604; Diff: 15219d08h40m17s Format: %6
As mentioned, I am fully aware I am not using the program from its intended use and adapting the "received" time to make it consecutive may have a technical reason, but having an "actual time" which differs between the UI and the DB may not be that much expected.
Try setting ordered-by-time to false in the format definition.
Thanks, but it doesn't quite get me something I am too comfortable with. The "ordered-by-time": false option in the format specification does as described, however the performance impact seems to be too much.
For a start, when the format is set with "ordered-by-time": false, program slows to a crawl : things that are usually fast (like doing rather simple SQL queries) now run at a fraction of the speed (same query going from 2 seconds to taking 168 seconds instead) :
;SELECT log_time,log_actual_time,path FROM sfs_dump ORDER BY log_actual_time DESC LIMIT 20
This is for a file consisting of some 400,000+ lines with one file and associated information per line, according to the format shown in a previous comment.
As the description for the "ordered-by-time" format parameter advises of performance issues, I guess some slowdown is expected. I don't immediately see any changes in the schema for the "logline" table as a cause for the performance mismatch though.
Intrigued by this (and the fact simple string searches in the LOG view were also extremely slow) I kicked off "strace" against a running lnav with the 400,000+ lines with file data already loaded, and upon starting a simple text string search, lnav went busy for a few minutes (245 seconds to be exact) doing a humongous amount of "pread64" 4 MiB at a time (file being processed just 140 MiB big). "Humongous" here being more than 266,000 of those pread64 , and even if all reads are being serviced from cache, "lnav" has needed to process an astounding amount of around 1 TiB (266,000 times 4 MiB) for doing a simple string search on a 140 MiB file. That's a factor of x7000 overhead and probably something worth checking.
Same behavior with latest version of the code and 0.12.2 by the way.
So in summary, I am good with working around my non-expected use case by adding a "converter" in the format definition (which drops files with bogus mtime values in the future, as they tend to be very small and static), in fact I have this working now, and you can move on without worrying too much about it. That there is a performance / optimization opportunity here, I can't tell.
Other than the supposed inefficiency described in my earlier comment, I am good with the handling of "out of order" messages, although what I did in the end is through a "preprocessor" into my log file format, to squeeze out the blatant outliers for timestamp, and settle on "ordered-by-time: true", because I could not afford the huge performance penalty of setting "ordered-by-time: false".
You may close this bug report if the potential performance issue described is of no interest.