A few requests to make ":histogram" even more useful
Is your feature request related to a problem? Please describe.
When using "lnav" to process file data (lists of files in a given export), I find myself looking at ranges of timestamps (in this case, file creation dates) that span several years. The histogram is an excellent tool for noticing visually when a larger (or smaller) than usual number of files was written in a given period of time, without having to do any ad-hoc statistical analysis.
For example, a location may hold files that result from writing backups, and being able to spot anomalies (using the histogram to see how many files were written during given periods of time) comes in very handy.
The three limitations with the histogram I currently see are as follows:
- You can zoom in to seconds per sample, but you can't zoom out more than one week per sample
- It does not matter how much you zoom in or out, the date / time in the histogram does not include the year
- The lack of an option for ":histogram" to show entries for periods of time with zero matches
The first one limits the time frame I can see on a single screen when looking for anomalies. With a 50-row terminal, 50 samples of 7 days each cover 350 days, so the visible timeframe is less than one year (not bad, but there is probably no reason to "artificially" limit zooming out to 7-day periods).
The second one makes it difficult to determine which year the logs (here, the dates the files were written) correspond to. For the common use case, adding the four digits of the year may be a waste of space, but that's just 5 additional characters in a view (the histogram) which is very terse and has lots of real estate available, so there is little risk of valuable information wrapping around the terminal in the common case.
The last one, not showing a row for periods of time with zero matches, is expected behavior. But when using the histogram to look for anomalies or to compare periods of time, having every sampling point represented in the output, even those with zero matches, would be a plus.
Describe the solution you'd like
What I would love to see added to the histogram feature in some later "lnav" release is:
- Being able to zoom out further. I'd say zooming out to months (first to last day of each month, i.e. Mar 1 through Mar 31) and quarters (first to last day of the quarter, i.e. Jan 1 through Mar 31) would be plenty; being able to zoom out to years (Jan 1 through Dec 31) would be good to have. I don't know the performance implications of zooming out this far, though.
- The year (as a four-digit value) added to the timestamps shown at the left in the histogram view.
- An option for ":histogram" to print a sample for every sampling period at the current zoom level (i.e. for 24-hour periods, from midnight to midnight), even if the count is zero, so that the histogram has no "gaps" and one can visually review the timeline on the left without missing the periods when no matches were found.
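To make the first and third requests concrete, here is a minimal Python sketch of what is being asked for: counts grouped by calendar month, with a row for every month in the range even when the count is zero. It is only an illustration of the desired output and says nothing about how "lnav" is implemented; the function name `month_histogram` and the sample dates are made up for this example.

```python
from collections import Counter
from datetime import datetime


def month_histogram(timestamps):
    """Count timestamps per calendar month, keeping months with zero matches."""
    if not timestamps:
        return []
    counts = Counter((ts.year, ts.month) for ts in timestamps)
    year, month = min(counts)   # earliest (year, month) seen
    last = max(counts)          # latest (year, month) seen
    rows = []
    while (year, month) <= last:
        # .get() avoids inserting new keys into the Counter for empty months
        rows.append((f"{year:04d}-{month:02d}", counts.get((year, month), 0)))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return rows


# Hypothetical file creation dates with a two-month gap
sample = [datetime(2023, 11, 3), datetime(2023, 11, 20), datetime(2024, 2, 1)]
rows = month_histogram(sample)
for label, n in rows:
    print(f"{label} {'#' * n} {n}")
```

The two empty months (2023-12 and 2024-01) show up as rows with a count of 0, so the timeline on the left has no holes.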
Describe alternatives you've considered
For the inability to zoom out more than "lnav" currently allows, running SQL against the log format table is a possibility, and I can even script this, but taking advantage of the heavy lifting being done by ":histogram" is so much better and more reliable.
For the year not being printed in the histogram output, toggling between LOG and HISTOGRAM using capital I (to keep date / times synchronized) works. But it is an additional step, and one you may have to repeat, when iteratively looking at histogram data (file timestamps) on both sides of a change of year. Printing the year in the histogram would avoid the swiveling across views and the associated distractions.
Currently "lnav" does not show periods of time with zero matches in the ":histogram" view, so one has to be extremely careful when using the output for intuitive / visual statistical analysis: it is easy to overlook that samples are missing for some hours / days / weeks, and so to miss a pattern.
Additional context
I think it's clear enough, but I can provide screenshots if needed.
Another thing I don't get is why, when zooming out to the largest extent currently possible (7-day periods), "lnav" seems to lock on to a Thursday (each 7-day period starts on a Thursday). It may just be that my sample data happens to produce that result. This is only a curiosity.
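One possible explanation for the Thursday, offered as an assumption rather than something confirmed in this thread: if bucket starts are computed by rounding each timestamp down to a multiple of the interval counted from the Unix epoch, then every 7-day bucket begins on the weekday of 1970-01-01, which was a Thursday (in UTC). The Python sketch below shows the arithmetic; `floor_to_week` is a hypothetical helper, not an lnav function.

```python
from datetime import datetime, timezone

WEEK_SECONDS = 7 * 24 * 3600


def floor_to_week(epoch_seconds):
    """Round down to a multiple of 7 days since the Unix epoch (assumed policy)."""
    return epoch_seconds - epoch_seconds % WEEK_SECONDS


# An arbitrary Tuesday afternoon in UTC
ts = int(datetime(2024, 3, 12, 15, 30, tzinfo=timezone.utc).timestamp())
start = datetime.fromtimestamp(floor_to_week(ts), tz=timezone.utc)
print(start.strftime("%A %Y-%m-%d"))
```

Under this assumption the result does not depend on the sample data: any input timestamp lands in a bucket that starts on a Thursday at 00:00 UTC.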
I agree. I don't spend much time in the histogram view, but these are definitely problems that should be addressed.
The lack of an option for ":histogram" to show entries for periods of time with zero matches
Would inserting a single blank line when there's a gap be good enough? Very old versions of lnav used to have a blank line for every time-step and it would take up so much space that it was a bit unhelpful.
Which "very old version" would that be (so I can download it and try)? I may see how things would look and realize that's not really helpful in the end.
In either case, I am going to first test the new zoom levels and the added year, which may help improve my corner case a bit.
I would not want to push for a feature (which seems to have existed in the past but was not deemed useful), nor do I want you to make the behavior a toggle (or a configuration option) if we can move on without making things complex.
Thanks for adding the year at the end of the timestamp in the histogram view. It makes things much clearer when looking at data that spans several years.
The additional zoom-out levels of month / year are also much appreciated (as is the indication of the current zoom level in the bottom status bar). What I am not sure about is whether, when zooming out, the beginning of the periods could be made more "consistent". I'll try to explain with an example:
At the 1-month zoom level, the expectation is probably that periods start on the first day of the month. I guess "lnav" currently calculates the period based on matches alone and does not latch on to the beginning of the month, which results in odd timelines like the one above. The same goes for the year (see below) and for any other period of time:
Again, weird, but not a deal breaker. Months and years are something of a special case for the histogram: for the more granular options (hours down to seconds) it does not make sense to latch on to the beginning of an hour / minute, and for weeks, the start of the week is not the same for everyone.
So I'd say that as of today the histogram is good for my purpose, bar maybe checking an older release that had the blank lines for every time-step, to finally make up my mind.
Oh, now I see in the source code that the time-steps above a week are "approximations":
diff --git a/src/lnav_commands.cc b/src/lnav_commands.cc
index 1525eed4..d006bd27 100644
--- a/src/lnav_commands.cc
+++ b/src/lnav_commands.cc
@@ -107,6 +107,8 @@ constexpr std::chrono::microseconds ZOOM_LEVELS[] = {
     8h,
     24h,
     7 * 24h,
+    30 * 24h,
+    365 * 24h,
 };
In my opinion, there is no point in spending any time changing this.
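To illustrate why a fixed 30-day step cannot line up with calendar months, here is a small Python sketch. It assumes bucket starts are found by rounding down to a multiple of the step counted from the Unix epoch; that policy is my assumption, since the diff only shows the step sizes. `bucket_start` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
STEP = timedelta(days=30)  # mirrors the 30 * 24h entry in the diff


def bucket_start(ts, step=STEP):
    """Round ts down to a multiple of step since the epoch (assumed policy)."""
    return ts - (ts - EPOCH) % step


# Mid-month timestamps for January through March 2024
starts = [
    bucket_start(datetime(2024, m, 15, tzinfo=timezone.utc)) for m in range(1, 4)
]
for s in starts:
    print(s.date())
```

Each start is exactly 30 days after the previous one, so the day of the month drifts (here it falls on the 19th, 18th and 17th) instead of staying on the 1st. Aligning to real months would need calendar arithmetic rather than a constant duration.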
I've pushed a change that adds a spacer row with some bullet points to represent the amount of time in the gap:
The partitioning not landing on the exact start/end of the months/years is a bit weird. I'll think about what might be involved, but I'm inclined to leave it alone for now.
Which "very old version" would that be (so I can download it and try)? I may see how things would look and realize that's not really helpful in the end.
I tried v0.7.2 and it had the old behavior. The rewrite of this code that removed the gaps was in a commit dated Dec 8, 2015.
That's a long time ago. I am more than fine with the recent changes and, as you say, the partitioning not coinciding with the exact start of a period is a minor thing to me as well.
Huge thank you again.
I tried the feature called "[hist] add empty rows" in the recent code (which I missed from your earlier comment).
In the common log analysis case, it does not get in the way and can easily highlight visually if a gap exists in the logs.
For the special file-lists processing I sometimes use the histogram for, I will have to give it a further look with more input samples. The ones I have immediately available are a bit sketchy (the file dates/times are far from continuous), hence the added spacer rows are too frequent and a bit distracting.
Removing the spacer lines when I need to paste part of the histogram somewhere else is easy enough, as it is usually only a few lines I need to share to make a point, so I don't think it is worth the effort of making a toggle to show / hide the spacers on demand.
As I see it, this bug report may be closed at your earliest convenience, as all comments around the histogram have been addressed in the code.
Thanks a lot.
Maybe the view could be optionally condensed by using e.g. red lines instead of complete rows to indicate missing intervals? That way no screen real estate is lost, but there is a visual indication that something is missing.
Thinking about your suggestion, I opened my test input file again and noticed that in the LOG view some lines are underlined. If I also enable the "elapsed time" feature (capital T), I can't match every underline to a slowdown (red marking on the left), but some of them do match...
An underline would be visually enough to notice and would avoid the distraction of empty rows, which make post-processing necessary should you need to export the histogram somewhere else.
Not sure I want to drag @tstack into more changes at this point in the release cycle for something of such low importance, though.
For the special file-lists processing I sometimes use the histogram for I will have to give it a further look with more input samples
Have you tried using a SQL/PRQL query to make the chart? The DB view will show a bar for columns that are graphable. You can use the timeslice() function to compute the chunk of time. For example, to group log messages into one hour chunks:
;SELECT timeslice(log_time_msecs, '1h') AS slice, count(*) FROM all_logs GROUP BY slice
I also added a stats.hist PRQL function:
;from all_logs | stats.hist log_level slice:'1h'
That's definitely cool! Going one step further, I can get the type of information I want without any external post-processing, like:
;SELECT strftime('%Y-%m',timeslice(log_time_msecs, '1y')) AS slice, count(*) FROM location_report GROUP BY slice;
Which for the input data I am looking into produces the useful output below:
I noticed that the documentation explains the meaning of the different numbers of dots in the lines representing the time gaps. I feel all the enhancements asked for in this issue have been delivered.
Feel free to close the issue if you don't plan on delivering anything else related to this matter.