
Speed enhancement

Open nick87720z opened this issue 4 years ago • 4 comments

test-perf

  • turned the data generator into a persistent pipeline
  • rewrote the main loop in gawk (≈4x speed boost in total; see the sketch below)
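
For illustration only, a rough sketch of the kind of gawk-driven generation of synthetic cache data a benchmark like this relies on. The file name, line format and sizes here are assumptions for the example, not the actual test-perf code:

# Hypothetical sketch: gawk emits synthetic "epoch text" cache lines once,
# then the pipeline under test is timed against that data.
gawk 'BEGIN { srand(); for (i = 0; i < 20000; i++) printf "%d clip text %d\n", i, int(rand() * 5000) }' > _line_cache_test
time for run in $(seq 100); do
    { LC_ALL=C sort -rnk 1 | sed 's/^[^ ]* //' | awk '!seen[$0]++'; } < ./_line_cache_test
done > /dev/null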

clipmenu

  • somehow sed outperformed cut in speed. Without the awk stage at the end the difference is close to 3x, enough to be visible with the full pipeline.
  • added an option for intermediate cache buffers in the middle of the pipeline (the pipeline then becomes a plain sequence of commands writing intermediate files). A single buffer is already enough to make the alternative pipeline effective, and it also cuts the time of a second menu call if the clip list has not changed since the previous call; two buffers make it almost instant.
  • when buffering is enabled, a more effective sequence is used: deduplication is done with an extra sort instead of awk (sort works about 7x slower on pipeline input than on regular files); see the sketch after this list.
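
For concreteness, a rough sketch of what the buffered sequence might look like with a reuse check. The buffer paths and the freshness test are illustrative assumptions, not the actual patch:

# Hypothetical sketch: dedup into a buffer file, then re-sort from the buffer.
# The -nt test (supported by dash and bash, though not strictly POSIX) lets an
# unchanged cache skip the expensive work entirely on the next menu call.
buf_dedup=$CM_DIR/.line_cache_dedup      # illustrative buffer paths
buf_sorted=$CM_DIR/.line_cache_sorted
if [ ! -e "$buf_sorted" ] || [ "$cache_file" -nt "$buf_sorted" ]; then
    LC_ALL=C sort -ruk 2 < "$cache_file" > "$buf_dedup"
    LC_ALL=C sort -rnk 1 < "$buf_dedup" | sed 's/^[^ ]* //' > "$buf_sorted"
fi
cat "$buf_sorted"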

clipmenud

  • a big optimization of the vacuuming process: only removed lines are processed, and in a more effective way (see the sketch after this list). Side effect: orphan files that were not removed due to a bug in previous runs are left untouched and need a separate cleanup.
  • other small optimizations
  • TODO, related to #59: try further optimizations by offloading the heavy lifting to fast interpreters (sed, awk) and look for more opportunities to use process substitution.
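
A heavily simplified bash sketch of the "only removed lines" idea. The helper remove_clip_for_line and the max_clips variable are hypothetical names for this example, not real clipmenud functions:

# Hypothetical sketch: trim the line cache, then feed only the lines that
# dropped out to whatever removes the corresponding clip files.
tail -n "$max_clips" "$cache_file" > "$cache_file.new"
comm -13 <(LC_ALL=C sort "$cache_file.new") <(LC_ALL=C sort "$cache_file") |
    while IFS= read -r removed_line; do
        remove_clip_for_line "$removed_line"   # hypothetical helper
    done
mv "$cache_file.new" "$cache_file"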

clipmenu and clipctl are also changed for POSIX shell compatibility, which additionally means higher speed when using dash (reportedly the fastest Unix shell; in my earlier tests the difference was close to 4x, though it depends on usage).
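For context, the kind of substitution such a POSIX conversion involves. This is illustrative only, not necessarily one of the exact changes in this branch:

#!/bin/sh
# Illustrative example: bash's [[ ... == ... ]] replaced with the POSIX test
# builtin so dash can run it. CM_LAUNCHER is an existing clipmenu variable;
# the surrounding logic is simplified for the example.
launcher=${CM_LAUNCHER:-dmenu}
if [ "$launcher" = rofi ]; then
    set -- -dmenu "$@"   # rofi needs -dmenu to act as a menu
fi
# ... "$launcher" "$@" would then be invoked with the adjusted arguments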

Doing the same for the daemon makes little sense, though, due to this bug.

nick87720z · Feb 03 '22, 00:02

What do you think about such a pipeline for clipmenu:

LC_ALL=C sort -ruk 2 < "$cache_file" | LC_ALL=C sort -rnk 1 | sed 's/^[^ ]* //'
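
Stage by stage, the intent is (same pipeline, just annotated):

LC_ALL=C sort -ruk 2 < "$cache_file" |   # dedup: one entry per clip text (key = field 2 to end of line)
    LC_ALL=C sort -rnk 1 |               # re-sort by the numeric timestamp in field 1, newest first
    sed 's/^[^ ]* //'                    # strip the leading timestamp, leaving only the clip line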

When I tested it with test-perf (which was not my first approach to perf testing), it was 10 times slower than the variant that uses awk for deduplication. Interestingly, all of the slowdown came from the second sort, which receives a complete mess (in terms of ordering) from the first sort, while the original input is already sorted and only needs to be reversed (curiously, tac is even slightly slower than sort here).

However, in a previous attempt I used a somewhat smaller line_cache file: I took an existing file created by regular use rather than by the perf test and appended it to itself 10 or 100 times (into a separate file, of course). Here are my preserved results:

# Performance test:
#time for i in $(seq 1000); do { LC_ALL=C sort -rnk 1 | cut -d' ' -f2- | awk '!seen[$0]++'; } < ./_line_cache_test; done >/dev/null
#time for i in $(seq 1000); do { LC_ALL=C sort -ruk 2 | LC_ALL=C sort -rnk 1 | cut -d' ' -f2-; } < ./_line_cache_test; done >/dev/null
# Run with line_cache size (wc output): 9999  50904 530048 _line_cache_test
# Dry run (feeding empty string instead of file): 5.1 -> 4.632 
# qalc in:  10.189 / 19.589, 4.632 / 5.1, (10.189−4.632) / (19.589−5.1), 10.189−4.632, 19.589−5.1
# qalc out: [0.5201, 0.9082, 0.3835, 5.557, 14.49]
# I.e., the old-to-new multiplier is 0.9082 for startup overhead and 0.3835 for the text processing itself.

I can't quite explain it: could it be that the second sort was simply in a better situation because the input cache was the same relatively small chunk repeated 10 to 100 times, rather than truly random data?

Anyway, the dry run shows that this variant should start faster than the one with the awk dedup at the end.

nick87720z · Feb 03 '22, 09:02

Three more ways to do the awk dedup:

LC_ALL=C sort -rnk 1 < "$cache_file" | awk -v FS='^[^ ]* ' '!seen[$2]++ {print $2}'
LC_ALL=C sort -rnk 1 < "$cache_file" | sed 's/^[^ ]* //' | awk -v FPAT='^.*$' '!seen[$0]++'
LC_ALL=C sort -rnk 1 < "$cache_file" | sed 's/^[^ ]* //' | awk -v FS='\n' '!seen[$0]++'

There must be some difference, but it is simply not visible. Tested with a 20000-line cache.

nick87720z · Feb 03 '22, 11:02

The reason the two-sort variant is slow is that sort is not as fast when reading from stdin. This can be solved with a temporary file, but that costs space: at the cache sizes where it really matters, it could exceed free disk space. A test file with 40000 lines ate 280 MB, leaving no free space in XDG_RUNTIME_DIR, which for me is only 394 MB.

This snippet has ≈1.4x (7/5) higher performance:

LC_ALL=C sort -ruk 2 < "$cache_file" > "$cache_file-a"
LC_ALL=C sort -rnk 1 < "$cache_file-a" | sed 's/^[^ ]* //'
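
A variant of the same two-step idea with explicit temporary-file handling; the mktemp path and the trap are additions for the example, not part of the proposed change:

tmp=$(mktemp "${XDG_RUNTIME_DIR:-/tmp}/clipmenu-sort.XXXXXX") || exit 1
trap 'rm -f "$tmp"' EXIT                       # clean up the buffer on exit
LC_ALL=C sort -ruk 2 < "$cache_file" > "$tmp"  # dedup by clip text into the temp file
LC_ALL=C sort -rnk 1 < "$tmp" | sed 's/^[^ ]* //'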

nick87720z · Feb 03 '22, 14:02

Hi! Sorry to be late getting back.

In general, we're not going to move away from bash with the convenience provided by bash arrays and other features, and readability is preferred over small performance improvements.

Maybe you can consolidate this into a few changes which are particularly pertinent for review? As it is, this is a lot of changes, and performance hasn't been a major concern raised about clipmenu in the past.

cdown · Aug 12 '22, 18:08