Performance Parity with find
Firstly, I want to say I love this tool. Querying the filesystem with SQL feels infinitely more intuitive than looking up the syntax for the find command every time I use it.
However, find seems to be 70-80% faster than fselect for equivalent commands. Is there an intention to achieve performance parity with find (or to surpass it entirely)? If so, is making it multi-threaded on the table? It seems like a task that would benefit from parallelization.
I really like fselect too - I use it a lot.
find does seem to be very fast, so I did a comparison:
$ diff --report-identical-files <(find ~ -name '*.rs' -printf '%M\t%s\t%p\n' | sort) <(fselect mode,size,path from ~ where name =~ '\.rs$' | sort)
Files /dev/fd/63 and /dev/fd/62 are identical
$ hyperfine "find ~ -name '*.rs' -printf '%M\t%s\t%p\n'" "fselect mode,size,path from ~ where name =~ '\.rs$'"
Benchmark 1: find ~ -name '*.rs' -printf '%M\t%s\t%p\n'
Time (mean ± σ): 6.054 s ± 0.307 s [User: 2.856 s, System: 3.123 s]
Range (min … max): 5.879 s … 6.908 s 10 runs
Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
Benchmark 2: fselect mode,size,path from ~ where name =~ '\.rs$'
Time (mean ± σ): 8.515 s ± 0.251 s [User: 3.362 s, System: 5.041 s]
Range (min … max): 8.340 s … 9.145 s 10 runs
Summary
find ~ -name '*.rs' -printf '%M\t%s\t%p\n' ran
1.41 ± 0.08 times faster than fselect mode,size,path from ~ where name =~ '\.rs$'
And then another comparison where I think they're both doing essentially the same thing:
$ diff --report-identical-files <(find ~ -regex '.*\.rs$' -printf '%M\t%s\t%p\n' | sort) <(fselect mode,size,path from ~ where name =~ '\.rs$' | sort)
Files /dev/fd/63 and /dev/fd/62 are identical
$ hyperfine "find ~ -regex '.*\.rs$' -printf '%M\t%s\t%p\n'" "fselect mode,size,path from ~ where path =~ '\.rs$'"
Benchmark 1: find ~ -regex '.*\.rs$' -printf '%M\t%s\t%p\n'
Time (mean ± σ): 8.841 s ± 0.318 s [User: 5.567 s, System: 3.161 s]
Range (min … max): 8.617 s … 9.559 s 10 runs
Benchmark 2: fselect mode,size,path from ~ where path =~ '\.rs$'
Time (mean ± σ): 8.639 s ± 0.058 s [User: 3.515 s, System: 5.018 s]
Range (min … max): 8.573 s … 8.746 s 10 runs
Summary
fselect mode,size,path from ~ where path =~ '\.rs$' ran
1.02 ± 0.04 times faster than find ~ -regex '.*\.rs$' -printf '%M\t%s\t%p\n'
Here's another comparison where I was hoping fselect and find would be about the same:
$ diff --report-identical-files <(find ~ -name '*.rs' -printf '%M\t%s\t%p\n' | sort) <(fselect mode,size,path from ~ where ext === 'rs' | sort)
Files /dev/fd/63 and /dev/fd/62 are identical
$ hyperfine "find ~ -name '*.rs' -printf '%M\t%s\t%p\n'" "fselect mode,size,path from ~ where ext == 'rs'"
Benchmark 1: find ~ -name '*.rs' -printf '%M\t%s\t%p\n'
Time (mean ± σ): 5.975 s ± 0.043 s [User: 2.786 s, System: 3.119 s]
Range (min … max): 5.933 s … 6.075 s 10 runs
Benchmark 2: fselect mode,size,path from ~ where ext == 'rs'
Time (mean ± σ): 8.538 s ± 0.365 s [User: 3.350 s, System: 5.077 s]
Range (min … max): 8.324 s … 9.566 s 10 runs
Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
Summary
find ~ -name '*.rs' -printf '%M\t%s\t%p\n' ran
1.43 ± 0.06 times faster than fselect mode,size,path from ~ where ext == 'rs'
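The summary lines can be reproduced by hand from the means and standard deviations that hyperfine prints. This is a minimal sketch, assuming the ± comes from standard error propagation for a ratio of two means; that assumption matches the three summaries here, but it has not been checked against hyperfine's source. The function name speedup is made up for illustration.

```shell
#!/bin/sh
# speedup SLOW_MEAN SLOW_SD FAST_MEAN FAST_SD
# Prints "ratio ± uncertainty" for SLOW_MEAN / FAST_MEAN.
# Assumption: relative errors of the two means add in quadrature.
speedup() {
    awk -v slow_mean="$1" -v slow_sd="$2" -v fast_mean="$3" -v fast_sd="$4" 'BEGIN {
        ratio = slow_mean / fast_mean
        rel_err = sqrt((slow_sd / slow_mean) ^ 2 + (fast_sd / fast_mean) ^ 2)
        printf "%.2f ± %.2f\n", ratio, ratio * rel_err
    }'
}

# Values copied from the three benchmark runs in this comment.
speedup 8.515 0.251 6.054 0.307   # fselect name =~  vs  find -name
speedup 8.841 0.318 8.639 0.058   # find -regex      vs  fselect path =~
speedup 8.538 0.365 5.975 0.043   # fselect ext ==   vs  find -name
```

On these runs find is roughly 1.4 times faster when it can match with -name, and the two tools are within noise of each other when find has to run a regex over the full path.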
In all three benchmarks fselect spends about 5.0 s in system time against about 3.1 s for find, so find seems to consistently use fewer or more efficient system calls.
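One way to check the system-call hunch on Linux would be to count the calls each tool makes with strace -c. This is a rough sketch, assuming strace is installed; the helper name count_syscalls and the output file names are made up for illustration, and nothing here has been run against the benchmarks above.

```shell
#!/bin/sh
# count_syscalls OUTFILE COMMAND [ARGS...]
# Runs COMMAND under strace and writes a per-syscall summary table to OUTFILE.
count_syscalls() {
    out=$1
    shift
    if ! command -v strace >/dev/null 2>&1; then
        echo "strace not found" >&2
        return 127
    fi
    # -c: summarize call counts and time per syscall
    # -f: also trace child processes
    # -o: write the summary to a file
    # The traced command's own output is discarded.
    strace -c -f -o "$out" "$@" >/dev/null
}

# Example usage (tracing slows both commands down considerably):
# count_syscalls find.txt find ~ -name '*.rs' -printf '%M\t%s\t%p\n'
# count_syscalls fselect.txt fselect mode,size,path from ~ where name =~ '\.rs$'
# head -n 15 find.txt fselect.txt
```

Comparing the calls column of the two summaries, especially for the stat-family and directory-reading calls, should show whether fselect issues more metadata lookups per file than find does.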