Performance Parity with find

Open akneni opened this issue 8 months ago • 1 comments

Firstly, I want to say I love this tool. Querying the filesystem with SQL feels infinitely more intuitive than looking up the syntax for the find command every time I use it.

However, find seems to be 70-80% faster than fselect for equivalent commands. Is there an intention to achieve performance parity with find (or to surpass it entirely)? If so, is making it multi-threaded on the table? It seems like a task that would benefit from parallelization.

May 24 '25 14:05 akneni

I really like fselect too - I use it a lot.

find does seem to be very fast, so I did a comparison:

$ diff --report-identical-files <(find ~ -name '*.rs' -printf '%M\t%s\t%p\n' | sort) <(fselect mode,size,path from ~ where name =~ '\.rs$' | sort)
Files /dev/fd/63 and /dev/fd/62 are identical

$ hyperfine "find ~ -name '*.rs' -printf '%M\t%s\t%p\n'" "fselect mode,size,path from ~ where name =~ '\.rs$'"
Benchmark 1: find ~ -name '*.rs' -printf '%M\t%s\t%p\n'
  Time (mean ± σ):      6.054 s ±  0.307 s    [User: 2.856 s, System: 3.123 s]
  Range (min … max):    5.879 s …  6.908 s    10 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 2: fselect mode,size,path from ~ where name =~ '\.rs$'
  Time (mean ± σ):      8.515 s ±  0.251 s    [User: 3.362 s, System: 5.041 s]
  Range (min … max):    8.340 s …  9.145 s    10 runs
 
Summary
  find ~ -name '*.rs' -printf '%M\t%s\t%p\n' ran
    1.41 ± 0.08 times faster than fselect mode,size,path from ~ where name =~ '\.rs$'

And then another comparison where I think they're both doing essentially the same thing:

$ diff --report-identical-files <(find ~ -regex '.*\.rs$' -printf '%M\t%s\t%p\n' | sort) <(fselect mode,size,path from ~ where name =~ '\.rs$' | sort)
Files /dev/fd/63 and /dev/fd/62 are identical

$ hyperfine "find ~ -regex '.*\.rs$' -printf '%M\t%s\t%p\n'" "fselect mode,size,path from ~ where path =~ '\.rs$'"
Benchmark 1: find ~ -regex '.*\.rs$' -printf '%M\t%s\t%p\n'
  Time (mean ± σ):      8.841 s ±  0.318 s    [User: 5.567 s, System: 3.161 s]
  Range (min … max):    8.617 s …  9.559 s    10 runs
 
Benchmark 2: fselect mode,size,path from ~ where path =~ '\.rs$'
  Time (mean ± σ):      8.639 s ±  0.058 s    [User: 3.515 s, System: 5.018 s]
  Range (min … max):    8.573 s …  8.746 s    10 runs
 
Summary
  fselect mode,size,path from ~ where path =~ '\.rs$' ran
    1.02 ± 0.04 times faster than find ~ -regex '.*\.rs$' -printf '%M\t%s\t%p\n'

Here's another comparison where I was hoping fselect and find would be about the same:

$ diff --report-identical-files <(find ~ -name '*.rs' -printf '%M\t%s\t%p\n' | sort) <(fselect mode,size,path from ~ where ext === 'rs' | sort)
Files /dev/fd/63 and /dev/fd/62 are identical

$ hyperfine "find ~ -name '*.rs' -printf '%M\t%s\t%p\n'" "fselect mode,size,path from ~ where ext == 'rs'"
Benchmark 1: find ~ -name '*.rs' -printf '%M\t%s\t%p\n'
  Time (mean ± σ):      5.975 s ±  0.043 s    [User: 2.786 s, System: 3.119 s]
  Range (min … max):    5.933 s …  6.075 s    10 runs
 
Benchmark 2: fselect mode,size,path from ~ where ext == 'rs'
  Time (mean ± σ):      8.538 s ±  0.365 s    [User: 3.350 s, System: 5.077 s]
  Range (min … max):    8.324 s …  9.566 s    10 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Summary
  find ~ -name '*.rs' -printf '%M\t%s\t%p\n' ran
    1.43 ± 0.06 times faster than fselect mode,size,path from ~ where ext == 'rs'

Across all three runs, find spends about 3.1 s in system time while fselect spends about 5.0 s, so find seems to consistently use fewer or more efficient system calls.
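
To put a rough number on that gap, here is a small sketch that divides the System times reported by hyperfine above. The helper name sys_ratio is made up for illustration, and the inputs are just the figures copied from the three runs, so treat it as back-of-the-envelope arithmetic rather than a measurement.

```shell
#!/bin/sh
# sys_ratio is a hypothetical helper: it divides fselect's system time
# by find's system time (both in seconds, copied from the hyperfine output).
sys_ratio() {
    LC_ALL=C awk -v fselect_sys="$1" -v find_sys="$2" \
        'BEGIN { printf "%.2f\n", fselect_sys / find_sys }'
}

sys_ratio 5.041 3.123   # fselect name =~ '\.rs$'  vs  find -name
sys_ratio 5.018 3.161   # fselect path =~ '\.rs$'  vs  find -regex
sys_ratio 5.077 3.119   # fselect ext == 'rs'      vs  find -name
```

On these figures fselect spends roughly 1.6 times as long in system time as find in every run. Running each command under strace -c (assuming strace is installed) would be one way to see which calls make up the difference.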

May 24 '25 17:05 rickhg12hs