Do slow bit-manipulation routines slow irregex materially?
In a comment here:
https://racket.discourse.group/t/scramble-regexp-question/3319/10
I note that the bit-manipulation routines in irregex are written in terms of general arithmetic operations and are likely much slower than a native implementation might be.
I haven't gone through the irregex source to see how these routines are used, but I have a question: Might replacing these routines with native versions speed irregex noticeably? I'm asking because irregex has been shipped with Gambit for a while.
IrRegex should be fast if it can compile the pattern to a DFA, and slow otherwise.
In the DFA case the inner loops are dfa-match/{shortest,longest}, and these don't use bitwise operations. Those are mostly used during DFA compilation. Tuning other operations in those loops may help though.
Benchmarking regexps is tricky. Highly tuned backtracking implementations tend to be fastest for a lot of cases, but fail miserably when you encounter exponential backtracking.
However some patterns force an exponential explosion of DFA states, forcing IrRegex to fall back to backtracking.
There is a test case in the SRFI 115 reference implementation (chibi regexp) which kills both backtracking and DFA implementations. It can only be matched with a Thomson-style non-backtracking NFA.
-- Alex
On Sat, Oct 4, 2025 at 1:11 AM Bradley Lucier @.***> wrote:
gambiteer created an issue (ashinn/irregex#48) https://github.com/ashinn/irregex/issues/48
In a comment here:
https://racket.discourse.group/t/scramble-regexp-question/3319/10
I note that the bit-manipulation routines in irregex are written in terms of general arithmetic operations and are likely much slower than a native implementation might be.
I haven't gone through the irregex source to see how these routines are used, but I have a question: Might replacing these routines with native versions speed irregex noticeably? I'm asking because irregex has been shipped with Gambit for a while.
— Reply to this email directly, view it on GitHub https://github.com/ashinn/irregex/issues/48, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABHCIREMW7NXYBCH454K6DT3V2N4FAVCNFSM6AAAAACIHCEHL6VHI2DSMVQWIX3LMV43ASLTON2WKOZTGQ4DCNRWGQ2TONA . You are receiving this because you are subscribed to this thread.Message ID: @.***>
At one point I thought of installing this regex benchmarking framework
https://github.com/BurntSushi/rebar
to see how irregex might compare with other engines and to see what might be done with hotspots after some instrumentation.
Something for the future.