irregex icon indicating copy to clipboard operation
irregex copied to clipboard

Do slow bit-manipulation routines slow irregex materially?

Open gambiteer opened this issue 4 months ago • 2 comments

In a comment here:

https://racket.discourse.group/t/scramble-regexp-question/3319/10

I note that the bit-manipulation routines in irregex are written in terms of general arithmetic operations and are likely much slower than a native implementation might be.

I haven't gone through the irregex source to see how these routines are used, but I have a question: Might replacing these routines with native versions speed irregex noticeably? I'm asking because irregex has been shipped with Gambit for a while.

gambiteer avatar Oct 03 '25 16:10 gambiteer

IrRegex should be fast if it can compile the pattern to a DFA, and slow otherwise.

In the DFA case the inner loops are dfa-match/{shortest,longest}, and these don't use bitwise operations. Those are mostly used during DFA compilation. Tuning other operations in those loops may help though.

Benchmarking regexps is tricky. Highly tuned backtracking implementations tend to be fastest for a lot of cases, but fail miserably when you encounter exponential backtracking.

However some patterns force an exponential explosion of DFA states, forcing IrRegex to fall back to backtracking.

There is a test case in the SRFI 115 reference implementation (chibi regexp) which kills both backtracking and DFA implementations. It can only be matched with a Thomson-style non-backtracking NFA.

-- Alex

On Sat, Oct 4, 2025 at 1:11 AM Bradley Lucier @.***> wrote:

gambiteer created an issue (ashinn/irregex#48) https://github.com/ashinn/irregex/issues/48

In a comment here:

https://racket.discourse.group/t/scramble-regexp-question/3319/10

I note that the bit-manipulation routines in irregex are written in terms of general arithmetic operations and are likely much slower than a native implementation might be.

I haven't gone through the irregex source to see how these routines are used, but I have a question: Might replacing these routines with native versions speed irregex noticeably? I'm asking because irregex has been shipped with Gambit for a while.

— Reply to this email directly, view it on GitHub https://github.com/ashinn/irregex/issues/48, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABHCIREMW7NXYBCH454K6DT3V2N4FAVCNFSM6AAAAACIHCEHL6VHI2DSMVQWIX3LMV43ASLTON2WKOZTGQ4DCNRWGQ2TONA . You are receiving this because you are subscribed to this thread.Message ID: @.***>

ashinn avatar Oct 04 '25 03:10 ashinn

At one point I thought of installing this regex benchmarking framework

https://github.com/BurntSushi/rebar

to see how irregex might compare with other engines and to see what might be done with hotspots after some instrumentation.

Something for the future.

gambiteer avatar Oct 04 '25 04:10 gambiteer