utils icon indicating copy to clipboard operation
utils copied to clipboard

RegEx is slow (can we make faster hex functions?)

Open davidmurdoch opened this issue 7 months ago • 2 comments

Noticed a recent attempt at speeding up some functions by switching to RegEx. But RegExes are usually slower than iterating over strings.

For example, this function:

https://github.com/MetaMask/utils/blob/1bea7f378c37117260912af1921556bc57110d20/src/hex.ts#L60

Is probably about 4-5x slower than something like this:

function isHexAddress(str) {
  if (typeof str !== "string" || str.length !== 42) return false;
  if (str[0] !== '0' || str[1] !== 'x') return false;
  for (let i = 2; i < 42; i++) {
    let code = str.charCodeAt(i);
    // numbers and lowercase a - f
    if (!((code >= 48 && code <= 57) || (code >= 97 && code <= 102))) {
      return false;
    }
  }
  return true;
}

(don't take my word for it. this should be tested further)

davidmurdoch avatar Jul 08 '25 22:07 davidmurdoch

I got into some further optimizations:

function isHexAddressCaseInsensitiveFast(str) {
  if (typeof str !== "string" || str.length !== 42) return false;
  if (str[0] !== '0' || str[1] !== 'x') return false;

  for (let i = 2; i < 42; i++) {
    // transform the character code for case-insensitivity
    const charCode = str.charCodeAt(i) & 0xDF;

    // after transformation, '0'-'9' are in the range 16-25
    const numberRange = (charCode - 16) | (25 - charCode);
    // after transformation, 'a'-'f' are in the range 65-70 ('A'-'F'), `A`-`F` remains unchanged
    const letterRange = (charCode - 65) | (70 - charCode);

    // identifies if the charCode is OUTSIDE of both valid ranges.
    if (((numberRange & letterRange) & 0x80000000) !== 0) {
      return false;
    }
  }

  return true;
}

for a 50% spread of 1000000 valid/invalid mix-cased strings the runtimes are:

isHexAddressCaseInsensitive: 238.701ms (uses `HEX_CHECKSUM_ADDRESS_REGEX`)
isHexAddressCaseInsensitiveFast: 50.609ms

Ok, I'm going to stop now. Use bitwise responsibly. :-)

davidmurdoch avatar Jul 08 '25 22:07 davidmurdoch

@davidmurdoch I thought your view was that these changes to the hex functions were in the end micro-optimizations? Do you think it's still worth it to optimize them further?

mcmire avatar Jul 09 '25 22:07 mcmire