Caching compiled `re` patterns
First of all, thank you for creating this Python port.
In my use case, I'm checking a large number of short strings to see whether they are likely to be phone numbers. I found that I'm making a lot of calls to `re.compile()` (for example here). Caching the compiled patterns leads to roughly a 6-10x speedup for me.
If this is something that would make sense given the contributing guidelines, I'd be happy to do a PR.
Hmm, I was under the impression that the Python `re` code cached compiled expressions, so getting a speed-up from an external cache surprises me slightly. I guess you might be blowing past the size of the (global) cache -- are you doing lookups across lots of different countries?
Contributions welcome -- thanks. As an initial thought, one possibility might be to add `..._re` fields to the `phonemetadata.py` classes for each field that's a regexp (e.g. `national_number_pattern_re`), which get populated on construction (as all the metadata objects are supposed to be immutable).
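As a rough sketch of that idea, using a simplified stand-in class rather than the real `phonemetadata.py` definitions (the class below is hypothetical; only the `_re` field naming mirrors the suggestion):

```python
import re

# Simplified stand-in for a metadata class: the regexp string is compiled
# once at construction, which is safe because instances are treated as
# immutable after they are built.
class PhoneNumberDesc:
    def __init__(self, national_number_pattern=None):
        self.national_number_pattern = national_number_pattern
        self.national_number_pattern_re = (
            re.compile(national_number_pattern)
            if national_number_pattern is not None
            else None
        )

desc = PhoneNumberDesc(national_number_pattern=r"[2-9]\d{9}")
print(bool(desc.national_number_pattern_re.match("2025550123")))
```

Callers would then use the precompiled `..._re` attribute instead of passing the pattern string to `re.match()` on every lookup.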
(However, we'd need to check how much effect precompiling 300 sets of metadata regexps would have on library startup...)
Thank you for your response. I might get a chance to look deeper into it in the next week or two.
The cache in `re` is limited to 100 entries in CPython 2.7 and 512 in CPython 3.6, so on Python 2.7 at least it's quite easy to blow past `re._MAXCACHE`.
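You can inspect the limit directly, with the caveat that `re._MAXCACHE` is a private implementation detail whose value varies across CPython versions:

```python
import re

# re._MAXCACHE is private and version-dependent (100 on CPython 2.7,
# 512 since CPython 3.6); don't depend on it in production code.
print(re._MAXCACHE)

# Compiling more distinct patterns than the limit flushes the module-level
# cache, so later module-level calls like re.match() recompile patterns.
for i in range(re._MAXCACHE + 1):
    re.match("x{%d}" % i, "xxx")
```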
Any chance of adding an option/flag to compile everything in advance, so the package is faster at runtime? Thanks!