[script] [common] major refactor of identify noun regex
🚧 Work in progress 🚧
Background
- Increasingly, I need to detect the "noun" of an item as it's parsed from LOOK, RUMMAGE, game output, inventory lists, etc., and maintaining the current regex is not scalable
- In collaboration with @asechrest and @MahtraDR, we have been working on new heuristics and modularizing the regex to reliably identify the noun
- It's a daunting task, but one I enjoy working on, so here we are!
Changes
- Rework the regex in the `remove_flavor_text` method to more reliably identify nouns of items
- Evolution of the proposed regular expression and our tests are at https://regex101.com/r/4lGY6u/latest
- The items in the "Test String" field in the bottom half of the site are items that don't parse yet
- In the left side-bar, there's a "Unit Tests" tab (can take 10-20 minutes for them to complete) that shows current progress towards 100% compatibility
- We have over 1,700 tests on regex101, and we continually add more
- Data is sourced from our personal vaults, trader tables, and elanthipedia
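For reviewers who want a feel for the general idea, here is a minimal sketch of a modular, table-tested noun heuristic. It is not the regex from this PR: the `NounSketch` module, the `guess_noun` helper, the fragment names, the word list in `FLAVOR`, and the sample item strings are all hypothetical and exist only for illustration. The assumption is that the noun is the last word left once a leading article and any trailing flavor text are removed.

```ruby
# Hypothetical sketch only; the real pattern lives in remove_flavor_text.
module NounSketch
  # Leading article or quantifier (assumed list).
  ARTICLE = /(?:an?|some|the)\s+/i
  # Trailing flavor text introduced by a connecting word (assumed list).
  FLAVOR = /\s+(?:with|that|bearing|inlaid|set)\b.*/i

  module_function

  # Strip the article and the flavor text, then treat the last
  # remaining word as the noun. Returns nil for an empty string.
  def guess_noun(description)
    core = description.strip
                      .sub(/\A#{ARTICLE}/, '')
                      .sub(/#{FLAVOR}\z/, '')
    core.split.last
  end
end

# Table of made-up item strings and the noun we expect for each.
EXPECTED = {
  'a steel sword' => 'sword',
  'some chain gloves' => 'gloves',
  'a leather backpack with brass buckles' => 'backpack',
  'an oak staff inlaid with silver' => 'staff'
}.freeze

EXPECTED.each do |text, noun|
  actual = NounSketch.guess_noun(text)
  status = actual == noun ? 'ok' : "expected #{noun}"
  puts format('%-40s => %s (%s)', text, actual, status)
end
```

Keeping each fragment in its own named constant means a new connecting word or article is a one-line change, and the table doubles as a regression check in the same spirit as the regex101 unit tests.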
Huge kudos to @KatoakDR. This was a significant undertaking representing many hours of work, including setting up unit tests at regex101.com (and he even got the dev of that project to make some updates that helped us out).
The resulting regex is not only significantly shorter, but is far more approachable to future participants. I believe that this heuristic approach is about the best we can do without implementing a full natural language module, which might be more robust but would come with its own challenges, especially in application to DR.
> We have over 1,700 tests on regex101, and we continually add more
I'll take your word for testability! :)
The code looks a lot more readable! Let me know when it's ready to be merged.
@KatoakDR
What are your thoughts on this? You put so much work into this. It's far better than what exists now, at least from a maintainability perspective. Do you think this is ready to push, with the remaining hiccups cleaned up afterward?
Closing as abandoned.