dr-scripts

[script] [common] major refactor of identify noun regex

Open KatoakDR opened this issue 4 years ago • 3 comments

🚧 Work in progress 🚧

Background

  • Increasingly, I need to detect the "noun" of an item as it's parsed from LOOK, RUMMAGE, game output, inventory lists, etc., and maintaining the current regex is not scalable
  • In collaboration with @asechrest and @MahtraDR, we have been working on new heuristics and modularizing the regex to reliably identify the noun
  • It's a daunting task, but one I enjoy working on, so here we are!

Changes

  • Rework the regex in the remove_flavor_text method to more reliably identify nouns of items
  • Evolution of the proposed regular expression and our tests are at https://regex101.com/r/4lGY6u/latest
    • The items in the "Test String" field in the bottom half of the site are items that don't parse yet
    • In the left side-bar, there's a "Unit Tests" tab (can take 10-20 minutes for them to complete) that shows current progress towards 100% compatibility
  • We have over 1,700 tests on regex101, and we continually add more
  • Data is sourced from our personal vaults, trader tables, and elanthipedia
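
To make the idea of a modular, heuristic noun regex concrete, here is a minimal Ruby sketch. It is not the regex from this change or from regex101: the module name NounSketch, the method extract_noun, the word lists, and the sample item strings are all hypothetical and chosen only for illustration. The heuristic assumed here is to strip a leading article, strip a "pair of"-style measure prefix, cut trailing flavor text, and treat the last remaining word as the noun.

```ruby
# Hypothetical sketch of a modular noun heuristic.
# NOT the actual remove_flavor_text implementation from dr-scripts.
module NounSketch
  # Leading articles and quantity words to drop.
  ARTICLE = /\A(?:an?|the|some|several)\s+/i
  # Measure-style prefixes such as "pair of"; the real noun follows them.
  MEASURE = /\A(?:pair|set|piece|bundle)\s+of\s+/i
  # Trailing descriptive clauses that usually come after the noun.
  FLAVOR = /\s+(?:with|of|that|which|labeled|titled)\b.*\z/i

  # Returns the guessed noun, or nil for an empty description.
  def self.extract_noun(description)
    text = description.strip.sub(/[.,!]+\z/, '') # drop trailing punctuation
    text = text.sub(ARTICLE, '').sub(MEASURE, '')
    text = text.sub(FLAVOR, '')
    text.split(/\s+/).last
  end
end

# A tiny table-driven check, in the spirit of keeping a test list per item.
# These item strings are made up; they are not from the regex101 test set.
EXAMPLES = {
  'a steel broadsword with a leather-wrapped hilt' => 'broadsword',
  'a pair of leather boots' => 'boots',
  'an oak staff of winds' => 'staff'
}.freeze

EXAMPLES.each do |description, expected|
  actual = NounSketch.extract_noun(description)
  status = actual == expected ? 'ok' : 'FAIL'
  puts format('%-50s => %s (%s)', description, actual, status)
end
```

Keeping each piece (article, measure, flavor) as its own named constant means a new heuristic can be added or adjusted without touching the others. A sketch this small will still misparse many real items, which is why a large test list matters.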

KatoakDR, Jan 17 '22 00:01

Huge kudos to @KatoakDR. This was a significant undertaking representing many hours of work, including setting up unit tests at regex101.com (and he even got the dev of that project to make some updates that helped us out).

The resulting regex is not only significantly shorter, but is far more approachable to future participants. I believe that this heuristic approach is about the best we can do without implementing a full natural language module, which might be more robust but would come with its own challenges, especially in application to DR.

asechrest, Jan 17 '22 16:01

> We have over 1,700 tests on regex101, and we continually add more

I'll take your word for testability! :)

The code looks a lot more readable! Let me know when it's ready to be merged.

rpherbig, Jan 17 '22 21:01

@KatoakDR

What are your thoughts on this? You put so much work into this. It's far better than what exists now, at least from a maintainability perspective. Do you think this is ready to push, with some of the hiccups cleaned up afterward?

asechrest, May 31 '22 00:05

Closing as abandoned.

MahtraDR, Feb 24 '23 21:02