usaddress

Add data for building names with/without apartments

Open xmedr opened this issue 8 months ago • 1 comments

Overview

This branch makes the model a bit better at identifying building names.

  • Closes #391

Demo

  • input:
    • 6906 Donachie Rd TowsonTown Place Apartments, Apt 1203 Baltimore, MD 21239
  • output:
    • ('6906', 'AddressNumber'), ('Donachie', 'StreetName'), ('Rd', 'StreetNamePostType'), ('TowsonTown', 'BuildingName'), ('Place', 'BuildingName'), ('Apartments,', 'BuildingName'), ('Apt', 'OccupancyType'), ('1203', 'OccupancyIdentifier'), ('Baltimore,', 'PlaceName'), ('MD', 'StateName'), ('21239', 'ZipCode')
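To make the demo output easier to read, here is a minimal sketch that collapses adjacent tokens sharing a label into one component. The token/label pairs are copied from the output above; `group_by_label` is a hypothetical helper written for illustration and is not part of usaddress.

```python
from itertools import groupby

# Token/label pairs copied from the demo output above.
parsed = [
    ("6906", "AddressNumber"),
    ("Donachie", "StreetName"),
    ("Rd", "StreetNamePostType"),
    ("TowsonTown", "BuildingName"),
    ("Place", "BuildingName"),
    ("Apartments,", "BuildingName"),
    ("Apt", "OccupancyType"),
    ("1203", "OccupancyIdentifier"),
    ("Baltimore,", "PlaceName"),
    ("MD", "StateName"),
    ("21239", "ZipCode"),
]


def group_by_label(pairs):
    """Join runs of adjacent tokens that share a label (hypothetical helper)."""
    grouped = []
    for label, run in groupby(pairs, key=lambda pair: pair[1]):
        # Join the tokens in the run and drop a trailing comma, if any.
        text = " ".join(token for token, _ in run).rstrip(",")
        grouped.append((label, text))
    return grouped


components = group_by_label(parsed)
for label, text in components:
    print(f"{label}: {text}")
```

With this grouping, the three `BuildingName` tokens read as the single building name "TowsonTown Place Apartments".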

Notes

This adds a good amount of training data, because building names were tricky to get right while still passing all of our regression test addresses. Even so, the model is still imperfect: there are some really ambiguous apartment names out there. It does seem to do well when there's some indication that it's looking at a building name, such as a "The" at the beginning.

Testing Instructions

  • Confirm that the test suite passes
  • Pull down this branch and install this version to locally spot check some addresses
    • After opening a venv, install and train the model with: pip install -e ".[dev]" -v
    • Example address: 5136 Oaklawn Rd Gwynnbrook Townhomes, Unit CA4810 Baltimore, MD 21207
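For the local spot check, a small script along these lines may help. It is only a sketch: it assumes this branch is installed in the active venv and relies on `usaddress.parse` returning a list of `(token, label)` tuples, as in the demo output. `building_name`, `spot_check`, and `ADDRESSES` are hypothetical names introduced here.

```python
# Example addresses taken from this PR description.
ADDRESSES = [
    "6906 Donachie Rd TowsonTown Place Apartments, Apt 1203 Baltimore, MD 21239",
    "5136 Oaklawn Rd Gwynnbrook Townhomes, Unit CA4810 Baltimore, MD 21207",
]


def building_name(parsed):
    """Join every token labeled BuildingName into one string (hypothetical helper)."""
    tokens = [token for token, label in parsed if label == "BuildingName"]
    return " ".join(tokens).rstrip(",")


def spot_check(addresses, parse=None):
    """Map each address to the building name the parser found, if any."""
    if parse is None:
        # Assumption: this branch of usaddress is installed in the current venv.
        import usaddress
        parse = usaddress.parse
    return {address: building_name(parse(address)) for address in addresses}
```

Calling `spot_check(ADDRESSES)` and printing the result shows which tokens, if any, the trained model labeled as `BuildingName` for each address. An empty string means no token received that label.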

xmedr avatar Jun 11 '25 20:06 xmedr

Hi, just a question: would this PR fix my issue here? It seems to involve the same error, judging by the issue this PR will close. Thank you.

gelodefaultbrain avatar Jun 18 '25 04:06 gelodefaultbrain

Hi @gelodefaultbrain! Unfortunately, this PR resolves a different issue than the one you've linked. Here we're specifically improving how building names are parsed, while your issue seems to be more about multiple occupancy identifiers in the same address.

xmedr avatar Jul 10 '25 14:07 xmedr