Add data for building names with/without apartments
Overview
This branch adds training data so the model identifies building names more reliably, whether or not the address also includes an apartment.
- Closes #391
Demo
- input:
- 6906 Donachie Rd TowsonTown Place Apartments, Apt 1203 Baltimore, MD 21239
- output:
- ('6906', 'AddressNumber'), ('Donachie', 'StreetName'), ('Rd', 'StreetNamePostType'), ('TowsonTown', 'BuildingName'), ('Place', 'BuildingName'), ('Apartments,', 'BuildingName'), ('Apt', 'OccupancyType'), ('1203', 'OccupancyIdentifier'), ('Baltimore,', 'PlaceName'), ('MD', 'StateName'), ('21239', 'ZipCode')
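To make the demo output easier to read, the tagged tokens can be collapsed into one value per component. This is only a sketch: `merge_runs` is a hypothetical helper, not part of this branch, and it works on the token list copied from the output above rather than calling the parser.

```python
from itertools import groupby

# Tagged tokens copied verbatim from the demo output above.
tagged = [
    ('6906', 'AddressNumber'), ('Donachie', 'StreetName'),
    ('Rd', 'StreetNamePostType'), ('TowsonTown', 'BuildingName'),
    ('Place', 'BuildingName'), ('Apartments,', 'BuildingName'),
    ('Apt', 'OccupancyType'), ('1203', 'OccupancyIdentifier'),
    ('Baltimore,', 'PlaceName'), ('MD', 'StateName'), ('21239', 'ZipCode'),
]


def merge_runs(pairs):
    """Join consecutive tokens that share a label (hypothetical helper)."""
    merged = []
    for label, group in groupby(pairs, key=lambda pair: pair[1]):
        # Join the run with spaces and drop a trailing comma left by tokenizing.
        text = " ".join(token for token, _ in group).rstrip(",")
        merged.append((label, text))
    return merged


components = merge_runs(tagged)
for label, text in components:
    print(f"{label}: {text}")
```

The three `BuildingName` tokens come out as the single value `TowsonTown Place Apartments`, which is the behavior this branch is aiming for.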
Notes
This adds a fair amount of training data, because building names were tricky to get right while still passing all of our regression test addresses. Even so, it is still imperfect: some apartment names are genuinely ambiguous. The model seems to do well when there is some indication that it is looking at a building name, such as a leading "The".
Testing Instructions
- Confirm that the test suite passes
- Pull down this branch and install this version to locally spot check some addresses
- After opening a venv, install and train model with:
pip install -e ".[dev]" -v
- Example address:
5136 Oaklawn Rd Gwynnbrook Townhomes, Unit CA4810 Baltimore, MD 21207
Hi, just a question: would this PR fix my issue here? The issue this PR closes seems to describe the same error. Thank you.
Hi @gelodefaultbrain! Unfortunately, this PR resolves a different issue than the one you've linked. Here we're improving the parsing of building names specifically, while your issue seems to be about multiple occupancy identifiers in the same address.