usaddress

ERROR: Unable to tag this string because more than one area of the string has the same label

Status: open. rsingh2083 opened this issue 8 years ago; 8 comments.

While tagging this string:

usaddress.tag('Mr. Robbie Thomson,Cal. Hosp 2,Street 11, Block H,Jersey, New Jersey 121889,United States')

I'm getting this error:

---------------------------------------------------------------------------
RepeatedLabelError                        Traceback (most recent call last)
<ipython-input-41-410055ac0cac> in <module>()
----> 1 usaddress.tag('Mr. Robbie Thomson,Cal. Hosp 2,Street 11, Block H,Jersey, New Jersey 121889,United States')

C:\Users\Rahul\Anaconda2\lib\site-packages\usaddress\__init__.pyc in tag(address_string, tag_mapping)
    176         else:
    177             raise RepeatedLabelError(address_string, parse(address_string),
--> 178                                      label)
    179 
    180         last_label = label

RepeatedLabelError: 
ERROR: Unable to tag this string because more than one area of the string has the same label

ORIGINAL STRING:  Mr. Robbie Thomson,Cal. Hosp 2,Street 11, Block H,Jersey, New Jersey 121889,United States
PARSED TOKENS:    [(u'Mr.', 'Recipient'), (u'Robbie', 'Recipient'), (u'Thomson,', 'Recipient'), (u'Cal.', 'Recipient'), (u'Hosp', 'Recipient'), (u'2,', 'AddressNumber'), (u'Street', 'StreetNamePreType'), (u'11,', 'StreetName'), (u'Block', 'Recipient'), (u'H,', 'Recipient'), (u'Jersey,', 'Recipient'), (u'New', 'Recipient'), (u'Jersey', 'Recipient'), (u'121889,', 'AddressNumber'), (u'United', 'StreetName'), (u'States', 'StreetName')]
UNCERTAIN LABEL:  Recipient

When this error is raised, it's likely that either (1) the string is not a valid person/corporation name or (2) some tokens were labeled incorrectly

Posted by rsingh2083, May 12, 2017
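Why does this raise? Judging from the traceback, `usaddress.tag()` walks the `(token, label)` pairs from `usaddress.parse()` and keeps appending tokens to the current component while the label stays the same; if a label shows up again after a different label has intervened, it raises `RepeatedLabelError`. The sketch below reimplements only that check in plain Python so you can see which label trips it. The helper name `first_repeated_label` is hypothetical (not part of usaddress), and, as far as we can tell, the real `tag()` also applies `tag_mapping` and renames street labels after an `IntersectionSeparator`, which this sketch skips.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str]  # (token text, label), the shape usaddress.parse() returns


def first_repeated_label(tokens: List[Token]) -> Optional[str]:
    """Return the first label that reappears after a different label.

    Hypothetical helper mirroring the condition under which usaddress.tag()
    raises RepeatedLabelError; returns None if every label is one contiguous run.
    """
    seen = set()
    last_label = None
    for _token, label in tokens:
        if label == last_label:
            continue  # still inside the same component
        if label in seen:
            return label  # label came back after another label intervened
        seen.add(label)
        last_label = label
    return None


# First part of the PARSED TOKENS list from the error above.
parsed = [
    ("Mr.", "Recipient"), ("Robbie", "Recipient"), ("Thomson,", "Recipient"),
    ("Cal.", "Recipient"), ("Hosp", "Recipient"), ("2,", "AddressNumber"),
    ("Street", "StreetNamePreType"), ("11,", "StreetName"),
    ("Block", "Recipient"), ("H,", "Recipient"),
]
print(first_repeated_label(parsed))  # Recipient
```

The result matches the UNCERTAIN LABEL line in the error: `Recipient` is used for the name at the start and again for "Block H", so `tag()` cannot fold both into one component.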

Hey @rsingh2083,

Thanks for filing this! That's a real doozy of an address. I haven't been able to figure out what it's referring to.

If you can confirm that this is a valid address pattern, we'd be happy to bring it in as training data. We'll need 4-5 more examples of the pattern to be able to train the model reliably.

Posted by jeancochrane, May 12, 2017

I get a similar error for this address: 9234 N Loop 1604 W San Antonio TX 78249

Posted by gl-ronak, Aug 15, 2017

Hey @gl-ronak,

Can you tell me how you were expecting that address to be parsed? In particular, what does the second set of numerics (1604) refer to?

If you can find 3-4 more examples of this pattern, we'd be glad to bring it in as training data.

Posted by jeancochrane, Aug 15, 2017
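Contributing a pattern as training data means writing each example out with its labels. usaddress is built with parserator, and its labeled examples are stored as XML; the sketch below assumes the `<AddressCollection>` / `<AddressString>` layout with one child element per token named after its label (compare against an existing file in the repository's training data before submitting anything). The helper `to_training_xml` is hypothetical, and the labeling of the San Antonio address (treating `Loop` as `StreetNamePreType` and `1604` as `StreetName`) is only our guess at the intended parse; the thread never confirms it.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def to_training_xml(examples: List[List[Tuple[str, str]]]) -> str:
    """Build an <AddressCollection> document from lists of (token, label) pairs.

    Assumed layout: <AddressCollection><AddressString><Label>token</Label> ...
    Tokens within one AddressString are separated by a single space (element tail).
    """
    collection = ET.Element("AddressCollection")
    for tokens in examples:
        address = ET.SubElement(collection, "AddressString")
        for i, (token, label) in enumerate(tokens):
            child = ET.SubElement(address, label)
            child.text = token
            if i < len(tokens) - 1:
                child.tail = " "  # preserve spacing between tokens
    return ET.tostring(collection, encoding="unicode")


# Assumed labeling for the Loop 1604 address (not confirmed by the maintainers).
loop_1604 = [
    ("9234", "AddressNumber"),
    ("N", "StreetNamePreDirectional"),
    ("Loop", "StreetNamePreType"),
    ("1604", "StreetName"),
    ("W", "StreetNamePostDirectional"),
    ("San", "PlaceName"),
    ("Antonio", "PlaceName"),
    ("TX", "StateName"),
    ("78249", "ZipCode"),
]
print(to_training_xml([loop_1604]))
```

As the maintainers note, a single example is not enough; you would add several more addresses with the same pattern to the list passed to `to_training_xml`.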

I just experienced a similar issue.

usaddress.RepeatedLabelError: 
ERROR: Unable to tag this string because more than one area of the string has the same label

ORIGINAL STRING:  1407 7 Ave NW, Calgary, AB T2N 0Z3, Canada
PARSED TOKENS:    [('1407', 'AddressNumber'), ('7', 'StreetName'), ('Ave', 'StreetNamePostType'), ('NW,', 'StreetNamePostDirectional'), ('Calgary,', 'PlaceName'), ('AB', 'StateName'), ('T2N', 'OccupancyIdentifier'), ('0Z3,', 'OccupancyIdentifier'), ('Canada', 'PlaceName')]
UNCERTAIN LABEL:  PlaceName

I'm using this library to automate parsing data from Google Maps for input into an SF database of organizations we work with. I think I see where the error occurred, at "Calgary,". However, it is a Canadian address, so could that be expected?

Posted by NoahCardoza, Sep 29, 2020

@NoahCardoza I think in this case there are actually two things going on:

  1. The Canadian postal code format is pretty different from zip codes so the postal code is getting tagged as OccupancyIdentifier, which is probably throwing off the tagging of Canada and causing it to get tagged as a repeated PlaceName
  2. We don't support non-US countries, so there's no real valid tag for the Canada string anyway

I was able to get a slightly more sensible parse by removing Canada from the end of the string:

>>> usaddress.tag('1407 7 Ave NW, Calgary, AB T2N 0Z3')
(OrderedDict([('AddressNumber', '1407'), ('StreetName', '7'), ('StreetNamePostType', 'Ave'), ('StreetNamePostDirectional', 'NW'), ('PlaceName', 'Calgary'), ('StateName', 'AB T2N'), ('ZipCode', '0Z3')]), 'Street Address')

Posted by jeancochrane, Sep 30, 2020
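The workaround of removing the trailing country before calling `usaddress.tag()` can be automated. A minimal sketch: `strip_trailing_country` and the list of country names are hypothetical choices for illustration, not part of usaddress. For Canadian input the resulting tags are still only approximate, as the `StateName` / `ZipCode` split in the output above shows.

```python
import re
from typing import Iterable

# Hypothetical list of trailing country names to drop before tagging.
DEFAULT_COUNTRIES = ("United States of America", "United States", "USA", "US", "Canada")


def strip_trailing_country(address: str, countries: Iterable[str] = DEFAULT_COUNTRIES) -> str:
    """Remove a country name (and the comma/whitespace before it) from the end of the string."""
    # Try longer names first so "United States of America" wins over "United States".
    for country in sorted(countries, key=len, reverse=True):
        pattern = r"[,\s]*\b" + re.escape(country) + r"\.?\s*$"
        stripped = re.sub(pattern, "", address, flags=re.IGNORECASE)
        if stripped != address:
            return stripped
    return address


print(strip_trailing_country("1407 7 Ave NW, Calgary, AB T2N 0Z3, Canada"))
# 1407 7 Ave NW, Calgary, AB T2N 0Z3
# With usaddress installed you would then call:
# usaddress.tag(strip_trailing_country(raw_address))
```

The `\b` before each name keeps the pattern from eating the end of a place name such as "Columbus" when matching "US".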

Ah, that should probably be enough; we don't have many organizations in CA. However, what are your thoughts on https://github.com/datamade/usaddress/pull/254? I'm assuming you won't merge it, seeing as the name of this project is usaddress?

Posted by NoahCardoza, Oct 1, 2020

I don't expect we'll support Canadian addresses in the near future, but if you'd like to support them you might try training your own model using the supplemental training data in #254.

Posted by jeancochrane, Oct 1, 2020
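Since Canadian addresses are out of scope for usaddress, one option is to screen input and route likely-Canadian strings elsewhere (for example, to a model trained on the supplemental data) instead of tagging them. A rough sketch, assuming a simplified Canadian postal-code pattern (letter, digit, letter, optional space, digit, letter, digit) and a five- or nine-digit US ZIP pattern. The helper `looks_canadian` is hypothetical, and the regex does not enforce every Canada Post rule (such as the letters that are never used), so treat a match as a hint rather than proof.

```python
import re

# Simplified Canadian postal code, e.g. "T2N 0Z3" or "T2N0Z3".
CA_POSTAL = re.compile(r"\b[A-Z]\d[A-Z]\s?\d[A-Z]\d\b", re.IGNORECASE)
# US ZIP or ZIP+4, e.g. "78249" or "78249-1234".
US_ZIP = re.compile(r"\b\d{5}(?:-\d{4})?\b")


def looks_canadian(address: str) -> bool:
    """True if the string contains a Canadian-style postal code and no US ZIP code."""
    return bool(CA_POSTAL.search(address)) and not US_ZIP.search(address)


for addr in [
    "1407 7 Ave NW, Calgary, AB T2N 0Z3, Canada",
    "9234 N Loop 1604 W San Antonio TX 78249",
]:
    print(addr, "->", looks_canadian(addr))
```

Requiring the absence of a US ZIP is a deliberate bias toward sending ambiguous strings to usaddress, which is the library this thread is about.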

Hello,

I just encountered an error:

ORIGINAL STRING:  Bronx, New York City
PARSED TOKENS:    [('Bronx,', 'PlaceName'), ('New', 'StateName'), ('York', 'PlaceName'), ('City', 'PlaceName')]
UNCERTAIN LABEL:  PlaceName

It seems to be a valid place: https://www.britannica.com/place/Bronx-borough-New-York-City

Posted by LeandroLosaria, Mar 10, 2022
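When `tag()` raises on input like this, a caller can fall back to the raw token list instead of failing outright. The commented usage below assumes the exception exposes the tokens as `parsed_string`, as shown in the usaddress README; check your installed version. The hypothetical helper `group_runs` groups consecutive tokens that share a label into segments but, unlike `tag()`, keeps repeated labels as separate segments, stripping trailing commas and semicolons in roughly the way `tag()` cleans its components. It does not fix the labels themselves: "New" is still tagged `StateName`.

```python
from typing import List, Tuple


def group_runs(tokens: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Group consecutive tokens sharing a label into (label, text) segments.

    Repeated labels become separate segments instead of raising an error.
    """
    segments: List[Tuple[str, List[str]]] = []
    for token, label in tokens:
        if segments and segments[-1][0] == label:
            segments[-1][1].append(token)  # extend the current run
        else:
            segments.append((label, [token]))  # start a new run
    return [(label, " ".join(words).strip(" ,;")) for label, words in segments]


# Assumed usage with usaddress installed (not executed here):
# try:
#     tagged, address_type = usaddress.tag(raw)
# except usaddress.RepeatedLabelError as e:
#     tagged = group_runs(e.parsed_string)

tokens = [("Bronx,", "PlaceName"), ("New", "StateName"), ("York", "PlaceName"), ("City", "PlaceName")]
print(group_runs(tokens))
# [('PlaceName', 'Bronx'), ('StateName', 'New'), ('PlaceName', 'York City')]
```

Returning a list rather than a dict is the design choice that avoids the error: a dict keyed by label can hold only one `PlaceName`, which is exactly the constraint `tag()` enforces.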