Shire abbreviation in the UK

Open: datamacgyver opened this issue 7 years ago • 1 comment

Hi!

I was checking out libpostal, and saw something that could be improved.


My country is

UK


Here's how I'm using libpostal

Currently just evaluating it as a tool we may want to use in a project


Here's what I did

Fed it the following (fake) address:

Home Farm Fen Lane Derbys DE5 2AO


Here's what I got

[('home farm', 'house'), ('fen lane derbys', 'road'), ('de5 2ao', 'postcode')]


Here's what I was expecting

[('home farm', 'house'), ('fen lane', 'road'), ('derbyshire', 'state_district'), ('de5 2ao', 'postcode')]


For parsing issues, please answer "yes" or "no" to all that apply.

  • Does the input address exist in OpenStreetMap? no
  • Do all the toponyms exist in OSM (city, state, region names, etc.)? no
  • If the address uses a rare/uncommon format, does changing the order of the fields yield the correct result? no
  • If the address does not contain city, region, etc., does adding those fields to the input improve the result? NA
  • If the address contains apartment/floor/sub-building information or uncommon formatting, does removing that help? Is there any minimum form of the address that gets the right parse? NA

Here's what I think could be improved

In the UK, many of our counties end in -shire (Derbyshire, Nottinghamshire, Yorkshire), because shires were historically a key administrative division. As this is so common, many counties ending in -shire are abbreviated in everyday usage to a short form ending in -s, for example Derbys (Derbyshire), Notts (Nottinghamshire), Yorks (Yorkshire). Detecting these abbreviations would be important for any UK-based user.
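
Until something like this is handled by the parser, one possible workaround is to expand the abbreviations before parsing. The following is only a minimal sketch: `COUNTY_ABBREVIATIONS` and `expand_county_abbreviations` are hypothetical names, not part of libpostal, and the table holds just the three examples above. It uses a lookup table because a form like Notts cannot be derived from Nottinghamshire by a simple suffix rule.

```python
import re

# Hypothetical lookup table (not part of libpostal); it holds only the
# three abbreviations mentioned in this issue and would need extending.
COUNTY_ABBREVIATIONS = {
    "derbys": "Derbyshire",
    "notts": "Nottinghamshire",
    "yorks": "Yorkshire",
}


def expand_county_abbreviations(address: str) -> str:
    """Replace known county abbreviations with the full county name."""

    def replace(match: re.Match) -> str:
        token = match.group(0)
        # Case-insensitive lookup; unknown tokens are returned unchanged.
        return COUNTY_ABBREVIATIONS.get(token.lower(), token)

    # Examine each run of letters, leaving digits and punctuation untouched.
    return re.sub(r"[A-Za-z]+", replace, address)


print(expand_county_abbreviations("Home Farm Fen Lane Derbys DE5 2AO"))
```

The expanded string could then be passed to the parser in place of the raw input. Whether that produces the expected `state_district` label has not been checked here.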

datamacgyver, Mar 05 '19 16:03

I think #418 covers this better: in your example you abbreviated Nottinghamshire to Notts, so it isn't just a matter of replacing shire -> s.

syserr0r, May 12 '22 15:05