state abbreviations are not globally unique
Expected Behavior
All countries states are not mixed up
Actual Behavior
Some countries states are mixed up (For instance 82 for state_abbr)
Steps to reproduce the problem
Run without specifying a country
cut -d' ' -f4,5 allCountries.txt | grep 82 | cut -d' ' -f1 | sort | uniq
insert
-d' '
with CTRL+V TAB
These adm1 share 82 as abbr: (As example) Çankiri Khulna Division Královéhradecký kraj Phang Nga Pomerania Region Midtjylland
- Current Version: 1.0.2
- Operating System: FreeBSD
A possible workaround Instead of https://github.com/midwire/free_zipcode_data/blob/292a336047f2c3f313480d4c9ff5dab5164567fd/lib/free_zipcode_data/db_table.rb#L48
def get_state_id(state_abbr, state_name)
sql = "SELECT id FROM states
WHERE abbr = '#{state_abbr}' and name = '#{escape_single_quotes(state_name)}'"
res = select_first(sql)
if(res == nil)
sql = "SELECT id FROM states WHERE name = '#{escape_single_quotes(state_name)}'"
res=select_first(sql)
end
return res
end
The fallback to test only for name is required because of mixed up state_abbr in Denmark's data (One State (Region) has two abbr's for the same state name i.e the state has one name but falsely in the dataset two abbr's, (More of data a date quality issue) and perhaps other countries as well. EDIT: Perhaps using country_id in the state_id lookup would be better... BTW South Africa does not have a adm1 (state) so I used country for state like this
def write(row)
if(row[:short_state] == nil || row[:short_state] == '')
row[:short_state] = row[:country]
row[:state] = country_lookup_table[row[:country]][:name]
end
return nil unless row[:short_state]
row[:state] = 'Marshall Islands' if row[:short_state] == 'MH' && row[:state].nil?
country_id = get_country_id(row[:country])
sql = <<-SQL
INSERT INTO states (abbr, name, country_id)
VALUES ('#{row[:short_state]}',
'#{escape_single_quotes(row[:state])}',
#{country_id}
)
SQL
begin
database.execute(sql)
rescue SQLite3::ConstraintException
# Swallow duplicates
end
Thanks for this. A pull request would be most appreciated.