innatis
innatis copied to clipboard
Moredcai Geoparsing Entity Extractor
Mordecai is a full text geoparsing Python library. Extract the place names from a piece of text, resolve them to the correct place, and return their coordinates and structured geographic information.
>>> from mordecai import Geoparser
>>> geo = Geoparser()
>>> geo.geoparse("I traveled from Oxford to Ottawa.")
[{'country_conf': 0.96474487,
'country_predicted': 'GBR',
'geo': {'admin1': 'England',
'country_code3': 'GBR',
'feature_class': 'P',
'feature_code': 'PPLA2',
'geonameid': '2640729',
'lat': '51.75222',
'lon': '-1.25596',
'place_name': 'Oxford'},
'spans': [{'end': 22, 'start': 16}],
'word': 'Oxford'},
{'country_conf': 0.83302397,
'country_predicted': 'CAN',
'geo': {'admin1': 'Ontario',
'country_code3': 'CAN',
'feature_class': 'P',
'feature_code': 'PPLC',
'geonameid': '6094817',
'lat': '45.41117',
'lon': '-75.69812',
'place_name': 'Ottawa'},
'spans': [{'end': 32, 'start': 26}],
'word': 'Ottawa'}]
So this would be like the Duckling built-in entity extractor, but for places instead of time.
It will be heavy, because Mordecai requires a running Elasticsearch service with Geonames in it.
docker pull elasticsearch:5.5.2
wget https://s3.amazonaws.com/ahalterman-geo/geonames_index.tar.gz --output-file=wget_log.txt
tar -xzf geonames_index.tar.gz
docker run -d -p 127.0.0.1:9200:9200 -v $(pwd)/geonames_index/:/usr/share/elasticsearch/data elasticsearch:5.5.2