Assembling social movement organizations from Stanford tags
The Stanford NER tagger tags each word individually rather than returning whole names, so every word is marked as part of an SMO or not. For example, Occupy Wall Street is returned as [('Occupy', 'ORGANIZATION'), ('Wall', 'ORGANIZATION'), ('Street', 'ORGANIZATION')].
To parse this into a single string, I've assumed that all consecutive ORGANIZATION tags belong to the same SMO. Does this seem like a reasonably robust approach, or should we try to come up with something else?
It seems to work as long as punctuation is tokenized separately (e.g. the SMOs in a list are separated by commas that are not tagged ORGANIZATION), but I probably haven't thought of all the edge cases.
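For reference, the consecutive-tag assumption can be sketched roughly like this. It is only an illustration, not the code in the repo: the function name merge_org_tokens and the sample input are made up, and the 'O' tag for non-entity tokens is assumed to follow the usual Stanford NER output.

```python
from itertools import groupby


def merge_org_tokens(tagged, label="ORGANIZATION"):
    """Join each run of consecutive `label` tokens into one string.

    `tagged` is a list of (token, tag) pairs, as in the example above.
    The function name and signature are hypothetical.
    """
    names = []
    # groupby splits the sequence into runs of tokens that are / are not `label`.
    for is_org, run in groupby(tagged, key=lambda pair: pair[1] == label):
        if is_org:
            names.append(" ".join(token for token, _ in run))
    return names


# Hand-written sample: two SMOs separated by a comma tagged 'O' (assumed).
tagged = [
    ("Occupy", "ORGANIZATION"), ("Wall", "ORGANIZATION"), ("Street", "ORGANIZATION"),
    (",", "O"),
    ("Greenpeace", "ORGANIZATION"),
    ("and", "O"),
    ("others", "O"),
]

print(merge_org_tokens(tagged))  # ['Occupy Wall Street', 'Greenpeace']
```

The comma being its own non-ORGANIZATION token is what keeps the two names apart here.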
Yep, that's right, that's how it works. I think it's generally robust, but we ought to test it against some odd edge cases.
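A few edge cases worth trying might look like the sketch below. The tags are hand-written guesses for illustration, not real tagger output, and merge_runs is a hypothetical stand-in for whatever function does the merging.

```python
def merge_runs(tagged, label="ORGANIZATION"):
    """Merge consecutive `label` tokens into single names (hypothetical helper)."""
    names, current = [], []
    for token, tag in tagged:
        if tag == label:
            current.append(token)
        elif current:
            # A non-matching tag closes the current run.
            names.append(" ".join(current))
            current = []
    if current:
        # Flush a run that reaches the end of the input.
        names.append(" ".join(current))
    return names


# Hand-written tag sequences; a real tagger may label these differently.
cases = {
    # Two SMOs back to back with no separating token: they get merged.
    "adjacent": [
        ("Greenpeace", "ORGANIZATION"),
        ("Sierra", "ORGANIZATION"), ("Club", "ORGANIZATION"),
    ],
    # Function words inside a name tagged 'O': the name gets split.
    "split": [
        ("Students", "ORGANIZATION"), ("for", "O"), ("a", "O"),
        ("Democratic", "ORGANIZATION"), ("Society", "ORGANIZATION"),
    ],
    # SMO as the last tokens of the input: the trailing run must be kept.
    "trailing": [
        ("joined", "O"), ("Occupy", "ORGANIZATION"),
        ("Wall", "ORGANIZATION"), ("Street", "ORGANIZATION"),
    ],
}

for name, tagged in cases.items():
    print(name, merge_runs(tagged))
```

The first two cases are where the consecutive-tag assumption would give the wrong answer, if the tagger ever produces tags like these.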
In reply to Erle Holgersen's message of Mon, Jun 12, 2017 at 12:57 PM.
-- Alex Hanna alex-hanna.com @alexhanna