mpeds

Assembling social movement organizations from Stanford tags

Open · issue #8 opened by erleholgersen 8 years ago · 1 comment

The Stanford NER tagger tags individual words, not whole names, as organizations or not. For example, Occupy Wall Street is returned as [('Occupy', 'ORGANIZATION'), ('Wall', 'ORGANIZATION'), ('Street', 'ORGANIZATION')].

To merge these tokens into a single string, I've assumed that every run of consecutive ORGANIZATION tags belongs to the same SMO. Does this seem like a reasonably robust approach, or should we come up with something else?

It seems to work as long as punctuation is tokenized separately, so that the SMOs in a list are split apart by commas that are not tagged as organizations. I probably haven't thought of all the edge cases, though.
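A minimal sketch of the consecutive-tag rule in Python, assuming the tagger output is a list of (word, tag) pairs like the one quoted in this issue and that non-entity tokens carry the tag "O". The function name merge_org_runs is hypothetical, and the tag sequences in the list examples are hand-written for illustration, not real Stanford NER output.

```python
from itertools import groupby

def merge_org_runs(tagged, label="ORGANIZATION"):
    """Join each maximal run of consecutive `label` tokens into one name.

    `tagged` is a list of (word, tag) pairs, as in the tagger output
    quoted in this issue. merge_org_runs is a hypothetical helper name.
    """
    names = []
    # groupby yields one group per run of identical *adjacent* tags
    for tag, run in groupby(tagged, key=lambda pair: pair[1]):
        if tag == label:
            names.append(" ".join(word for word, _ in run))
    return names

# The example from this issue.
occupy = [("Occupy", "ORGANIZATION"), ("Wall", "ORGANIZATION"), ("Street", "ORGANIZATION")]
print(merge_org_runs(occupy))  # ['Occupy Wall Street']

# Hand-written tags (assumed, not real tagger output): the comma and "and"
# are separate tokens tagged "O", so they split the list into three SMOs.
listed = [
    ("SEIU", "ORGANIZATION"), (",", "O"),
    ("Black", "ORGANIZATION"), ("Lives", "ORGANIZATION"), ("Matter", "ORGANIZATION"),
    ("and", "O"),
    ("Occupy", "ORGANIZATION"), ("Wall", "ORGANIZATION"), ("Street", "ORGANIZATION"),
]
print(merge_org_runs(listed))  # ['SEIU', 'Black Lives Matter', 'Occupy Wall Street']

# Edge case (also hand-written): if the comma stays glued to the previous
# word and that word is tagged ORGANIZATION, two SMOs collapse into one run.
glued = [("SEIU,", "ORGANIZATION"), ("Black", "ORGANIZATION"),
         ("Lives", "ORGANIZATION"), ("Matter", "ORGANIZATION")]
print(merge_org_runs(glued))  # ['SEIU, Black Lives Matter']
```

Because itertools.groupby only groups adjacent items, it implements the consecutive-run assumption directly; the same property means two SMOs with no non-organization token between them cannot be told apart.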

erleholgersen · Jun 12 '17 16:06

Yep, that's right; that's how it works. I think it's generally robust, but we ought to test it on some odd edge cases.


-- Alex Hanna alex-hanna.com @alexhanna

alexhanna · Jun 12 '17 16:06