
Morphology support

Open • vgel opened this issue 5 years ago • 1 comment

Right now terminal tokens have to be separate words. Treebender should be able to support morphological rules:

V[ stem: t ] -> walk
V[ stem: t ] -> talk
// stem: f to block walkedededededededed...
V[ tense: past, stem: f ] -> V[ stem: t ] ++ ed  // syntax TBD

Questions:

  • What scope do we want here? Are we only supporting basic concatenative morphology (prefixes and suffixes), or will we try to support allomorphy, sound changes / ablaut, Semitic roots, and so on?
    • It's tempting to say we just focus on English, support only concatenative morphology, and allow the user to fall back to explicit rules with a flag:
        V[ can_inflect: y ] -> walk
        V[ can_inflect: n ] -> buy
        V[ tense: past, can_inflect: n ] -> V[ can_inflect: y ] ++ ed
        V[ tense: past, can_inflect: n ] -> bought
    • However, lots of common English words have spelling changes like bake ~ baked, not *bakeed. There's no real way to support that without a more sophisticated tool or tons of duplicate rules.
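To make the bake ~ baked problem concrete, here is a minimal Python sketch of the flag-based fallback. It is not Treebender code: `LEXICON`, `IRREGULAR_PAST`, and `past_tense` are hypothetical names that only mimic the `can_inflect` rules above, assuming plain string concatenation for `++`.

```python
# Hypothetical mini-lexicon mirroring the can_inflect flag from the rules above.
LEXICON = {
    "walk": {"can_inflect": True},
    "buy": {"can_inflect": False},
    "bake": {"can_inflect": True},
}

# Irregular forms are listed explicitly, like `V[ tense: past, ... ] -> bought`.
IRREGULAR_PAST = {"buy": "bought"}


def past_tense(verb: str) -> str:
    """Return the past form using only concatenation plus an explicit fallback."""
    if verb in IRREGULAR_PAST:
        return IRREGULAR_PAST[verb]
    if LEXICON[verb]["can_inflect"]:
        # Pure concatenation: the stem is never adjusted.
        return verb + "ed"
    raise ValueError(f"no past-tense rule for {verb!r}")


for verb in ("walk", "buy", "bake"):
    print(verb, "->", past_tense(verb))
```

Under these assumptions, walk and buy come out as expected, but bake yields the unwanted *bakeed unless it is also marked `can_inflect: n` and given its own explicit rule, which is where the duplicate rules come from.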
    
    

Todo:

  • Remind myself of how the LKB does this

Oct 19 '20 23:10 vgel

One way to approach this would actually be to just allow grammar files to define a token-splitting process that runs before parsing.

Something like:

$splitters = [
    /(.+)ed/ => [\1, -ed]
    /(.+)d/  => [\1, -ed] // for words like "baked"
    /(.+)s/  => [\1, -s]
    /(.+)es/ => [\1, -s]
]

Every splitter that matches a word would then apply, along with an implicit "no expansion" splitter, so a sentence splits into a bunch of possible morphological derivations:

"The dogs walked to the beach and baked"
"The dogs walk -ed to the beach and baked"
"The dogs walke -ed to the beach and baked"
"The dog -s walked to the beach and baked"
"The dog -s walk -ed to the beach and baked"
"The dog -s walke -ed to the beach and baked"
"The dogs walked to the beach and bak -ed"
"The dogs walk -ed to the beach and bak -ed"
"The dogs walke -ed to the beach and bak -ed"
"The dog -s walked to the beach and bak -ed"
"The dog -s walk -ed to the beach and bak -ed"
"The dog -s walke -ed to the beach and bak -ed"
"The dogs walked to the beach and bake -ed"
"The dogs walk -ed to the beach and bake -ed"
"The dogs walke -ed to the beach and bake -ed"
"The dog -s walked to the beach and bake -ed"
==> "The dog -s walk -ed to the beach and bake -ed"
"The dog -s walke -ed to the beach and bake -ed"
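A rough Python sketch of that expansion step, for illustration only. The `$splitters` syntax is still TBD, so `SPLITTERS`, `split_word`, and `derivations` are hypothetical names, and the sketch assumes each pattern must match the whole word (the patterns above are unanchored).

```python
import re
from itertools import product

# Hypothetical encoding of the $splitters table: (pattern, suffix token).
SPLITTERS = [
    (re.compile(r"(.+)ed"), "-ed"),
    (re.compile(r"(.+)d"), "-ed"),  # for words like "baked"
    (re.compile(r"(.+)s"), "-s"),
    (re.compile(r"(.+)es"), "-s"),
]


def split_word(word):
    """Return every candidate token sequence for one word."""
    options = [[word]]  # the implicit "no expansion" splitter
    for pattern, suffix in SPLITTERS:
        match = pattern.fullmatch(word)  # assumption: whole-word match
        if match:
            candidate = [match.group(1), suffix]
            if candidate not in options:
                options.append(candidate)
    return options


def derivations(sentence):
    """Combine the per-word candidates into every possible token string."""
    per_word = [split_word(word) for word in sentence.split()]
    return [
        " ".join(token for word_tokens in combo for token in word_tokens)
        for combo in product(*per_word)
    ]


results = derivations("The dogs walked to the beach and baked")
print(len(results))
print("The dog -s walk -ed to the beach and bake -ed" in results)
```

One thing the sketch surfaces: under the whole-word assumption, `/(.+)d/` also matches "and" (giving "an -ed"), so it produces more candidates than the derivations listed above.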

Obviously this has the potential to blow up combinatorially, but we could fail fast whenever a splitter generates a token that doesn't match any nonterminal in the grammar.
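The fail-fast idea could look roughly like the sketch below: prune each word's candidates before combining them, so bad splits never multiply. Everything here is an assumption for illustration: `KNOWN_TERMINALS` stands in for whatever set of tokens the grammar can actually consume, `CANDIDATES` is hand-written sample splitter output, and lower-casing tokens for lookup is an arbitrary choice.

```python
from itertools import product

# Hypothetical set of tokens the grammar has rules for.
KNOWN_TERMINALS = {"the", "dog", "walk", "to", "beach", "and", "bake", "-s", "-ed"}

# Hand-written per-word candidates, as a splitter pass might produce them.
CANDIDATES = [
    [["The"]],
    [["dogs"], ["dog", "-s"]],
    [["walked"], ["walk", "-ed"], ["walke", "-ed"]],
    [["to"]],
    [["the"]],
    [["beach"]],
    [["and"], ["an", "-ed"]],
    [["baked"], ["bak", "-ed"], ["bake", "-ed"]],
]


def prune(candidates, terminals):
    """Drop any split containing an unknown token; None if a word has no split left."""
    pruned = []
    for options in candidates:
        kept = [
            option
            for option in options
            if all(token.lower() in terminals for token in option)
        ]
        if not kept:
            return None  # fail fast: this word cannot be analysed at all
        pruned.append(kept)
    return pruned


def surviving_derivations(candidates, terminals):
    """Build full token strings only from the splits that survived pruning."""
    pruned = prune(candidates, terminals)
    if pruned is None:
        return []
    return [
        " ".join(token for option in combo for token in option)
        for combo in product(*pruned)
    ]


print(surviving_derivations(CANDIDATES, KNOWN_TERMINALS))
```

With this sample lexicon only one derivation survives, because pruning happens per word and the product is taken over what is left rather than over every candidate.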

Oct 20 '20 00:10 vgel