
[MODULE] - Noun chunker/splitter

Open LeonardPuettmannKern opened this issue 2 years ago • 0 comments

Please describe the module you would like to add to bricks

A brick that returns an embedding list containing only the nouns of a text, so that they can be used as pointers.

Do you already have an implementation?

ATTRIBUTE = "text"  # attribute of the record that holds a spaCy Doc

def noun_splitter(record):
    """Return the unique nouns found in the record's text."""
    nouns = []
    for sent in record[ATTRIBUTE].sents:
        # Keep tokens tagged as nouns; skip single-character tokens.
        nouns.extend(
            token.text for token in sent
            if token.pos_ == "NOUN" and len(token.text) > 1
        )
    # Deduplicate; note that set() does not preserve the original order.
    return list(set(nouns))
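
To try the idea without a spaCy model installed, here is a minimal sketch. `FakeToken` and `FakeDoc` are hypothetical stand-ins that mimic only the attributes the function reads (`.sents`, `.text`, `.pos_`); they are not part of spaCy or bricks. The variant `noun_splitter_ordered` is likewise an illustration rather than the proposed implementation: it swaps `set` for `dict.fromkeys` so that the output order is deterministic.

```python
from dataclasses import dataclass, field

ATTRIBUTE = "text"


@dataclass
class FakeToken:
    """Hypothetical stand-in for a spaCy Token (only .text and .pos_)."""
    text: str
    pos_: str


@dataclass
class FakeDoc:
    """Hypothetical stand-in for a spaCy Doc; .sents is a list of token lists."""
    sents: list = field(default_factory=list)


def noun_splitter_ordered(record):
    """Collect nouns per sentence and deduplicate in first-seen order."""
    nouns = []
    for sent in record[ATTRIBUTE].sents:
        nouns.extend(
            token.text for token in sent
            if token.pos_ == "NOUN" and len(token.text) > 1
        )
    # dict.fromkeys keeps the first occurrence of each noun, in order.
    return list(dict.fromkeys(nouns))


doc = FakeDoc(sents=[
    [FakeToken("The", "DET"), FakeToken("cat", "NOUN"),
     FakeToken("chased", "VERB"), FakeToken("the", "DET"),
     FakeToken("mouse", "NOUN")],
    [FakeToken("The", "DET"), FakeToken("mouse", "NOUN"),
     FakeToken("ate", "VERB"), FakeToken("cheese", "NOUN")],
])

result = noun_splitter_ordered({ATTRIBUTE: doc})
print(result)  # ['cat', 'mouse', 'cheese']
```

With spaCy itself, the record would instead hold a real `Doc`, for example one produced by `spacy.load("en_core_web_sm")`, assuming that model is installed; the tags assigned then depend on the model.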

Additional context

Can be implemented with spaCy.

LeonardPuettmannKern, Oct 18 '23 12:10