
AttributeError

Jeevi10 opened this issue on Mar 09 '20 · 7 comments

When I ran decode on my own data set, it gave an error:

```python
for tool in tools:
    shuf = tool.decode(docs)
```

```
AttributeError                            Traceback (most recent call last)
<ipython-input> in <module>
      1 for tool in tools:
----> 2     shuf = tool.decode(docs)

~/anaconda3/lib/python3.7/site-packages/elit/component/tagger/pos_tagger.py in decode(self, docs, **kwargs)
     61         if isinstance(docs, Document):
     62             docs = [docs]
---> 63         samples = NLPTaskDataFetcher.convert_elit_documents(docs)
     64         with self.context:
     65             sentences = self.tagger.predict(samples)

~/anaconda3/lib/python3.7/site-packages/elit/component/tagger/corpus.py in convert_elit_documents(docs)
   1298         dataset = []
   1299         for d in docs:
-> 1300             for s in d.sentences:
   1301                 sentence = Sentence()
   1302 

AttributeError: 'str' object has no attribute 'sentences'
```

But it works fine for the example given in the documentation! Please help me figure this out.

Jeevi10 · Mar 09 '20

Not sure what you put in docs, but it's supposed to be a str (it can contain many sentences; the tokenizer will then split it into several sentences).
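A minimal sketch of that pattern, assuming the tokenizer runs before the tagger as in the full example below (the text itself is a placeholder):

```python
from elit.component.tokenizer import EnglishTokenizer

docs = 'First sentence. Second sentence.'  # a plain str holding several sentences
docs = EnglishTokenizer().decode(docs)     # the tokenizer splits the str into Documents
# docs can now be passed to a tagger's decode()
```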

hankcs · Mar 09 '20

> Not sure what you put in docs, but it's supposed to be a str (it can contain many sentences; the tokenizer will then split it into several sentences).

This is one of the samples from my docs (the R8 dataset):

'france currency intervention debt france today repaid billion francs short term currency intervention debt european monetary cooperation fund finance ministry said said debt part billion franc liability incurred swap facilities defend franc january european monetary system realignment realignment following several weeks speculative pressure produced three pct revaluation west german mark dutch guilder french franc two pct revaluation belgian franc reuter'

Jeevi10 · Mar 09 '20

Did you put a tokenizer in your tools? I just ran the POS tagger and it works fine.

```python
from elit.component import NERFlairTagger
from elit.component.tokenizer import EnglishTokenizer
from elit.structure import Document

tagger = NERFlairTagger()
tagger.load()
components = [EnglishTokenizer(), tagger]
docs = 'france currency intervention debt france today repaid billion francs short term currency intervention debt european monetary cooperation fund finance ministry said said debt part billion franc liability incurred swap facilities defend franc january european monetary system realignment realignment following several weeks speculative pressure produced three pct revaluation west german mark dutch guilder french franc two pct revaluation belgian franc reuter'
for c in components:
    docs = c.decode(docs)
for d in docs:  # type: Document
    print(d)
```

```
{'sens': [{'tok': ['france', 'currency', 'intervention', 'debt', 'france', 'today', 'repaid', 'billion', 'francs', 'short', 'term', 'currency', 'intervention', 'debt', 'european', 'monetary', 'cooperation', 'fund', 'finance', 'ministry', 'said', 'said', 'debt', 'part', 'billion', 'franc', 'liability', 'incurred', 'swap', 'facilities', 'defend', 'franc', 'january', 'european', 'monetary', 'system', 'realignment', 'realignment', 'following', 'several', 'weeks', 'speculative', 'pressure', 'produced', 'three', 'pct', 'revaluation', 'west', 'german', 'mark', 'dutch', 'guilder', 'french', 'franc', 'two', 'pct', 'revaluation', 'belgian', 'franc', 'reuter'], 'off': [(0, 6), (7, 15), (16, 28), (29, 33), (34, 40), (41, 46), (47, 53), (54, 61), (62, 68), (69, 74), (75, 79), (80, 88), (89, 101), (102, 106), (107, 115), (116, 124), (125, 136), (137, 141), (142, 149), (150, 158), (159, 163), (164, 168), (169, 173), (174, 178), (179, 186), (187, 192), (193, 202), (203, 211), (212, 216), (217, 227), (228, 234), (235, 240), (241, 248), (249, 257), (258, 266), (267, 273), (274, 285), (286, 297), (298, 307), (308, 315), (316, 321), (322, 333), (334, 342), (343, 351), (352, 357), (358, 361), (362, 373), (374, 378), (379, 385), (386, 390), (391, 396), (397, 404), (405, 411), (412, 417), (418, 421), (422, 425), (426, 437), (438, 445), (446, 451), (452, 458)], 'sid': 0, 'ner': [(0, 1, 'GPE'), (4, 5, 'GPE'), (5, 6, 'DATE'), (7, 9, 'MONEY'), (14, 15, 'NORP'), (24, 26, 'MONEY'), (32, 33, 'DATE'), (39, 41, 'DATE'), (44, 45, 'CARDINAL')]}], 'doc_id': 0}
```
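The printed result is dict-shaped; a hypothetical way to read tokens and entity spans out of it, assuming Document supports dict-style access as the printed form above suggests:

```python
for d in docs:
    for sent in d['sens']:
        print(sent['tok'])  # token list
        print(sent['ner'])  # (begin, end, label) spans over the token list
```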

hankcs · Mar 09 '20

> Did you put a tokenizer in your tools? I just ran the POS tagger and it works fine.

Yes, I put it there: tools = [tok, POS, sdp], with tok = SpaceTokenizer().
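For completeness, a sketch of that setup; the SpaceTokenizer import path is assumed to mirror EnglishTokenizer's, and POS/sdp stand for tagger components loaded elsewhere:

```python
from elit.component.tokenizer import SpaceTokenizer  # assumed import path

tok = SpaceTokenizer()   # splits on whitespace only
tools = [tok, POS, sdp]  # POS and sdp: previously loaded tagger/parser components
```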

Jeevi10 · Mar 09 '20


It works now for a single sentence, but I couldn't pass many instances. If I do, it throws the same error I mentioned earlier.

Jeevi10 · Mar 09 '20

> It works now for a single sentence, but I couldn't pass many instances. If I do, it throws the same error I mentioned earlier.

```python
docs = 'Sentence one. Sentence two.'
```
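Several sentences can travel in one str, as above. For many separate instances, a sketch under the assumption that each raw string goes through the full pipeline, tokenizer first, as in the earlier example (raw_texts is a hypothetical stand-in for a list like shuffle_doc_words_list):

```python
raw_texts = ['Sentence one. Sentence two.', 'Another instance here.']  # hypothetical inputs

results = []
for text in raw_texts:
    docs = text
    for tool in tools:            # tools = [tok, POS, sdp] as defined above
        docs = tool.decode(docs)  # rebind, so each tool consumes the previous tool's output
    results.extend(docs)
```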

hankcs · Mar 09 '20

```python
tools = [tok, POS]
doc = shuffle_doc_words_list[0]
for tool in tools:
    doc = tool.decode(doc)
print(doc)
```

The code above works fine, but

```python
for tool in tools:
    doc = tool.decode(shuffle_doc_words_list[0])
print(doc)
```

the piece of code above gives me an error:

```
AttributeError                            Traceback (most recent call last)
<ipython-input> in <module>
      2 #doc=shuffle_doc_words_list[0]
      3 for tool in tools:
----> 4     doc = tool.decode(shuffle_doc_words_list[0])
      5     print(doc)

~/anaconda3/lib/python3.7/site-packages/elit/component/tagger/pos_tagger.py in decode(self, docs, **kwargs)
     61         if isinstance(docs, Document):
     62             docs = [docs]
---> 63         samples = NLPTaskDataFetcher.convert_elit_documents(docs)
     64         with self.context:
     65             sentences = self.tagger.predict(samples)

~/anaconda3/lib/python3.7/site-packages/elit/component/tagger/corpus.py in convert_elit_documents(docs)
   1298         dataset = []
   1299         for d in docs:
-> 1300             for s in d.sentences:
   1301                 sentence = Sentence()
   1302 

AttributeError: 'str' object has no attribute 'sentences'
```

I don't really understand the difference.

Jeevi10 · Mar 10 '20
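The difference comes down to what each decode call receives; a minimal sketch of one reading, reusing the names above:

```python
raw = shuffle_doc_words_list[0]  # a plain str

# Working loop: doc is rebound every iteration, so the tokenizer's output
# (tokenized Documents) is what the POS tagger receives.
doc = raw
for tool in tools:
    doc = tool.decode(doc)

# Failing loop: every tool receives the raw str, so the POS tagger gets a str
# instead of Documents and raises
# AttributeError: 'str' object has no attribute 'sentences'
for tool in tools:
    doc = tool.decode(raw)
```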