AttributeError
When I run my own data set through decode, it gives an error:

for tool in tools:
    shuf = tool.decode(docs)
AttributeError                            Traceback (most recent call last)
~/anaconda3/lib/python3.7/site-packages/elit/component/tagger/pos_tagger.py in decode(self, docs, **kwargs)
     61         if isinstance(docs, Document):
     62             docs = [docs]
---> 63         samples = NLPTaskDataFetcher.convert_elit_documents(docs)
     64         with self.context:
     65             sentences = self.tagger.predict(samples)

~/anaconda3/lib/python3.7/site-packages/elit/component/tagger/corpus.py in convert_elit_documents(docs)
   1298     dataset = []
   1299     for d in docs:
-> 1300         for s in d.sentences:
   1301             sentence = Sentence()
   1302

AttributeError: 'str' object has no attribute 'sentences'
But it works fine for the example given in the documentation! Please help me figure this out.
Not sure what you put in docs, but it's supposed to be a str (it can contain many sentences; the tokenizer will then split it into several sentences).
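For example, a minimal sketch (assuming the EnglishTokenizer used below; the exact sentence split depends on the tokenizer):

from elit.component.tokenizer import EnglishTokenizer

# one str holding two sentences; the tokenizer detects the
# boundary and stores both sentences inside the Document
tok = EnglishTokenizer()
docs = tok.decode('France repaid the debt. The ministry confirmed it.')
for d in docs:
    print(len(d.sentences))  # expected: 2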
This is one sample from my docs (R8 dataset):
'france currency intervention debt france today repaid billion francs short term currency intervention debt european monetary cooperation fund finance ministry said said debt part billion franc liability incurred swap facilities defend franc january european monetary system realignment realignment following several weeks speculative pressure produced three pct revaluation west german mark dutch guilder french franc two pct revaluation belgian franc reuter'
Did you put a tokenizer in your tools? I just ran the POS and it works fine.
from elit.component import NERFlairTagger
from elit.component.tokenizer import EnglishTokenizer
from elit.structure import Document
tagger = NERFlairTagger()
tagger.load()
components = [EnglishTokenizer(), tagger]
docs = 'france currency intervention debt france today repaid billion francs short term currency intervention debt european monetary cooperation fund finance ministry said said debt part billion franc liability incurred swap facilities defend franc january european monetary system realignment realignment following several weeks speculative pressure produced three pct revaluation west german mark dutch guilder french franc two pct revaluation belgian franc reuter'
for c in components:
    docs = c.decode(docs)
for d in docs:  # type: Document
    print(d)
{'sens': [{'tok': ['france', 'currency', 'intervention', 'debt', 'france', 'today', 'repaid', 'billion', 'francs', 'short', 'term', 'currency', 'intervention', 'debt', 'european', 'monetary', 'cooperation', 'fund', 'finance', 'ministry', 'said', 'said', 'debt', 'part', 'billion', 'franc', 'liability', 'incurred', 'swap', 'facilities', 'defend', 'franc', 'january', 'european', 'monetary', 'system', 'realignment', 'realignment', 'following', 'several', 'weeks', 'speculative', 'pressure', 'produced', 'three', 'pct', 'revaluation', 'west', 'german', 'mark', 'dutch', 'guilder', 'french', 'franc', 'two', 'pct', 'revaluation', 'belgian', 'franc', 'reuter'], 'off': [(0, 6), (7, 15), (16, 28), (29, 33), (34, 40), (41, 46), (47, 53), (54, 61), (62, 68), (69, 74), (75, 79), (80, 88), (89, 101), (102, 106), (107, 115), (116, 124), (125, 136), (137, 141), (142, 149), (150, 158), (159, 163), (164, 168), (169, 173), (174, 178), (179, 186), (187, 192), (193, 202), (203, 211), (212, 216), (217, 227), (228, 234), (235, 240), (241, 248), (249, 257), (258, 266), (267, 273), (274, 285), (286, 297), (298, 307), (308, 315), (316, 321), (322, 333), (334, 342), (343, 351), (352, 357), (358, 361), (362, 373), (374, 378), (379, 385), (386, 390), (391, 396), (397, 404), (405, 411), (412, 417), (418, 421), (422, 425), (426, 437), (438, 445), (446, 451), (452, 458)], 'sid': 0, 'ner': [(0, 1, 'GPE'), (4, 5, 'GPE'), (5, 6, 'DATE'), (7, 9, 'MONEY'), (14, 15, 'NORP'), (24, 26, 'MONEY'), (32, 33, 'DATE'), (39, 41, 'DATE'), (44, 45, 'CARDINAL')]}], 'doc_id': 0}
Yes, I put it there:

tok = SpaceTokenizer()
tools = [tok, POS, sdp]
It works now for a single sentence, but I couldn't pass many instances. If I do, it throws the same error as I mentioned earlier.
docs = 'Sentence one. Sentence two.'
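If the instances should stay separate documents, a hedged sketch (raw_texts is a hypothetical list of plain strs standing in for your data set) is to run the whole pipeline once per string and collect the results:

raw_texts = ['first document text here', 'second document text here']
results = []
for text in raw_texts:
    doc = text
    for tool in tools:
        doc = tool.decode(doc)  # feed each tool's output into the next
    results.append(doc)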
tools = [tok, POS]
doc = shuffle_doc_words_list[0]
for tool in tools:
    doc = tool.decode(doc)
print(doc)
The code above works fine, but
for tool in tools:
    doc = tool.decode(shuffle_doc_words_list[0])
print(doc)
this second piece of code gives me an error:
AttributeError                            Traceback (most recent call last)
~/anaconda3/lib/python3.7/site-packages/elit/component/tagger/pos_tagger.py in decode(self, docs, **kwargs)
     61         if isinstance(docs, Document):
     62             docs = [docs]
---> 63         samples = NLPTaskDataFetcher.convert_elit_documents(docs)
     64         with self.context:
     65             sentences = self.tagger.predict(samples)

~/anaconda3/lib/python3.7/site-packages/elit/component/tagger/corpus.py in convert_elit_documents(docs)
   1298     dataset = []
   1299     for d in docs:
-> 1300         for s in d.sentences:
   1301             sentence = Sentence()
   1302

AttributeError: 'str' object has no attribute 'sentences'
I don't really understand the difference.
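For reference, the traceback above already hints at the difference; here are the same two loops annotated (a sketch over the snippets above, nothing new added):

# works: doc is rebound every iteration, so the POS tagger receives
# the tokenizer's Document output rather than the raw str
doc = shuffle_doc_words_list[0]
for tool in tools:
    doc = tool.decode(doc)

# fails: every tool is handed the original raw str, so the POS tagger
# ends up calling .sentences on a str and raises the AttributeError above
for tool in tools:
    doc = tool.decode(shuffle_doc_words_list[0])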