
AttributeError

Jeevi10 opened this issue on Mar 09 '20 · 7 comments

When I ran decode on my own data set, it gave an error:

```python
for tool in tools:
    shuf = tool.decode(docs)
```

```
AttributeError                            Traceback (most recent call last)
<ipython-input> in <module>
      1 for tool in tools:
----> 2     shuf = tool.decode(docs)

~/anaconda3/lib/python3.7/site-packages/elit/component/tagger/pos_tagger.py in decode(self, docs, **kwargs)
     61         if isinstance(docs, Document):
     62             docs = [docs]
---> 63         samples = NLPTaskDataFetcher.convert_elit_documents(docs)
     64         with self.context:
     65             sentences = self.tagger.predict(samples)

~/anaconda3/lib/python3.7/site-packages/elit/component/tagger/corpus.py in convert_elit_documents(docs)
   1298         dataset = []
   1299         for d in docs:
-> 1300             for s in d.sentences:
   1301                 sentence = Sentence()
   1302 

AttributeError: 'str' object has no attribute 'sentences'
```

But it works fine for the example given in the documentation! Please help me figure this out.

Jeevi10 · Mar 09 '20

Not sure what you put in docs, but it's supposed to be a str (it can contain many sentences; the tokenizer will then split it into several sentences).
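A minimal sketch of that pattern, assuming the tokenizer runs before the tagger as in the full example below (the text itself is a placeholder):

```python
from elit.component.tokenizer import EnglishTokenizer

docs = 'First sentence. Second sentence.'  # a plain str holding several sentences
docs = EnglishTokenizer().decode(docs)     # the tokenizer splits the str into Documents
# docs can now be passed to a tagger's decode()
```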

hankcs · Mar 09 '20

> Not sure what you put in docs, but it's supposed to be a str (it can contain many sentences; the tokenizer will then split it into several sentences).

This is one of the samples from my docs (the R8 dataset):

'france currency intervention debt france today repaid billion francs short term currency intervention debt european monetary cooperation fund finance ministry said said debt part billion franc liability incurred swap facilities defend franc january european monetary system realignment realignment following several weeks speculative pressure produced three pct revaluation west german mark dutch guilder french franc two pct revaluation belgian franc reuter'

Jeevi10 · Mar 09 '20

Did you put a tokenizer in your tools? I just ran the POS tagger and it works fine.

```python
from elit.component import NERFlairTagger
from elit.component.tokenizer import EnglishTokenizer
from elit.structure import Document

tagger = NERFlairTagger()
tagger.load()
components = [EnglishTokenizer(), tagger]
docs = 'france currency intervention debt france today repaid billion francs short term currency intervention debt european monetary cooperation fund finance ministry said said debt part billion franc liability incurred swap facilities defend franc january european monetary system realignment realignment following several weeks speculative pressure produced three pct revaluation west german mark dutch guilder french franc two pct revaluation belgian franc reuter'
for c in components:
    docs = c.decode(docs)
for d in docs:  # type: Document
    print(d)
```

```
{'sens': [{'tok': ['france', 'currency', 'intervention', 'debt', 'france', 'today', 'repaid', 'billion', 'francs', 'short', 'term', 'currency', 'intervention', 'debt', 'european', 'monetary', 'cooperation', 'fund', 'finance', 'ministry', 'said', 'said', 'debt', 'part', 'billion', 'franc', 'liability', 'incurred', 'swap', 'facilities', 'defend', 'franc', 'january', 'european', 'monetary', 'system', 'realignment', 'realignment', 'following', 'several', 'weeks', 'speculative', 'pressure', 'produced', 'three', 'pct', 'revaluation', 'west', 'german', 'mark', 'dutch', 'guilder', 'french', 'franc', 'two', 'pct', 'revaluation', 'belgian', 'franc', 'reuter'], 'off': [(0, 6), (7, 15), (16, 28), (29, 33), (34, 40), (41, 46), (47, 53), (54, 61), (62, 68), (69, 74), (75, 79), (80, 88), (89, 101), (102, 106), (107, 115), (116, 124), (125, 136), (137, 141), (142, 149), (150, 158), (159, 163), (164, 168), (169, 173), (174, 178), (179, 186), (187, 192), (193, 202), (203, 211), (212, 216), (217, 227), (228, 234), (235, 240), (241, 248), (249, 257), (258, 266), (267, 273), (274, 285), (286, 297), (298, 307), (308, 315), (316, 321), (322, 333), (334, 342), (343, 351), (352, 357), (358, 361), (362, 373), (374, 378), (379, 385), (386, 390), (391, 396), (397, 404), (405, 411), (412, 417), (418, 421), (422, 425), (426, 437), (438, 445), (446, 451), (452, 458)], 'sid': 0, 'ner': [(0, 1, 'GPE'), (4, 5, 'GPE'), (5, 6, 'DATE'), (7, 9, 'MONEY'), (14, 15, 'NORP'), (24, 26, 'MONEY'), (32, 33, 'DATE'), (39, 41, 'DATE'), (44, 45, 'CARDINAL')]}], 'doc_id': 0}
```
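The printed result is dict-shaped; a hypothetical way to read tokens and entity spans out of it, assuming Document supports dict-style access as the printed form above suggests:

```python
for d in docs:
    for sent in d['sens']:
        print(sent['tok'])  # token list
        print(sent['ner'])  # (begin, end, label) spans over the token list
```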

hankcs · Mar 09 '20

> Did you put a tokenizer in your tools? I just ran the POS tagger and it works fine.

Yes, I put it there: tools = [tok, POS, sdp], with tok = SpaceTokenizer().
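For completeness, a sketch of that setup; the SpaceTokenizer import path is assumed to mirror EnglishTokenizer's, and POS/sdp stand for tagger components loaded elsewhere:

```python
from elit.component.tokenizer import SpaceTokenizer  # assumed import path

tok = SpaceTokenizer()   # splits on whitespace only
tools = [tok, POS, sdp]  # POS and sdp: previously loaded tagger/parser components
```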

Jeevi10 · Mar 09 '20


It works now for a single sentence, but I couldn't pass many instances. If I do, it throws the same error I mentioned earlier.

Jeevi10 · Mar 09 '20

> It works now for a single sentence, but I couldn't pass many instances. If I do, it throws the same error I mentioned earlier.

```python
docs = 'Sentence one. Sentence two.'
```
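Several sentences can travel in one str, as above. For many separate instances, a sketch under the assumption that each raw string goes through the full pipeline, tokenizer first, as in the earlier example (raw_texts is a hypothetical stand-in for a list like shuffle_doc_words_list):

```python
raw_texts = ['Sentence one. Sentence two.', 'Another instance here.']  # hypothetical inputs

results = []
for text in raw_texts:
    docs = text
    for tool in tools:            # tools = [tok, POS, sdp] as defined above
        docs = tool.decode(docs)  # rebind, so each tool consumes the previous tool's output
    results.extend(docs)
```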

hankcs · Mar 09 '20

```python
tools = [tok, POS]
doc = shuffle_doc_words_list[0]
for tool in tools:
    doc = tool.decode(doc)
print(doc)
```

The code above works fine, but

```python
for tool in tools:
    doc = tool.decode(shuffle_doc_words_list[0])
print(doc)
```

the piece of code above gives me an error:

```
AttributeError                            Traceback (most recent call last)
<ipython-input> in <module>
      2 #doc=shuffle_doc_words_list[0]
      3 for tool in tools:
----> 4     doc = tool.decode(shuffle_doc_words_list[0])
      5     print(doc)

~/anaconda3/lib/python3.7/site-packages/elit/component/tagger/pos_tagger.py in decode(self, docs, **kwargs)
     61         if isinstance(docs, Document):
     62             docs = [docs]
---> 63         samples = NLPTaskDataFetcher.convert_elit_documents(docs)
     64         with self.context:
     65             sentences = self.tagger.predict(samples)

~/anaconda3/lib/python3.7/site-packages/elit/component/tagger/corpus.py in convert_elit_documents(docs)
   1298         dataset = []
   1299         for d in docs:
-> 1300             for s in d.sentences:
   1301                 sentence = Sentence()
   1302 

AttributeError: 'str' object has no attribute 'sentences'
```

I don't really understand the difference.

Jeevi10 · Mar 10 '20
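The difference comes down to what each decode call receives; a minimal sketch of one reading, reusing the names above:

```python
raw = shuffle_doc_words_list[0]  # a plain str

# Working loop: doc is rebound every iteration, so the tokenizer's output
# (tokenized Documents) is what the POS tagger receives.
doc = raw
for tool in tools:
    doc = tool.decode(doc)

# Failing loop: every tool receives the raw str, so the POS tagger gets a str
# instead of Documents and raises
# AttributeError: 'str' object has no attribute 'sentences'
for tool in tools:
    doc = tool.decode(raw)
```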