cogcomp-nlpy

infinite loop?

Open: danyaljj opened this issue 8 years ago • 9 comments

Inside the `get_view` function we call `add_view`, and inside `add_view` we call `get_view`. This mutual recursion risks falling into an infinite recursive loop.
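To make the hazard concrete, here is a minimal sketch that models only the call structure described above. `ToyTextAnnotation` is hypothetical, not the sioux class, and the constructor's `server_response` dict stands in for the real server call. The sketch is written for Python 3, where running out of stack raises `RecursionError` (Python 2.7 raises a plain `RuntimeError`, as in the traceback below).

```python
class ToyTextAnnotation:
    """Hypothetical stand-in modelling only the get_view/add_view call structure."""

    def __init__(self, server_response):
        # server_response: dict of view name -> view data, standing in for
        # whatever the server returns for this text
        self.server_response = server_response
        self.view_dictionary = {}

    def get_view(self, view_name):
        if view_name not in self.view_dictionary:
            # Stand-in for self.pipeline.call_server(self.text, view_name)
            response = dict(self.server_response)
            self.add_view(view_name, response)
        return self.view_dictionary[view_name]

    def add_view(self, view_name, response):
        # Store every view the response contains ...
        for name, view in response.items():
            self.view_dictionary[name] = view
        # ... then look the requested view up through get_view. If the response
        # lacked view_name, get_view calls the server and add_view again:
        # get_view -> add_view -> get_view -> ... until the recursion limit.
        return self.get_view(view_name)


complete = ToyTextAnnotation({"TOKENS": ["Hello"], "NER_CONLL": ["PER"]})
print(complete.get_view("NER_CONLL"))  # ['PER']

missing = ToyTextAnnotation({"TOKENS": []})  # requested view absent
try:
    missing.get_view("NER_CONLL")
except RecursionError:
    print("RecursionError: get_view and add_view kept calling each other")
```

The loop only terminates if the response always contains the requested view; nothing in the call structure itself guarantees that.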

danyaljj, May 03 '17 22:05

Here is an example that reproduces it:

```python
## get trump speeches
import urllib2
url = "https://raw.githubusercontent.com/ryanmcdermott/trump-speeches/master/speeches.txt"
speeches = urllib2.urlopen(url)
speechText = speeches.read()

firstFewLines = speechText[:2000]

## create a remote pipeline
from sioux import remote_pipeline

p = remote_pipeline.RemotePipeline(server_api='http://austen.cs.illinois.edu:5800')

for d in firstFewLines.splitlines():
    print d
    print "---- \n"
    doc = p.doc(d)
    ner_view = doc.get_ner_conll
    print ner_view
```

And here is the error:

```
SPEECH 1


...Thank you so much.  That's so nice.  Isn't he a great guy.  He doesn't get a fair press; he doesn't get it.  It's just not fair.  And I have to tell you I'm here, and very strongly here, because I have great respect for Steve King and have great respect likewise for Citizens United, David and everybody, and tremendous resect for the Tea Party.  Also, also the people of Iowa.  They have something in common.  Hard-working people.  They want to work, they want to make the country great.  I love the people of Iowa.  So that's the way it is.  Very simple.
With that said, our country is really headed in the wrong direction with a president who is doing an absolutely terrible job.  The world is collapsing around us, and many of the problems we've caused.  Our president is either grossly incompetent, a word that more and more people are using, and I think I was the first to use it, or he has a completely different agenda than you want to know about, which could be possible.  In any event, Washington is broken, and our country is in serious trouble and total disarray.  Very simple.  Politicians are all talk, no action.  They are all talk and no action.  And it's constant; it never ends.
And I'm a conservative, actually very conservative, and I'm a Republican.  And I'm very disappointed by our Republican politicians.  Because they let the president get away with absolute murder.  You see always, oh we're going to do this, we're going to--.  Nothing ever happens; nothing ever happens.
You look at Obamacare.  A total catastrophe and by the way it really kicks in in '16 and it is going to be a disaster.  People are closing up shops.  Doctors are quitting the business.  I have a friend of mine who's a doctor, a very good doctor, a very successful guy.  He said, I have more accountants than I have patients.  And he needs because it is so complicated and so terrible and he's never had that before and he's going to close up his business.  And he was very succe
SPEECH 1
----

NER_CONLL view: this view does not have constituents in your input text.

----

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-50-417d3012cc1a> in <module>()
     29     print "---- \n"
     30     doc = p.doc(d)
---> 31     ner_view = doc.get_ner_conll
     32     print ner_view
     33 

/Users/daniel/.virtualenvs/python-utils/lib/python2.7/site-packages/sioux/core/text_annotation.pyc in get_ner_conll(self)
     67         @return: View Instance of the NER_CONLL view.
     68         """
---> 69         return self.get_view("NER_CONLL")
     70 
     71     @property

/Users/daniel/.virtualenvs/python-utils/lib/python2.7/site-packages/sioux/core/text_annotation.pyc in get_view(self, view_name)
    159             if view_name not in self.view_dictionary:
    160                 additional_response = self.pipeline.call_server(self.text, view_name)
--> 161                 self.add_view(view_name, additional_response)
    162 
    163             if type(self.view_dictionary[view_name]) != type([]):

/Users/daniel/.virtualenvs/python-utils/lib/python2.7/site-packages/sioux/core/text_annotation.pyc in add_view(self, view_name, response)
    145                 self.view_dictionary[name] = self._view_builder(view)
    146 
--> 147         requested_view = self.get_view(view_name)
    148         if requested_view is None:
    149             # "token" view will always be included

... last 2 frames repeated, from the frame below ...

/Users/daniel/.virtualenvs/python-utils/lib/python2.7/site-packages/sioux/core/text_annotation.pyc in get_view(self, view_name)
    159             if view_name not in self.view_dictionary:
    160                 additional_response = self.pipeline.call_server(self.text, view_name)
--> 161                 self.add_view(view_name, additional_response)
    162 
    163             if type(self.view_dictionary[view_name]) != type([]):

RuntimeError: maximum recursion depth exceeded in cmp
```

danyaljj, May 03 '17 22:05

I agree that it is not safe and I will change it.

I will also look into it, because this infinite loop shouldn't happen: by the time we call `get_view` inside `add_view`, the view should already have been created.

GHLgh, May 03 '17 23:05

I see your point that it shouldn't happen, but unfortunately it actually happened to me :-/ Trying to understand why it's happening ...

danyaljj, May 04 '17 03:05

So the infinite loop happens when the user passes in an empty string.

When the server is called with an empty string, the response contains the "TOKENS" view (with no constituents) but none of the other requested views. Because the requested view never makes it into `view_dictionary`, `get_view` and `add_view` keep calling each other endlessly.
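One way to break the cycle, sketched under assumptions: `add_view` only stores what came back, and `get_view` checks once after a single request and reports a missing view instead of asking again. `SafeTextAnnotation` is hypothetical, its one-argument `add_view` differs from the real `add_view(view_name, response)` signature, and the `server_response` dict stands in for `call_server`.

```python
import logging

logger = logging.getLogger(__name__)


class SafeTextAnnotation:
    """Hypothetical variant in which add_view never calls get_view."""

    def __init__(self, server_response):
        # Stand-in for the server: dict of view name -> view data
        self.server_response = server_response
        self.view_dictionary = {}

    def add_view(self, response):
        # Only store the views that came back; no call back into get_view.
        for name, view in response.items():
            self.view_dictionary[name] = view

    def get_view(self, view_name):
        if view_name not in self.view_dictionary:
            # One request, standing in for pipeline.call_server(text, view_name)
            self.add_view(dict(self.server_response))
        if view_name not in self.view_dictionary:
            # The server replied (e.g. only "TOKENS" for an empty string) but the
            # requested view is absent: report it once instead of asking again.
            logger.error("%s view not present in server response", view_name)
            return None
        return self.view_dictionary[view_name]


empty = SafeTextAnnotation({"TOKENS": []})
print(empty.get_view("NER_CONLL"))  # None (an error is logged, no recursion)
print(empty.get_view("TOKENS"))     # []
```

Returning `None` here is just one choice; raising a descriptive exception would also end the loop while making the failure harder to ignore.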

My suggestion is to check the string length when the user calls pipeline.doc(string) and log an error on an empty string. But do you still want to create a TextAnnotation for an empty string (to be consistent with the response from the server)?
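The length check suggested above could look roughly like this. `checked_doc` and `fake_annotate` are hypothetical names; `annotate` stands in for whatever `pipeline.doc` actually does to reach the server. Treating whitespace-only lines as empty is an extra assumption beyond the empty-string case reported here.

```python
import logging

logger = logging.getLogger(__name__)


def checked_doc(text, annotate):
    """Hypothetical length check in front of pipeline.doc(text).

    `annotate` stands in for whatever actually sends `text` to the server.
    Returns None (and logs an error) instead of sending empty input.
    """
    # Assumption: whitespace-only lines are treated like empty strings too.
    if len(text.strip()) == 0:
        logger.error("pipeline.doc() received an empty string; no request sent")
        return None
    return annotate(text)


sent = []


def fake_annotate(text):
    # Records what would have been sent to the server
    sent.append(text)
    return {"text": text}


print(checked_doc("", fake_annotate))          # None
print(checked_doc("SPEECH 1", fake_annotate))  # {'text': 'SPEECH 1'}
print(sent)                                    # ['SPEECH 1']
```

With this design the caller has to handle `None`, which is exactly the check the script below adds.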

GHLgh, May 04 '17 13:05

@danyaljj, and your script needs to check whether pipeline.doc(string) returns None:

```python
## get trump speeches
import urllib2
url = "https://raw.githubusercontent.com/ryanmcdermott/trump-speeches/master/speeches.txt"
speeches = urllib2.urlopen(url)
speechText = speeches.read()

firstFewLines = speechText[:2000]

## create a remote pipeline
from sioux import remote_pipeline

p = remote_pipeline.RemotePipeline(server_api='http://austen.cs.illinois.edu:5800')

for d in firstFewLines.splitlines():
    print d
    print "---- \n"
    doc = p.doc(d)
    if doc is not None:  # HERE
        ner_view = doc.get_ner_conll
        print ner_view
```

Also, the server returns status code 500 (Internal Server Error) when this line (firstFewLines.splitlines()[5]) is sent: "And I'm a conservative, actually very conservative, and I'm a Republican. And I'm very disappointed by our Republican politicians. Because they let the president get away with absolute murder. You see always, oh we're going to do this, we're going to--. Nothing ever happens; nothing ever happens."

I wonder whether that also happens for you; it seems the server side can't handle this text.

GHLgh, May 04 '17 15:05

> My suggestion is to check the string length when the user calls pipeline.doc(string) and log an error on an empty string. But do you still want to create a TextAnnotation for an empty string (to be consistent with the response from the server)?

Yeah, I think we should do this. Let's create an empty TextAnnotation. Checking whether the output is None is an extra complication for the user.
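A minimal sketch of the "empty TextAnnotation" option, assuming the empty object should mirror what the server sends for empty input (a "TOKENS" view with no constituents and nothing else). `SketchTextAnnotation`, `doc` and `fake_annotate` are hypothetical stand-ins, not the sioux API; `annotate` replaces the real server call.

```python
class SketchTextAnnotation:
    """Hypothetical TextAnnotation stand-in that can represent empty input."""

    def __init__(self, text, view_dictionary):
        self.text = text
        self.view_dictionary = view_dictionary

    def get_view(self, view_name):
        # Views the server did not produce come back as None here.
        return self.view_dictionary.get(view_name)


def doc(text, annotate):
    """Hypothetical pipeline.doc() that never returns None.

    `annotate` stands in for the server call and returns a dict of views.
    """
    if len(text.strip()) == 0:
        # Mirror the server's reply to empty input: a TOKENS view with no
        # constituents and no other views. No request is sent.
        return SketchTextAnnotation(text, {"TOKENS": []})
    return SketchTextAnnotation(text, annotate(text))


def fake_annotate(text):
    # Whitespace tokenization standing in for the real server response
    return {"TOKENS": text.split(), "NER_CONLL": []}


for line in ["SPEECH 1", "", "Thank you so much."]:
    d = doc(line, fake_annotate)  # no "is not None" check needed
    print(repr(line), d.get_view("TOKENS"))
```

The caller's loop stays the same for empty and non-empty lines, which is the point of not returning `None`.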

danyaljj, May 04 '17 16:05

About error code 500: we should document it in the README and give an example usage.

danyaljj, May 04 '17 16:05

@danyaljj I changed it to return an empty TextAnnotation.

And what do you want to say about status code 500 in the README? Can you check the server-side log to see what caused this?

GHLgh, May 04 '17 17:05

I want to show the usage you showed me above.

It's true that we should look at what's happening with this specific example on the pipeline side. That said, there are always going to be server-side issues we cannot handle, such as random network failures. So I think we should let users know that things might go wrong.
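For the README, a defensive usage pattern might look like the sketch below: each line is annotated independently, and a failed request is logged and skipped instead of aborting the whole loop. The exception the client raises on an HTTP 500 or a network failure is not confirmed in this thread, so `errors` defaults to `Exception` and should be narrowed to the real type. `annotate_lines` and `flaky_doc` are hypothetical; `make_doc` stands in for `p.doc`.

```python
import logging

logger = logging.getLogger(__name__)


def annotate_lines(lines, make_doc, errors=(Exception,)):
    """Annotate each line, skipping lines whose request fails.

    make_doc stands in for p.doc; `errors` should be narrowed to whatever
    the client actually raises on an HTTP 500 or network failure.
    """
    results = {}
    for i, line in enumerate(lines):
        try:
            results[i] = make_doc(line)
        except errors as exc:
            logger.warning("line %d failed (%s); skipping", i, exc)
    return results


def flaky_doc(text):
    # Hypothetical stand-in that fails the way a 500 response might.
    if "Republican" in text:
        raise RuntimeError("server returned status code 500")
    return text.upper()


out = annotate_lines(["Hello.", "I'm a Republican.", "Bye."], flaky_doc)
print(out)  # {0: 'HELLO.', 2: 'BYE.'}
```

Keying results by line index keeps it visible which inputs were dropped, so users can retry or inspect them later.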

danyaljj, May 04 '17 17:05