Hangs if the firefox process is defunct

Open parmegv opened this issue 9 years ago • 0 comments

2016-04-25 21:25:51,183 - INFO - *** Visit #3 to http://norton.com ***
2016-04-25 21:25:53,400 - INFO - dumpcap -P -a duration:120 -a filesize:40000 -i eth0 -s 0 -f 'tcp and not host 127.0.0.1 and not tcp port 22 and not tcp port 20' -w /home/ubuntu/tor-browser-crawler/results/160425_194146/0_21_3/capture.pcap
2016-04-25 21:26:17,108 - INFO - Dumpcap killed. Capture size: 1954898 Bytes /home/ubuntu/tor-browser-crawler/results/160425_194146/0_21_3/capture.pcap
2016-04-25 21:26:22,437 - INFO - Filtering packets without a guard IP.
2016-04-25 21:26:23,212 - INFO - *** Visit #4 to http://norton.com ***
2016-04-25 21:26:25,424 - INFO - dumpcap -P -a duration:120 -a filesize:40000 -i eth0 -s 0 -f 'tcp and not host 127.0.0.1 and not tcp port 22 and not tcp port 20' -w /home/ubuntu/tor-browser-crawler/results/160425_194146/0_21_4/capture.pcap
2016-04-25 21:26:48,185 - INFO - Dumpcap killed. Capture size: 1975077 Bytes /home/ubuntu/tor-browser-crawler/results/160425_194146/0_21_4/capture.pcap
2016-04-25 21:26:53,416 - INFO - Filtering packets without a guard IP.
2016-04-25 21:26:54,134 - INFO - *** Visit #5 to http://norton.com ***
Traceback (most recent call last):
  File "./bin/tbcrawler.py", line 16, in <module>
    run()
  File "/home/ubuntu/tor-browser-crawler/tbcrawler/pytbcrawler.py", line 68, in run
    crawler.crawl(job)
  File "/home/ubuntu/tor-browser-crawler/tbcrawler/crawler.py", line 28, in crawl
    self.__do_batch()
  File "/home/ubuntu/tor-browser-crawler/tbcrawler/crawler.py", line 45, in __do_batch
    self.__do_instance()
  File "/home/ubuntu/tor-browser-crawler/tbcrawler/crawler.py", line 52, in __do_instance
    with self.driver.launch():
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/home/ubuntu/tor-browser-crawler/tbcrawler/pytbcrawler.py", line 205, in launch
    self.driver = TorBrowserDriver(*self.args, **self.kwargs)
  File "/home/ubuntu/tor-browser-crawler/src/tbselenium/tbselenium/tbdriver.py", line 68, in __init__
    timeout=60)
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/firefox/webdriver.py", line 107, in __init__
    keep_alive=True)
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 91, in __init__
    self.start_session(desired_capabilities, browser_profile)
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 173, in start_session
    'desiredCapabilities': desired_capabilities,
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 231, in execute
    response = self.command_executor.execute(driver_command, params)
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/remote_connection.py", line 395, in execute
    return self._request(command_info[0], url, body=data)
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/remote_connection.py", line 426, in _request
    resp = self._conn.getresponse()
  File "/usr/lib/python2.7/httplib.py", line 1051, in getresponse
    response.begin()
  File "/usr/lib/python2.7/httplib.py", line 415, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python2.7/httplib.py", line 371, in _read_status
    line = self.fp.readline(_MAXLINE + 1)
  File "/usr/lib/python2.7/socket.py", line 476, in readline
    data = self._sock.recv(self._rbufsize)
socket.error: [Errno 104] Connection reset by peer
ubuntu@ip-172-31-29-121:~/tor-browser-crawler$

It got stuck, and I killed the firefox process.

The hard timeout should stop that visit, but it doesn't.
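
For reference, here is a minimal sketch of what I mean by a hard timeout that also covers the browser launch, since the traceback above blocks inside `self.driver.launch()`. This is not the crawler's actual code: `hard_timeout`, `HardTimeout` and `visit_with_timeout` are hypothetical names, it is written for Python 3 on Unix, and it assumes the visit runs in the main thread (a requirement of `signal`).

```python
import signal
import time
from contextlib import contextmanager


class HardTimeout(Exception):
    """Raised when the guarded block exceeds its time budget."""


@contextmanager
def hard_timeout(seconds):
    """Abort the enclosed block after `seconds` (Unix, main thread only)."""
    def _on_alarm(signum, frame):
        # Raising here interrupts blocking calls such as socket.recv().
        raise HardTimeout("no result after %s s" % seconds)

    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        yield
    finally:
        # Always disarm the timer and restore the previous handler.
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, previous)


def visit_with_timeout(visit, seconds):
    """Run `visit()` under the hard timeout; return False if it was cut off."""
    try:
        with hard_timeout(seconds):
            visit()  # hypothetical callable: launch the browser and load the page
        return True
    except HardTimeout:
        return False
```

With something like this wrapped around the whole instance (launch included), a stuck or defunct firefox would cost at most one visit, and the crawler could log the failure and move on to the next one. Cleaning up the leftover browser process would still have to be handled separately.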

parmegv, Apr 26 '16 12:04