Error on "urllib.request import urlopen" from Chapter01_BeginningToScrape.ipynb

Open efebuyuk opened this issue 5 years ago • 2 comments

Hi,

I get the error below when I run this code:

`from urllib.request import urlopen

html = urlopen('http://pythonscraping.com/pages/page1.html')`

`Traceback (most recent call last):
  File "C:\Anaconda3\envs\py38\lib\http\client.py", line 871, in _get_hostport
    port = int(host[i+1:])
ValueError: invalid literal for int() with base 10: 'port'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "", line 1, in
  File "C:\Anaconda3\envs\py38\lib\urllib\request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Anaconda3\envs\py38\lib\urllib\request.py", line 525, in open
    response = self._open(req, data)
  File "C:\Anaconda3\envs\py38\lib\urllib\request.py", line 542, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
  File "C:\Anaconda3\envs\py38\lib\urllib\request.py", line 502, in _call_chain
    result = func(*args)
  File "C:\Anaconda3\envs\py38\lib\urllib\request.py", line 1379, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "C:\Anaconda3\envs\py38\lib\urllib\request.py", line 1319, in do_open
    h = http_class(host, timeout=req.timeout, **http_conn_args)
  File "C:\Anaconda3\envs\py38\lib\http\client.py", line 833, in __init__
    (self.host, self.port) = self._get_hostport(host, port)
  File "C:\Anaconda3\envs\py38\lib\http\client.py", line 876, in _get_hostport
    raise InvalidURL("nonnumeric port: '%s'" % host[i+1:])
http.client.InvalidURL: nonnumeric port: 'port'`

I am using the latest version of Python (3.8.5). What could be the problem?

Thank you.

efebuyuk, Aug 12 '20 07:08

~ bpython
bpython version 0.18 on top of Python 3.8.5 /usr/bin/python3
>>> from urllib.request import urlopen
>>> response = urlopen('http://pythonscraping.com/pages/page1.html')
>>> response
<http.client.HTTPResponse object at 0x7f196406b850>
>>> 

And read the data:

>>> data = response.read().decode('utf-8')
>>> data
'<html>\n<head>\n<title>A Useful Page</title>\n</head>\n<body>\n<h1>An Interesting Title</h1>\n<div>\nLorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.\n</div>\n</body>\n</html>\n'
>>>
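
Since the same call works on a plain Python 3.8.5, the problem is probably in the environment rather than in the code. This is only a guess: the URL in the question has no port, so the `'port'` in `nonnumeric port: 'port'` may come from a proxy setting that still contains a placeholder such as `http://host:port`. The sketch below lists proxy entries whose port is not a number. `find_bad_proxies` is a hypothetical helper name, not part of `urllib`.

```python
from urllib.parse import urlsplit
from urllib.request import getproxies


def find_bad_proxies(proxies):
    """Return the proxy entries whose port cannot be parsed as an integer.

    Hypothetical helper for illustration; `proxies` is a mapping like the
    one returned by urllib.request.getproxies().
    """
    bad = {}
    for scheme, proxy_url in proxies.items():
        if scheme == "no":
            # The "no" entry is a list of hosts to bypass, not a proxy URL.
            continue
        # Proxy values may omit the scheme ("host:8080"); prefix "//" so
        # urlsplit treats the value as a network location.
        candidate = proxy_url if "://" in proxy_url else "//" + proxy_url
        try:
            urlsplit(candidate).port  # raises ValueError for a non-numeric port
        except ValueError:
            bad[scheme] = proxy_url
    return bad


if __name__ == "__main__":
    # Shows the proxies urllib would pick up and flags the suspicious ones.
    print("configured proxies:", getproxies())
    print("suspicious entries:", find_bad_proxies(getproxies()))
```

If this prints an entry ending in `:port`, correcting or removing that proxy setting (for example the `HTTP_PROXY` environment variable) would be the first thing to try.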

CrustyBarnacle, Nov 23 '20 02:11

Try this:

import urllib.request
request_url = urllib.request.urlopen('https://www.pythonscraping.com/pages/page1.html')
print(request_url.read())

Read here: https://www.geeksforgeeks.org/python-urllib-module/
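
If that still fails with `nonnumeric port: 'port'`, a misconfigured proxy is one possible cause (an assumption, since the URL itself has no port). Here is a minimal sketch that ignores any configured proxy and reports the error instead of crashing. `fetch_without_proxy` is a hypothetical helper name, and the 10-second timeout is an arbitrary choice.

```python
import http.client
import urllib.error
import urllib.request


def fetch_without_proxy(url, timeout=10):
    """Fetch `url` while ignoring configured proxies.

    Hypothetical helper for illustration. Returns the decoded page text,
    or None if the request could not be made.
    """
    # An empty mapping tells ProxyHandler not to use any proxy.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.read().decode("utf-8")
    except http.client.InvalidURL as exc:
        # Raised before any connection is attempted, e.g. for a bad port.
        print("Invalid URL or port:", exc)
    except urllib.error.URLError as exc:
        # Also covers HTTPError, which is a subclass of URLError.
        print("Request failed:", exc)
    return None


# Example call (needs network access):
# html = fetch_without_proxy('http://pythonscraping.com/pages/page1.html')
```

If the page loads this way but not with a plain `urlopen`, the proxy settings are the likely culprit.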

miroslavsavel, Feb 24 '22 14:02