python-scraping
Question in ch2
from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("http://www.pythonscraping.com/pages/warandpeace.html")
bs = BeautifulSoup(html, "html.parser")
nameList = bs.find_all(text='the prince')
print(len(nameList))
I ran the code above and the result is 7. However, when I use Ctrl+F to search for 'the prince' in the browser, the result is 11. I'm confused about why the results are inconsistent.
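That is because of casing, and because of how a plain string is matched. find_all(text='the prince') only returns text nodes whose entire content is exactly 'the prince', so it leaves out 'The prince' :) A compiled regex is instead searched for inside each text node, and with re.IGNORECASE it ignores casing. I got 11 by doing something similar, but with requests. You can just pass find_prince in place of the 'the prince' string in your original code and it will work too: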
import re
import requests
from bs4 import BeautifulSoup
URL = "http://www.pythonscraping.com/pages/warandpeace.html"
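# Case-insensitive pattern; Beautiful Soup searches for it inside each text node
find_prince = re.compile(r'the prince', re.IGNORECASE)
s = requests.Session()
r = s.get(URL)
# The 'html5lib' parser must be installed separately (pip install html5lib)
soup = BeautifulSoup(r.content, 'html5lib')
prince_found = soup.find_all(text=find_prince)
print(len(prince_found))  # 11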
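
If it helps, here is a small offline sketch of why the counts can differ. The HTML snippet is made up for illustration and is not the real page, so the numbers will not match 7 or 11. It uses string=, the newer name for the text= argument in recent Beautiful Soup versions. It compares an exact string filter, a case-insensitive regex filter, and a raw occurrence count, which is closer to what Ctrl+F does in a browser.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical snippet standing in for the real page (assumption, not the actual HTML)
SAMPLE_HTML = """
<p><span class="green">the prince</span> spoke first.</p>
<p><span class="green">The prince</span> said nothing.</p>
<p>Everyone watched the prince leave.</p>
<p>The prince met the prince's son.</p>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# A plain string only matches text nodes that are exactly equal to it
exact = soup.find_all(string="the prince")

# A compiled regex is searched for inside each text node
pattern = re.compile(r"the prince", re.IGNORECASE)
regex_nodes = soup.find_all(string=pattern)

# Counting every occurrence in the page text, roughly like Ctrl+F
occurrences = len(pattern.findall(soup.get_text()))

print(len(exact), len(regex_nodes), occurrences)
```

Note that the regex version counts matching text nodes, not individual matches, so a node that contains the phrase twice is still counted once. On a page where that happens, the find_all result would come out lower than the browser's search count.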