Question in ch2

Open shufanzhang opened this issue 6 years ago • 1 comments

from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen("http://www.pythonscraping.com/pages/warandpeace.html")
bs = BeautifulSoup(html, "html.parser")
nameList = bs.find_all(text='the prince')
print(len(nameList))

I ran the code above and the result is 7. However, when I use Ctrl+F to search for 'the prince' in the browser, the result is 11. I'm confused about why the results are inconsistent.

Jul 16 '19 09:07 shufanzhang

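That is because of casing. A plain string passed to find_all is compared case-sensitively, so you have only captured 'the prince' but left out 'The prince' :) The browser search ignores case by default. I got 11 by doing something similar with requests. You can also pass find_prince as the text argument in your original code and it will work too.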

import re

import requests
from bs4 import BeautifulSoup

URL = "http://www.pythonscraping.com/pages/warandpeace.html"

# ignoring casing
find_prince = re.compile(r'the prince', re.IGNORECASE)

s = requests.Session()
r = s.get(URL)

soup = BeautifulSoup(r.content, 'html5lib')

prince_found = soup.find_all(text=find_prince)

print(len(prince_found))  # 11
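
To illustrate why the two kinds of filter can give different counts, here is a minimal sketch that runs without fetching the page. The strings in text_nodes are made up for illustration (they are not the page's actual content), and the list comprehensions only imitate how BeautifulSoup treats a plain string (whole-string, case-sensitive comparison) versus a compiled pattern (its search() method).

```python
import re

# Hypothetical text nodes, standing in for the strings BeautifulSoup
# would extract from a page. These are NOT the real page contents.
text_nodes = [
    "the prince",
    "The prince",
    "said the prince, smiling",
    "Anna Pavlovna",
]

# Same pattern as in the answer above: ignore casing.
find_prince = re.compile(r"the prince", re.IGNORECASE)

# A plain string filter keeps only nodes equal to the string.
exact = [t for t in text_nodes if t == "the prince"]

# A compiled pattern keeps any node in which the pattern is found.
case_insensitive = [t for t in text_nodes if find_prince.search(t)]

print(len(exact))             # 1
print(len(case_insensitive))  # 3
```

Both filters count matching text nodes rather than individual occurrences, so the number can still differ from Ctrl+F if a single node contains the phrase more than once.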

Jul 17 '19 06:07 Proteusiq