Does not parse the vk.com page

Vponed opened this issue 4 years ago · 1 comment

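import requests
from extractnet import Extractor  # assumption: Extractor comes from the extractnet package
raw_html = requests.get('https://vk.com/neurosciencenews').text
results = Extractor().extract(raw_html)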

It returns almost nothing. Why could that be? It works great with other sites. I would also like to know more about how the extractor works internally. In particular, is it possible to get from it not only the extracted data but also how it extracted that data?

Jan 19 '22 04:01 Vponed

My guess is that this page is rendered client-side: its content is loaded by JavaScript after the initial page has loaded. requests only fetches that initial HTML, so you get an essentially empty page (the content has not been loaded yet). You may need to render the page first (e.g. with a headless browser) and try again.
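A minimal sketch of that workflow, under stated assumptions: first fetch with requests as in the original snippet, use a rough heuristic to guess whether the result is a JavaScript shell (scripts present but little visible text), and only then render the page in headless Chromium via Playwright. The helper names (looks_client_rendered, render_html, fetch_html) and the 500-character threshold are hypothetical choices for illustration, not part of the extractor; Playwright is an assumed optional dependency (pip install playwright, then playwright install chromium).

```python
from html.parser import HTMLParser


class _TextCounter(HTMLParser):
    """Counts visible text characters and <script> tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.text_chars = 0
        self.script_tags = 0
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
            if tag == "script":
                self.script_tags += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Text inside <script>/<style> is code, not page content, so skip it.
        if self._skip_depth == 0:
            self.text_chars += len(data.strip())


def looks_client_rendered(html: str, min_text_chars: int = 500) -> bool:
    """Hypothetical heuristic: the page has scripts but very little visible text.
    The 500-character threshold is an arbitrary assumption."""
    counter = _TextCounter()
    counter.feed(html)
    counter.close()
    return counter.script_tags > 0 and counter.text_chars < min_text_chars


def render_html(url: str, timeout_ms: int = 30000) -> str:
    """Load `url` in headless Chromium and return the HTML after scripts have run."""
    # Imported lazily so the rest of this sketch works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        try:
            page = browser.new_page()
            page.goto(url, wait_until="networkidle", timeout=timeout_ms)
            return page.content()
        finally:
            browser.close()


def fetch_html(url: str) -> str:
    """Plain HTTP fetch first; fall back to a rendered page if it looks like a JS shell."""
    import requests  # same library as in the original snippet

    raw_html = requests.get(url, timeout=30).text
    if looks_client_rendered(raw_html):
        raw_html = render_html(url)
    return raw_html
```

The result of fetch_html('https://vk.com/neurosciencenews') would then be passed to Extractor().extract(...) in place of the plain requests output. Note that a page hidden behind a login or bot check would also look "empty" to this heuristic, and rendering alone would not fix that.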

You can look at these two files to understand how the extraction works:

nn_models.py
pipeline.py

May 03 '22 01:05 theblackcat102