
Inconsistent results between direct URL and saved contents.

Open madlee opened this issue 12 years ago • 2 comments

PyQuery can be constructed from a URL or from the content text. I used to think that the two modes should behave the same, but I found that in some situations they give me different results.

Here is my example:

from urllib2 import urlopen 
from pyquery import PyQuery
import unittest

class MyTestCase(unittest.TestCase):
    def test_pyquery(self):
        url = "http://www.ncbi.nlm.nih.gov/pubmed?term=(Biomarker%5BTitle%2FAbstract%5D)%20AND%20kidney%5BTitle%2FAbstract%5D"
        filters = [".rprt .title", ".rprt .desc", ".rprt .jrnl", ".rprt .title a", ".rprt .rprtid > dt+dd"]

        pq1 = PyQuery(url=url) # Construct from URL directly.
        pq2 = PyQuery(urlopen(url).read()) # Construct from the content of same page.

        for i in filters:
            self.assertEqual(len(pq1.find(i)), len(pq2.find(i)), "Inconsistent for '%s'" % i)

if __name__ == '__main__':
    unittest.main()

For most filters it works fine, but for ".rprt .title a" it gives a different result. Constructing from the URL gives the correct answer, while constructing from the saved content finds no matches. I am working with Python 2.7 and PyQuery 1.2.4.

madlee avatar Jul 29 '13 03:07 madlee

Similar for me: I'm passing a value as an argument, which can be HTML. Sometimes this value is a pure URL, and PyQuery then tries to download HTML... from a linked PNG file (for example).

The assumption that the first argument is a URL to HTML is wrong and should be removed. It is inconsistent and produces unexpected results. A URL to an HTML page should be passed only via the url kwarg. At the very least, PyQuery.__init__ should accept a content argument to keep backward compatibility for some time.

I'd like to suggest a factory pattern.
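
One possible reading of that suggestion, as a rough sketch rather than anything PyQuery ships: two explicit entry points, so the caller states whether a value is a URL to fetch or markup to parse. The names is_http_url, pyquery_from_url and pyquery_from_content are hypothetical and not part of PyQuery. The sketch is written for Python 3 and assumes that PyQuery accepts an already parsed lxml element as its first argument.

```python
from urllib.parse import urlsplit


def is_http_url(value):
    """Return True if value is a string that looks like an http(s) URL."""
    if not isinstance(value, str):
        return False
    parts = urlsplit(value.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def pyquery_from_url(url, **kwargs):
    """Hypothetical factory: always fetch the page at url."""
    # Imported lazily so that is_http_url stays usable without pyquery.
    from pyquery import PyQuery

    if not is_http_url(url):
        raise ValueError("not an http(s) URL: %r" % (url,))
    return PyQuery(url=url, **kwargs)


def pyquery_from_content(content):
    """Hypothetical factory: always treat content as markup, never fetch it."""
    import lxml.html
    from pyquery import PyQuery

    # Parse first and hand PyQuery an element, so the raw string never
    # reaches PyQuery's own handling of a string first argument.
    return PyQuery(lxml.html.fromstring(content))
```

With helpers like these, a value that happens to be a bare link to a PNG would simply be parsed as text by pyquery_from_content instead of being downloaded, and any download would only happen through pyquery_from_url.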

marcinn avatar Apr 03 '18 00:04 marcinn

Five years of this bug? It's time to fix it.

marcinn avatar Apr 03 '18 00:04 marcinn