requests-html
Recording requests-html with VCR (for testing purposes)
Hi, I'm trying to record requests with pytest-vcr / pytest-recording, but:
- the recorded request seems incomplete (no path or query)
- when pytest tries to replay the recorded cassette, the request either hangs indefinitely or fails to overwrite the cassette (depending on which match_on parameters I use)
The cassette (I trimmed the response body, as it's very long):
interactions:
- request:
    body: null
    headers:
      Accept:
      - '*/*'
      Accept-Encoding:
      - gzip, deflate
      Connection:
      - keep-alive
      User-Agent:
      - Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/603.3.8 (KHTML,
        like Gecko) Version/10.1.2 Safari/603.3.8
    method: GET
    uri: https://www.cinema-city.pl/
  response:
    body:
      string: !!binary |
        H4sIAAAAAAAAA+y9a3MbOZYo+L1/BayOnXLdFii+H2pLtbIkl1W2Hi3J5S53dTiQCZAEmQlk50N0
        cmYiehztWz9gYiPGt2Lvx424n+vT9tSntbQ/pH/JxgHygSSTFGXLLteOHDNdFAkcAAcHB+eNB/eo
        ...
        ...
        ...
    headers:
      Age:
      - '232'
      CF-Cache-Status:
      - HIT
      CF-RAY:
      - 886f8bc6aa8534ec-WAW
      Cache-Control:
      - public, max-age=300
      Connection:
      - keep-alive
      Content-Encoding:
      - gzip
      Content-Type:
      - text/html;charset=UTF-8
      Date:
      - Mon, 20 May 2024 21:51:04 GMT
      Last-Modified:
      - Mon, 20 May 2024 21:47:12 GMT
      Server:
      - cloudflare
      Set-Cookie:
      - __cf_bm=7T66fb9eQIG7yoTnQFW_nNy9vKSyv8ZZvUOZ7joqka4-1716241864-1.0.1.1-CsXR1eX.ZfGRXvnRaH9fu1vRe_E9j9rY126apPaIn4WCIVZfDJjSYKt.v2BWb4Kf63O4ZqivMDR3NBzbRM4mpw;
        path=/; expires=Mon, 20-May-24 22:21:04 GMT; domain=.cinema-city.pl; HttpOnly;
        Secure; SameSite=None
      Transfer-Encoding:
      - chunked
      content-language:
      - pl-PL
      vary:
      - Accept-Encoding
      x-b3-spanid:
      - cd5ee136aa03a1d8
      x-b3-traceid:
      - 11eae3b3aabb43
      x-cache:
      - MISS
      x-frame-options:
      - SAMEORIGIN
    status:
      code: 200
      message: OK
- request:
    body: null
    headers:
      Connection:
      - close
      Host:
      - 127.0.0.1:52534
      User-Agent:
      - Python-urllib/3.12
    method: GET
    uri: http://127.0.0.1:52534/json/version
  response:
    body:
      string: "{\r\n \"Browser\": \"HeadlessChrome/124.0.6313.0\",\r\n \"Protocol-Version\":
        \"1.3\",\r\n \"User-Agent\": \"Mozilla/5.0 (Windows NT 10.0; Win64; x64)
        AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/124.0.6313.0 Safari/537.36\",\r\n
        \ \"V8-Version\": \"12.3.219\",\r\n \"WebKit-Version\": \"537.36 (@0000000000000000000000000000000000000000)\",\r\n
        \ \"webSocketDebuggerUrl\": \"ws://127.0.0.1:52534/devtools/browser/e00df6db-d8b2-47b6-9db5-303d163252fb\"\r\n}\r\n"
    headers:
      Content-Length:
      - '438'
      Content-Security-Policy:
      - frame-ancestors 'none'
      Content-Type:
      - application/json; charset=UTF-8
    status:
      code: 200
      message: OK
version: 1
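A note on the second interaction in the cassette: it targets 127.0.0.1:52534/json/version, which looks like the DevTools endpoint of the local headless Chrome rather than the site under test, and its port is likely to differ between runs. One possible approach, sketched below and untested against this setup, is to keep such loopback requests out of the cassette with vcrpy's `before_record_request` hook. The names `skip_loopback`, `LOOPBACK_HOSTS` and `VCR_CONFIG` are made up for illustration, and the `SimpleNamespace` objects only stand in for vcrpy's request objects (which expose a `uri` attribute).

```python
from types import SimpleNamespace
from urllib.parse import urlsplit

# Hypothetical set of hosts treated as "local browser traffic".
LOOPBACK_HOSTS = {"127.0.0.1", "localhost", "::1"}


def skip_loopback(request):
    """Return None for loopback requests so vcrpy does not record them."""
    host = urlsplit(request.uri).hostname
    if host in LOOPBACK_HOSTS:
        return None  # vcrpy drops a request when the hook returns None
    return request


# Assumed usage: keyword arguments for vcrpy, e.g. returned from a
# `vcr_config` fixture in pytest-vcr / pytest-recording.
VCR_CONFIG = {
    "before_record_request": skip_loopback,
    # Matching without "port", since the local port may change per run.
    "match_on": ["method", "scheme", "host", "path", "query"],
}

# Stand-ins for recorded requests, using the two URIs from the cassette.
remote = SimpleNamespace(uri="https://www.cinema-city.pl/")
local = SimpleNamespace(uri="http://127.0.0.1:52534/json/version")

kept = [r.uri for r in (remote, local) if skip_loopback(r) is not None]
print(kept)
```

vcrpy also documents an `ignore_localhost` option that may achieve the same effect; whether either one resolves the hang on replay is an open question here.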
The request is made like this:
from requests_html import HTMLSession, HTMLResponse

session = HTMLSession()
url = "https://www.cinema-city.pl/#/buy-tickets-by-cinema?in-cinema=1080&at=2024-05-20"
response: HTMLResponse = session.get(url)
response.html.render()  # render JS elements
session.close()  # otherwise the Chromium process will leak
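On the "no path or query" observation: everything after `#` in a URL is the fragment, which the client keeps to itself and does not send in the HTTP request. In the URL above, the `?in-cinema=...` part comes after the `#`, so it belongs to the fragment rather than the query. A minimal check with the standard library (the variable names are illustrative only):

```python
from urllib.parse import urlsplit

url = "https://www.cinema-city.pl/#/buy-tickets-by-cinema?in-cinema=1080&at=2024-05-20"
parts = urlsplit(url)

# The fragment is split off first, so the "?" inside it is not a query marker.
print("path:    ", repr(parts.path))
print("query:   ", repr(parts.query))
print("fragment:", repr(parts.fragment))

# What an HTTP client would actually request: the URL without its fragment.
wire_url = parts._replace(fragment="").geturl()
print("on the wire:", wire_url)
```

This would be consistent with the cassette showing `uri: https://www.cinema-city.pl/`: the route and its parameters are presumably handled by the page's JavaScript during `render()`, not by the recorded HTTP request.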