Issue with cookie path

Open balexandrov opened this issue 7 years ago • 4 comments

I am trying to scrape a server that returns multiple Set-Cookie headers, and the alternative parser does not handle them correctly. The default parser works, but it throws an invalid cookie path exception because one of the cookies has a path.

The Uri construction here omits the path and supplies only the scheme, host and port, for example https://example.com:443/. If a cookie has a path, for example /forum, cookieContainer.SetCookies throws an invalid path exception. Just passing the original url here fixed it for me. I am not sure why that Uri construction was needed.

ScrapingBrowser.cs private async Task<HttpWebResponse> GetWebResponseAsync(Uri url, HttpWebRequest request)

var cookieUrl =
    new Uri(string.Format("{0}://{1}:{2}/", response.ResponseUri.Scheme, response.ResponseUri.Host,
                          response.ResponseUri.Port));

if (UseDefaultCookiesParser)
    cookieContainer.SetCookies(url, cookiesExpression);
else
    SetCookies(url, cookiesExpression);

...
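To try this outside ScrapySharp, here is a minimal sketch that uses only System.Net.CookieContainer. The names CookiePathDemo and TrySetCookies are hypothetical and not part of ScrapySharp, and the URLs and cookie values in the note below are made up for illustration.

```csharp
using System;
using System.Net;

public static class CookiePathDemo
{
    // Hypothetical helper: returns true if the container accepted the
    // Set-Cookie header for the given URI, false if it rejected it
    // with a CookieException.
    public static bool TrySetCookies(CookieContainer container, Uri uri, string setCookieHeader)
    {
        try
        {
            container.SetCookies(uri, setCookieHeader);
            return true;
        }
        catch (CookieException)
        {
            return false;
        }
    }
}
```

Calling TrySetCookies with new Uri("https://example.com/forum/index.php") and the header "sid=abc; Path=/forum" should succeed, because the URI path starts with the cookie path. The root-only form https://example.com:443/ is the case reported above as throwing. Whether it is actually rejected may depend on the .NET runtime version.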

Regards, Bojo

balexandrov, Jun 01 '18 13:06

I am running into this also with Set-Cookie: JSESSIONID=1DB030FB14CB6BE89638E86ACEXXXXXX.node1; Path=/iam/im; Secure

jeffmikan, Oct 30 '18 03:10

I have answered the same problem in this issue

khantoocool, Dec 23 '19 11:12

"I have answered the same problem in this issue"

Your "answer" is simply disabling cookie processing. That is not always a viable solution - like when you're logging in, for example.

toquehead, Nov 07 '20 05:11

This appears to still be an issue 2 years later. I'm a bit perplexed as it seems like such a show stopper. What am I missing?

I would add that you can avoid the error by appending the offending cookie's path to the Uri. So when the cookies have different paths, you'd need to parse the cookies and call cookieContainer.SetCookies() once for each unique path. But I can't figure out how to do that with ScrapySharp, as there are no hooks into the processing of cookies.
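Outside ScrapySharp, the per-path approach could look roughly like the sketch below. It assumes the Set-Cookie values have already been separated into one string per cookie. PerPathCookieSetter, GetCookiePath and SetCookiesPerPath are hypothetical names, not ScrapySharp APIs. Only CookieContainer, Uri and UriBuilder come from .NET.

```csharp
using System;
using System.Collections.Generic;
using System.Net;

public static class PerPathCookieSetter
{
    // Hypothetical helper: returns the Path attribute of a single
    // Set-Cookie value, or null when the cookie declares no usable path.
    public static string GetCookiePath(string setCookieValue)
    {
        var parts = setCookieValue.Split(';');
        // parts[0] is the name=value pair, so attributes start at index 1.
        for (var i = 1; i < parts.Length; i++)
        {
            var attribute = parts[i].Trim();
            if (attribute.StartsWith("path=", StringComparison.OrdinalIgnoreCase))
            {
                var path = attribute.Substring("path=".Length).Trim();
                return path.StartsWith("/", StringComparison.Ordinal) ? path : null;
            }
        }
        return null;
    }

    // Calls SetCookies once per cookie, against a URI that carries that
    // cookie's own path, so the path always matches the URI it is set for.
    public static void SetCookiesPerPath(CookieContainer container, Uri responseUri,
                                         IEnumerable<string> setCookieValues)
    {
        foreach (var value in setCookieValues)
        {
            var path = GetCookiePath(value);
            var target = path == null
                ? responseUri // no Path attribute: keep the response URI
                : new UriBuilder(responseUri.Scheme, responseUri.Host, responseUri.Port, path).Uri;
            container.SetCookies(target, value);
        }
    }
}
```

With a response from https://example.com/login (a made-up URL) and the value "JSESSIONID=abc123; Path=/iam/im; Secure", the cookie is set against https://example.com/iam/im instead of the login URL. This sketch calls SetCookies once per cookie instead of grouping by unique path, which is simpler but does the same job.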

The cookies I'm getting (WordPress) also contain date strings that include a comma, so ScrapySharp.Network.CookieParser fails to parse them correctly.
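One way to work around the comma problem, sketched here as a heuristic and not as how ScrapySharp does it, is to split a combined Set-Cookie string only at commas that are followed by something that looks like the start of a new cookie (a token and then "="). The comma inside an Expires date is followed by a day number and a space, so it is left alone. SetCookieSplitter is a hypothetical name, and the sample header in the note below is made up.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SetCookieSplitter
{
    // Matches a comma only when it is followed by optional whitespace,
    // a token without ';', ',', '=' or whitespace, and then '='.
    // Assumption: cookie values themselves do not contain ", name=".
    private static readonly Regex CookieBoundary = new Regex(@",(?=\s*[^;,=\s]+=)");

    // Splits a comma-joined Set-Cookie string into one string per cookie.
    public static string[] Split(string combinedHeader)
    {
        var pieces = CookieBoundary.Split(combinedHeader);
        for (var i = 0; i < pieces.Length; i++)
        {
            pieces[i] = pieces[i].Trim();
        }
        return pieces;
    }
}
```

For example, "wordpress_test_cookie=WP+Cookie+check; path=/; secure, wp_lang=en_US; expires=Sat, 07 Nov 2020 06:11:00 GMT; path=/" splits into two cookies, and the date in the second one keeps its comma.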

toquehead, Nov 07 '20 05:11