AngleSharp.Io icon indicating copy to clipboard operation
AngleSharp.Io copied to clipboard

Doesn't handle cookies where the Domain starts with a dot.

Open TonyDev314 opened this issue 1 year ago • 7 comments

The AdvancedCookieProvider does not cater for cookies lacking the "www" on the front.

The website I am working with has cookies with the "www.websitename.com" domain and cookies with the ".websitename.com" domain.

On adding the cookies the code currently strips off the "." and when selecting cookies to send on the next request, ignores all these cookies because they don't match the host name.

Should the code be altering the Domain name of cookies??

I think it needs to leave the Domain alone when set, and do a comparison on the basis of the host and the cookie domain excluding a leading "www." or "."

TonyDev314 avatar Feb 28 '24 23:02 TonyDev314

OK we need a spec to back this up. Can you point to a spec where this is written?

excluding a leading "www." or "."

I can already guarantee you that "www." should not be excluded. It's a subdomain like any other. Regarding the "." I'd like to see a spec that defines such edge cases and how they should be handled. I think this might be a case for some normalization, but I need some spec confirmation.

FlorianRappl avatar Feb 28 '24 23:02 FlorianRappl

I believe that the leading . means that the cookie should be included in all subdomains The "." was deemed no longer necessary in this spec:

https://www.rfc-editor.org/rfc/rfc6265#section-4.1.2.3

But then the issue is that mywebsite.com cookies are not being included in requests to www.mywebsite.com

Perhaps the change needs to be in the matching algorithm and instead of looking for an absolute match, checking to see if the cookie domain forms part of the URL domain?

On Wed, 28 Feb 2024 at 23:36, Florian Rappl @.***> wrote:

OK we need a spec to back this up. Can you point to a spec where this is written?

excluding a leading "www." or "."

I can already guarantee you that "www." should not be excluded. It's a subdomain like any other. Regarding the "." I'd like to see a spec that defines such edge cases and how they should be handled. I think this might be a case for some normalization, but I need some spec confirmation.

— Reply to this email directly, view it on GitHub https://github.com/AngleSharp/AngleSharp.Io/issues/36#issuecomment-1970095977, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACSKIUQFWICFE5OGB7ARFH3YVZUW5AVCNFSM6AAAAABD65UP3OVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSNZQGA4TKOJXG4 . You are receiving this because you authored the thread.Message ID: @.***>

TonyDev314 avatar Feb 29 '24 00:02 TonyDev314

Perhaps the change needs to be in the matching algorithm and instead of looking for an absolute match, checking to see if the cookie domain forms part of the URL domain?

I'd not change the matching algorithm - I'd rather normalize what we see in the domain field. From the RFC it seems that the leading dot is the change necessary, otherwise all algorithms can continue to work as-is. So if we normalize this input from the domain field (only chance a leading dot appears) then we should be fine.

FlorianRappl avatar Feb 29 '24 00:02 FlorianRappl

Surely you have to change the matching algorithm though. At present the match must be exact between the URL domain and the cookie domain. So whether or not a leading dot is included, the cookie is excluded.

For me a change from cookie.Domain.Is(domain) to domain.EndsWith(cookie.Domain, StringComparison.OrdinalIgnoreCase) in both FindCookie and FindCookies makes the behaviour as expected.

TonyDev314 avatar Feb 29 '24 00:02 TonyDev314

Surely you have to change the matching algorithm though. At present the match must be exact between the URL domain and the cookie domain. So whether or not a leading dot is included, the cookie is excluded.

No - as I wrote; if we normalize the input then the matching algorithm does not need to be changed. There will never be a dot in front of a domain.

For me a change from cookie.Domain.Is(domain) to domain.EndsWith(cookie.Domain, StringComparison.OrdinalIgnoreCase) in both FindCookie and FindCookies makes the behaviour as expected.

This will not work in general. Yes, of course it works in your case - but then you could also just "return true". Because if domain is "www.foobar.com" it would also match a cookie.Domain of "bar.com", but it should only match "foobar.com".

FlorianRappl avatar Feb 29 '24 06:02 FlorianRappl

It works if you include the "." in the cookie retained. That requires further change

Need to store a cookie ".mywebsite.com" such that it matches requests to www.mywebsite.com AND www.foo.mywebsite.com etc. But as you point out, it should NOT match www.thisismywebsite.com.

There's no way around that without altering the matching algorithm.

TonyDev314 avatar Feb 29 '24 09:02 TonyDev314

There's no way around that without altering the matching algorithm.

I agree that the matching algorithm in general needs to be changed, but not due to the dot. The dot has to be taken care of in an input normalization. We'll remove the leading dot and handle all those cases in the matching algorithm.

If you'd retain the dot you'd always need to take care of it in the matching algorithm, which does not make much sense. You can already remove it beforehand and drop that case immediately. Matching will be done more often than setting / writing - so normalizing the input makes sense also from a performance POV. It's just unnecessary work to retain the dot.

As the spec said - "foo.com" and ".foo.com" should behave the same. So let's normalize ".foo.com" to be "foo.com".

FlorianRappl avatar Feb 29 '24 09:02 FlorianRappl