
URL matching: Use public suffix list to extract domain base

Open tialaramex opened this issue 7 years ago • 14 comments

Right now PassFF assumes that if we're on bbc.co.uk we might want amazon.co.uk credentials filled out because hey, that's .co.uk too.
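
To make the failure concrete, here is a sketch of the kind of naive matching at work (illustrative only, not the actual PassFF code):

// Naive "domain base" = last two labels: fine for github.com,
// wrong for multi-label public suffixes like co.uk.
function base(host) {
	return host.split(".").slice(-2).join(".");
}

base("bbc.co.uk") === base("amazon.co.uk");  // true  -> spurious match
base("github.com") === base("example.com");  // false -> correctly distinct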

The Best Possible thing here would be to use the Public Suffix List: https://publicsuffix.org/. Since Mozilla uses this heavily (e.g. it won't let you make cookies for co.uk but it will let you make cookies for github.com, because one of those is a Public Suffix and the other is not), it may already be baked into Firefox's APIs somewhere accessible? If not, perhaps the PSL, or a subset of it, can be compressed and baked into PassFF.

At the very least, please can PassFF learn a manual list of exceptions, so that .co.uk, .co.jp and other popular suffixes are excluded from matching just like the TLDs themselves?

tialaramex avatar Aug 28 '18 13:08 tialaramex

I thought about this before, but was not happy with the overhead this would add to our project, since it requires using a third-party library like TLD.js. Once Firefox exposes the Gecko effective TLD service (https://developer.mozilla.org/en-US/docs/Mozilla/Tech/XPCOM/Reference/Interface/nsIEffectiveTLDService) to extensions, I'm going to look into this again.

tuxor1337 avatar Aug 28 '18 14:08 tuxor1337

For reference: https://bugzilla.mozilla.org/show_bug.cgi?id=1315558

tuxor1337 avatar Aug 28 '18 15:08 tuxor1337

Yeah, a WebExtensions API for that would be really useful, but I can't see myself doing that work in this lifetime, and Moz seems to be in no hurry.

How about if, while we wait, PassFF grows another one of those little text entry fields that users can clumsily shove their own customisations into, but this time with suffixes in it?

I figure I can tweak the code that matches DNS names, special-case the handful of configured suffixes, and have it work decently well without bloating PassFF by more than a few lines.

If I offered to write this, would you be likely to take the PR? I confess in advance this would be the first time I ever wrote more than five lines of JavaScript, but I'm at least halfway competent in a bunch of semicolon languages, so how hard can it be, right? We can default to .co.uk and a handful of other popular ones, or leave it empty; I am of course being completely selfish here.

tialaramex avatar Aug 28 '18 16:08 tialaramex

Yes, we could add a new section in the preferences for "browsing and searching the password store". This section would include the following preferences:

  • list directories on top (moved from section "general")
  • ignore case (moved from section "general")
  • enter key behavior (moved from section "general")
  • suffixes to strip from domain names (comma separated list, yet to be implemented; see the sketch below)
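
For the last item, something like this could work (a sketch with made-up names, not the final implementation):

// Parse the comma-separated preference, accepting ".co.uk" or "co.uk".
function parseSuffixPref(prefValue) {
	return prefValue.split(",")
		.map(function (s) { return s.trim().replace(/^\./, ""); })
		.filter(function (s) { return s.length > 0; });
}

// Strip the first configured suffix that matches the host.
function stripSuffix(host, suffixes) {
	for (var suffix of suffixes) {
		if (host.endsWith("." + suffix))
			return host.slice(0, -(suffix.length + 1));
	}
	return host;
}

// stripSuffix("bbc.co.uk", parseSuffixPref("co.uk, co.jp")) -> "bbc"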

I'm eagerly awaiting your PR :+1:

tuxor1337 avatar Aug 28 '18 16:08 tuxor1337

Apparently even a shared TLD is enough to trigger a ‘match’! Until a Public Suffix List solution is implemented, would it be possible to require that at least the last two fragments match (so the TLD and one more)? Currently, I am greeted by a pinentry box whenever I visit a site on a TLD for which I have a password file for another domain under that TLD. And when I type in my passphrase, I get a warning message that the domains don't match (obviously gleaned from the URL field in the password file). This is inconvenient.
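
For concreteness, the check I have in mind (sketch only, not taken from the PassFF source):

// Require the last two labels of the entry's domain and the visited
// host to agree before offering the entry at all.
function lastTwoLabelsMatch(entryDomain, host) {
	function tail(d) { return d.split(".").slice(-2).join("."); }
	return tail(entryDomain) === tail(host);
}

// lastTwoLabelsMatch("example.org", "other.net")  -> false: no pinentry prompt
// lastTwoLabelsMatch("amazon.co.uk", "bbc.co.uk") -> true: still too lax for co.uk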

I assumed in the beginning that, in my file names, I needed to put the most general domain name that I wanted the password file to be a candidate for at the end of the file name (after the last @ in the file name), giving preference to more specific (longer) domain names (so …@example.org.gpg and …@example.co.uk.gpg). But PassFF is clearly not that strict. Such a strict mode (checkbox in the settings?) would put the user in control.

equaeghe avatar Sep 19 '18 14:09 equaeghe

  1. @tialaramex already implemented (in #363) the workaround proposed in https://github.com/passff/passff/issues/361#issuecomment-416648154. It's merged, but hasn't been released yet. This issue remains open until a clean PSL solution is available.

  2. You want to require that at least the last two fragments of a URL match. This has been the case already for a long time: https://github.com/passff/passff/blob/b97dd096806e3002e9ed06ee55495c5557dcfab3/src/modules/pass.js#L101

  3. Please submit a new issue if you want to request a new feature that is not related to this topic.

tuxor1337 avatar Sep 19 '18 15:09 tuxor1337

@tuxor1337: regarding 2., on site c.b.a, I get a match for a password file with a name of the form [email protected]@f.e.d.a.gpg. AFAICT, the only part where a match would be possible is the TLD a.

equaeghe avatar Sep 19 '18 15:09 equaeghe

Impossible with PassFF unless you're using an outdated version that is more than 5 months old.

Here is a proof: https://jsfiddle.net/ofcw1kLy/ (only values greater than 0 are considered matches). Your example gives match quality -1.

tuxor1337 avatar Sep 19 '18 16:09 tuxor1337

I'm on 1.5.1/1.0.2.

Try hostMatchQuality({fullKey:"fxyzf.e.d.a"}, "c.xyz.a"), which gives me a score of 201. I guess a partial regex match is used in the code instead of a full one. (I now see that this problem indeed does not belong to this issue, sorry for hijacking; it was unintended.)
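
In miniature, my guess at the mechanism (not traced through the actual source):

// An unanchored pattern matches anywhere, so "xyz" from the host
// "c.xyz.a" is found inside the longer label "fxyzf" of the entry key.
new RegExp("xyz").test("fxyzf.e.d.a");  // true  -- partial match
new RegExp("^xyz$").test("fxyzf");      // false -- full-label match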

equaeghe avatar Sep 19 '18 20:09 equaeghe

How frequently does the public suffix list change? It seems like passff could hard-code a not-that-long list and improve the situation for 80%+ of cases. Related to https://github.com/passff/passff/issues/535

bkazez avatar Jul 04 '23 07:07 bkazez

Rather than hard-coding, how about adding something like this as a Makefile build step?

#!/usr/bin/env python3

import requests
import json
import sys

PUBLICSUFFIX_LIST = 'https://publicsuffix.org/list/public_suffix_list.dat'

def emplace_tld(target, this_part, *parts):
    # Insert one rule into a nested dict keyed by reversed labels,
    # e.g. "co.uk" becomes {"uk": {"co": {}}}.
    if this_part not in target:
        target[this_part] = {}
    if parts:
        emplace_tld(target[this_part], *parts)


def read_tlds(dat_lines):
    # Skip comments ("// ...") and blank lines. Note: exception rules
    # (lines starting with "!", e.g. "!city.kawasaki.jp") are stored
    # verbatim and would need extra handling in the lookup code.
    out = {}
    for line in dat_lines:
        line = line.partition('//')[0].strip()
        if line:
            emplace_tld(out, *reversed(line.split('.')))
    return out


def download_tld_list():
    response = requests.get(PUBLICSUFFIX_LIST)
    response.raise_for_status()
    return read_tlds(response.text.split("\n"))


if __name__ == '__main__':
    # Emit a JavaScript file that defines the suffix trie.
    print("var tldList = ", end='')
    json.dump(download_tld_list(), sys.stdout)
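
(To use it, run the script during the build and redirect stdout to a file, e.g. python3 gen_tldlist.py > tldlist.js, where the script and file names here are hypothetical, and ship the generated file alongside the extension code.)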

That would then allow passff to define its own getPublicSuffix that would answer most cases correctly, and allow for an eventual drop-in replacement with browser.dns.getPublicSuffix(url) if Mozilla ever gives us that.

function getPublicSuffix(domain) {
	// Walk the suffix trie from the TLD inward, collecting labels
	// for as long as the list keeps matching.
	var parts = domain.split(/\./g).reverse();
	var cursor = tldList;
	var tld = [];
	for (var part of parts) {
		// A "*" entry matches any label at this level (note the
		// intentional assignment inside the condition).
		if (cursor = (cursor['*'] || cursor[part]))
			tld.push(part);
		else
			break;
	}
	return tld.reverse().join('.');
}
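
Usage would look something like this (getRegistrableDomain is a made-up name, sketching how an eTLD+1 comparison could be built on top of it):

// With the generated tldList in scope:
// getPublicSuffix("www.bbc.co.uk")   -> "co.uk"
// getPublicSuffix("gist.github.com") -> "com"
function getRegistrableDomain(domain) {
	// Registrable domain (eTLD+1) = public suffix plus one more label;
	// fall back to the last two labels if no suffix rule matched.
	var suffix = getPublicSuffix(domain);
	var keep = suffix ? suffix.split(".").length + 1 : 2;
	return domain.split(".").slice(-keep).join(".");
}
// getRegistrableDomain("www.bbc.co.uk") -> "bbc.co.uk"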

Here the total amount of code we're pushing to users is one machine-generated JSON dict and a dozen-line lookup function. I feel like this'd give better match results most of the time, but it carries some obvious downsides:

  • bundling a 132 KB JSON file
  • synchronization with the public suffix list (and it's delayed in synchronizing with the ever-changing morass of corporate TLDs)
  • no punycode support

drmoose avatar Jul 04 '23 14:07 drmoose

Thanks for your efforts! I don't like the idea of bundling a 132 KB file, in particular because our users already have the data (it's part of Firefox), but we cannot use it.

It seems to me like our workaround already solves most problems with URL matching (like the .co.uk case in the original post of this issue). Maybe you can give me some more examples that illustrate why this is urgent enough to justify bundling a file that is dynamically fetched from an external source during the build process.

tuxor1337 avatar Jul 04 '23 20:07 tuxor1337

What about a hard-coded shortlist to improve handling for 80% of cases? Or even a dumber algorithm like ".co."?
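
Something like this, perhaps (sketch only; the label list is a guess):

// Treat "<label>.co.<cc>" and a few similar patterns as two-label suffixes.
function hasTwoLabelSuffix(host) {
	return /\.(co|com|org|net|ac|gov)\.[a-z]{2}$/.test(host);
}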

bkazez avatar Jul 04 '23 20:07 bkazez

As I said, I'm not convinced that this would bring a significant benefit. If you can list some examples where the current URL matching is useless, you might be able to convince me otherwise.

tuxor1337 avatar Jul 05 '23 06:07 tuxor1337