VscodeAdblockSyntax icon indicating copy to clipboard operation
VscodeAdblockSyntax copied to clipboard

Allow using unescaped `$` symbol in the value of some special modifiers

Open DandelionSprout opened this issue 1 year ago • 10 comments

I noticed this when working on https://github.com/DandelionSprout/adfilt/commit/7f50937e2924f16129baa20810a276c9f44a6962 some minutes before I submitted this issue report, where the $ sign in ||www.amazon.$removeparam=/^[a-z_]{1,20}=[a-zA-Z0-9._-]{80,}$/ is shown by the VS syntax as invalid, despite such a setup being relatively important when handling RegEx in general.

image

So I can only presume that a fix of some sort has to be made, so that the $ sign would be shown as not-invalid.

DandelionSprout avatar Mar 21 '24 11:03 DandelionSprout

Shouldn't $ be escaped inside a regexp in the rule?

Alex-302 avatar Mar 21 '24 13:03 Alex-302

No, because the $ marks the end of the RegEx line.

DandelionSprout avatar Mar 21 '24 13:03 DandelionSprout

I mean when you copy regexp to a rule. https://github.com/gorhill/uBlock/wiki/Static-filter-syntax - I see \$ in some examples with regexps.

Btw it seems you forgot a directive here after removing the line 48 https://github.com/DandelionSprout/adfilt/commit/ee3faac9930f536dc6436286bf0f54066ac0b663#diff-11d0fe9cc978acebe3ce9040b1f319079c575ad786eeae23659c15715b360116L88

Alex-302 avatar Mar 21 '24 14:03 Alex-302

\$ is when there's a $ inside the matching text.

$ is for a textline end.

DandelionSprout avatar Mar 21 '24 15:03 DandelionSprout

The @Alex-302's point is that $ is generally considered a special character according to the filtering rule syntax and thus it needs to be escaped whenever you are using it in this context.

ameshkov avatar Mar 25 '24 07:03 ameshkov

@ameshkov So there is an incompatibility with uBO here?

Alex-302 avatar Mar 26 '24 17:03 Alex-302

Meshkov does raise a very important question there.

/^parameter=value$/ is not only fully supported in uBO, but my current personal impression is that it's strongly recommended in uBO.

The way you at Team AdGuard above describe it, would require /^parameter=value\$/, which'd be a little odd compared to the semi-standardised RegEx code standards, but also indeed does result in some sort of AG-uBO syntax differences.

DandelionSprout avatar Mar 26 '24 19:03 DandelionSprout

We should probably add support for this in AGTree/tsurlfilter just for compatibility reasons then.

ameshkov avatar Mar 27 '24 09:03 ameshkov

@ameshkov If we add support, we need to modify:

  • AGTree for proper parsing in linter & browser extension
  • TMLanguage for proper syntax highlighting

scripthunter7 avatar Mar 27 '24 10:03 scripthunter7

@ameshkov @Alex-302 There are two other points here to consider:

  1. The regular expression may contain unescaped commas. Normally, we would use such a comma to separate modifiers, this is the reason why AGLint gives the error that /^[a-z_]{1 is not a valid modifier for input $removeparam=/^[a-z_]{1,20}=[a-zA-Z0-9._-]{80,}$/, because AGTree splits it at the first unescaped comma within the regexp.
  2. There are some modifiers, like $domain, where the value can be completely regexp or just a list with some elements regexp. In other words both are $domain=/regex/ and $domain=example.org|/regex/|example.net valid cases.

Tokenization should work in these cases as well.

scripthunter7 avatar Apr 08 '24 09:04 scripthunter7