JS: Skip minified file if avg line length > 200
Improves the detection of minified files, by classifying files with an average line length over 200 as minified.
Minifiers typically compile a file to one long line, but sometimes there's a copyright comment in front of it, so we need to account for some normal-looking lines. Sometimes multiple such files are concatenated in a bundle.
Note that we already skip certain files based on filenames, such as *.min.js, but previously we did not detect minification based on contents.
I'm wondering if there's some classes of cases we're going to miss by doing that (i.e. I would imagine URLs could result in >200 length lines, which are potentially prone to injection?), and if: 1) we'd be able to find this out when rolling it out; or 2) have a slightly different heuristic to avoid this?
I don't have a strong opinion here but something like:
const apiEndpoint = "https://api.example.com/v2/users/profile/settings/notifications/preferences/email?includeArchived=true&format=json&apiKey=abc123def456";
let blah = someOtherNormalJSStuff();
if(blah) {
blah.obviouslyNotMinifiedJS();
}
Would be considered minified. No clue as to what the data looks like in terms of distribution of line lengths
I'm wondering if there's some classes of cases we're going to miss by doing that (i.e. I would imagine URLs could result in >200 length lines, which are potentially prone to injection?), and if: 1) we'd be able to find this out when rolling it out; or 2) have a slightly different heuristic to avoid this?
I don't have a strong opinion here but something like:
const apiEndpoint = "https://api.example.com/v2/users/profile/settings/notifications/preferences/email?includeArchived=true&format=json&apiKey=abc123def456"; let blah = someOtherNormalJSStuff(); if(blah) { blah.obviouslyNotMinifiedJS(); }Would be considered minified. No clue as to what the data looks like in terms of distribution of line lengths
I don't think that's quite right. Your code snippet is ~240 characters and 5 line breaks, for an average line length of 48 -- well below the limit.
To get above an average of 200, you either need a few really long lines, or a lot of >200 character lines.
Ya you're right thanks @tausbn -- my brain forgot what an average is and read it just as "max" 🙃 . A 200 average length one for the above example would basically a helper js file that just had URL definitions hah