[YouTube] Throttling parameter decryption is broken, decrypt function is once again not fully extracted
With player 1f7d5369, decryption of the throttling parameter fails because the function is once again not fully extracted:
Left: what is extracted by the extractor; right: the real function
The extractor still works, because this time the exception is caught properly.
I just noticed the same issue. This time regex literals are to blame:
/,,[/,913,/](,)}/,
Avoiding these is not as easy as avoiding braces in strings. We can't simply treat slashes like quotes, because regex character classes can contain unescaped slashes.
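To illustrate the failure mode, here is a minimal sketch of a brace matcher that only knows about quoted strings. This is not NewPipe's actual extractor code; the function name `naive_block_end` and the wrapper function used in the usage note are made up for illustration, and only the regex literal is taken from the player.

```rust
/// Returns the end index (exclusive) of the block whose `{` is at byte
/// offset `open`. Braces inside quoted strings are skipped, but regex
/// literals are NOT recognised, which is exactly the limitation shown here.
/// Hypothetical helper, not taken from any real extractor.
fn naive_block_end(js: &str, open: usize) -> Option<usize> {
    let bytes = js.as_bytes();
    let mut depth = 0usize;
    let mut quote: Option<u8> = None; // quote byte of the string we are in
    let mut i = open;
    while i < bytes.len() {
        let c = bytes[i];
        match quote {
            Some(q) => {
                if c == b'\\' {
                    i += 1; // skip the escaped character
                } else if c == q {
                    quote = None; // string closed
                }
            }
            None => match c {
                b'"' | b'\'' | b'`' => quote = Some(c),
                b'{' => depth += 1,
                b'}' => {
                    // Bail out on an unbalanced `}` instead of underflowing.
                    depth = depth.checked_sub(1)?;
                    if depth == 0 {
                        return Some(i + 1);
                    }
                }
                _ => {}
            },
        }
        i += 1;
    }
    None
}
```

Called on a made-up function like `f=function(a){a=a.replace(/,,[/,913,/](,)}/,"");return a};`, this stops at the `}` inside the regex literal and returns a truncated body, which matches what the extractor does with the new player.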
At this point, wouldn't it be the best solution to use an actual JavaScript lexer to extract the function?
Yep, that seems like the only reasonable option to me. And I'm pretty sure that the functions will get harder and harder to parse as time goes on.
I am currently working on a YouTube downloader/client library in Rust (that's how I noticed the issue). So I wrote a test implementation of the fix for it, using the ress lexer.
use anyhow::{anyhow, Result};

fn extract_js_fn(js: &str, name: &str) -> Result<String> {
    let scan = ress::Scanner::new(js);
    // 0: looking for the name, 1: expecting `=`, 2: matching braces, 3: done
    let mut state = 0;
    let mut level = 0;
    let mut start = 0;
    let mut end = 0;

    for item in scan {
        let it = item?;
        let token = it.token;
        match state {
            // Looking for fn name
            0 => {
                if token.matches_ident_str(name) {
                    state = 1;
                    start = it.span.start;
                }
            }
            // Looking for equals; anything else means this was not the assignment
            1 => {
                if token.matches_punct(ress::tokens::Punct::Equal) {
                    state = 2;
                } else {
                    state = 0;
                }
            }
            // Looking for begin/end braces; the lexer already skips braces
            // inside strings and regex literals
            2 => {
                if token.matches_punct(ress::tokens::Punct::OpenBrace) {
                    level += 1;
                } else if token.matches_punct(ress::tokens::Punct::CloseBrace) {
                    level -= 1;
                    if level == 0 {
                        end = it.span.end;
                        state = 3;
                        break;
                    }
                }
            }
            _ => break,
        };
    }

    if state != 3 {
        return Err(anyhow!("could not extract js fn"));
    }
    // The slice covers everything from the function name to the closing brace
    Ok(js[start..end].to_owned())
}
This works fine with the new player.js. And it looks like Mozilla Rhino, the JS interpreter we are using, has an API for its parser. So it should be possible to implement this for NewPipe without additional dependencies.
https://javadoc.io/doc/org.mozilla/rhino/latest/index.html
http://ramkulkarni.com/blog/understanding-ast-created-by-mozilla-rhino-parser/
A lexer isn't really needed. The function body can be extracted by carefully keeping track of the quotes and braces. Equivalent code in yt-dlp: https://github.com/yt-dlp/yt-dlp/blob/b76e9cedb33d23f21060281596f7443750f67758/yt_dlp/jsinterp.py#L229-L254
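For comparison, a rough sketch of what tracking quotes and braces by hand could look like once regex literals are taken into account. This is not the yt-dlp code linked above. The names `Mode`, `regex_can_start` and `block_end` are made up, and the rule for deciding when a `/` starts a regex (the previous non-whitespace byte cannot end an expression) is a simplifying assumption. It does not handle comments, `${}` in template literals, or a regex directly after a keyword such as `return`.

```rust
/// Scanner state: plain code, inside a quoted string, or inside a regex literal.
#[derive(Clone, Copy)]
enum Mode {
    Code,
    Str(u8), // the quote byte that opened the string
    Regex { in_class: bool },
}

/// Assumption of this sketch: a `/` opens a regex literal when the previous
/// non-whitespace byte cannot end an expression; otherwise it is a division.
fn regex_can_start(prev: Option<u8>) -> bool {
    match prev {
        None => true,
        Some(p) => b"(,=:[!&|?{};".contains(&p),
    }
}

/// Returns the end index (exclusive) of the block whose `{` is at byte
/// offset `open`, ignoring braces inside strings and regex literals.
fn block_end(js: &str, open: usize) -> Option<usize> {
    let bytes = js.as_bytes();
    let mut mode = Mode::Code;
    let mut depth = 0usize;
    let mut prev: Option<u8> = None; // last non-whitespace byte seen in code
    let mut i = open;
    while i < bytes.len() {
        let c = bytes[i];
        match mode {
            Mode::Str(q) => {
                if c == b'\\' {
                    i += 1; // skip the escaped character
                } else if c == q {
                    mode = Mode::Code;
                    prev = Some(q);
                }
            }
            Mode::Regex { in_class } => {
                if c == b'\\' {
                    i += 1; // skip the escaped character
                } else if c == b'[' {
                    mode = Mode::Regex { in_class: true };
                } else if c == b']' {
                    mode = Mode::Regex { in_class: false };
                } else if c == b'/' && !in_class {
                    // A slash only ends the regex outside a character class.
                    mode = Mode::Code;
                    prev = Some(b'/');
                }
            }
            Mode::Code => {
                match c {
                    b'"' | b'\'' | b'`' => mode = Mode::Str(c),
                    b'/' if regex_can_start(prev) => mode = Mode::Regex { in_class: false },
                    b'{' => depth += 1,
                    b'}' => {
                        // Bail out on an unbalanced `}` instead of underflowing.
                        depth = depth.checked_sub(1)?;
                        if depth == 0 {
                            return Some(i + 1);
                        }
                    }
                    _ => {}
                }
                if !c.is_ascii_whitespace() {
                    prev = Some(c);
                }
            }
        }
        i += 1;
    }
    None
}
```

On a made-up function such as `f=function(a){a=a.replace(/,,[/,913,/](,)}/,"");return a};` this returns the full body up to the final `}`, while `a/b/c` is still read as division. The trade-off is that the regex-start rule is a heuristic, so a real lexer stays the more robust option.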
But if your dependency already has a Lexer, ig why not use it
I now have a working prototype. It is not pretty and definitely needs cleanup, so I have to do that first before I make a PR. I ended up having to copy Rhino's tokenizer class because it is private. The higher-level parser is accessible, but it only parses entire JS documents into syntax trees, which would take too much time.
I also found an issue with the Rhino JS interpreter. Version 1.7.14 uses javax.lang.model.SourceVersion, which is not available on Android. This causes the app to load indefinitely when opening a video. If you have any idea how to fix this without downgrading, please help me. I have no idea why this error did not occur before.
https://github.com/mozilla/rhino/issues/1149
The problem described here will also be partially fixed with https://github.com/TeamNewPipe/NewPipeExtractor/pull/882#issuecomment-1221596544
A lexer isn't really needed. The function body can be extracted by carefully keeping track of the quotes and braces.
I think that's a good approach.
But if your dependency already has a Lexer, ig why not use it
It does, but as mentioned by @Theta-Dev, it is unfortunately private, and I don't think we should copy the lexer to our codebase.
An alternative is to fork Rhino and make the lexer public.
Or maybe contribute the changes to Mozilla ;)
If they would accept it, sure. ;)
I am currently working on a YouTube downloader/client library in Rust
@Theta-Dev are you still rewriting NewPipeExtractor in Rust? Is it public yet? ;-)
Sorry for writing this comment here, but since you're not on IRC I didn't know how to write to you otherwise.
@Stypox yes, RustyPipe is basically finished. You can get it here:
https://code.thetadev.de/ThetaDev/rustypipe
btw: how can I join you on IRC?
Check out Contributing.md