NewPipeExtractor icon indicating copy to clipboard operation
NewPipeExtractor copied to clipboard

Use Piped's fork of nanojson

Open Stypox opened this issue 3 years ago • 15 comments

  • [x] I carefully read the contribution guidelines and agree to them.
  • [x] I have tested the API against NewPipe.
  • [x] I agree to create a pull request for NewPipe as soon as possible to make it compatible with the changed API.

Our fork of nanojson was initially introduced in #317. That PR contains no description, but if I remember correctly our own fork was created in order to be able to parse some particular Json format returned by some service.

Anyway, I have run the tests with Piped's / @FireMasterK's fork and there does not seem to be any problem. The upstream library (i.e. com.grack.nanojson) cannot be used directly since it returns null instead of new JsonObject() in some methods, causing NPEs (fixed by 7056f30 in our fork). In any case, @FireMasterK's PR is up-to-date with upstream (last upstream commit is Aug 9), contains 7056f30, and contains one more commit related to performance (https://github.com/FireMasterK/nanojson/commits/master). Since our fork changed some things related to how the JSON is parsed, it turned out to be thousands of times slower than upstream: see this analysis by @FireMasterK.

These are the only tests that fail: they are unrelated to the changes in this PR, since they are the same as here.

Stypox avatar Nov 24 '22 08:11 Stypox

You also need to update https://github.com/TeamNewPipe/NewPipeExtractor/blob/c953e2341493a554fd464c2c93f60f7c54a6c49d/timeago-parser/build.gradle#L2

TobiGr avatar Nov 24 '22 09:11 TobiGr

Yeah sorry, I was about to do that

Stypox avatar Nov 24 '22 09:11 Stypox

I also include fastutil (which is not a problem for Piped since I use all over in its code), but is an additional dependency, maybe we don't want to include that commit here?

FireMasterK avatar Nov 24 '22 10:11 FireMasterK

Dang bro that's pretty slow. I'm not really sure why I was requested though.

But anyway, if it runs fine and better, then it looks good to me 👍

It'll probably also need to be changed here too, right?

TacoTheDank avatar Nov 25 '22 18:11 TacoTheDank

I don't understand why the mocks are missing. That's an indicator for changes in the requests, most likely those which have a JSON body. We need to update the mocks and should have a brief look on how the requests changed exactly

TobiGr avatar Nov 25 '22 20:11 TobiGr

I don't understand why the mocks are missing. That's an indicator for changes in the requests, most likely those which have a JSON body. We need to update the mocks and should have a brief look on how the requests changed exactly

I had a look into this, and now keys for the objects generated just aren't sorted: Old:

{"cpn":"uGaaNCVq9tAJ9K8j","contentCheckOk":true,"playbackContext":{"contentPlaybackContext":{"referer":"https://www.youtube.com/watch?v=INVALID_ID_","signatureTimestamp":"19215"}},"context":{"request":{"internalExperimentFlags":[],"useSsl":true},"client":{"hl":"en-GB","gl":"GB","clientName":"WEB","originalUrl":"https://www.youtube.com","clientVersion":"2.20220809.02.00","platform":"DESKTOP"},"user":{"lockedSafetyMode":false}},"racyCheckOk":true,"videoId":"INVALID_ID_"}

New:

{"context":{"client":{"hl":"en-GB","gl":"GB","clientName":"WEB","clientVersion":"2.20220809.02.00","originalUrl":"https://www.youtube.com","platform":"DESKTOP"},"request":{"internalExperimentFlags":[],"useSsl":true},"user":{"lockedSafetyMode":false}},"playbackContext":{"contentPlaybackContext":{"signatureTimestamp":"19215","referer":"https://www.youtube.com/watch?v=INVALID_ID_"}},"cpn":"uGaaNCVq9tAJ9K8j","videoId":"INVALID_ID_","contentCheckOk":true,"racyCheckOk":true}

YouTube doesn't order fields, so I feel this may indeed be a good thing

I think we should regenerate the mocks

No clue what they changed for this to happen tho: https://github.com/TeamNewPipe/nanojson/compare/01856169ed97c213ff9135d67eb352eb04e1c8c7...FireMasterK:nanojson:master~2

FireMasterK avatar Nov 26 '22 03:11 FireMasterK

I had a look into this, and now keys for the objects generated just aren't sorted

Wouldn't this make mocks useless?

No clue what they changed for this to happen tho: https://github.com/TeamNewPipe/nanojson/compare/01856169ed97c213ff9135d67eb352eb04e1c8c7...FireMasterK:nanojson:master~2

The only change which could make a difference in my opinion is https://github.com/FireMasterK/nanojson/commit/6b8c1ef25d8544ea2c0ad157730e18025dbfcd53.

Also, why not updating our fork instead of using a fork of our fork?

AudricV avatar Nov 27 '22 19:11 AudricV

Wouldn't this make mocks useless?

No, because now they're in the order that we add them in the builder I believe

The only change which could make a difference in my opinion is FireMasterK/nanojson@6b8c1ef.

Also, why not updating our fork instead of using a fork of our fork?

That could be done as well, but how would you remove the old commits? I'm not sure what's the team's policy on force pushing...

Edit: Or actually, we could create a new branch and rename this one

FireMasterK avatar Nov 27 '22 20:11 FireMasterK

No, because now they're in the order that we add them in the builder I believe

Then, that's fine for me!

AudricV avatar Nov 27 '22 21:11 AudricV

Also, why not updating our fork instead of using a fork of our fork?

That would be fine, but since FireMasterK's fork is actively maintained and serves almost the same purpose, I think it would make more sense to just use that. It wouldn't be difficult to switch away if the needs ever diverged. We could archive our fork in the meantime.

Stypox avatar Nov 27 '22 21:11 Stypox

if I remember correctly our own fork was created in order to be able to parse some particular Json format returned by some service

Hi, the context is here: https://github.com/TeamNewPipe/NewPipeExtractor/pull/232#issuecomment-590074806. I believe that Bandcamp has switched to valid JSON in the meantime, without any comments. :)

fynngodau avatar Apr 02 '23 20:04 fynngodau

I would also be in favor of updating our fork instead. I also think using lazy parsing of e.g. numbers would be great.


That PR contains no description, but if I remember correctly our own fork was created in order to be able to parse some particular Json format returned by some service.

There were 2 reasons for our fork of nanojson:

  1. To return an empty JsonObject instead of null to prevent NullPointerExceptions
  2. Bandcamp returned what I call "semi-JSON" i.e. { what: 'ever' } (instead of { "what": "ever" }) and we needed to be able to parse that

wb9688 avatar May 28 '23 17:05 wb9688

I would also be in favor of updating our fork instead. I also think using lazy parsing of e.g. numbers would be great.

That PR contains no description, but if I remember correctly our own fork was created in order to be able to parse some particular Json format returned by some service.

There were 2 reasons for our fork of nanojson:

  1. To return an empty JsonObject instead of null to prevent NullPointerExceptions
  2. Bandcamp returned what I call "semi-JSON" i.e. { what: 'ever' } (instead of { "what": "ever" }) and we needed to be able to parse that
  1. You're right, I have cherry-picked that one commit from the TeamNewPipe repository.
  2. This is no longer the case and is no longer necessary. I don't think modifying the JSON parser should have been the solution for this, as in this case, we created significant performance regressions. I think using regexes would have been a better solution here.

I'd like to upstream as many changes as possible from my fork! It becomes lesser work for me to maintain my fork then anyway.

A lot of my changes could probably be useful for TeamNewPipe's fork, but not upstream nanojson due to some limitations by my changes.

I've made a lot of performance-oriented changes in my fork in comparison to upstream.

I've also added support for what I call "Lazy String" parsing in my fork. This could probably even be upstreamed to the original repository even. (I don't have the bandwidth for that currently)

I'd like to express my dislike for the nanojson library for doing all the parsing eagerly. If feasible, I would even like to replace it in the extractor with a more performant and well-built library like Jackson.

FireMasterK avatar May 29 '23 22:05 FireMasterK