feat: get rolling comments

Open shovel-kun opened this issue 11 months ago • 12 comments

Resolves #134

Comment fetching logic is taken from https://github.com/tanbatu/comment-zouryou.

If you want, a flag could be added for specifying a timestamp.

shovel-kun avatar May 05 '25 15:05 shovel-kun

Hi @AlexAplin, thanks for reviewing my code.

I've made changes to resolve most of the issues, but I need clarification on some. I plan to implement the datetime range flag once we have resolved all of this.

By default, the datetime range will run from the current unix timestamp (start) back to the unix timestamp for 2007-03-03 11:59 pm (end), as set in comment-zouryou, and cannot exceed that range.

Additionally, by default nndownload will keep fetching until it has covered the datetime range or there are no more comments to fetch, whichever comes first. Some videos have millions of comments and it might take a very long time to fetch them all, so I should include a flag that stops fetching once X comments have been fetched (or perhaps a timeout after some time? Not sure which is better). This limit will take priority, then the datetime range, and finally running out of comments.
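A minimal sketch of that stop-condition order, not the code in this PR. The `fetch_page` callable, the `posted_at` field, and the function name are hypothetical stand-ins for the real request and response shape, and the timezone of the lower bound is an assumption.

```python
from datetime import datetime, timezone

# Lower bound quoted in the thread (from comment-zouryou). The timezone is an
# assumption; the thread does not say which one applies.
EARLIEST_WHEN = int(datetime(2007, 3, 3, 23, 59, tzinfo=timezone.utc).timestamp())


def fetch_rolling_comments(fetch_page, start_when, end_when=EARLIEST_WHEN, limit=None):
    """Walk backwards in time from start_when and collect comments.

    fetch_page(when) is a hypothetical callable returning a list of dicts with
    a "posted_at" unix timestamp, all posted at or before `when`.
    """
    comments = []
    when = start_when
    while True:
        if limit is not None and len(comments) >= limit:
            break  # 1. the comment limit takes priority
        if when < end_when:
            break  # 2. then the datetime range
        page = fetch_page(when)
        if not page:
            break  # 3. finally, nothing left to fetch
        # Keep only comments inside the range; the page may straddle the bound.
        comments.extend(c for c in page if c["posted_at"] >= end_when)
        oldest = min(c["posted_at"] for c in page)
        when = min(when, oldest) - 1  # always move strictly backwards
    return comments if limit is None else comments[:limit]
```

Because the limit is checked before each fetch and the result is trimmed at the end, the last page can overshoot internally without the caller receiving more than `limit` comments.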

I would love to hear your thoughts.

shovel-kun avatar May 06 '25 10:05 shovel-kun

By default, the datetime range will run from the current unix timestamp (start) back to the unix timestamp for 2007-03-03 11:59 pm (end), as set in comment-zouryou, and cannot exceed that range.

March 6th is when γ launched, along with the first video uploads, but maybe test data goes back to then. I think it's probably fine to use that as a limit if you specify it came from comment-zouryou.

This limit will take priority, then datetime range, and finally no more comments.

I'd add --comments-limit and set it with a default of 1000, which is what the site provides. Rather than a date range, --comments-from-date or similar should get comments at the timestamp requested, respecting --comments-limit. Additionally, --request-all-comments should do as said, and ignore the limit and date flags.

  • --download-comments: Request 1000 comments from today.
  • --download-comments --comments-limit <n>: Request n comments from today. If n is not a valid integer, do nothing and output a warning.
  • --download-comments --comments-from-date "%Y-%m-%d": Request 1000 comments looking back from %Y-%m-%d (initial when header) to 2007-03-03. If the date provided is invalid or before 2007-03-03, do nothing and output a warning.
  • --download-comments --comments-limit <n> --comments-from-date "%Y-%m-%d": Request n comments looking back from %Y-%m-%d to 2007-03-03. Same integer and date checks as above.
  • --download-comments --request-all-comments: Ignore all other flags and request going back from today to 2007-03-03

If any of the new flags are specified without --download-comments, a warning should be output saying they need to specify --download-comments additionally.
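The flag combinations above could be resolved roughly as follows. This is a sketch under assumptions rather than nndownload's actual argument handling: the helper names, the `(limit, from_date)` return shape, and the rejection of non-positive limits are all hypothetical choices made for illustration.

```python
import argparse
import sys
from datetime import date, datetime

EARLIEST_DATE = date(2007, 3, 3)  # lower bound taken from comment-zouryou
DEFAULT_LIMIT = 1000              # what the site provides per request


def build_parser():
    parser = argparse.ArgumentParser(prog="nndownload-sketch")
    parser.add_argument("--download-comments", action="store_true")
    parser.add_argument("--comments-limit", default=None)
    parser.add_argument("--comments-from-date", default=None)
    parser.add_argument("--request-all-comments", action="store_true")
    return parser


def warn(message):
    print(f"warning: {message}", file=sys.stderr)


def resolve_comment_request(args, today=None):
    """Return (limit, from_date), or None when nothing should be requested.

    A limit of None means "request everything back to EARLIEST_DATE".
    """
    today = today or date.today()
    has_qualifier = (
        args.comments_limit is not None
        or args.comments_from_date is not None
        or args.request_all_comments
    )
    if not args.download_comments:
        if has_qualifier:
            warn("comment flags were given without --download-comments")
        return None
    if args.request_all_comments:
        return (None, today)  # ignores the limit and date flags

    limit = DEFAULT_LIMIT
    if args.comments_limit is not None:
        try:
            limit = int(args.comments_limit)
        except ValueError:
            warn("--comments-limit is not a valid integer")
            return None
        if limit <= 0:  # assumption: non-positive limits count as invalid
            warn("--comments-limit must be positive")
            return None

    from_date = today
    if args.comments_from_date is not None:
        try:
            from_date = datetime.strptime(args.comments_from_date, "%Y-%m-%d").date()
        except ValueError:
            warn("--comments-from-date must look like %Y-%m-%d")
            return None
        if from_date < EARLIEST_DATE:
            warn("--comments-from-date is before 2007-03-03")
            return None
    return (limit, from_date)
```

Parsing the limit as a plain string and converting it by hand, instead of using `type=int`, keeps the "do nothing and output a warning" behaviour; argparse would otherwise exit with an error on a bad value.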

Hope this makes sense, but please let me know if you have any questions. Thanks for your efforts!

AlexAplin avatar May 06 '25 21:05 AlexAplin

I think it's probably fine to use that as a limit if you specify it came from comment-zouryou.

I'll specify.

I'd add --comments-limit and set it with a default of 1000, which is what the site provides.

On some videos, such as sm9, I get 250 comments instead of 1000 per fetch. I'm not sure whether I'm being rate-limited, or whether niconico limits the number of comments per fetch based on the total number of comments on the video. Could you check if this is the case for you?

Anyway, this sounds good to me. We might not get the exact number of comments requested, since the number returned per fetch varies, but that shouldn't be an issue; the flag description can just note that the count might not be exact.

--download-comments --comments-from-date "%Y-%m-%d"

I'll make it a datetime instead of just a date for the sake of granularity. If only a date is provided, assume 11:59:59.

We should also accommodate the flags --download-comments --request-all-comments --comments-from-date "%Y-%m-%d", since the user could have stopped fetching comments halfway through and want to resume from where they left off. Hmm, I should also accept unix timestamps so that the user can just copy and paste the last when value from their comment JSON data.
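One way to accept all three input forms (date, datetime, unix timestamp) is sketched below. The function name is hypothetical, and two details are assumptions rather than anything decided in the thread: the date-only default is read as the end of the day (23:59:59), and values without a UTC offset are treated as Japan time.

```python
from datetime import datetime, timedelta, timezone

# Assumption: values without an explicit offset are interpreted as JST.
JST = timezone(timedelta(hours=9))


def parse_comments_from(value):
    """Turn a date, ISO 8601 datetime, or unix-timestamp string into a unix timestamp."""
    value = value.strip()
    if value.isdigit():
        # Already a unix timestamp, e.g. pasted from saved comment data.
        return int(value)
    try:
        # Date only: assume the end of that day (an interpretation, see above).
        parsed = datetime.strptime(value, "%Y-%m-%d").replace(hour=23, minute=59, second=59)
    except ValueError:
        try:
            parsed = datetime.fromisoformat(value)
        except ValueError:
            raise ValueError(f"unrecognised date, datetime, or timestamp: {value!r}") from None
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=JST)
    return int(parsed.timestamp())
```

For example, `parse_comments_from("2025-05-04T16:30:37+09:00")` and `parse_comments_from("2025-05-04T16:30:37")` give the same timestamp under the JST assumption.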

Off-topic: do you know what easy threads/comments are? What makes them different from main?

shovel-kun avatar May 06 '25 22:05 shovel-kun

That additional combination is a good call, feel free to handle that.

main comments are normal comments. easy comments should be the preset comments you see below videos, like かわいい or うぽつ, which is a fairly recent addition. Nicopedia article about them (says they were added 2020-07-27): https://dic.nicovideo.jp/a/%E3%81%8B%E3%82%93%E3%81%9F%E3%82%93%E3%82%B3%E3%83%A1%E3%83%B3%E3%83%88 In the past there was also an owner thread, no idea if that's still around.

On some videos, such as sm9, I get 250 comments instead of 1000 per fetch. I'm not sure whether I'm being rate-limited, or whether niconico limits the number of comments per fetch based on the total number of comments on the video. Could you check if this is the case for you?

On sm9 I get 980 comments received in the browser request to https://public.nvcomment.nicovideo.jp/v1/threads. Not sure why you'd get less, except maybe if your account's language is set differently, but it's likely to not always be 1000 anyway because of deleted and moderated comments.

AlexAplin avatar May 07 '25 04:05 AlexAplin

Hi @AlexAplin, thanks for the reply. Yes, the owner thread can still be fetched (sm9 has that).

I've added the flags you've requested. Note that I've changed --request-all-comments to just --all-comments to make it shorter.

You can test the following commands:

python nndownload.py -s --comments-limit 2000 "https://www.nicovideo.jp/watch/sm9"

  • Result: Comment downloading qualifiers --comments-limit, --request-all-comments, or --comments-from were specified, but --download-comments was not. Did you forget to set --download-comments?

python nndownload.py -sc --comments-limit 2000 "https://www.nicovideo.jp/watch/sm9"

  • Result: Downloads ≈2000 comments

python nndownload.py -sc --comments-limit 2000 --comments-from "2025-05-04T16:30:37+09:00" "https://www.nicovideo.jp/watch/sm9"

  • Result: First comment in JSON is after that date

python nndownload.py -sc --comments-limit 2000 --all-comments --comments-from "2025-05-04T16:30:37" "https://www.nicovideo.jp/watch/sm9"

  • Result: Ignores --comments-limit, saves downloaded comments on CTRL+C.

In addition, I've changed the comment fetching logic so that it appends to the global COMMENT_DATA_JSON on every fetch. I did this so that if comment processing takes too long and the user wants to stop it, nndownload will save whatever we've already fetched instead of discarding all progress.
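The idea of keeping partial progress can be sketched like this. It is only an illustration: it uses a local list rather than the global COMMENT_DATA_JSON, and the function name and the `pages` iterable are hypothetical.

```python
import json
from pathlib import Path


def download_with_partial_save(pages, out_path):
    """Accumulate comments page by page and write whatever was collected.

    `pages` is any iterable yielding lists of comment dicts (hypothetical
    stand-in for the real fetch loop).
    """
    collected = []
    try:
        for page in pages:
            collected.extend(page)  # append on every fetch, not only at the end
    except KeyboardInterrupt:
        print(f"Interrupted, saving {len(collected)} comments fetched so far")
    finally:
        # Runs on normal completion, on Ctrl+C, and on unexpected errors alike.
        Path(out_path).write_text(
            json.dumps(collected, ensure_ascii=False, indent=2), encoding="utf-8"
        )
    return collected
```

Writing in the `finally` block means a failed request midway also leaves the already-fetched comments on disk before the error propagates.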

Since downloading comments can take quite a while, I want to implement an estimated progress output. Something that takes one line per thread, like:

Downloaded 123 out of 426 comments from main thread (est. time left: 56 seconds)

If you have any preferences on the format or tool to use for this, let me know.

shovel-kun avatar May 07 '25 15:05 shovel-kun

We settled on using rich for progress bars, so specifying the total should work, and you can set up a task for each thread if desired.

AlexAplin avatar May 07 '25 22:05 AlexAplin

@AlexAplin I've added the progress bar, plus a check on whether comments.json exists so that we don't accidentally overwrite previous progress.

shovel-kun avatar May 13 '25 20:05 shovel-kun

This got really spaghetti-fied. I've simplified a lot of the logic and made improvements. I've tested the different flows, but they may need re-verification. I also removed the save on Ctrl+C because of the threading changes, but it should be fairly easy to add back, I think.

I'll leave some feedback to explain my changes

AlexAplin avatar May 22 '25 05:05 AlexAplin

@AlexAplin sorry, been a bit busy. I'll try to finish this up in around 3-5 days.

shovel-kun avatar Jun 07 '25 11:06 shovel-kun

@AlexAplin sorry, been a bit busy. I'll try to finish this up in around 3-5 days.

No rush, thanks for your efforts

AlexAplin avatar Jun 07 '25 19:06 AlexAplin

@shovel-kun Hi again, just checking in. I'd like to get this merged before #205. If it's okay, I can make some of the changes on the branch

AlexAplin avatar Jul 16 '25 00:07 AlexAplin

Yes, sorry, go ahead @AlexAplin

shovel-kun avatar Jul 16 '25 05:07 shovel-kun