ngoyal2707 comments


RE: PLSQUARE spoj problem

@PrashanthVenkatesan First of all, I'm glad the solution helped you, and thanks for the comment. The code is very old (3-4 years) and I haven't had a chance to revisit that...

Fixes for higher sequence lengths

Is it okay with you if we merge this directly into v3? We can then push a small PR to metaseq to make it work with v3.

List of Usability Changes

On NCCL logs: instead of turning debug output off, let's write it to a separate file using NCCL_DEBUG_FILE: https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/env.html#nccl-debug-file
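A minimal sketch of that idea, assuming the environment can be set from Python before NCCL is initialized (for example, before torch.distributed.init_process_group with the NCCL backend). NCCL_DEBUG and NCCL_DEBUG_FILE are documented NCCL variables, and NCCL itself expands %h to the hostname and %p to the process PID. The helper name configure_nccl_logging and the log directory are hypothetical.

```python
import os


def configure_nccl_logging(log_dir: str, level: str = "INFO") -> str:
    """Hypothetical helper: keep NCCL debug output, but send it to per-process files.

    Must be called before NCCL is initialized, because NCCL reads these
    environment variables at initialization time.
    """
    os.makedirs(log_dir, exist_ok=True)
    # Verbosity, e.g. "WARN", "INFO" or "TRACE".
    os.environ["NCCL_DEBUG"] = level
    # NCCL replaces %h with the hostname and %p with the PID, so every
    # rank gets its own file instead of writing to stdout/stderr.
    path_pattern = os.path.join(log_dir, "nccl.%h.%p.log")
    os.environ["NCCL_DEBUG_FILE"] = path_pattern
    return path_pattern
```

Writing one file per host and process keeps the training log readable while still leaving the full NCCL trace available when debugging hangs.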

Allow skipping batches during training

Let's not merge this. I tried it locally, and I think the current way of creating the skip iterator always returns has_next() == False, which makes the code think every update is end_of_epoch...
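The PR's skip iterator isn't shown here, so the following is only a hypothetical sketch of the property the comment asks for: an iterator that discards the first num_to_skip batches lazily and answers has_next() by peeking one item ahead, so skipping never makes it look exhausted while batches remain. The class name SkippingIterator and its counter n are made up for illustration; they are not metaseq's classes.

```python
from typing import Any, Iterable

_SENTINEL = object()  # marks "nothing buffered" / "underlying iterator exhausted"


class SkippingIterator:
    """Hypothetical wrapper that skips the first `num_to_skip` items lazily."""

    def __init__(self, iterable: Iterable[Any], num_to_skip: int = 0) -> None:
        self._it = iter(iterable)
        self._to_skip = num_to_skip
        self._peeked: Any = _SENTINEL
        self.n = 0  # items consumed so far, including skipped ones

    def _fill(self) -> None:
        # Consume any pending skips first.
        while self._to_skip > 0:
            if next(self._it, _SENTINEL) is _SENTINEL:
                self._to_skip = 0
                return
            self._to_skip -= 1
            self.n += 1
        # Buffer one item so has_next() can answer without losing data.
        if self._peeked is _SENTINEL:
            self._peeked = next(self._it, _SENTINEL)

    def has_next(self) -> bool:
        self._fill()
        return self._peeked is not _SENTINEL

    def __iter__(self) -> "SkippingIterator":
        return self

    def __next__(self) -> Any:
        self._fill()
        if self._peeked is _SENTINEL:
            raise StopIteration
        item, self._peeked = self._peeked, _SENTINEL
        self.n += 1
        return item
```

For example, SkippingIterator(range(5), num_to_skip=2) reports has_next() as True, yields 2, 3 and 4, and only then reports False. A training loop that computes something like end_of_epoch = not itr.has_next() (an assumed usage pattern) would then see end-of-epoch only after the last real batch.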

Logging every step for the 175B model costs roughly 1 second per step, and that time is not counted in WPS and UPS (words and updates per second)

How I identified the issue: I added a record_function region for logging_stats, which runs after train_step (https://github.com/facebookresearch/metaseq/blob/11bf89f3aa128acc44de359aa1de02c275e54f8f/metaseq_cli/train.py#L283-L295). The profile looks roughly like this:

![Screen Shot 2022-09-01 at 4 26 31 PM](https://user-images.githubusercontent.com/7836935/188005874-7cf71aed-9a54-4aec-98bb-bba057601d67.png)

The...
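A minimal sketch of the same measurement technique on a toy CPU model, for anyone who wants to reproduce it. Only the torch.profiler APIs (profile, record_function, key_averages) are real; the model, train_step and log_stats below are hypothetical stand-ins, not metaseq's code. Giving the logging its own labeled region makes its cost show up as a separate row next to train_step, which is the time a throughput counter built around the train step alone would miss.

```python
import torch
from torch.profiler import ProfilerActivity, profile, record_function

torch.manual_seed(0)
model = torch.nn.Linear(64, 64)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)


def train_step() -> torch.Tensor:
    """Hypothetical stand-in for one forward/backward/optimizer update."""
    loss = model(torch.randn(32, 64)).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.detach()


def log_stats(loss: torch.Tensor) -> dict:
    """Hypothetical stand-in for per-step stats logging."""
    # .item() copies the scalar to the host (a device sync on GPU).
    stats = {"loss": loss.item()}
    print(stats)
    return stats


with profile(activities=[ProfilerActivity.CPU]) as prof:
    for _ in range(3):
        with record_function("train_step"):
            loss = train_step()
        with record_function("logging_stats"):
            log_stats(loss)

# Aggregate events by label; logging_stats appears as its own row.
averages = prof.key_averages()
print(averages.table(sort_by="cpu_time_total", row_limit=10))

totals = {e.key: e.cpu_time_total for e in averages}
share = totals["logging_stats"] / (totals["train_step"] + totals["logging_stats"])
print(f"logging share of labeled step time: {share:.1%}")
```

On a real multi-GPU run you would add ProfilerActivity.CUDA and view the exported trace, as in the screenshot, rather than rely on the CPU-only table.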