sungjun lee comments

Results: 19 comments by sungjun lee

[BUG] RuntimeError: still have inflight params [<bound method Init._convert_to_deepspeed_param.<locals>.ds_summary of Parameter containing:

I have the same problem when using ZeRO-3 in PyTorch Lightning and calling the generate function. Is there a solution?

where can I find COYO-Labels-300M?

Hi @Soonhwan-Kwon, thank you for your interest. We are currently preparing for the release of coyo-labeled-300M. We are also preparing ViT-L performance and training code using coyo-labeled-300M. You can meet...

where can I find COYO-Labels-300M?

Hi @Soonhwan-Kwon, we just updated [COYO-Labeled-300M](https://github.com/kakaobrain/coyo-dataset/tree/main/subset/COYO-Labeled-300M). Thank you for waiting. :)

Zero shot classification

@rom1504 We did not evaluate CLIP trained on COYO for zero-shot classification. I think the person asking the question confused it with the kNN results (ImageNet).

What is the total file size of COYO dataset?

Hi @dlwogns0128, sorry for the late reply. Actually, it depends on how you save the images. When we saved the images at 95% quality using PIL, the average size was...

Memory overhead in multiprocessing

I have a question regarding memory overhead. I created and ran an executor designed to count tokens on approximately 2TB of text (jsonl), but it gets stuck every time I...

Memory overhead in multiprocessing

Reducing workers or batch_size temporarily fixes memory overflows, but the real issue is the module’s inability to detect these problems. Enhancements are needed for stable, efficient performance.
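As a rough illustration of the point above, here is a minimal sketch of capping the number of batches in flight so that memory use does not grow with the input size. It is not the module's API: `count_batch`, `batched`, `count_tokens_bounded`, the `"text"` field name, and whitespace token counting are all assumptions made for this example.

```python
import json
from concurrent.futures import FIRST_COMPLETED, Executor, wait
from itertools import islice
from typing import Iterable, Iterator, List


def count_batch(lines: List[str]) -> int:
    """Count whitespace-separated tokens in the (assumed) "text" field of each JSONL line."""
    return sum(len(json.loads(line).get("text", "").split()) for line in lines)


def batched(lines: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield lists of at most batch_size lines without reading the whole input."""
    it = iter(lines)
    while batch := list(islice(it, batch_size)):
        yield batch


def count_tokens_bounded(
    lines: Iterable[str],
    executor: Executor,
    batch_size: int = 1000,
    max_in_flight: int = 4,
) -> int:
    """Sum token counts while keeping at most max_in_flight batches submitted."""
    total = 0
    pending = set()
    for batch in batched(lines, batch_size):
        if len(pending) >= max_in_flight:
            # Block until at least one batch finishes before submitting another.
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            total += sum(f.result() for f in done)
        pending.add(executor.submit(count_batch, batch))
    # Collect whatever is still running once the input is exhausted.
    total += sum(f.result() for f in pending)
    return total
```

Passing an open file object as `lines` and a `concurrent.futures.ProcessPoolExecutor` as `executor` keeps roughly `batch_size * max_in_flight` lines in memory at once, regardless of how large the file is.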

Enhancing word_tokenize (like nltk) Support for Multiple Languages

Cool! But setting the language_filter's threshold to 0 and getting a language_id value seems weird. To address this, I've made it possible to extract useful language ID related statistics while...

Enhancing word_tokenize (like nltk) Support for Multiple Languages

@vsabolcec Nice work. MeCab in spaCy is known to be a good word tokenizer for Korean. When do you plan to make a pull request?

Filter very slow

I think you are using an English tokenizer to process Japanese text.