sungjun lee

Results 19 comments of sungjun lee

I have the same problem when using ZeRO-3 in PyTorch Lightning and calling the generate function. Is there a solution?

Hi @Soonhwan-Kwon, thank you for your interest. We are currently preparing for the release of coyo-labeled-300M. We are also preparing ViT-L performance and training code using coyo-labeled-300M. You can meet...

Hi @Soonhwan-Kwon, we just updated [COYO-Labeled-300M](https://github.com/kakaobrain/coyo-dataset/tree/main/subset/COYO-Labeled-300M). Thank you for waiting. :)

@rom1504 We did not evaluate CLIP trained with COYO on zero-shot classification. I think the person asking the question confused it with the kNN results (ImageNet).

Hi @dlwogns0128, sorry for the late reply. Actually, it depends on how you save the images. When we saved the images at 95% quality using PIL, the average size was...

I have a question regarding memory overhead. I created and ran an executor designed to count tokens on approximately 2TB of text (jsonl), but it gets stuck every time I...

Reducing workers or batch_size temporarily fixes memory overflows, but the real issue is the module’s inability to detect these problems. Enhancements are needed for stable, efficient performance.

Cool! But setting the language_filter's threshold to 0 just to get a language_id value seems weird. To address this, I've made it possible to extract useful language-ID-related statistics while...

@vsabolcec Nice work. MeCab in spaCy is known to be a good word_tokenizer for Korean. When do you plan to make a pull request?

I think you are using an English tokenizer to handle Japanese.