Pedro Vitor Quinta de Castro
@ngoanpv and @wxp16, I ran training on a v3-8 TPU and on a Tesla V100 from a DGX-1, using batch sizes of 512 and 24 respectively (32 didn't fit). What amazes me...
> @peregilk I really do run with batch size 512.
>
> @stefan-it I run it with the Python module:
>
> ```python
> import sentencepiece as spm
> spm.SentencePieceTrainer.Train('--input= --vocab_size=30000 --model_prefix=prefix_name --pad_id=0...
> ```
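For anyone landing here later: below is a minimal runnable sketch of that (truncated) training call. The input path `corpus.txt` and the `unk`/`bos`/`eos` IDs are my assumptions, since only `--pad_id=0` survives in the quote above.

```python
import sentencepiece as spm

# Sketch of the truncated command above; corpus.txt is a placeholder
# (plain text, one sentence per line). Only pad_id=0 comes from the
# original comment -- the other special-token IDs are assumed.
spm.SentencePieceTrainer.Train(
    '--input=corpus.txt '
    '--model_prefix=prefix_name '
    '--vocab_size=30000 '
    '--pad_id=0 --unk_id=1 --bos_id=2 --eos_id=3'
)

# Training writes prefix_name.model and prefix_name.vocab;
# load the resulting model like this:
sp = spm.SentencePieceProcessor()
sp.Load('prefix_name.model')
```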
Ran into the same problem; I'm using your solution too.
For argument mining there is a strong research group at Darmstadt: https://www.informatik.tu-darmstadt.de/ukp/research_6/research_areas/argumentation_mining/index.en.jsp They host several annotated benchmark datasets under the "Resources" section of the website above.
Sorry, I missed the results folder.
Thanks @plkmo! Are you uploading an updated one?
I'm reopening since we're still discussing this :sweat_smile: These are the losses I got. Do you think they look OK?
@plkmo From what I could see, you weren't able to get good results from MTB with the CNN either, right? I ran the pretraining and applied it to the task afterwards,...
Strange, @mickeystroller: I'm doing exactly the same thing as you are, but take a look at my results
Same problem here. Can you tell me which version of the tokenizers package you're using? `pip show tokenizers`