trisongz
> Would encourage you to join us [on Slack](https://cdr.co/join-community) so we can discuss these in more detail.

Requested to join!

Follow-up question: How were you provisioning these...
The idea would be to model it after something like the [SQuAD/SWAG](https://github.com/huggingface/transformers/tree/master/examples#squad) dataset for question answering, where you typically have a large body of text as initial context (virus sequence), followed...
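For reference, a single SQuAD-style record boils down to a context passage, a question about it, and the answer as a text span with its character offset; a rough sketch (the context, question, and offset below are all made up):

```
# One SQuAD-style example: context + question + answer span.
example = {
    "context": "...large body of text, e.g. a paper passage about the virus...",
    "question": "What receptor does the virus bind to?",
    "answers": [
        {"text": "ACE2", "answer_start": 42}  # character offset into context (illustrative)
    ],
}
```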
@amoux That's pretty awesome! I hadn't thought of using a node graph, mainly because I don't work with them as often as I'd like to. So I've been messing around...
The objective for this BERT model is extractive QA in SQuAD style, so it should be able to do question answering given the input text. I figured that would...
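Once it's fine-tuned, usage should look something like this with the transformers `pipeline` API (the model path and inputs are placeholders):

```
from transformers import pipeline

# "path/to/finetuned-bert" is a placeholder for the fine-tuned checkpoint
qa = pipeline("question-answering", model="path/to/finetuned-bert")

result = qa(
    question="What receptor does the virus bind to?",
    context="...a CORD-19 passage or paper abstract...",
)
print(result["answer"], result["score"])
```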
I actually spent a bit of time cleaning up the CORD-19 dataset and compiled it into a single jsonl file. It's pre-processed, and SciBERT was used to label potential diseases...
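Since jsonl is just one JSON object per line, loading it is straightforward; a quick sketch (the filename and field names below are hypothetical):

```
import json

records = []
with open("cord19_cleaned.jsonl", "r", encoding="utf-8") as f:
    for line in f:
        records.append(json.loads(line))

# e.g. records[0] might expose keys like "text" and "diseases" (hypothetical)
```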
If you have a text file with one text per line, a quick way to create the `text_list` object to be loaded into the function:

```
from numpy import loadtxt

# "texts.txt" is a placeholder path; comments=None stops loadtxt from
# treating lines that start with '#' as comments and dropping them
text_list = loadtxt("texts.txt", dtype=str, delimiter="\n", comments=None).tolist()
```
I was able to get it to train with batch size 1 on the 50% split dataset (8.6 GB -> 7,747,316 lines). Here's the config:

```
GPT2Config {
  "activation_function": "gelu_new",
  "attn_pdrop": 0.0,
  "bos_token_id": ...
```
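The same overrides can be set programmatically if you'd rather build the config in code (only the fields visible above are shown; everything else stays at the `GPT2Config` defaults):

```
from transformers import GPT2Config

# Only the overrides from the config dump above; the rest are defaults
config = GPT2Config(
    activation_function="gelu_new",
    attn_pdrop=0.0,
)
```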
Also, I've found GPT-2 works more effectively with custom one-hot tokens, which I generally build in during pre-processing:

`Run this task: ...`

Would you consider adding that...
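A minimal sketch of how such control tokens can be registered with the transformers tokenizer and model (the token strings here are made up):

```
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Register custom control tokens so BPE never splits them
tokenizer.add_special_tokens(
    {"additional_special_tokens": ["<|task|>", "<|endoftask|>"]}
)
# Grow the embedding matrix to cover the new token ids
model.resize_token_embeddings(len(tokenizer))
```

Resizing the embeddings is the step that's easy to forget; without it the new token ids index past the end of the embedding matrix.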
The Tokenizers library's documentation is really sparse and has a lot of nuances that I've run into as well. What I've found works really well is [following this notebook](https://colab.research.google.com/github/huggingface/blog/blob/master/notebooks/01_how_to_train.ipynb#scrollTo=QDNgPls7_l13).

```
from tokenizers import ByteLevelBPETokenizer

# Train a byte-level BPE tokenizer on your corpus, as in the linked notebook;
# `paths` is a list of plain-text files, and the vocab size / special tokens
# mirror the notebook's values
tokenizer = ByteLevelBPETokenizer()
tokenizer.train(
    files=paths,
    vocab_size=52_000,
    min_frequency=2,
    special_tokens=["<s>", "<pad>", "</s>", "<unk>", "<mask>"],
)
```
I wrote a custom function for batching, which was able to fit into memory where it previously didn't (even though it's still 150 GB+ of memory for an 8 GB file, it's...
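I won't claim this is the exact function, but the general pattern is a generator that yields fixed-size chunks instead of materializing the whole file; a rough sketch (the path and batch sizes are placeholders):

```
def batched_lines(path, batch_size=1000):
    """Yield lists of up to batch_size lines without reading the whole file."""
    batch = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            batch.append(line.rstrip("\n"))
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:
        yield batch

# Hypothetical usage: tokenize one chunk at a time instead of all 7.7M lines
for chunk in batched_lines("corpus.txt", batch_size=10_000):
    ...  # tokenize / encode the chunk here
```

Yielding chunks keeps peak memory proportional to the batch size rather than the corpus size.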