trisongz
> Would encourage you to join us [on Slack](https://cdr.co/join-community) so we can discuss these in more detail.

Requested to join!

Follow-up question: How were you provisioning these...
The idea would be to model it after something like the [SQuAD/SWAG](https://github.com/huggingface/transformers/tree/master/examples#squad) dataset for question answering, where you typically have a large body of text as initial context (virus sequence), followed...
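For reference, a single SQuAD-style record boils down to a context passage, a question about it, and the answer as a text span with its character offset; a rough sketch (the context, question, and offset below are all made up):

```
# One SQuAD-style example: context + question + answer span.
example = {
    "context": "...large body of text, e.g. a paper passage about the virus...",
    "question": "What receptor does the virus bind to?",
    "answers": [
        {"text": "ACE2", "answer_start": 42}  # character offset into context (illustrative)
    ],
}
```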
@amoux That's pretty awesome! I hadn't thought of using a node graph, mainly because I don't work with them as often as I'd like to. So I've been messing around...
The objective for this BERT model is extractive QA in SQuAD style, so it should be able to do question answering given the input text. I figured that would...
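Once it's fine-tuned, usage should look something like this with the transformers `pipeline` API (the model path and inputs are placeholders):

```
from transformers import pipeline

# "path/to/finetuned-bert" is a placeholder for the fine-tuned checkpoint
qa = pipeline("question-answering", model="path/to/finetuned-bert")

result = qa(
    question="What receptor does the virus bind to?",
    context="...a CORD-19 passage or paper abstract...",
)
print(result["answer"], result["score"])
```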
I actually spent a bit of time cleaning up the CORD-19 dataset and compiled it into a single jsonl file. It's pre-processed, and SciBERT was used to label potential diseases...
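Since jsonl is just one JSON object per line, loading it is straightforward; a quick sketch (the filename and field names below are hypothetical):

```
import json

records = []
with open("cord19_cleaned.jsonl", "r", encoding="utf-8") as f:
    for line in f:
        records.append(json.loads(line))

# e.g. records[0] might expose keys like "text" and "diseases" (hypothetical)
```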
If you have a text file with one text per line, a quick way to create the `text_list` object to be loaded into the function:

```
from numpy import loadtxt

# "texts.txt" is a placeholder path; comments=None stops loadtxt from
# treating lines that start with '#' as comments and dropping them
text_list = loadtxt("texts.txt", dtype=str, delimiter="\n", comments=None).tolist()
```
I was able to get it to train with batch size 1 on the 50% split dataset (8.6 GB -> 7,747,316 lines). Here's the config:

```
GPT2Config {
  "activation_function": "gelu_new",
  "attn_pdrop": 0.0,
  "bos_token_id": ...
```
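The same overrides can be set programmatically if you'd rather build the config in code (only the fields visible above are shown; everything else stays at the `GPT2Config` defaults):

```
from transformers import GPT2Config

# Only the overrides from the config dump above; the rest are defaults
config = GPT2Config(
    activation_function="gelu_new",
    attn_pdrop=0.0,
)
```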
Also, I've found GPT-2 works more effectively with custom one-hot tokens, which I generally build in during pre-processing:

`Run this task: ...`

Would you consider adding that...
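A minimal sketch of how such control tokens can be registered with the transformers tokenizer and model (the token strings here are made up):

```
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Register custom control tokens so BPE never splits them
tokenizer.add_special_tokens(
    {"additional_special_tokens": ["<|task|>", "<|endoftask|>"]}
)
# Grow the embedding matrix to cover the new token ids
model.resize_token_embeddings(len(tokenizer))
```

Resizing the embeddings is the step that's easy to forget; without it the new token ids index past the end of the embedding matrix.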
The Tokenizers library's documentation is really sparse and has a lot of nuances that I've run into as well. What I've found works really well is [following this notebook](https://colab.research.google.com/github/huggingface/blog/blob/master/notebooks/01_how_to_train.ipynb#scrollTo=QDNgPls7_l13).

```
from tokenizers import ByteLevelBPETokenizer

# Train a byte-level BPE tokenizer on your corpus, as in the linked notebook;
# `paths` is a list of plain-text files, and the vocab size / special tokens
# mirror the notebook's values
tokenizer = ByteLevelBPETokenizer()
tokenizer.train(
    files=paths,
    vocab_size=52_000,
    min_frequency=2,
    special_tokens=["<s>", "<pad>", "</s>", "<unk>", "<mask>"],
)
```
I wrote a custom function for batching, which was able to fit into memory where it previously didn't (even though it's still 150 GB+ of memory for an 8 GB file, it's...
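I won't claim this is the exact function, but the general pattern is a generator that yields fixed-size chunks instead of materializing the whole file; a rough sketch (the path and batch sizes are placeholders):

```
def batched_lines(path, batch_size=1000):
    """Yield lists of up to batch_size lines without reading the whole file."""
    batch = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            batch.append(line.rstrip("\n"))
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:
        yield batch

# Hypothetical usage: tokenize one chunk at a time instead of all 7.7M lines
for chunk in batched_lines("corpus.txt", batch_size=10_000):
    ...  # tokenize / encode the chunk here
```

Yielding chunks keeps peak memory proportional to the batch size rather than the corpus size.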