Ashim Gupta
Hi @agarwalishan, I would suggest you look into PyTorch-based implementations of CRFs. Many of these output marginal probabilities. Even if you cannot find such an implementation on GitHub,...
@linxihui But there is a mathematical formulation in a linear-chain CRF for finding the marginal probabilities at each time step. Please refer to [this](http://info.usherbrooke.ca/hlarochelle/ift725/3_05_computing_marginals.pdf). What we need to calculate this...
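The per-timestep marginals in the linked lecture notes come from the forward-backward recursions. As a rough illustration (not tied to any particular PyTorch CRF library), a minimal NumPy sketch of that computation might look like:

```python
import numpy as np

def crf_marginals(emissions, transitions):
    """Per-timestep marginals p(y_t = k | x) for a linear-chain CRF,
    computed with forward-backward in log space.

    emissions:   (T, K) unary log-potentials for T steps, K labels
    transitions: (K, K) log-potential for moving from label i to label j
    """
    T, K = emissions.shape
    lse = np.logaddexp.reduce  # numerically stable log-sum-exp

    # Forward pass: alpha[t, k] = log-sum over all paths ending in k at step t
    alpha = np.empty((T, K))
    alpha[0] = emissions[0]
    for t in range(1, T):
        alpha[t] = emissions[t] + lse(alpha[t - 1][:, None] + transitions, axis=0)

    # Backward pass: beta[t, k] = log-sum over all continuations from k at step t
    beta = np.zeros((T, K))
    for t in range(T - 2, -1, -1):
        beta[t] = lse(transitions + emissions[t + 1] + beta[t + 1], axis=1)

    log_z = lse(alpha[-1], axis=0)  # log partition function
    return np.exp(alpha + beta - log_z)  # each row sums to 1
```

With all-zero potentials this returns the uniform distribution over labels at every step, which is a quick sanity check that the normalization is right.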
Hi @Eric-Wallace, if possible, could you please release the code? It is fine if it's not well documented or anything. One of the reviewers wants us to compare with...
You have to first create an order; it's free for academic use. Then, in a day or two, you get an email that the order is fulfilled. After that you...
For anyone still stuck here, you can run `ant` in the source directory to generate the JAR file.
@bmosaicml, @vchiley I can't reproduce the zero-shot results on boolq either, for both `llama-7b` and `mpt-7b`. My two yaml scripts are:
```
max_seq_len: 2048
seed: 1
model_name_or_path: huggyllama/llama-7b
# ...
```
@dakinggg I ran the `mpt-7b` model without FSDP with the following config:
```
max_seq_len: 2048
seed: 1
model_name_or_path: mosaicml/mpt-7b

# Tokenizer
tokenizer:
  name: ${model_name_or_path}
  kwargs:
    model_max_length: ${max_seq_len}

model:
  name: mpt_causal_lm
  init_device: ...
```
Thank you @abhi-mosaic for responding. I have access to a node with 8 A100s with 80 GB of GPU RAM each. I quickly ran tests with the following configurations on `llama-30b`: |...
@abhi-mosaic Thanks for responding. Using `composer` did the trick for the full-precision model. For anyone running it in the future, on `boolq` in 0-shot, I get the following accuracy: ```...
Update: Here is the yaml file we used:
```
max_seq_len: 4096
seed: 28
model_name_or_path: ~/huggingface_cache/Llama-2-7b-hf

# Tokenizer
tokenizer:
  name: ${model_name_or_path}
  kwargs:
    model_max_length: ${max_seq_len}

models:
- model_name: ${model_name_or_path}
  model:
    name: hf_causal_lm
    ...
```