Ashim Gupta
Hi @agarwalishan, I would suggest you look into PyTorch-based implementations of CRFs. Many of these output marginal probabilities. Even if you cannot find such an implementation on GitHub,...
@linxihui But there is a mathematical formulation in a linear-chain CRF for finding the marginal probabilities at each time step. Please refer to [this](http://info.usherbrooke.ca/hlarochelle/ift725/3_05_computing_marginals.pdf). What we need to calculate this...
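The per-timestep marginals in the linked lecture notes come from the forward-backward recursions. As a rough illustration (not tied to any particular PyTorch CRF library), a minimal NumPy sketch of that computation might look like:

```python
import numpy as np

def crf_marginals(emissions, transitions):
    """Per-timestep marginals p(y_t = k | x) for a linear-chain CRF,
    computed with forward-backward in log space.

    emissions:   (T, K) unary log-potentials for T steps, K labels
    transitions: (K, K) log-potential for moving from label i to label j
    """
    T, K = emissions.shape
    lse = np.logaddexp.reduce  # numerically stable log-sum-exp

    # Forward pass: alpha[t, k] = log-sum over all paths ending in k at step t
    alpha = np.empty((T, K))
    alpha[0] = emissions[0]
    for t in range(1, T):
        alpha[t] = emissions[t] + lse(alpha[t - 1][:, None] + transitions, axis=0)

    # Backward pass: beta[t, k] = log-sum over all continuations from k at step t
    beta = np.zeros((T, K))
    for t in range(T - 2, -1, -1):
        beta[t] = lse(transitions + emissions[t + 1] + beta[t + 1], axis=1)

    log_z = lse(alpha[-1], axis=0)  # log partition function
    return np.exp(alpha + beta - log_z)  # each row sums to 1
```

With all-zero potentials this returns the uniform distribution over labels at every step, which is a quick sanity check that the normalization is right.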
Hi @Eric-Wallace, if possible, could you please release the code? It is fine if it's not well documented or anything. One of the reviewers wants us to compare with...
You have to first create an order; it's free for academic use. Then, in a day or two, you get an email that the order is fulfilled. After that you...
For anyone still stuck here, you can run `ant` in the source directory to generate the JAR file.
@bmosaicml, @vchiley I can't reproduce the zero-shot results on boolq either, for both `llama-7b` and `mpt-7b`. My two yaml scripts are:
```
max_seq_len: 2048
seed: 1
model_name_or_path: huggyllama/llama-7b
# ...
```
@dakinggg I ran the `mpt-7b` model without FSDP with the following config:
```
max_seq_len: 2048
seed: 1
model_name_or_path: mosaicml/mpt-7b

# Tokenizer
tokenizer:
  name: ${model_name_or_path}
  kwargs:
    model_max_length: ${max_seq_len}

model:
  name: mpt_causal_lm
  init_device: ...
```
Thank you @abhi-mosaic for responding. I have access to a node with 8 A100s with 80 GB of GPU RAM each. I quickly ran tests with the following configurations on `llama-30b`: |...
@abhi-mosaic Thanks for responding. Using `composer` did the trick for the full-precision model. For anyone running it in the future, on `boolq` in 0-shot, I get the following accuracy: ```...
Update: Here is the yaml file we used:
```
max_seq_len: 4096
seed: 28
model_name_or_path: ~/huggingface_cache/Llama-2-7b-hf

# Tokenizer
tokenizer:
  name: ${model_name_or_path}
  kwargs:
    model_max_length: ${max_seq_len}

models:
- model_name: ${model_name_or_path}
  model:
    name: hf_causal_lm
    ...
```