Thomas Müller
Thanks! For TabFact we only train the classification layer, so you shouldn't use the `answer_coordinates` or `answers` fields in the CSV (we might actually remove them at some point, but for...
This should also work:

```Dockerfile
ENTRYPOINT ["uvicorn"]
CMD ["app.main:app", "--host", "0.0.0.0", "--port", "6565"]
```

(otherwise you will execute `uvicorn uvicorn app.main:app ...`)
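For anyone wondering why the split works: in exec form, Docker builds the container's final argv by appending the `CMD` array to the `ENTRYPOINT` array. A minimal Python sketch of that concatenation (an illustration of the rule, not Docker itself):

```python
# Illustration: Docker's exec form concatenates ENTRYPOINT + CMD
# to form the process argv inside the container.
entrypoint = ["uvicorn"]
cmd = ["app.main:app", "--host", "0.0.0.0", "--port", "6565"]

# Final argv the container process is started with.
argv = entrypoint + cmd
print(argv)
# ['uvicorn', 'app.main:app', '--host', '0.0.0.0', '--port', '6565']
```

Arguments passed to `docker run` replace `CMD` but are still appended after `ENTRYPOINT`, which is why putting `uvicorn` in both places yields `uvicorn uvicorn ...`.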
I don't think it is compatible. Here is what I tried:

1. Install qlora and all deps
2. `pip install einops`
3. Run training

```shell
python qlora.py \
    --model_name_or_path mosaicml/mpt-7b...
```
Looks like they are working on it: https://huggingface.co/mosaicml/mpt-7b/discussions/23
You can work around the above error using the solution from the discussion above. So you check out the model manually:

```shell
git lfs install
git clone https://huggingface.co/mosaicml/mpt-7b
```

And then...
Turns out it's peft that is adding the `input_embeds` parameter to the call. I accidentally stumbled upon this: https://huggingface.co/cekal/mpt-7b-peft-compatible, which fixes the input embed problem as well as the gradient...
Trying to make this all a bit more straightforward: https://huggingface.co/mosaicml/mpt-7b/discussions/42
The fix has been added to the main branch of `mpt-7b-peft-compatible`, so now you can just run this:

```shell
python qlora.py \
    --model_name_or_path cekal/mpt-7b-peft-compatible \
    --trust_remote_code True \
    --output_dir output...
```
Just in case this is helpful for someone: if you get this error with Docker, make sure to use an image with the CUDA toolkit installed, e.g. `pytorch/pytorch:2.0.1-cuda11.7-cudnn8-devel`.
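For reference, a minimal Dockerfile along those lines (an illustrative sketch, not a full setup; the `pip install` line is an assumption standing in for your actual dependencies):

```Dockerfile
# The -devel tag ships nvcc and the full CUDA toolkit, which
# bitsandbytes needs to compile its GPU kernels; -runtime tags do not.
FROM pytorch/pytorch:2.0.1-cuda11.7-cudnn8-devel

WORKDIR /workspace

# Illustrative only: install your actual training dependencies here.
RUN pip install einops
```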
Implemented this relatively simple Python-based solution, in case it's helpful to anyone: https://github.com/muelletm/alpaca.py