AlexanderZhx
This could be solved fairly easily with something like this (highlighted pseudocode) in `llm_studio/app_utils/sections/experiment.py`.
> Is this happening only when using the weights from an old experiment to continue training with "Use previous experiment weights"?

No, this is happening with a "freshly" trained model, ...
> Thank you for reporting. When pushing the model to huggingface hub or downloading from the UI, these weights will be automatically sharded into smaller chunks (by default safetensors with...
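For context on the sharding behavior mentioned above, here is a minimal sketch of how a checkpoint ends up split into safetensors chunks, assuming the Hugging Face `transformers` `save_pretrained` API (the model name and shard size are illustrative, not LLM Studio's actual defaults):

```python
# Minimal sketch: sharding a checkpoint into safetensors chunks.
# Assumes the Hugging Face `transformers` save_pretrained API; the
# model name and shard size are illustrative, not LLM Studio's defaults.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")  # any causal LM

# safe_serialization=True writes .safetensors files; max_shard_size
# controls the chunk size, so a large checkpoint becomes several
# model-0000X-of-0000N.safetensors files plus an index JSON.
model.save_pretrained(
    "sharded_checkpoint",
    safe_serialization=True,
    max_shard_size="2GB",
)
```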
> Sorry, can't fully follow. So you are using a local model or a model from Huggingface to start your Experiment?

Starting with a model from HF.

> And what...
Would be awesome! Looking forward to it.
Same. Phind-codellama in fp16 from the repository works; the same model loaded from a GGUF doesn't, though (Ubuntu, Docker, A100, ollama 0.1.37).
Same. Phind-codellama from the repository works; loading from a GGUF doesn't, though.
Same when using the OpenAI API.
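To make the report reproducible, here is a minimal sketch of the call path being described, assuming ollama's OpenAI-compatible endpoint (the base URL is ollama's default; the model tag and prompt are illustrative):

```python
# Minimal sketch: querying an ollama-served model through its
# OpenAI-compatible endpoint. The model tag and prompt are illustrative.
from openai import OpenAI

# ollama exposes an OpenAI-compatible API under /v1; the api_key is
# required by the client library but ignored by ollama.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

response = client.chat.completions.create(
    model="phind-codellama",  # fails when the model was created from a GGUF
    messages=[{"role": "user", "content": "Write hello world in Python."}],
)
print(response.choices[0].message.content)
```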
> Edit your post to include your config.json please (just Ctrl+F to replace your username). This looks like a known bug that occurs when using offload, but I'm not...
Yes, setting these two variables to false manually in config.json does work, and the model gets successfully unloaded from the GPU. I haven't figured out a way to turn them off via...
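For anyone else hitting this, a minimal sketch of the manual workaround described above, assuming hypothetical flag names since the exact keys aren't visible in this excerpt:

```python
# Minimal sketch of the manual config.json workaround described above.
# The two key names are HYPOTHETICAL placeholders; substitute the actual
# offload-related keys from your own config.json.
import json
from pathlib import Path

config_path = Path("config.json")
config = json.loads(config_path.read_text())

# Flip both offload-related flags to false so the model can be
# unloaded from the GPU cleanly.
for key in ("offload_flag_a", "offload_flag_b"):  # hypothetical names
    config[key] = False

config_path.write_text(json.dumps(config, indent=2))
```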