Old Man
I asked llama-3-70b-instruct and it basically said it's a common, generic error. It suggested trying to run it on CPU, or checking whether I have enough memory.
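For what it's worth, the CPU-fallback suggestion looks roughly like this in PyTorch (a minimal sketch; `model` and `inputs` are placeholders, not the actual repro):

```python
import torch

def run_with_cpu_fallback(model, inputs):
    """Try the GPU first; fall back to CPU if we run out of memory.

    Sketch only: `model` and `inputs` stand in for whatever was
    actually running when the error appeared.
    """
    try:
        return model.to("cuda")(inputs.to("cuda"))
    except torch.cuda.OutOfMemoryError:
        torch.cuda.empty_cache()  # release the partial GPU allocation
        return model.to("cpu")(inputs.to("cpu"))
```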
Same problem here. Seems like a regression
It used to work. Some dev broke something. We need them to fix it. As you've already discovered, trying to fix it yourself just breaks more stuff
Seems slow to me though. You?
Well, not having flash attention makes a big difference, especially in memory-constrained scenarios. People need to stop rushing releases. I've already switched to Ollama and will evaluate LM Studio today...
> If you are on Windows, be advised that nightlies do not have FA v2 (so i.e. they don't have FA **at all**), see https://github.com/pytorch/pytorch/issues/108175

I'm on Linux stable. No...
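If anyone wants to confirm whether their wheel can actually run the flash kernel, here's a quick probe (a sketch, assuming a CUDA build of PyTorch 2.x and an fp16-capable GPU; the tensor shapes are arbitrary):

```python
import torch
import torch.nn.functional as F

# Force the flash backend only; if the wheel was built without flash
# attention (as on the Windows nightlies above), SDPA has no kernel
# to dispatch to and raises a RuntimeError.
q = k = v = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16)
try:
    with torch.backends.cuda.sdp_kernel(
        enable_flash=True, enable_math=False, enable_mem_efficient=False
    ):
        F.scaled_dot_product_attention(q, k, v)
    print("flash attention kernel available")
except RuntimeError as e:
    print(f"no flash attention: {e}")
```

Note that `torch.backends.cuda.sdp_kernel` is deprecated on newer releases in favor of `torch.nn.attention.sdpa_kernel`, but the older context manager still works there with a warning.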
This needs to be fixed. It's wasting a ton of bandwidth. If the updater keeps re-downloading and installing the same packages over and over, it *is* the updater's fault, obviously
Since you have a Windows release, you should warn users when things don't apply to Windows. Expecting users to ask ChatGPT to translate your documentation is not reasonable. Why wouldn't...
Welp, just ran into another breaking bug. Searched around and found this: https://github.com/h2oai/h2ogpt/issues/1248#issue-2060401402 There's a clear pattern of bugs and excuses here. I'm moving on. Good luck out there
> If you just launch Ollama it will not take up that memory. However, if you load a model and then close the terminal, the memory will still be used...
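On the quoted Ollama behavior: the server keeps a loaded model resident for a while by default. You can ask it to unload immediately by sending a request with `keep_alive` set to 0 (a sketch, assuming a default install on localhost:11434; the model name is just an example):

```python
import requests

# Tell the Ollama server to unload the model right away.
requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3", "keep_alive": 0},
)
```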