Old Man
I asked llama-3-70b-instruct and it basically said it's a common, generic error. It suggested trying to run it on CPU, or checking whether I have enough memory.
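For what it's worth, the CPU-fallback suggestion looks roughly like this in PyTorch (a minimal sketch; `model` and `inputs` are placeholders, not the actual repro):

```python
import torch

def run_with_cpu_fallback(model, inputs):
    """Try the GPU first; fall back to CPU if we run out of memory.

    Sketch only: `model` and `inputs` stand in for whatever was
    actually running when the error appeared.
    """
    try:
        return model.to("cuda")(inputs.to("cuda"))
    except torch.cuda.OutOfMemoryError:
        torch.cuda.empty_cache()  # release the partial GPU allocation
        return model.to("cpu")(inputs.to("cpu"))
```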
Same problem here. Seems like a regression
It used to work. Some dev broke something. We need them to fix it. As you've already discovered, trying to fix it yourself just breaks more stuff
Seems slow to me though. You?
Well, not having flash attention makes a big difference, especially in memory-constrained scenarios. People need to stop rushing releases. I've already switched to Ollama and will evaluate LM Studio today...
> If you are on Windows, be advised that nightlies do not have FA v2 (so i.e. they don't have FA **at all**), see https://github.com/pytorch/pytorch/issues/108175

I'm on Linux stable. No...
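If anyone wants to confirm whether their wheel can actually run the flash kernel, here's a quick probe (a sketch, assuming a CUDA build of PyTorch 2.x and an fp16-capable GPU; the tensor shapes are arbitrary):

```python
import torch
import torch.nn.functional as F

# Force the flash backend only; if the wheel was built without flash
# attention (as on the Windows nightlies above), SDPA has no kernel
# to dispatch to and raises a RuntimeError.
q = k = v = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16)
try:
    with torch.backends.cuda.sdp_kernel(
        enable_flash=True, enable_math=False, enable_mem_efficient=False
    ):
        F.scaled_dot_product_attention(q, k, v)
    print("flash attention kernel available")
except RuntimeError as e:
    print(f"no flash attention: {e}")
```

Note that `torch.backends.cuda.sdp_kernel` is deprecated on newer releases in favor of `torch.nn.attention.sdpa_kernel`, but the older context manager still works there with a warning.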
This needs to be fixed. It's wasting a ton of bandwidth. If the updater keeps re-downloading and installing the same packages over and over, it *is* the updater's fault, obviously
Since you have a Windows release, you should warn users when things don't apply to Windows. Expecting users to ask ChatGPT to translate your documentation is not reasonable. Why wouldn't...
Welp, just ran into another breaking bug. Searched around and found this: https://github.com/h2oai/h2ogpt/issues/1248#issue-2060401402 There's a clear pattern of bugs and excuses here. I'm moving on. Good luck out there
> If you just launch Ollama it will not take up that memory. However, if you load a model and then close the terminal, the memory will still be used...
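On the quoted Ollama behavior: the server keeps a loaded model resident for a while by default. You can ask it to unload immediately by sending a request with `keep_alive` set to 0 (a sketch, assuming a default install on localhost:11434; the model name is just an example):

```python
import requests

# Tell the Ollama server to unload the model right away.
requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3", "keep_alive": 0},
)
```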