nutspiano

13 comments by nutspiano

Something seems off with the required VRAM calculations. I ran into this when trying to find my max context length on the new Llama 3.1 models. It seems to overestimate...
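For reference, a back-of-the-envelope sketch of how the KV cache alone scales with context length. The shape numbers are my own assumptions for Llama 3.1 8B-class models (32 layers, 8 KV heads, head dim 128, fp16 cache), not values taken from ollama's actual estimator:

```python
# Back-of-the-envelope KV-cache size for a GQA model. The default shapes
# below are assumptions (Llama 3.1 8B-class), not values read from ollama.
def kv_cache_bytes(n_ctx: int,
                   n_layers: int = 32,
                   n_kv_heads: int = 8,
                   head_dim: int = 128,
                   bytes_per_elem: int = 2) -> int:
    """Keys + values, for every layer and every cached token."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * n_ctx

for ctx in (8_192, 32_768, 100_000, 131_072):
    print(f"{ctx:>7} tokens -> ~{kv_cache_bytes(ctx) / 2**30:.1f} GiB KV cache")
```

Under those assumptions the cache alone passes 12 GiB around 100k tokens, so even a modest error in the required-VRAM estimate translates into several layers' worth of GPU memory.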

And btw, I am not complaining about slow inference speeds because of a few non-offloaded layers; I just think the entire required VRAM calculation is off. Here's a more aggressive...

Good catch, that seems to be it. Here is the same 100k context one without flash attention. Very similar: 25/33 layers offloaded, but now VRAM is suitably crammed after loading the model,...

@AeneasZhu From your description and your logs, I think your slowness simply comes from not all of the model being offloaded to your GPU. Looking at these...

@dhiltgen I don't think the original post here was about a bug, but I do believe I (with the help of @rick-github) uncovered something about VRAM allocation when using flash attention...

Making the scroll wheel change the page is a start, but it still leaves things paginated, which is in itself bad UI. Being able to put the images you want to...

Thank you @psychedelicious for taking the time to write such a detailed answer. No reason for any self-deprecation about your coding ability, though; it sounds like a great idea to start...

That seems very close! A promising prototype. I just did a request for the full list of images from `/api/v1/images/?board_id=X&limit=3000&offset=0` on a board with ~1k images; the response was ~2...
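For anyone reproducing that measurement, a minimal sketch of the request. The host/port, the board ID, and the `items` field of the paginated response are assumptions about a local InvokeAI install:

```python
import requests

BASE = "http://127.0.0.1:9090"  # assumed local InvokeAI host/port
params = {"board_id": "X", "limit": 3000, "offset": 0}

resp = requests.get(f"{BASE}/api/v1/images/", params=params, timeout=30)
resp.raise_for_status()

# Assuming the paginated result exposes an `items` list of image DTOs.
items = resp.json().get("items", [])
print(f"{len(items)} images, ~{len(resp.content) / 1024:.0f} KiB response body")
```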

The cache keyed by image name sounds great. Premature/early optimization aside, I want to make sure I have gotten my point across about delaying sending image names (large for their...
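To make the idea concrete, a toy sketch of a metadata cache keyed by image name (purely illustrative, not InvokeAI code): metadata is fetched once per image, so the list endpoint only ever has to return the small page of names that is actually on screen.

```python
from typing import Dict, Optional


class ImageMetadataCache:
    """Toy name-keyed cache: fetch metadata once, reuse it across pages."""

    def __init__(self) -> None:
        self._by_name: Dict[str, dict] = {}

    def get(self, image_name: str) -> Optional[dict]:
        return self._by_name.get(image_name)

    def put(self, image_name: str, metadata: dict) -> None:
        self._by_name[image_name] = metadata
```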

Mr. Raskin is quite delusional indeed. Regardless, this issue seems to have stalled. We will see whether the other Invoke features convince people the UI is worth it.