Wonkyo Choe
@gruler Basically, `$rets = new \PHTRES\Session($config);` is the session itself, so you don't need to call `$rests->session` again. Here is a connection example in case anyone has trouble connecting....
@DeFek1 You can use linear interpolation between those timestep indices. The original repo includes the code, or you can use NumPy or another linear-algebra library. https://research.nvidia.com/labs/toronto-ai/AlignYourSteps/howto.html
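As a concrete sketch of the interpolation idea (the schedule values and fractional indices below are made up for illustration, not taken from the AYS repo), `np.interp` is enough:

```python
import numpy as np

# Hypothetical sigma schedule: one value per discrete timestep.
sigmas = np.linspace(14.6, 0.03, 1000)

# Fractional timestep indices at which we want schedule values.
frac_indices = np.array([0.0, 249.5, 612.25, 999.0])

# Linearly interpolate the schedule at those fractional indices.
interp = np.interp(frac_indices, np.arange(len(sigmas)), sigmas)
print(interp)
```

Integer indices return the schedule values exactly; fractional ones blend the two neighboring timesteps.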
@SA-j00u The bottleneck mainly comes from the MUL_MAT operator. You can profile your run with `GGML_PERF`.

@ring-c If you are using CUDA, that is the normal behavior. If you want...
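A hedged sketch of enabling the `GGML_PERF` timings mentioned above: in older ggml versions it is a compile-time define, so one generic way to turn it on is through the CMake flags (exact plumbing depends on your build setup):

```shell
# Rebuild with the GGML_PERF define so ggml prints per-operator timings.
cmake -B build -DCMAKE_C_FLAGS="-DGGML_PERF" -DCMAKE_CXX_FLAGS="-DGGML_PERF"
cmake --build build
```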
@FSSRepo I understand the repo is still young compared to PyTorch, so the slow inference comes down to under-optimization. Still, the one thing that I do not understand...
I found that this issue was actually caused on my end, although diffusers is still faster. For some reason I had used `CMAKE_BUILD_TYPE=Debug` for the build, and this took out `-O3`...
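For reference, reconfiguring with a Release build restores CMake's default optimization flags (which include `-O3` for GCC/Clang); assuming a standard CMake setup:

```shell
# Reconfigure and rebuild in Release mode so -O3 is applied again.
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build
```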
@JohnAlcatraz I just updated the first post and am reopening the issue.
Which tokenizer did you use to generate the dataset?
Okay. I found that `device_map` actually only offloads the model weights, not the execution. If a GPU is present, it takes priority in executing...
I think this is intentional. The other blocks have those variables; only the first one does not. The inconsistency can be seen here: https://github.com/BlinkDL/RWKV-LM/blob/d6a1efc06c46681b61694a67b8591120865446ba/RWKV-v7/train_temp/src/model.py#L173-L176