Eddie Liao
No I haven't; am I supposed to? Why would I need `deb https://isaac.download.nvidia.cn/isaac-ros/ubuntu/main focal main` in my local apt source list if it's supposed to be running in a Docker...
I believe I'm having the same issue. I'm trying to run a model with the following input/output dimensions: I've already updated my `yolov8_decoder_node.cpp` to match the number of classes...
I've managed to get it to work with an image, but if I input a video it will crash when the video ends.

> Could you please provide the full...
Looking to instead allocate a certain amount based on liveness, then overlap running kernels with loading at runtime to shorten the time spent waiting.
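To make the budgeting idea concrete, here is a minimal sketch of one possible policy, not what the branch actually does. It assumes each weight has a known size and a known first use in the schedule; `weight_info` and `plan_resident` are hypothetical names, not MIGraphX types or functions. Weights needed earliest stay resident until the budget runs out, and the rest are marked for streaming at runtime.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical description of one weight; not an MIGraphX type.
struct weight_info
{
    std::size_t bytes;     // size of the weight in bytes
    std::size_t first_use; // index of the first instruction that reads it
};

// For each weight, returns true if it stays resident on the GPU and false
// if it should be streamed in at runtime. Weights that are needed earliest
// are considered first, since their copies are the hardest to hide.
inline std::vector<bool> plan_resident(const std::vector<weight_info>& weights,
                                       std::size_t budget)
{
    // Visit weights in order of first use, keeping input order on ties.
    std::vector<std::size_t> order(weights.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
        return weights[a].first_use < weights[b].first_use;
    });

    std::vector<bool> resident(weights.size(), false);
    std::size_t used = 0; // invariant: used <= budget
    for(std::size_t i : order)
    {
        if(weights[i].bytes <= budget - used)
        {
            resident[i] = true;
            used += weights[i].bytes;
        }
    }
    return resident;
}
```

A real pass would also have to account for when each weight stops being live so its memory can be reused; this sketch only covers the initial split between resident and streamed weights.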
Added a stream for copies in this [weight_streaming](https://github.com/ROCm/AMDMIGraphX/tree/weight_streaming) branch (not sure why I can't link it directly to this issue). Currently, the `@literal` instruction is taking up the majority of...
It appears that the `std::copy()` call in `make_shared_array` is responsible for the slowdown. The `@literal` instruction doesn't show up during the scheduling pass, so not sure what optimizations can be...
Adding the `@literal` instruction to the stream marginally improves performance (e.g. using a budget of 50000000 on resnet50 cuts the time spent on `@literal` instructions from ~8.2 ms to ~7.6 ms). This does cause a lot...
Removed the use of `std::copy` when weight streaming, which drastically decreases the time spent on `@literal` instructions. Still need to investigate why increasing the number of streams does not help performance.
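For illustration only, here is a generic C++ sketch of why dropping the copy helps. It is not the actual change in the branch. The helper names `copy_into_shared` and `view_as_shared` are hypothetical. The first allocates a new buffer and copies every byte; the second wraps the existing bytes in a `std::shared_ptr` with a no-op deleter, so nothing is copied.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>

// Copying version: allocates a new buffer and copies every byte into it.
inline std::shared_ptr<char> copy_into_shared(const char* src, std::size_t n)
{
    std::shared_ptr<char> buf(new char[n], std::default_delete<char[]>());
    std::copy(src, src + n, buf.get());
    return buf;
}

// Non-copying version: points at the existing bytes and never frees them.
// The caller must keep `src` alive for as long as the result is in use.
inline std::shared_ptr<const char> view_as_shared(const char* src)
{
    return std::shared_ptr<const char>(src, [](const char*) {});
}
```

The trade-off is ownership: the non-copying view is only safe while the original buffer outlives every user of the returned pointer.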
After some testing it appears that weight streaming does work, although with a few caveats:
- There doesn't appear to be a good way to find how much GPU memory...
> Parameters also take up space and are allocated after compilation, meaning they can't be considered during the write_literals pass

Part of this is also due to issue #3310.