In addition, data overwrites can lead to non-deterministic results. For example, with height = 2 and width = 2, when c_im = 0, h_im = 2, w_im = 0, data_im[4] will be written, but c_im = 1, h_im = 0, w_im = 0...
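To make the overlap concrete, here is a minimal sketch; the flat addressing `(c_im * height + h_im) * width + w_im` is my assumption about how `data_im` is indexed (the usual col2im layout), not code copied from the repository.

```cpp
#include <cstdio>

// Flat index into data_im under the assumed (channel, row, column) layout.
static int im_index(int c_im, int h_im, int w_im, int height, int width) {
  return (c_im * height + h_im) * width + w_im;
}

int main() {
  const int height = 2, width = 2;
  // (c_im = 0, h_im = 2, w_im = 0) and (c_im = 1, h_im = 0, w_im = 0)
  // both resolve to data_im[4], so whichever write happens last wins.
  std::printf("%d %d\n",
              im_index(0, 2, 0, height, width),   // prints 4
              im_index(1, 0, 0, height, width));  // prints 4
  return 0;
}
```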
I solved the problem by setting images_hz := 1 (any non-zero value works). I think the cycle time equals 1/frequency, so 1/0 is gigantic.
> I have resolved the problem. There is a bug in the rostime module, in rate.cpp:
>
>     Rate::Rate(double frequency)
>     : start_(Time::now())
>     , expected_cycle_time_()
>     , actual_cycle_time_(0.0)
>     {
>       if (frequency != 0)...
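For what it's worth, here is a standalone sketch of the kind of guard that quoted constructor seems to be adding; the zero fallback for `expected_cycle_time_` is my assumption, not the actual rostime patch:

```cpp
#include <cstdio>

// Mimics ros::Rate's expected_cycle_time_ = 1/frequency, but only divides
// when frequency is non-zero; otherwise it falls back to a zero cycle time
// instead of producing the "gigantic" 1/0 value mentioned above.
struct RateSketch {
  double expected_cycle_time_;
  explicit RateSketch(double frequency)
      : expected_cycle_time_(frequency != 0.0 ? 1.0 / frequency : 0.0) {}
};

int main() {
  RateSketch ok(10.0);         // expected_cycle_time_ = 0.1
  RateSketch degenerate(0.0);  // expected_cycle_time_ = 0.0 instead of inf
  std::printf("%f %f\n", ok.expected_cycle_time_, degenerate.expected_cycle_time_);
  return 0;
}
```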
Hi, where is rate.cpp?
@KimSoybean Hi, I read the answer above. Do you mean that one GPU cannot reach the large batch_size? I think 128 means accum_batch_size, and we can use one GPU to read 4 batch_size by...
Or does the B of BN mean the batch_size that one GPU can reach? Looking forward to your answer.
@wenzhengyin A few points need to be confirmed:
1. Does ml_nms only need to compare against the first box? From [iouCompute](https://github.com/wenzhengyin/mlu-ops/blob/ml_nms/bangc-ops/test/mlu_op_gtest/pb_gtest/src/zoo/ml_nms/ml_nms.cpp#:~:text=float%20iou%20%3D%20iouCompute(boxes_data_ptr%5B0%5D%2C%20boxes_data_ptr%5Bi%5D)%3B), the CPU reference logic only computes the IoU against the first box (see the sketch below).
2. The current test case does not pass.
3. Please go through the review comments and check.
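For reference, this is roughly what I mean by "only compared against the first box" in point 1; the `(x1, y1, x2, y2)` float layout and the function bodies are my own sketch, not the code from ml_nms.cpp:

```cpp
#include <algorithm>
#include <vector>

// Plain IoU between two boxes stored as (x1, y1, x2, y2).
static float iouCompute(const float* a, const float* b) {
  const float ix1 = std::max(a[0], b[0]);
  const float iy1 = std::max(a[1], b[1]);
  const float ix2 = std::min(a[2], b[2]);
  const float iy2 = std::min(a[3], b[3]);
  const float inter = std::max(0.0f, ix2 - ix1) * std::max(0.0f, iy2 - iy1);
  const float area_a = (a[2] - a[0]) * (a[3] - a[1]);
  const float area_b = (b[2] - b[0]) * (b[3] - b[1]);
  return inter / (area_a + area_b - inter);
}

// CPU reference as I read it: every box i is compared only against box 0.
std::vector<bool> suppressAgainstFirstBox(const float* boxes_data_ptr,
                                          int num_boxes, float iou_threshold) {
  std::vector<bool> suppressed(num_boxes, false);
  for (int i = 1; i < num_boxes; ++i) {
    const float iou = iouCompute(boxes_data_ptr, boxes_data_ptr + 4 * i);
    suppressed[i] = iou > iou_threshold;
  }
  return suppressed;
}
```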
@tridao Hello, I plan to add a bias mask in FlashAttention-2. I noticed that, in order to fuse the scale and add operations in [scale_apply_exp2](https://github.com/Dao-AILab/flash-attention/blob/main/csrc/flash_attn/src/softmax.h#78), the scale is delayed until after...
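To illustrate what I mean by the delayed scale, here is a plain-array sketch of how I understand `scale_apply_exp2`; the names and the loop form are mine, and the real kernel operates on register tiles rather than a flat array:

```cpp
#include <cmath>

// The softmax scale (pre-multiplied by log2(e)) is applied together with the
// running-max subtraction inside the exp2 step, so each element costs one
// fused multiply-add plus an exp2 instead of a separate scaling pass.
void apply_scale_exp2_sketch(float* scores, int n,
                             float softmax_scale, float row_max) {
  const float kLog2e = 1.4426950408889634f;          // log2(e)
  const float scale_log2e = softmax_scale * kLog2e;  // folded constant
  const float max_scaled = row_max * scale_log2e;
  for (int i = 0; i < n; ++i) {
    // exp(softmax_scale * (s - row_max)) == exp2(s * scale_log2e - max_scaled)
    scores[i] = std::exp2(std::fma(scores[i], scale_log2e, -max_scaled));
  }
}
```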
@LyricZhao Hi, I ran into a similar error. After updating to the latest code, I encountered a `RuntimeError: CUDA error: an illegal memory access was encountered`. However, when...
WARN: cudaHostRegister with IoMemory failed with error=800. We may need to use a fallback path.
WARN: ibgda_nic_mem_gpu_map failed. We may need to use the CPU fallback path.
WARN: ibgda_alloc_and_map_qp_uar with...