In addition, data overwrites can lead to non-deterministic results. For example, with height = 2 and width = 2, when c_im = 0, h_im = 2, w_im = 0, data_im[4] will be written, but c_im = 1, h_im = 0, w_im = 0...
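To make the overlap concrete, here is a minimal sketch; the flat addressing `(c_im * height + h_im) * width + w_im` is my assumption about how `data_im` is indexed (the usual col2im layout), not code copied from the repository.

```cpp
#include <cstdio>

// Flat index into data_im under the assumed (channel, row, column) layout.
static int im_index(int c_im, int h_im, int w_im, int height, int width) {
  return (c_im * height + h_im) * width + w_im;
}

int main() {
  const int height = 2, width = 2;
  // (c_im = 0, h_im = 2, w_im = 0) and (c_im = 1, h_im = 0, w_im = 0)
  // both resolve to data_im[4], so whichever write happens last wins.
  std::printf("%d %d\n",
              im_index(0, 2, 0, height, width),   // prints 4
              im_index(1, 0, 0, height, width));  // prints 4
  return 0;
}
```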
I solved the problem by setting images_hz := 1 (any non-zero value works). I think the cycle time equals 1/frequency, so 1/0 is gigantic.
> I have resolved the problem. There is a bug in the rostime module, in rate.cpp:
>
>     Rate::Rate(double frequency)
>     : start_(Time::now())
>     , expected_cycle_time_()
>     , actual_cycle_time_(0.0)
>     {
>       if (frequency != 0)...
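For what it's worth, here is a standalone sketch of the kind of guard that quoted constructor seems to be adding; the zero fallback for `expected_cycle_time_` is my assumption, not the actual rostime patch:

```cpp
#include <cstdio>

// Mimics ros::Rate's expected_cycle_time_ = 1/frequency, but only divides
// when frequency is non-zero; otherwise it falls back to a zero cycle time
// instead of producing the "gigantic" 1/0 value mentioned above.
struct RateSketch {
  double expected_cycle_time_;
  explicit RateSketch(double frequency)
      : expected_cycle_time_(frequency != 0.0 ? 1.0 / frequency : 0.0) {}
};

int main() {
  RateSketch ok(10.0);         // expected_cycle_time_ = 0.1
  RateSketch degenerate(0.0);  // expected_cycle_time_ = 0.0 instead of inf
  std::printf("%f %f\n", ok.expected_cycle_time_, degenerate.expected_cycle_time_);
  return 0;
}
```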
Hi, where is rate.cpp?
@KimSoybean Hi, I read the answer above. Do you mean that one GPU cannot reach the large batch_size? I think 128 means accum_batch_size, and we can use one GPU to read 4 batch_size by...
Or does the B of BN mean the batch_size that one GPU can reach? Looking forward to your answer.
@wenzhengyin A few points need to be confirmed:
1. Does ml_nms only need to compare against the first box? From [iouCompute](https://github.com/wenzhengyin/mlu-ops/blob/ml_nms/bangc-ops/test/mlu_op_gtest/pb_gtest/src/zoo/ml_nms/ml_nms.cpp#:~:text=float%20iou%20%3D%20iouCompute(boxes_data_ptr%5B0%5D%2C%20boxes_data_ptr%5Bi%5D)%3B), the CPU reference logic only computes the IoU against the first box (see the sketch below).
2. The current test case does not pass.
3. Please go through the review comments and check.
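For reference, this is roughly what I mean by "only compared against the first box" in point 1; the `(x1, y1, x2, y2)` float layout and the function bodies are my own sketch, not the code from ml_nms.cpp:

```cpp
#include <algorithm>
#include <vector>

// Plain IoU between two boxes stored as (x1, y1, x2, y2).
static float iouCompute(const float* a, const float* b) {
  const float ix1 = std::max(a[0], b[0]);
  const float iy1 = std::max(a[1], b[1]);
  const float ix2 = std::min(a[2], b[2]);
  const float iy2 = std::min(a[3], b[3]);
  const float inter = std::max(0.0f, ix2 - ix1) * std::max(0.0f, iy2 - iy1);
  const float area_a = (a[2] - a[0]) * (a[3] - a[1]);
  const float area_b = (b[2] - b[0]) * (b[3] - b[1]);
  return inter / (area_a + area_b - inter);
}

// CPU reference as I read it: every box i is compared only against box 0.
std::vector<bool> suppressAgainstFirstBox(const float* boxes_data_ptr,
                                          int num_boxes, float iou_threshold) {
  std::vector<bool> suppressed(num_boxes, false);
  for (int i = 1; i < num_boxes; ++i) {
    const float iou = iouCompute(boxes_data_ptr, boxes_data_ptr + 4 * i);
    suppressed[i] = iou > iou_threshold;
  }
  return suppressed;
}
```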
@tridao Hello, I plan to add a bias mask in FlashAttention-2. I noticed that, in order to fuse the scale and add operations in [scale_apply_exp2](https://github.com/Dao-AILab/flash-attention/blob/main/csrc/flash_attn/src/softmax.h#78), the scale is delayed until after...
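To illustrate what I mean by the delayed scale, here is a plain-array sketch of how I understand `scale_apply_exp2`; the names and the loop form are mine, and the real kernel operates on register tiles rather than a flat array:

```cpp
#include <cmath>

// The softmax scale (pre-multiplied by log2(e)) is applied together with the
// running-max subtraction inside the exp2 step, so each element costs one
// fused multiply-add plus an exp2 instead of a separate scaling pass.
void apply_scale_exp2_sketch(float* scores, int n,
                             float softmax_scale, float row_max) {
  const float kLog2e = 1.4426950408889634f;          // log2(e)
  const float scale_log2e = softmax_scale * kLog2e;  // folded constant
  const float max_scaled = row_max * scale_log2e;
  for (int i = 0; i < n; ++i) {
    // exp(softmax_scale * (s - row_max)) == exp2(s * scale_log2e - max_scaled)
    scores[i] = std::exp2(std::fma(scores[i], scale_log2e, -max_scaled));
  }
}
```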
@LyricZhao Hi, I ran into a similar error. After updating to the latest code, I encountered a `RuntimeError: CUDA error: an illegal memory access was encountered`. However, when...
WARN: cudaHostRegister with IoMemory failed with error=800. We may need to use a fallback path.
WARN: ibgda_nic_mem_gpu_map failed. We may need to use the CPU fallback path.
WARN: ibgda_alloc_and_map_qp_uar with...