cuda-samples: I got an error with 'cudaMemcpy'

I'm working on a signal processing program. I process the whole signal by cutting it into 4 pieces. The first three pieces are processed correctly, but the fourth piece comes out wrong.

I tried to find where the problem is, and I narrowed it down to this code.

Every piece leaves a remainder, which I put in s_dem_last. Then I load the next piece from a disk file into s_dem and demodulate it immediately. After that, I merge them into a new variable s_dem_GPU, with s_dem_last in front and the demodulated s_dem behind it.

When I printed them out, I found this issue.

Please help me to figure it out! Thanks!

By the way, my data size is about 10^7, while each piece is about 10^6.

I am using a Jetson Nano, and the image is the latest version (as of 2021.6.14).

Jun 21 '21 07:06 MASIJUN99

If the value of s_dem_last_length is not equal to 1000, please use the same size for cudaMemcpy() on line 124. It might be copying more memory than allocated.

cudaMemcpy(s_dem_test, s_dem_GPU, s_dem_last_length * sizeof(float2), cudaMemcpyDeviceToHost);
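To illustrate the point about sizes, here is a minimal sketch, not the original program. The lengths, the host buffers h_last and h_dem, and the helpers bytesFor and check are hypothetical, and the two parts are copied from host memory for brevity. The idea is that s_dem_GPU is allocated for the merged length and the same byte count is used when copying back into s_dem_test.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: byte count for `count` float2 samples.
static size_t bytesFor(size_t count) { return count * sizeof(float2); }

// Hypothetical helper: abort with a message if a CUDA call failed.
static void check(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

int main() {
    // Assumed lengths in elements; the real program computes these at run time.
    const size_t s_dem_last_length = 1000;
    const size_t s_dem_length = 4000;
    const size_t merged_length = s_dem_last_length + s_dem_length;

    std::vector<float2> h_last(s_dem_last_length, make_float2(1.0f, 0.0f));
    std::vector<float2> h_dem(s_dem_length, make_float2(2.0f, 0.0f));
    std::vector<float2> s_dem_test(merged_length);

    // Allocate exactly as many bytes as will later be copied back.
    float2* s_dem_GPU = nullptr;
    check(cudaMalloc(&s_dem_GPU, bytesFor(merged_length)), "cudaMalloc");

    // Leftover samples go in front, the new piece right behind them.
    check(cudaMemcpy(s_dem_GPU, h_last.data(), bytesFor(s_dem_last_length),
                     cudaMemcpyHostToDevice), "copy s_dem_last");
    check(cudaMemcpy(s_dem_GPU + s_dem_last_length, h_dem.data(),
                     bytesFor(s_dem_length), cudaMemcpyHostToDevice), "copy s_dem");

    // The copy back uses the same element count as the allocation.
    check(cudaMemcpy(s_dem_test.data(), s_dem_GPU, bytesFor(merged_length),
                     cudaMemcpyDeviceToHost), "copy back");

    printf("first: %.1f, last: %.1f\n", s_dem_test.front().x, s_dem_test.back().x);
    check(cudaFree(s_dem_GPU), "cudaFree");
    return 0;
}
```

Note that the offset s_dem_GPU + s_dem_last_length is in elements, while the third argument of cudaMemcpy() is in bytes. Mixing the two units is an easy way to copy past the end of an allocation.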

Jun 21 '21 08:06 Ru7w1k

That is my fault. I tried to make the code easier to read: the length of s_dem_GPU is made up of the lengths of s_dem and s_dem_last, and I was afraid there would be too many variables to follow.

In fact, my s_dem_last_length is more than 1000; it is more than 10^5.

Anyway, thank you!

Jun 22 '21 06:06 MASIJUN99

I have tried DeviceToDevice, HostToDevice, and HostToHostToDevice; none of them worked. I'm really confused.

Jun 22 '21 06:06 MASIJUN99

You can check the return value of those CUDA APIs. cudaMemcpy() from HostToDevice or DeviceToHost might be failing for some reason.

Something like this: checkCudaErrors()
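checkCudaErrors() comes from the helper headers shipped with cuda-samples. If those headers are not at hand, a self-contained equivalent could look like the sketch below. The macro name CUDA_CHECK, the kernel scale, and the helper gridSize are hypothetical; only the runtime calls themselves are real API. A kernel launch returns no status directly, so the sketch checks cudaGetLastError() right after the launch and then cudaDeviceSynchronize(), which is where errors raised while the kernel runs are reported.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical macro: print file, line, and error string, then exit.
#define CUDA_CHECK(call)                                               \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            fprintf(stderr, "%s:%d: %s failed: %s\n", __FILE__,        \
                    __LINE__, #call, cudaGetErrorString(err_));        \
            exit(EXIT_FAILURE);                                        \
        }                                                              \
    } while (0)

// Placeholder kernel: scales every sample by `factor`.
__global__ void scale(float2* data, size_t n, float factor) {
    size_t i = blockIdx.x * static_cast<size_t>(blockDim.x) + threadIdx.x;
    if (i < n) {
        data[i].x *= factor;
        data[i].y *= factor;
    }
}

// Number of blocks needed to cover n elements (rounded up).
static unsigned int gridSize(size_t n, unsigned int threads) {
    return static_cast<unsigned int>((n + threads - 1) / threads);
}

int main() {
    const size_t n = 1 << 20;  // assumed size, for illustration only
    const unsigned int threads = 256;

    float2* d_data = nullptr;
    CUDA_CHECK(cudaMalloc(&d_data, n * sizeof(float2)));
    CUDA_CHECK(cudaMemset(d_data, 0, n * sizeof(float2)));

    scale<<<gridSize(n, threads), threads>>>(d_data, n, 2.0f);
    CUDA_CHECK(cudaGetLastError());       // invalid launch configuration etc.
    CUDA_CHECK(cudaDeviceSynchronize());  // errors raised during execution

    CUDA_CHECK(cudaFree(d_data));
    printf("all CUDA calls succeeded\n");
    return 0;
}
```

Because kernels run asynchronously, an error caused by a kernel can otherwise show up at a later, unrelated call such as cudaMemcpy(). Checking immediately after each launch helps attribute the failure to the right place.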

Jun 22 '21 14:06 Ru7w1k

Thank you homie, I now get the error "unspecified launch failure", but I still cannot find the cause.

By the way, someone told me that it cannot be solved.

It works on my Win10 machine with VS2019, but I still cannot get it to work on the Jetson Nano. Is the Jetson the reason?

Jun 22 '21 15:06 MASIJUN99

I suspect it occurred because of performance?

What do you think?

By the way, the last error was found after the kernel function. Is the kernel the cause? But the first three kernel launches succeeded, so why does the fourth one fail?

Jun 23 '21 06:06 MASIJUN99

Maybe the 4th kernel launch is accessing an invalid array index?
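One way this can happen, assuming (the thread does not confirm it) that the last piece is shorter than the others: the grid is rounded up to whole blocks, so some threads get an index past the end of the piece. A minimal sketch with a bounds guard follows. The kernel demodulate_stub, its body, the helper blocksFor, and the piece lengths are hypothetical placeholders.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder for the real demodulation: conjugates each sample.
__global__ void demodulate_stub(const float2* in, float2* out, size_t n) {
    size_t i = blockIdx.x * static_cast<size_t>(blockDim.x) + threadIdx.x;
    if (i >= n) return;  // guard: surplus threads in the last block do nothing
    out[i] = make_float2(in[i].x, -in[i].y);
}

// Number of blocks needed to cover n elements (rounded up).
static unsigned int blocksFor(size_t n, unsigned int threads) {
    return static_cast<unsigned int>((n + threads - 1) / threads);
}

int main() {
    // Assumed piece lengths; the last one is deliberately shorter.
    const size_t piece_lengths[4] = {1000000, 1000000, 1000000, 123457};
    const size_t capacity = 1000000;  // buffers sized for the largest piece
    const unsigned int threads = 256;

    float2* d_in = nullptr;
    float2* d_out = nullptr;
    if (cudaMalloc(&d_in, capacity * sizeof(float2)) != cudaSuccess) return 1;
    if (cudaMalloc(&d_out, capacity * sizeof(float2)) != cudaSuccess) return 1;
    cudaMemset(d_in, 0, capacity * sizeof(float2));

    for (int p = 0; p < 4; ++p) {
        const size_t n = piece_lengths[p];
        // Pass the actual length of this piece, not the buffer capacity.
        demodulate_stub<<<blocksFor(n, threads), threads>>>(d_in, d_out, n);
        cudaError_t err = cudaGetLastError();
        if (err == cudaSuccess) err = cudaDeviceSynchronize();
        printf("piece %d: %s\n", p + 1, cudaGetErrorString(err));
    }

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

An out-of-bounds access does not always fail visibly. Depending on the device and on what happens to sit next to the buffer in memory, it may go unnoticed on one machine and raise an error on another, so a run that succeeds elsewhere does not rule this cause out.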

Jun 24 '21 06:06 Ru7w1k

I think maybe that is not the reason, because my Windows machine with a GTX 1070 runs it successfully.

Jun 26 '21 11:06 MASIJUN99

@Ru7w1k I have a new question: I want to use CLion over SSH with my Jetson Nano. When I load the CMakeLists.txt, it reports that it cannot find CMAKE_CUDA_COMPILER. I looked it up in CMake, and it says I need to add CUDACXX to the environment variables. How do I do that?

Aug 18 '21 09:08 MASIJUN99

I haven't used CLion yet, but you can use the following bash command to set any environment variable: export CUDACXX=<path to nvcc>

e.g. export CUDACXX=/usr/local/cuda/bin/nvcc

Aug 18 '21 09:08 Ru7w1k