I got an error with 'cudaMemcpy'
I'm working on a signal processing program. I process the whole signal by cutting it into 4 pieces, but while the first three pieces are processed correctly, the fourth one goes wrong.
I tried to find where the problem is and narrowed it down to this code.
Every piece leaves a remainder, which I put in s_dem_last. Then I load the next piece from the disk file into s_dem and demodulate it immediately. After that, I merge them into a new variable, s_dem_GPU, with s_dem_last in front and the demodulated s_dem behind it.
When I printed them out, I found this issue.
Please help me figure it out! Thanks!

By the way, my data size is about 10^7, and each piece is about 10^6.
I'm using a Jetson Nano, and the image is the latest version (as of 2021.6.14).
If the value of s_dem_last_length is not equal to 1000, please use the same size for cudaMemcpy() on line 124. It might be copying more memory than allocated.
cudaMemcpy(s_dem_test, s_dem_GPU, s_dem_last_length * sizeof(float2), cudaMemcpyDeviceToHost);
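To illustrate the point about copy sizes, here is a minimal sketch of the merge step described in the question, where the leftover s_dem_last goes in front and the demodulated s_dem goes behind it. The helper name mergeChunks, its parameters, and the assumption that all three buffers live in device memory are mine, not taken from the original code. The idea is simply that each copy uses the element count of its own source, and that the destination capacity is checked first.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical helper (not from the original program): concatenates the
// carried-over tail and the freshly demodulated chunk into one device buffer.
// Assumption: merged, last and dem are all device pointers, and
// merged_capacity is the number of float2 elements allocated for merged.
cudaError_t mergeChunks(float2* merged, size_t merged_capacity,
                        const float2* last, size_t last_len,
                        const float2* dem, size_t dem_len)
{
    // Refuse to copy more elements than the destination was allocated for.
    if (last_len + dem_len > merged_capacity) {
        return cudaErrorInvalidValue;
    }

    // Leftover part goes in front.
    cudaError_t err = cudaMemcpy(merged, last, last_len * sizeof(float2),
                                 cudaMemcpyDeviceToDevice);
    if (err != cudaSuccess) {
        return err;
    }

    // Demodulated chunk goes right behind it, offset by last_len elements.
    return cudaMemcpy(merged + last_len, dem, dem_len * sizeof(float2),
                      cudaMemcpyDeviceToDevice);
}
```

If the last piece of the signal is shorter than the others, dem_len for that piece has to be the real number of valid elements, not the nominal piece size.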
That is my problem. I tried to make it easier to read: because the length of s_dem_GPU is the sum of the lengths of s_dem and s_dem_last, I was afraid there would be too many variables to follow.
In fact, my s_dem_last_length is more than 1000; it is more than about 10^5.
Anyway, thank you!
I have tried DeviceToDevice, HostToDevice, and HostToHostToDevice, all to no avail. I'm really confused.
You can check the return value of those CUDA APIs. cudaMemcpy() from HostToDevice or DeviceToHost might be failing due to some reason.
Something like this: checkCudaErrors()
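For reference, checkCudaErrors() comes from helper_cuda.h in the CUDA samples. If that header is not available, a hand-rolled stand-in could look like the sketch below. The macro name CUDA_CHECK and the function copyToHostChecked are hypothetical names chosen for illustration. Because kernel launches are asynchronous, a failure inside a kernel is often reported by a later call such as cudaMemcpy(), so the sketch checks cudaGetLastError() and cudaDeviceSynchronize() before the copy to help tell the two cases apart.

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <cstdio>
#include <cstdlib>

// Hypothetical stand-in for checkCudaErrors(): prints the failing call with
// file and line, then aborts the program.
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            fprintf(stderr, "%s:%d: %s failed: %s\n", __FILE__, __LINE__, \
                    #call, cudaGetErrorString(err_));                     \
            exit(EXIT_FAILURE);                                           \
        }                                                                 \
    } while (0)

// Hypothetical helper: copies n float2 elements from device to host,
// checking for earlier kernel errors first.
void copyToHostChecked(float2* host_dst, const float2* dev_src, size_t n)
{
    CUDA_CHECK(cudaGetLastError());       // error from a preceding kernel launch
    CUDA_CHECK(cudaDeviceSynchronize());  // error raised while a kernel was running
    CUDA_CHECK(cudaMemcpy(host_dst, dev_src, n * sizeof(float2),
                          cudaMemcpyDeviceToHost));
}
```

If the message points at the cudaDeviceSynchronize() line rather than the cudaMemcpy() line, the copy itself is probably fine and the kernel that ran before it is the place to look.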
Thank you, homie. I successfully got the error, "unspecified launch failure", but I still cannot find the reason.
By the way, someone told me that it cannot be solved.
It works on my Win10 machine with VS2019, but I still cannot get it to work on the Jetson Nano. Is the Jetson the reason?
I suspect it occurred because of performance?
What do you think?
By the way, the last error was found after the kernel function. Is the kernel the reason? But the first three kernel launches succeeded, so why does the fourth go wrong?
Maybe the 4th kernel launch is accessing some invalid index of array?
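As a sketch of what guarding against an invalid index can look like: the kernel below takes the real element count n and returns early for threads past the end, so a final piece that is shorter than the others does not read or write outside the buffer. The kernel scaleKernel, the wrapper launchScale, and the gain multiplication are placeholders I made up, since the actual demodulation kernel is not shown in this thread.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical per-element kernel; the scaling stands in for the real
// demodulation, which is not shown in the thread.
__global__ void scaleKernel(float2* data, size_t n, float gain)
{
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i >= n) {
        return;  // threads beyond the valid length of this piece do nothing
    }
    data[i].x *= gain;
    data[i].y *= gain;
}

// Hypothetical launch wrapper: sizes the grid from the actual element count
// and reports both launch-time and run-time errors.
cudaError_t launchScale(float2* dev_data, size_t n, float gain)
{
    if (n == 0) {
        return cudaSuccess;  // nothing to do; a zero-block launch is invalid
    }
    const unsigned threads = 256;
    const unsigned blocks = (unsigned)((n + threads - 1) / threads);
    scaleKernel<<<blocks, threads>>>(dev_data, n, gain);

    cudaError_t err = cudaGetLastError();  // bad launch configuration, etc.
    if (err != cudaSuccess) {
        return err;
    }
    return cudaDeviceSynchronize();        // faults raised while the kernel ran
}
```

An out-of-bounds access is undefined behavior, so it can appear to work on one GPU and fail on another. A program that runs cleanly on a different machine does not by itself rule this cause out.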
I think maybe that is not the reason, because my Windows machine with a GTX 1070 runs it successfully.
@Ru7w1k I have a new question:
I want to use CLion to SSH into my Jetson Nano. When I load the CMakeLists.txt, it reports that it cannot find CMAKE_CUDA_COMPILER. I looked it up in CMake, and it says I need to add CUDACXX to the environment variables. But how do I do that?
I haven't used CLion yet, but you can use the following bash command to set any environment variable:
export CUDACXX=<path to nvcc>
e.g. export CUDACXX=/usr/local/cuda/bin/nvcc