
An odd error occurs when I try to allocate GPU memory in a loop

Open MASIJUN99 opened this issue 3 years ago • 2 comments

I use 'cudaMalloc' to allocate GPU memory at the beginning of each loop iteration, and 'cudaFree' to release it at the end of each iteration. (The pointer variable is defined outside the loop.)

The loop runs about 4 times. The first 3 iterations appear to succeed, but when the fourth iteration starts the program suddenly stops and reports error code 719.

I tried reading the official guide, but I have no clue why this error would occur when allocating memory.

BTW, this succeeds on Windows 10 with a GTX 1070, but fails on Ubuntu 18.04 on a Jetson Nano.

    /**
     * An exception occurred on the device while executing a kernel. Common
     * causes include dereferencing an invalid device pointer and accessing
     * out of bounds shared memory. Less common cases can be system specific - more
     * information about these cases can be found in the system specific user guide.
     * This leaves the process in an inconsistent state and any further CUDA work
     * will return the same error. To continue using CUDA, the process must be terminated
     * and relaunched.
     */
    cudaErrorLaunchFailure                =      719,

And this is the error-checking code; I use it to wrap every CUDA function call:

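#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <cuda_runtime.h>

// If error is not cudaSuccess, print msg and the CUDA error description, then exit.
// msg is const so that string literals can be passed directly.
void validate(cudaError_t error, const char* msg) {
	if (error != cudaSuccess) {
		std::cout << msg;
		fprintf(stderr, "Failed to execute cuda function: %s! Error code: %d\n",
			cudaGetErrorString(error), error);
		exit(EXIT_FAILURE);
	}
}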

Here is the console output:

Alloc the s_GPU Array  (this is the msg)
Failed to execute cuda function: unspecified launch failure! Error code: 719

MASIJUN99 avatar Mar 07 '22 16:03 MASIJUN99

Also: the first allocation is 154900 * sizeof(float2), the second is 280142 * sizeof(float2), the third is 406426 * sizeof(float2), and the fourth is 428178 * sizeof(float2). The error occurs when allocating the fourth.

MASIJUN99 avatar Mar 07 '22 16:03 MASIJUN99
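
For scale, the four element counts reported above can be converted to bytes. The following is a minimal host-only sketch (the helper name bytes_for is hypothetical and not from the issue); it launches no kernel and allocates nothing. Since float2 is two 32-bit floats, each request comes to only a few MiB, which makes memory exhaustion on its own an unlikely explanation.

```cuda
// Host-only sketch: no kernel is launched and no device memory is touched.
#include <cstddef>
#include <cstdio>
#include <cuda_runtime.h>  // provides float2 (two 32-bit floats, 8 bytes)

// Hypothetical helper: bytes requested for n float2 elements.
static size_t bytes_for(size_t n) { return n * sizeof(float2); }

int main() {
    // Element counts reported in the issue, in loop order.
    const size_t counts[] = {154900, 280142, 406426, 428178};
    for (size_t n : counts) {
        std::printf("%zu elements -> %zu bytes (%.2f MiB)\n",
                    n, bytes_for(n), bytes_for(n) / (1024.0 * 1024.0));
    }
    return 0;
}
```

The largest request is 428178 * 8 = 3425424 bytes, roughly 3.3 MiB.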

Without understanding what you're doing, it's very difficult to speculate; there are many possibilities. For instance, are you freeing memory before your kernel finishes with it? Is a misbehaving kernel accessing the array out of bounds? I don't think this looks like a memory allocation failure per se; my gut feeling is that the allocation fails because the device is not in a good state for some other reason (usually a misbehaving kernel).

rwarmstr avatar Mar 16 '22 15:03 rwarmstr
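
One way to test the misbehaving-kernel hypothesis: kernel launches are asynchronous, so a fault inside a kernel is often reported by whichever runtime call comes next, which can be an unrelated 'cudaMalloc'. The sketch below checks for errors right after the launch and again after 'cudaDeviceSynchronize', before the buffer is freed, so a failure is attributed to the iteration and kernel that caused it. The kernel from the issue is not shown, so scale is only a placeholder, and CUDA_CHECK, blocks_for, and s_gpu are hypothetical names chosen for illustration.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: reports the file, line, and call where an error was detected.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s:%d: %s failed: %s (code %d)\n",  \
                         __FILE__, __LINE__, #call,                   \
                         cudaGetErrorString(err_), (int)err_);        \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Number of blocks needed to cover n elements (rounds up).
static int blocks_for(int n, int threads) { return (n + threads - 1) / threads; }

// Placeholder kernel standing in for the real kernel, which the issue does not show.
__global__ void scale(float2* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {  // bounds guard: the last block may extend past n
        data[i].x *= 2.0f;
        data[i].y *= 2.0f;
    }
}

int main() {
    const int counts[] = {154900, 280142, 406426, 428178};  // sizes from the issue
    const int threads = 256;
    float2* s_gpu = nullptr;  // declared outside the loop, as described in the issue

    for (int n : counts) {
        CUDA_CHECK(cudaMalloc(&s_gpu, n * sizeof(float2)));
        CUDA_CHECK(cudaMemset(s_gpu, 0, n * sizeof(float2)));

        scale<<<blocks_for(n, threads), threads>>>(s_gpu, n);
        CUDA_CHECK(cudaGetLastError());       // catches invalid launch configurations
        CUDA_CHECK(cudaDeviceSynchronize());  // catches faults raised while the kernel ran

        CUDA_CHECK(cudaFree(s_gpu));          // the kernel has finished before the free
        s_gpu = nullptr;
    }
    std::printf("all iterations completed\n");
    return 0;
}
```

If the synchronize call reports error 719 in some iteration, the kernel launched in that iteration is the likely culprit rather than the allocation that follows it. Running the program under a memory checker such as cuda-memcheck or compute-sanitizer, where available for the platform, can then help locate the offending access.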