
[BUG] device memory leak in cuBLAS (matmul)

Open · willyborn opened this issue · 1 comment

Each launch of a new thread leaves around 10KB of device memory allocated.

Description

On the CUDA back-end, repeatedly calling a function that performs a matmul, each time on a new thread, eventually exhausts device memory and throws: CUBLAS Error (3): CUBLAS_STATUS_ALLOC_FAILED. Running the same function in a loop on the main thread works normally, even after 10,000 iterations. On OpenCL, both the threaded and the main-thread versions work normally. Full trace logging indicates that existing device buffers are reused as expected, including in the separate threads.

ArrayFire build: master 3.9.0 b05da694
Back-end: CUDA
Workaround available: No
Reproducibility: Yes
Logging: Logging.txt

Device memory in main thread: [image: device memory main thread]

Device memory in consecutive threads: [image: device memory threads]
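To make the reported pattern concrete, here is a minimal, hypothetical C++ model (standard library only, no ArrayFire or CUDA calls) of a resource that is created once per calling thread and never released when that thread exits. This is an assumption used purely for illustration, not a diagnosis of the ArrayFire CUDA back-end or of cuBLAS; the names `HypotheticalThreadResource`, `thread_resource`, `hypothetical_train`, `growth_same_thread` and `growth_new_threads` are invented here. The model reproduces only the shape of the observation: repeated calls on one thread create at most one resource, while one call per new thread creates one resource per thread.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>

// HYPOTHETICAL model, not ArrayFire or cuBLAS code: a library object that is
// created lazily once per calling thread (think "handle plus workspace").
struct HypotheticalThreadResource {
    char workspace[10 * 1024];  // ~10KB, the per-thread growth reported above
};

// Count of resources created so far; this model never frees any of them.
std::atomic<std::size_t> created_resources{0};

// Returns the calling thread's resource, creating it on first use. The pointer
// is thread_local, so every new thread starts with nullptr and allocates again;
// the pointed-to object is never deleted when the thread exits.
HypotheticalThreadResource *thread_resource() {
    thread_local HypotheticalThreadResource *res = nullptr;
    if (res == nullptr) {
        res = new HypotheticalThreadResource{};
        ++created_resources;
    }
    return res;
}

// Stand-in for trainer::train(): any call that needs the per-thread resource.
void hypothetical_train() { (void)thread_resource(); }

// Calls hypothetical_train() `iterations` times on the current thread and
// returns how many new resources were created (at most 1).
std::size_t growth_same_thread(int iterations) {
    assert(iterations >= 0);
    const std::size_t before = created_resources.load();
    for (int i = 0; i < iterations; ++i) hypothetical_train();
    return created_resources.load() - before;
}

// Launches and joins a fresh std::thread per iteration, like the commented-out
// path in the reproducer; returns how many new resources were created.
std::size_t growth_new_threads(int iterations) {
    assert(iterations >= 0);
    const std::size_t before = created_resources.load();
    for (int i = 0; i < iterations; ++i) {
        std::thread t(hypothetical_train);
        t.join();
    }
    return created_resources.load() - before;
}
```

For example, `growth_same_thread(1000)` returns at most 1 (and 0 once the calling thread already owns its resource), while `growth_new_threads(1000)` returns 1000: growth tracks the number of threads launched, not the number of calls.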

Reproducible Code and/or Steps

#include <arrayfire.h>

#include <functional>  // std::ref
#include <iostream>
#include <thread>

int main() {
    class trainer {
        int device;

       public:
        explicit trainer(const int device) : device(device) {}
        void train() {
            // Select the device explicitly on whichever thread runs train()
            af::setDevice(device);
            const af::array a{af::iota(af::dim4(10, 10))};
            // size has no impact
            af::array c{af::matmul(a, a)};
        }
    };

    try {
        af::info();
        trainer trainers{af::getDevice()};

        for (int i{0}; i < 1000; ++i) {
            std::cout << i << ", ";
            // OK: runs on the main thread
            trainers.train();

            // throws CUBLAS_STATUS_ALLOC_FAILED on CUDA after repeated launches
            // std::thread t(&trainer::train, std::ref(trainers));
            // t.join();
        }
    } catch (af::exception &ae) { std::cerr << ae.what() << std::endl; }
    return 0;
}

System Information

  1. ArrayFire version : master 3.9.0 b05da694
  2. Devices installed on the system: GTX 750 Ti
  3. (optional) Output from the af::info() function if applicable: see Logging.txt
  4. Output from the following scripts: Output cmds.txt

Checklist

  • [x] Using the latest available ArrayFire release
  • [x] GPU drivers are up to date

willyborn, Sep 05 '22 14:09

I cannot reproduce this issue on master commit 1eb6bca (the most recent as of this morning), using the latest driver with an RTX 3050. I even extended the loop to 10k iterations, commenting out the train() call that runs on the main thread and running only the threaded version marked as problematic.

Bug3286.zip

mfzmullen, Jul 06 '23 15:07