OpenBLAS icon indicating copy to clipboard operation
OpenBLAS copied to clipboard

AVX2: dgemm on HASWELL leads to invalid results (low accuracy)

Open davidcl opened this issue 1 year ago • 3 comments

After upgrading Scilab OpenBLAS build, we detected a poor precision result impacting dgemm when using AVX2 kernels.

With OpenBLAS 0.3.27:

$ OPENBLAS_CORETYPE=NEHALEM scilab/test_dgemm
[1, 0.666667] * [-6 0 ; 9 0]
        0       0

$ OPENBLAS_CORETYPE=HASWELL scilab/test_dgemm
[1, 0.666667] * [-6 0 ; 9 0]
        -3.33067E-16    0

With refBLAS 3.12.0 :

$ scilab/test_dgemm
[1, 0.666667] * [-6 0 ; 9 0]
        0       0
test code

#include <math.h>
#include <stdio.h>

extern void dgemm_(char *transa, char *transb, int *m, int *n, int *k, double *alpha, double *a, int *lda, double *b, int *ldb, double *beta, double *c, int *ldc);

int main()
{
    int ONE = 1;
    int TWO = 2;
    double dZERO = 0;
    double dONE = 1;

    double A_tab[] = {1., 2. / 3.};
    double* A = A_tab;
    double B_tab[] = {-6., 9., 0., 0.};
    double* B = B_tab;
    double C_tab[] = {NAN, NAN};
    double* C = C_tab;

    dgemm_("n", "n", &ONE, &TWO, &TWO, &dONE, A, &ONE, B, &TWO, &dZERO, C, &ONE);

    printf("[%lG, %lG] * [%lG %lG ; %lG %lG]\n", A[0], A[1], B[0], B[2], B[1], B[3]);
    printf("\t%lG\t%lG\n", C[0], C[1]);

    return 0;
}

davidcl avatar Jun 10 '24 14:06 davidcl

Curious, the current DGEMM kernel for Haswell has been in use for years. Have you only started testing/using OpenBLAS recently, or did something else (like the default compiler version or options) change in your setup lately ?

martin-frbg avatar Jun 10 '24 15:06 martin-frbg

We were compiling without DYNAMIC_TARGET thus selecting NEHALEM arch as the minimal supported target. All the opened bugs are due to a more active cross-checking of Windows vs Linux results on our test base after the GCC 8 to GCC 11 Linux migration.

Note: all these corner-case bugs might or might not be valid at some points; eg. in that case, EPSILON might be acceptable in most codebases.

davidcl avatar Jun 10 '24 18:06 davidcl

Symbolic result is e-7 off, so both code paths do very well at rounding artifacts

brada4 avatar Jun 20 '24 22:06 brada4