CMPLXFOIL

Tests fail when using new `ifx` compiler

Open eirikurj opened this issue 1 year ago • 6 comments

Description

When compiling with the new ifx compiler, some tests are failing. This is preventing us from updating our docker images as part of https://github.com/mdolab/docker/pull/266.

Steps to reproduce issue

  1. Pull the Docker container `mdolab/public:u22-intel-impi-latest-amd64-failed` (specifically `sha256:d318081f9bf4cc2c110685d4592fe6ee2b0f7799aff8cbefe87e138d04b224b7`).
  2. In `~/repos/cmplxfoil`, run `testflo -v -n 1 .`

Current behavior

Running one of the failing tests on the Docker container, for example `testflo -n 1 -s -v ./tests/test_solver_class.py:TestDerivativesCST.test_alpha_sens`, prints the following error:

mdolabuser@a6661930c69d:~/repos/cmplxfoil$ testflo -n 1 -s -v ./tests/test_solver_class.py:TestDerivativesCST.test_alpha_sens
######## Fitting CST coefficients to coordinates in /home/mdolabuser/repos/cmplxfoil/tests/n0012BluntTE.dat ########
Upper surface
    L2 norm of coordinates in dat file versus fit coordinates: 0.0003504064468410577
    Fit CST coefficients: [0.16601024 0.13092967]
Lower surface
    L2 norm of coordinates in dat file versus fit coordinates: 0.0003504064499657832
    Fit CST coefficients: [-0.16601024 -0.13092967]
+----------------------------------------------------------------------+
|  Switching to Aero Problem: fc                                       |
+----------------------------------------------------------------------+
 LEXITFLAG TRUE, GOING TO 90...
 LEXITFLAG TRUE, GOING TO 90...
 LEXITFLAG TRUE, GOING TO 90...
 LEXITFLAG TRUE, GOING TO 90...
 LEXITFLAG TRUE, GOING TO 90...
 LEXITFLAG TRUE, GOING TO 90...
 LEXITFLAG TRUE, GOING TO 90...
./tests/test_solver_class.py:TestDerivativesCST.test_alpha_sens  ... FAIL (00:00:5.03, 172 MB)
Traceback (most recent call last):
  File "/home/mdolabuser/repos/cmplxfoil/./tests/test_solver_class.py", line 369, in test_alpha_sens
    np.testing.assert_allclose(checkSensFD, actualSensCS, rtol=relTol, atol=absTol)
  File "/home/mdolabuser/.pyenv/versions/3.11.9/lib/python3.11/site-packages/numpy/testing/_private/utils.py", line 1504, in assert_allclose
    assert_array_compare(compare, actual, desired, err_msg=str(err_msg),
  File "/home/mdolabuser/.pyenv/versions/3.11.9/lib/python3.11/contextlib.py", line 81, in inner
    return func(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/mdolabuser/.pyenv/versions/3.11.9/lib/python3.11/site-packages/numpy/testing/_private/utils.py", line 797, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Not equal to tolerance rtol=0.002, atol=1e-05

Mismatched elements: 1 / 1 (100%)
Max absolute difference: 10005205.62637494
Max relative difference: inf
 x: array(10005205.626375)
 y: array(0.)



The following tests failed:
test_solver_class.py:TestDerivativesCST.test_alpha_sens


Passed:  0
Failed:  1
Skipped: 0


Ran 1 test using 1 processes
Wall clock time:   00:00:5.76
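
For readers unfamiliar with this kind of test: the assertion in the traceback above compares a finite-difference sensitivity (`checkSensFD`) against a complex-step one (`actualSensCS`). The reported relative difference is `inf` because the value being compared against (`y`) is exactly zero. The snippet below is a minimal, standard-library sketch of such a check. The function `f` is a hypothetical stand-in, not the CMPLXFOIL solver, and `allclose` only mimics the scalar criterion of `np.testing.assert_allclose`.

```python
import cmath


def f(x):
    """Hypothetical stand-in for a solver output; accepts real or complex x."""
    return cmath.sin(x) * cmath.exp(x)


def fd_derivative(func, x, h=1e-6):
    """Forward finite-difference approximation of d(func)/dx."""
    return ((func(x + h) - func(x)) / h).real


def cs_derivative(func, x, h=1e-30):
    """Complex-step approximation: Im(func(x + ih)) / h."""
    return func(complex(x, h)).imag / h


def allclose(actual, desired, rtol, atol):
    """Scalar version of the assert_allclose criterion."""
    return abs(actual - desired) <= atol + rtol * abs(desired)


x0 = 0.3
sens_fd = fd_derivative(f, x0)
sens_cs = cs_derivative(f, x0)

# Same tolerances as reported in the failing test output.
print(allclose(sens_fd, sens_cs, rtol=2e-3, atol=1e-5))
```

When both code paths are healthy, the two approximations agree to within the finite-difference truncation error, so a mismatch as large as the one in the log suggests that at least one of the two solver evaluations returned a bad value.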

Expected behavior

All tests should pass

Observations

  • The build process looks a bit messy when using Intel. A mix of compilers appears to be used: gcc for the interface C code, ifx for compiling the source, and ifort for the library. While this is probably not the cause, we should address it.
  • Since this is a Fortran 77 code, it's possible that we have hit an ifx issue that we have not yet encountered in other repositories, since those are mostly Fortran 90 or newer. The porting guide might help, but it states that Fortran 77 is completely implemented.
  • I did some minor tests, and it seems that simply removing optimization, i.e., changing from -O2 to -O0 and rebuilding, makes the tests pass. This indicates that some optimization affects the code when using ifx in a way that does not show up with ifort. I would appreciate it if someone could dig into this and identify the issue and possible solutions.

eirikurj avatar Dec 19 '24 11:12 eirikurj

~~To add to the confusion, if you build cmplxfoil using the gcc config file on these images, the tests still fail, even though every part of the build process is done using a gcc compiler (either gcc or gfortran). See the attached log below.~~

cmplxfoil-make.log

~~Given that the tests don't fail on the GCC images, what is different between the intel and gcc images that could be causing gcc compiled code to behave differently?~~

Ignore the above. I was forgetting to pip install cmplxfoil again after rebuilding with gcc. After doing that, the tests pass, so this is just an Intel compiler issue.

A-CGray avatar Dec 19 '24 13:12 A-CGray

This is tenuous at best, but it seems that, as of 2023, ifx was known to not work very well with complex numbers, at least from a performance perspective.

Also, some of the default floating-point arithmetic behaviour differs between ifort and ifx: ifort checks for NaNs by default while ifx doesn't.

A-CGray avatar Jan 14 '25 01:01 A-CGray

The NaN check might be a problem since -fp-model=fast is the default. Although I did not report it here, I believe I ran this with precise and strict at some point, and it had no effect. We can test this. The issue with ifx, complex numbers, and optimization seems like a more plausible explanation, since the code does work with optimizations disabled (-O0). It's possible that a newer compiler version will fix this, but we should check the compiler release notes.

eirikurj avatar Jan 14 '25 13:01 eirikurj

I did a very quick test with the latest image, trying these combinations of optimization flags and floating-point models. The tests pass only when optimizations are turned off.

| opt flag | fast | precise | strict |
|----------|------|---------|--------|
| O0       | pass | pass    | pass   |
| O1       | fail | fail    | fail   |
| O2       | fail | fail    | fail   |
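
For anyone wanting to repeat this sweep, the nine combinations above can be enumerated with a short script. This is only a sketch: the flag spellings (`-O0`/`-O1`/`-O2` and `-fp-model=...`) follow the ones mentioned in this thread, and the rebuild-and-test step is left as a comment because the exact build commands depend on the local config file.

```python
from itertools import product

OPT_LEVELS = ["-O0", "-O1", "-O2"]
FP_MODELS = ["fast", "precise", "strict"]


def flag_matrix():
    """Return every optimization level / floating-point model combination."""
    return [f"{opt} -fp-model={fp}" for opt, fp in product(OPT_LEVELS, FP_MODELS)]


for flags in flag_matrix():
    # Placeholder: rebuild with these flags, reinstall, then run testflo.
    print(flags)
```

Remember to reinstall the package after each rebuild, otherwise the tests run against the previous build.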

eirikurj avatar Jan 14 '25 13:01 eirikurj

Damn, it's not that then. I'm also not fully convinced this is purely a complex number issue either, as some of the failing tests don't involve the complexified code.

A-CGray avatar Jan 14 '25 14:01 A-CGray

@eirikurj, as a fallback, and to avoid holding up https://github.com/mdolab/docker/pull/266 any longer, could we just change the logic in the Intel config file so that we use ifort -O2 if available and ifx -O0 if not?

I've implemented this in https://github.com/mdolab/CMPLXFOIL/pull/33
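
The selection logic described above can be illustrated as follows. This is a hedged sketch in Python, not the contents of the linked PR (the actual change lives in the Intel config file); the function name `pick_intel_fortran` and the injectable `which` argument are hypothetical and exist only to make the logic easy to test.

```python
import shutil


def pick_intel_fortran(which=shutil.which):
    """Prefer ifort with -O2; fall back to ifx with optimization disabled."""
    if which("ifort") is not None:
        return "ifort", "-O2"
    # ifx builds only pass the tests at -O0 (see the table earlier in the thread).
    return "ifx", "-O0"


compiler, opt_flag = pick_intel_fortran()
print(compiler, opt_flag)
```

The trade-off is that ifx builds run unoptimized until the root cause is found, which keeps the tests passing at the cost of performance.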

A-CGray avatar Jan 14 '25 21:01 A-CGray