CHT (and/or heat zone) restart (primal+adjoint) issues

Open TobiKattmann opened this issue 4 years ago • 6 comments

Describe the bug

Hi all,

I noticed some issues with restarts (both for primal-only restarts and for the primal iteration within the discrete adjoint). I do the following:

  1. Run a simulation with X+1 iterations. The residuals of the (X+1)st iteration are what we try to recreate in the restarted runs. This is the ground truth.
  2. Run a simulation with X iterations. This gives us a file to restart from. (A good sanity check is to diff its history file against that of simulation 1 to see whether the simulations are deterministic at all.)
  3. Run a primal simulation restarted from the restart file of 2. with just 1 iteration. The residuals should match the results of 1.
  4. Run an adjoint simulation with the restart file of 2. The residuals should match the results of 1.
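
The determinism check mentioned in step 2 could look roughly like this. It is only a minimal Python sketch, assuming both history files are comma-separated text with one header row; the helper name `first_rows_identical` and the file paths passed to it are hypothetical and not part of SU2.

```python
import csv

def first_rows_identical(history_long, history_short):
    """Return True if every row of the shorter history file matches the
    corresponding row of the longer one exactly (string comparison)."""
    with open(history_long, newline="") as f_long, \
         open(history_short, newline="") as f_short:
        rows_long = list(csv.reader(f_long))
        rows_short = list(csv.reader(f_short))
    # The X-iteration run must not be longer than the X+1-iteration run.
    if len(rows_short) > len(rows_long):
        return False
    # Compare the header and the first X data rows field by field.
    return all(a == b for a, b in zip(rows_long, rows_short))
```

Comparing the fields as strings is deliberate here: a deterministic code should reproduce the written digits exactly, so no tolerance is applied.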

What you will see in the following are 3 lines with multiple residual values each. The first line is the (X+1)st history entry of simulation 1 (the ground truth). The second line is the last history line of the restarted primal from simulation 3. The third line is from the adjoint primal restart of simulation 4, grabbed from the screen output with OUTPUT_PRECISION=12 (see #1394).

1.res[1], 1.res[s] ...
3.res[1] ....
4.res[1]

Of course the best outcome would be 3 identical lines ... which we don't get :(
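
A comparison of such a three-line block can be sketched in a few lines of Python. This is not the script used for this issue, only an illustration; the function name `compare_residual_rows` is hypothetical, and the example rows are the 10-iteration CHT pin array values reported further down.

```python
def compare_residual_rows(truth, primal_restart, adjoint_restart):
    """Column-wise absolute differences of both restarted runs against the
    ground truth. Each input is one whitespace-separated line of residuals."""
    ref = [float(v) for v in truth.split()]
    report = []
    for label, row in (("primal", primal_restart), ("adjoint", adjoint_restart)):
        vals = [float(v) for v in row.split()]
        if len(vals) != len(ref):
            raise ValueError(f"{label} row has {len(vals)} columns, expected {len(ref)}")
        report.append((label, [abs(v - r) for v, r in zip(vals, ref)]))
    return report

# (p, vx, vy, T_fluid, T_solid), CHT pin array, 10 iterations
rows = (
    "-4.5580336629 -4.71337114354 -4.64920624665 1.52390474896 -5.8150835186",
    "-4.5580336629 -4.71337114354 -4.64920624665 1.52390474896 -5.8150835186",
    "-4.5580336629 -4.71337114354 -4.64920624665 1.52390474896 -6.27627665971",
)
for label, diffs in compare_residual_rows(*rows):
    print(label, ["match" if d == 0.0 else f"{d:.3e}" for d in diffs])
```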

Pin Array setup 2D - Fluid Only

(p, vx, vy, T, k, w) 200 iterations

-7.16607941386 -7.34805457325 -6.99877222345 -1.01313133295 -8.55717653108 -1.6476144338
-7.16607941386 -7.34805457325 -6.99877222345 -1.01313133295 -8.55717653108 -1.6476144338
-7.16607941386 -7.34805457325 -6.99877222345 -1.01313133295 -8.55717653108 -1.6476144338

Everything's fine 👍

Pin Array setup 2D - Solid Only

Note that this will only work with the fix in #1394

10 iterations (8 Linear Solver Iter)

-6.83193258622
-6.83193258622
-6.83193258622

10 iterations (10 Linear Solver Iter)

-7.38737630018
-7.38737630016
-7.38737630016

10 iterations (20 Linear Solver Iter)

-8.92762658265
-8.92762658317
-8.92762658317

10 iterations (200 Linear Solver Iter)

-8.92702259526
-8.92702259594
-8.92702259594

Here I suspect some floating-point effects, with a minor error that accumulates up to a certain point. Doesn't worry me too much, to be fair.

200 iterations (10 Linear Solver Iter)

-16.5822916687
-16.2000952843
-16.2000952843

But with more iterations, more problems arise. So back to the drawing board for that. Maybe the root cause of the CHT problems is hidden here as well.
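
To put the two solid-only regimes into perspective, the logged values can be converted into residual ratios. The sketch below assumes the printed numbers are log10 of the RMS residual (the usual convention for SU2 history output, but not stated explicitly above); the helper name `residual_ratio` is hypothetical.

```python
def residual_ratio(log_res_a, log_res_b):
    """Ratio res_b / res_a, given two log10 residual values."""
    return 10.0 ** (log_res_b - log_res_a)

# 10 iterations, 10 linear solver iterations: ground truth vs. restart
small = residual_ratio(-7.38737630018, -7.38737630016)
# 200 iterations, 10 linear solver iterations: ground truth vs. restart
large = residual_ratio(-16.5822916687, -16.2000952843)

print(f"relative deviation (10 iterations): {abs(small - 1.0):.1e}")
print(f"residual ratio (200 iterations):    {large:.2f}")
```

Under that assumption, the 10-iteration deviation is a relative difference of order 1e-11, whereas the 200-iteration case differs by a factor of roughly 2.4, albeit at a residual level close to machine precision.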

CHT Pin Array setup 2D

Here things get really weird.

  1. With a low iteration count it looks like the primal-only restart works perfectly and only the solid temperature residual is flawed.
  2. With higher iteration counts the solid temperature is still different, but now both restarted mean flow residuals are not in line with the X+1 iteration simulation ... what?

(p, vx, vy, T_fluid, T_solid) 10 Iterations

-4.5580336629 -4.71337114354 -4.64920624665 1.52390474896 -5.8150835186
-4.5580336629 -4.71337114354 -4.64920624665 1.52390474896 -5.8150835186
-4.5580336629 -4.71337114354 -4.64920624665 1.52390474896 -6.27627665971

200 Iterations

-12.6894989871 -13.0272466772 -12.776380701  -1.01446550457 -7.17890161426
-12.6894989199 -13.0272465259 -12.7763807181 -1.01446550457 -7.17890161426
-12.6894989199 -13.0272465259 -12.7763807181 -1.01446550457 -7.30259065606

200 Iterations (No CHT interface at all, i.e. still "multizone" but no coupling between the zones)

-12.6993664689 -13.037441642  -12.7880987801 -0.895636121058 -16.5806369934
-12.6993665267 -13.0374417614 -12.7880988088 -0.895636121058 -16.1994417242
-12.6993665267 -13.0374417614 -12.7880988088 -0.895636121058 -16.1994417242

2000 Iterations

-17.5073098614 -17.7104073858 -17.9003808832 -3.34538088409 -9.30160418764
-17.4072816449 -17.5306206426 -17.7140334705 -3.34538088409 -9.30160418771
-17.4072816449 -17.5306206426 -17.7140334705 -3.34538088409 -9.425709713

Also note that the residual for the adjoint restart is better than expected, and not just by a tiny amount. This naturally leads to the hypothesis that the direct solution is not reset after the CLEAR_INDICES run. But it is reset: I checked, and I also print the direct residual for all direct iterations (2 flow + 2 mesh ones) and they are always the same. If the residual were to drop dramatically for the adjoint restart, that would probably be easier to debug.

Of course I also checked whether the correct solution values are read, which I am somewhat sure they are, although I can only do spot checks.

4000 iterations

-17.5190807322 -17.7163086125 -17.8778784145 -5.70791061685 -11.6640663533
-17.418063519 -17.5356055663 -17.7081078178 -5.70791062246 -11.6640664169
-17.418063519 -17.5356055663 -17.7081078178 -5.70791062246 -11.7881662873

CHT Pin in Crossflow 2D

For another CHT test case the findings are similar, with one notable difference: the solid_T residuals of the primal restart and the adjoint restart now match much better (although they are still different), but both differ quite significantly from the (X+1)st ground truth.

10 Iter

-4.926899175 -7.918963781 -8.148204896 1.135311148 -4.163756124
-4.926899175 -7.918963781 -8.148204896 1.135311148 -4.163756124
-4.926899175 -7.918963781 -8.148204896 1.135311148 -4.472390292

200 Iter

-16.0186192 -18.98207162 -19.03351791 -2.806755076 -5.585674129
-16.00510692 -18.97843574 -19.03334954 -2.806755076 -5.585674129
-16.00510692 -18.97843574 -19.03334954 -2.806755076 -5.602103211

2000 Iter

-16.50635481 -19.9763931 -20.42969871 -10.29288196 -14.21416876
-16.44211089 -19.76100653 -20.35694756 -10.23037364 -13.79321985
-16.44211089 -19.76100653 -20.35694756 -10.23037364 -13.79321861

Note that for this specific 2000 Iter case the adjoint(-primal) residual for solid_T is worse than the X+1 ground truth ... which is the other way round for all other CHT cases seen here (the "no-coupling" case shouldn't be counted for this, I feel).

To Reproduce

I will post my setups here later; I cannot upload through VPN. I also use a simple bash script to do these comparisons for me, so the chance of manual errors is much lower.

Additional Notes

A few things up front: I run FGMRES+ILU for all configurations. No periodic boundaries at all. I went without turbulence for the CHT cases to make it simpler.

Both in the past and now we have seen good gradient validation against FD. So this issue is not super dramatic (although I am pretty annoyed by it), and I think I simply overlooked it in the past.

In case something is unclear, please let me know; I will try to clarify asap.

I still have some debugging to do, but I appreciate all hints as I am currently poking around in the fog.

Thanks already, Tobi

Desktop (please complete the following information):

  • OS: [RHELS 7.6 Maipo]
  • C++ compiler and version: [g++ (GCC) 5.3.0]
  • MPI implementation and version: [OpenMPI 3.1.6]
  • SU2 Version: [#1394]

TobiKattmann avatar Oct 06 '21 09:10 TobiKattmann

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If this is still a relevant issue please comment on it to restart the discussion. Thank you for your contributions.

stale[bot] avatar Mar 02 '22 11:03 stale[bot]

Still relevant

TobiKattmann avatar Mar 02 '22 15:03 TobiKattmann

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If this is still a relevant issue please comment on it to restart the discussion. Thank you for your contributions.

stale[bot] avatar May 01 '22 15:05 stale[bot]

This is still the case, but for all practical applications like restarts or adjoint computations this does not have a notable influence. I leave this open and might tackle this at a later stage :+1:

TobiKattmann avatar May 02 '22 08:05 TobiKattmann

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If this is still a relevant issue please comment on it to restart the discussion. Thank you for your contributions.

stale[bot] avatar Jul 10 '22 08:07 stale[bot]

Dear stale-bot,

this is still relevant. Might make some debugging efforts at some point.

Thanks for the reminder, Tobi

TobiKattmann avatar Jul 10 '22 09:07 TobiKattmann