Add KOKKOS versions of fixes
Summary
Add KOKKOS versions:
-
[x]
fix cmap/kk(to have all needed fixes in KOKKOS for performance on GPUs) -
[x]
fix nve/limit/kk(to relax overlapped systems with KOKKOS) -
[x]
fix recenter/kk(to have all needed fixes in KOKKOS for performance on GPUs), thanks to @srtee for suggesting this fix on matsci, very useful. -
[x]
fix wall/region/kk(to have all needed fixes in KOKKOS for performance on GPUs) -
[x]
region sphere/kk(bugfix for dynamic_cast in FixWallRegionKokkos)
Related Issue(s)
n/a
Author(s)
@alphataubio
Licensing
By submitting this pull request, I agree, that my contribution will be included in LAMMPS and redistributed under either the GNU General Public License version 2 (GPL v2) or the GNU Lesser General Public License version 2.1 (LGPL v2.1).
Backward Compatibility
n/a
Implementation Notes
ADDED UNIT TESTS:
- fix-timestep-recenter-coords.yaml
- fix-timestep-recenter-init.yaml
- fix-timestep-recenter-null.yaml
- fix-timestep-wall_region_lj1043.yaml
- fix-timestep-wall_region_lj126.yaml
- fix-timestep-wall_region_lj93.yaml
- fix-timestep-wall_region_morse.yaml
Post Submission Checklist
- [x] The feature or features in this pull request is complete
- [x] Licensing information is complete
- [x] Corresponding author information is complete
- [x] The source code follows the LAMMPS formatting guidelines
- [x] Suitable new documentation files and/or updates to the existing docs are included
- [x] The added/updated documentation is integrated and tested with the documentation build system
- [x] The feature has been verified to work with the conventional build system
- [x] The feature has been verified to work with the CMake based build system
- [x] Suitable tests have been added to the unittest tree.
- [ ] A package specific README file has been included or updated
- [x] One or more example input decks are included
Further Information, Files, and Links
Running this PR through internal Kokkos regression testing now.
The fix recenter changes still need to be remove. What is there cannot be correct.
We'll have to look into testing and fixing this some other time. You'll have to stash away the force-style test yaml files as well for the time being.
fix recenter test skipped for respa
@alphataubio for the new Kokkos styles, can you please add the boiler plate note as in other styles (if it isn't in there already)? It goes near the bottom. Thanks
----------
.. include:: accel_styles.rst
----------
I updated this PR to latest develop and it failed a compilation test with a weird error:
h5cc -I include -D_DEFAULT_SOURCE -O2 -DH5_NO_DEPRECATED_SYMBOLS -Wall -fPIC -c src/ch5md.c -o build/ch5md.o
In file included from /usr/include/hdf5/serial/H5public.h:32,
from /usr/include/hdf5/serial/hdf5.h:22,
from src/ch5md.c:9:
/usr/include/hdf5/serial/H5version.h:39:4: error: #error "Can't choose old API versions when deprecated APIs are disabled"
39 | #error "Can't choose old API versions when deprecated APIs are disabled"
| ^~~~~
make: *** [Makefile.h5cc:21: build/ch5md.o] Error 1
make: Leaving directory '/data/workspace/dev/pull_requests/ubuntu/legacy_serial_openmp_smallsmall_static/build/lammps/lib/h5md'
++ exit 1
I updated this PR to latest develop and it failed a compilation test with a weird error:
This must be due to me updating the ubuntu container images on the Jenkins cluster from ubuntu 20.04LTS to ubuntu 22.04LTS so we can merge with PR #4344
Update: @stanmoore1 I am trying to fix this with https://github.com/lammps/lammps/pull/4348/commits/b2e8648e6c8f2ab46e99fdb1e0a31df73548567f Check out the build status on PR #4344
I updated this PR to latest develop and it failed a compilation test with a weird error:
This must be due to me updating the ubuntu container images on the Jenkins cluster from ubuntu 20.04LTS to ubuntu 22.04LTS so we can merge with PR #4344
i see a different compile error on jenkins
I updated this PR to latest develop and it failed a compilation test with a weird error:
h5cc -I include -D_DEFAULT_SOURCE -O2 -DH5_NO_DEPRECATED_SYMBOLS -Wall -fPIC -c src/ch5md.c -o build/ch5md.o In file included from /usr/include/hdf5/serial/H5public.h:32, from /usr/include/hdf5/serial/hdf5.h:22, from src/ch5md.c:9: /usr/include/hdf5/serial/H5version.h:39:4: error: #error "Can't choose old API versions when deprecated APIs are disabled" 39 | #error "Can't choose old API versions when deprecated APIs are disabled" | ^~~~~ make: *** [Makefile.h5cc:21: build/ch5md.o] Error 1 make: Leaving directory '/data/workspace/dev/pull_requests/ubuntu/legacy_serial_openmp_smallsmall_static/build/lammps/lib/h5md' ++ exit 1
it works for me on macos, and rhel8 power9/v100. seems like a problem with hdf5 on your local machine. i know nothing about hdf5.
i see a different compile error on automated commit testing, it doesnt happen to me either. likely related to @akohlmey comment below.
g++ -std=c++11 -O3 -g -fPIC -Wall -c poemstreenode.cpp In file included from poemstreenode.cpp:18: poemstreenode.h: In constructor 'TreeNode::TreeNode(const int&, TreeNode*, TreeNode*, int)': poemstreenode.h:36:13: warning: 'TreeNode::data' will be initialized after [-Wreorder] 36 | int data; | ^~~~ poemstreenode.h:32:19: warning: 'TreeNode* TreeNode::left' [-Wreorder] 32 | TreeNode *left; | ^~~~ poemstreenode.cpp:22:1: warning: when initialized here [-Wreorder] 22 | TreeNode::TreeNode (const int & item, TreeNode *lptr,TreeNode *rptr, | ^~~~~~~~
it works for me on macos, and rhel8 power9/v100. seems like a problem with hdf5 on your local machine. i know nothing about hdf5.
It is not the local machine, but rather the Jenkins server pool at Temple. I just updated some containers with newer OS versions. The HDF5 version must have been updated which the H5MD has not been adapted. It should not be a problem with CMake builds, since it does not set the compiler define to not use deprecated APIs. Only the GNU make build does that.
i see a different compile error on automated commit testing
These are warnings and have been around a long time. The reorder one is mostly harmless. There are others more worrisome. This is why the POEMS package will be flagged for removal after the next stable release.
Internal Kokkos regression tests passed.
@alphataubio for the new Kokkos styles, can you please add the boiler plate note as in other styles (if it isn't in there already)? It goes near the bottom. Thanks
---------- .. include:: accel_styles.rst ----------
done
Compilation with Kokkos/HIP currently fails with:
/home/akohlmey/compile/lammps/src/KOKKOS/fix_wall_region_kokkos.cpp:181:23: warning: variable 'engKK' set but not used [-Wunused-but-set-variable]
181 | double fwallKK, engKK;
| ^
/home/akohlmey/compile/lammps/src/KOKKOS/fix_wall_region_kokkos.cpp:173:21: error: reference to __host__ function 'surface' in __host__ __device__ function
173 | int n = region->surface(d_x(i,0), d_x(i,1), d_x(i,2), cutoff);
| ^
/home/akohlmey/compile/lammps/src/KOKKOS/fix_wall_region_kokkos.cpp:380:16: note: in instantiation of member function 'LAMMPS_NS::FixWallRegionKokkos<Kokkos::HIP>::wall_particle' requested here
380 | template class FixWallRegionKokkos<LMPDeviceType>;
| ^
/home/akohlmey/compile/lammps/src/region.h:82:7: note: 'surface' declared here
82 | int surface(double, double, double, double);
| ^
@stanmoore1 this is waiting on you now
Running this through internal Kokkos regression testing again.
@alphataubio @weinbe2 I absolutely cannot get this to work with -diag_suppress 128. I think it has to do with nvcc_wrapper only taking 1 argument from -Xcudafe.
Also it is really confusing because nvcc --help and the output warning message both say to use diag-suppress (dash) but it only works with diag_suppress (underscore).
I'm using CUDA 12.4, @alphataubio do you remember which Cuda version failed for you?
I'm using CUDA 12.4, @alphataubio do you remember which Cuda version failed for you?
i tried both cuda/11.8.0 and cuda 12.2, it didnt work with either so i had to manualy remove diag_suppress in my cmakelist.txt file when i build on the cluster(s).
it's a problem with a flood of fmt::format warnings, i wish we could solve that @akohlmey ?
@alphataubio I think I have a good solution now, can you test?
No new regression test failures.
@alphataubio I think I have a good solution now, can you test?
yes i just did and it works now.
No new regression test failures.
maybe you should add examples/wall/in.wall.block and examples/wall/in.wall.sphere to KOKKOS regression tests for RegBlockKokkos and RegSphereKokkos ?
Running the internal Kokkos regression tests again.
Kokkos regression tests passed. I'm going to run a few of my own tests, then this should be ready to merge.
I'm seeing lost atoms with fix recenter/kk in a different input, need to debug.
@alphataubio your latest changes fixed the lost atoms issue I saw, nice work. I'm running a few more tests.
Broken for sycl
lammps/src/KOKKOS/fix_cmap_kokkos.cpp:886:38: error: SYCL kernel cannot use a non-const static data variable
886 | for (k = 0; k < 16; k++) xx += wt[in][k]*x[k]
@rarensu would removing the static (or replacing with const static) on line 853 fix this problem ?
sorry i dont access to anything SYCL (or HIP) to test.
Adding const should do the trick, see https://godbolt.org/z/YxM4Wf4q1.
@akohlmey please add this to your next small changes PR.
You're the cpp compiler expert do you think const static is the correct fix for this ?
@akohlmey please add this to your next small changes PR.
You're the cpp compiler expert do you think
const staticis the correct fix for this ?
Way ahead of you. Check out: https://github.com/lammps/lammps/pull/4387/commits/e200d557ec94cc698d6d95d0290aff6d6202420c Which was part of the last pull request for the new release.
const static inside a function makes next to no sense. Just const will do. In fact, I tested it on the "godbolt" page with the same compiler setting to compile with SYCL and there was not difference in the assembly code between static const and plain const. Using static const makes more sense outside a function. But for that we now use static constexpr whereever possible to explicitly make it a compile time constant.