Description

Recently I have been debugging a segmentation fault in our code that seems to have been caused by transitive dependencies of occa. During this process I noticed that all kernels get linked to libocca and occa headers are #included at the top of the kernel code. However, none of our kernels require anything from inside occa, so I think it would be nice to be able to control this. Not linking in occa speeds things up a little bit and potentially prevents unexpected behaviour.

This PR keeps the default behaviour as-is, but makes linking to occa, and #include(i)ng the occa headers and namespace opt-out per kernel. This seems to work well in our code at this moment. Is this something that seems generally useful enough to include?

Open questions: should we do some bikeshedding on the flag name and where should this new option be documented? And I'm not quite sure if the markdown files in docs should just be edited directly or if there is some kind of automation for that? Remark: I have not been able to test with HIP, but the change looks straightforward enough (famous last words).

I also noticed some inconsistencies in how the kernelHash is calculated, based on what goes into the compiler invocation so this is a separate fix commit. It could be taken out of this PR if requested.

Jul 05 '22 10:07 SFrijters

Codecov Report

Merging #603 (c6e2177) into development (ee8c6e4) will increase coverage by 0.01%. The diff coverage is 93.33%.

:exclamation: Current head c6e2177 differs from pull request most recent head d366a2f. Consider uploading reports for the commit d366a2f to get more accurate results

Additional details and impacted files

@@               Coverage Diff               @@
##           development     #603      +/-   ##
===============================================
+ Coverage        76.29%   76.31%   +0.01%     
===============================================
  Files              291      291              
  Lines            18777    18791      +14     
===============================================
+ Hits             14326    14340      +14     
  Misses            4451     4451

Impacted Files	Coverage Δ
src/occa/internal/modes/serial/device.cpp	`88.63% <88.88%> (+0.01%)`	:arrow_up:
src/loops/typelessForLoop.cpp	`98.30% <100.00%> (+0.02%)`	:arrow_up:
src/occa/internal/lang/modes/serial.cpp	`30.04% <100.00%> (+1.04%)`	:arrow_up:
src/occa/internal/utils/sys.cpp	`79.60% <0.00%> (+0.05%)`	:arrow_up:
src/occa/internal/lang/specialMacros.cpp	`60.00% <0.00%> (+0.68%)`	:arrow_up:

Jul 05 '22 10:07 codecov[bot]

Other than in the kernel launcher that is generated for the GPU backends, I don't think anything from the OCCA namespace is needed. It is probably reasonable to not include the OCCA headers or link against the library by default.

Maybe we can bring this up at the next OCCA TAF meeting and then go from there?

Jul 05 '22 17:07 kris-rowe

Sounds good, I didn't want to make breaking changes based on my assumptions about what people use in their kernels, but if we can agree that we can flip the default, or remove this entirely (if we remove this part of the code people can still add occa to their own OCCA_xxx compiler flags if they have a good reason?), that would be even cleaner.

Jul 06 '22 08:07 SFrijters

Trying to switch defaults, as discussed. Maybe include_std is a bridge too far...

Jul 14 '22 10:07 SFrijters

Current state of the PR: I've split the occa dependency options into "include_occa" and "link_occa" since it seems harder to get rid of "include_occa" by default (and the linking is the bigger problem).

Defaults:

kernel/include_occa: true
kernel/ink_occa: false
serial/include_std: true

Not sure what is going on with the failing builds yet.

Open issues: documentation, determining exactly what goes into a kernelHash. I also noticed that while I was changing the 'inline' kernel code in e.g. fpMath, the kernel didn't get rebuilt, so it looks like that is missing something as well?

Jul 14 '22 15:07 SFrijters

Current state of the PR: I've split the occa dependency options into "include_occa" and "link_occa" since it seems harder to get rid of "include_occa" by default (and the linking is the bigger problem).

Defaults:

kernel/include_occa: true

kernel/ink_occa: false

serial/include_std: true

Not sure what is going on with the failing builds yet.

Open issues: documentation, determining exactly what goes into a kernelHash. I also noticed that while I was changing the 'inline' kernel code in e.g. fpMath, the kernel didn't get rebuilt, so it looks like that is missing something as well?

Interesting. If the fpMath and intMath are the only tests failing, you could remove them for now and we can reopen the issue about adding math function support.

Jul 14 '22 16:07 kris-rowe

Update, test suite passes for me:

Defaults:

kernel/include_occa: false, except for launcher kernels
kernel/ink_occa: false
serial/include_std: false

Some tests have explict headers / directives to include std headers now.

Jul 14 '22 17:07 SFrijters

Hopefully fixed the OpenCL failure, still not sure what is going on with the MacOS build(s).

Jul 18 '22 15:07 SFrijters

Hm, it looks like it's not MacOS, but OpenCL that is the culprit? I have no experience with that; does that one need occa linked in for the launcher kernel maybe? Also, why are the opencl tests disabled for the CMake actions? Should we also have a MacOS + CMake variant that does try to use OpenCL?

Jul 18 '22 17:07 SFrijters

Hm, it looks like it's not MacOS, but OpenCL that is the culprit? I have no experience with that; does that one need occa linked in for the launcher kernel maybe?

The OpenCL kernel launcher should be the same as the other GPU backends. I don't have any way to test Mac OS locally.

Also, why are the openCL tests disabled for the CMake actions? Should we also have a MacOS + CMake variant that does try to use OpenCL?

Some tests are disabled for the OpenCL backend when running GitHub actions because the corresponding functionality isn't implemented (like atomics), but OpenCL is one of the few backends that can still be tested on CPU. If you run CTest locally, however, the tests will still run (and probably fail).

Jul 18 '22 18:07 kris-rowe

I tried adding more stuff from the .cpp to the headers, and it does seem to work in the sense that it changes the error message, but it wants even more functions. I also just enabled linking again for launcher kernels and that seems to have fixed the tests (still need to clean up the commits).

So, options:

Keep it as it is: we pay the small overhead of linking only the launcher kernels
Go through all the functions that the linker seems to want and move it to the headers. Then we only need to include headers, but then the headers are bigger and more complicated for all kernels that include occa.
Try to optimize things in some way, e.g. move the functions around so we can include or link a subset of occa.

I'm still confused why these MacOS/OpenCL tests complain. I checked locally for a CUDA-enabled test run and it indeed builds a launcher kernel

#include <occa/core/kernel.hpp>

extern "C" void f(occa::modeKernel_t ** deviceKernels,
                  const int & dummy_arg) {
  {
    occa::dim outer, inner;
    outer.dims = 1;
    inner.dims = 1;
    int b = 0;
    outer[0] = 1 - 0;
    int t = 0;
    inner[0] = 1 - 0;
    occa::kernel kernel(deviceKernels[0]);
    kernel.setRunDims(outer, inner);
    kernel(dummy_arg);
  }
}

and it just compiles with

g++ -O3 -std=c++11 -fPIC -shared /<redacted>/occa/build/job-env/foss-2021a-a562743/debug-gnu-cuda/occa/cache/92ab9b5fa650eee1/launcher_source.cpp -o /<redacted>/occa/build/job-env/foss-2021a-a562743/debug-gnu-cuda/occa/cache/92ab9b5fa650eee1/515860e4a7cfd98f.launcher_binary -I/<redacted>/occa/include -I/<redacted>/occa/build/job-env/foss-2021a-a562743/debug-gnu-cuda/include

$ ldd /<redacted>/occa/build/job-env/foss-2021a-a562743/debug-gnu-cuda/occa/cache/92ab9b5fa650eee1/launcher_binary 
	linux-vdso.so.1 =>  (0x00007ffea2d39000)
	libstdc++.so.6 => /apps/EasyBuild/software/GCCcore/10.3.0/lib64/libstdc++.so.6 (0x00007ff0b9206000)
	libm.so.6 => /lib64/libm.so.6 (0x00007ff0b8ebf000)
	libgcc_s.so.1 => /apps/EasyBuild/software/GCCcore/10.3.0/lib64/libgcc_s.so.1 (0x00007ff0b8ea6000)
	libc.so.6 => /lib64/libc.so.6 (0x00007ff0b8ad8000)
	/lib64/ld-linux-x86-64.so.2 (0x00007ff0b91d4000)

I'm not sure why it doesn't complain about e.g. setRunDims.

Jul 27 '22 16:07 SFrijters

@SFrijters wrote:

I tried adding more stuff from the '.cpp' to the headers, and it does seem to work in the sense that it changes the error message, but it wants even more functions.

I also just enabled linking again for launcher kernels and that seems to have fixed the tests (still need to clean up the commits).

In non-Windows builds, all those const primitiveinfo templates are only "declared" in file typeinfo.hpp. To find the actual "definitions", you'd need to link the source file typeinfo.cpp.

On the other hand, the MSVC compiler forces me to fully define the templates in the header. See lines 42 and 208 in NBN: typeinfo.hpp.

Here's the difference:

gcc:

  // NBN: these are defined in /occa/src/types/typeinfo.cpp
  template <> const std::string primitiveinfo<char>::id;
  template <> const std::string primitiveinfo<char>::name;
  template <> const bool        primitiveinfo<char>::isUnsigned;

Visual C++:

  // NBN: MSVC error C2737: "const object must be initialized"
  template <> const std::string primitiveinfo<char>::id = "c";
  template <> const std::string primitiveinfo<char>::name = "char";
  template <> const bool        primitiveinfo<char>::isUnsigned = false;

just a thought, Nigel

Jul 27 '22 20:07 nnnunnn

Any updates on getting that PR in?

Feb 28 '23 11:02 amikstcyr

Any updates on getting that PR in?

For sure if it is good-to-go. Stefan had marked this as a draft; however, it could be that he was just waiting for use to say if it would affect any downstream projects.

Mar 02 '23 19:03 kris-rowe

Superseded by #675 .

Mar 29 '23 10:03 SFrijters

Add kernel/require_occa property to control including and linking occa into kernels

Description

Codecov Report