parallel_reduce with native simd type => seg fault in OpenMP

Open pkestene opened this issue 4 years ago • 0 comments

Hello,

I was testing a parallel_reduce (sum) with one of the native simd types and hit a seg fault. It seems to be caused by incorrect memory alignment of the value returned by HostThreadTeamData::pool_reduce_local().

To illustrate this, I've updated avx.hpp to provide operator+= (used in the reduce join operation) and used the custom reducer provided below.

// custom reducer for simd type (here avx)
template <class T, class Space>
struct SimdReducer {
 public:

  using simd_t = simd::simd<float,simd::simd_abi::native>;
  //using simd_t = simd::simd<T,simd::simd_abi::pack<4>>;

  using simd_storage_t = simd_t::simd_storage_t;

  // Required
  using reducer = SimdReducer<T, Space>;
  using value_type = simd_t;
  using value_type_storage = simd_storage_t;
  using result_view_type = Kokkos::View<value_type, Space, Kokkos::MemoryUnmanaged>;

 private:
  result_view_type value;

 public:
  KOKKOS_INLINE_FUNCTION
  SimdReducer(value_type& value_) : value(&value_) {}

  // Required
  KOKKOS_INLINE_FUNCTION
  void join(value_type& dest, const value_type& src) const {
    dest += src;
  }

  KOKKOS_INLINE_FUNCTION
  void join(volatile value_type& dest, const volatile value_type& src) const {
    dest += src;
  }

  KOKKOS_INLINE_FUNCTION
  void init(value_type& val) const {
    printf("before init %p\n",&val);
    val = simd_t(0.0); // seg fault here
    printf("after init\n");
  }

  KOKKOS_INLINE_FUNCTION
  value_type& reference() const { return *value.data(); }

  KOKKOS_INLINE_FUNCTION
  result_view_type view() const { return value; }

  KOKKOS_INLINE_FUNCTION
  bool references_scalar() const { return true; }
};

  • A parallel_reduce with this reducer works fine when the device is Serial, but gives a segmentation fault when the device is OpenMP (whatever the number of threads).
  • If I change the simd type to e.g. simd_abi::pack<4>, the crash disappears and it works fine.
  • When compiling for avx, simd<float,simd::simd_abi::native> is 32 bytes. However, when I print in the reducer's init the address of the reference value coming from the call to pool_reduce_local() (in HostThreadTeamData), that address is only 16-byte aligned, whereas I think it should be 32-byte aligned. I think this explains the seg fault.
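The alignment observation in the last bullet can be checked programmatically instead of reading the printed pointer by eye. The following is only a minimal sketch that does not use Kokkos or simd-math: `Pack8f` is a hypothetical stand-in for a 32-byte AVX float pack, and `is_aligned_for` is a helper name introduced here for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for an AVX float pack: 8 floats with a
// 32-byte alignment requirement (not the actual simd-math type).
struct alignas(32) Pack8f {
  float lanes[8];
};

// Returns true when address `p` satisfies the alignment requirement of T.
template <class T>
bool is_aligned_for(const void* p) {
  return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}
```

Called as `is_aligned_for<Pack8f>(&val)` at the top of `init`, a check like this would return false for an address that is 16-byte aligned but not 32-byte aligned, which is the situation described above.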

I may be wrong, but I think alignment needs to be controlled inside HostThreadTeamData so that the returned pointer is properly aligned.
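As an illustration of what "controlling alignment" could mean, here is a minimal sketch that rounds a raw scratch pointer up to the alignment of the value type using the standard `std::align`. This is not how HostThreadTeamData is implemented; `align_for` is a hypothetical helper, and a real fix would also have to reserve the extra padding bytes when the scratch buffer is sized.

```cpp
#include <cstddef>
#include <memory>

// Rounds `p` up to the next address suitably aligned for T, provided an
// object of sizeof(T) still fits within the `space` bytes starting at `p`.
// Returns nullptr when it does not fit.
template <class T>
T* align_for(void* p, std::size_t space) {
  // std::align adjusts its local copies of `p` and `space` and returns the
  // aligned address, or nullptr on failure.
  return static_cast<T*>(std::align(alignof(T), sizeof(T), p, space));
}
```

For a 32-byte type and a start address that is only 16-byte aligned, the helper skips 16 bytes of padding, so the buffer must provide at least 48 bytes from that start address for the call to succeed.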

pkestene, Oct 27 '21 15:10