numo-narray icon indicating copy to clipboard operation
numo-narray copied to clipboard

Lots of memcpy are issued on a.dot(b.transpose)

Open sonots opened this issue 7 years ago • 6 comments

With numo-linalg (I did not investigate without numo-linalg),

a = Numo::SFloat.new(3, 4).seq(0)
b = Numo::SFloat.new(3, 4).seq(0)
a.dot(b.transpose)

issues 12 times of memcpy at

https://github.com/ruby-numo/numo-narray/blob/7bba089c3114ff08d110e47ba8bc448aa775f0d4/ext/numo/narray/ndloop.c#L1127

This drastically slows down matrix dot computation.

sonots avatar Apr 23 '18 08:04 sonots

The transpose method returns a view of NArray. In general, operations on view arrays are slow due to discontinuous memory access. I recommend the use of copied array obtained by b.transpose.dup.

masa16 avatar Apr 24 '18 03:04 masa16

If we can find whether narray data is C contiguous, we can provide a helper at extra.rb like:

unless a.c_contiguous?
   b = b.dup
end
a.dot(b)

How about this idea?

FYI: CuPy does this https://github.com/cupy/cupy/blob/f96dacb7f9ff34fb806046072a65ac02d7cdf3d1/cupy/core/core.pyx#L3858

sonots avatar Apr 24 '18 03:04 sonots

Also, cuBLAS (maybe cBLAS also) has transa and transb option, so it is possible to compute

a.dot(b.transpose)

like

a.gemm(b, transb: true)

for faster computation.

I want to provide a helper feature to convert the former into the latter internally automatically if possible. I felt it could be done if narray has a flag which was transposed, but it looks it does not have. Any idea to make it possible?

sonots avatar Apr 24 '18 03:04 sonots

I feel these two examples are semantically different. It may be possible to recognize as a transposed array by stride length.

masa16 avatar Apr 24 '18 06:04 masa16

It may be possible to recognize as a transposed array by stride length.

I think that it is possible to recognize it as a transposed array with stride length at the following judgment.

  • View Case
  • offset = 0
  • stridx[0, 1, ... ,N-2 ,N-1] == [dtype * shape[N-1] * shape[N-2] .. * shape[2] * shape[1], dtype * shape[N-1] * shape[N-2] .. * shape[2], : dtype * shape[N-1], dtype]
    • => Not Transposed
  • stridx[0, 1, ... ,N-2 ,N-1] == [dtype, dtype * shape[0], : dtype * shape[0] * shape[1] .. * shape[N-3], dtype * shape[0] * shape[1] .. * shape[N-3] * shape[N-2]]
    • => Transposed

Sample Code

$ cat stride.rb
a = Numo::SFloat.new(1, 2, 3).seq(0)
at = a.transpose
att = at.transpose

puts "[Not Transpose Case (View)]"
p att
p att.debug_info
puts "[Transpose Case (View)]"
p at
p at.debug_info
$ ruby stride.rb
[Not Transpose Case (View)]
Numo::SFloat(view)#shape=[1,2,3]
[[[0, 1, 2], 
  [3, 4, 5]]]
Numo::SFloat:
  id     = 0x7fcff285daa8
  type   = 2
  flag   = [0,0]
  size   = 6
  ndim   = 3
  shape  = 0x7fcff0efb3e0
  shape  = [ 1 2 3 ]
  data   = 0x7fcff285dfa8
  offset = 0
  stridx = 0x7fcff0efb400
  stridx = [ 24 12 4 ]
nil
[Transpose Case (View)]
Numo::SFloat(view)#shape=[3,2,1]
[[[0], 
  [3]], 
 [[1], 
  [4]], 
 [[2], 
  [5]]]
Numo::SFloat:
  id     = 0x7fcff285dad0
  type   = 2
  flag   = [0,0]
  size   = 6
  ndim   = 3
  shape  = 0x7fcff3004c80
  shape  = [ 3 2 1 ]
  data   = 0x7fcff285dfa8
  offset = 0
  stridx = 0x7fcff0ec2ca0
  stridx = [ 4 12 24 ]
nil

Not Transposed Case

  • SFloat = 4 (=dtype)
  • shape = [ 1 2 3 ]
  • stridx = [ 24 12 4 ]

Not Transposed Chack => match

  • stridx[0] == dtype * shape[2] * shape[1] => 24
  • stridx[1] == dtype * shape[2] => 12
  • stridx[2] == dtype => 4

Transposed Chack => not match

  • stridx[0] != dtype => 4
  • stridx[1] != dtype * shape[0] => 4
  • stridx[2] != dtype * shape[0] * shape[1] => 8

Transposed Case

  • SFloat = 4 (=dtype)
  • shape = [ 3 2 1 ]
  • stridx = [ 4 12 24 ]

Not Transposed Chack => not match

  • stridx[0] != dtype * shape[2] * shape[1] => 8
  • stridx[1] != dtype * shape[2] => 4
  • stridx[2] != dtype => 4

Transposed Chack => match

  • stridx[0] == dtype => 4
  • stridx[1] == dtype * shape[0] => 12
  • stridx[2] == dtype * shape[0] * shape[1] => 24

Fix Code Sample

If Numo::NArray has a "transpose?" method, I think that it is better to use "transa", "transb" designation after "transpose" instead of "dup".

https://github.com/ruby-numo/numo-linalg/blob/master/lib/numo/linalg/function.rb#L82

  def dot(a, b)
    a = NArray.asarray(a)
    b = NArray.asarray(b)
    case a.ndim
    when 1
      case b.ndim
      when 1
        func = blas_char(a, b) =~ /c|z/ ? :dotu : :dot
        Blas.call(func, a, b)
      else
        Blas.call(:gemv, b, a, trans:'t')
      end
    else
      case b.ndim
      when 1
        Blas.call(:gemv, a, b)
      else
        if a.transpose?
          transa='t'
          a = a.transpose
        else
          transa='n'
        end
        if b.transpose?
          transb='t'
          b = b.transpose
        else
          transb='n'
        end

        Blas.call(:gemm, a, b, transa:transa, transb:transb)
      end
    end
  end

This is considered only for gemm case, but I think that gemv can cope in the same way.

naitoh avatar Oct 28 '18 10:10 naitoh

I think this problem has been fixed in numo-narray 0.9.1.4, numo-linalg 0.1.4, but what do you think?

naitoh avatar Jan 11 '19 11:01 naitoh