
ARM results in error

Open culurciello opened this issue 10 years ago • 8 comments

Dear developers, thank you for your great work on openBLAS.

Using it on 32-bit ARM platforms with Ubuntu 14.04, we found erroneous results when it is used with Torch7:

The Lua code below should always give 1 as the result. On ARM it gives random numbers when OpenBLAS is compiled with OpenMP (and 0.99999999993838 when compiled without).

require 'nn'

torch.setdefaulttensortype('torch.FloatTensor')
data = torch.Tensor(4, 58, 58)
for i = 1,4 do
 for j = 1,58 do
   for k = 1,58 do
     data[i][j][k] = i+j+k  
   end
 end
end
n = nn.Sequential()
n:add( nn.SpatialConvolutionMM(4, 64, 5, 5, 1, 1) )
n.modules[1].weight = torch.Tensor(64,100)
for i = 1,100 do
 n.modules[1].weight[1][i] = i
end
n.modules[1].bias = torch.Tensor(64)

n2 = nn.Sequential()
n2:add( nn.SpatialConvolutionMM(64, 64, 5, 5, 1, 1) )
n2.modules[1].weight = torch.Tensor(64,1600)
for i = 1,1600 do
 n2.modules[1].weight[1][i] = i
end
n2.modules[1].bias = torch.Tensor(64)

data = n:forward(data)
data = n2:forward(data)
out = 0
for i = 1,50 do
  for j = 1,50 do
    out = out + data[1][i][j]
  end
end
print(out/259643747536)

culurciello avatar Sep 16 '15 19:09 culurciello

@culurciello, which kernel do you use? Currently, OpenBLAS only supports the ARM hard-float (hard FP) ABI. Could this be an ABI issue?

xianyi avatar Oct 05 '15 19:10 xianyi

I am working with @culurciello. We use the Odroid U3 and XU3, which use the hard FP ABI. This problem was already present a year ago and is still present; we have tried various kernels and OpenBLAS versions. I have tried to write a simple C program that reproduces this defect, but unfortunately I did not succeed. The problem only appears in complex environments, but by printing intermediate results I found that the calculation errors come from OpenBLAS. Thank you.

mvitez avatar Oct 05 '15 21:10 mvitez

@mvitez , could you try export OMP_NUM_THREADS=1? It looks like the application uses float, sgemm. Am I right?

xianyi avatar Oct 27 '15 00:10 xianyi

It works correctly with only one thread. We actually build OpenBLAS without NO_AFFINITY=1 USE_OPENMP=1, as we should, and in that case it works with some limitations but without errors, apart from some segmentation faults, which fortunately are quite rare.

The application uses float sgemm, you are right.

mvitez avatar Oct 27 '15 08:10 mvitez

This old issue will hopefully have been fixed by the several rounds of thread-safety improvements made after about December 2016.

martin-frbg avatar Jul 04 '18 07:07 martin-frbg

Unfortunately this still gives the same results (though the OpenMP build seems to give "correct" results of the 0.99999...38 type with OMP_NUM_THREADS=2 as well, on a quad-core Asus Tinker Board). The recently added NUM_PARALLEL option does not appear to have any effect either. Not sure how to debug this, as neither helgrind nor tsan works well with OpenMP.

martin-frbg avatar Jul 07 '18 17:07 martin-frbg

Switching to USE_SIMPLE_THREADED_LEVEL3 "solves" it however.

martin-frbg avatar Jul 07 '18 21:07 martin-frbg

This appears to have been fixed in the meantime (to the extent that it now returns 0.99999..38 in every case), probably by the fix for #1851 that already went into 0.3.4.

martin-frbg avatar Jan 02 '19 18:01 martin-frbg