
Support for Apple M1

Open mkyl opened this issue 3 years ago • 50 comments

I cannot run Clp through JuMP; it crashes as soon as I start the optimization. Here is a minimal example, where I am trying to minimize c^T x subject to A x = b and x >= 0.

julia> using JuMP, Clp

julia> C =[-2.0,3.0,0.0, 0.0]
4-element Vector{Float64}:
 -2.0
  3.0
  0.0
  0.0

julia> A=[1.0 1.0 1.0 0.0;
       1.0 -1.0 0.0 1.0]
2×4 Matrix{Float64}:
 1.0   1.0  1.0  0.0
 1.0  -1.0  0.0  1.0

julia> b= [4.0,6.0]
2-element Vector{Float64}:
 4.0
 6.0

julia> model = Model(Clp.Optimizer)
A JuMP Model
Feasibility problem with:
Variables: 0
Model mode: AUTOMATIC
CachingOptimizer state: EMPTY_OPTIMIZER
Solver name: Clp

julia> @variable(model,x[1:4]>=0)
4-element Vector{VariableRef}:
 x[1]
 x[2]
 x[3]
 x[4]

julia> @objective(model,Min,C'*x)
-2 x[1] + 3 x[2]

julia> @constraint(model,A*x.==b)
2-element Vector{ConstraintRef{Model, MathOptInterface.ConstraintIndex{MathOptInterface.ScalarAffineFunction{Float64}, MathOptInterface.EqualTo{Float64}}, ScalarShape}}:
 x[1] + x[2] + x[3] = 4.0
 x[1] - x[2] + x[4] = 6.0

julia> optimize!(model)
julia(91740,0x100cd8580) malloc: *** error for object 0xe00000000000000: pointer being freed was not allocated
julia(91740,0x100cd8580) malloc: *** set a breakpoint in malloc_error_break to debug

signal (6): Abort trap: 6
in expression starting at REPL[9]:1
__pthread_kill at /usr/lib/system/libsystem_kernel.dylib (unknown line)
Allocations: 46636441 (Pool: 46623528; Big: 12913); GC: 37
fish: Job 1, 'julia' terminated by signal SIGABRT (Abort)

mkyl avatar Mar 01 '22 22:03 mkyl

I cannot reproduce. What is import Pkg; Pkg.status() and versioninfo()?

odow avatar Mar 01 '22 22:03 odow

I had a similar error on my Apple M1 machine:

julia> using Clp

julia> optimizer = Clp.Optimizer()
Clp.Optimizer

julia> exit()
julia(99879,0x1007ebd40) malloc: *** error for object 0xe00000000000000: pointer being freed was not allocated
julia(99879,0x1007ebd40) malloc: *** set a breakpoint in malloc_error_break to debug

signal (6): Abort trap: 6
in expression starting at REPL[4]:1
__pthread_kill at /usr/lib/system/libsystem_kernel.dylib (unknown line)
Allocations: 4250078 (Pool: 4247945; Big: 2133); GC: 3
zsh: abort      /Applications/Julia-1.7.app/Contents/Resources/julia/bin/Julia
julia> import Pkg; Pkg.status()
      Status `~/tmp/Clp/Project.toml`
  [e2554f3b] Clp v1.0.0 `https://github.com/jump-dev/Clp.jl#master`
julia> versioninfo()
Julia Version 1.7.1
Commit ac5cc99908 (2021-12-22 19:35 UTC)
Platform Info:
  OS: macOS (arm64-apple-darwin21.2.0)
  CPU: Apple M1
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, cyclone)

oyamad avatar Mar 03 '22 03:03 oyamad

Apple M1 machine

Apple M1 is currently tier-3 support at Julia: https://julialang.org/downloads/#supported_platforms, so it's highly likely that you'll encounter segfaults like this.

Use Rosetta instead.
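
For anyone unsure which build they are actually running, here is a minimal sketch that reports whether the current Julia process is a native arm64 build or an x86_64 build. The helper name `describe_platform` is made up for illustration; it relies only on `Sys.ARCH` and `Sys.isapple()` from Base.

```julia
# Hypothetical helper: classify the running Julia build by OS and architecture.
# The arguments default to the current process so the function is easy to test.
function describe_platform(arch::Symbol = Sys.ARCH, apple::Bool = Sys.isapple())
    if apple && arch === :aarch64
        return "native Apple silicon build (aarch64)"
    elseif apple && arch === :x86_64
        # On an M1 machine an x86_64 Julia runs through Rosetta.
        return "x86_64 build on macOS (on an M1 this runs under Rosetta)"
    else
        return "not macOS, or another architecture: $arch"
    end
end

println(describe_platform())
```

If this prints the aarch64 line, the crash reports in this thread are likely to apply; an x86_64 Julia installed on the same machine takes the Rosetta route suggested above.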

odow avatar Mar 03 '22 03:03 odow

Right, but I just wondered if it might work, having found this JuliaPackaging/Yggdrasil#4015.

oyamad avatar Mar 03 '22 03:03 oyamad

It compiles, but I don't know if we've ever tested all of the bugs and issues. I don't have an M1, so I'm not much help.

odow avatar Mar 03 '22 03:03 odow

We should rebuild the M1 libraries with a recent toolchain. Should I just bump Clp_jll to a recent version?

M1 is almost tier 1 - not yet but very close now.

ViralBShah avatar Oct 25 '22 13:10 ViralBShah

Should I just bump Clp_jll to a recent version?

I assume we need to rebuild the entire stack, not just Clp or Cbc. I'll take a look.

odow avatar Oct 25 '22 19:10 odow

Yeah should be the whole stack. Also MUMPS has had new releases for example. So good idea to bump the whole stack. And I would love to link to LBT as well. But one step at a time...

ViralBShah avatar Oct 25 '22 20:10 ViralBShah

To anyone stumbling across this until we get it fixed: use HiGHS.jl instead.
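
As a sketch of that workaround, the minimal example from the top of this issue can be pointed at HiGHS by swapping the optimizer. This assumes JuMP and HiGHS.jl are installed in the active environment; the data are copied from the original report, and `set_silent` is only there to suppress solver output.

```julia
using JuMP, HiGHS

# Same LP as in the original report: minimize c'x subject to A x = b, x >= 0.
c = [-2.0, 3.0, 0.0, 0.0]
A = [1.0  1.0 1.0 0.0;
     1.0 -1.0 0.0 1.0]
b = [4.0, 6.0]

model = Model(HiGHS.Optimizer)   # the only change from the Clp version
set_silent(model)                # suppress the solver log

@variable(model, x[1:4] >= 0)
@objective(model, Min, c' * x)
@constraint(model, A * x .== b)

optimize!(model)

println(termination_status(model))
println(objective_value(model))
println(value.(x))
```

The rest of the JuMP model is untouched, so switching back to Clp later should only require changing the `Model(...)` line.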

odow avatar Oct 25 '22 21:10 odow

The issue still exists after recompiling:

(m1-support) pkg> st Clp_jll
Status `~/Code/m1-support/Project.toml`
  [06985876] Clp_jll v100.1700.700+0

julia> versioninfo()
Julia Version 1.8.2
Commit 36034abf260 (2022-09-29 15:21 UTC)
Platform Info:
  OS: macOS (arm64-apple-darwin21.3.0)
  CPU: 8 × Apple M1
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, apple-m1)
  Threads: 1 on 4 virtual cores

julia> run(`$(Clp_jll.clp())`);
Coin LP version 1.17.7, build Oct 25 2022
clp(4926,0x1008c8580) malloc: *** error for object 0x600000e4c2a0: pointer being freed was not allocated
clp(4926,0x1008c8580) malloc: *** set a breakpoint in malloc_error_break to debug

so I guess this means it is likely an upstream problem.

odow avatar Oct 26 '22 20:10 odow

Should we file a Clp issue?

ViralBShah avatar Oct 26 '22 21:10 ViralBShah

cc @tkralphs have you ever looked at Clp on the M1?

odow avatar Oct 26 '22 21:10 odow

I have had some bug reports from people trying to use Clp/Cbc on M1, but they were related to bugs in the build system. My impression is that there are people successfully using Clp/Cbc on M1, but this is anecdotal and I have not tried myself (I also don't have an M1, but am considering getting one). I would suggest maybe just posting a Discussion asking if anyone has succeeded.

tkralphs avatar Oct 27 '22 18:10 tkralphs

I have access to an M1, so I'll try compiling locally, excluding all the BinaryBuilder stuff

odow avatar Oct 27 '22 19:10 odow

So I built Clp 1.17.7 using coinbrew, and it worked without any issues.

oscardowson@Oscars-Mac-mini clp % ./dist/bin/clp test.mps
Coin LP version 1.17.7, build Oct 28 2022
command line - ./dist/bin/clp test.mps 
At line 1 NAME
At line 2 OBJSENSE
MIN found after OBJSENSE - Coin ignores
At line 4 ROWS
At line 7 COLUMNS
At line 11 RHS
At line 13 BOUNDS
At line 15 ENDATA
Problem no_name has 1 rows, 1 columns and 1 elements
Model was imported from ./test.mps in 0.000103 seconds
Presolve 0 (-1) rows, 0 (-1) columns and 0 (-1) elements
Empty problem - 0 rows, 0 columns and 0 elements
Optimal - objective value 1
After Postsolve, objective 1, infeasibilities - dual 0 (0), primal 0 (0)
Optimal objective 1 - 0 iterations time 0.002, Presolve 0.00

So I guess this is really some problem in the BB toolchain.

odow avatar Oct 28 '22 00:10 odow

Can you give a pointer to the coinbrew build file for Clp? We should see if the build recipes are the same.

ViralBShah avatar Oct 28 '22 03:10 ViralBShah

We should see if the build recipes are the same

https://github.com/coin-or/coinbrew/blob/master/coinbrew

but I'm pretty sure there are some differences.

The non-standard-y things we do are these sorts of flags:
https://github.com/JuliaPackaging/Yggdrasil/blob/011176638a923e509ac0c64749867c9bd41a2284/C/Coin-OR/CoinUtils/build_tarballs.jl#L43-L47
https://github.com/JuliaPackaging/Yggdrasil/blob/011176638a923e509ac0c64749867c9bd41a2284/C/Coin-OR/Clp/build_tarballs.jl#L42-L47

there are also mumps and Metis: https://github.com/JuliaPackaging/Yggdrasil/blob/011176638a923e509ac0c64749867c9bd41a2284/C/Coin-OR/Clp/build_tarballs.jl#L52-L55

and we set an __arm__ flag: https://github.com/JuliaPackaging/Yggdrasil/blob/011176638a923e509ac0c64749867c9bd41a2284/C/Coin-OR/Clp/build_tarballs.jl#L34-L36

but I don't know what impact that has on M1.

odow avatar Oct 28 '22 04:10 odow

Those differences probably don't explain the segfault, but of course anything is possible.

@giordano are you familiar with any toolchain issues for M1 that might be causing this failure?

ViralBShah avatar Oct 28 '22 15:10 ViralBShah

Not really.

giordano avatar Oct 28 '22 15:10 giordano

@tkralphs The error is non-malloced memory being freed, which suggests some memory corruption.

https://github.com/JuliaLang/julia/issues/44824#issue-1189672682

ViralBShah avatar Oct 29 '22 17:10 ViralBShah

Just as an FYI, I have a current crash I'm seeing in a private application, the error:

julia-debug(61247,0x16becb000) malloc: *** error for object 0xe00000000000000: pointer being freed was not allocated
julia-debug(61247,0x16becb000) malloc: *** set a breakpoint in malloc_error_break to debug

the lldb backtrace includes:

 * frame #0: 0x00000001c01aad98 libsystem_kernel.dylib`__pthread_kill + 8
    frame #1: 0x00000001c01dfee0 libsystem_pthread.dylib`pthread_kill + 288
    frame #2: 0x00000001c011a340 libsystem_c.dylib`abort + 168
    frame #3: 0x00000001bfffc8c0 libsystem_malloc.dylib`malloc_vreport + 552
    frame #4: 0x00000001bfffff34 libsystem_malloc.dylib`malloc_report + 64
    frame #5: 0x00000001bffeecf4 libsystem_malloc.dylib`free + 300
    frame #6: 0x000000029d4ecb00 libClp.1.14.6.dylib`ClpModel::~ClpModel() + 80
    frame #7: 0x000000029d534d84 libClp.1.14.6.dylib`ClpPresolve::gutsOfPresolvedModel(ClpSimplex*, double, bool, int, bool, bool, char const*, char const*) + 3376
    frame #8: 0x000000029d59b934 libClp.1.14.6.dylib`ClpSimplex::initialSolve(ClpSolve&) + 1252

@giordano, if you happen to know: how hard would it be to get debug builds of Clp_jll so we could investigate further? Unfortunately my lldb efforts weren't very fruitful, because it's just trying to call MathOptInterface.optimize! and then everything in Clp is assembly.

quinnj avatar Nov 10 '22 07:11 quinnj

Maybe at least rebuild these libraries in BB with --enable-debug to see if that helps.

ViralBShah avatar Nov 10 '22 13:11 ViralBShah

I was trying to do a debug build for Jacob, but aarch64-apple-darwin fails with some weird linking errors at the end (the non-debug build works, only the debug one fails, why?!?) which honestly I don't have the time to look at for the next ~10 days. In case anyone was curious:

/bin/sh ../../libtool --tag=CXX --mode=link c++  -g -O0 -pipe -Wparentheses -Wreturn-type -Wcast-qual -Wall -Wpointer-arith -Wwrite-strings -Wconversion -Wno-unknown-pragmas -Wno-long-long   -DCLP_BUILD   -o libClp.la -rpath /workspace/destdir/lib -no-undefined -version-info 15:7:14 ClpCholeskyBase.lo ClpCholeskyDense.lo ClpConstraint.lo ClpConstraintLinear.lo ClpConstraintQuadratic.lo Clp_C_Interface.lo ClpDualRowDantzig.lo ClpDualRowPivot.lo ClpDualRowSteepest.lo ClpDummyMatrix.lo ClpDynamicExampleMatrix.lo ClpDynamicMatrix.lo ClpEventHandler.lo ClpFactorization.lo ClpGubDynamicMatrix.lo ClpGubMatrix.lo ClpHelperFunctions.lo ClpInterior.lo ClpLinearObjective.lo ClpMatrixBase.lo ClpMessage.lo ClpModel.lo ClpNetworkBasis.lo ClpNetworkMatrix.lo ClpNonLinearCost.lo ClpNode.lo ClpObjective.lo ClpPackedMatrix.lo ClpPlusMinusOneMatrix.lo ClpPredictorCorrector.lo ClpPdco.lo ClpPdcoBase.lo ClpLsqr.lo ClpPresolve.lo ClpPrimalColumnDantzig.lo ClpPrimalColumnPivot.lo ClpPrimalColumnSteepest.lo ClpQuadraticObjective.lo ClpSimplex.lo ClpSimplexDual.lo ClpSimplexNonlinear.lo ClpSimplexOther.lo ClpSimplexPrimal.lo ClpSolve.lo Idiot.lo IdiSolve.lo ClpCholeskyPardiso.lo ClpPESimplex.lo ClpPEPrimalColumnDantzig.lo ClpPEPrimalColumnSteepest.lo ClpPEDualRowDantzig.lo ClpPEDualRowSteepest.lo     ClpCholeskyMumps.lo  -L/workspace/destdir/lib -ldmumps -lzmumps -lcmumps -lsmumps -lmumps_common -lmpiseq -lpord -lmetis -lopenblas -lgfortran -lpthread -lCoinUtils  
c++ -r -keep_private_externs -nostdlib -o .libs/libClp.1.14.7.dylib-master.o  .libs/ClpCholeskyBase.o .libs/ClpCholeskyDense.o .libs/ClpConstraint.o .libs/ClpConstraintLinear.o .libs/ClpConstraintQuadratic.o .libs/Clp_C_Interface.o .libs/ClpDualRowDantzig.o .libs/ClpDualRowPivot.o .libs/ClpDualRowSteepest.o .libs/ClpDummyMatrix.o .libs/ClpDynamicExampleMatrix.o .libs/ClpDynamicMatrix.o .libs/ClpEventHandler.o .libs/ClpFactorization.o .libs/ClpGubDynamicMatrix.o .libs/ClpGubMatrix.o .libs/ClpHelperFunctions.o .libs/ClpInterior.o .libs/ClpLinearObjective.o .libs/ClpMatrixBase.o .libs/ClpMessage.o .libs/ClpModel.o .libs/ClpNetworkBasis.o .libs/ClpNetworkMatrix.o .libs/ClpNonLinearCost.o .libs/ClpNode.o .libs/ClpObjective.o .libs/ClpPackedMatrix.o .libs/ClpPlusMinusOneMatrix.o .libs/ClpPredictorCorrector.o .libs/ClpPdco.o .libs/ClpPdcoBase.o .libs/ClpLsqr.o .libs/ClpPresolve.o .libs/ClpPrimalColumnDantzig.o .libs/ClpPrimalColumnPivot.o .libs/ClpPrimalColumnSteepest.o .libs/ClpQuadraticObjective.o .libs/ClpSimplex.o .libs/ClpSimplexDual.o .libs/ClpSimplexNonlinear.o .libs/ClpSimplexOther.o .libs/ClpSimplexPrimal.o .libs/ClpSolve.o .libs/Idiot.o .libs/IdiSolve.o .libs/ClpCholeskyPardiso.o .libs/ClpPESimplex.o .libs/ClpPEPrimalColumnDantzig.o .libs/ClpPEPrimalColumnSteepest.o .libs/ClpPEDualRowDantzig.o .libs/ClpPEDualRowSteepest.o .libs/ClpCholeskyMumps.o
ldid.cpp(707): _assert(): Swap(mach_header_->filetype) == MH_EXECUTE || Swap(mach_header_->filetype) == MH_DYLIB || Swap(mach_header_->filetype) == MH_DYLINKER || Swap(mach_header_->filetype) == MH_BUNDLE
c++ -dynamiclib  -o .libs/libClp.1.14.7.dylib .libs/libClp.1.14.7.dylib-master.o  -L/workspace/destdir/lib -ldmumps -lzmumps -lcmumps -lsmumps -lmumps_common -lmpiseq -lpord -lmetis -lopenblas -lgfortran -lpthread -lCoinUtils  -install_name  /workspace/destdir/lib/libClp.1.dylib -Wl,-compatibility_version -Wl,16 -Wl,-current_version -Wl,16.7
Undefined symbols for architecture arm64:
  "__ZN17CoinIndexedVector10checkClearEv", referenced from:
      __ZN18ClpDualRowSteepest13updateWeightsEP17CoinIndexedVectorS1_S1_S1_ in libClp.1.14.7.dylib-master.o
      __ZNK16ClpFactorization20updateColumnForDebugEP17CoinIndexedVectorS1_b in libClp.1.14.7.dylib-master.o
      __ZN16ClpSimplexPrimal11pivotResultEi in libClp.1.14.7.dylib-master.o
      __ZN12ClpPESimplex22identifyCompatibleColsEiPKiP17CoinIndexedVectorS3_ in libClp.1.14.7.dylib-master.o
      __ZN12ClpPESimplex22identifyCompatibleRowsEP17CoinIndexedVectorS1_ in libClp.1.14.7.dylib-master.o
ld: symbol(s) not found for architecture arm64
clang-13: error: linker command failed with exit code 1 (use -v to see invocation)
make[3]: *** [Makefile:802: libClp.la] Error 1
make[3]: Leaving directory '/workspace/srcdir/Clp/build/Clp/src'
make[2]: *** [Makefile:714: all] Error 2
make[2]: Leaving directory '/workspace/srcdir/Clp/build/Clp/src'
make[1]: *** [Makefile:519: all-recursive] Error 1
make[1]: Leaving directory '/workspace/srcdir/Clp/build/Clp'
make: *** [Makefile:324: all-recursive] Error 1

giordano avatar Nov 10 '22 14:11 giordano

OpenBLAS version bump. Perhaps that can help. https://github.com/JuliaPackaging/Yggdrasil/pull/5844

ViralBShah avatar Nov 10 '22 15:11 ViralBShah

The missing symbol is

sandbox:${WORKSPACE}/srcdir/Clp/build # c++filt _ZN17CoinIndexedVector10checkClearEv
CoinIndexedVector::checkClear()
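
To show how that mangled name maps to the C++ function, here is a rough sketch of a decoder for the simplest Itanium ABI case only: a nested name with length-prefixed identifiers and a void parameter list. The function `demangle_simple` is invented for illustration and is no substitute for `c++filt`; it rejects anything more complex, such as const-qualified methods or real argument lists.

```julia
# Hypothetical, deliberately tiny demangler. It handles only symbols of the form
#   _ZN <len><identifier> <len><identifier> ... E v
# i.e. a nested name followed by a void parameter list.
function demangle_simple(sym::AbstractString)
    # macOS object files carry an extra leading underscore, so strip them all.
    s = String(lstrip(sym, '_'))
    startswith(s, "ZN") || error("not a simple nested name: $sym")
    i = 3
    parts = String[]
    while i <= lastindex(s) && isdigit(s[i])
        j = i
        while j <= lastindex(s) && isdigit(s[j])
            j += 1
        end
        len = parse(Int, s[i:j-1])          # length prefix of the next identifier
        j + len - 1 <= lastindex(s) || error("truncated symbol: $sym")
        push!(parts, s[j:j+len-1])
        i = j + len
    end
    (i <= lastindex(s) && s[i] == 'E') || error("unsupported symbol: $sym")
    s[i+1:end] == "v" || error("only void parameter lists are handled: $sym")
    return join(parts, "::") * "()"
end

println(demangle_simple("__ZN17CoinIndexedVector10checkClearEv"))
```

Reading the symbol by hand works the same way: `17` introduces the 17-character class name, `10` the 10-character method name, `E` closes the nested name, and `v` means the function takes no arguments.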

I'm not sure OpenBLAS is related (but I also don't know Clp/Coin source code)

giordano avatar Nov 10 '22 15:11 giordano

No - but given how old our openblas is in Coin-OR (0.3.10!), perhaps there were other issues on M1? Just being optimistic. It won't help with this issue, but we can rebuild the known working version with a new openblas.

ViralBShah avatar Nov 10 '22 15:11 ViralBShah

But when doing dynamic linking the version of OpenBLAS shouldn't matter much (if at all), the ABI has been pretty stable. And the problem is only in the debug build (--enable-debug), the non-debug one (--disable-debug) still works fine.

giordano avatar Nov 10 '22 15:11 giordano

Ah ok, good to know. Maybe I should remove mention of OpenBLAS32 version in the coin-or builds then? Remember this is OpenBLAS32, and not OpenBLAS that is in Julia.

ViralBShah avatar Nov 10 '22 15:11 ViralBShah

Specifying the version is useful for building for compatibility purposes (very often if you build against a new version of a library then you can't use at runtime an older version on macOS), but then at runtime we can use whatever version is available.

giordano avatar Nov 10 '22 15:11 giordano

Thanks all for poking at this. I don't have much bandwidth for it at the moment, maybe next week.

tkralphs avatar Nov 10 '22 15:11 tkralphs