Executing long IR listing in MCJIT works on macOS but crashes on Windows (both .NET Core 3)

Open delreluca opened this issue 5 years ago • 1 comments

Hi there,

I am tinkering with LLVMSharp 5 to compile numerical computations described via formulas. The first goal is running them via MCJIT on x86-64. I have code at https://github.com/delreluca/AutoExpr/commit/d6ebef7cc41f06079c5b6ebb0783d640a4c4a737

I am a beginner and would like to describe my problem to understand whether it lies more on the .NET side or on the LLVM side. If this is not the right place, I would be grateful for pointers on where to ask. I would also appreciate any pointers on how to debug something like this in depth (I can use Visual Studio, but I am not sure how to step through the JITted code).

I decided to use Intel's IPP library instead of doing the arithmetic in IR directly (because each calculation runs over 10,000 scenarios).

The resulting listing is just a long sequence of IPP calls with some intermediate calculations (see the IPP reference for the definitions), which I set up here and here

The code I referenced works fine on macOS with .NET Core 3, but on Windows I get an AccessViolationException. To run it on Windows I changed the following:

LLVM.SetTarget(Generator.Module, Marshal.PtrToStringAnsi(LLVM.GetDefaultTargetTriple()) + "-elf"); because this LLVM version does not support PE
LLVM.SetFunctionCallConv(fn, 64); to set stdcall, although I guess it is not even needed
Changed the library paths (from libipp{}.dylib to ipp{}.dll); the libraries are still loaded via NativeLibrary.Load/GetExport

Interestingly, if I reduce the computation length (and thus the IR listing length) it works on Windows as well.

If I dump the module I get the following. The shorter code that runs would only use variables up to %843. The fixed addresses 2615317749824, 2615317829824, etc. are where the result, its gradient and the initial inputs are kept; these buffers are allocated on the .NET side (via Marshal.AllocHGlobal) before execution and read out after execution.

; ModuleID = 'CodeGen'
source_filename = "CodeGen"
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-windows-msvc-elf"

define void @FUNC() #0 {
ENTRY:
  %0 = call i64 @ippsMalloc_64f(i32 20000)
  %1 = add i64 %0, mul (i64 ptrtoint (double* getelementptr (double, double* null, i32 1) to i64), i64 10000)
  %2 = call i32 @ippsZero_64f(i64 %0, i32 20000)
  %3 = call i64 @ippsMalloc_64f(i32 30000)
  %4 = add i64 %3, mul (i64 ptrtoint (double* getelementptr (double, double* null, i32 1) to i64), i64 10000)
  %5 = add i64 %3, mul (i64 ptrtoint (double* getelementptr (double, double* null, i32 1) to i64), i64 20000)
  %6 = call i32 @ippsSet_64f(double 1.000000e+00, i64 %3, i32 30000)
  %7 = call i32 @ippsCopy_64f(i64 2615317909824, i64 2615317749824, i32 10000)
  %8 = call i32 @ippsSet_64f(double 1.000000e+00, i64 2615317829824, i32 10000)

            SNIP
 
  call void @ippsFree(i64 %168)
  %890 = call i32 @ippsAdd_64f_I(i64 2615317749824, i64 %112, i32 10000)
  %891 = call i32 @ippsAdd_64f_I(i64 2615317829824, i64 %113, i32 10000)
  %892 = call i32 @ippsCopy_64f(i64 %112, i64 2615317749824, i32 10000)
  %893 = call i32 @ippsCopy_64f(i64 %113, i64 2615317829824, i32 10000)
  call void @ippsFree(i64 %112)
  %894 = call i32 @ippsAdd_64f_I(i64 2615317749824, i64 %56, i32 10000)
  %895 = call i32 @ippsAdd_64f_I(i64 2615317829824, i64 %57, i32 10000)
  %896 = call i32 @ippsCopy_64f(i64 %56, i64 2615317749824, i32 10000)
  %897 = call i32 @ippsCopy_64f(i64 %57, i64 2615317829824, i32 10000)
  call void @ippsFree(i64 %56)
  %898 = call i32 @ippsAdd_64f_I(i64 2615317749824, i64 %0, i32 10000)
  %899 = call i32 @ippsAdd_64f_I(i64 2615317829824, i64 %1, i32 10000)
  %900 = call i32 @ippsCopy_64f(i64 %0, i64 2615317749824, i32 10000)
  %901 = call i32 @ippsCopy_64f(i64 %1, i64 2615317829824, i32 10000)
  call void @ippsFree(i64 %0)
  %902 = call i32 @ippsExp_64f_I(i64 2615317749824, i32 10000)
  %903 = call i32 @ippsMul_64f_I(i64 2615317749824, i64 2615317829824, i32 10000)
  ret void
}
declare x86_stdcallcc i64 @ippsMalloc_64f(i32) #0
 
declare x86_stdcallcc void @ippsFree(i64) #0
 
declare x86_stdcallcc i32 @ippsSet_64f(double, i64, i32) #0
 
declare x86_stdcallcc i32 @ippsZero_64f(i64, i32) #0
 
declare x86_stdcallcc i32 @ippsExp_64f_I(i64, i32) #0
 
declare x86_stdcallcc i32 @ippsCopy_64f(i64, i64, i32) #0
 
declare x86_stdcallcc i32 @ippsAdd_64f_I(i64, i64, i32) #0
 
declare x86_stdcallcc i32 @ippsMul_64f_I(i64, i64, i32) #0
 
; Function Attrs: nounwind
declare void @llvm.stackprotector(i8*, i8**) #1
 
attributes #0 = { "no-frame-pointer-elim"="false" }
attributes #1 = { nounwind }
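
As a sanity check on the numbers in the dump, here is a minimal sketch (Python is used only for the arithmetic; the names SIZEOF_DOUBLE, SCENARIOS and element_offset are made up for illustration). It assumes that the constant expression ptrtoint (getelementptr (double, double* null, i32 1)) is the usual sizeof(double) idiom and that ippsMalloc_64f takes an element count rather than a byte count.

```python
import struct

# Size of an IEEE 754 double; "<d" selects the standard 8-byte layout.
SIZEOF_DOUBLE = struct.calcsize("<d")
SCENARIOS = 10_000  # vector length used throughout the listing


def element_offset(count):
    """Byte offset of element `count` in a buffer of doubles."""
    return SIZEOF_DOUBLE * count


# Fixed addresses copied from the dump, in ascending order.
fixed_addresses = [2615317749824, 2615317829824, 2615317909824]
gaps = [b - a for a, b in zip(fixed_addresses, fixed_addresses[1:])]

print(element_offset(SCENARIOS))      # offset added to %0 (-> %1) and %3 (-> %4)
print(element_offset(2 * SCENARIOS))  # offset added to %3 (-> %5)
print(gaps)                           # spacing between the fixed buffers
```

Under those assumptions %1, %4 and %5 point 80,000 and 160,000 bytes into their allocations, and the three fixed addresses are exactly 80,000 bytes apart, which matches buffers of 10,000 doubles each.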

I also checked the pointers of the JIT entry point and the loaded DLL exports. Then I looked at the disassembly of the entry point, and while it seems reasonable (I am not an expert, though), I was wondering why the call targets differ from the export addresses?

Pointer to ippsMalloc_64f: 0x00007ffb2d731036
Entry point from MCJIT:    0x00000260ecd70000
 
00000260ECD70000  mov         eax,1118h 
00000260ECD70005  call        00000260982A1730 
00000260ECD7000A  sub         rsp,rax 
00000260ECD7000D  mov         ecx,4E20h  //20,000
00000260ECD70012  call        00000260ECD775A3 
00000260ECD70017  mov         edx,4E20h  //20,000
00000260ECD7001C  xor         ecx,ecx 
00000260ECD7001E  mov         r8d,ecx 
00000260ECD70021  add         r8,8 
00000260ECD70025  imul        r8,r8,2710h  // 10,000
00000260ECD7002C  mov         r9,rax 
00000260ECD7002F  add         r9,r8 
00000260ECD70032  mov         rcx,rax 
00000260ECD70035  mov         qword ptr [rsp+1110h],rax 
00000260ECD7003D  mov         qword ptr [rsp+1108h],r9 
00000260ECD70045  call        00000260ECD775A9

I also tested simpler code that doesn't dynamically allocate memory, but it also crashes once the listing is long enough. So I think it's a more general issue.

Dec 04 '20 13:12 delreluca

I solved it by updating libLLVM to 6.0.1. (As a side effect the ELF target is also not needed anymore.) Even libLLVM 11.0.0 seems to work together with LLVMSharp 5.0.0.

Is this officially supported or might these combinations lead to unexpected errors?

Dec 05 '20 12:12 delreluca