Executing long IR listing in MCJIT works on macOS but crashes on Windows (both .NET Core 3)
Hi there,
I am tinkering with LLVMSharp 5 to compile numerical computations described via formulas. The first goal is running them via MCJIT on x86-64. I have code at https://github.com/delreluca/AutoExpr/commit/d6ebef7cc41f06079c5b6ebb0783d640a4c4a737
I am a beginner and would like to describe my problem to understand whether it lies more on the .NET side or on the LLVM side. If this is not the correct place, I would be grateful for pointers on where to ask. I would also appreciate any pointers on how to debug something like this in depth (I can use Visual Studio, but I am not sure how to step through the JITted code).
I decided to use Intel's IPP library instead of doing the arithmetic in IR directly, because every operation runs over 10,000 scenarios.
The resulting listing is just a long sequence of calls with some intermediate calculations (see the IPP reference for the definitions), which I set up here and here.
The code I referenced works fine on macOS with .NET Core 3. But on Windows I get an AccessViolationException; to run it on Windows I changed
- LLVM.SetTarget(Generator.Module, Marshal.PtrToStringAnsi(LLVM.GetDefaultTargetTriple()) + "-elf"); because the LLVM version does not support PE
- LLVM.SetFunctionCallConv(fn, 64); to set stdcall, although I guess it is not even needed
- Change paths to DLL (from libipp{}.dylib to ipp{}.dll), but still load via NativeLibrary.Load / GetExport
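To make the last change concrete, here is a minimal sketch of choosing the library name per OS and resolving an export. It is not the code from the repository: the `component` parameter stands in for the `{}` placeholder above, the value "s" is only an assumed example, and I use the `Try` variants of `NativeLibrary.Load` / `GetExport` so the sketch also runs on a machine without IPP installed.

```csharp
using System;
using System.Runtime.InteropServices;

static class IppLoader
{
    // Builds the platform-specific file name. "component" is a hypothetical
    // parameter standing in for the "{}" placeholder in the list above.
    // Only the Windows and macOS names from the post are covered.
    public static string LibraryName(string component) =>
        RuntimeInformation.IsOSPlatform(OSPlatform.Windows)
            ? $"ipp{component}.dll"
            : $"libipp{component}.dylib";

    // Returns the export address, or IntPtr.Zero if the library or the
    // symbol cannot be found.
    public static IntPtr FindExport(string libraryPath, string symbol)
    {
        if (!NativeLibrary.TryLoad(libraryPath, out IntPtr handle))
            return IntPtr.Zero;

        return NativeLibrary.TryGetExport(handle, symbol, out IntPtr address)
            ? address
            : IntPtr.Zero;
    }

    static void Main()
    {
        string name = LibraryName("s"); // "s" is an assumed component suffix
        IntPtr export = FindExport(name, "ippsMalloc_64f");
        Console.WriteLine($"{name}: ippsMalloc_64f at 0x{export.ToInt64():x16}");
    }
}
```

If the library is not on the search path, the sketch prints a zero address instead of throwing.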
Interestingly, if I reduce the computation length (and thus the IR listing length) it works on Windows as well.
If I dump the module I get the following. The shorter code that runs would only use variables up to %843.
The fixed addresses 2615317749824, 2615317829824, etc. are where the result, its gradient and the initial inputs are kept. This memory is allocated on the .NET side (via Marshal.AllocHGlobal) before execution and read out afterwards.
; ModuleID = 'CodeGen'
source_filename = "CodeGen"
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-windows-msvc-elf"
define void @FUNC() #0 {
ENTRY:
%0 = call i64 @ippsMalloc_64f(i32 20000)
%1 = add i64 %0, mul (i64 ptrtoint (double* getelementptr (double, double* null, i32 1) to i64), i64 10000)
%2 = call i32 @ippsZero_64f(i64 %0, i32 20000)
%3 = call i64 @ippsMalloc_64f(i32 30000)
%4 = add i64 %3, mul (i64 ptrtoint (double* getelementptr (double, double* null, i32 1) to i64), i64 10000)
%5 = add i64 %3, mul (i64 ptrtoint (double* getelementptr (double, double* null, i32 1) to i64), i64 20000)
%6 = call i32 @ippsSet_64f(double 1.000000e+00, i64 %3, i32 30000)
%7 = call i32 @ippsCopy_64f(i64 2615317909824, i64 2615317749824, i32 10000)
%8 = call i32 @ippsSet_64f(double 1.000000e+00, i64 2615317829824, i32 10000)
SNIP
call void @ippsFree(i64 %168)
%890 = call i32 @ippsAdd_64f_I(i64 2615317749824, i64 %112, i32 10000)
%891 = call i32 @ippsAdd_64f_I(i64 2615317829824, i64 %113, i32 10000)
%892 = call i32 @ippsCopy_64f(i64 %112, i64 2615317749824, i32 10000)
%893 = call i32 @ippsCopy_64f(i64 %113, i64 2615317829824, i32 10000)
call void @ippsFree(i64 %112)
%894 = call i32 @ippsAdd_64f_I(i64 2615317749824, i64 %56, i32 10000)
%895 = call i32 @ippsAdd_64f_I(i64 2615317829824, i64 %57, i32 10000)
%896 = call i32 @ippsCopy_64f(i64 %56, i64 2615317749824, i32 10000)
%897 = call i32 @ippsCopy_64f(i64 %57, i64 2615317829824, i32 10000)
call void @ippsFree(i64 %56)
%898 = call i32 @ippsAdd_64f_I(i64 2615317749824, i64 %0, i32 10000)
%899 = call i32 @ippsAdd_64f_I(i64 2615317829824, i64 %1, i32 10000)
%900 = call i32 @ippsCopy_64f(i64 %0, i64 2615317749824, i32 10000)
%901 = call i32 @ippsCopy_64f(i64 %1, i64 2615317829824, i32 10000)
call void @ippsFree(i64 %0)
%902 = call i32 @ippsExp_64f_I(i64 2615317749824, i32 10000)
%903 = call i32 @ippsMul_64f_I(i64 2615317749824, i64 2615317829824, i32 10000)
ret void
}
declare x86_stdcallcc i64 @ippsMalloc_64f(i32) #0
declare x86_stdcallcc void @ippsFree(i64) #0
declare x86_stdcallcc i32 @ippsSet_64f(double, i64, i32) #0
declare x86_stdcallcc i32 @ippsZero_64f(i64, i32) #0
declare x86_stdcallcc i32 @ippsExp_64f_I(i64, i32) #0
declare x86_stdcallcc i32 @ippsCopy_64f(i64, i64, i32) #0
declare x86_stdcallcc i32 @ippsAdd_64f_I(i64, i64, i32) #0
declare x86_stdcallcc i32 @ippsMul_64f_I(i64, i64, i32) #0
; Function Attrs: nounwind
declare void @llvm.stackprotector(i8*, i8**) #1
attributes #0 = { "no-frame-pointer-elim"="false" }
attributes #1 = { nounwind }
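For context on the fixed addresses in the dump, here is a minimal sketch of how such buffers can be set up on the .NET side. It is an illustration, not the repository code. The three addresses in the listing are exactly 80,000 bytes apart (10,000 doubles of 8 bytes each), so the sketch assumes one contiguous block; the order result, gradient, inputs is my reading of the first ippsCopy_64f and ippsSet_64f calls. The same 8 * 10000 offset is what the mul (ptrtoint (getelementptr ...)) constant expressions in the IR compute.

```csharp
using System;
using System.Runtime.InteropServices;

static class SharedBuffers
{
    const int Scenarios = 10_000;                           // vector length used in the IR
    const int BytesPerVector = Scenarios * sizeof(double);  // 80,000 bytes

    static void Main()
    {
        // One unmanaged block holding three vectors back to back
        // (assumed layout: result, gradient, inputs).
        IntPtr block = Marshal.AllocHGlobal(3 * BytesPerVector);
        try
        {
            long result   = block.ToInt64();
            long gradient = result + BytesPerVector;
            long inputs   = result + 2L * BytesPerVector;

            // Write the initial inputs into unmanaged memory.
            var initial = new double[Scenarios];
            for (int i = 0; i < Scenarios; i++) initial[i] = 0.5;
            Marshal.Copy(initial, 0, new IntPtr(inputs), Scenarios);

            // The three addresses would now be emitted into the IR as i64
            // constants, and the JITted FUNC would be executed here.
            Console.WriteLine($"result=0x{result:x}, gradient=0x{gradient:x}, inputs=0x{inputs:x}");

            // Read a vector back out; after a real run this would be `result`.
            // Here the inputs are read back as a round-trip check.
            var output = new double[Scenarios];
            Marshal.Copy(new IntPtr(inputs), output, 0, Scenarios);
            Console.WriteLine($"first element read back: {output[0]}");
        }
        finally
        {
            Marshal.FreeHGlobal(block); // unmanaged memory is not garbage collected
        }
    }
}
```

Because the addresses are baked into the module as constants, the block has to stay allocated until the JITted function has finished running.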
I also checked the pointers of the JIT entry point and the loaded DLL exports. Then I looked at the disassembly of the entry point. It seems reasonable (I am not an expert, though), but I wonder why the call targets differ from the export addresses.
Pointer to ippsMalloc_64f: 0x00007ffb2d731036
Entry point from MCJIT: 0x00000260ecd70000
00000260ECD70000 mov eax,1118h
00000260ECD70005 call 00000260982A1730
00000260ECD7000A sub rsp,rax
00000260ECD7000D mov ecx,4E20h //20,000
00000260ECD70012 call 00000260ECD775A3
00000260ECD70017 mov edx,4E20h //20,000
00000260ECD7001C xor ecx,ecx
00000260ECD7001E mov r8d,ecx
00000260ECD70021 add r8,8
00000260ECD70025 imul r8,r8,2710h // 10,000
00000260ECD7002C mov r9,rax
00000260ECD7002F add r9,r8
00000260ECD70032 mov rcx,rax
00000260ECD70035 mov qword ptr [rsp+1110h],rax
00000260ECD7003D mov qword ptr [rsp+1108h],r9
00000260ECD70045 call 00000260ECD775A9
I also tested simpler code that doesn't dynamically allocate memory, but it crashes as well once it is long enough. So I think it's a more general issue.
I solved it by updating libLLVM to 6.0.1. (As a side effect the ELF target is also not needed anymore.) Even libLLVM 11.0.0 seems to work together with LLVMSharp 5.0.0.
Is this officially supported or might these combinations lead to unexpected errors?