segfaults when using go feature server

Open achals opened this issue 3 years ago • 2 comments

Expected Behavior

The go feature server should not segfault

Current Behavior

With the following diff:

diff --git a/sdk/python/tests/integration/online_store/test_universal_online.py b/sdk/python/tests/integration/online_store/test_universal_online.py
index b01448e7..212611b8 100644
--- a/sdk/python/tests/integration/online_store/test_universal_online.py
+++ b/sdk/python/tests/integration/online_store/test_universal_online.py
@@ -524,6 +524,9 @@ def test_stream_feature_view_online_retrieval(
 def test_online_retrieval(
     environment, universal_data_sources, feature_server_endpoint, full_feature_names
 ):
+
+    import faulthandler; faulthandler.enable()
+
     fs = environment.feature_store
     entities, datasets, data_sources = universal_data_sources
     feature_views = construct_universal_feature_views(data_sources)
@@ -620,13 +623,17 @@ def test_online_retrieval(
     unprefixed_feature_refs.remove("conv_rate_plus_100")
     unprefixed_feature_refs.remove("conv_rate_plus_val_to_add")

-    online_features_dict = get_online_features_dict(
-        environment=environment,
-        endpoint=feature_server_endpoint,
-        features=feature_refs,
-        entity_rows=entity_rows,
-        full_feature_names=full_feature_names,
-    )
+    import psutil
+
+    for i in range(1000):
+        print(psutil.virtual_memory())
+        online_features_dict = get_online_features_dict(
+            environment=environment,
+            endpoint=feature_server_endpoint,
+            features=feature_refs,
+            entity_rows=entity_rows,
+            full_feature_names=full_feature_names,
+        )

If I run test_online_retrieval, I can reliably trigger a segfault that terminates the pytest process.

With some digging, I was able to get a stack trace that looks like this:

runtime: pointer 0xc0006b0151 to unallocated span span.base()=0xc0006b0000 span.limit=0xc0006b2000 span.state=0
fatal error: found bad pointer in Go heap (incorrect use of unsafe or cgo?)

runtime stack:
runtime.throw({0x144b9685e?, 0x10?})
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/panic.go:992 +0x71 fp=0x7ff7b3210460 sp=0x7ff7b3210430 pc=0x143d19df1
runtime.badPointer(0x142bfb068, 0xc0004f75c0?, 0x0, 0x7ff7b32104f0?)
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/mbitmap.go:368 +0xa5 fp=0x7ff7b32104b0 sp=0x7ff7b3210460 pc=0x143cf7a25
runtime.findObject(0xc0006b0151, 0xd0?, 0x7ff7b3210530?)
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/mbitmap.go:410 +0xed fp=0x7ff7b3210500 sp=0x7ff7b32104b0 pc=0x143cf7cad
runtime.wbBufFlush1(0xc00004c000)
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/mwbbuf.go:260 +0xe5 fp=0x7ff7b3210570 sp=0x7ff7b3210500 pc=0x143d14f25
runtime.wbBufFlush.func1()
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/mwbbuf.go:201 +0x2a fp=0x7ff7b3210588 sp=0x7ff7b3210570 pc=0x143d14d6a
runtime.systemstack()
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/asm_amd64.s:469 +0x49 fp=0x7ff7b3210590 sp=0x7ff7b3210588 pc=0x143d49329

goroutine 17 [running, locked to thread]:
runtime.systemstack_switch()
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/asm_amd64.s:436 fp=0xc00071ad58 sp=0xc00071ad50 pc=0x143d492c0
runtime.wbBufFlush(0x1428ec923?, 0x143cef65e?)
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/mwbbuf.go:200 +0x6e fp=0xc00071ad78 sp=0xc00071ad58 pc=0x143d14dee
runtime.bulkBarrierPreWrite(0xc00071ae20?, 0xc00071af40, 0x10)
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/mbitmap.go:620 +0x271 fp=0xc00071adf8 sp=0xc00071ad78 pc=0x143cf8371
runtime.typedmemmove(0x144ccb0e0, 0xc000992460, 0xc00071af40)
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/mbarrier.go:162 +0x45 fp=0xc00071ae30 sp=0xc00071adf8 pc=0x143cf7125
sync.(*Map).Store(0x1453da440, {0x144ca1480, 0xc00062ce20}, {0x144d763a0, 0xc00059e840})
	/usr/local/Cellar/go/1.18.3/libexec/src/sync/map.go:137 +0x91 fp=0xc00071af28 sp=0xc00071ae30 pc=0x143d4fc51
github.com/apache/arrow/go/v8/arrow/cdata.storeData({0x144e032d8, 0xc00059e840})
	/Users/achal/go/pkg/mod/github.com/apache/arrow/go/[email protected]/arrow/cdata/exports.go:45 +0xc5 fp=0xc00071af90 sp=0xc00071af28 pc=0x1441674e5
github.com/apache/arrow/go/v8/arrow/cdata.exportArray({0x144e044c0, 0xc0005ad6c0}, 0x7fd7443c0990, 0x0)
	/Users/achal/go/pkg/mod/github.com/apache/arrow/go/[email protected]/arrow/cdata/cdata_exports.go:356 +0x365 fp=0xc00071b260 sp=0xc00071af90 pc=0x144166945
github.com/apache/arrow/go/v8/arrow/cdata.exportArray({0x144e04800, 0xc0005ad540}, 0x6000025299b0, 0x0)
	/Users/achal/go/pkg/mod/github.com/apache/arrow/go/[email protected]/arrow/cdata/cdata_exports.go:385 +0xae7 fp=0xc00071b530 sp=0xc00071b260 pc=0x1441670c7
github.com/apache/arrow/go/v8/arrow/cdata.ExportArrowRecordBatch({0x144e04b40, 0xc00081fc20}, 0x6000025299b0, 0x6000025298b0)
	/Users/achal/go/pkg/mod/github.com/apache/arrow/go/[email protected]/arrow/cdata/interface.go:217 +0x413 fp=0xc00071b740 sp=0xc00071b530 pc=0x14415c433
github.com/feast-dev/feast/go/embedded.(*OnlineFeatureService).GetOnlineFeatures(0xc0004b0000, {0xc0000b8c00, 0xa, 0x10}, {0x0, 0x0}, {0x600002529a30, 0x600002528530}, {0x600002529db0, 0x60000252a430}, ...)
	/Users/achal/tecton/feast/go/embedded/online_features.go:220 +0xef0 fp=0xc00071bbc8 sp=0xc00071b740 pc=0x1449fa470
main.embedded_OnlineFeatureService_GetOnlineFeatures(0x2, 0x3fd, 0x10d233320, 0x3fe, 0x3ff, 0x1, 0x400)
	/Users/achal/tecton/feast/build/lib.macosx-12.0-x86_64-3.8/feast/embedded_go/lib/embedded.go:1350 +0x365 fp=0xc00071bda0 sp=0xc00071bbc8 pc=0x144a063e5
_cgoexp_79ecca462223_embedded_OnlineFeatureService_GetOnlineFeatures(0x7ff7b3210608)
	_cgo_gotypes.go:1936 +0xcd fp=0xc00071be38 sp=0xc00071bda0 pc=0x144a0ec6d
runtime.cgocallbackg1(0x144a0eba0, 0xc00008ffe0?, 0x0)
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/cgocall.go:314 +0x2b3 fp=0xc00071bf00 sp=0xc00071be38 pc=0x143ce5673
runtime.cgocallbackg(0x0?, 0x0?, 0x0?)
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/cgocall.go:233 +0xf9 fp=0xc00071bf90 sp=0xc00071bf00 pc=0x143ce5319
runtime.cgocallbackg(0x144a0eba0, 0x7ff7b3210608, 0x0)
	<autogenerated>:1 +0x2f fp=0xc00071bfb8 sp=0xc00071bf90 pc=0x143d4d78f
runtime.cgocallback(0x0, 0x0, 0x0)
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/asm_amd64.s:971 +0xb4 fp=0xc00071bfe0 sp=0xc00071bfb8 pc=0x143d4b2f4
runtime.goexit()
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/asm_amd64.s:1571 +0x1 fp=0xc00071bfe8 sp=0xc00071bfe0 pc=0x143d4b501

goroutine 35 [select]:
runtime.gopark(0x144df51f8, 0x0, 0x9, 0x18, 0x1)
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/proc.go:361 +0xf2 fp=0xc000123cd8 sp=0xc000123ca8 pc=0x143d1c892
runtime.selectgo(0xc000123f80, 0xc000123e80, 0x0?, 0x0, 0x0?, 0x1)
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/select.go:328 +0xa45 fp=0xc000123e30 sp=0xc000123cd8 pc=0x143d2bf65
github.com/go-redis/redis/v8/internal/pool.(*ConnPool).reaper(0xc000355400, 0xdf8475800)
	/Users/achal/go/pkg/mod/github.com/go-redis/redis/[email protected]/internal/pool/pool.go:485 +0x15d fp=0xc000123fb0 sp=0xc000123e30 pc=0x14462025d
github.com/go-redis/redis/v8/internal/pool.NewConnPool.func1()
	/Users/achal/go/pkg/mod/github.com/go-redis/redis/[email protected]/internal/pool/pool.go:111 +0x39 fp=0xc000123fe0 sp=0xc000123fb0 pc=0x14461d199
runtime.goexit()
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/asm_amd64.s:1571 +0x1 fp=0xc000123fe8 sp=0xc000123fe0 pc=0x143d4b501
created by github.com/go-redis/redis/v8/internal/pool.NewConnPool
	/Users/achal/go/pkg/mod/github.com/go-redis/redis/[email protected]/internal/pool/pool.go:111 +0x36e

goroutine 2 [force gc (idle)]:
runtime.gopark(0x144df51a0, 0x1453d9ff0, 0x11, 0x14, 0x1)
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/proc.go:361 +0xf2 fp=0xc00007af88 sp=0xc00007af58 pc=0x143d1c892
runtime.goparkunlock(0x0?, 0x0?, 0x0?, 0x0?)
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/proc.go:367 +0x2a fp=0xc00007afb8 sp=0xc00007af88 pc=0x143d1c92a
runtime.forcegchelper()
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/proc.go:301 +0xa5 fp=0xc00007afe0 sp=0xc00007afb8 pc=0x143d1c6c5
runtime.goexit()
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/asm_amd64.s:1571 +0x1 fp=0xc00007afe8 sp=0xc00007afe0 pc=0x143d4b501
created by runtime.init.6
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/proc.go:289 +0x25

goroutine 18 [GC sweep wait]:
runtime.gopark(0x144df51a0, 0x1453daa40, 0xc, 0x14, 0x1)
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/proc.go:361 +0xf2 fp=0xc000076768 sp=0xc000076738 pc=0x143d1c892
runtime.goparkunlock(0x1?, 0x0?, 0x0?, 0x0?)
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/proc.go:367 +0x2a fp=0xc000076798 sp=0xc000076768 pc=0x143d1c92a
runtime.bgsweep(0x0?)
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/mgcsweep.go:297 +0xd1 fp=0xc0000767c8 sp=0xc000076798 pc=0x143d07951
runtime.gcenable.func1()
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/mgc.go:177 +0x26 fp=0xc0000767e0 sp=0xc0000767c8 pc=0x143cfd626
runtime.goexit()
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/asm_amd64.s:1571 +0x1 fp=0xc0000767e8 sp=0xc0000767e0 pc=0x143d4b501
created by runtime.gcenable
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/mgc.go:177 +0x6b

goroutine 19 [GC scavenge wait]:
runtime.gopark(0x144df51a0, 0x1453daa00, 0xd, 0x14, 0x1)
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/proc.go:361 +0xf2 fp=0xc000076ef0 sp=0xc000076ec0 pc=0x143d1c892
runtime.goparkunlock(0x1453f5d28?, 0x0?, 0x0?, 0x0?)
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/proc.go:367 +0x2a fp=0xc000076f20 sp=0xc000076ef0 pc=0x143d1c92a
runtime.bgscavenge(0x0?)
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/mgcscavenge.go:364 +0x2ca fp=0xc000076fc8 sp=0xc000076f20 pc=0x143d0556a
runtime.gcenable.func2()
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/mgc.go:178 +0x26 fp=0xc000076fe0 sp=0xc000076fc8 pc=0x143cfd5c6
runtime.goexit()
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/asm_amd64.s:1571 +0x1 fp=0xc000076fe8 sp=0xc000076fe0 pc=0x143d4b501
created by runtime.gcenable
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/mgc.go:178 +0xaa

goroutine 34 [finalizer wait]:
runtime.gopark(0x144df51a0, 0x14540d140, 0x10, 0x14, 0x1)
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/proc.go:361 +0xf2 fp=0xc000120e00 sp=0xc000120dd0 pc=0x143d1c892
runtime.goparkunlock(0x0?, 0x60?, 0xd6?, 0xc000136000?)
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/proc.go:367 +0x2a fp=0xc000120e30 sp=0xc000120e00 pc=0x143d1c92a
runtime.runfinq()
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/mfinal.go:177 +0xab fp=0xc000120fe0 sp=0xc000120e30 pc=0x143cfc6ab
runtime.goexit()
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/asm_amd64.s:1571 +0x1 fp=0xc000120fe8 sp=0xc000120fe0 pc=0x143d4b501
created by runtime.createfing
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/mfinal.go:157 +0x45

goroutine 36 [select, locked to thread]:
runtime.gopark(0x144df51f8, 0x0, 0x9, 0x18, 0x1)
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/proc.go:361 +0xf2 fp=0xc0003e6df8 sp=0xc0003e6dc8 pc=0x143d1c892
runtime.selectgo(0xc0003e6fa8, 0xc0003e6fa0, 0x0?, 0x0, 0x0?, 0x1)
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/select.go:328 +0xa45 fp=0xc0003e6f50 sp=0xc0003e6df8 pc=0x143d2bf65
runtime.ensureSigM.func1()
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/signal_unix.go:973 +0x14a fp=0xc0003e6fe0 sp=0xc0003e6f50 pc=0x143d3006a
runtime.goexit()
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/asm_amd64.s:1571 +0x1 fp=0xc0003e6fe8 sp=0xc0003e6fe0 pc=0x143d4b501
created by runtime.ensureSigM
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/signal_unix.go:956 +0xbd

goroutine 3 [syscall]:
runtime.sigNoteSleep(0x0)
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/os_darwin.go:123 +0x1e fp=0xc00007b790 sp=0xc00007b758 pc=0x143d168fe
os/signal.signal_recv()
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/sigqueue.go:148 +0x28 fp=0xc00007b7b0 sp=0xc00007b790 pc=0x143d47968
os/signal.loop()
	/usr/local/Cellar/go/1.18.3/libexec/src/os/signal/signal_unix.go:23 +0x1d fp=0xc00007b7e0 sp=0xc00007b7b0 pc=0x143ea003d
runtime.goexit()
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/asm_amd64.s:1571 +0x1 fp=0xc00007b7e8 sp=0xc00007b7e0 pc=0x143d4b501
created by os/signal.Notify.func1.1
	/usr/local/Cellar/go/1.18.3/libexec/src/os/signal/signal.go:151 +0x2e

goroutine 37 [GC worker (idle)]:
runtime.gopark(0x144df5038, 0xc0001b2da0, 0x18, 0x14, 0x0)
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/proc.go:361 +0xf2 fp=0xc0003e7750 sp=0xc0003e7720 pc=0x143d1c892
runtime.gcBgMarkWorker()
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/mgc.go:1207 +0x107 fp=0xc0003e77e0 sp=0xc0003e7750 pc=0x143cff187
runtime.goexit()
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/asm_amd64.s:1571 +0x1 fp=0xc0003e77e8 sp=0xc0003e77e0 pc=0x143d4b501
created by runtime.gcBgMarkStartWorkers
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/mgc.go:1131 +0x25

goroutine 38 [runnable]:
runtime.gcMarkDone()
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/mgc.go:775 +0x239 fp=0xc0003e2750 sp=0xc0003e2748 pc=0x143cfe299
runtime.gcBgMarkWorker()
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/mgc.go:1331 +0x296 fp=0xc0003e27e0 sp=0xc0003e2750 pc=0x143cff316
runtime.goexit()
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/asm_amd64.s:1571 +0x1 fp=0xc0003e27e8 sp=0xc0003e27e0 pc=0x143d4b501
created by runtime.gcBgMarkStartWorkers
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/mgc.go:1131 +0x25

goroutine 23 [GC worker (idle)]:
runtime.gopark(0x144df5038, 0xc00014a2a0, 0x18, 0x14, 0x0)
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/proc.go:361 +0xf2 fp=0xc0003e6750 sp=0xc0003e6720 pc=0x143d1c892
runtime.gcBgMarkWorker()
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/mgc.go:1207 +0x107 fp=0xc0003e67e0 sp=0xc0003e6750 pc=0x143cff187
runtime.goexit()
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/asm_amd64.s:1571 +0x1 fp=0xc0003e67e8 sp=0xc0003e67e0 pc=0x143d4b501
created by runtime.gcBgMarkStartWorkers
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/mgc.go:1131 +0x25

goroutine 52 [GC worker (idle)]:
runtime.gopark(0x144df5038, 0xc0001b2d20, 0x18, 0x14, 0x0)
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/proc.go:361 +0xf2 fp=0xc0003e2f50 sp=0xc0003e2f20 pc=0x143d1c892
runtime.gcBgMarkWorker()
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/mgc.go:1207 +0x107 fp=0xc0003e2fe0 sp=0xc0003e2f50 pc=0x143cff187
runtime.goexit()
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/asm_amd64.s:1571 +0x1 fp=0xc0003e2fe8 sp=0xc0003e2fe0 pc=0x143d4b501
created by runtime.gcBgMarkStartWorkers
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/mgc.go:1131 +0x25

goroutine 67 [GC worker (idle)]:
runtime.gopark(0x144df5038, 0xc00003a0c0, 0x18, 0x14, 0x0)
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/proc.go:361 +0xf2 fp=0xc000506750 sp=0xc000506720 pc=0x143d1c892
runtime.gcBgMarkWorker()
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/mgc.go:1207 +0x107 fp=0xc0005067e0 sp=0xc000506750 pc=0x143cff187
runtime.goexit()
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/asm_amd64.s:1571 +0x1 fp=0xc0005067e8 sp=0xc0005067e0 pc=0x143d4b501
created by runtime.gcBgMarkStartWorkers
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/mgc.go:1131 +0x25

goroutine 4 [GC worker (idle)]:
runtime.gopark(0x144df5038, 0xc0001b2d40, 0x18, 0x14, 0x0)
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/proc.go:361 +0xf2 fp=0xc00007bf50 sp=0xc00007bf20 pc=0x143d1c892
runtime.gcBgMarkWorker()
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/mgc.go:1207 +0x107 fp=0xc00007bfe0 sp=0xc00007bf50 pc=0x143cff187
runtime.goexit()
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/asm_amd64.s:1571 +0x1 fp=0xc00007bfe8 sp=0xc00007bfe0 pc=0x143d4b501
created by runtime.gcBgMarkStartWorkers
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/mgc.go:1131 +0x25

goroutine 68 [GC worker (idle)]:
runtime.gopark(0x144df5038, 0xc0001b2d60, 0x18, 0x14, 0x0)
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/proc.go:361 +0xf2 fp=0xc000506f50 sp=0xc000506f20 pc=0x143d1c892
runtime.gcBgMarkWorker()
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/mgc.go:1207 +0x107 fp=0xc000506fe0 sp=0xc000506f50 pc=0x143cff187
runtime.goexit()
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/asm_amd64.s:1571 +0x1 fp=0xc000506fe8 sp=0xc000506fe0 pc=0x143d4b501
created by runtime.gcBgMarkStartWorkers
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/mgc.go:1131 +0x25

goroutine 5 [GC worker (idle)]:
runtime.gopark(0x144df5038, 0xc0001b2d80, 0x18, 0x14, 0x0)
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/proc.go:361 +0xf2 fp=0xc00007c750 sp=0xc00007c720 pc=0x143d1c892
runtime.gcBgMarkWorker()
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/mgc.go:1207 +0x107 fp=0xc00007c7e0 sp=0xc00007c750 pc=0x143cff187
runtime.goexit()
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/asm_amd64.s:1571 +0x1 fp=0xc00007c7e8 sp=0xc00007c7e0 pc=0x143d4b501
created by runtime.gcBgMarkStartWorkers
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/mgc.go:1131 +0x25

goroutine 6 [GC worker (idle)]:
runtime.gopark(0x144df5038, 0xc0001b2dc0, 0x18, 0x14, 0x0)
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/proc.go:361 +0xf2 fp=0xc00007cf50 sp=0xc00007cf20 pc=0x143d1c892
runtime.gcBgMarkWorker()
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/mgc.go:1207 +0x107 fp=0xc00007cfe0 sp=0xc00007cf50 pc=0x143cff187
runtime.goexit()
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/asm_amd64.s:1571 +0x1 fp=0xc00007cfe8 sp=0xc00007cfe0 pc=0x143d4b501
created by runtime.gcBgMarkStartWorkers
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/mgc.go:1131 +0x25

goroutine 7 [GC worker (idle)]:
runtime.gopark(0x144df5038, 0xc0001b2de0, 0x18, 0x14, 0x0)
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/proc.go:361 +0xf2 fp=0xc00007d750 sp=0xc00007d720 pc=0x143d1c892
runtime.gcBgMarkWorker()
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/mgc.go:1207 +0x107 fp=0xc00007d7e0 sp=0xc00007d750 pc=0x143cff187
runtime.goexit()
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/asm_amd64.s:1571 +0x1 fp=0xc00007d7e8 sp=0xc00007d7e0 pc=0x143d4b501
created by runtime.gcBgMarkStartWorkers
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/mgc.go:1131 +0x25

goroutine 69 [GC worker (idle)]:
runtime.gopark(0x144df5038, 0xc0001b2e00, 0x18, 0x14, 0x0)
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/proc.go:361 +0xf2 fp=0xc000507750 sp=0xc000507720 pc=0x143d1c892
runtime.gcBgMarkWorker()
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/mgc.go:1207 +0x107 fp=0xc0005077e0 sp=0xc000507750 pc=0x143cff187
runtime.goexit()
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/asm_amd64.s:1571 +0x1 fp=0xc0005077e8 sp=0xc0005077e0 pc=0x143d4b501
created by runtime.gcBgMarkStartWorkers
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/mgc.go:1131 +0x25

goroutine 70 [GC worker (idle)]:
runtime.gopark(0x144df5038, 0xc0001b2e20, 0x18, 0x14, 0x0)
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/proc.go:361 +0xf2 fp=0xc000507f50 sp=0xc000507f20 pc=0x143d1c892
runtime.gcBgMarkWorker()
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/mgc.go:1207 +0x107 fp=0xc000507fe0 sp=0xc000507f50 pc=0x143cff187
runtime.goexit()
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/asm_amd64.s:1571 +0x1 fp=0xc000507fe8 sp=0xc000507fe0 pc=0x143d4b501
created by runtime.gcBgMarkStartWorkers
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/mgc.go:1131 +0x25

goroutine 39 [GC worker (idle)]:
runtime.gopark(0x144df5038, 0xc00003a0e0, 0x18, 0x14, 0x0)
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/proc.go:361 +0xf2 fp=0xc0003e7f50 sp=0xc0003e7f20 pc=0x143d1c892
runtime.gcBgMarkWorker()
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/mgc.go:1207 +0x107 fp=0xc0003e7fe0 sp=0xc0003e7f50 pc=0x143cff187
runtime.goexit()
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/asm_amd64.s:1571 +0x1 fp=0xc0003e7fe8 sp=0xc0003e7fe0 pc=0x143d4b501
created by runtime.gcBgMarkStartWorkers
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/mgc.go:1131 +0x25

goroutine 71 [GC worker (idle)]:
runtime.gopark(0x144df5038, 0xc00003a100, 0x18, 0x14, 0x0)
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/proc.go:361 +0xf2 fp=0xc000508750 sp=0xc000508720 pc=0x143d1c892
runtime.gcBgMarkWorker()
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/mgc.go:1207 +0x107 fp=0xc0005087e0 sp=0xc000508750 pc=0x143cff187
runtime.goexit()
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/asm_amd64.s:1571 +0x1 fp=0xc0005087e8 sp=0xc0005087e0 pc=0x143d4b501
created by runtime.gcBgMarkStartWorkers
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/mgc.go:1131 +0x25

goroutine 8 [GC worker (idle)]:
runtime.gopark(0x144df5038, 0xc0001b2e40, 0x18, 0x14, 0x0)
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/proc.go:361 +0xf2 fp=0xc00007df50 sp=0xc00007df20 pc=0x143d1c892
runtime.gcBgMarkWorker()
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/mgc.go:1207 +0x107 fp=0xc00007dfe0 sp=0xc00007df50 pc=0x143cff187
runtime.goexit()
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/asm_amd64.s:1571 +0x1 fp=0xc00007dfe8 sp=0xc00007dfe0 pc=0x143d4b501
created by runtime.gcBgMarkStartWorkers
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/mgc.go:1131 +0x25

goroutine 72 [GC worker (idle)]:
runtime.gopark(0x144df5038, 0xc0001b2e60, 0x18, 0x14, 0x0)
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/proc.go:361 +0xf2 fp=0xc000508f50 sp=0xc000508f20 pc=0x143d1c892
runtime.gcBgMarkWorker()
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/mgc.go:1207 +0x107 fp=0xc000508fe0 sp=0xc000508f50 pc=0x143cff187
runtime.goexit()
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/asm_amd64.s:1571 +0x1 fp=0xc000508fe8 sp=0xc000508fe0 pc=0x143d4b501
created by runtime.gcBgMarkStartWorkers
	/usr/local/Cellar/go/1.18.3/libexec/src/runtime/mgc.go:1131 +0x25
/Users/achal/.pyenv/versions/3.8.12/lib/python3.8/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

The most suspicious segment of this trace is here:

sync.(*Map).Store(0x1453da440, {0x144ca1480, 0xc00062ce20}, {0x144d763a0, 0xc00059e840})
	/usr/local/Cellar/go/1.18.3/libexec/src/sync/map.go:137 +0x91 fp=0xc00071af28 sp=0xc00071ae30 pc=0x143d4fc51
github.com/apache/arrow/go/v8/arrow/cdata.storeData({0x144e032d8, 0xc00059e840})
	/Users/achal/go/pkg/mod/github.com/apache/arrow/go/[email protected]/arrow/cdata/exports.go:45 +0xc5 fp=0xc00071af90 sp=0xc00071af28 pc=0x1441674e5
github.com/apache/arrow/go/v8/arrow/cdata.exportArray({0x144e044c0, 0xc0005ad6c0}, 0x7fd7443c0990, 0x0)
	/Users/achal/go/pkg/mod/github.com/apache/arrow/go/[email protected]/arrow/cdata/cdata_exports.go:356 +0x365 fp=0xc00071b260 sp=0xc00071af90 pc=0x144166945
github.com/apache/arrow/go/v8/arrow/cdata.exportArray({0x144e04800, 0xc0005ad540}, 0x6000025299b0, 0x0)
	/Users/achal/go/pkg/mod/github.com/apache/arrow/go/[email protected]/arrow/cdata/cdata_exports.go:385 +0xae7 fp=0xc00071b530 sp=0xc00071b260 pc=0x1441670c7
github.com/apache/arrow/go/v8/arrow/cdata.ExportArrowRecordBatch({0x144e04b40, 0xc00081fc20}, 0x6000025299b0, 0x6000025298b0)
	/Users/achal/go/pkg/mod/github.com/apache/arrow/go/[email protected]/arrow/cdata/interface.go:217 +0x413 fp=0xc00071b740 sp=0xc00071b530 pc=0x14415c433
github.com/feast-dev/feast/go/embedded.(*OnlineFeatureService).GetOnlineFeatures(0xc0004b0000, {0xc0000b8c00, 0xa, 0x10}, {0x0, 0x0}, {0x600002529a30, 0x600002528530}, {0x600002529db0, 0x60000252a430}, ...)
	/Users/achal/tecton/feast/go/embedded/online_features.go:220 +0xef0 fp=0xc00071bbc8 sp=0xc00071b740 pc=0x1449fa470
main.embedded_OnlineFeatureService_GetOnlineFeatures(0x2, 0x3fd, 0x10d233320, 0x3fe, 0x3ff, 0x1, 0x400)
	/Users/achal/tecton/feast/build/lib.macosx-12.0-x86_64-3.8/feast/embedded_go/lib/embedded.go:1350 +0x365 fp=0xc00071bda0 sp=0xc00071bbc8 pc=0x144a063e5
_cgoexp_79ecca462223_embedded_OnlineFeatureService_GetOnlineFeatures(0x7ff7b3210608)
	_cgo_gotypes.go:1936 +0xcd fp=0xc00071be38 sp=0xc00071bda0 pc=0x144a0ec6d

This makes me suspect that there's a bug somewhere in arrow.

Steps to reproduce

I'll push my changes to a branch.

Possible Solution

?

achals • Jun 23 '22 22:06

Fun finding:

Running this same test with GOGC=off fixes everything.

If I understand correctly, this is probably happening because we're using the default arrow allocator (memory.NewGoAllocator) but the docs for https://pkg.go.dev/github.com/apache/arrow/go/[email protected]/arrow/cdata#ExportArrowRecordBatch state the following:

As a result, if the function you're calling is going to hold onto the pointers or otherwise continue to reference the memory after the call returns, you should use the CgoArrowAllocator rather than the GoAllocator (or DefaultAllocator) so that the memory which is allocated for the record batch in the first place is allocated in C, not by the Go runtime and is therefore not subject to the Garbage collection.

So we should try to switch to using CgoArrowAllocator (which will probably need some updates to gopy).
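
For anyone who wants to repeat the GOGC=off experiment, here is a minimal sketch of one way to do it. It assumes the embedded Go runtime reads GOGC from the process environment when it starts, which is standard Go behavior. The helper names gogc_off_env and run_test_without_go_gc are made up for illustration and are not part of feast. Launching pytest as a child process avoids any question about whether the variable is set before the Go library is loaded.

```python
import os
import subprocess
import sys


def gogc_off_env(base=None):
    """Return a copy of the environment with the Go garbage collector disabled.

    Hypothetical helper: `base` defaults to the current process environment.
    """
    env = dict(os.environ if base is None else base)
    env["GOGC"] = "off"  # the Go runtime treats GOGC=off as "never collect"
    return env


def run_test_without_go_gc(test_id):
    """Run a single pytest test in a child process that starts with GOGC=off."""
    return subprocess.run(
        [sys.executable, "-m", "pytest", test_id],
        env=gogc_off_env(),
        check=False,  # report the exit code instead of raising on failure
    )
```

Calling run_test_without_go_gc("sdk/python/tests/integration/online_store/test_universal_online.py::test_online_retrieval") from the repository root should then run the looped test with the Go GC disabled. Any extra pytest options a given setup needs to select the go feature server are not shown here. This only helps confirm the diagnosis. It is not a fix, because memory allocated by the Go runtime is never reclaimed while GOGC=off.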

achals • Jun 24 '22 17:06

Note that #2936 makes some progress towards resolving these segfaults by using CgoArrowAllocator, but there seem to be some lingering issues.

felixwang9817 • Jul 15 '22 00:07

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] • Nov 12 '22 11:11