basilisk icon indicating copy to clipboard operation
basilisk copied to clipboard

pybind11/GIL error with basilisk + reticulate 1.42.0

Open csoneson opened this issue 9 months ago • 6 comments

Hello,

since last week we are running into issues with some basilisk environments, which also manifest on the Bioc macOS builders (see orthos, both release and devel). The error is as follows:

pybind11::handle::dec_ref() is being called while the GIL is either not held or invalid. 
Please see https://pybind11.readthedocs.io/en/stable/advanced/misc.html#common-sources-of-global-interpreter-lock-errors for debugging advice.
If you are convinced there is no bug in your code, you can #define PYBIND11_NO_ASSERT_GIL_HELD_INCREF_DECREF
to disable this check. In that case you have to ensure this #define is consistently used for all translation units linked into 
a given pybind11 extension, otherwise there will be ODR violations. The failing pybind11::handle::dec_ref() call was triggered 
on a type object.
libc++abi: terminating due to uncaught exception of type std::runtime_error: pybind11::handle::dec_ref() PyGILState_Check() failure.

We can reproduce this locally (and on GitHub Actions), and it is triggered when the R session is closed, after executing the basilisk-based code (this can be the interactive session, or the session where R CMD check renders the vignette; we also see it in Rmd files where running rmarkdown::render does not trigger the error while clicking the Knit button does). Not all environments are affected (of our packages, only the orthos environment seems to suffer, but it's not due to orthos itself, see minimal example below).

More details and reproducible example: To trigger the error, we ran the following code in an interactive R session:

library(basilisk)

setupBasiliskEnv(
  envpath = file.path(tempdir(), "my-basilisk-env"), 
  packages = c("python==3.12.9", "tensorflow==2.18.0")
)
res <- basilisk::basiliskRun(
  env = file.path(tempdir(), "my-basilisk-env"), 
  fun = function() {
    tensorflow::tf$version$VERSION
  })

The code executes fine, but when the R session is closed (normally, with q()), the error above is generated.

Session info (in particular, `reticulate 1.42.0`)
R version 4.5.0 RC (2025-04-04 r88112)
Platform: aarch64-apple-darwin20
Running under: macOS Sequoia 15.3.2

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1

locale:
[1] C

time zone: Europe/Zurich
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] basilisk_1.19.3   reticulate_1.42.0

loaded via a namespace (and not attached):
 [1] base64enc_0.1-3       Matrix_1.7-3          lattice_0.22-6
 [4] dir.expiry_1.15.0     magrittr_2.0.3        filelock_1.0.3
 [7] parallel_4.5.0        png_0.1-8             lifecycle_1.0.4
[10] cli_3.6.4             tensorflow_2.16.0     grid_4.5.0
[13] withr_3.0.2           tfruns_1.5.3          compiler_4.5.0
[16] tools_4.5.0           whisker_0.4.1         basilisk.utils_1.19.1
[19] Rcpp_1.0.14           rlang_1.1.5           jsonlite_2.0.0

Attempting to achieve the same result directly with reticulate works fine and does not generate the error upon closing the session:

library(reticulate)
py_require(packages = "tensorflow==2.18.0", 
           python_version = "3.12.9")
py_require()
tf <- import("tensorflow")
tf$version$VERSION

Downgrading reticulate to version 1.41.0.1 also solves the problem; the basilisk code above runs through and no errors are generated upon closing the R session.

We don't understand exactly where the error is happening, but decided to post it here first since we can't trigger it with reticulate directly, to hear if anyone else has seen similar issues and/or have an idea of the root cause or if there is something we can do on our end to address it.

Thanks in advance!

cc @mbstadler

csoneson avatar Apr 08 '25 15:04 csoneson

One more observation: enabling the python finalizer with reticulate v1.42.0 also solves the problem:

Sys.setenv("RETICULATE_ENABLE_PYTHON_FINALIZER" = "yes")

csoneson avatar Apr 08 '25 16:04 csoneson

Sounds like a reticulate issue with Conda. Perhaps try creating a reproducible example with a basilisk (i.e., conda) environment instead of using py_require.

LTLA avatar Apr 09 '25 08:04 LTLA

That seems very possible, we were able to reproduce it with reticulate only:

library(reticulate)
envpath <- file.path(tempdir(), "pybind_issue_env")
conda_create(
    envname = envpath,
    packages = c("python==3.12.9", "tensorflow==2.18.0")
)
use_condaenv(condaenv = envpath)
tf <- import("tensorflow")
tf$version$VERSION

After running this in R and closing the session, the error is triggered. Again, adding the following before use_condaenv suppresses the error:

Sys.setenv("RETICULATE_ENABLE_PYTHON_FINALIZER" = "yes")

This provides a possible workaround for packages using basilisk and hitting this error, however the use of Sys.setenv will cause an error in BiocCheck.

It seems that reticulate chose to expose the issue of R packages maintaining references to python objects by default, but provides the above environment variable to call the "finalizer" and thus suppress the error. Should this be incorporated in basilisk in one way or another, or would you suggest we raise the issue with the reticulate team?

mbstadler avatar Apr 09 '25 15:04 mbstadler

Yes, a reticulate solution would be preferable, and probably appropriate given the severity of the failure and the fact that no errors were thrown in previous versions.

LTLA avatar Apr 10 '25 10:04 LTLA

I have opened an issue with reticulate: https://github.com/rstudio/reticulate/issues/1785

mbstadler avatar Apr 10 '25 12:04 mbstadler

This has been fixed in reticulate version 1.42.0.9000 (https://github.com/rstudio/reticulate/pull/1786/files).

mbstadler avatar Apr 11 '25 05:04 mbstadler

Closing as this has been fixed in reticulate

csoneson avatar Jun 09 '25 07:06 csoneson