Memory leak

Open jmontoyam opened this issue 5 years ago • 8 comments

Hello,

first of all, thank you very much for this amazing project! ;) I think I have detected a possible memory leak. In my use case, one function generates a big xarray DataArray, and another function takes that DataArray as input and generates an image from it using hvplot.image(rasterize=True). I use JupyterLab to interact with these functions and visualize the generated image. I have noticed that every time I re-execute the code cell containing both functions, memory usage keeps growing. Please see the memory usage shown in the attached images (top right corner):

First execution of the code cell: execution1

Second execution of the code cell: execution2

Third execution of the code cell: execution3

I tried the suggestion given by @philippjfr in holoviz/holoviews#1821 (%reset out) but it did not work.

Expected behavior: I expected memory usage to stay constant between re-executions of the same code cell (the memory occupied by the data generated in the previous execution is supposed to be garbage collected). Am I right, or am I missing something?
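
One way to check whether memory really grows between runs, independent of the JupyterLab indicator, is to record traced memory after each repeated call. The following is a minimal sketch using only the standard library; `traced_growth`, `leaky_cell` and `clean_cell` are hypothetical names standing in for the notebook cell, not part of hvplot. Note that `tracemalloc` only sees allocations made through Python's memory allocators, so memory held by native code may not show up here.

```python
import gc
import tracemalloc


def traced_growth(func, repeats=3):
    """Call func() repeatedly; return traced memory (bytes) after each call."""
    tracemalloc.start()
    try:
        readings = []
        for _ in range(repeats):
            func()        # result is discarded, as when a cell is re-run
            gc.collect()  # give the collector a chance to free reference cycles
            current, _peak = tracemalloc.get_traced_memory()
            readings.append(current)
        return readings
    finally:
        tracemalloc.stop()


# Hypothetical stand-ins for the notebook cell (assumptions for illustration).
_kept = []


def leaky_cell():
    _kept.append(bytearray(1_000_000))  # a reference survives every run


def clean_cell():
    bytearray(1_000_000)  # nothing survives the call
```

If the readings for the real cell climb by roughly the same amount on every call, as they would for `leaky_cell`, something is still holding a reference to the previous result.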

Software info:

hvplot: 0.6.0
holoviews: 1.13.3
datashader: 0.11.0
bokeh: 2.1.1

jupyter core : 4.6.3
jupyter-notebook : 6.0.3
qtconsole : 4.7.5
ipython : 7.16.1
ipykernel : 5.3.0
jupyter client : 6.1.3
jupyter lab : 2.1.5
nbconvert : 5.6.1
ipywidgets : 7.5.1
nbformat : 5.0.7
traitlets : 4.3.3

JupyterLab v2.1.5
Known labextensions:
    @bokeh/jupyter_bokeh v2.0.2 enabled OK
    @jupyter-widgets/jupyterlab-manager v2.0.0 enabled OK
    @pyviz/jupyterlab_pyviz v1.0.4 enabled OK

Google Chrome Version 84.0.4147.125 (Official Build) (64-bit)

OS Ubuntu 18.04.4 LTS

Thank you very much for all your help! ;)

jmontoyam avatar Aug 14 '20 14:08 jmontoyam

Same with df[df.columns[0]].hvplot(datashade=True): memory increases on every execution (hvplot=0.7.3). I cannot use %reset out (it causes an exception) because I am following https://docs.ray.io/en/latest/using-ray-with-jupyter.html?highlight=jupyter.

conda's env.yaml:

name: test
channels:
  - pyviz
  - conda-forge
  - defaults
dependencies:
  - _libgcc_mutex=0.1=conda_forge
  - _openmp_mutex=4.5=1_gnu
  - abseil-cpp=20210324.2=h9c3ff4c_0
  - alembic=1.7.3=pyhd8ed1ab_0
  - alsa-lib=1.2.3=h516909a_0
  - anyio=3.3.0=py37h89c1867_0
  - argcomplete=1.12.3=pyhd8ed1ab_2
  - argon2-cffi=20.1.0=py37h5e8e339_2
  - arrow-cpp=5.0.0=py37hdf48254_5_cpu
  - async_generator=1.10=py_0
  - attrs=21.2.0=pyhd8ed1ab_0
  - autopage=0.4.0=pyhd8ed1ab_0
  - aws-c-cal=0.5.11=h95a6274_0
  - aws-c-common=0.6.2=h7f98852_0
  - aws-c-event-stream=0.2.7=h3541f99_13
  - aws-c-io=0.10.5=hfb6a706_0
  - aws-checksums=0.1.11=ha31a3da_7
  - aws-sdk-cpp=1.8.186=hb4091e7_3
  - babel=2.9.1=pyh44b312d_0
  - backcall=0.2.0=pyh9f0ad1d_0
  - backports=1.0=py_2
  - backports.functools_lru_cache=1.6.4=pyhd8ed1ab_0
  - backports.zoneinfo=0.2.1=py37h5e8e339_4
  - bleach=4.1.0=pyhd8ed1ab_0
  - bokeh=2.3.3=py37h89c1867_0
  - brotlipy=0.7.0=py37h5e8e339_1001
  - bzip2=1.0.8=h7f98852_4
  - c-ares=1.17.2=h7f98852_0
  - ca-certificates=2021.5.30=ha878542_0
  - certifi=2021.5.30=py37h89c1867_0
  - cffi=1.14.6=py37hc58025e_0
  - chardet=4.0.0=py37h89c1867_1
  - charset-normalizer=2.0.0=pyhd8ed1ab_0
  - click=8.0.1=py37h89c1867_0
  - clickhouse-cityhash=1.0.2.3=py37h3340039_2
  - clickhouse-driver=0.2.1=py37h5e8e339_0
  - cliff=3.9.0=pyhd8ed1ab_0
  - cloudpickle=2.0.0=pyhd8ed1ab_0
  - cmaes=0.8.2=pyh44b312d_0
  - cmd2=2.2.0=py37h89c1867_0
  - colorama=0.4.4=pyh9f0ad1d_0
  - colorcet=2.0.6=pyhd8ed1ab_0
  - colorlog=6.4.1=py37h89c1867_0
  - conda=4.10.3=py37h89c1867_1
  - conda-package-handling=1.7.3=py37h5e8e339_0
  - cramjam=2.3.1=py37h5e8e339_1
  - cryptography=3.4.7=py37h5d9358c_0
  - cycler=0.10.0=py_2
  - cytoolz=0.11.0=py37h5e8e339_3
  - dask=2021.9.0=pyhd8ed1ab_0
  - dask-core=2021.9.0=pyhd8ed1ab_0
  - datashader=0.13.0=pyh6c4a22f_0
  - datashape=0.5.4=py_1
  - dbus=1.13.6=h48d8840_2
  - debugpy=1.4.1=py37hcd2ae1e_0
  - decorator=5.1.0=pyhd8ed1ab_0
  - defusedxml=0.7.1=pyhd8ed1ab_0
  - distributed=2021.9.0=py37h89c1867_0
  - entrypoints=0.3=py37hc8dfbb8_1002
  - expat=2.4.1=h9c3ff4c_0
  - fastparquet=0.7.1=py37hb1e94ed_0
  - filelock=3.0.12=pyh9f0ad1d_0
  - fontconfig=2.13.1=hba837de_1005
  - freetype=2.10.4=h0708190_1
  - fsspec=2021.8.1=pyhd8ed1ab_0
  - gettext=0.19.8.1=h0b5b191_1005
  - gflags=2.2.2=he1b5a44_1004
  - gitdb=4.0.7=pyhd8ed1ab_0
  - gitpython=3.1.23=pyhd8ed1ab_1
  - glib=2.68.4=h9c3ff4c_0
  - glib-tools=2.68.4=h9c3ff4c_0
  - glog=0.5.0=h48cff8f_0
  - greenlet=1.1.1=py37hcd2ae1e_0
  - grpc-cpp=1.40.0=h850795e_0
  - gst-plugins-base=1.18.5=hf529b03_0
  - gstreamer=1.18.5=h76c114f_0
  - heapdict=1.0.1=py_0
  - holoviews=1.14.5=py_0
  - hvplot=0.7.3=py_0
  - icu=68.1=h58526e2_0
  - idna=3.1=pyhd3deb0d_0
  - importlib-metadata=4.8.1=py37h89c1867_0
  - importlib_metadata=4.8.1=hd8ed1ab_0
  - importlib_resources=5.2.2=pyhd8ed1ab_0
  - ipykernel=6.4.1=py37h6531663_0
  - ipympl=0.7.0=pyhd8ed1ab_0
  - ipython=7.27.0=py37h6531663_0
  - ipython_genutils=0.2.0=py_1
  - ipywidgets=7.6.5=pyhd8ed1ab_0
  - jbig=2.1=h7f98852_2003
  - jedi=0.18.0=py37h89c1867_2
  - jinja2=3.0.1=pyhd8ed1ab_0
  - joblib=1.0.1=pyhd8ed1ab_0
  - jpeg=9d=h36c2ea0_0
  - json5=0.9.5=pyh9f0ad1d_0
  - jsonschema=3.2.0=py37hc8dfbb8_1
  - jupyter-server-mathjax=0.2.3=pyhd8ed1ab_0
  - jupyter_client=7.0.2=pyhd8ed1ab_0
  - jupyter_contrib_core=0.3.3=py_2
  - jupyter_contrib_nbextensions=0.5.1=py37hc8dfbb8_1
  - jupyter_core=4.7.1=py37h89c1867_0
  - jupyter_highlight_selected_word=0.2.0=py37h89c1867_1002
  - jupyter_latex_envs=1.4.6=py37h89c1867_1001
  - jupyter_nbextensions_configurator=0.4.1=py37h89c1867_2
  - jupyter_server=1.11.0=pyhd8ed1ab_0
  - jupyterlab=3.1.11=pyhd8ed1ab_0
  - jupyterlab-git=0.32.2=pyhd8ed1ab_0
  - jupyterlab_pygments=0.1.2=pyh9f0ad1d_0
  - jupyterlab_server=2.8.1=pyhd8ed1ab_0
  - jupyterlab_widgets=1.0.2=pyhd8ed1ab_0
  - kiwisolver=1.3.2=py37h2527ec5_0
  - krb5=1.19.2=hcc1bbae_0
  - lcms2=2.12=hddcbb42_0
  - ld_impl_linux-64=2.36.1=hea4e1c9_2
  - lerc=2.2.1=h9c3ff4c_0
  - libarchive=3.5.2=hccf745f_0
  - libblas=3.9.0=11_linux64_openblas
  - libbrotlicommon=1.0.9=h7f98852_5
  - libbrotlidec=1.0.9=h7f98852_5
  - libbrotlienc=1.0.9=h7f98852_5
  - libcblas=3.9.0=11_linux64_openblas
  - libclang=11.1.0=default_ha53f305_1
  - libcurl=7.78.0=h2574ce0_0
  - libdeflate=1.7=h7f98852_5
  - libedit=3.1.20191231=he28a2e2_2
  - libev=4.33=h516909a_1
  - libevent=2.1.10=hcdb4288_3
  - libffi=3.3=h58526e2_2
  - libgcc-ng=11.1.0=hc902ee8_8
  - libgfortran-ng=11.1.0=h69a702a_8
  - libgfortran5=11.1.0=h6c583b3_8
  - libglib=2.68.4=h3e27bee_0
  - libgomp=11.1.0=hc902ee8_8
  - libiconv=1.16=h516909a_0
  - liblapack=3.9.0=11_linux64_openblas
  - libllvm11=11.1.0=hf817b99_2
  - libnghttp2=1.43.0=h812cca2_0
  - libogg=1.3.4=h7f98852_1
  - libopenblas=0.3.17=pthreads_h8fe5266_1
  - libopus=1.3.1=h7f98852_1
  - libpng=1.6.37=h21135ba_2
  - libpq=13.3=hd57d9b9_0
  - libprotobuf=3.16.0=h780b84a_0
  - libsodium=1.0.18=h36c2ea0_1
  - libsolv=0.7.19=h780b84a_5
  - libssh2=1.10.0=ha56f1ee_0
  - libstdcxx-ng=11.1.0=h56837e0_8
  - libta-lib=0.4.0=h516909a_0
  - libthrift=0.14.2=he6d91bd_1
  - libtiff=4.3.0=hf544144_1
  - libutf8proc=2.6.1=h7f98852_0
  - libuuid=2.32.1=h7f98852_1000
  - libuv=1.42.0=h7f98852_0
  - libvorbis=1.3.7=h9c3ff4c_0
  - libwebp-base=1.2.1=h7f98852_0
  - libxcb=1.13=h7f98852_1003
  - libxkbcommon=1.0.3=he3ba5ed_0
  - libxml2=2.9.12=h72842e0_0
  - libxslt=1.1.33=h15afd5d_2
  - llvmlite=0.37.0=py37h9d7f4d0_0
  - locket=0.2.0=py_2
  - lxml=4.6.3=py37h77fd288_0
  - lz4-c=1.9.3=h9c3ff4c_1
  - lzo=2.10=h516909a_1000
  - mako=1.1.5=pyhd8ed1ab_0
  - mamba=0.15.3=py37h7f483ca_0
  - markdown=3.3.4=pyhd8ed1ab_0
  - markupsafe=2.0.1=py37h5e8e339_0
  - matplotlib=3.4.3=py37h89c1867_0
  - matplotlib-base=3.4.3=py37h1058ff1_0
  - matplotlib-inline=0.1.3=pyhd8ed1ab_0
  - mistune=0.8.4=py37h5e8e339_1004
  - modin-core=0.10.2=py37h89c1867_3
  - modin-ray=0.10.2=py37h89c1867_3
  - msgpack-python=1.0.2=py37h2527ec5_1
  - multipledispatch=0.6.0=py_0
  - mysql-common=8.0.25=ha770c72_2
  - mysql-libs=8.0.25=hfa10184_2
  - nb_conda_kernels=2.3.1=py37h89c1867_0
  - nbclassic=0.3.1=pyhd8ed1ab_1
  - nbclient=0.5.4=pyhd8ed1ab_0
  - nbconvert=6.1.0=py37h89c1867_0
  - nbdime=3.1.0=pyhd8ed1ab_0
  - nbformat=5.1.3=pyhd8ed1ab_0
  - ncurses=6.2=h58526e2_4
  - nest-asyncio=1.5.1=pyhd8ed1ab_0
  - notebook=6.4.3=pyha770c72_0
  - nspr=4.30=h9c3ff4c_0
  - nss=3.69=hb5efdd6_0
  - numba=0.54.0=py37h2d894fd_0
  - numpy=1.20.3=py37h038b26d_1
  - olefile=0.46=pyh9f0ad1d_1
  - openjpeg=2.4.0=hb52868f_1
  - openssl=1.1.1l=h7f98852_0
  - optuna=2.9.1=pyhd8ed1ab_0
  - orc=1.6.10=h58a87f1_0
  - packaging=21.0=pyhd8ed1ab_0
  - pandas=1.3.2=py37he8f5f7f_0
  - pandoc=2.14.2=h7f98852_0
  - pandocfilters=1.4.2=py_1
  - panel=0.12.1=py_0
  - param=1.11.1=pyh6c4a22f_0
  - parquet-cpp=1.5.1=1
  - parso=0.8.2=pyhd8ed1ab_0
  - partd=1.2.0=pyhd8ed1ab_0
  - patsy=0.5.2=pyhd8ed1ab_0
  - pbr=5.6.0=pyhd8ed1ab_0
  - pcre=8.45=h9c3ff4c_0
  - pexpect=4.8.0=py37hc8dfbb8_1
  - pickle5=0.0.11=py37h5e8e339_0
  - pickleshare=0.7.5=py37hc8dfbb8_1002
  - pillow=8.3.2=py37h0f21c89_0
  - pip=21.2.4=pyhd8ed1ab_0
  - prettytable=2.2.0=pyhd8ed1ab_0
  - prometheus_client=0.11.0=pyhd8ed1ab_0
  - prompt-toolkit=3.0.20=pyha770c72_0
  - psutil=5.8.0=py37h5e8e339_1
  - pthread-stubs=0.4=h36c2ea0_1001
  - ptyprocess=0.7.0=pyhd3deb0d_0
  - pyarrow=5.0.0=py37h58331f5_5_cpu
  - pycosat=0.6.3=py37h5e8e339_1006
  - pycparser=2.20=pyh9f0ad1d_2
  - pyct=0.4.6=py_0
  - pyct-core=0.4.6=py_0
  - pygments=2.10.0=pyhd8ed1ab_0
  - pykalman=0.9.5=py_1
  - pyopenssl=20.0.1=pyhd8ed1ab_0
  - pyparsing=2.4.7=pyh9f0ad1d_0
  - pyperclip=1.8.2=pyhd8ed1ab_2
  - pyqt=5.12.3=py37h89c1867_7
  - pyqt-impl=5.12.3=py37he336c9b_7
  - pyqt5-sip=4.19.18=py37hcd2ae1e_7
  - pyqtchart=5.12=py37he336c9b_7
  - pyqtwebengine=5.12.1=py37he336c9b_7
  - pyrsistent=0.17.3=py37h5e8e339_2
  - pysocks=1.7.1=py37h89c1867_3
  - python=3.7.10=hffdb5ce_100_cpython
  - python-dateutil=2.8.2=pyhd8ed1ab_0
  - python_abi=3.7=2_cp37m
  - pytz=2021.1=pyhd8ed1ab_0
  - pyviz_comms=2.1.0=py_0
  - pyyaml=5.4.1=py37h5e8e339_1
  - pyzmq=22.2.1=py37h336d617_0
  - qt=5.12.9=hda022c4_4
  - ray-core=1.6.0=py37hf931bba_0
  - re2=2021.09.01=h9c3ff4c_0
  - readline=8.1=h46c0cb4_0
  - redis-py=3.5.3=pyh9f0ad1d_0
  - reproc=14.2.3=h7f98852_0
  - reproc-cpp=14.2.3=h9c3ff4c_0
  - requests=2.26.0=pyhd8ed1ab_0
  - requests-unixsocket=0.2.0=py_0
  - ruamel_yaml=0.15.80=py37h5e8e339_1004
  - s2n=1.0.10=h9b69904_0
  - scikit-learn=0.24.2=py37hf0f1638_1
  - send2trash=1.8.0=pyhd8ed1ab_0
  - setproctitle=1.1.10=py37h5e8e339_1004
  - setuptools=58.0.4=py37h89c1867_0
  - six=1.16.0=pyh6c4a22f_0
  - smmap=3.0.5=pyh44b312d_0
  - snappy=1.1.8=he1b5a44_3
  - sniffio=1.2.0=py37h89c1867_1
  - sortedcontainers=2.4.0=pyhd8ed1ab_0
  - sqlalchemy=1.4.25=py37h5e8e339_0
  - sqlite=3.36.0=h9cd32fc_1
  - statsmodels=0.12.2=py37hb1e94ed_0
  - stevedore=3.4.0=py37h89c1867_0
  - ta-lib=0.4.19=py37ha21ca33_2
  - tabulate=0.8.9=pyhd8ed1ab_0
  - tblib=1.7.0=pyhd8ed1ab_0
  - tensorboardx=2.4=pyhd8ed1ab_0
  - terminado=0.12.1=py37h89c1867_0
  - testpath=0.5.0=pyhd8ed1ab_0
  - threadpoolctl=2.2.0=pyh8a188c0_0
  - thrift=0.13.0=py37hcd2ae1e_2
  - tk=8.6.11=h27826a3_1
  - toolz=0.11.1=py_0
  - tornado=6.1=py37h5e8e339_1
  - tqdm=4.62.2=pyhd8ed1ab_0
  - traitlets=5.1.0=pyhd8ed1ab_0
  - typing_extensions=3.10.0.0=pyha770c72_0
  - tzdata=2021a=he74cb21_1
  - tzlocal=3.0=py37h89c1867_2
  - urllib3=1.26.6=pyhd8ed1ab_0
  - wcwidth=0.2.5=pyh9f0ad1d_2
  - webencodings=0.5.1=py_1
  - websocket-client=0.57.0=py37h89c1867_4
  - wheel=0.37.0=pyhd8ed1ab_1
  - widgetsnbextension=3.5.1=py37h89c1867_4
  - xarray=0.19.0=pyhd8ed1ab_1
  - xeus=2.0.0=h7d0c39e_0
  - xeus-python=0.13.0=py37h4b46df4_1
  - xeus-python-shell=0.1.5=pyhd8ed1ab_0
  - xorg-libxau=1.0.9=h7f98852_0
  - xorg-libxdmcp=1.1.3=h7f98852_0
  - xz=5.2.5=h516909a_1
  - yaml=0.2.5=h516909a_0
  - zeromq=4.3.4=h9c3ff4c_1
  - zict=2.0.0=py_0
  - zipp=3.5.0=pyhd8ed1ab_0
  - zlib=1.2.11=h516909a_1010
  - zstandard=0.15.2=py37h5e8e339_0
  - zstd=1.5.0=ha95c52a_0
  - pip:
    - absl-py==0.13.0
    - aiohttp==3.7.4.post0
    - aiohttp-cors==0.7.0
    - aioredis==1.3.1
    - async-timeout==3.0.1
    - autograd==1.3
    - bayesian-optimization==1.2.0
    - blessings==1.7
    - cachetools==4.2.2
    - cma==2.7.0
    - colorful==0.5.4
    - cython==0.29.24
    - future==0.18.2
    - google-api-core==1.31.2
    - google-auth==1.35.0
    - google-auth-oauthlib==0.4.6
    - googleapis-common-protos==1.53.0
    - gpustat==0.6.0
    - gpy==1.10.0
    - gpytorch==1.5.1
    - grpcio==1.40.0
    - hebo==0.1.0
    - hiredis==2.0.0
    - multidict==5.1.0
    - nevergrad==0.4.3.post8
    - nvidia-ml-py3==7.352.0
    - oauthlib==3.1.1
    - opencensus==0.7.13
    - opencensus-context==0.1.2
    - paramz==0.9.5
    - protobuf==3.17.3
    - py-spy==0.3.9
    - pyasn1==0.4.8
    - pyasn1-modules==0.2.8
    - pymoo==0.4.2.2
    - requests-oauthlib==1.3.0
    - rsa==4.7.2
    - scipy==1.5.4
    - sklearn==0.0
    - tensorboard==2.6.0
    - tensorboard-data-server==0.6.1
    - tensorboard-plugin-wit==1.8.0
    - torch==1.9.1
    - werkzeug==2.0.1
    - yarl==1.6.3

jmakov avatar Oct 01 '21 20:10 jmakov

Bump. I have to restart the kernel all the time in the notebook. Any suggestions very welcome (gc.collect() doesn't change anything).

jmakov avatar Jan 02 '23 23:01 jmakov

Hi @jmakov,

Could you provide:

  • an example for us to attempt to reproduce the memory leak?
  • the output of conda list? (I see that you've done that last year, sorry we missed it!)
  • more details on your system (operating system, etc.)?

Reproducing and tracking down memory leaks is notoriously difficult :)
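
As a starting point for narrowing a leak down, it can help to compare which object types accumulate between two runs. Below is a minimal sketch using only the standard library; `live_type_counts`, `growth_between`, `FakePlot` and `_registry` are hypothetical names for illustration and are not hvplot or HoloViews APIs. Only objects tracked by the garbage collector are visible to `gc.get_objects()`, so this may not reveal every kind of leak.

```python
import gc
from collections import Counter


def live_type_counts():
    """Count objects currently tracked by the garbage collector, by type name."""
    gc.collect()
    return Counter(type(obj).__name__ for obj in gc.get_objects())


def growth_between(before, after, top=10):
    """Return the types whose live-object count increased, largest first."""
    diff = after - before  # Counter subtraction keeps only positive counts
    return diff.most_common(top)


class FakePlot:
    """Hypothetical stand-in for whatever object a plotting call creates."""


_registry = []  # simulates a cache that keeps references alive

before = live_type_counts()
for _ in range(5):
    _registry.append(FakePlot())
after = live_type_counts()

print(growth_between(before, after))
```

Taking one snapshot before and one after re-running the suspect cell, and looking at the types that grew, gives a list of candidates to inspect further, for example with `gc.get_referrers()`.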

maximlt avatar Jan 03 '23 07:01 maximlt

@maximlt thanks for the quick response! I can reproduce this with the minimal example below in JupyterLab:

import pandas
pandas.options.plotting.backend = "holoviews"  # `datashade=True` doesn't work without this line

# I'm gonna eat about 5GB of your memory and won't give it back :)
df = pandas.DataFrame({"col1": range(0, 100_000_000)})
df.plot(datashade=True)

# output includes these warnings
# WARNING:param.datashade: Parameter(s) [line_width] not consumed by any element rasterizer.
# WARNING:param.datashade: Parameter(s) [line_width] not consumed by any element rasterizer.

OS: Ubuntu 22.04.1 LTS
uname info: Linux 5.15.0-56-generic 62-Ubuntu SMP Tue Nov 22 19:54:14 UTC 2022 x86_64 GNU/Linux
conda_list_output.txt

jmakov avatar Jan 03 '23 11:01 jmakov

Would just like to bump this issue a bit since it's almost a blocker for me: imagine a long computation, then trying to plot with different params. Each plot eats memory, and you have to restart the kernel (and wait for the computation again).
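
Until the leak itself is tracked down, one possible way to soften the cost of a kernel restart is to cache the result of the long computation on disk, so it can be reloaded instead of recomputed. This is only a sketch of a workaround and does not address the leak; `load_or_compute` and `expensive` are hypothetical names, and it assumes the result can be pickled. Only unpickle files you created yourself.

```python
import pickle
import tempfile
from pathlib import Path


def load_or_compute(cache_path, compute):
    """Return the pickled result at cache_path, computing and saving it if absent."""
    path = Path(cache_path)
    if path.exists():
        with path.open("rb") as fh:
            return pickle.load(fh)
    result = compute()
    with path.open("wb") as fh:
        pickle.dump(result, fh)
    return result


calls = []  # records how often the computation actually ran


def expensive():
    """Hypothetical stand-in for the long computation."""
    calls.append(1)
    return {"answer": sum(range(1000))}


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "result.pkl"
    first = load_or_compute(cache, expensive)   # computes and writes the file
    second = load_or_compute(cache, expensive)  # reads the file, no recompute
```

In a notebook the cache file would live somewhere persistent rather than in a temporary directory, so that it survives the restart.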

jmakov avatar Jan 16 '23 14:01 jmakov

I'm experiencing the same issue: RAM piles up after every cell execution. Minimal reproducible code:

import pandas as pd
import numpy as np
import hvplot.pandas

df = pd.DataFrame(np.random.rand(1000000,100), columns=[str(i) for i in range(100)])
plots = df.hvplot.hist('0')

installed packages:

GitPython==3.1.29
holoviews==1.15.3
joblib==1.2.0
mat73==0.60
matplotlib==3.6.2
numpy==1.23.0
pandas==1.5.2
PyAutoGUI==0.9.53
pytest==7.2.0
scikit-learn==1.2.1
scipy==1.9.3
seaborn==0.12.1
shapely==2.0.0
sktime==0.15.0
tqdm==4.64.1
tsfel==0.1.4
tsfresh==0.19.0
xgboost==1.7.2
h5py==3.8.0
lightgbm==3.3.4
sklearn==0.0.post1
bokeh==2.4.3

tomerroditi avatar Mar 28 '23 14:03 tomerroditi

What's the status of this after half a year?

jmakov avatar Jan 03 '24 11:01 jmakov

Hi @jmakov, I'll investigate this week and see what I can find. Feel free to investigate too; memory leaks aren't the easiest thing to debug :)

maximlt avatar Jan 03 '24 13:01 maximlt