Configure the kernel to have packages automatically installed
Problem
It is currently not possible to automatically install Python packages inside the pyolite kernel.
Proposed Solution
It would be nice to be able to configure Pyolite to include some packages that would automatically be installed with micropip.
cc. @psychemedia who suggested it here https://github.com/jtpio/jupyterlite-demo/pull/7#issuecomment-870144941
Additional context
It might be possible to have it once we use IPython for the execution (see related work https://github.com/jtpio/jupyterlite/pull/171), as it's possible to configure IPython itself to run code at the shell start.
I feel we need to figure out some way to get these hosted with the build, get the service worker going, use dat/ipfs, etc. before installing more stuff magically. Spamming pypi with requests without any kind of caching isn't very nice... They probably weren't signing up to be a static cdn for web page assets.
Maybe one way could be to make that part that @psychemedia linked to in https://github.com/jtpio/jupyterlite-demo/pull/7#issuecomment-870144941 configurable at build time:
https://github.com/jtpio/jupyterlite/blob/5225266445d65c05cc2f545fceb1fa9f44380709/packages/pyolite-kernel/src/worker.ts#L20-L25
To download the wheels only once and place them at the right location.
But that would be specific to the pyolite kernel, and the jupyterlite toolchain should ideally be kernel agnostic.
I haven't dug enough into micropip, but presumably it can accept additional (preferred) index pages... given a list of wheels, that should be easy enough to add. What we don't want to do is end up having the maintenance burden of little PRs for every little patch/build someone needs.
At any rate, wheels will get special treatment in the toolchain anyway as they are the most reliable source of labextensions, as we can predict precisely where the assets will be inside them... until we have a wasm-32 target from conda-forge, which would probably be the most desirable way for all non-basically-pure JS kernels. Further, there are a number of wrapper kernels that reuse ipykernel, etc. which would be able to use wheel-based mechanisms.
One of the things I hacked to get the folium demo working was a prebuilt universal wheel for one of the packages served from a Github page: https://github.com/OpenComputingLab/vce-wheelhouse
import micropip
await micropip.install("https://opencomputinglab.github.io/vce-wheelhouse/wheelhouse/MarkupSafe-2.0.1-py2.py3-none-any.whl")
It's probably a good idea to separate out the ideas of installing from pypi, from a JupyterLite wheelhouse, or from a local wheel.
In building an http servable distribution, one approach might be:
- for the CLI builder to grab an appropriate wheel and bundle it with the distribution;
- for micropip to then install it from the local wheel.
If micropip can only install from a URL, then the distribution could publish its own wheelhouse from which the wheel could then be installed. Of course, other folk might then start using your website as a wheelhouse...
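A minimal sketch of that separation between a bundled wheelhouse and PyPI. The wheelhouse path and wheel filenames below are hypothetical; in the kernel, the resolved string would then be passed to micropip.install(...):

```python
# Sketch: resolve a package spec against a local wheelhouse first,
# falling back to the bare name so micropip resolves it from PyPI.
WHEELHOUSE = "./pypi"  # hypothetical path bundled with the distribution

# hypothetical mapping, built at build time from the bundled wheels
LOCAL_WHEELS = {
    "markupsafe": "MarkupSafe-2.0.1-py2.py3-none-any.whl",
}

def resolve(spec: str) -> str:
    """Return what to hand to micropip.install: a bundled wheel URL
    if we have one, otherwise the bare name (PyPI fallback)."""
    wheel = LOCAL_WHEELS.get(spec.lower())
    return f"{WHEELHOUSE}/{wheel}" if wheel else spec

print(resolve("MarkupSafe"))  # ./pypi/MarkupSafe-2.0.1-py2.py3-none-any.whl
print(resolve("numpy"))       # numpy (falls back to PyPI)
```

Installing from the bundled wheel avoids re-fetching from PyPI, while unknown names keep working through the normal micropip path.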
Spamming pypi with requests without any kind of caching isn't very nice... They probably weren't signing up to be a static cdn for web page assets.
Installs with micropip should be cached by the browser, the same as when downloading any other Pyodide package.

Also,
$ dig files.pythonhosted.org
...
files.pythonhosted.org. 67618 IN CNAME dualstack.r.ssl.global.fastly.net.
so the requests are being received by Fastly, which is really not that different from mirroring those files via some other CDN (including jsDelivr, which will also end up using Fastly). Compared to the overall PyPI traffic I think this is still negligible, but we can certainly ask them this question.
Generally, if you have questions or feature requests about micropip, please open an issue in the pyodide repo.
well, I ran a test with pyolite in vscode, and also question the need to load all of those libraries for simple notebooks that don't use most of them: https://github.com/joyceerhl/vscode-pyolite/issues/2#issuecomment-880135835

this is what's being loaded for a simple notebook with 2 imports to load json data: https://github.com/RandomFractals/vscode-data-table/blob/main/notebooks/chicago-red-light-cameras.ipynb
import json
from js import fetch
response = await fetch('https://data.cityofchicago.org/resource/thvf-6diy.json')
text = await response.text()
data = json.loads(text)
data
Why do we need to load matplotlib and other libraries pyodide has configured by default?
I think they should be loaded dynamically based on imports in the notebook cell code. It's exactly what ObservableHQ and other intelligent JS notebook platforms do, since dynamic module loading is readily available in most web browsers now.
Have you considered optimizing your kernel initialization to include just the basics, without adding all the other dataviz packages most devs and data scientists might not be using in their pyodide notebooks?
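The import-driven loading suggested above is roughly what Pyodide's own find_imports helper enables (pyodide.find_imports in older versions, pyodide.code.find_imports later). It is essentially a small walk over the stdlib ast module, mimicked here so the sketch runs anywhere:

```python
# Sketch: collect top-level module names imported by a cell, so the
# kernel could load only the packages a notebook actually uses.
import ast

def find_imports(code: str) -> list:
    """Return the sorted top-level module names imported by `code`."""
    names = set()
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            names.add(node.module.split(".")[0])
    return sorted(names)

cell = "import json\nfrom js import fetch\n"
print(find_imports(cell))  # ['js', 'json']
# In the kernel, each name would then be checked against an
# import-name -> distribution mapping before installing anything.
```

For the simple JSON-fetching notebook above, this would find only json and js, neither of which needs matplotlib or the other default packages.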
@RandomFractals for some libraries it's a bit trickier as they require patches to be rendered properly in output cells in JupyterLite.
Have you considered optimizing your kernel initialization to include just the basics, without adding all the other dataviz packages most devs and data scientists might not be using in their pyodide notebooks?
Yes, see jupyterlite/jupyterlite#239 as an example. Unfortunately there were some issues with the recursion limit in upstream Pyodide, so the change had to be reverted. But hopefully will come back at some point: jupyterlite/jupyterlite#254
ain't nothing free... there isn't a 1-1 mapping of imported names to installable python packages, and the need for patching is very real.
but we certainly need some better approaches for handling these things: there's a sketch of a path forward over on https://github.com/jupyterlite/jupyterlite/issues/151#issuecomment-879059588 and i'll get around to it when i can, but there's also a lot of other stuff on our plates.... and host environments other than real browsers are going to be best-effort for the foreseeable.
@bollwyvl understood. I was just sharing my first impression after trying it in vscode. I will try to find some time soon to provide good data viz examples and notebooks for a deeper dive. I think porting some of these real data notebooks could be fun to validate and troubleshoot some of the hooks you are working on: https://github.com/RandomFractals/Chicago-transportation-notebooks
Other than those nitpicks about extraneous js libs loading, I like what you've created so far, and def. plan on using your Py lite stack in the browser and vscode.
I added vscode Pyolite example docs here: https://github.com/RandomFractals/vscode-data-table#pyolite-notebook-example
Does the --piplite-wheels option in the jupyterlite CLI just set up a path (or perhaps cache?) for a wheel for an additional package, but then require the user to still install the package, e.g. from a live notebook?
It would be really handy to have a CLI argument --install-package that could take one or more package names and install them in the JupyterLite distribution (feature creep: this may then also require a --no-deps flag? And soon end up trying to replicate pip? :-()
just set up a path (or perhaps cache?)
As it says in the docs, --piplite-wheels just ensures they are available to be installed in a live kernel, in that they can be located at runtime, have dependencies fulfilled, etc.
still install the package
yep, a notebook would still have to await piplite.install the packages that had been made available, to trigger the downloading, hash checking, decompressing, etc. i don't anticipate doing much else to it until we get it separated into another repo. It was a big enough pain as it is, and i didn't want to bloat it worse than it was.
among my issues with "magic" importing (or even installing) is... where would (inevitable) errors get shown? To this end, there are a few approaches, such as jupyterlab-scenes, which move this into a clearly "ui automation" place rather than "magic".
The other is just straight perceived performance... even if i have to wait around for something after pressing Run, at least i had the option of starting it (or tweaking it) sooner rather than later.
replicate pip?
yeah, nah, that's why it's as raw as it is... there are so many ways to be wrong for different use cases vs "give me a directory of wheels," which pip download --prefer-binary can already do just fine (until the cli changes again). But it's fairly open to be extended, if an addon wanted to demonstrate how different use cases could be met.
I take it the current status is that you can't "inject" wheels that can be loaded with a simple import, like the pre-built pyodide wheels? Adding wheels to /pypi doesn't seem to do what I'd hoped, nor does --piplite-wheels; reading above, it seems that that's to be expected? For Pyodide 0.20, things built in like import boost-histogram or import scipy just work, downloading something and importing. It's not possible to leverage that system, I guess? Even in 0.20, which uses wheels natively?
It's also possible to pre-install in javascript when using pyodide directly, though with that you have to do the download into the browser on page load, rather than only if imported.
When using JupyterLite for teaching or documentation, it would be really helpful to be able to show the actual command, and not add extra pyodide-only or JupyterLite-only commands that will confuse readers and not work anywhere else.
Also not sure why there's a piplite and a micropip.
With jupyterlite/jupyterlite#655 in, we can now theoretically use IPython profiles to set up pyodide code in a consistent manner, based on deployer, and then user, preference.
We might also be able to somehow lazily mount all of the known custom wheels into the filesystem... but they'd still have to be found/resolved with the index files to do dependencies properly. I'm pretty confident we still don't want to download and install every package before letting the user interact with the kernel.
just work, downloading something and importing
for the specific case of the packages built and shipped by pyodide, there is a hard-coded mapping of imported names to distribution names. this does not work for the general case.
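To make that limitation concrete, here are a few well-known cases where the import name and the distribution name diverge (the mapping below is illustrative, not Pyodide's actual table):

```python
# Illustrative import-name -> distribution-name pairs. Any such table
# is necessarily incomplete, which is why "magic" installation from
# imports cannot work in the general case.
IMPORT_TO_DIST = {
    "sklearn": "scikit-learn",
    "PIL": "Pillow",
    "cv2": "opencv-python",
    "yaml": "PyYAML",
}

def dist_for(import_name):
    """Best-effort lookup; None means the name cannot be resolved
    automatically and would need user input."""
    return IMPORT_TO_DIST.get(import_name)

print(dist_for("sklearn"))  # scikit-learn
print(dist_for("somepkg"))  # None
```

There is no registry that makes this mapping total: any distribution may install modules under arbitrary names, so an unknown import is genuinely unresolvable without extra metadata.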
will confuse readers and not work anywhere else.
there's a lot of things that won't work here.
Also not sure why there's a piplite and a micropip.
micropip doesn't know how to reference non-PyPI sources of full chains of wheels with dependencies.
With jupyterlite/jupyterlite#655 in, we can now theoretically use IPython profiles to set up pyodide code in a consistent manner, based on deployer, and then user, preference.
I have tried to execute get_ipython().profile_dir.location in JupyterLite and the result is
'/home/pyodide/.ipython/profile_default'
Maybe it is enough to have an option to place a script into the folder
/home/pyodide/.ipython/profile_default/startup/
Looks like each kernel instance uses an isolated, non-persistent file system for /home/pyodide/. So, any file we want placed inside /home/pyodide/ needs to be copied before each kernel start.
I have found a more straightforward way to configure just the list of pre-installed packages and have proposed it in a pull request (mentioned above).
Thank you @vasiljevic ! I'll have a look.
Note that you can also use xeus-python which allows you to pre-install packages https://xeus-python-kernel.readthedocs.io/en/latest/configuration.html
Note that you can also use xeus-python which allows you to pre-install packages https://xeus-python-kernel.readthedocs.io/en/latest/configuration.html
My organization already uses xeus-python based JupyterLite websites for exercise files (example). We used to suggest MyBinder to our users for a quick look into an exercise file set, but that practice produced half of all MyBinder traffic during a few weeks last spring, so for this school year we have decided to early-adopt JupyterLite as "the quick look solution" for exercise files.
xeus-python does not use Pyodide, but empack. I like the approach, but it is always possible to face some issue that is better handled in Pyodide, or vice versa. So, I need to be ready to use both.
Since our exercise files are not primarily designed for the JupyterLite environment, we can't add JupyterLite-specific code. That is why package pre-installing is so important for our use case.
xeus-python does not use Pyodide, but empack. I like the approach, but it is always possible to face some issue that is better handled in Pyodide, or vice versa. So, I need to be ready to use both.
I'd be curious to know which issues are better handled by Pyodide (and vice versa).
Since our exercise files are not primarily designed for the JupyterLite environment, we can't add JupyterLite-specific code. That is why package pre-installing is so important for our use case.
Makes total sense 👍🏽
Especially in the jupyterlite setting, for the primary use case of lightweight interactive computing, we need to focus on kernels starting quickly and predictably. So I'll still come down on the side of, in lite core, preferring to pursue reducing time-to-interactive editing and, crucially, user-focused error reporting, rather than putting more stuff in our base kernels that makes the time to hand-off slower.
Of course, I feel pre-running code is better handled on the "client" labextension side, with the equivalent of old school IPython.kernel.execute if it's needed.
If this config ended up in a custom kernelspec, rather than site-wide, that might be more reasonable, as one could have multiple kernels with different names defined in a site with the same underlying implementation, but different packages... but would still want to see the code executed from the "client" side.
I'd be curious to know which issues are better handled by Pyodide (and vice versa).
For instance, in the Issue jupyterlite/jupyterlite#798 there is a statement: Specifically, (Pyodide) 0.21.1 fixed a bug that causes Safari to hang when doing almost anything. It seems to have already been updated, but no JupyterLite release has been made.
Also, some packages I would like to use may be available for Pyodide but not available for empack, or vice versa. Just take a look at the Pyodide changelog page and search for "new packages:". For instance, Pyodide supports shapely and geos from version 0.21.0. I could not find those packages on https://repo.mamba.pm/emscripten-forge.
Indeed. Those packages should be added by https://github.com/emscripten-forge/recipes/pull/131, we should probably give a final push to this PR.
Of course, I feel pre-running code is better handled on the "client" labextension side, with the equivalent of old school IPython.kernel.execute if it's needed.
In the proposed pull request, the list of packages to be pre-installed is handled on the client labextension side; you may take a look at the code in jupyterlite/packages/pyolite-kernel-extension/src/index.ts. The package list is passed to the worker side through the worker initialization, since we need the packages to be installed before the end of worker initialization to avoid race conditions.
We are interested in using JupyterLite at our university to minimize setup for programming and data science novices. We are currently held back by the fact that it doesn't seem possible to create a JupyterLite instance with pre-installed packages. Is it correct that students would need to (re-)install all packages each time they load JupyterLite in the browser, or is there any way to set up JupyterLite with additional packages pre-installed, or a one-time client-side setup of a "virtual environment" (maybe using the browser cache)?
My understanding from the docs is that the user still needs to install packages each time even if pyolite is shipped with additional wheels. We think it would be discouraging for students to sit and wait for installation each time they want to use JupyterLite since we use many packages (scikit-learn, altair, pandas, and similar), but maybe there are other ways to cut down this wait time?
Yep. Every kernel launch is basically building a new linux computer in RAM from first principles. This is basically the case for all WASM applications. There are already many layers of caching, but it's not really feasible to run multiple kernels in the same WASM virtual machine, and at present, there is no particular way to snapshot a full running machine.
The reason we (but mostly I) continue to push against pre-installing packages are the many exciting failure modes that can occur for every line of code that gets run before the user has control of their kernel. Once jupyterlite/jupyterlite#386 is complete, folk will much more easily be able to take matters into their own hands by forking it and building their own kernels that include whatever the heck they like at startup. So at present, the current one-liner %pip install -r requirements.txt is pretty much the least-intrusive approach the current pyodide-based kernel has to offer.
The jupyterlite xeus-python kernel actually unpacks all of the dependencies from conda packages at build time, and then bulk-loads them, but at present has no dynamic installation capability.
@joelostblom You can use the code snippet below to build an extension that pre-runs code:
// Imports assumed for a JupyterLab/JupyterLite frontend extension.
import { JupyterFrontEnd, JupyterFrontEndPlugin } from '@jupyterlab/application';
import { INotebookTracker } from '@jupyterlab/notebook';
import { Kernel, KernelMessage } from '@jupyterlab/services';

// Code to run on every (re)start of a pyolite/python kernel.
const PrerunCodes = [
  'import piplite',
  'await piplite.install("ipywidgets")',
  'await piplite.install("my-own-package")'
];

const preRunPlugin: JupyterFrontEndPlugin<void> = {
  id: 'jupyterlite-prerun-extension:plugin',
  autoStart: true,
  requires: [INotebookTracker],
  activate: async (app: JupyterFrontEnd, nbTracker: INotebookTracker) => {
    nbTracker.currentChanged.connect(() => {
      const panel = nbTracker.currentWidget;
      if (!panel) {
        return;
      }
      let prevSessionStatus: Kernel.Status = 'unknown';
      panel.context.ready.then(() => {
        const sessionContext = panel.sessionContext;
        sessionContext.statusChanged.connect((_sender, status) => {
          // Fire once per (re)start, only for the Python kernels.
          const name = sessionContext.kernelDisplayName.toLocaleLowerCase();
          const justStarted =
            (status === 'restarting' && prevSessionStatus !== 'restarting') ||
            (status === 'starting' && prevSessionStatus !== 'starting');
          if (justStarted && (name === 'pyolite' || name === 'python')) {
            sessionContext.ready.then(() => {
              console.log('Session ready, executing prerun code...');
              const content: KernelMessage.IExecuteRequestMsg['content'] = {
                code: PrerunCodes.join('\n'),
                stop_on_error: true
              };
              sessionContext.session?.kernel?.requestExecute(content, false);
            });
          }
          prevSessionStatus = status;
        });
      });
    });
  }
};
@bollwyvl Ooh.. does %pip magic work now? In which py kernel?
In which py kernel?
In the Pyodide kernel. The examples in the repo have been updated to use the magic if you want to have a look.
Thanks for the detailed info and quick reply @bollwyvl ! It is really helpful and I will try to setup an instance that uses %pip with all our packages to explore how well that approach works. I'm also following the issue you linked, so that was very helpful as well.
@qqdaiyu55 Is my understanding correct that this would still require the installation code to be run each time JupyterLite is loaded on the client side? It just makes it a bit more automatic when it is in the extension? Or does this somehow allow the code to only be installed once, or otherwise speed up the process?
@qqdaiyu55 Is my understanding correct that this would still require the installation code to be run each time JupyterLite is loaded on the client side? It just makes it a bit more automatic when it is in the extension? Or does this somehow allow the code to only be installed once, or otherwise speed up the process?
Yes, this code will still re-install every time you load the website; that's hard to avoid for reasons that @bollwyvl explained above. Browsers will generally cache the downloads cleverly, so they don't happen every time you (re-)load the page (you can check in your console; just make sure not to disable caching when you do). I loaded the packages you say you want to use and it really doesn't take that long to get started with them, so I think that if you can get these to pre-install, or just give students the relevant %pip install .. line to run, they would find it easy and quick to use.