Niklas K.
`mega-1` uses a little under 12 GiB of VRAM (together with the few hundred MiB for the VQGAN). You can use `mega-1-fp16` instead; it's half the size and almost as good...
[Upstream's example notebook](https://github.com/borisdayma/dalle-mini/blob/main/tools/inference/inference_pipeline.ipynb) (current HEAD master@db1ed25) makes sure to load the parameters onto your GPU(s) only once; in float16, and with XLA's allocator adjusted, total VRAM use (including display) stays...
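The "load once, in float16" part can be sketched with plain JAX. The parameter tree below is a toy stand-in for the real checkpoint (which the notebook loads via `DalleBart.from_pretrained`); the pattern is: cast once, replicate once.

```python
import jax
import jax.numpy as jnp

# Toy stand-in for the model parameters (the real ones come from
# DalleBart.from_pretrained in the upstream notebook).
params = {"dense": {"kernel": jnp.ones((4, 4)), "bias": jnp.zeros((4,))}}

# Cast everything to float16 once...
params = jax.tree_util.tree_map(lambda p: p.astype(jnp.float16), params)

# ...and put one copy on every device up front, so the weights are never
# re-transferred on later calls.
replicated = jax.device_put_replicated(params, jax.devices())
```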
Also keep in mind you'll have to use the same batch size (i.e., number of prompts in the list you pass to `processor()`) and number of GPUs/TPUs (`shard_prng_key()` splits the...
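As a rough sketch of why the device count is baked in: `shard_prng_key()` boils down to one subkey per device, so the split has to match however many devices you later `pmap` over (shown here with plain `jax.random.split`):

```python
import jax

key = jax.random.PRNGKey(0)
# One subkey per device; this shape must match the device count
# the generation step was compiled for.
device_keys = jax.random.split(key, jax.device_count())
```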
> Could you please add a usage example?

I don't have time to write a proper example now, sorry... I'm hoping another developer decides to take care of that.
First off: I found that **making many predictions for few prompts, vectorized across PRNG keys, is _much_ faster** - I reach **1 s/image** in float32 with 30 predictions for one...
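The "many predictions per prompt, vectorized across PRNG keys" idea looks roughly like this toy sketch (the function here is a stand-in for a single sampling pass of the decoder, not the real generation call):

```python
import jax
import jax.numpy as jnp

def sample_one(key):
    # Stand-in for a single sampling pass of the decoder.
    return jax.random.normal(key, (4,))

# 30 independent predictions for one prompt, vectorized over PRNG keys,
# so all 30 run in a single vectorized call.
keys = jax.random.split(jax.random.PRNGKey(0), 30)
samples = jax.vmap(sample_one)(keys)
```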
You don't need to touch the PRNG keys; one per device/batch is enough. For example, this processes two prompts in parallel on one device:

```python
tokenized_prompt = processor(["avocado chair", "the...
```
On my system (**Windows**, GeForce RTX 3090, CUDA 11.3, cuDNN 8.4.0, JAX built from source at d43cb36dae), measuring minimal fp32 `mega-1` VRAM consumption with `XLA_PYTHON_CLIENT_ALLOCATOR=platform`: (note, `XLA_PYTHON_CLIENT_ALLOCATOR=platform` can slow things...
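To reproduce a measurement like this, the allocator override has to be in the environment before Python starts; a minimal sketch (the Windows `cmd` equivalent is in the comment):

```shell
# Make XLA allocate on demand instead of pre-allocating a large pool --
# slower, but gives a true minimal-VRAM reading.
export XLA_PYTHON_CLIENT_ALLOCATOR=platform
# On Windows cmd: set XLA_PYTHON_CLIENT_ALLOCATOR=platform
```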
In principle you can make multiplayer mods for *any* PC game. But it's a huge undertaking - designing a multiplayer mode and developing the servers and netcode are enough work...
Use When=PlugIn instead of When=Early. Note that "PlugIn" is case-sensitive: "Plugin" won't work.
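In context, the relevant line would look like this (the surrounding file and section are whatever your existing config already uses):

```ini
When=PlugIn
```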
I frequently see crashes with this error message in JAX, another XLA client, during CPU graph compilation on Windows 😦 @aliencaocao, did your code reliably trigger the problem for you...