Niklas K.
`mega-1` uses a little under 12 GiB of VRAM (together with the few hundred MiB for the VQGAN). You can use `mega-1-fp16` instead; it's half the size and almost as good...
[Upstream's example notebook](https://github.com/borisdayma/dalle-mini/blob/main/tools/inference/inference_pipeline.ipynb) (current HEAD master@db1ed25) makes sure to load the parameters onto your GPU(s) only once; in float16, and with XLA's allocator adjusted, total VRAM use (including display) stays...
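The "load once, in float16" part can be sketched with plain JAX. The parameter tree below is a toy stand-in for the real checkpoint (which the notebook loads via `DalleBart.from_pretrained`); the pattern is: cast once, replicate once.

```python
import jax
import jax.numpy as jnp

# Toy stand-in for the model parameters (the real ones come from
# DalleBart.from_pretrained in the upstream notebook).
params = {"dense": {"kernel": jnp.ones((4, 4)), "bias": jnp.zeros((4,))}}

# Cast everything to float16 once...
params = jax.tree_util.tree_map(lambda p: p.astype(jnp.float16), params)

# ...and put one copy on every device up front, so the weights are never
# re-transferred on later calls.
replicated = jax.device_put_replicated(params, jax.devices())
```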
Also keep in mind you'll have to use the same batch size (i.e., number of prompts in the list you pass to `processor()`) and number of GPUs/TPUs (`shard_prng_key()` splits the...
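As a rough sketch of why the device count is baked in: `shard_prng_key()` boils down to one subkey per device, so the split has to match however many devices you later `pmap` over (shown here with plain `jax.random.split`):

```python
import jax

key = jax.random.PRNGKey(0)
# One subkey per device; this shape must match the device count
# the generation step was compiled for.
device_keys = jax.random.split(key, jax.device_count())
```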
> Could you please add a usage example?

I don't have time to write a proper example now, sorry... I'm hoping another developer decides to take care of that.
First off: I found that **making many predictions for few prompts, vectorized across PRNG keys, is _much_ faster** - I reach **1 s/image** in float32 with 30 predictions for one...
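The "many predictions per prompt, vectorized across PRNG keys" idea looks roughly like this toy sketch (the function here is a stand-in for a single sampling pass of the decoder, not the real generation call):

```python
import jax
import jax.numpy as jnp

def sample_one(key):
    # Stand-in for a single sampling pass of the decoder.
    return jax.random.normal(key, (4,))

# 30 independent predictions for one prompt, vectorized over PRNG keys,
# so all 30 run in a single vectorized call.
keys = jax.random.split(jax.random.PRNGKey(0), 30)
samples = jax.vmap(sample_one)(keys)
```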
You don't need to touch the PRNG keys; one per device/batch is enough. For example, this processes two prompts in parallel on one device:

```python
tokenized_prompt = processor(["avocado chair", "the...
```
On my system (**Windows**, GeForce RTX 3090, CUDA 11.3, cuDNN 8.4.0, JAX built from source at d43cb36dae), measuring minimal fp32 `mega-1` VRAM consumption with `XLA_PYTHON_CLIENT_ALLOCATOR=platform`: (note, `XLA_PYTHON_CLIENT_ALLOCATOR=platform` can slow things...
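To reproduce a measurement like this, the allocator override has to be in the environment before Python starts; a minimal sketch (the Windows `cmd` equivalent is in the comment):

```shell
# Make XLA allocate on demand instead of pre-allocating a large pool --
# slower, but gives a true minimal-VRAM reading.
export XLA_PYTHON_CLIENT_ALLOCATOR=platform
# On Windows cmd: set XLA_PYTHON_CLIENT_ALLOCATOR=platform
```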
In principle you can make multiplayer mods for *any* PC game. But it's a huge undertaking - designing a multiplayer mode and developing the servers and netcode are enough work...
Use When=PlugIn instead of When=Early. Note that "PlugIn" is case-sensitive: "Plugin" won't work.
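In context, the relevant line would look like this (the surrounding file and section are whatever your existing config already uses):

```ini
When=PlugIn
```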
I frequently see crashes with this error message in JAX, another XLA client, during CPU graph compilation on Windows 😦 @aliencaocao, did your code reliably trigger the problem for you...