Max Ryabinin

Results: 21 issues by Max Ryabinin

This PR integrates blockwise quantization from [bitsandbytes](https://github.com/facebookresearch/bitsandbytes) as a new compression mechanism for Hivemind. The important part is that it is an *optional* compression protocol: the user only needs to install...

Currently, all interfaces with libp2p gloss over the inner workings of this library, which might not be very helpful for contributors who want to understand the design decisions...

documentation

Right now, if one decides to train with DecentralizedOptimizer using multiple GPUs with something like DistributedDataParallel from PyTorch, they might face an excessive amount of likely redundant network traffic, since...

enhancement

Given that we request only one expert from the server at a time, it might be possible to keep many experts in CPU memory and to process larger batches in...

enhancement

Currently, it's possible to load experts whose UIDs do not match the expected pattern from the checkpoint directory during server startup. We need to validate each expert UID at initialization.

invalid
server
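The validation proposed above could be sketched as follows; this is a minimal pure-Python illustration, and the `"prefix.[low:high]"` pattern syntax and the helper name are assumptions for the example, not Hivemind's actual implementation.

```python
import re

def uid_matches_pattern(uid: str, pattern: str) -> bool:
    """Check that every dot-separated part of the UID fits the pattern part.

    Pattern parts are either literals (e.g. "expert") or half-open numeric
    ranges written as "[low:high]" (an assumed syntax for this sketch).
    """
    uid_parts = uid.split(".")
    pattern_parts = pattern.split(".")
    if len(uid_parts) != len(pattern_parts):
        return False
    for uid_part, pattern_part in zip(uid_parts, pattern_parts):
        range_match = re.fullmatch(r"\[(\d+):(\d+)\]", pattern_part)
        if range_match:  # numeric range, e.g. [0:256]
            low, high = map(int, range_match.groups())
            if not (uid_part.isdigit() and low <= int(uid_part) < high):
                return False
        elif uid_part != pattern_part:  # literal part, e.g. "expert"
            return False
    return True

print(uid_matches_pattern("expert.13", "expert.[0:256]"))   # True
print(uid_matches_pattern("ffn.13", "expert.[0:256]"))      # False
print(uid_matches_pattern("expert.300", "expert.[0:256]"))  # False
```

A server could run such a check over every checkpoint directory name at initialization and refuse to start on a mismatch, instead of silently loading a stray expert.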

Right now, it's tricky to start a Server with a custom expert or to change optimizer/scheduler parameters without modifying the code.

- [ ] Implement hierarchical YAML configuration that...

enhancement
help wanted
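A hierarchical configuration along these lines might look as follows; the key names and structure below are purely illustrative, not an existing Hivemind schema.

```yaml
# Hypothetical server config sketch: each top-level section maps onto one
# component (server, optimizer, scheduler) so parameters can be changed
# without touching the code.
server:
  expert_cls: my_module.CustomExpert   # assumed import path for a custom expert
  num_experts: 4
optimizer:
  name: Adam
  lr: 1.0e-3
scheduler:
  name: linear_warmup
  warmup_steps: 1000
```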

Right now, we don't fully utilize the Tensor Core capabilities of modern NVIDIA GPUs, since all server-side computations are done in full precision. It might be possible to switch to...

enhancement

Since larger Transformer models are trained with larger batches, it's probably beneficial to accumulate gradients from several backward requests before making a step. This can be implemented in `ExpertBackend.apply_gradients()`, and the...

enhancement
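The accumulation idea can be sketched in pure Python; real gradients would be tensors, but plain floats keep the example self-contained. The class and method names below mirror the issue's `apply_gradients()` only loosely and are not Hivemind's actual implementation.

```python
class GradientAccumulator:
    """Sum gradients over N backward requests, then take one optimizer step."""

    def __init__(self, num_params: int, accumulation_steps: int):
        self.buffer = [0.0] * num_params       # running sum of gradients
        self.accumulation_steps = accumulation_steps
        self.steps_seen = 0

    def apply_gradients(self, grads, step_fn):
        """Add grads to the buffer; call step_fn with the average every N calls."""
        for i, grad in enumerate(grads):
            self.buffer[i] += grad
        self.steps_seen += 1
        if self.steps_seen == self.accumulation_steps:
            averaged = [g / self.accumulation_steps for g in self.buffer]
            step_fn(averaged)                  # one step per N backward requests
            self.buffer = [0.0] * len(self.buffer)
            self.steps_seen = 0

applied = []
acc = GradientAccumulator(num_params=2, accumulation_steps=2)
acc.apply_gradients([1.0, 2.0], applied.append)  # buffered, no step yet
acc.apply_gradients([3.0, 4.0], applied.append)  # triggers one averaged step
print(applied)  # [[2.0, 3.0]]
```

One design question this raises (and which the truncated issue likely discusses) is whether to average or sum the accumulated gradients; averaging, as above, keeps the effective learning rate independent of the accumulation factor.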

As of now, `forward/backward_timeout` arguments correspond only to timeouts for Server interactions. However, this is not the only possible cause of freezes: for example, beam search might take too long...

enhancement
mixture-of-experts
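A generic client-side deadline for any potentially slow stage (beam search included, not just Server RPCs) could be sketched with the standard library; the helper below is illustrative and not part of the Hivemind API.

```python
import concurrent.futures

def run_with_timeout(fn, timeout: float, *args, **kwargs):
    """Run fn in a worker thread and raise TimeoutError past the deadline.

    Caveat: the worker thread itself is not killed on timeout, so the
    executor's shutdown still waits for fn to finish in the background.
    """
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
        future = executor.submit(fn, *args, **kwargs)
        return future.result(timeout=timeout)

print(run_with_timeout(lambda x: x * 2, 1.0, 21))  # 42
```

Wrapping each stage of a request pipeline this way would let a single `timeout` argument bound end-to-end latency rather than only the Server round-trips.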

**Describe the bug** While working on https://github.com/learning-at-home/hivemind/pull/490, I found that if I have bitsandbytes installed in a GPU-enabled environment, I get an error when running [test_adaptive_compression](https://github.com/learning-at-home/hivemind/blob/master/tests/test_compression.py#L152), which happens to be...

bug
ci