Miscellaneous server-side improvements
task_pool
- [x] remove iterate_minibatches, once https://github.com/learning-at-home/hivemind/pull/506 is merged
- [ ] batch sequences of similar length to reduce padding waste (see the sketch below)
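A minimal sketch of one way to batch sequences of similar length: sort incoming sequences by length and slice off fixed-size batches, so each batch only pads to its own maximum. The `batch_by_length` name and the flat list-of-token-ids input format are illustrative assumptions, not the actual task_pool API:

```python
from typing import List, Sequence, Tuple

import torch


def batch_by_length(
    sequences: List[Sequence[int]], batch_size: int
) -> List[Tuple[List[int], torch.Tensor]]:
    """Group sequences of similar length so each batch pads only to its own max.

    Returns (original_indices, padded_batch) pairs so results can be
    reordered back to arrival order after processing.
    """
    order = sorted(range(len(sequences)), key=lambda i: len(sequences[i]))
    batches = []
    for start in range(0, len(order), batch_size):
        indices = order[start : start + batch_size]
        max_len = max(len(sequences[i]) for i in indices)
        padded = torch.zeros(len(indices), max_len, dtype=torch.long)
        for row, i in enumerate(indices):
            padded[row, : len(sequences[i])] = torch.as_tensor(sequences[i], dtype=torch.long)
        batches.append((indices, padded))
    return batches
```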
runtime
- [x] do not log "0 parameters" on init (misleading)
- [x] consider removing Runtime, once https://github.com/learning-at-home/hivemind/pull/505 is merged
handler
- [x] verify what happens if servers have mixed torch_dtype and/or compression (see the sketch after this list)
- [x] actually follow forward/backward/inference schema instead of hard-coding
- [x] extract the code for adding prompts into a separate file
- [x] consider merging the code from hivemind's ConnectionHandler instead of inheriting
- [x] add a test that covers _rpc_inference with prompts
- [x] add a test that covers _rpc_inference with hypo-ids
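A minimal sketch of the kind of check the mixed-dtype item above refers to: when consecutive servers in a chain advertise different torch_dtype or compression, activations get re-cast at every hop, which is worth at least a warning. The `server_infos` list-of-dicts structure, the field names, and the `warn_on_mixed_precision` helper are assumptions for illustration, not the actual handler API:

```python
import logging
from typing import Dict, List

logger = logging.getLogger(__name__)


def warn_on_mixed_precision(server_infos: List[Dict[str, str]]) -> None:
    """Warn if consecutive servers in one chain advertise different
    torch_dtype or compression: activations would be re-cast (and could
    lose precision) at every such hop."""
    for prev, curr in zip(server_infos, server_infos[1:]):
        for field in ("torch_dtype", "compression"):
            if prev.get(field) != curr.get(field):
                logger.warning(
                    "Mixed %s between consecutive servers: %s -> %s",
                    field, prev.get(field), curr.get(field),
                )
```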
MemoryCache
- [x] when running inference over multiple layers on the same server, avoid passing layer activations between cpu<->gpu by storing them in MemoryCache
  - before implementing this, first check whether it will bring any performance benefit
- [ ] LRU-offload stale cache from gpu to ram (see the sketch below)
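A minimal sketch of LRU offloading for the item above: GPU-resident entries are tracked in recency order, the least recently used ones are moved to host RAM once a budget is exceeded, and offloaded entries migrate back to the GPU on access. The class name, the per-entry budget, and the string-key scheme are illustrative assumptions, not MemoryCache's actual interface:

```python
from collections import OrderedDict

import torch


class LRUOffloader:
    """Keep at most `max_gpu_entries` cache tensors on the GPU; offload the
    least recently used ones to CPU RAM and bring them back on access."""

    def __init__(self, max_gpu_entries: int, device: str = "cuda"):
        self.max_gpu_entries = max_gpu_entries
        self.device = device
        self._entries: "OrderedDict[str, torch.Tensor]" = OrderedDict()

    def put(self, key: str, tensor: torch.Tensor) -> None:
        self._entries[key] = tensor.to(self.device)
        self._entries.move_to_end(key)  # most recently used goes last
        self._offload_stale()

    def get(self, key: str) -> torch.Tensor:
        tensor = self._entries.pop(key)
        if tensor.device.type == "cpu":
            tensor = tensor.to(self.device)  # reload a previously offloaded entry
        self._entries[key] = tensor  # re-insert as most recently used
        self._offload_stale()
        return tensor

    def _offload_stale(self) -> None:
        # OrderedDict iteration order is recency order: least recently used first
        on_gpu = [k for k, t in self._entries.items() if t.device.type != "cpu"]
        while len(on_gpu) > self.max_gpu_entries:
            stale = on_gpu.pop(0)
            self._entries[stale] = self._entries[stale].to("cpu")
```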
point system
- [x] make sure points are integers everywhere
- [ ] implement a nonzero prioritizer :) (see the sketch below)
- [x] move client-side spending policy to sequence_manager
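A minimal sketch of a nonzero prioritizer for the item above, assuming the hivemind-style convention that lower priority values are dequeued first; the class name and the tie-breaking scheme are illustrative assumptions, not an existing petals API:

```python
import time

import torch


class PointsTaskPrioritizer:
    """Give tasks that spend more points a better (lower) priority value,
    breaking ties by arrival time so zero-point tasks cannot starve."""

    def prioritize(self, *inputs: torch.Tensor, points: float = 0.0, **kwargs) -> float:
        # Lower return value = served earlier (assumed queue convention).
        # -points puts high bidders first; the tiny monotonic-clock term
        # keeps FIFO order among tasks that bid the same number of points.
        return -float(points) + time.monotonic() * 1e-9
```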