lorax When caching adapters, cache the adapter ID + the API token pair

Feature request

When we cache adapters, we should key the cache on the adapter ID + API token pair. Even if the adapter is already in GPU memory, we should check that the caller has access to it by maintaining a cache of adapter ID + API token pairs.
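To make the idea concrete, here is a minimal sketch of such a pair cache. It is not LoRAX's actual implementation: `AdapterAccessCache`, `is_authorized`, and the `check_access` callback are hypothetical names, and the callback stands in for whatever call asks the adapter source (for example the HF Hub) whether a token may read an adapter.

```python
import hashlib
from typing import Callable, Optional, Set, Tuple


def _token_fingerprint(api_token: Optional[str]) -> str:
    """Hash the token so the raw secret is never kept in the cache.

    A missing token (None) is treated the same as an empty string.
    """
    return hashlib.sha256((api_token or "").encode("utf-8")).hexdigest()


class AdapterAccessCache:
    """Remembers which (adapter ID, API token) pairs have been authorized."""

    def __init__(self, check_access: Callable[[str, Optional[str]], bool]):
        # Hypothetical callback: returns True if this token may read the adapter.
        self._check_access = check_access
        self._authorized: Set[Tuple[str, str]] = set()

    def is_authorized(self, adapter_id: str, api_token: Optional[str]) -> bool:
        key = (adapter_id, _token_fingerprint(api_token))
        if key in self._authorized:
            # This exact pair was verified before; skip the remote check.
            return True
        if self._check_access(adapter_id, api_token):
            self._authorized.add(key)
            return True
        # The adapter may be loaded in GPU memory, but this caller has not
        # shown access to it, so the request should be rejected.
        return False
```

The server would call `is_authorized(adapter_id, api_token)` on every request before reusing a loaded adapter, so a cache hit on the adapter weights alone is never enough to serve the request.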

Motivation

Otherwise, we could get the following situation: one user calls prompt with a private HF Hub adapter and an HF API token, the request works, and the adapter is cached. Another user then calls prompt with the same adapter without setting an HF API token in the request. Since the adapter is cached, the request succeeds even though the second user never proved they have access.

Your contribution

I can try to implement it, but I am quite busy, so I'm not sure when I can get to it.

May 20 '24 19:05 noah-yoshida

hello, I would like to work on this.

May 21 '24 06:05 safimuhammad

Hey @safimuhammad - wanna chat on discord for next steps on this?

May 23 '24 19:05 magdyksaleh

@magdyksaleh Sure, here's my discord user name msafi38

May 24 '24 02:05 safimuhammad

hey @magdyksaleh, reaching out to you on discord, let's discuss next steps on this.

May 31 '24 02:05 safimuhammad