May I ask: when I want to wrap an LLM with `BaseAPIModel`, where should I pass in the url_base?
I cannot find where to supply the URL of my LLM deployed in the cloud. I should build a wrapper just like you do in GPTAPI, shouldn't I?
So why is this class called BaseAPIModel?
class BaseAPIModel(BaseModel):
    """Base class for API model wrapper.

    Args:
        model_type (str): The type of model.
        query_per_second (int): The maximum queries allowed per second
            between two consecutive calls of the API. Defaults to 1.
        retry (int): Number of retries if the API call fails. Defaults to 2.
        meta_template (Dict, optional): The model's meta prompt
            template if needed, in case meta instructions have to be
            injected or wrapped around the input.
    """
    pass
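For illustration, this is roughly what I had in mind; `MyCloudLLM`, the `url_base` argument, and the request format are just my own sketch, not part of lagent:

```python
import requests

from lagent.llms.base_api import BaseAPIModel  # import path may vary by lagent version


class MyCloudLLM(BaseAPIModel):
    """Hypothetical wrapper for an OpenAI-style LLM served in the cloud."""

    def __init__(self, url_base: str, **kwargs):
        super().__init__(model_type='my-cloud-llm', **kwargs)
        # This is where I would like to pass the base URL of my deployed model,
        # similar to how GPTAPI points at the OpenAI endpoint.
        self.url_base = url_base

    def chat(self, messages, **gen_params):
        # Plain HTTP call to the chat completions route of my server.
        resp = requests.post(
            f'{self.url_base}/v1/chat/completions',
            json={'messages': messages, **gen_params},
            timeout=60,
        )
        return resp.json()
```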
Which model are you using? BaseAPIModel is mostly meant for closed-source LLMs that can only be accessed through an API. If you use LMDeploy to serve your own model, you can refer to LMDeployClient, used roughly as shown below.
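As a rough sketch (the exact import path and constructor arguments may differ between lagent versions; the URL and model name are placeholders):

```python
from lagent.llms import LMDeployClient  # assumes the client is exported here

# Point the client at an api_server started with `lmdeploy serve api_server ...`.
llm = LMDeployClient(model_name='internlm2-chat-7b',
                     url='http://localhost:23333')
```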
Thanks for your reply. But I want to point out that when we access a model through an API, the device in our hands may not have enough compute to run an LLM locally, e.g. no NVIDIA GPU. And LMDeployClient depends on lmdeploy, which cannot be installed on such devices. Clearly this class only uses a small part of lmdeploy; could you move that part into lagent? I would really appreciate a new class for OpenAI-style APIs.
Yeah, that's exactly what I'm facing. When I deploy a model with lmdeploy's api_server on a GPU machine, I can't access it from lagent without installing lmdeploy on my client machine; that's why I want to build a wrapper on top of BaseAPIModel.
Actually, even if your local device does not have a GPU, you can still install lmdeploy.
If you truly do not want to install lmdeploy on your local device, you can implement a class
like the following:
import json

import requests

from lagent.llms.base_llm import BaseModel  # import path may vary by lagent version


class RequestPostModel(BaseModel):
    """Access an OpenAI-style api_server using nothing but the requests library."""

    def __init__(self, api_url: str, headers: dict = None):
        # Depending on your lagent version, you may also want to forward
        # arguments to BaseModel.__init__ here.
        self.chat_completions_v1_url = api_url
        self.headers = headers or {'content-type': 'application/json'}

    def chat(self,
             model: str,
             messages,
             temperature: float = 0.7,
             stream: bool = False,
             **kwargs):
        # Pack every keyword argument of this call into the request payload.
        pload = {
            k: v
            for k, v in locals().copy().items()
            if k[:2] != '__' and k not in ['self', 'kwargs']
        }
        pload.update(kwargs)
        response = requests.post(self.chat_completions_v1_url,
                                 headers=self.headers,
                                 json=pload,
                                 stream=stream)
        for chunk in response.iter_lines(chunk_size=8192,
                                         decode_unicode=False,
                                         delimiter=b'\n'):
            if chunk:
                decoded = chunk.decode('utf-8')
                if stream:
                    # Server-sent events: skip [DONE] and strip the 'data: ' prefix.
                    if decoded == 'data: [DONE]':
                        continue
                    if decoded[:6] == 'data: ':
                        decoded = decoded[6:]
                    yield json.loads(decoded)
                else:
                    yield json.loads(decoded)
which is copied from lmdeploy. The class only uses the requests library to access your API model.
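As a rough usage sketch (the URL and model name below are placeholders for your own api_server):

```python
# Suppose an OpenAI-style api_server is already running on the GPU machine,
# e.g. started with `lmdeploy serve api_server <model> --server-port 23333`.
model = RequestPostModel('http://<server-ip>:23333/v1/chat/completions')
for output in model.chat(model='internlm2-chat-7b',
                         messages=[{'role': 'user', 'content': 'Hello'}],
                         stream=False):
    print(output)
```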
I just found that you opened a PR for this, thanks for your contribution.
Thanks for your advice. But lmdeploy relies on torch, which means a lot of unnecessary downloading and disk usage, especially on ARM devices 😂. That's it.