replicate-python
Python client for Replicate
Hi, I like to use the text-to-image and image-to-image models from Replicate, and I have a process that is very time-consuming, in terms of both code and performance, to redeploy images from...
I run a Python script that pins the old model version I need:
```python
output_url = replicate.run(
    "tencentarc/gfpgan:9283608cc6b7be6b65a8e44983db012355fde4132009bf99d976b2f0896856a3",
    input={
        "img": open(in_img_path, "rb"),
        "scale": 6,
        "version": "v1.3"
    }...
```
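For finding the right hash to pin, a short sketch using the client's documented model/version calls (the model name is just the one from the snippet above):

```python
import replicate

# List the published versions of a model; each `id` is the hash that can
# be pinned after the colon in replicate.run("owner/name:<id>", ...).
model = replicate.models.get("tencentarc/gfpgan")
for version in model.versions.list():
    print(version.id, version.created_at)
```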
Is there any API or function to calculate the tokens used by each request, and the cost of that request, for any model?
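There is no single built-in cost call that I know of, but a hedged sketch of what is retrievable: each prediction carries a `metrics` dictionary, and cost is typically `predict_time` multiplied by the hardware's per-second price from the pricing page. The token-count keys below are an assumption; only some language models populate them.

```python
import replicate

# Look at the most recent prediction on the account; each prediction
# carries a `metrics` dict. `predict_time` (seconds of compute) is the
# usual basis for cost.
latest = replicate.predictions.list().results[0]
metrics = latest.metrics or {}
print("predict_time (s):", metrics.get("predict_time"))
# Assumption: only some language models report token counts, and the
# key names may differ by model, so treat these as optional.
print("input tokens:", metrics.get("input_token_count"))
print("output tokens:", metrics.get("output_token_count"))
```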
Hello, I was wondering how you can set timeouts in the `replicate.run()` function. I have tried using the Replicate client, but it didn't throw a timeout error:
```python
from replicate.client...
```
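Until `run()` exposes a timeout, one workaround sketch is to enforce a deadline manually via the lower-level predictions API, using the documented `create` / `reload` / `cancel` calls (the version id below is a placeholder, not a real model version):

```python
import time
import replicate

prediction = replicate.predictions.create(
    version="...",  # hypothetical: your model's version id
    input={"prompt": "hello"},
)

deadline = time.monotonic() + 60  # 60-second budget
while prediction.status not in ("succeeded", "failed", "canceled"):
    if time.monotonic() > deadline:
        prediction.cancel()  # stop a stuck prediction
        raise TimeoutError("prediction exceeded the 60s budget")
    time.sleep(1)
    prediction.reload()  # refresh status from the API

print(prediction.output)
```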
It would be really helpful if there were a documented example of how to catch common error cases. Might I request a documentation example along the following lines:
```
for event...
```
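While that documentation doesn't exist, here is a minimal sketch catching the client's two main exception types, `ModelError` and `ReplicateError`, around a streamed run (the model and prompt are just illustrative):

```python
import replicate
from replicate.exceptions import ModelError, ReplicateError

try:
    for event in replicate.stream(
        "mistralai/mistral-7b-instruct-v0.2",  # illustrative streaming model
        input={"prompt": "Write a haiku about error handling."},
    ):
        print(str(event), end="")
except ModelError as e:
    # Raised when the model itself fails while running (bad input, OOM, ...).
    print(f"model failed: {e}")
except ReplicateError as e:
    # Raised for API-level problems (auth, rate limiting, validation, ...).
    print(f"API error: {e}")
```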
LLaMA-2 models have a maximum input size of 4096 tokens [[original paper](https://arxiv.org/pdf/2307.09288.pdf), [meta llama github repo](https://github.com/meta-llama/llama/issues/267#issuecomment-1659440955)]. When prompting `meta/llama-2-70b` through Replicate, however, the maximum input size the model accepts is, strangely,...
I am getting an error that the prompt length exceeds the maximum input length when calling `meta/llama-2-70b` through the API. I have included the error log from the Replicate dashboard...
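Until the server-side limit is clarified, a crude client-side pre-check can at least fail fast before the API call. This sketch uses a characters-per-token heuristic, which is an assumption, not Llama-2's real tokenizer:

```python
MAX_INPUT_TOKENS = 4096   # Llama-2's documented context window
CHARS_PER_TOKEN = 4       # rough heuristic, not the actual tokenizer

def rough_token_count(text: str) -> int:
    # Approximate token count; errs on the side of overestimating.
    return len(text) // CHARS_PER_TOKEN + 1

prompt = "Q: Would a pear sink in water? A: "
if rough_token_count(prompt) > MAX_INPUT_TOKENS:
    raise ValueError("prompt likely exceeds the 4096-token input limit")
```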
Calls to `meta/llama-2-70b` sometimes succeed but sometimes fail; it is very unreliable. This is the code:
```python
output = replicate.run(
    "meta/llama-2-70b",
    input={
        "prompt": "Q: Would a pear sink in...
```
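A common workaround for intermittent failures is retrying with exponential backoff. A sketch, where the retry helper is hypothetical and not part of the client:

```python
import time
import replicate
from replicate.exceptions import ModelError, ReplicateError

def run_with_retries(model, model_input, attempts=3):
    # Hypothetical helper: retry transient failures with exponential backoff.
    for attempt in range(attempts):
        try:
            return replicate.run(model, input=model_input)
        except (ModelError, ReplicateError):
            if attempt == attempts - 1:
                raise  # out of retries; surface the last error
            time.sleep(2 ** attempt)  # back off 1s, 2s, 4s, ...

output = run_with_retries(
    "meta/llama-2-70b",
    {"prompt": "Q: Would a pear sink in water? A: "},
)
```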
Running this code:
```python
import os
import replicate
from dotenv import load_dotenv

load_dotenv()
REPLICATE_API_TOKEN = os.getenv("REPLICATE_API_TOKEN")

prompt = "Q: What is 10*10? A: "
output = replicate.run(
    "meta/llama-2-7b",
    input={
        "prompt":...
```
With the replicate 0.24.0 Python client and `mistralai/mistral-7b-instruct-v0.2` (a model that supports streaming), the iterator I get back from `client.run()` is truncating output, perhaps 1 in 50 times. I checked...
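One way to confirm the truncation is to compare the streamed text against the output stored on the prediction record afterwards. A diagnostic sketch, assuming the most recent prediction on the account is the one just streamed and that the model stores output as a list of string chunks:

```python
import replicate

# Stream the output and accumulate it locally.
streamed = "".join(
    str(event)
    for event in replicate.stream(
        "mistralai/mistral-7b-instruct-v0.2",
        input={"prompt": "Q: What is 10*10? A: "},
    )
)

# Fetch the final prediction record and join its stored output chunks.
latest = replicate.predictions.list().results[0]
latest.wait()  # ensure the record has reached a terminal state
stored = "".join(latest.output or [])

if streamed != stored:
    print(f"stream truncated: {len(streamed)} streamed vs {len(stored)} stored chars")
```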