Dan

Results 7 comments of Dan

I notice that `warble` doesn't have the same issue, or it was built for x64:

```
# file /usr/local/lib/python3.7/dist-packages/mbientlab/warble/libwarble.so
/usr/local/lib/python3.7/dist-packages/mbientlab/warble/libwarble.so: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically...
```

`keep_alive` is set to 20 minutes (1200 seconds), and the requests are serial and immediate: as soon as one request completed, the next one was submitted.
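A rough sketch of that request pattern, not the actual client code. The payload fields mirror the chat request logged in this thread; `build_chat_request`, `run_serial`, and the `send` callable are hypothetical names introduced only for illustration.

```python
from typing import Callable, Iterable, List

KEEP_ALIVE_SECONDS = 20 * 60  # 20 minutes, i.e. the 1200 mentioned above


def build_chat_request(prompt: str, model: str = "llama3.2-vision") -> dict:
    """Build a non-streaming chat payload carrying the keep_alive value."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
        "options": {"temperature": 1.0},
        "keep_alive": KEEP_ALIVE_SECONDS,
    }


def run_serial(prompts: Iterable[str], send: Callable[[dict], dict]) -> List[dict]:
    """Submit requests one at a time; the next goes out only after the previous returns."""
    responses = []
    for prompt in prompts:
        # send() blocks until the response arrives, so there is no overlap between requests.
        responses.append(send(build_chat_request(prompt)))
    return responses
```

`send` is left as a parameter so the sketch does not assume a particular HTTP client; in practice it would wrap a POST to the server's chat endpoint.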

Requests:

```
2025-02-24 08:34:34.988 Chat request: {"model": "llama3.2-vision", "messages": [{"role": "user", "content": "truncated"}], "stream": false, "options": {"temperature": 1.0}, "keep_alive": 1200, "format": {"type": "object", "properties": {"message": {"type": "string"}, "tool_calls": {"type": "array",...
```

```
{
  "model": "llama3.2-vision",
  "created_at": "2025-02-24T18:05:44.3856736Z",
  "message": {
    "role": "assistant",
    "content": "Hello! How are you today? Is there something I can help you with or would you like to chat?"...
```

It's the dimensions of the image (3200x2400), even though the file was only 200 KB. I reduced it to 1280x1024 (still 200 KB) and responses now come back in under 6 seconds. Makes sense that...
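A minimal sketch of choosing smaller pixel dimensions before sending an image, assuming the aspect ratio should be preserved. `fit_within` is a hypothetical helper, not part of any library here. With a 3200x2400 input and a 1280x1024 bound it yields 1280x960 rather than the 1280x1024 mentioned above, which does not keep the 4:3 ratio. The actual resize would be done with whatever image library is at hand.

```python
from typing import Tuple


def fit_within(width: int, height: int, max_width: int, max_height: int) -> Tuple[int, int]:
    """Return dimensions scaled down to fit inside the bounding box, keeping aspect ratio."""
    if width <= max_width and height <= max_height:
        return width, height  # already small enough, never upscale

    # Compare width/max_width with height/max_height using integer cross-multiplication.
    if width * max_height >= height * max_width:
        # Width is the limiting side.
        return max_width, max(1, round(height * max_width / width))
    # Height is the limiting side.
    return max(1, round(width * max_height / height)), max_height


print(fit_within(3200, 2400, 1280, 1024))
```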