Got error when handling a large file

Open tommy04062019 opened this issue 1 year ago • 5 comments

  • Log file size: 11 MB
  • When asking with a file, an error was raised:
BadRequestError: Error code: 400 - {'error': {'message': "Invalid 'messages[1].content': string too long. Expected a string with maximum length 1048576, but got a string with length 11282561 instead.",
'type': 'invalid_request_error', 'param': 'messages[1].content', 'code': 'string_above_max_length'}}

What is the maximum allowed size? How can I handle a large file?

tommy04062019 avatar Jun 28 '24 03:06 tommy04062019

[Screenshot 2024-06-28 at 11 55 30]

And this too

tommy04062019 avatar Jun 28 '24 04:06 tommy04062019

Thanks, we're looking into better ways to handle this. (And to report the max sizes.)

What are you trying to accomplish with the log analysis? Instead of downloading the file, then analyzing it, are you able to just run a holmes ask command without a file and let holmes itself choose which logs to look at? That tends to handle size limits much much better.

aantn avatar Jun 30 '24 07:06 aantn

As you can see, in the case of nginx ingress, requests and errors can originate from any of the nginx ingress pods. If a deployment has 10 pods, each one has to be investigated individually, which wastes time. Our primary goal is to minimize the time spent on investigation, isn't it? Errors can also be located anywhere within the log. Even if you limit the log reading to the last 10000 lines, the errors might only appear within the last 10500 lines, so the result returned would be something like: "AI: everything is ok".

tommy04062019 avatar Jun 30 '24 08:06 tommy04062019

@aantn : I think we should implement two things:

  • First, having a size limit
  • Second, for large files, read the file in chunks and use AI to dig into each chunk separately. Then bring all those per-chunk findings together to get the big picture.
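A minimal sketch of these two ideas in Python, not how holmes itself handles this. The limit of 1048576 characters is taken from the error message above and may differ for other models or providers. The names `iter_chunks`, `analyze_in_chunks`, `analyze`, and `summarize` are hypothetical; `analyze` and `summarize` stand in for whatever LLM calls you would actually make.

```python
from typing import Callable, Iterator, List

# Limit copied from the error message in this issue (an assumption for
# any other model or provider). The prompt text counts toward the same
# message, so in practice you would leave some headroom below this.
MAX_CONTENT_CHARS = 1_048_576


def iter_chunks(path: str, max_chars: int = MAX_CONTENT_CHARS) -> Iterator[str]:
    """Yield pieces of a text file, split on line boundaries where
    possible, each at most max_chars characters long."""
    buf: List[str] = []
    size = 0
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        for line in f:
            # A single line longer than the limit is cut into hard slices.
            while len(line) > max_chars:
                if buf:
                    yield "".join(buf)
                    buf, size = [], 0
                yield line[:max_chars]
                line = line[max_chars:]
            # Flush the buffer if adding this line would exceed the limit.
            if buf and size + len(line) > max_chars:
                yield "".join(buf)
                buf, size = [], 0
            buf.append(line)
            size += len(line)
    if buf:
        yield "".join(buf)


def analyze_in_chunks(
    path: str,
    analyze: Callable[[str], object],
    summarize: Callable[[list], object],
    max_chars: int = MAX_CONTENT_CHARS,
) -> object:
    """Run analyze() on every chunk, then combine the findings."""
    findings = [analyze(chunk) for chunk in iter_chunks(path, max_chars)]
    return summarize(findings)
```

With the 11282561-character file from the error above, this would produce at least 11 chunks, so 11 or more `analyze` calls plus one `summarize` call.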

tommy04062019 avatar Jun 30 '24 08:06 tommy04062019

What is the trigger for the investigation? (Meaning what is the initial breadcrumb that you saw which caused you to start investigating at all?)

Your ideas are good. I would also be interested in exploring a third option: give the AI a tool to search the logs (e.g. grep).
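As a rough illustration of that third option, here is a sketch of a grep-like helper in Python. It is only an assumption about what such a tool could look like, not an existing holmes tool; the name `grep_log` and its parameters are hypothetical. It returns matching lines with a little leading context, so only a small slice of the log has to be sent to the model.

```python
import re
from collections import deque
from typing import Iterator


def grep_log(path: str, pattern: str, before: int = 2,
             max_matches: int = 50) -> Iterator[str]:
    """Yield lines matching pattern (case-insensitive), each preceded by
    up to `before` non-matching lines of context. Stops after max_matches."""
    regex = re.compile(pattern, re.IGNORECASE)
    context = deque(maxlen=before)  # rolling window of recent lines
    matches = 0
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.rstrip("\n")
            if regex.search(line):
                # Emit buffered context first, marked with "-" like grep -B.
                for ctx_no, ctx_line in context:
                    yield f"{ctx_no}- {ctx_line}"
                context.clear()
                yield f"{lineno}: {line}"
                matches += 1
                if matches >= max_matches:
                    return
            else:
                context.append((lineno, line))
```

Because the file is streamed line by line, the whole 11 MB log never has to be held in memory or placed in a single message, and errors far from the end of the file are still found.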

We have an internal framework for benchmarking the results of different approaches. If you're interested in jumping on a short call, I'd love to explore this use case in a little more depth. That might help us find the best solution.

aantn avatar Jun 30 '24 08:06 aantn

Closing in favor of #522 and #523 which should fix this.

aantn avatar Jun 15 '25 11:06 aantn