Results 10 insight-bench issues

Added pattern design code

The annotation for [notebook 12](https://github.com/ServiceNow/insight-bench/blob/main/data/notebooks/flag-12.ipynb) seems very off. The observed insights seem unrelated to what the analysis actually shows. All incidents have an equal category distribution, but the insight...

Hi authors, I'd like to confirm if the following parameters were the same as what you used in the paper:

```
for model_name in ["gpt-4o"]:
    exp_list.append(
        {
            "benchmark_type": benchmark_type,
            "model_name":...
```

In some `flag-*.json` files, namely [32, 33, 34, 35, 37, 38, 39, 40, 79, 80, 81, 82, 84, 85, 86, 87], the metadata appears to be null. The problematic section looks...
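A quick way to reproduce a report like this is to scan the files and list those with null metadata. The following is a minimal sketch, not the project's own tooling: the helper name `find_null_metadata` is hypothetical, and it assumes each `flag-*.json` file holds a single JSON object with a top-level `"metadata"` key, which the snippet above does not confirm.

```python
import json
from pathlib import Path


def find_null_metadata(directory, key="metadata"):
    """Return sorted names of flag-*.json files whose `key` is missing or null.

    Assumption: each file contains one JSON object and the field of interest
    is a top-level key named "metadata" (hypothetical layout).
    """
    flagged = []
    for path in sorted(Path(directory).glob("flag-*.json")):
        with path.open(encoding="utf-8") as handle:
            record = json.load(handle)
        # Treat a non-object payload, a missing key, and an explicit null alike.
        if not isinstance(record, dict) or record.get(key) is None:
            flagged.append(path.name)
    return flagged
```

Calling `find_null_metadata("path/to/json/dir")` returns the file names to inspect; the directory path is a placeholder for wherever the dataset files live locally.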

Hello, the JSON for flag 36 is missing ground-truth insights. Could you provide them, please? Thanks.

I noticed that currently all tasks are using the same goal. Is this the intended design? In my understanding, each task should have its own specific goal derived from its...

Hi, Thank you for your contributions. I ran some experiments using the code you provided and switched from GPT-4o to Claude-3.7-Sonnet. I would expect similar performance to GPT-4o when using...

It seems some data points contain insights that just look like this:

```
"insights": [
  "There was no column processed_date to conduct any analysis",
  "There was no column amount to...
```
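One way to gauge how widespread such entries are is to separate them from substantive insights by their wording. This is only a sketch under assumptions: the helper names are hypothetical, and it assumes the affected entries all start with the phrase "There was no column", as in the two examples quoted above.

```python
def is_placeholder_insight(text, prefix="There was no column"):
    """True if the insight only reports a missing column (assumed wording)."""
    return text.strip().startswith(prefix)


def split_insights(insights):
    """Split a list of insight strings into (substantive, placeholder) lists."""
    substantive = [t for t in insights if not is_placeholder_insight(t)]
    placeholders = [t for t in insights if is_placeholder_insight(t)]
    return substantive, placeholders
```

Running `split_insights` over each data point's `"insights"` list and counting the placeholders gives a rough picture of which data points are affected.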

I am having quite some trouble with the Llama-3-as-a-judge pipeline. Here are two issues I've encountered: 1. If Llama does not provide a valid score and hence the index [here](https://github.com/ServiceNow/insight-bench/blob/33c27c1282cb7ed73267d36fb84e27b6ea8aac2b/insightbench/utils/eval_utils.py#L82)...
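A common defensive pattern for the first problem is to parse the judge's reply so that an invalid answer yields `None` instead of an indexing error. The sketch below is not the code in `eval_utils.py`: the function name `parse_judge_score` and the default score range are assumptions made for illustration, since the actual prompt and score scale are not shown here.

```python
import re


def parse_judge_score(reply, low=0.0, high=1.0):
    """Return the first number in `reply` that lies in [low, high], else None.

    Assumption: the judge is asked for a numeric score; the [low, high]
    range is a hypothetical default and should match the real prompt.
    """
    for match in re.finditer(r"-?\d+(?:\.\d+)?", reply):
        value = float(match.group())
        if low <= value <= high:
            return value
    # No usable number found: let the caller decide to retry or skip.
    return None
```

The caller can then retry the request or skip the sample when `None` comes back, rather than crashing partway through an evaluation run.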