insight-bench issues

Datagen

Added pattern design code

Data/Annotation Issues?

The annotation for [notebook 12](https://github.com/ServiceNow/insight-bench/blob/main/data/notebooks/flag-12.ipynb) seems very off. The observed insights seem in no way related to what the analysis actually shows. All incidents have equal category distribution but the insight...

george1459

Baseline results and parameters

Hi authors, I'd like to confirm if the following parameters were the same as what you used in the paper:

```
for model_name in ["gpt-4o"]:
    exp_list.append(
        {
            "benchmark_type": benchmark_type,
            "model_name":...
```

george1459

Inject pattern

AmirAbaskohi

Missing Metadata

1

In some `flag-*.json` files, namely [32, 33, 34, 35, 37, 38, 39, 40, 79, 80, 81, 82, 84, 85, 86, 87], the metadata appears to be null. The problematic section looks...

ympan0508
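
A check like the one behind the missing-metadata report above could be sketched as follows. This is a minimal sketch, not the repository's own tooling: the directory argument is hypothetical, and the top-level key name `metadata` is an assumption based on the issue text.

```python
import json
from pathlib import Path


def find_null_metadata(json_dir, key="metadata"):
    """Return names of flag-*.json files whose `key` field is missing or null.

    `key` defaults to "metadata", which is an assumption about the file layout.
    """
    missing = []
    for path in sorted(Path(json_dir).glob("flag-*.json")):
        with path.open(encoding="utf-8") as f:
            data = json.load(f)
        # Treat a non-object payload, an absent key, and an explicit null alike.
        if not isinstance(data, dict) or data.get(key) is None:
            missing.append(path.name)
    return missing
```

Calling `find_null_metadata("path/to/json")` on a local checkout (the path is a placeholder) returns the affected file names in lexicographic order.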

Missing ground truth -- Question 36

Hello, the JSON for flag 36 is missing ground-truth insights. Can you provide them, please? Thanks,

MaxHeuillet

Is it reasonable for all tasks to use the same goal?

I noticed that currently all tasks are using the same goal. Is this the intended design? In my understanding, each task should have its own specific goal derived from its...

duoyw

Results of AgentPoirot with Claude-3.7-Sonnet as Backbone

2

Hi, Thank you for your contributions. I ran some experiments using the code you provided and switched from GPT-4o to Claude-3.7-Sonnet. I would expect similar performance to GPT-4o when using...

wjhou

Wrong data: There was no column ... to conduct any analysis

1

It seems some data points contain insights that just look like this:

```
"insights": [
    "There was no column processed_date to conduct any analysis",
    "There was no column amount to...
```

george1459
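
Placeholder insights of the kind quoted in the report above can be flagged with a simple pattern match. The sketch below assumes the insights are available as a list of strings and that the placeholder wording is exactly the one quoted. The function name is hypothetical.

```python
import re

# Matches the placeholder wording quoted in the issue and captures the column name.
PLACEHOLDER = re.compile(r"^There was no column (\S+) to conduct any analysis$")


def placeholder_columns(insights):
    """Return the column names mentioned in placeholder insights, in order."""
    columns = []
    for text in insights:
        match = PLACEHOLDER.match(text.strip())
        if match:
            columns.append(match.group(1))
    return columns
```

An empty result means none of the given insights use the placeholder wording.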

Llama3-as-a-judge issues

1

I am having quite a bit of trouble with the Llama-3-as-a-judge pipeline. Here are two issues I've encountered: 1. If Llama does not provide a valid score and hence the index [here](https://github.com/ServiceNow/insight-bench/blob/33c27c1282cb7ed73267d36fb84e27b6ea8aac2b/insightbench/utils/eval_utils.py#L82)...

george1459
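
One defensive way to handle a judge reply that contains no valid score, as described in the first point above, is to return a fallback value instead of indexing into an empty match. This is only a sketch under stated assumptions: it does not reproduce the logic in `eval_utils.py`, the function name `parse_score` is hypothetical, and the reply format (a number somewhere in free text) is assumed.

```python
import re


def parse_score(reply, default=None):
    """Extract the first number in a judge reply, or return `default` if none.

    Assumes the score is the first numeric token in the reply; a reply that
    mentions another number first would need a stricter pattern.
    """
    match = re.search(r"\d+(?:\.\d+)?", reply)
    return float(match.group()) if match else default
```

Callers can then skip or re-query samples for which the result is `None` rather than crash mid-evaluation.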
