Feat: weighted average table metrics
This PR computes table metrics as a weighted average, with weights given by the number of actual (ground truth) tables, instead of an unweighted average.
- for pages that have ground truth tables, the weight is proportional to the number of ground truth tables on that page
- pages that have no ground truth tables but do have predicted tables (false positives) are assigned a weight of 1 table for the whole page when calculating the mean value of table_level_acc
- pages with false positive tables do not contribute to table structural or table content metrics
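
The weighting rules above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `page_weight`, `weighted_mean`, and the `total_predicted_tables` and `table_content_acc` keys are hypothetical names chosen for the example; only `total_tables` and `table_level_acc` appear in this PR's discussion.

```python
from typing import Dict, List, Optional


def page_weight(row: Dict[str, float], metric: str) -> float:
    """Return the weight of one page when averaging `metric` (hypothetical helper)."""
    if row["total_tables"] > 0:
        # Pages with ground truth tables: weight = number of ground truth tables.
        return float(row["total_tables"])
    # Assumption: "total_predicted_tables" marks false positive pages.
    if row["total_predicted_tables"] > 0 and metric == "table_level_acc":
        # False positive pages count as 1 table, but only for table_level_acc.
        return 1.0
    # Otherwise the page does not contribute to this metric.
    return 0.0


def weighted_mean(rows: List[Dict[str, float]], metric: str) -> Optional[float]:
    """Weighted average of `metric` over pages; None if no page contributes."""
    numerator = 0.0
    denominator = 0.0
    for row in rows:
        weight = page_weight(row, metric)
        if weight == 0.0:
            continue
        numerator += weight * row[metric]
        denominator += weight
    return numerator / denominator if denominator else None


pages = [
    {"total_tables": 2, "total_predicted_tables": 2, "table_level_acc": 1.0, "table_content_acc": 0.5},
    {"total_tables": 1, "total_predicted_tables": 1, "table_level_acc": 0.5, "table_content_acc": 1.0},
    # False positive page: no ground truth tables, one predicted table.
    {"total_tables": 0, "total_predicted_tables": 1, "table_level_acc": 0.0, "table_content_acc": 0.0},
]

print(weighted_mean(pages, "table_level_acc"))    # (2*1.0 + 1*0.5 + 1*0.0) / 4
print(weighted_mean(pages, "table_content_acc"))  # (2*0.5 + 1*1.0) / 3
```

In this sketch the false positive page lowers `table_level_acc` but is skipped entirely for the content metric, which matches the two rules listed above.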
test
This PR updates the existing test for evaluating table metrics:
- adds a second file with just 1 table, alongside the existing file with 2 tables
- tests that the weighted average is written to the report
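
As a rough illustration of what such a check could look like, here is a self-contained sketch. The report layout, the `write_report` helper, the `weighted_average` row label, and the file names are all assumptions made for the example; the real test and report format in the repository may differ.

```python
import csv
import io


def write_report(per_file, metric, out):
    """Write per-file rows followed by a weighted-average row (hypothetical layout)."""
    writer = csv.writer(out, delimiter="\t")
    writer.writerow(["filename", "total_tables", metric])
    for row in per_file:
        writer.writerow([row["filename"], row["total_tables"], row[metric]])
    # Weight each file by its number of ground truth tables.
    total = sum(r["total_tables"] for r in per_file)
    weighted = sum(r["total_tables"] * r[metric] for r in per_file) / total
    writer.writerow(["weighted_average", total, weighted])


# One file with 2 tables and one file with 1 table, mirroring the test setup.
per_file = [
    {"filename": "two_tables.pdf", "total_tables": 2, "table_level_acc": 1.0},
    {"filename": "one_table.pdf", "total_tables": 1, "table_level_acc": 0.25},
]

buffer = io.StringIO()
write_report(per_file, "table_level_acc", buffer)
report_rows = list(csv.reader(io.StringIO(buffer.getvalue()), delimiter="\t"))

# The weighted average is (2*1.0 + 1*0.25) / 3 = 0.75, whereas the
# unweighted average of the two files would be 0.625.
assert report_rows[-1][0] == "weighted_average"
assert abs(float(report_rows[-1][2]) - 0.75) < 1e-9
```

Using files with different table counts matters here: with equal counts the weighted and unweighted averages coincide, so the test could not tell them apart.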
There is one scenario we need to account for: when there are 0 tables in the ground truth file but there are some false positives. I think it could be treated as weight=1, what do you think? Right now the file will not be counted, right?
good call; that makes sense
@plutasnyy actually, in the code we already filter down to only rows with non-zero "total_tables". If we intend to change that, it would be better to do it in a different PR, since it changes the existing behavior for tallying tables.