
Document intelligence layout-model doesn't detect tables for .xlsx

Open hamer101 opened this issue 1 year ago • 1 comments

  • Package Name: azure-ai-documentintelligence
  • Package Version: 1.0.0b1

No tables are detected for a .xlsx file - the poller's result field of tables remains empty.

Although I suspected that loosely structured tables might not be detected, I assumed that ranges explicitly defined as "tables" in Excel would be extracted.

The analyzed .xlsx file contained one such defined table, along with a few subtotal fields. Code used:

with open('/tmp/example.xlsx','rb') as f:
    poller = document_intelligence_client.begin_analyze_document(
        model_id="prebuilt-layout",
        analyze_request=f, 
        content_type="application/octet-stream"
    )
    form_recognizer_results_markdown = poller.result()
print(form_recognizer_results_markdown.tables)

I was about to file this as a feature request, but after giving it some thought, it deviates quite a bit from the behavior I expected :)

hamer101 avatar Mar 07 '24 10:03 hamer101

@hamer101 Thanks for reaching out, we'll investigate ASAP!

YalinLi0312 avatar Mar 08 '24 03:03 YalinLi0312

Hi @hamer101, unfortunately we do not support this feature so far. Our doc (link) says "Table is not supported if the input file is XLSX." I'll tag the service team to make them aware of your request, thanks!

YalinLi0312 avatar Mar 29 '24 01:03 YalinLi0312
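
Given the documented limitation quoted above, a caller may want to detect the case up front instead of relying on an empty `tables` field. The following is only a minimal sketch: the helper names `layout_tables_supported` and `has_tables` are hypothetical, and the extension set assumes XLSX is the only affected format, which is all this thread confirms.

```python
from pathlib import Path

# Assumption: only XLSX is listed here because it is the only format
# this thread confirms as returning no tables from prebuilt-layout.
NO_TABLE_EXTENSIONS = {".xlsx"}


def layout_tables_supported(path: str) -> bool:
    """Return False when the file extension is known not to yield tables."""
    return Path(path).suffix.lower() not in NO_TABLE_EXTENSIONS


def has_tables(result) -> bool:
    """Return True if the analyze result carries at least one table.

    Treats a missing attribute, None, and an empty list the same way.
    """
    return bool(getattr(result, "tables", None))
```

For example, a pipeline could call `layout_tables_supported('/tmp/example.xlsx')` before `begin_analyze_document` and route the file to a different table extractor when it returns `False`.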

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @ctstone @vkurpad.

github-actions[bot] avatar Mar 29 '24 01:03 github-actions[bot]

Is there any update on this? This feature would be very much appreciated!

maltenlz avatar May 20 '24 08:05 maltenlz

Adding @bojunehsu to bring more attention to this feature request for the layout model.

YalinLi0312 avatar Jun 20 '24 23:06 YalinLi0312

@hamer101 Thanks for the feedback. Can you please share more information about the intended use scenarios to help us prioritize the work? Thanks.

bojunehsu avatar Jun 21 '24 15:06 bojunehsu

I wanted to feed an .xlsx file to an LLM via a pipeline that already handles tables through the tables field of the layout model's result.

hamer101 avatar Jun 22 '24 17:06 hamer101
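
Until the service returns tables for XLSX, one possible stopgap for the scenario described above is to render spreadsheet rows as a Markdown table before passing them to the LLM. This is a rough sketch, not part of the SDK: `rows_to_markdown` is a hypothetical helper, and it assumes the rows have already been read from the workbook by some other means (not shown), with the first row as the header.

```python
def rows_to_markdown(rows):
    """Render a list of rows (first row is the header) as a Markdown table."""
    if not rows:
        return ""
    # Pad every row to the widest one so the table stays rectangular.
    width = max(len(row) for row in rows)

    def fmt(row):
        # Empty cells become blank; pipes are escaped so they do not split cells.
        cells = ["" if c is None else str(c).replace("|", "\\|") for c in row]
        cells += [""] * (width - len(cells))
        return "| " + " | ".join(cells) + " |"

    header, *body = rows
    lines = [fmt(header), "| " + " | ".join(["---"] * width) + " |"]
    lines += [fmt(row) for row in body]
    return "\n".join(lines)


print(rows_to_markdown([["Item", "Qty"], ["Apple", 3], ["Pear", None]]))
```

The output uses the same pipe-delimited layout that Markdown tables use, so downstream prompt-building code that already expects Markdown tables may need little or no change.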

@hamer101 Thanks for sharing the scenario. We will add this request to our backlog.

bojunehsu avatar Jun 24 '24 20:06 bojunehsu

@hamer101 Since this is not an SDK issue, I will close it. Thanks.

YalinLi0312 avatar Jun 26 '24 21:06 YalinLi0312

@bojunehsu Where can we track this feature request?

clausagerskov avatar Aug 29 '24 10:08 clausagerskov