It takes about 2 hours to detect the Excel file which contains some pictures.

Open ivanliu-microsoft opened this issue 1 year ago • 4 comments

Our CSV parser uses PROSE to check whether string content is in CSV format. However, a customer reported that parsing took more than 2 hours before it returned false (not a valid CSV file). I found that it blocks on the line of code shown in the attached screenshot (see the full code). The sample Excel file is attached: Quotation-Personal care wipes.xlsx. My questions are:

  • Why does this Excel file take PROSE so long to "learn"?
  • Is there a best practice you would recommend in this case?

ivanliu-microsoft avatar Aug 19 '24 07:08 ivanliu-microsoft

Can you clarify what exactly is being used to set "strData"? In other words, how is "strData" generated from the shared excel file? (I can't access the Babylon repo to find out.)

ashishxtiwari avatar Aug 19 '24 17:08 ashishxtiwari

Yes, strData is the content of the shared Excel file.

ivanliu-microsoft avatar Aug 20 '24 02:08 ivanliu-microsoft

Our CSV parser does not specify any data format to PROSE. We just call its Learn() API to detect whether the file is CSV (that is, whether it returns a CsvProgram). See the screenshot attached above.

ivanliu-microsoft avatar Aug 22 '24 05:08 ivanliu-microsoft

Can you clarify what exactly is being used to set "strData"? In other words, how is "strData" generated from the shared excel file? (I can't access the Babylon repo to find out.)

Hi Ashish, any updates from your side? We need a workaround for this. Many thanks.

ivanliu-microsoft avatar Aug 23 '24 07:08 ivanliu-microsoft
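
A possible workaround while this is open (a sketch only, not confirmed by the PROSE maintainers): because an .xlsx file is a ZIP container and not text, a cheap signature check on the raw bytes could reject it before anything reaches Learn(). The helper names below (`looks_like_binary`, `should_try_csv_learning`) are hypothetical, and the sketch assumes the caller still has the raw bytes available before they are decoded into strData. It is written in Python for brevity; the same check is a few lines in any language.

```python
# Well-known file signatures (magic bytes).
ZIP_MAGIC = b"PK\x03\x04"  # ZIP containers, including .xlsx and .docx
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # legacy .xls and .doc


def looks_like_binary(data: bytes, sample_size: int = 8192) -> bool:
    """Hypothetical helper: True if the data is unlikely to be delimited text."""
    head = data[:sample_size]
    if head.startswith(ZIP_MAGIC) or head.startswith(OLE2_MAGIC):
        return True
    # NUL bytes do not normally appear in ASCII or UTF-8 CSV content.
    # Note: UTF-16 text also contains NUL bytes, so decode or inspect the
    # BOM first if UTF-16 input must be accepted.
    return b"\x00" in head


def should_try_csv_learning(data: bytes) -> bool:
    """Hypothetical gate: only hand non-empty, text-like input to the learner."""
    return len(data) > 0 and not looks_like_binary(data)
```

With a gate like this, the parser would return "not CSV" immediately for the attached workbook and only call the learner for text-like input. Running the learner with a timeout or cancellation would be a complementary safeguard for text inputs that are still slow.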