Potential Performance Issue: Slow read_csv() Function with pandas 1.3.3

Open TendouArisu opened this issue 1 year ago • 3 comments

Issue Description:

Hello. I have found a performance regression in the read_csv function of pandas 1.3.3. Parts of this repository depend on pandas 1.3.3 (see dadmatools/requirements.txt), and some other dependencies require pandas below 1.4. I am not sure whether this pandas performance problem affects this repository. There are related discussions on the pandas GitHub, including #44158 and #44610. I also found that dadmatools/pipeline/informal2formal/utils.py and dadmatools/pipeline/informal2formal/VerbHandler.py use the affected API, and there may be more files that use it.
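One way to look for other files that call the affected function is to walk each file's syntax tree and report every call named read_csv. The snippet below is only a minimal sketch: the helper name find_read_csv_calls and the sample source string are hypothetical, and it matches on the function name alone, so it cannot tell pandas.read_csv apart from an unrelated function with the same name.

```python
import ast
from typing import List


def find_read_csv_calls(source: str) -> List[int]:
    """Return the line numbers of calls whose function name is read_csv.

    Hypothetical helper: it matches by name only and does not verify
    that the call actually resolves to pandas.read_csv.
    """
    line_numbers = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Attribute):
            name = func.attr          # e.g. pd.read_csv(...)
        else:
            name = getattr(func, "id", None)  # e.g. read_csv(...)
        if name == "read_csv":
            line_numbers.append(node.lineno)
    return sorted(line_numbers)


# Made-up sample source, not taken from the repository.
sample = (
    "import pandas as pd\n"
    "df = pd.read_csv('verbs.csv')\n"
    "print(df.head())\n"
)
print(find_read_csv_calls(sample))
```

To scan a real checkout, the same function could be applied to the text of each .py file, for example by reading files found with pathlib.Path.rglob("*.py").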

Suggestion

I would recommend considering an upgrade to pandas >= 1.4, or exploring other ways to improve read_csv performance. Any other workarounds or solutions would be greatly appreciated. Thank you!
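As a rough illustration of how a requirements file could be checked for a pandas pin below 1.4, here is a small standard-library sketch. The function names and the sample requirements text are hypothetical, and it assumes the pin is written in the exact form pandas==X.Y.Z; other specifiers (>=, <, ~=, extras, markers) are not handled.

```python
import re
from typing import Optional, Tuple


def pinned_pandas_version(requirements_text: str) -> Optional[Tuple[int, ...]]:
    """Return the version pinned with 'pandas==...' as a tuple of ints, or None.

    Assumption: only exact '==' pins with purely numeric components are recognized.
    """
    pattern = re.compile(r"\s*pandas\s*==\s*([0-9]+(?:\.[0-9]+)*)", re.IGNORECASE)
    for line in requirements_text.splitlines():
        match = pattern.match(line)
        if match:
            return tuple(int(part) for part in match.group(1).split("."))
    return None


def is_below_1_4(version: Tuple[int, ...]) -> bool:
    """True if the version tuple sorts before 1.4."""
    return version < (1, 4)


# Made-up requirements text for illustration only.
sample_requirements = "numpy\npandas==1.3.3\n"
version = pinned_pandas_version(sample_requirements)
if version is not None and is_below_1_4(version):
    print("pandas is pinned below 1.4:", ".".join(map(str, version)))
```

A more robust check would parse version specifiers with a dedicated library rather than a regular expression; the tuple comparison here is only meant to show the idea.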

TendouArisu avatar Mar 02 '24 08:03 TendouArisu

Thank you for your comment; I will try to update the pandas version. However, I'm not sure whether our "informal2formal" function uses the affected API.

On another note, do you know the Persian language?

sadeghjafari5528 avatar Mar 02 '24 14:03 sadeghjafari5528

No, I don't know the Persian language. I encountered this problem in other repositories, so I wanted to notify other repositories of this potential problem.

TendouArisu avatar Mar 20 '24 01:03 TendouArisu

Thank you for your suggestion.

sadeghjafari5528 avatar Mar 21 '24 05:03 sadeghjafari5528