Failing test: `test_merge_and_average` fails with a TypeError with pandas 2.0.3

Open olp-cs opened this issue 1 year ago • 0 comments

The test fails on the following platforms:

  • Linux-6.1.85+-x86_64-with-glibc2.35
    • Python 3.10.12
    • pandas 2.0.3
    • numpy 1.25.2
  • Windows-11-10.0.22631-SP0
    • Python 3.12
    • pandas 2.2.2
    • numpy 2.0.0

Update: The test passes with Python 3.8 and pandas 1.3.0, so this looks like a backward-compatibility issue with pandas.

Failed test:

test_GEOparse.py:626 (TestGSE.test_merge_and_average)

TypeError: agg function failed [how->mean,dtype->object]

TypeError: Could not convert string 'DNA segment, Chr 8, ERATO Doi 594, expressed' to numeric

The test fails on this line:

..\src\GEOparse\GEOTypes.py:445: in annotate_and_average

tmp_data = tmp_data.groupby(group_by_column).mean()[[expression_column]]

where

  • tmp_data is a pandas DataFrame that contains both numeric and string columns (attached: tmp_data.csv);
  • expression_column = 'VALUE'
  • group_by_column = 'GB_ACC'
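For readers without the attached CSV, the failing call can be sketched on a tiny stand-in frame. The data below and the helper `try_original_call` are hypothetical, made up for illustration; only the column names `GB_ACC` and `VALUE` and the groupby expression come from the report. On the pandas 2.x versions listed above this should hit the TypeError, while older versions such as 1.3.0 drop the string column and return the averages.

```python
import pandas as pd

# Hypothetical stand-in for tmp_data: one numeric column and one string
# (object-dtype) column, grouped by GB_ACC.
tmp_data = pd.DataFrame({
    "GB_ACC": ["A1", "A1", "B2"],
    "VALUE": [1.0, 3.0, 5.0],
    "DESCRIPTION": pd.Series(["gene a", "gene a", "gene b"], dtype=object),
})


def try_original_call(df, group_by_column, expression_column):
    """Run the expression from the traceback; return the result or the error text."""
    try:
        # Same shape as the failing line: mean() is applied to every
        # non-grouping column, including the string one.
        return df.groupby(group_by_column).mean()[[expression_column]]
    except TypeError as exc:
        return f"TypeError: {exc}"


outcome = try_original_call(tmp_data, "GB_ACC", "VALUE")
print(outcome)
```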

Jupyter notebook reproducing the issue: https://gist.github.com/olp-cs/9902b5cdc554afbf3faa7127ee602f20

Would it make sense to filter the columns first, to keep the numerical ones only?
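One way to do that filtering is to select the expression column before aggregating, so the string columns never reach `mean()`. This is a minimal sketch on made-up data, not the maintainers' fix; the function name `average_expression` and the sample frame are assumptions for illustration.

```python
import pandas as pd


def average_expression(df, group_by_column, expression_column):
    """Average expression_column per group, ignoring every other column."""
    # Selecting [[expression_column]] on the groupby object restricts the
    # aggregation to that column and still returns a DataFrame.
    return df.groupby(group_by_column)[[expression_column]].mean()


# Hypothetical stand-in for tmp_data with a string column mixed in.
tmp_data = pd.DataFrame({
    "GB_ACC": ["A1", "A1", "B2"],
    "VALUE": [1.0, 3.0, 5.0],
    "DESCRIPTION": pd.Series(["gene a", "gene a", "gene b"], dtype=object),
})

averaged = average_expression(tmp_data, "GB_ACC", "VALUE")
print(averaged)
```

Passing `numeric_only=True` to `mean()` in the original expression would be another option. Either variant assumes the `VALUE` column itself is numeric; if it were read in as strings, it would need converting first (for example with `pd.to_numeric`).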

olp-cs, Jul 16 '24 10:07