Performance degradation in sa.annotated.pandas

Open kbrodt opened this issue 6 years ago • 1 comment

Hello,

I ran a simple comparison of apply between pandas and sa.annotated.pandas: the call took 0.018s with pandas and 12.47s with sa.annotated.pandas. I also cannot print the annotated DataFrame. What am I doing wrong here?

To reproduce:

import time


def test_new_apply(is_sa=False):
    if is_sa:
        import sa.annotated.pandas as pd
    else:
        import pandas as pd

    a = pd.DataFrame(list(range(10 ** 5)), columns=['x'])
    if is_sa:
        # print(a.head())  # <- raises AttributeError: 'Operation' object has no attribute 'max'
        pass
    else:
        print(a.head())
    start = time.time()
    a['xp1'] = a['x'].apply(lambda x: x + 1)
    print(time.time() - start)


test_new_apply(is_sa=False)
test_new_apply(is_sa=True)
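
A single time.time() measurement can be noisy, so the comparison above could also be sketched with time.perf_counter and a few repeats. This is only a sketch under assumptions: best_of and time_apply are hypothetical helper names, not part of pandas or sa, and it assumes the module passed in exposes a pandas-compatible DataFrame constructor and Series.apply. It does not account for any evaluation step that sa.annotated.pandas might require.

```python
import time


def best_of(fn, repeats=3):
    """Return the fastest wall-clock time in seconds over `repeats` calls of fn()."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


def time_apply(pd_module, n=10 ** 5, repeats=3):
    """Time Series.apply with a lambda for a pandas-like module (hypothetical helper)."""
    # Build the frame outside the timed region so only apply() is measured.
    frame = pd_module.DataFrame(list(range(n)), columns=['x'])

    def run():
        frame['xp1'] = frame['x'].apply(lambda x: x + 1)

    return best_of(run, repeats)


# Hypothetical usage (not executed here; assumes both packages are installed):
#   import pandas
#   import sa.annotated.pandas as sa_pd
#   print(time_apply(pandas), time_apply(sa_pd))
```

Taking the minimum over several runs reduces the influence of one-off costs such as imports or cache warm-up on the reported number.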

Environment:

Python 3.6.8 [GCC 8.3.0] on linux

pandas==0.25.2 sa==0.0.4

kbrodt, Oct 30 '19 17:10

It is probably due to the non-annotated lambda.

kbrodt, Oct 30 '19 21:10