arkouda icon indicating copy to clipboard operation
arkouda copied to clipboard

Re # 3128 fixes errors and omissions in PROTO-tests version of datafr…

Open drculhane opened this issue 1 year ago • 2 comments

This has various updates to PROTO-tests/tests/dataframe_test.py.

Brings in an additional 7 tests from the non-PROTOs test: test_to_pandas, test_from_pandas, test_save, test_ipv4_columns, test_convenience_init, test_column_init, test_merge. Both versions now perform the same tests.

Fixes FAILED test of dataframe_creation by hardcoding a size limit = 1000 (creation from list crashes in index.py if list size > 1000).

Fixes FAILED test in dtypes_test by removing test on ak.bigint (which no longer raises the expected TypeError).

drculhane avatar Apr 26 '24 13:04 drculhane

Good suggestions. I've noted that the PROTOs tests are inconsistent with how they do assertions. I'll look for places where I can use assert_frame_equal.

Also, oops again on letting old comments get pushed up, and on my black settings.

--- Andy

On Fri, Apr 26, 2024 at 12:34 PM ajpotts @.***> wrote:

@.**** approved this pull request.

In PROTO_tests/tests/dataframe_test.py https://github.com/Bears-R-Us/arkouda/pull/3139#discussion_r1581246617:

     ]
     pddf = pd.DataFrame(x)
     pddf.columns = pddf.columns.astype(str)
     akdf = ak.DataFrame([ak.array(val) for val in list(zip(*x))])
     assert isinstance(akdf, ak.DataFrame)
     assert len(akdf) == len(pddf)
     # arkouda does not allow for numeric columns.
  •    assert akdf.columns.values == [str(x) for x in pddf.columns.values]
    
  •    assert akdf.columns.values == [str(x) for x in pddf.columns.values] # ****
    

Why is there the # **** comment here?

In PROTO_tests/tests/dataframe_test.py https://github.com/Bears-R-Us/arkouda/pull/3139#discussion_r1581250670:

@@ -971,16 +1063,34 @@ def test_multi_col_merge(self): c = ak.randint(-size // 10, size // 10, size, seed=seed + 2) d = ak.randint(-size // 10, size // 10, size, seed=seed + 3) ones = ak.ones(size, int)

  •    altr = ak.cast(ak.arange(size) % 2 == 0, int)
    

+# altr = ak.cast(ak.arange(size) % 2 == 0, int)

Remove old comment.

In PROTO_tests/tests/dataframe_test.py https://github.com/Bears-R-Us/arkouda/pull/3139#discussion_r1581253058:

@@ -182,7 +191,9 @@ def test_dataframe_creation(self, size): "uint": ak.array(pddf["uint"]), "bigint": ak.arange(2200, 2200 + size), "bool": ak.array(pddf["bool"]),

  •            "segarray": ak.SegArray.from_multi_array([ak.array(x) for x in pddf["segarray"]]),
    
  •            "segarray": ak.SegArray.from_multi_array(
    

Can you check your black settings? It seems like this is reformatting way more than necessary.

In PROTO_tests/tests/dataframe_test.py https://github.com/Bears-R-Us/arkouda/pull/3139#discussion_r1581260021:

  •    ij_expected_df = ak.DataFrame(
    
  •        {
    
  •            "key": ak.array([2, 3]),
    
  •            "value1_x": ak.array(["C", "D"]),
    
  •            "value3_x": ak.array([2, 3]),
    
  •            "value1_y": ak.array(["A", "B"]),
    
  •            "value2": ak.array(["apple", "banana"]),
    
  •            "value3_y": ak.array([1, 1]),
    
  •        }
    
  •    )
    
  •    ij_merged_df = ak.merge(df1, df2, how="inner", on="key")
    
  •    assert ij_expected_df.columns.values == ij_merged_df.columns.values
    

Can we replace all these asserts with assert_frame_equal(ij_expected.to_pandas(retain_index=True), ij_merged.to_pandas(retain_index=True) to simplify these tests a bit?

— Reply to this email directly, view it on GitHub https://github.com/Bears-R-Us/arkouda/pull/3139#pullrequestreview-2025402769, or unsubscribe https://github.com/notifications/unsubscribe-auth/BGL3SKX5HZNBYVOD3L3Z53TY7J6YJAVCNFSM6AAAAABG2Y6CXWVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDAMRVGQYDENZWHE . You are receiving this because you authored the thread.Message ID: @.***>

-- Andrew D. Culhane, Ph.D. Sagecor Solutions LLC

drculhane avatar Apr 26 '24 16:04 drculhane

So that entire chunk could be replaced by:

assert_frame_equal(ij_expected_df.to_pandas(),ij_merged_df.to_pandas())

Right? There are several of these, so I'll go through all of them to see whether the same thing applies.

On Fri, Apr 26, 2024 at 12:49 PM Andrew Culhane @.***> wrote:

Good suggestions. I've noted that the PROTOs tests are inconsistent with how they do assertions. I'll look for places where I can use assert_frame_equal.

Also, oops again on letting old comments get pushed up, and on my black settings.

--- Andy

On Fri, Apr 26, 2024 at 12:34 PM ajpotts @.***> wrote:

@.**** approved this pull request.

In PROTO_tests/tests/dataframe_test.py https://github.com/Bears-R-Us/arkouda/pull/3139#discussion_r1581246617:

     ]
     pddf = pd.DataFrame(x)
     pddf.columns = pddf.columns.astype(str)
     akdf = ak.DataFrame([ak.array(val) for val in list(zip(*x))])
     assert isinstance(akdf, ak.DataFrame)
     assert len(akdf) == len(pddf)
     # arkouda does not allow for numeric columns.
  •    assert akdf.columns.values == [str(x) for x in pddf.columns.values]
    
  •    assert akdf.columns.values == [str(x) for x in pddf.columns.values] # ****
    

Why is there the # **** comment here?

In PROTO_tests/tests/dataframe_test.py https://github.com/Bears-R-Us/arkouda/pull/3139#discussion_r1581250670:

@@ -971,16 +1063,34 @@ def test_multi_col_merge(self): c = ak.randint(-size // 10, size // 10, size, seed=seed + 2) d = ak.randint(-size // 10, size // 10, size, seed=seed + 3) ones = ak.ones(size, int)

  •    altr = ak.cast(ak.arange(size) % 2 == 0, int)
    

+# altr = ak.cast(ak.arange(size) % 2 == 0, int)

Remove old comment.

In PROTO_tests/tests/dataframe_test.py https://github.com/Bears-R-Us/arkouda/pull/3139#discussion_r1581253058:

@@ -182,7 +191,9 @@ def test_dataframe_creation(self, size): "uint": ak.array(pddf["uint"]), "bigint": ak.arange(2200, 2200 + size), "bool": ak.array(pddf["bool"]),

  •            "segarray": ak.SegArray.from_multi_array([ak.array(x) for x in pddf["segarray"]]),
    
  •            "segarray": ak.SegArray.from_multi_array(
    

Can you check your black settings? It seems like this is reformatting way more than necessary.

In PROTO_tests/tests/dataframe_test.py https://github.com/Bears-R-Us/arkouda/pull/3139#discussion_r1581260021:

  •    ij_expected_df = ak.DataFrame(
    
  •        {
    
  •            "key": ak.array([2, 3]),
    
  •            "value1_x": ak.array(["C", "D"]),
    
  •            "value3_x": ak.array([2, 3]),
    
  •            "value1_y": ak.array(["A", "B"]),
    
  •            "value2": ak.array(["apple", "banana"]),
    
  •            "value3_y": ak.array([1, 1]),
    
  •        }
    
  •    )
    
  •    ij_merged_df = ak.merge(df1, df2, how="inner", on="key")
    
  •    assert ij_expected_df.columns.values == ij_merged_df.columns.values
    

Can we replace all these asserts with assert_frame_equal(ij_expected.to_pandas(retain_index=True), ij_merged.to_pandas(retain_index=True) to simplify these tests a bit?

— Reply to this email directly, view it on GitHub https://github.com/Bears-R-Us/arkouda/pull/3139#pullrequestreview-2025402769, or unsubscribe https://github.com/notifications/unsubscribe-auth/BGL3SKX5HZNBYVOD3L3Z53TY7J6YJAVCNFSM6AAAAABG2Y6CXWVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDAMRVGQYDENZWHE . You are receiving this because you authored the thread.Message ID: @.***>

-- Andrew D. Culhane, Ph.D. Sagecor Solutions LLC

-- Andrew D. Culhane, Ph.D. Sagecor Solutions LLC

drculhane avatar Apr 26 '24 17:04 drculhane