chispa

PySpark test helper methods with beautiful error messages

Results 64 chispa issues

When using `ignore_nullable=True`, chispa still reports a difference for ArrayType columns because the nullability of the inner type differs: `StructField(my_arr_col,ArrayType(StringType,false),false)` versus `StructField(my_arr_col,ArrayType(StringType,true),true)`

help wanted
good first issue
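The nullable mismatch described in the issue above can be illustrated with a small sketch. It assumes that ignoring nullability should apply at every nesting level, including the array's inner "contains null" flag. The classes `AtomicType`, `ArrayType`, `Field`, and `StructType` and both functions are hypothetical stand-ins written for this example so that it runs without Spark; they are not chispa's or PySpark's actual code.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for Spark schema types (not pyspark.sql.types).
@dataclass(frozen=True)
class AtomicType:
    name: str


@dataclass(frozen=True)
class ArrayType:
    element_type: object
    contains_null: bool = True


@dataclass(frozen=True)
class Field:
    name: str
    data_type: object
    nullable: bool = True


@dataclass(frozen=True)
class StructType:
    fields: tuple


def types_equal_ignoring_nullable(t1, t2):
    """Compare two types while ignoring nullability at every nesting level."""
    if isinstance(t1, ArrayType) and isinstance(t2, ArrayType):
        # Skip contains_null; only the element types have to match.
        return types_equal_ignoring_nullable(t1.element_type, t2.element_type)
    if isinstance(t1, StructType) and isinstance(t2, StructType):
        return len(t1.fields) == len(t2.fields) and all(
            fields_equal_ignoring_nullable(f1, f2)
            for f1, f2 in zip(t1.fields, t2.fields)
        )
    # Atomic types, or mismatched kinds, fall back to plain equality.
    return t1 == t2


def fields_equal_ignoring_nullable(f1, f2):
    """Compare two fields by name and type, skipping the nullable flag."""
    return f1.name == f2.name and types_equal_ignoring_nullable(
        f1.data_type, f2.data_type
    )
```

With these stand-ins, the two fields from the issue compare as equal once nullability is ignored, even though a plain `==` treats them as different.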

When calling `assert_df_equality` and `assert_approx_df_equality`, it would be good to have the option not to display the get_string() output. Sometimes the output might be too long or truncated. I think this...

good first issue

Ignore row and/or column order parameters for `assert_approx_df_equality` function

It would be great if we could avoid column order checking when using `assert_approx_df_equality`

good first issue
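As a rough illustration of the request above, the sketch below compares rows approximately while ignoring column order. It models rows as plain `{column: value}` dicts rather than Spark rows, and the names `rows_approx_equal`, `dfs_approx_equal`, and `precision` are assumptions made for this example, not part of chispa's API. Ignoring row order is left out, since sorting rows that are only approximately equal needs extra design decisions.

```python
def rows_approx_equal(row1, row2, precision):
    """Compare two {column: value} rows, ignoring the order of columns."""
    # Dict key views compare as sets, so column order does not matter.
    if row1.keys() != row2.keys():
        return False
    for col in row1:
        v1, v2 = row1[col], row2[col]
        if isinstance(v1, float) and isinstance(v2, float):
            # Floats only need to agree within the given precision.
            if abs(v1 - v2) > precision:
                return False
        elif v1 != v2:
            return False
    return True


def dfs_approx_equal(rows1, rows2, precision):
    """Compare two lists of rows pairwise, in the order given."""
    return len(rows1) == len(rows2) and all(
        rows_approx_equal(r1, r2, precision) for r1, r2 in zip(rows1, rows2)
    )
```

For real DataFrames, one option would be to select the columns of both DataFrames in the same sorted order before running the approximate comparison.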

E chispa.dataframe_comparer.SchemasNotEqualError:
E +------------------------------------------+------------------------------------------+
E | schema1                                  | schema2                                  |
E +------------------------------------------+------------------------------------------+
E | StructField(second_name,StringType,true) | StructField(second_name,StringType,true) |
E | StructField(id,LongType,true)            | StructField(id,LongType,true)            |
E | StructField(floor,LongType,true)         | StructField(floor,LongType,true)         |
...

help wanted
good first issue

Hi there! Thank you very much for your great library. In order to debug faster and see what went wrong, I came up with a simple solution of displaying original...

The [newly created] test below fails. This is because `are_structfields_equal` doesn't check for the case when the dataType is an array. If the dataType is an array, then the nullability shouldn't matter...

This pull request solves the issue by making a fairly big change to the API. Now, rather than having two assertion functions for both DataFrame and column comparison, there is...

When calling `assert_df_equality` with `allow_nan_equality=True`, if both DataFrames hold an array that contains some `nan` values, then the comparer fails, even if the `nan`s are in the same...

good first issue
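The behavior requested in the issue above can be sketched as a recursive comparison that treats `nan` values as equal inside arrays too. This is a minimal illustration on plain Python lists, not chispa's implementation; the function name `values_equal` is hypothetical, and only the `allow_nan_equality` flag name comes from the issue.

```python
import math


def values_equal(v1, v2, allow_nan_equality=False):
    """Compare two values, descending into lists and tuples element by element."""
    if isinstance(v1, (list, tuple)) and isinstance(v2, (list, tuple)):
        # Compare arrays position by position so a NaN inside is handled too.
        return len(v1) == len(v2) and all(
            values_equal(a, b, allow_nan_equality) for a, b in zip(v1, v2)
        )
    if (
        allow_nan_equality
        and isinstance(v1, float)
        and isinstance(v2, float)
        and math.isnan(v1)
        and math.isnan(v2)
    ):
        # NaN never equals NaN under ==, so it needs an explicit check.
        return True
    return v1 == v2
```

The explicit element-wise recursion matters here: comparing the two arrays with a single `==` would fall back to the usual rule that `nan != nan`.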

I feel like this would be quite useful. Were there any design choices for why it wasn't included or would this be a useful addition? https://github.com/MrPowers/chispa/blob/500793efe14b1975b86fb1a923ee6cd68ba559d8/chispa/dataframe_comparer.py#L38-L40