chispa
PySpark test helper methods with beautiful error messages
When using `ignore_nullable=True`, chispa still reports differences in `ArrayType` columns because the inner element type also differs in nullability: `StructField(my_arr_col,ArrayType(StringType,false),false)` vs. `StructField(my_arr_col,ArrayType(StringType,true),true)`
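One way to get the behaviour `ignore_nullable=True` is expected to have is to compare schemas only after recursively removing every nullability flag, including the inner `containsNull` flag of an `ArrayType` (and `valueContainsNull` of a `MapType`). The sketch below is not chispa's implementation. It is a pure-Python illustration that assumes the schemas are in the dict form produced by PySpark's `StructType.jsonValue()` (for example `df.schema.jsonValue()`); the helper names `strip_nullability` and `schemas_equal_ignoring_nullable` are hypothetical. It also drops field metadata, which may or may not be what you want.

```python
from typing import Any


def strip_nullability(dtype: Any) -> Any:
    """Recursively remove nullability flags (and field metadata) from a schema
    in PySpark's jsonValue() dict form. Atomic types such as "string" are plain
    strings and are returned unchanged."""
    if isinstance(dtype, dict):
        kind = dtype.get("type")
        if kind == "struct":
            # Keep only each field's name and its (recursively stripped) type.
            return {
                "type": "struct",
                "fields": [
                    {"name": f["name"], "type": strip_nullability(f["type"])}
                    for f in dtype["fields"]
                ],
            }
        if kind == "array":
            # Drop containsNull so inner-element nullability is ignored.
            return {"type": "array", "elementType": strip_nullability(dtype["elementType"])}
        if kind == "map":
            # Drop valueContainsNull for the same reason.
            return {
                "type": "map",
                "keyType": strip_nullability(dtype["keyType"]),
                "valueType": strip_nullability(dtype["valueType"]),
            }
    return dtype


def schemas_equal_ignoring_nullable(schema1: dict, schema2: dict) -> bool:
    """Hypothetical helper: True if the schemas match once nullability is ignored."""
    return strip_nullability(schema1) == strip_nullability(schema2)


# The two fields from the issue above, written in jsonValue() form.
schema1 = {"type": "struct", "fields": [{
    "name": "my_arr_col",
    "type": {"type": "array", "elementType": "string", "containsNull": False},
    "nullable": False, "metadata": {}}]}
schema2 = {"type": "struct", "fields": [{
    "name": "my_arr_col",
    "type": {"type": "array", "elementType": "string", "containsNull": True},
    "nullable": True, "metadata": {}}]}

print(schemas_equal_ignoring_nullable(schema1, schema2))  # → True
```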
When calling `assert_df_equality` and `assert_approx_df_equality`, it would be good to have the option to not display the `get_string()` output. Sometimes the output might be too long or truncated. I think this...
Ignore row and/or column order parameters for the `assert_approx_df_equality` function
It would be great if we could avoid column order checking when using `assert_approx_df_equality`
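As a rough illustration of what such options could do, here is a pure-Python sketch that compares rows as dicts keyed by column name (so column order is irrelevant) and, optionally, matches rows greedily instead of positionally (so row order is irrelevant). It assumes rows have already been collected into dicts, e.g. with `[row.asDict() for row in df.collect()]`; the function names and the `precision`/`ignore_row_order` parameters are hypothetical and are not part of chispa's API. The greedy matching is quadratic and can, in pathological cases where several rows are within tolerance of each other, reject inputs that a full bipartite matching would accept.

```python
import math
from typing import Any, Dict, List

Row = Dict[str, Any]


def values_approx_equal(a: Any, b: Any, precision: float) -> bool:
    """Floats are compared with an absolute tolerance; everything else with ==."""
    if isinstance(a, float) and isinstance(b, float):
        return math.isclose(a, b, rel_tol=0.0, abs_tol=precision)
    return a == b


def rows_approx_equal(r1: Row, r2: Row, precision: float) -> bool:
    # Looking values up by column name makes the check independent of column order.
    if r1.keys() != r2.keys():
        return False
    return all(values_approx_equal(r1[k], r2[k], precision) for k in r1)


def approx_equal_ignoring_order(rows1: List[Row], rows2: List[Row],
                                precision: float = 1e-5,
                                ignore_row_order: bool = False) -> bool:
    if len(rows1) != len(rows2):
        return False
    if not ignore_row_order:
        # Positional comparison: row i must match row i.
        return all(rows_approx_equal(a, b, precision) for a, b in zip(rows1, rows2))
    # Greedy matching: each row in rows1 consumes the first matching row in rows2.
    remaining = list(rows2)
    for row in rows1:
        for i, candidate in enumerate(remaining):
            if rows_approx_equal(row, candidate, precision):
                del remaining[i]
                break
        else:
            return False
    return True


expected = [{"id": 1, "x": 1.0}, {"id": 2, "x": 2.0}]
actual = [{"x": 2.000001, "id": 2}, {"x": 1.000001, "id": 1}]
print(approx_equal_ignoring_order(expected, actual, ignore_row_order=True))  # → True
```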
E chispa.dataframe_comparer.SchemasNotEqualError:
E +------------------------------------------+------------------------------------------+
E | schema1                                  | schema2                                  |
E +------------------------------------------+------------------------------------------+
E | StructField(second_name,StringType,true) | StructField(second_name,StringType,true) |
E | StructField(id,LongType,true)            | StructField(id,LongType,true)            |
E | StructField(floor,LongType,true)         | StructField(floor,LongType,true)         |
E ...
Hi there! Thank you very much for your great library. In order to debug faster and see what went wrong, I came up with a simple solution of displaying original...
The [newly created] test below fails. This is because `are_structfields_equal` doesn't check for the case where the dataType is an array. If the dataType is an array, then the nullability shouldn't matter...
This pull request solves the issue by making a fairly big change to the API. Now, rather than having two assertion functions for both DataFrame and column comparison, there is...
When trying to use `assert_df_equality` with `allow_nan_equality=True`, if both DataFrames hold an array that contains some `nan` values, the comparer fails, even if the `nan`s are in the same...
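The underlying problem is that `nan` never compares equal to itself, so NaN-aware equality has to be applied recursively to the elements of collected array (and struct) values rather than only to top-level cells. The sketch below is a hypothetical pure-Python helper, `nan_safe_equal`, and not chispa's code; it assumes values come from `df.collect()`, where PySpark returns array columns as Python lists and struct values as `Row` objects (a `tuple` subclass). Note that it treats a list and a tuple with the same elements as equal, and compares `Row`s positionally rather than by field name.

```python
import math
from typing import Any


def nan_safe_equal(a: Any, b: Any) -> bool:
    """Equality that treats two NaN floats as equal, recursing into lists,
    tuples (including PySpark Row objects) and dicts."""
    if isinstance(a, float) and isinstance(b, float):
        if math.isnan(a) and math.isnan(b):
            return True
        return a == b
    if isinstance(a, (list, tuple)) and isinstance(b, (list, tuple)):
        # Arrays must have the same length and match element by element,
        # so NaNs only count as equal when they sit at the same positions.
        return len(a) == len(b) and all(nan_safe_equal(x, y) for x, y in zip(a, b))
    if isinstance(a, dict) and isinstance(b, dict):
        return a.keys() == b.keys() and all(nan_safe_equal(a[k], b[k]) for k in a)
    return a == b


print(nan_safe_equal([1.0, float("nan")], [1.0, float("nan")]))  # → True
print(nan_safe_equal([1.0, float("nan")], [float("nan"), 1.0]))  # → False
```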
I feel like this would be quite useful. Were there any design choices for why it wasn't included or would this be a useful addition? https://github.com/MrPowers/chispa/blob/500793efe14b1975b86fb1a923ee6cd68ba559d8/chispa/dataframe_comparer.py#L38-L40