chispa 1.0 release

Open MrPowers opened this issue 2 years ago • 5 comments

It would be nice to develop chispa so we can make a 1.0 release.

We might even want to expose a different interface. Something like this:

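from dataclasses import dataclass

from chispa import Chispa  # proposed interface; assumes Chispa is exported at the package top level

@dataclass
class MyFormats:
    mismatched_rows = ["light_yellow"]
    matched_rows = ["cyan", "bold"]
    mismatched_cells = ["purple"]
    matched_cells = ["blue"]

my_chispa = Chispa(formats=MyFormats())

my_chispa.assert_df_equality(actual_df, expected_df)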

The user could inject the my_chispa object in their tests as follows:

import pytest

@pytest.fixture()
def my_chispa():
    return Chispa(formats=MyFormats())

def test_shows_assert_basic_rows_equality(my_chispa):
    ...
    my_chispa.assert_basic_rows_equality(df1.collect(), df2.collect())

It's worth contemplating at least.

MrPowers avatar Feb 19 '24 20:02 MrPowers

Let's brainstorm some of the "big issues" with chispa:

  • bad for wide table DataFrame comparisons
  • doesn't handle some column types well
  • probably doesn't handle some edge cases well (e.g. array columns with NaN values)
  • user can't customize formatting
  • some bad abstractions (e.g. the underline_cells argument)
  • users can't disable terminal characters (sometimes users want to use chispa in a notebook and don't want any terminal formatting in the output)
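
To illustrate why array columns with NaN values are an edge case, here is a minimal pure-Python sketch. It does not show chispa's actual behavior; the helper nan_safe_equal is a hypothetical name used only for illustration. The point is that NaN never compares equal to itself, so a naive equality check on collected rows reports a mismatch even when both rows hold the same data.

```python
import math


def nan_safe_equal(a, b):
    """Compare two values, treating NaN as equal to NaN and recursing into lists/tuples."""
    if isinstance(a, float) and isinstance(b, float):
        return a == b or (math.isnan(a) and math.isnan(b))
    if isinstance(a, (list, tuple)) and type(a) is type(b):
        return len(a) == len(b) and all(nan_safe_equal(x, y) for x, y in zip(a, b))
    return a == b


# Two rows holding the same array data, each with its own NaN value.
row_a = [1.0, float("nan")]
row_b = [1.0, float("nan")]

naive = row_a == row_b                    # False: NaN != NaN
nan_aware = nan_safe_equal(row_a, row_b)  # True
```

A comparison library has to decide explicitly which of the two behaviors it wants, ideally behind an option, since both are defensible.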

Here are some project goals:

  • always maintain backward compatibility whenever possible
  • output beautiful error messages and make it easier for users to unit test their PySpark code
  • allow users to run unit tests in a performant manner

For chispa 1.0, it might be better to build new interfaces rather than modify the existing interfaces. But I'd rather not make chispa 1.0 backward incompatible. Let's align on vision & interfaces.

MrPowers avatar Jul 17 '24 20:07 MrPowers

> For chispa 1.0, it might be better to build new interfaces rather than modify the existing interfaces. But I'd rather not make chispa 1.0 backward incompatible. Let's align on vision & interfaces.

Why not add a new API without deleting the old one, and only raise DeprecationWarnings from the old one? Or even just create a chispa.v2 API.
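
The deprecation approach could be sketched roughly as follows. The function names assert_rows_equal and assert_rows_equal_v2 are hypothetical stand-ins, not chispa functions; the sketch only shows the general pattern of keeping an old entry point alive while steering users to a new one.

```python
import warnings


def assert_rows_equal_v2(actual, expected):
    """Hypothetical stand-in for a new-style comparison function."""
    if actual != expected:
        raise AssertionError(f"rows differ: {actual!r} != {expected!r}")


def assert_rows_equal(actual, expected):
    """Old entry point: still works, but warns and delegates to the new one."""
    warnings.warn(
        "assert_rows_equal is deprecated; use assert_rows_equal_v2 instead",
        DeprecationWarning,
        stacklevel=2,  # point the warning at the caller, not at this wrapper
    )
    assert_rows_equal_v2(actual, expected)
```

Existing test suites keep passing under this pattern, and pytest lists the warning in its warnings summary, so users get a migration hint without a hard break.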

SemyonSinchenko avatar Jul 18 '24 13:07 SemyonSinchenko

Yep, I already started building that new interface with Chispa(formats=MyFormats()). We may want to expose the public API via Chispa going forward. I think we just need to figure out exactly the public interface that we want to expose to end users. The public interface should meet all the project goals, should be flexible enough to allow for customizations, and should be easy to run with the defaults.

MrPowers avatar Jul 18 '24 14:07 MrPowers

> user can't customize formatting

> I already started building that new interface with Chispa(formats=MyFormats()). [...]

@MrPowers For a proposed new way of configuring formatting, see https://github.com/MrPowers/chispa/pull/127. With that change, users would write, for example:

Chispa(
    formats=FormattingConfig(
        mismatched_rows={"color": "light_yellow"}
    )
)
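
To make the idea of a structured format concrete, here is a minimal sketch of how a format could be turned into terminal output, with a switch for plain output in notebooks. Everything in it is an assumption for illustration: SketchFormat, ANSI_COLORS, and render are hypothetical names and do not reflect chispa's API or the contents of the linked PR.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical subset of ANSI escape codes; not taken from chispa.
ANSI_COLORS = {"light_yellow": "\033[93m", "cyan": "\033[36m", "blue": "\033[34m"}
ANSI_BOLD = "\033[1m"
ANSI_RESET = "\033[0m"


@dataclass(frozen=True)
class SketchFormat:
    color: Optional[str] = None
    bold: bool = False


def render(text: str, fmt: SketchFormat, use_ansi: bool = True) -> str:
    """Wrap text in ANSI codes, or return it unchanged when use_ansi is False."""
    if not use_ansi:
        return text
    if fmt.color is not None and fmt.color not in ANSI_COLORS:
        raise ValueError(f"unknown color: {fmt.color!r}")
    prefix = ANSI_COLORS.get(fmt.color, "") + (ANSI_BOLD if fmt.bold else "")
    return f"{prefix}{text}{ANSI_RESET}" if prefix else text
```

Validating the color name up front surfaces typos in a configuration early, and the use_ansi flag covers the notebook case where escape sequences are unwanted.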

fpgmaas avatar Jul 19 '24 05:07 fpgmaas

I think the best way to move forward is to simply create separate issues for the following topics:

  • bad for wide table DataFrame comparisons
  • doesn't handle some column types well
  • probably doesn't handle some edge cases well (e.g. array columns with NaN values)
  • user can't customize formatting
  • some bad abstractions (e.g. the underline_cells argument)
  • users can't disable terminal characters (sometimes users want to use chispa in a notebook and don't want any terminal formatting in the output)

That way we can discuss them separately. We add them to the milestone for a 1.0 release, ship features and changes one by one by incrementing the minor version, and release 1.0 once all desired changes and features are finished.

fpgmaas avatar Jul 19 '24 09:07 fpgmaas