data-validator
A tool to validate data, built around Apache Spark.
It would be great to have a `distinctCountCheck` validator that checks the number of distinct values in a column of a given table, and that this number matches a user...
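
As a rough illustration of the requested behaviour, here is a minimal sketch in plain Python rather than Spark. The function name `distinct_count_check`, its parameters, and the row representation are assumptions made for this example; none of them come from data-validator.

```python
from typing import Any, Dict, Iterable


def distinct_count_check(rows: Iterable[Dict[str, Any]], column: str, expected: int) -> bool:
    """Return True when `column` holds exactly `expected` distinct values.

    Hypothetical helper for illustration only; not data-validator's API.
    """
    # Collect the distinct values seen in the column.
    distinct_values = {row[column] for row in rows}
    return len(distinct_values) == expected


# Made-up rows standing in for a table.
rows = [
    {"age": 30, "occupation": "engineer"},
    {"age": 41, "occupation": "teacher"},
    {"age": 30, "occupation": "engineer"},
]

print(distinct_count_check(rows, "occupation", 2))
```

On a Spark DataFrame the same number could be obtained with `df.select(column).distinct().count()`; how the check would be wired into the validator configuration is left open here.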
@samratmitra-0812 [pointed out](https://github.com/target/data-validator/pull/45#discussion_r429314336):
> This behaviour of throwing an exception for unsupported type [in columnSumCheck] is different from columnMaxCheck, where it is treated as a normal check failure. I think...
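
The difference described in the quote can be sketched in plain Python. This example shows only the "normal check failure" style attributed to columnMaxCheck, where an unsupported type produces a failed result instead of an exception. `CheckResult`, `column_sum_check`, and `min_sum` are hypothetical names for illustration, and the sketch does not settle which behaviour the project should adopt.

```python
from dataclasses import dataclass
from typing import Any, Sequence


@dataclass
class CheckResult:
    """Hypothetical result type: a check either passes or fails with a reason."""
    failed: bool
    reason: str = ""


def column_sum_check(values: Sequence[Any], min_sum: float) -> CheckResult:
    """Sum a column, reporting unsupported types as a failure instead of raising."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if any(isinstance(v, bool) or not isinstance(v, (int, float)) for v in values):
        return CheckResult(failed=True, reason="unsupported column type")
    total = sum(values)
    if total < min_sum:
        return CheckResult(failed=True, reason=f"sum {total} is below {min_sum}")
    return CheckResult(failed=False)


print(column_sum_check([1, 2, 3], min_sum=5))
print(column_sum_check(["a", "b"], min_sum=5))
```

Returning a result object keeps the remaining checks running and lets the report list every failure, whereas raising stops at the first unsupported column.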
As discussed in our original Spark Summit presentation: See [22 min mark](https://youtu.be/LTeZoo6kEBQ?t=1319). _Listening to myself is awful btw._ Inspired by the nice visualization provided by [Facets Overview](https://pair-code.github.io/facets/) while leveraging spark...
```
$ spark-submit --master "local[*]" $(ls -t target/scala-2.11/data-validator-assembly-*.jar | head -n 1) --config local_validators.yaml --jsonReport target/testreport.json --htmlReport target/testreport.html
20/04/07 18:04:33 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform......
```
Currently, if I wanted to check for null values in each of the columns (`age`, `occupation`) of a table, the `checks:` section of the configuration file would contain something like this:...
Feature request for supporting grouping and then checks on grouped data.
While testing `stringLengthCheck` I accidentally referenced `minLength` instead of `minValue`. This caused `configTest()` to fail for no apparent reason and took me a really long time to debug because the...
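
One way to surface this kind of mistake early is to compare the keys in a check's configuration against the keys that check accepts. The sketch below is plain Python; `unknown_keys` and the `ALLOWED_KEYS` set are assumptions for illustration and are not taken from data-validator's source.

```python
from typing import Any, Dict, List, Set


def unknown_keys(check_config: Dict[str, Any], allowed: Set[str]) -> List[str]:
    """Return the keys in `check_config` that are not in `allowed`, sorted."""
    return sorted(key for key in check_config if key not in allowed)


# Assumed key set for illustration only; not data-validator's actual option list.
ALLOWED_KEYS = {"type", "column", "minValue", "maxValue"}

# A config with the mistyped option described in the issue.
config = {"type": "stringLengthCheck", "column": "occupation", "minLength": 1}

for key in unknown_keys(config, ALLOWED_KEYS):
    print(f"unknown option '{key}' in {config['type']}")
```

Reporting the offending key by name turns a silent `configTest()` failure into a message that points directly at the typo.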
Support connecting to an SMTP host using SSL and user authentication. See [Sending email java ssltls auth](https://www.geeksforgeeks.org/sending-email-java-ssltls-authentication/). This will require adding some additional options to `EmailConfig`.
We have quite a few tests. We should make a pass through them and remove any that don't improve our code coverage.