data-validator

A tool to validate data, built around Apache Spark.

Results 43 data-validator issues

Updates [org.scalameta:scalafmt-core](https://github.com/scalameta/scalafmt) from 3.5.8 to 3.5.9. [GitHub Release Notes](https://github.com/scalameta/scalafmt/releases/tag/v3.5.9) - [Version Diff](https://github.com/scalameta/scalafmt/compare/v3.5.8...v3.5.9) I'll automatically update this PR to resolve conflicts as long as you don't change it yourself. If you'd...

Removes a small inefficiency. When no vars are defined in the config file or passed at the command line, there is no need to call `variableSubstitution`.

enhancement
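
The short-circuit described in this change could look roughly like the sketch below. It is written in Python for brevity and is not the project's code: `variable_substitution` is a hypothetical stand-in for `variableSubstitution`, and the `${name}` placeholder syntax is an assumption made only for illustration.

```python
import re


def variable_substitution(text: str, variables: dict) -> str:
    """Hypothetical stand-in for `variableSubstitution`.

    Replaces ${name} placeholders (assumed syntax) with values from
    `variables`; unknown names are left untouched.
    """
    return re.sub(
        r"\$\{(\w+)\}",
        lambda m: str(variables.get(m.group(1), m.group(0))),
        text,
    )


def render_config(text: str, variables: dict) -> str:
    # Short-circuit: with no vars there is nothing to substitute,
    # so the scan over the config text is skipped entirely.
    if not variables:
        return text
    return variable_substitution(text, variables)
```

With an empty `variables` mapping, `render_config` returns the input unchanged without ever running the regular expression.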

https://github.com/sbt/sbt-ghpages with perhaps a one-pager that links to docs that this plugin can also put in the gh-pages branch.

When trying to run a config check on a parquet file, the following error can be seen: ``` root@lubuntu:/home/jyoti/Spark# /opt/spark/spark-3.1.2-bin-hadoop3.2/bin/spark-submit --num-executors 10 --executor-cores 2 data-validator-assembly-20220111T034941.jar --config config.yaml 22/01/11 11:50:53 WARN...

To enable data-validator for Hadoop 3, a dependency on [HiveWarehouseConnector](https://mvnrepository.com/artifact/com.hortonworks.hive/hive-warehouse-connector_2.11/1.0.0.3.1.0.53-1) was added. After this, unit tests started failing with the following exception: ``` java.lang.SecurityException: class "org.codehaus.janino.JaninoRuntimeException"'s signer information...

**Is your feature request related to a problem? Please describe.** We currently only ship for Scala 2.11 and Spark 2.3.x. **Describe the solution you'd like** We should ship for newer...

enhancement
hacktoberfest

**Is your feature request related to a problem? Please describe.** We've got ValidatorTable and `tables` in the config, but they're not really tables in the case of orc or parquet...

enhancement

**Describe the bug** When specifying a check with a threshold that will parse to a JSON float, e.g. ```yaml threshold: 0.10 # will be ignored threshold: 10% # works threshold:...

bug
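
One way to avoid this class of bug is to normalize every accepted spelling of a threshold in a single place. The sketch below is a Python illustration, not data-validator's parser: `parse_threshold` is a hypothetical helper, and treating both `0.10` and `10%` as the same fraction is an assumption about the intended semantics.

```python
def parse_threshold(value) -> float:
    """Normalize a threshold to a float (hypothetical helper).

    Accepts a number (0.10), a numeric string ("0.10"), or a
    percentage string ("10%"). Anything else raises ValueError
    instead of being silently ignored.
    """
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool):
        raise ValueError(f"unsupported threshold: {value!r}")
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, str):
        text = value.strip()
        if text.endswith("%"):
            return float(text[:-1]) / 100.0
        return float(text)
    raise ValueError(f"unsupported threshold: {value!r}")
```

Failing loudly on an unrecognized type is a deliberate choice here: a threshold that is dropped without warning is harder to notice than a configuration error.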

Currently, if sending email fails because the email server is temporarily offline or overloaded, the only option is to rerun the whole validation. This can be very expensive,...

enhancement
good first issue
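
A retry wrapper around the send step is one possible shape for this enhancement. The following Python sketch is illustrative only: `send_with_retry`, its parameters, and the exponential backoff policy are all assumptions, and `send` stands for whatever callable actually delivers the report.

```python
import time


def send_with_retry(send, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call `send()` until it succeeds or `attempts` is exhausted.

    Hypothetical helper. Retries on OSError, which covers connection
    failures (and, in Python, smtplib.SMTPException). The delay doubles
    after each failed attempt: base_delay, 2 * base_delay, ...
    """
    for attempt in range(1, attempts + 1):
        try:
            return send()
        except OSError:
            if attempt == attempts:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))
```

Only the delivery step is repeated, so the expensive validation run does not have to be redone when the mail server recovers within the retry window.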

https://github.com/dmurvihill/courier is very attractive, missing only the retry support that would satisfy #70. Using Courier would satisfy #19 and could facilitate #5 as something to do in the process.