data-validator
A tool to validate data, built around Apache Spark.
Updates [org.scalameta:scalafmt-core](https://github.com/scalameta/scalafmt) from 3.5.8 to 3.5.9. [GitHub Release Notes](https://github.com/scalameta/scalafmt/releases/tag/v3.5.9) - [Version Diff](https://github.com/scalameta/scalafmt/compare/v3.5.8...v3.5.9) I'll automatically update this PR to resolve conflicts as long as you don't change it yourself. If you'd...
Removes a small inefficiency. When no vars are defined in the config file or passed on the command line, there is no need to call `variableSubstitution`.
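The short-circuit described above might look roughly like the following. This is only a sketch: the `variableSubstitution` defined here is a hypothetical stand-in (the project's real signature and placeholder syntax may differ), and `resolve`, `configVars`, and `cliVars` are names invented for illustration.

```scala
object SubstitutionSketch {
  // Hypothetical stand-in for the project's `variableSubstitution`;
  // assumes placeholders of the form ${name}, which may not match the real tool.
  def variableSubstitution(text: String, vars: Map[String, String]): String =
    vars.foldLeft(text) { case (acc, (name, value)) =>
      acc.replace("${" + name + "}", value)
    }

  // Skip the substitution pass entirely when no vars were supplied.
  // Command-line vars override config-file vars here (an assumption).
  def resolve(
      text: String,
      configVars: Map[String, String],
      cliVars: Map[String, String]
  ): String = {
    val vars = configVars ++ cliVars
    if (vars.isEmpty) text else variableSubstitution(text, vars)
  }
}
```

With both maps empty, `resolve` returns its input untouched without walking the config at all.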
https://github.com/sbt/sbt-ghpages with perhaps a one-pager that links to docs that this plugin can also put in the gh-pages branch.
When trying to run a config check on a parquet file, the following error can be seen: ``` root@lubuntu:/home/jyoti/Spark# /opt/spark/spark-3.1.2-bin-hadoop3.2/bin/spark-submit --num-executors 10 --executor-cores 2 data-validator-assembly-20220111T034941.jar --config config.yaml 22/01/11 11:50:53 WARN...
In order to enable data-validator for Hadoop 3, a dependency on [HiveWarehouseConnector](https://mvnrepository.com/artifact/com.hortonworks.hive/hive-warehouse-connector_2.11/1.0.0.3.1.0.53-1) was added. After that, unit tests started failing with the following exception: ``` java.lang.SecurityException: class "org.codehaus.janino.JaninoRuntimeException"'s signer information...
**Is your feature request related to a problem? Please describe.** We currently only ship for Scala 2.11 and Spark 2.3.x. **Describe the solution you'd like** We should ship for newer...
**Is your feature request related to a problem? Please describe.** We've got ValidatorTable and `tables` in the config, but they're not really tables in the case of orc or parquet...
**Describe the bug** When specifying a check with a threshold that will parse to a JSON float, e.g. ```yaml threshold: 0.10 # will be ignored threshold: 10% # works threshold:...
Currently, if sending email fails because the email server is temporarily offline or overloaded, the only choice of action is to rerun the whole validation. This can be very expensive,...
https://github.com/dmurvihill/courier is very attractive, missing only the retry support that would satisfy #70. Using Courier would satisfy #19 and could facilitate #5 as something to do in the process.
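The retry support mentioned above could be layered on top of whatever mail client is used. The following is a minimal sketch under stated assumptions: it does not use Courier's API, and `retry`, `maxAttempts`, and `delayMillis` are hypothetical names introduced here for illustration.

```scala
import scala.annotation.tailrec
import scala.util.{Failure, Try}

object RetrySketch {
  // Run `action` up to `maxAttempts` times, sleeping `delayMillis`
  // between failed attempts. Returns the first success, or the last
  // failure once the attempts are exhausted.
  @tailrec
  def retry[A](maxAttempts: Int, delayMillis: Long)(action: () => A): Try[A] =
    Try(action()) match {
      case Failure(_) if maxAttempts > 1 =>
        Thread.sleep(delayMillis)
        retry(maxAttempts - 1, delayMillis)(action)
      case result => result
    }
}
```

A caller would wrap only the email step, for example `RetrySketch.retry(3, 5000L)(() => sendEmail(report))` with a hypothetical `sendEmail`, so a temporarily unavailable mail server would not force the whole validation to be rerun.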