Don't generate the assertion files in `piperider run`
Summary
Some users use piperider only as a data profiling tool. However, in the current journey, `piperider run` always generates assertion files on the first run.
fetching metadata
[1/1] data ━━━━━━━━━━━ 5/5 0:00:00
No assertion found
Do you want to auto generate recommended assertions for this datasource [Yes/no]?
The problems are:
- Users don't know what will happen when they enter yes or no.
- Even if the user says no, empty assertion files are still generated. Why not generate them only when the user wants to write tests?
- If the user says yes, assertion files are generated from the current profiling result. However, if the user does not intend to write assertions right away, the generated assertions will be confusing in future runs.
- Another problem is that assertion files are generated for every table. It is not realistic to write all the tests at the same time.
Intended Outcome
- Don't generate assertions in `piperider run`; use the `generate-assertions` command instead to generate a template or assertions.
- In practice, tests are written table by table. It would be more reasonable to generate assertions -> edit the assertion file -> test on a per-table basis.
How will it work?
- `piperider run` will not generate assertions.
- In `generate-assertions`, we have to specify the table to generate for rather than all tables (e.g. `piperider generate-assertions --table mytable`).
- In `generate-assertions`, the user can select an empty template or suggested assertions.
$ piperider generate-assertions --table mytable
[?] Which type of strategy to generate assertions:
* Empty assertions with column structure
  Suggest the assertions by the profiling result.
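To make the proposed command shape concrete, here is a minimal sketch using only the standard library's `argparse`. It is not piperider's actual implementation: `build_parser`, `choose_strategy`, `main`, and the returned dictionaries are hypothetical names for illustration. Only the `run` and `generate-assertions` commands, the `--table` option, and the two strategy labels come from the proposal above.

```python
import argparse

# Strategy labels copied from the prompt text in the proposal.
STRATEGIES = [
    "Empty assertions with column structure",
    "Suggest the assertions by the profiling result",
]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the two proposed journeys."""
    parser = argparse.ArgumentParser(prog="piperider")
    sub = parser.add_subparsers(dest="command", required=True)

    # `run` only profiles and tests; it has no assertion-generation option.
    sub.add_parser("run")

    # `generate-assertions` must be told which table to generate for.
    gen = sub.add_parser("generate-assertions")
    gen.add_argument("--table", required=True)
    return parser


def choose_strategy(ask=input) -> str:
    """Prompt until the user picks one of the strategies by number."""
    while True:
        for i, label in enumerate(STRATEGIES, start=1):
            print(f"{i}) {label}")
        answer = ask("Which type of strategy to generate assertions: ").strip()
        if answer.isdigit() and 1 <= int(answer) <= len(STRATEGIES):
            return STRATEGIES[int(answer) - 1]


def main(argv=None, ask=input) -> dict:
    """Return a description of what would happen, without touching any files."""
    args = build_parser().parse_args(argv)
    if args.command == "run":
        return {"command": "run", "generates_assertions": False}
    return {
        "command": "generate-assertions",
        "table": args.table,
        "strategy": choose_strategy(ask),
    }
```

The `ask` parameter exists only so that the prompt can be replaced in tests. Omitting `--table` makes `argparse` exit with a usage error, which matches the intent that assertions are generated for one explicitly named table at a time.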
Internal ticket sc-28737
> - Another problem is that assertion files are generated for every table. It is not realistic to write all the tests at the same time.
But what if the user just wants to generate assertions from the profiling results instead of writing all the tests early on? Can the user execute a single command to generate all suggested assertions?
> - `piperider run` will not generate assertions.
If the user already has experience with piperider's profiling/testing and wants to start a new data project, can they execute `piperider run` with assertion generation by passing an option? If so, it would reduce the effort of generating assertions in the new project.
> But what if the user just wants to generate assertions from the profiling results instead of writing all the tests early on? Can the user execute a single command to generate all suggested assertions?
I don't think there is a perfect rule for generating ready-to-use suggested assertions. I prefer to treat them as a baseline of assertions rather than a perfect, ready-to-use suggestion.
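To illustrate why a suggestion derived from one profiling result is only a baseline, here is a small sketch. The dictionary layout and the `baseline_range` and `check` helpers are hypothetical and are not piperider's assertion format. The sketch assumes a profile that records a per-column `min` and `max`.

```python
def baseline_range(profile: dict, column: str) -> dict:
    """Turn the observed min/max of one column into a range assertion."""
    stats = profile["columns"][column]
    return {"column": column, "range": [stats["min"], stats["max"]]}


def check(assertion: dict, profile: dict) -> bool:
    """Pass only if the profiled min/max stay inside the asserted range."""
    stats = profile["columns"][assertion["column"]]
    low, high = assertion["range"]
    return low <= stats["min"] and stats["max"] <= high


# Hypothetical profiling results from two runs on the same table.
first_run = {"columns": {"price": {"min": 3, "max": 120}}}
later_run = {"columns": {"price": {"min": 3, "max": 150}}}

assertion = baseline_range(first_run, "price")
print(check(assertion, first_run))  # True
print(check(assertion, later_run))  # False
```

The generated range fits the data it was derived from, but it fails as soon as a legitimate new value falls outside it, so the user still has to review and edit the suggestion.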
> Can they execute `piperider run` with assertion generation by passing an option? If so, it would reduce the effort of generating assertions in the new project.
Like `piperider run --generate-assertions`?
Sorry, but I prefer to separate the two journeys. The reasons are:
- Same as above: I don't think there is a perfect way to generate suggested assertions.
- We can give a better experience by optimizing the `generate-assertions` output to tell the user what assertions were generated and ask them to edit it.
- There would be some interaction in `generate-assertions`. I prefer to keep `piperider run` simple, so that it only runs a profiling/test pipeline rather than generating assertions as well.
Okay, that makes sense.
I'd like to see users have a better experience with what assertions are generated and how they can edit them.
Available in v0.13.0