[Feature] Support "Slim CI"-based runs?

Open gofford opened this issue 1 year ago • 4 comments

Describe the feature

This is part question, part feature request.

Request: I'd like to be able to specify the files to compare between manifests, rather than comparing everything.

Context: As part of our deployment process, the dbt project is compiled and the artefacts are stored in GCS. Our deployment process is phased, so we have a 'release PR' for the global changelog, and multiple dbt models can change in the main branch relative to the deployed code. For CI, we run a deferred dbt run (slim CI) based on the changed files in each PR individually. When branched from main prior to deployment, a given PR might technically contain other undeployed changes relative to the production manifest, but those files were changed in different PRs. For example, consider a scenario where branch A is merged to main but not yet released, and branch B is subsequently branched from main. Relative to the deployed manifest, branch B will be seen as containing all of the changed files from A, despite branch B not changing any of those files.

When running Recce in this scheme (i.e., where the PR manifest has only a partial set of models, but the 'prod' manifest lags behind the main branch), the Recce diff determines that all of the files in a given PR are changed.

Is there a way to compare only the changed models in a given PR with those same models in the base manifest? When changing a specific model in a PR, I only care about the changes incurred by the model I have changed, not other unrelated changes in the repository.
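To make the request concrete, here is a minimal sketch of the comparison being asked for. It is not something Recce provides. It assumes two dbt `manifest.json` files and a list of files touched by the PR (for example from `git diff --name-only`). It relies only on the `nodes`, `original_file_path`, and `checksum` fields of the dbt manifest. The function names and the sample node IDs are hypothetical.

```python
import json


def load_nodes(manifest_path):
    """Return the 'nodes' mapping from a dbt manifest.json file."""
    with open(manifest_path) as f:
        return json.load(f)["nodes"]


def changed_in_pr(base_nodes, pr_nodes, pr_changed_files):
    """Classify only the nodes whose source file was touched by this PR.

    Nodes that differ from the base manifest but whose files were changed
    by other PRs are ignored entirely.
    """
    touched = set(pr_changed_files)
    result = {}
    for unique_id, node in pr_nodes.items():
        if node.get("original_file_path") not in touched:
            continue  # changed elsewhere (or not at all); out of scope
        base = base_nodes.get(unique_id)
        if base is None:
            result[unique_id] = "added"
        elif base["checksum"]["checksum"] != node["checksum"]["checksum"]:
            result[unique_id] = "modified"
        else:
            result[unique_id] = "unchanged"
    return result


# Hypothetical data: model a was changed by branch A (merged, not released),
# model b by branch B. The deployed manifest contains neither change.
base = {
    "model.proj.a": {"original_file_path": "models/a.sql", "checksum": {"checksum": "a1"}},
    "model.proj.b": {"original_file_path": "models/b.sql", "checksum": {"checksum": "b1"}},
}
pr_b = {
    "model.proj.a": {"original_file_path": "models/a.sql", "checksum": {"checksum": "a2"}},
    "model.proj.b": {"original_file_path": "models/b.sql", "checksum": {"checksum": "b2"}},
}
report = changed_in_pr(base, pr_b, ["models/b.sql"])
```

In this example only `model.proj.b` appears in `report`, even though `model.proj.a` also differs from the deployed manifest.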

Describe alternatives you've considered

I haven't considered any other options, but this has stopped me adopting Recce in our release process.

Who will this benefit?

The benefit will be more streamlined PR annotations for phased deployment processes, related to only those files changed in the PR and not all changes.

Are you interested in contributing this feature?

No response

Anything else?

No response

gofford avatar Sep 18 '24 13:09 gofford

Hi Jason, Apologies for the delay in replying. We're discussing this issue internally and may request some further clarification soon. Thanks for your patience. Dave

DaveFlynn avatar Sep 24 '24 08:09 DaveFlynn

image

It looks like you have a release branch representing the latest production state (master in the diagram), and a main branch for development (develop in the diagram). Is that correct?

If so, the easiest way to use Recce in a slim CI setup would be:

# Prepare base state in the target-base directory
# Run dbt with the deferred option
dbt run --state target-base/ --defer -s state:modified+
# Diff the result
recce server

However, since there is only one production environment, changes from other PRs may still be visible. In your example, PR B would see the changes from PR A.

The recommended solution is as follows:

  • Create a separate base environment from production, such as a staging environment. Reference
  • Update the staging environment every time a PR is merged, ensuring it stays up-to-date with the latest commit. Reference
  • Ensure that the PR branch is up-to-date with the base branch before using Recce in that PR. Reference

If these three steps are followed, Recce will only show changes introduced by the specific PR.
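The third step can be checked automatically in CI. The sketch below is one possible approach, not part of Recce: it wraps `git merge-base --is-ancestor`, which exits with 0 when the first ref is an ancestor of the second and 1 when it is not. The function name and the `origin/main` ref in the usage note are assumptions.

```python
import subprocess


def pr_is_up_to_date(base_ref, head_ref="HEAD", run=subprocess.run):
    """Return True if every commit on base_ref is reachable from head_ref.

    The `run` parameter exists only so the command runner can be swapped
    out in tests; by default it is subprocess.run.
    """
    result = run(
        ["git", "merge-base", "--is-ancestor", base_ref, head_ref],
        check=False,
    )
    if result.returncode not in (0, 1):
        # Any other exit code means git itself failed (e.g. unknown ref).
        raise RuntimeError(f"git merge-base failed with code {result.returncode}")
    return result.returncode == 0
```

A CI job could call `pr_is_up_to_date("origin/main")` and skip or fail the Recce step when it returns False, since the diff would otherwise include changes the PR has not yet merged in.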

popcornylu avatar Sep 24 '24 09:09 popcornylu

Hi both, that makes sense. And that diagram does capture the nuance of what's going on here.

I suppose I was hoping to be able to subset the existing "deployed" manifest rather than regenerating it, but I can see why this would be awkward given how Recce handles the diff. I'll try some ideas and let you know how it goes.

gofford avatar Sep 25 '24 09:09 gofford

Adding to this: I think this is specifically an issue for preset checks that run automatically in CI with recce run.

Motivating Example

In the CI pipeline for my dbt project, I build modified models with dbt build --select state:modified+ --state ./base_target into a dedicated PR schema, e.g. dbt_pr_99.

I define some preset checks in my recce.yml file that access the database (e.g. QueryDiff) on my_model_a and should run any time my_model_a is modified. I want to avoid having to rewrite these queries or share the Recce state file to run these checks across different PRs that modify my_model_a.

I then raise a PR which modifies my_model_b. Because I am using dbt build --select state:modified+ --state ./base_target, my_model_b and any downstream nodes will be built into my PR dataset. Because my_model_a is not selected by the node selector, it is not built into my PR schema.

In this scenario, when recce run is invoked in my CI pipeline, Recce fails because my_model_a has not been built in the dbt_pr_99 dataset, yet the check compares it against the corresponding model in the schema specified by the manifest passed as the base target.

Expected behaviour

It should be possible to conditionally run preset checks based on whether the relevant node is selected in the modified nodes.
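As a rough illustration of that behaviour, the sketch below filters a list of checks down to those whose nodes were actually selected. It is not Recce's implementation. The `requires` key is hypothetical and not part of the recce.yml schema, and the selected node names are assumed to have been gathered separately, e.g. by parsing the output of dbt ls --select state:modified+ --state ./base_target.

```python
def runnable_checks(checks, selected_nodes):
    """Keep only the checks whose required nodes were all selected.

    `checks` is a list of dicts; the hypothetical "requires" key lists the
    node names a check reads from the PR schema. A check without
    "requires" is treated as always runnable.
    """
    selected = set(selected_nodes)
    return [c for c in checks if set(c.get("requires", [])) <= selected]


# Hypothetical preset checks mirroring the example above.
checks = [
    {"name": "query diff on my_model_a", "requires": ["my_model_a"]},
    {"name": "query diff on my_model_b", "requires": ["my_model_b"]},
    {"name": "schema diff"},
]

# A PR that modifies only my_model_b: the my_model_a check is skipped.
to_run = runnable_checks(checks, ["my_model_b"])
names = [c["name"] for c in to_run]
```

With this filter, the check on my_model_a would be skipped for a PR that only builds my_model_b, rather than failing because the relation is missing from dbt_pr_99.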

morgan-dgk avatar Sep 04 '25 06:09 morgan-dgk