
🚀 entity-validation/catalog: Add possibility to validate multiple Entities that depend on each other

Open · knowacki23 opened this issue 1 year ago · 4 comments

Plugin Name

🚀 entity-validation/catalog

🔖 Feature description

Add the possibility to send more than one entity to /api/catalog/validate-entity, and add a temporary registry of validated entities so that multiple entities that depend on each other can be validated together. This would help when a user introduces several interdependent entities at once.

We use the validate-entity endpoint to validate entities in pull requests. If a user tries to introduce more than one entity and these entities depend on each other, the validator fails.

🎤 Context

While using entity-validation, we realized that it is impossible to validate two (or more) separate entities that depend on each other. For example, if we want to validate a new Group entity and a Component that should be owned by that Group, the validator returns an error for the Component because it doesn't know about the Group entity. Sample validator input:

apiVersion: backstage.io/v1beta1
kind: Group
metadata:
  namespace: default
  annotations:
  name: fake-team
  title: FAKE-Team
  description: This is a fake team for test purposes
spec:
  type: Team
---
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: backstage
  title: Backstage
  namespace: delivery
  description: Backstage
spec:
  type: website
  lifecycle: production
  owner: fake-team
  system: fake-system

Validator response: the Group entity validates correctly. For the Component entity, it returns the following error:

Processor CustomEntityValidationProcessor threw an error while validating the entity component:fake-namespace/backstage; caused by ValidationError: spec.owner: "fake-team" failed endpoint validation as it is not a valid group in backstage; entityOwner: fake-team;

In the Network tab of the browser's developer tools, I can see that the validator sends two separate POST requests to the https://<backstage-url>/api/catalog/validate-entity endpoint, one for each entity passed to the Entity Validator. The first request, for the Group entity, returns 200; the second, for the Component, returns 400 because the Group entity is missing.
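A minimal TypeScript sketch of the request pattern described here, assuming the existing endpoint takes a JSON body of the form `{ entity, location }`. `buildPerEntityRequests` and `ValidateRequest` are hypothetical names, and the URLs are placeholders; the point is only that each body holds a single entity, so the server never sees the Group while it validates the Component.

```typescript
// Hypothetical sketch: one POST per entity, mirroring what the Network tab shows.
// The { entity, location } body shape is an assumption about validate-entity.
type Entity = {
  apiVersion: string;
  kind: string;
  metadata: { name: string; namespace?: string };
  spec?: Record<string, unknown>;
};

interface ValidateRequest {
  url: string;
  method: 'POST';
  body: string;
}

function buildPerEntityRequests(
  baseUrl: string,
  location: string,
  entities: Entity[],
): ValidateRequest[] {
  return entities.map((entity): ValidateRequest => ({
    url: `${baseUrl}/api/catalog/validate-entity`,
    method: 'POST',
    // Each body carries exactly one entity, so the server never sees its siblings.
    body: JSON.stringify({ entity, location }),
  }));
}

const group: Entity = {
  apiVersion: 'backstage.io/v1beta1',
  kind: 'Group',
  metadata: { name: 'fake-team', namespace: 'default' },
  spec: { type: 'Team' },
};
const component: Entity = {
  apiVersion: 'backstage.io/v1alpha1',
  kind: 'Component',
  metadata: { name: 'backstage', namespace: 'delivery' },
  spec: { type: 'website', lifecycle: 'production', owner: 'fake-team' },
};

const requests = buildPerEntityRequests(
  'https://backstage.example.com', // placeholder for <backstage-url>
  'url:https://example.com/catalog-info.yaml', // placeholder location
  [group, component],
);
console.log(requests.map(r => JSON.parse(r.body).entity.kind)); // [ 'Group', 'Component' ]
```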

✌️ Possible Implementation

Modify the validator and the catalog's validate-entity endpoint so that the endpoint accepts more than one entity in a single API call. We would send all the entities to be validated in one call, store them in a temporary array, and validate each entity from the request against both the Software Catalog and that array. Or perhaps a temporary Software Catalog used only for validation?
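One way to picture the "temporary array" idea is a registry scoped to a single request: a referenced entity counts as present if it is in the current batch or already in the catalog. This is a minimal sketch under assumptions; `BatchRegistry` and `CatalogLookup` are hypothetical, and `CatalogLookup` only stands in for whatever catalog query the backend would actually use.

```typescript
// Hypothetical sketch of a per-request "temporary registry": refs from the
// current batch are checked first, then the real catalog. CatalogLookup is an
// assumed stand-in for a catalog query, not a Backstage API.
type Entity = {
  kind: string;
  metadata: { name: string; namespace?: string };
};

type CatalogLookup = (ref: string) => Promise<boolean>;

// Builds "kind:namespace/name", lowercased so comparisons are case-insensitive.
function toRef(entity: Entity): string {
  const namespace = entity.metadata.namespace ?? 'default';
  return `${entity.kind}:${namespace}/${entity.metadata.name}`.toLowerCase();
}

class BatchRegistry {
  private readonly batchRefs: Set<string>;
  private readonly catalog: CatalogLookup;

  constructor(batch: Entity[], catalog: CatalogLookup) {
    this.batchRefs = new Set(batch.map(toRef));
    this.catalog = catalog;
  }

  // True if the ref is part of this request or already registered in the catalog.
  async exists(ref: string): Promise<boolean> {
    const normalized = ref.toLowerCase();
    if (this.batchRefs.has(normalized)) {
      return true;
    }
    return this.catalog(normalized);
  }
}

// The registry lives only for the duration of one validation request.
const registry = new BatchRegistry(
  [{ kind: 'Group', metadata: { name: 'fake-team', namespace: 'default' } }],
  async () => false, // pretend the catalog knows nothing
);
registry.exists('group:default/fake-team').then(found => console.log(found)); // true
```

Scoping the registry to one request keeps unvalidated entities out of the real catalog, which a validation-only catalog would also need to guarantee.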

Would it make sense?

👀 Have you spent some time to check if this feature request has been raised before?

  • [X] I checked and didn't find similar issue

🏢 Have you read the Code of Conduct?

Are you willing to submit PR?

Yes I am willing to submit a PR!

knowacki23 · Jun 24 '24

Hi @knowacki23, sounds like a solid feature to add, feel free to submit a PR 🚀

awanlin · Jul 02 '24

Hey @awanlin, I would like to work on this issue. Could you assign it to me?

knowacki23 · Jul 22 '24

Hey, I made some changes to catalog-backend so that multiple entities can be sent in a single API call. Here is what I did:

  • added a new endpoint, /validate-entities, so that other plugins using the /validate-entity endpoint are not affected by the changes: https://github.com/backstage/backstage/compare/master...knowacki23:backstage:catalog-backend_hadnle_multiple_entities_in_one_api_call_-_validate_entity#diff-4f83d47008b53e1e7842993e99beb5e73fb9e4a160f14278c29d4f4ea0d71df2. The endpoint expects a request body of the following form:
{
    "location":"url:https://url-to-file/catalog-info.yaml",
    "entities": [
        {
            "entity":{
                "apiVersion":"backstage.io/v1alpha1",
                "kind":"Component",
                "metadata":{
                    "name":"test",
                    "title":"Test",
                    "namespace":"test-namespace",
                    "description":"Component description",
                    "links":[],
                    "tags":[],
                    "annotations":{}
                },
                "spec":{
                    "type":"service",
                    "lifecycle":"experimental",
                    "owner":"entity-owner",
                    "system":"entity-system"
                }
            }
        },
        {
            "entity":{
                "apiVersion":"backstage.io/v1alpha1",
                "kind":"Component",
                "metadata":{
                    "name":"test",
                    "title":"Test",
                    "namespace":"test-namespace",
                    "description":"Component description",
                    "links":[],
                    "tags":[],
                    "annotations":{}
                },
                "spec":{
                    "type":"service",
                    "lifecycle":"experimental",
                    "owner":"entity-owner",
                    "system":"entity-system"
                }
            }
        }
    ]
}

Since all the entities sent to the validator come from the same location, only one location parameter is needed per API call; the request body also carries an entities array in which each item wraps one entity. The actual response is not implemented yet, but I think it should be an array containing each entity's name, validation status, and errors.

  • added a processMultiple function to the CatalogProcessingOrchestrator interface: https://github.com/backstage/backstage/compare/master...knowacki23:backstage:catalog-backend_hadnle_multiple_entities_in_one_api_call_-_validate_entity#diff-2af122646a490b82b9ab87dc7e5cca1faaa4fb0b71424d98b5e38386ed46829d
  • added a processMultiple implementation to DefaultCatalogProcessingOrchestrator: https://github.com/backstage/backstage/compare/master...knowacki23:backstage:catalog-backend_hadnle_multiple_entities_in_one_api_call_-_validate_entity#diff-8faea90a46bc4a44050ef2e010454224967e1c8f1196e666a7a0a97ead134604. This function iterates through the array of entities to be validated and calls processSingleEntity for each one.
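Calling the single-entity path in a loop does not, by itself, fix the dependency problem: each call still sees only one entity. A minimal TypeScript sketch, using simplified hypothetical types rather than Backstage's real orchestrator types, of a loop that also hands each call the full batch and returns the per-entity name, status, and errors result proposed in this comment. `processMultiple` here is illustrative, and the `batch` parameter on `ProcessOne` is an assumption.

```typescript
// Hypothetical types and loop for the proposed /validate-entities endpoint.
// The request mirrors the JSON body in this comment; the response shape is
// only the idea floated there (name, status, errors), not an implemented API.
type Entity = {
  apiVersion: string;
  kind: string;
  metadata: { name: string; namespace?: string };
  spec?: Record<string, unknown>;
};

interface ValidateEntitiesRequest {
  location: string;
  entities: Array<{ entity: Entity }>;
}

interface EntityValidationResult {
  entityRef: string;
  valid: boolean;
  errors: string[];
}

// Stand-in for per-entity processing (processSingleEntity in the branch). The
// extra `batch` parameter is an assumption: it lets one entity's checks see
// the others submitted in the same call.
type ProcessOne = (
  entity: Entity,
  location: string,
  batch: readonly Entity[],
) => Promise<string[]>;

async function processMultiple(
  request: ValidateEntitiesRequest,
  processOne: ProcessOne,
): Promise<EntityValidationResult[]> {
  const batch = request.entities.map(item => item.entity);
  const results: EntityValidationResult[] = [];
  for (const entity of batch) {
    const errors = await processOne(entity, request.location, batch);
    results.push({
      entityRef: `${entity.kind}:${entity.metadata.namespace ?? 'default'}/${entity.metadata.name}`,
      valid: errors.length === 0,
      errors,
    });
  }
  return results;
}

// Toy processor (ignores namespaces for brevity): an owner is acceptable if a
// Group with that name is in the batch.
const ownerInBatch: ProcessOne = async (entity, _location, batch) => {
  const owner = entity.spec?.owner;
  if (typeof owner !== 'string') return [];
  const found = batch.some(e => e.kind === 'Group' && e.metadata.name === owner);
  return found ? [] : [`spec.owner: "${owner}" is not in the catalog or this batch`];
};

const sample: ValidateEntitiesRequest = {
  location: 'url:https://url-to-file/catalog-info.yaml',
  entities: [
    { entity: { apiVersion: 'backstage.io/v1alpha1', kind: 'Group', metadata: { name: 'fake-team' }, spec: { type: 'team' } } },
    { entity: { apiVersion: 'backstage.io/v1alpha1', kind: 'Component', metadata: { name: 'backstage' }, spec: { owner: 'fake-team' } } },
  ],
};
processMultiple(sample, ownerInBatch).then(r => console.log(r.map(x => x.valid))); // [ true, true ]
```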

Looking at this solution, I'm wondering whether it makes sense, or whether it's even a good solution.

I'm afraid that this might be a bad approach.

Does anyone have ideas or strong opinions on this?

The problem I want to solve is this: we use custom validation rules that check whether an entity under validation has an owner that actually exists in the Software Catalog, and whether the system it will belong to exists there as well. When we want to validate two entities, say a Component and a System (neither registered in the catalog yet), and the Component is part of that new System, validation of the Component fails. We would like to pass an array of entities to the validator processor so that our custom validator receives all of the entities being validated. It could then check whether an entity missing from the catalog is being validated in the same batch, and if so, treat the validation as successful.
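A sketch of how such a batch-aware custom rule might resolve `spec.owner` and `spec.system`, assuming it follows Backstage's entity reference convention `[kind:][namespace/]name`, with unqualified refs defaulting to kind Group (owner) or System (system) and to the referencing entity's namespace. `validateOwnerAndSystem`, `resolveRef`, and the pre-fetched `knownCatalogRefs` set are hypothetical.

```typescript
// Hypothetical batch-aware rule for spec.owner and spec.system. Unqualified
// refs take a default kind (Group / System) and the referencing entity's
// namespace. knownCatalogRefs is an assumed, pre-fetched set of catalog refs.
type Entity = {
  kind: string;
  metadata: { name: string; namespace?: string };
  spec?: Record<string, unknown>;
};

// Parses "[kind:][namespace/]name" into a lowercased "kind:namespace/name".
function resolveRef(ref: string, defaultKind: string, defaultNamespace: string): string {
  let kind = defaultKind;
  let namespace = defaultNamespace;
  let rest = ref;
  const colon = rest.indexOf(':');
  if (colon !== -1) {
    kind = rest.slice(0, colon);
    rest = rest.slice(colon + 1);
  }
  const slash = rest.indexOf('/');
  if (slash !== -1) {
    namespace = rest.slice(0, slash);
    rest = rest.slice(slash + 1);
  }
  return `${kind}:${namespace}/${rest}`.toLowerCase();
}

function refOf(entity: Entity): string {
  return `${entity.kind}:${entity.metadata.namespace ?? 'default'}/${entity.metadata.name}`.toLowerCase();
}

function validateOwnerAndSystem(
  entity: Entity,
  batch: Entity[],
  knownCatalogRefs: Set<string>,
): string[] {
  const batchRefs = new Set(batch.map(refOf));
  const namespace = entity.metadata.namespace ?? 'default';
  const fields: Array<[string, string]> = [
    ['owner', 'group'],
    ['system', 'system'],
  ];
  const errors: string[] = [];
  for (const [field, defaultKind] of fields) {
    const value = entity.spec?.[field];
    if (typeof value !== 'string') continue;
    const ref = resolveRef(value, defaultKind, namespace);
    // A ref passes if it is already registered or is being validated alongside.
    if (!batchRefs.has(ref) && !knownCatalogRefs.has(ref)) {
      errors.push(`spec.${field}: "${value}" (${ref}) is neither in the catalog nor in this batch`);
    }
  }
  return errors;
}

const newSystem: Entity = { kind: 'System', metadata: { name: 'fake-system' } };
const newComponent: Entity = {
  kind: 'Component',
  metadata: { name: 'backstage' },
  spec: { owner: 'fake-team', system: 'fake-system' },
};
const catalog = new Set(['group:default/fake-team']);
console.log(validateOwnerAndSystem(newComponent, [newSystem, newComponent], catalog)); // []
```

If the rule follows this convention, the Component in the original sample (namespace `delivery`, `owner: fake-team`) would resolve to `group:delivery/fake-team` rather than the Group in `default`, so batching alone would not make it pass; a qualified ref such as `group:default/fake-team` would.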

knowacki23 · Jul 26 '24

I was facing a similar situation when I built a linter to verify the catalog-info.yaml files:

  1. The existing api/catalog/validate-entity that triggers a dry-run of processing doesn't validate the relationships, which is what we want to do in the linter.
  2. When we have multiple entities to validate that depend on each other (the exact situation you described in this issue), Backstage lacks the context, because the entities have not yet been ingested into the catalog.
  3. We can introduce a custom processor to solve point 1: it can emit all the refs and check whether they exist in the catalog. However, this processor must not run during actual processing: Backstage processes entities one by one during ingestion, so the ref check would cause processing to fail.

What I did to solve the problem:

  • Created a custom processor to emit all the relations and validate them against the catalog.
  • (Hack 1) Added an annotation enforce-ref-check, and the custom processor only runs when it sees enforce-ref-check=true in the annotations.
    • When we call api/catalog/validate-entity, we add the annotation to every entity we send to the API, which triggers the custom processor.
    • During normal processing, the entities carry no such annotation, so the custom processor is a no-op.
  • (Hack 2) On the linter side, we added another hack that goes through the validation errors and excludes the ones caused by entities submitted together.
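The two workarounds can be sketched roughly as follows. Only the annotation key `enforce-ref-check` comes from the comment; the function names and the structured `MissingRefError` shape are hypothetical, and a real linter would likely have to parse missing refs out of the error messages returned by validate-entity.

```typescript
// Sketch of the two workarounds above. The annotation key enforce-ref-check
// comes from the comment; the function names and the MissingRefError shape are
// hypothetical (real validation errors would have to be parsed from messages).
type Entity = {
  kind: string;
  metadata: {
    name: string;
    namespace?: string;
    annotations?: Record<string, string>;
  };
};

const ENFORCE_REF_CHECK = 'enforce-ref-check';

// Hack 1, processor side: the ref check is a no-op unless the annotation is "true".
function shouldCheckRefs(entity: Entity): boolean {
  return entity.metadata.annotations?.[ENFORCE_REF_CHECK] === 'true';
}

// Hack 1, linter side: return a copy of the entity with the annotation added.
function withRefCheck(entity: Entity): Entity {
  return {
    ...entity,
    metadata: {
      ...entity.metadata,
      annotations: { ...(entity.metadata.annotations ?? {}), [ENFORCE_REF_CHECK]: 'true' },
    },
  };
}

interface MissingRefError {
  entityRef: string; // entity that failed validation
  missingRef: string; // ref that could not be found in the catalog
}

function refOf(entity: Entity): string {
  return `${entity.kind}:${entity.metadata.namespace ?? 'default'}/${entity.metadata.name}`.toLowerCase();
}

// Hack 2, linter side: ignore errors whose missing ref was submitted in the same run.
function dropErrorsCoveredByBatch(errors: MissingRefError[], submitted: Entity[]): MissingRefError[] {
  const submittedRefs = new Set(submitted.map(refOf));
  return errors.filter(error => !submittedRefs.has(error.missingRef.toLowerCase()));
}

const team: Entity = { kind: 'Group', metadata: { name: 'fake-team' } };
console.log(shouldCheckRefs(team), shouldCheckRefs(withRefCheck(team))); // false true
const remaining = dropErrorsCoveredByBatch(
  [
    { entityRef: 'component:default/backstage', missingRef: 'group:default/fake-team' },
    { entityRef: 'component:default/backstage', missingRef: 'system:default/fake-system' },
  ],
  [team],
);
console.log(remaining.length); // 1
```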

As you can see here, I have to do some "dirty" hacks to pull this off. I personally would love to see some progress on this matter so I don't have to maintain all the hacks I did.

namco1992 · Sep 16 '24

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] · Nov 15 '24