sdk Allow multiple `sources` in `createIntegrationEntity`

Currently, if we want to add multiple sets of raw data to an entity, we need to call the `setRawData` function after creating the entity:

```typescript
const entity = createIntegrationEntity({
  entityData: {
    source: data,
    _key: 'some-key',
    _class: 'Resource',
    _type: 'some-type',
  },
});

setRawData(entity, { name: 'blobServiceProperties', rawData: blobServiceProperties });
```

It would be good to allow multiple sets of raw data to be passed directly to `createIntegrationEntity()`. This would be more succinct, make the function more expressive, and possibly allow better auto-conversion (even though auto-conversion may be an anti-pattern).

```typescript
const entity = createIntegrationEntity({
  entityData: {
    sources: [
      { name: 'default', rawData: data },
      { name: 'blobServiceProperties', rawData: blobServiceProperties },
    ],
    _key: 'some-key',
    _class: 'Resource',
    _type: 'some-type',
  },
});
```
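To make the proposal concrete, here is a minimal, self-contained sketch of how a `sources` array could be folded into an entity's `_rawData` in one step. It does not use the real SDK: `RawDataSource`, `EntityData`, `Entity`, and `createEntityWithSources` are hypothetical simplified stand-ins, and rejecting duplicate source names is an assumed design choice, not confirmed SDK behavior.

```typescript
// Hypothetical, simplified shapes; not the SDK's actual types.
interface RawDataSource {
  name: string;
  rawData: unknown;
}

interface EntityData {
  _key: string;
  _class: string | string[];
  _type: string;
  sources: RawDataSource[];
}

interface Entity {
  _key: string;
  _class: string | string[];
  _type: string;
  _rawData: RawDataSource[];
}

// Hypothetical helper: copies the proposed `sources` array into `_rawData`
// at creation time, instead of calling a setRawData-style function per source.
function createEntityWithSources(entityData: EntityData): Entity {
  const { sources, ...assign } = entityData;

  // Assumed rule: each raw data name must be unique on an entity.
  const seen = new Set<string>();
  for (const { name } of sources) {
    if (seen.has(name)) {
      throw new Error(`Duplicate raw data name: ${name}`);
    }
    seen.add(name);
  }

  // Shallow-copy each source so the entity does not share objects with the caller's array.
  return { ...assign, _rawData: sources.map((s) => ({ ...s })) };
}

// Usage with placeholder payloads.
const entity = createEntityWithSources({
  _key: 'some-key',
  _class: 'Resource',
  _type: 'some-type',
  sources: [
    { name: 'default', rawData: { id: 'some-id' } },
    { name: 'blobServiceProperties', rawData: { deleteRetentionDays: 7 } },
  ],
});
console.log(entity._rawData.map((r) => r.name)); // [ 'default', 'blobServiceProperties' ]
```

Whether a `default` source should be required is left open here; the sketch accepts any list of uniquely named sources.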

Mar 22 '21 16:03 ndowmon

I was thinking about solving this problem as well, but came to a different conclusion. I think what we should be doing is allowing duplicate partial entities and linking them with IS relationships. The idea is to have smaller, simpler entities that each integration owns, instead of a single large source-of-truth entity where we rely on the mapper to take care of updating it for us. I believe our proposals are incompatible with each other, so we might want to figure that out.

Mar 22 '21 18:03 mknoedel

I see. That's interesting. What would you expect the entity _types and _classes to be in that instance?

In the example above (taken from a use case in the graph-azure project), `blobServiceProperties` is essentially a set of configuration variables on the `storage_account` entity. Every `storage_account` has exactly one instance of these properties, but they aren't returned from the same endpoint. If anything, I would probably construct the relationships as `azure_storage_account|HAS|azure_storage_account_blob_service_properties`, where `azure_storage_account` is a `DataStore` and `azure_storage_account_blob_service_properties` is a `Configuration`.
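The separate-entity modeling described above, with the blob service properties as their own `Configuration` entity linked from the storage account by a `HAS` relationship, could look roughly like this. The interfaces, the `buildRelationship` helper, and the placeholder keys are hypothetical; the `_fromEntityKey`/`_toEntityKey` field names and the `from|class|to` key format are assumptions modeled on common JupiterOne conventions, not calls into the SDK.

```typescript
// Hypothetical, simplified shapes; not the SDK's actual types.
interface GraphEntity {
  _key: string;
  _type: string;
  _class: string;
  [property: string]: unknown;
}

interface GraphRelationship {
  _key: string;
  _type: string;
  _class: 'HAS' | 'IS';
  _fromEntityKey: string;
  _toEntityKey: string;
}

// Hypothetical helper: links two entities, deriving the relationship key
// and type from the endpoints and the lowercased class (e.g. "a|has|b").
function buildRelationship(
  relClass: 'HAS' | 'IS',
  from: GraphEntity,
  to: GraphEntity,
): GraphRelationship {
  const verb = relClass.toLowerCase();
  return {
    _key: `${from._key}|${verb}|${to._key}`,
    _type: `${from._type}_${verb}_${to._type}`,
    _class: relClass,
    _fromEntityKey: from._key,
    _toEntityKey: to._key,
  };
}

// Placeholder keys and properties for illustration only.
const storageAccount: GraphEntity = {
  _key: 'storage-account-1',
  _type: 'azure_storage_account',
  _class: 'DataStore',
};

const blobServiceProperties: GraphEntity = {
  _key: 'storage-account-1/blobServiceProperties',
  _type: 'azure_storage_account_blob_service_properties',
  _class: 'Configuration',
  deleteRetentionEnabled: true,
};

const hasProperties = buildRelationship('HAS', storageAccount, blobServiceProperties);
console.log(hasProperties._key);
// storage-account-1|has|storage-account-1/blobServiceProperties
```

The same helper also covers the `IS` variant suggested earlier in the thread, where two partial entities describing the same resource would be linked rather than merged.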

I'd love to see some additional use cases, and examples where the relationship class is IS.

Mar 22 '21 18:03 ndowmon

The AWS integration will most certainly need this raw data convenience feature. There are definitely entities that are composed from the results of multiple API calls, with different names for the raw data, and most of those are unlikely to gain support for being ingested as distinct entities. To put it another way, this issue is a good idea and quite independent of the question of whether we should push to never merge the results of two API calls into a single entity.

Mar 22 '21 18:03 aiwilliams

I see, thanks for explaining it. Explaining it back: this allows us to put all of the API data sources used to generate the entity you are creating onto that entity. That's a great idea and is separate from what I stated above. In the past I have used the source names as the keys and the raw data as the values, like:

```json
{
  "name": "default",
  "rawData": {
    "blobServiceProperties": {
      "key": "value"
    },
    "default": {
      "key": "value"
    }
  }
}
```
which works, I think, but is not nearly as explicit as what you are proposing. I like the change.
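The two layouts carry the same information. As a minimal sketch (the `keyedToSources` name and the `RawDataSource` shape are hypothetical), the older keyed layout, where a single `default` raw data entry nests each source under its own key, can be flattened into the explicit list of named sources proposed in this issue:

```typescript
// Hypothetical shape of one named raw data source.
interface RawDataSource {
  name: string;
  rawData: unknown;
}

// Hypothetical converter: each key of the nested object becomes a source name,
// and its value becomes that source's raw data. Object.entries preserves the
// insertion order of non-integer string keys.
function keyedToSources(keyed: Record<string, unknown>): RawDataSource[] {
  return Object.entries(keyed).map(([name, rawData]) => ({ name, rawData }));
}

const sources = keyedToSources({
  blobServiceProperties: { key: 'value' },
  default: { key: 'value' },
});
console.log(sources.map((s) => s.name)); // [ 'blobServiceProperties', 'default' ]
```

The explicit list makes each source's name a first-class field instead of an implicit object key, which is what makes it easier to read and validate.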

Mar 23 '21 13:03 mknoedel