When an entity is loaded from 2 sources, entity data from the 1st source is lost

Open enriquepablo opened this issue 1 year ago • 5 comments

For filtering with trust info, we need to add a few attributes to the discojson format: registrationAuthority and the entity attributes entity-category, entity-category-support and assurance-certification for IdPs, and DiscoveryResponses for SPs.

When the load pipe loads several sources, it accumulates all entities in a single dictionary keyed by entityID, here. This means that, for an entity present in more than one source, only the data from the last source loaded is kept. There is a comment there saying "TODO: merge", but what we have there are EntityDescriptor XML elements, which cannot simply be merged: for example, each can carry at most one RegistrationInfo element.
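
To make the overwrite concrete, here is a minimal sketch that uses a plain dict as a stand-in for the store. It is not pyFF's actual code; the names `loaded` and `entities` are made up for illustration, and the values are reduced to the registrationAuthority of each source.

```python
# Hypothetical stand-in for the store: a plain dict keyed by entityID.
ENTITY_ID = "https://idp.example.com/saml2/idp/metadata.php"

# (entityID, registrationAuthority) pairs in load order, one per source.
loaded = [
    (ENTITY_ID, "http://www.swamid.se/"),
    (ENTITY_ID, "https://www.carsi.edu.cn"),
]

entities = {}
for entity_id, registration_authority in loaded:
    # A later source with the same entityID replaces the earlier entry.
    entities[entity_id] = registration_authority

print(entities)
```

Only one entry survives, and it carries the registrationAuthority of the source loaded last.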

Code Version

master

Expected Behavior

We would want to keep all the data in each entity until it is used by discojson.

Current Behavior

Data that differs across sources is lost.

Possible Solution

One possibility would be to parse the entities, e.g. around the line of code referenced above, and keep the information that would otherwise be lost in a new dictionary attached to the store, which could then be accessed in the discojson pipe.
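
A rough sketch of that idea, under assumptions: it uses the standard library's `xml.etree.ElementTree` to stay self-contained, and the helpers `make_entity` and `record_trust_info` and the `side_info` dictionary are hypothetical names, not part of pyFF. Only registrationAuthority is collected here; the other attributes would follow the same pattern.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

MD_NS = "urn:oasis:names:tc:SAML:2.0:metadata"
MDRPI_NS = "urn:oasis:names:tc:SAML:metadata:rpi"
ENTITY_ID = "https://idp.example.com/saml2/idp/metadata.php"


def make_entity(entity_id, registration_authority):
    """Build a minimal EntityDescriptor string, for illustration only."""
    return (
        f'<md:EntityDescriptor xmlns:md="{MD_NS}" xmlns:mdrpi="{MDRPI_NS}" '
        f'entityID="{entity_id}"><md:Extensions>'
        f'<mdrpi:RegistrationInfo registrationAuthority="{registration_authority}"/>'
        f"</md:Extensions></md:EntityDescriptor>"
    )


def record_trust_info(side_info, xml_text):
    """Hypothetical helper: remember each registrationAuthority per entityID."""
    root = ET.fromstring(xml_text)
    entity_id = root.get("entityID")
    for info in root.iter(f"{{{MDRPI_NS}}}RegistrationInfo"):
        authority = info.get("registrationAuthority")
        if authority not in side_info[entity_id]:
            side_info[entity_id].append(authority)
    return entity_id


side_info = defaultdict(list)  # the "new dictionary attached to the store"
entities = {}                  # stand-in for the store, still last-wins
for authority in ("http://www.swamid.se/", "https://www.carsi.edu.cn"):
    xml_text = make_entity(ENTITY_ID, authority)
    entity_id = record_trust_info(side_info, xml_text)
    entities[entity_id] = xml_text

print(side_info[ENTITY_ID])
```

The store still holds a single element per entityID, but the side dictionary keeps the values from every source for later use.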

Steps to Reproduce

  1. Load an entity from 2 sources, with a different registrationAuthority in each case
  2. Try to access both registrationAuthorities in the discojson pipe

enriquepablo avatar Jan 15 '25 09:01 enriquepablo

Adding a test pipeline to reproduce the issue. Put the 3 files in a directory, adjust the paths in test.yaml, and execute pyff test.yaml. Note that the select pipe in the test has dedup set to False, and we obtain 2 identical copies of the entity JSON; if dedup is set to True (the default), we obtain a single copy of the same JSON.

Both XML files are identical except for the RegistrationInfo. The RegistrationInfo from the 1st source is lost.

Well, GitHub does not allow me to attach yaml or xml files, so I'll paste them below.

test.yaml

- load:
  - file:///path/to/test/directory/test-idp-1.xml
  - file:///path/to/test/directory/test-idp-2.xml
- select dedup False
- discojson
- publish:
    output: "./test.json"
    raw: true
    update_store: false

test-idp-1.xml

<md:EntityDescriptor xmlns:md="urn:oasis:names:tc:SAML:2.0:metadata" xmlns:shibmd="urn:mace:shibboleth:metadata:1.0" xmlns:mdrpi="urn:oasis:names:tc:SAML:metadata:rpi"
                     entityID="https://idp.example.com/saml2/idp/metadata.php">
  <md:IDPSSODescriptor protocolSupportEnumeration="urn:oasis:names:tc:SAML:2.0:protocol">
    <md:Extensions>
      <mdrpi:RegistrationInfo registrationAuthority="http://www.swamid.se/" registrationInstant="2015-02-11T11:09:51Z">
        <mdrpi:RegistrationPolicy xml:lang="en">http://swamid.se/policy/mdrps</mdrpi:RegistrationPolicy>
      </mdrpi:RegistrationInfo>
      <shibmd:Scope regexp="false">example.com</shibmd:Scope>
      <mdui:UIInfo xmlns:mdui="urn:oasis:names:tc:SAML:metadata:ui">
        <mdui:DisplayName xml:lang="sv">Example universitet</mdui:DisplayName>
        <mdui:DisplayName xml:lang="en">Example University</mdui:DisplayName>
        <mdui:Description xml:lang="sv">Identity Provider för Example universitet</mdui:Description>
        <mdui:Description xml:lang="en">Identity Provider for Example University</mdui:Description>
        <mdui:InformationURL xml:lang="sv">http://www.example.com/</mdui:InformationURL>
        <mdui:InformationURL xml:lang="en">http://www.example.com/english/</mdui:InformationURL>
        <mdui:Logo height="63" width="358">https://www.example.com/static/images/umu_logo.jpg</mdui:Logo>
        <mdui:Logo xml:lang="sv" height="63" width="358">https://www.example.com/static/images/logo.jpg</mdui:Logo>
        <mdui:Logo xml:lang="en" height="63" width="350">https://www.example.com/static/images/logo_eng.jpg</mdui:Logo>
        <mdui:Keywords xml:lang="sv">exempel</mdui:Keywords>
        <mdui:Keywords xml:lang="en">example</mdui:Keywords>
      </mdui:UIInfo>
      <mdui:DiscoHints xmlns:mdui="urn:oasis:names:tc:SAML:metadata:ui">
        <mdui:DomainHint>example.com</mdui:DomainHint>
        <mdui:DomainHint>example.net</mdui:DomainHint>
        <mdui:IPHint>10.0.0.0/8</mdui:IPHint>
      </mdui:DiscoHints>
    </md:Extensions>
    <md:ArtifactResolutionService Binding="urn:oasis:names:tc:SAML:2.0:bindings:SOAP" Location="https://idp.example.com/saml2/idp/ArtifactResolutionService.php" index="0"/>
    <md:SingleLogoutService Binding="urn:oasis:names:tc:SAML:2.0:bindings:HTTP-Redirect" Location="https://idp.example.com/saml2/idp/SingleLogoutService.php"/>
    <md:NameIDFormat>urn:oasis:names:tc:SAML:2.0:nameid-format:transient</md:NameIDFormat>
    <md:SingleSignOnService Binding="urn:oasis:names:tc:SAML:2.0:bindings:HTTP-Redirect" Location="https://idp.example.com/saml2/idp/SSOService.php"/>
  </md:IDPSSODescriptor>
  <md:Organization>
    <md:OrganizationName xml:lang="sv">ExempelU</md:OrganizationName>
    <md:OrganizationName xml:lang="en">ExampleU</md:OrganizationName>
    <md:OrganizationDisplayName xml:lang="sv">Exempel Universitetet</md:OrganizationDisplayName>
    <md:OrganizationDisplayName xml:lang="en">The Example University</md:OrganizationDisplayName>
    <md:OrganizationURL xml:lang="sv">http://www.example.com</md:OrganizationURL>
    <md:OrganizationURL xml:lang="en">http://www.example.com/english</md:OrganizationURL>
  </md:Organization>
  <md:ContactPerson contactType="administrative">
    <md:Company>Example University</md:Company>
    <md:SurName>Example helpdesk</md:SurName>
    <md:EmailAddress>[email protected]</md:EmailAddress>
  </md:ContactPerson>
  <md:ContactPerson contactType="technical">
    <md:Company>Example University</md:Company>
    <md:SurName>Example helpdesk</md:SurName>
    <md:EmailAddress>[email protected]</md:EmailAddress>
  </md:ContactPerson>
  <md:ContactPerson contactType="support">
    <md:Company>Example University</md:Company>
    <md:SurName>Servicedesk Example universitet</md:SurName>
    <md:EmailAddress>[email protected]</md:EmailAddress>
  </md:ContactPerson>
</md:EntityDescriptor>

test-idp-2.xml

<md:EntityDescriptor xmlns:md="urn:oasis:names:tc:SAML:2.0:metadata" xmlns:shibmd="urn:mace:shibboleth:metadata:1.0" xmlns:mdrpi="urn:oasis:names:tc:SAML:metadata:rpi"
                     entityID="https://idp.example.com/saml2/idp/metadata.php">
  <md:IDPSSODescriptor protocolSupportEnumeration="urn:oasis:names:tc:SAML:2.0:protocol">
    <md:Extensions>
      <mdrpi:RegistrationInfo registrationAuthority="https://www.carsi.edu.cn" registrationInstant="2020-03-27T09:48:13Z">
        <mdrpi:RegistrationPolicy xml:lang="zh">https://www.carsi.edu.cn/index_zh.htm</mdrpi:RegistrationPolicy>
      </mdrpi:RegistrationInfo>
      <shibmd:Scope regexp="false">example.com</shibmd:Scope>
      <mdui:UIInfo xmlns:mdui="urn:oasis:names:tc:SAML:metadata:ui">
        <mdui:DisplayName xml:lang="sv">Example universitet</mdui:DisplayName>
        <mdui:DisplayName xml:lang="en">Example University</mdui:DisplayName>
        <mdui:Description xml:lang="sv">Identity Provider för Example universitet</mdui:Description>
        <mdui:Description xml:lang="en">Identity Provider for Example University</mdui:Description>
        <mdui:InformationURL xml:lang="sv">http://www.example.com/</mdui:InformationURL>
        <mdui:InformationURL xml:lang="en">http://www.example.com/english/</mdui:InformationURL>
        <mdui:Logo height="63" width="358">https://www.example.com/static/images/umu_logo.jpg</mdui:Logo>
        <mdui:Logo xml:lang="sv" height="63" width="358">https://www.example.com/static/images/logo.jpg</mdui:Logo>
        <mdui:Logo xml:lang="en" height="63" width="350">https://www.example.com/static/images/logo_eng.jpg</mdui:Logo>
        <mdui:Keywords xml:lang="sv">exempel</mdui:Keywords>
        <mdui:Keywords xml:lang="en">example</mdui:Keywords>
      </mdui:UIInfo>
      <mdui:DiscoHints xmlns:mdui="urn:oasis:names:tc:SAML:metadata:ui">
        <mdui:DomainHint>example.com</mdui:DomainHint>
        <mdui:DomainHint>example.net</mdui:DomainHint>
        <mdui:IPHint>10.0.0.0/8</mdui:IPHint>
      </mdui:DiscoHints>
    </md:Extensions>
    <md:ArtifactResolutionService Binding="urn:oasis:names:tc:SAML:2.0:bindings:SOAP" Location="https://idp.example.com/saml2/idp/ArtifactResolutionService.php" index="0"/>
    <md:SingleLogoutService Binding="urn:oasis:names:tc:SAML:2.0:bindings:HTTP-Redirect" Location="https://idp.example.com/saml2/idp/SingleLogoutService.php"/>
    <md:NameIDFormat>urn:oasis:names:tc:SAML:2.0:nameid-format:transient</md:NameIDFormat>
    <md:SingleSignOnService Binding="urn:oasis:names:tc:SAML:2.0:bindings:HTTP-Redirect" Location="https://idp.example.com/saml2/idp/SSOService.php"/>
  </md:IDPSSODescriptor>
  <md:Organization>
    <md:OrganizationName xml:lang="sv">ExempelU</md:OrganizationName>
    <md:OrganizationName xml:lang="en">ExampleU</md:OrganizationName>
    <md:OrganizationDisplayName xml:lang="sv">Exempel Universitetet</md:OrganizationDisplayName>
    <md:OrganizationDisplayName xml:lang="en">The Example University</md:OrganizationDisplayName>
    <md:OrganizationURL xml:lang="sv">http://www.example.com</md:OrganizationURL>
    <md:OrganizationURL xml:lang="en">http://www.example.com/english</md:OrganizationURL>
  </md:Organization>
  <md:ContactPerson contactType="administrative">
    <md:Company>Example University</md:Company>
    <md:SurName>Example helpdesk</md:SurName>
    <md:EmailAddress>[email protected]</md:EmailAddress>
  </md:ContactPerson>
  <md:ContactPerson contactType="technical">
    <md:Company>Example University</md:Company>
    <md:SurName>Example helpdesk</md:SurName>
    <md:EmailAddress>[email protected]</md:EmailAddress>
  </md:ContactPerson>
  <md:ContactPerson contactType="support">
    <md:Company>Example University</md:Company>
    <md:SurName>Servicedesk Example universitet</md:SurName>
    <md:EmailAddress>[email protected]</md:EmailAddress>
  </md:ContactPerson>
</md:EntityDescriptor>

enriquepablo avatar Jan 15 '25 11:01 enriquepablo

Detailed solution

The main requirements are:

  • By default, pyFF's behaviour doesn't change
  • We add options to the load and select pipes to not deduplicate entities that come from different sources and have the same entityID.

Code interventions

  • In the load pipe, entities from all sources are accumulated in a Store.entities dict keyed by entityID, here. With this proposal, if load receives a no-dedup option, it would key the entities by both entityID and md_source.
  • In the select pipe we would use this PR.
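
The keying change in the load pipe could look roughly like the sketch below. It is an illustration only: `store_key` and `load` are hypothetical names, the source labels are made up, and a plain dict stands in for the store.

```python
def store_key(entity_id, md_source, dedup=True):
    """Hypothetical helper: choose the dict key for an entity."""
    # Default behaviour is unchanged: key by entityID alone.
    return entity_id if dedup else (entity_id, md_source)


def load(sources, dedup=True):
    """Accumulate entities from (md_source, [entityID, ...]) pairs."""
    entities = {}
    for md_source, entity_ids in sources:
        for entity_id in entity_ids:
            key = store_key(entity_id, md_source, dedup)
            entities[key] = {"entityID": entity_id, "md_source": md_source}
    return entities


ENTITY_ID = "https://idp.example.com/saml2/idp/metadata.php"
sources = [
    ("test-idp-1.xml", [ENTITY_ID]),
    ("test-idp-2.xml", [ENTITY_ID]),
]

print(len(load(sources)))
print(len(load(sources, dedup=False)))
```

With the default, the second source still replaces the first; with dedup disabled, both copies are kept under distinct keys.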

Intended Usage

The objective of this proposal is to allow the MDQ service to pre-filter results according to certain entity attributes. In principle, these are: entity-category, entity-category-support, assurance-certification, and registrationAuthority. So the MDQ server, when it loads the entity metadata, will look for entities with the same entityID, will merge their values for the above attributes, and use the merged result to index a single copy of the entity. These attributes are not provided to the frontend, so the entity data sent to the frontend will not change.
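
The merge step on the MDQ side can be sketched as follows. This is written in Python purely for illustration and says nothing about how the MDQ server is actually implemented; the JSON key names and the category URI are assumptions, and `merge_trust_info` is a hypothetical helper.

```python
# Assumed key names for the trust attributes in each entity record.
TRUST_ATTRIBUTES = (
    "entity-category",
    "entity-category-support",
    "assurance-certification",
    "registrationAuthority",
)


def merge_trust_info(records):
    """Merge trust attribute values of records sharing an entityID."""
    merged = {}
    for record in records:
        entry = merged.setdefault(
            record["entityID"], {attr: [] for attr in TRUST_ATTRIBUTES}
        )
        for attr in TRUST_ATTRIBUTES:
            value = record.get(attr, [])
            values = value if isinstance(value, list) else [value]
            for item in values:
                # Keep each distinct value once, in first-seen order.
                if item not in entry[attr]:
                    entry[attr].append(item)
    return merged


ENTITY_ID = "https://idp.example.com/saml2/idp/metadata.php"
records = [
    {
        "entityID": ENTITY_ID,
        "registrationAuthority": "http://www.swamid.se/",
        "entity-category": ["https://example.org/category/a"],
    },
    {
        "entityID": ENTITY_ID,
        "registrationAuthority": "https://www.carsi.edu.cn",
        "entity-category": ["https://example.org/category/a"],
    },
]

index = merge_trust_info(records)
print(index[ENTITY_ID]["registrationAuthority"])
```

The merged values are used only for indexing and filtering; the entity data sent to the frontend is left as it was.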

Problems

  • The same entityID in 2 different federations may stand for different entities. This problem already exists and is not introduced by this change, so I'd consider it orthogonal to this question.
  • We are merging entity metadata that may differ due to policy. Again, I think this problem is orthogonal to this solution. As things stand right now, assuming for example that OpenAthens is sourced before SWAMID, a user of an IdP registered in both federations who wants to access an OpenAthens SP will receive the IdP entity metadata that was registered with SWAMID. What is more, we are already able to filter by metadata source, so if an SP chooses to pre-filter results by md_source=OpenAthens, it will receive the metadata registered with SWAMID for all entities registered in both.

enriquepablo avatar Feb 03 '25 13:02 enriquepablo

To insist on the above: SeamlessAccess is not SAML. To start with, there is a mismatch in the uniqueness of entityIDs: in SAML, they are unique per federation, but SeamlessAccess would like them to be universally unique. So SeamlessAccess is a service on top of SAML that needs to deal with this mismatch in whatever way best allows it to provide the intended service.

If pyFF is allowed to produce a JSON list of entities with duplicates for entities registered in more than one source, it won't be doing anything wrong in the SAML sense - there will be no merging of data from different sources.

Then it is thiss-mdq that will need to deduplicate the IdP list received from pyFF. But thiss-mdq will be serving SeamlessAccess, so it does not need to be (and cannot be) fully SAML compliant. At this point this needs to be useful rather than compliant.

enriquepablo avatar Feb 03 '25 14:02 enriquepablo

I have a hard time following the changes needed for this. Can you write this up as a pull request? I have a feeling it might take pretty substantial changes to get this to work.

mikaelfrykholm avatar Feb 10 '25 11:02 mikaelfrykholm