SimpleIdServer icon indicating copy to clipboard operation
SimpleIdServer copied to clipboard

[SCIM] Feature Request - make patch faster by returning 204

Open danflomin opened this issue 2 years ago • 17 comments

Hello

During my work on improving the performance of SCIM, especially the PATCH method for groups, I saw that for each PATCH request, we query the entire group representation from the DB. This takes a long time when the group has many members (can actually take most of the running time).

Since most of the PATCH requests for groups is about adding\removing members, it would be much more efficient to not load all the data into memory. I think we should check each PatchOperation separately, such that for:

  • Add member operation - check if the user is already a member. If it is, then do nothing. Otherwise, only add the relevant attributes.
  • Remove member operation - similarly.

Since it is common for vendors to have groups with many members, and to send PATCH requests in bursts, it will be much better for performance to do it this way.

The SCIM RFC allows returning 204 in case of successful PATCH. image

I will appreciate this very much. What do you think on this?

Kind regards Dan

danflomin avatar Aug 29 '23 11:08 danflomin

Hello,

I'm working on a solution to fix this problem.

KR,

SID

simpleidserver avatar Aug 29 '23 18:08 simpleidserver

Todo List :

  • Refactor the class RepresentationHelper, don't retrieve the entire representation : DONE
  • Refactor the class RepresentationReferenceSync, don't retrieve the entire representation : DONE
  • Update all the UTS : DONE
  • Adapt "EF" project : DONE
  • Execute "integrations tests" on EF and check "performance" : DONE
  • Adapt "MongoDB" project : DONE
  • Create migration script : DONE

simpleidserver avatar Sep 02 '23 19:09 simpleidserver

Hello, I'm trying to do a real test with more than 2k users in one single group and I noticed that the response time increases every time you put new user.

image

Now I have only 150 users in one group, in my scenario I need to manager 10k users in one single group.

I'm using last version of 4.0.4 branch locally

gabrielemilan avatar Sep 05 '23 12:09 gabrielemilan

Hello,

The logic used to add a member to a group takes too much time because, in the previous version, all the 'members' are retrieved. I have made some modifications to address this issue in the 'release/4.0.4' branch, and the performance is now better!

Kind regards,

SID

simpleidserver avatar Sep 05 '23 15:09 simpleidserver

I tried and it seems faster now, well done!

I tried also the get user by userName and I suggest adding this index on mongo:

db.scimRepresentationAttributes.createIndex( { RepresentationId: 1, SchemaAttributeId: 1, ValueString: 1 }, { name: "RepresentationId_1_SchemaAttributeId_1_ValueString_1" } );

This is the result:

Before image

After image

gabrielemilan avatar Sep 05 '23 16:09 gabrielemilan

Hello @danflomin & @gabrielemilan ,

All the changes have been committed to the release/4.0.4 branch, and the performance has improved. I made an effort to consistently load the minimal subset of attributes used during PATCH, POST, and DELETE operations.

However, there have been some changes in the database, including the addition of two new columns in the SCIMRepresentationAttribute table: ComputedValueIndex and IsComputed.

I have written a small documentation explaining how to migrate from version 4.0.3 to 4.0.4. You can find the documentation at the following link: https://github.com/simpleidserver/SimpleIdServer/blob/release/4.0.4/website/docs/migrations/403to404.md

KR,

SID

simpleidserver avatar Sep 05 '23 20:09 simpleidserver

I noticed performance issue using this patch, trying to remove user from group:

url: v2/Groups/d8c7ac0d-68a4-4d4a-bf01-afcc9a907e11

{ "schemas": [ "urn:ietf:params:scim:api:messages:2.0:PatchOp" ], "Operations": [ { "op":"remove", "path": "members[value eq"00179e8b-1c75-45e5-b0af-a02ebf4b55fd"]" } ] }

when I have a lot of users in one group the process doesn't end.

gabrielemilan avatar Sep 06 '23 13:09 gabrielemilan

Indeed, there is an issue in the function 'SCIMRepresentationCommandRepository.FindAttributes.' It executes the following MongoDB instruction and causes a performance problem. I rewrote the algorithm, and the code is available in the 'release/4.0.4' branch. Can you please try it again?


[
  {
    $project: {
      _outer: "$$ROOT",
      _id: 0,
    },
  },
  {
    $lookup: {
      from: "representationAttributes",
      localField: "_outer._id",
      foreignField: "ParentAttributeId",
      as: "_inner",
    },
  },
  {
    $project: {
      a: "$_outer",
      Children: "$_inner",
      _id: 0,
    },
  },
  {
    $match: {
      "a.RepresentationId":
        "21c11c48-7418-4307-9da2-ff524947e3fe",
    },
  },
  {
    $project: {
      Attribute: "$a",
      Children: {
        $map: {
          input: "$Children",
          as: "c",
          in: {
            Attribute: "$$c",
          },
        },
      },
      _id: 0,
    },
  },

  {
    $match: {
      $and: [
        {
          "Attribute.SchemaAttributeId":
            "8e5b5165-b130-4a7e-8e9e-f359095cbc74",
        },
        {
          $expr: {
            $anyElementTrue: {
              $map: {
                input: "$Children",
                as: "v__1",
                in: {
                  $and: [
                    {
                      $eq: [
                        "$$v__1.Attribute.SchemaAttributeId",
                        "13f2bf74-79e1-4afe-9e2f-6996ef4251b9",
                      ],
                    },
                    {
                      $eq: [
                        {
                          $toLower: {
                            $ifNull: [
                              "$$v__1.Attribute.ValueString",
                              "",
                            ],
                          },
                        },
                        "a0be941f-4a23-493f-b200-06fdb9a98bcc",
                      ],
                    },
                  ],
                },
              },
            },
          },
        },
      ],
    },
  },
]

simpleidserver avatar Sep 06 '23 14:09 simpleidserver

Yes, now it works!

Thanks.

gabrielemilan avatar Sep 06 '23 14:09 gabrielemilan

It looks like it's somehow worse now. I am running a patch on a group with 17K users adding another 100 users and the operation takes 20 + minutes and counting. Using EF.

KR Yoav

yoavyanuka avatar Dec 24 '23 12:12 yoavyanuka

Update: Looks like it works ok on a newly created group (after the fix) but an old group is getting stuck on patch

yoavyanuka avatar Dec 24 '23 19:12 yoavyanuka

Update 2: Looks like it's doing countless calculations around the SyncIndirectReferences specifically this for loop:

foreach (var pa in parentAttrs) {
    if (!children.Any(r => r.ParentAttributeId == pa.Id && r.SchemaAttributeId == valueAttr.Id && r.ValueString == newSourceScimRepresentation.Id))
        result.AddReferenceAttributes(BuildScimRepresentationAttribute(pa.RepresentationId, propagatedAttribute.TargetAttributeId, newSourceScimRepresentation, Mode.PROPAGATE_INHERITANCE, newSourceScimRepresentation.ResourceType, targetSchema, false));
}
                                }

yoavyanuka avatar Dec 24 '23 19:12 yoavyanuka

Hello,

There is a breaking change in the SCIM library between versions 4.0.3 and 4.0.4.

Specifically, two columns have been added to the SCIMRepresentationAttribute :

  1. ComputedIndexValue: This column now stores a computed value for each attribute. In the case of complex attributes, the value is the concatenation of its children's values. This value plays a crucial role when the SCIM server is checking the uniqueness of the attribute.

  2. IsComputed: This column indicates whether the attribute is automatically computed or not. For instance, the attribute groups.type is always computed by the server and only has two possible values, direct or indirect.

Upon adding a new group, both columns are correctly populated. However, for older groups, a migration process must be executed. For detailed information about the migration process, please refer to the documentation available at: https://simpleidserver.com/docs/migrations/403to404/

KR,

SID

simpleidserver avatar Dec 25 '23 08:12 simpleidserver

I followed the instructions. The migration looks to add the new columns with default value null and false not populate it. How do we need to populate current rows?

KR, Yoav

yoavyanuka avatar Dec 26 '23 19:12 yoavyanuka

Thank you for your response on this ticket @thabart .

You are the best 😄

I somehow missed the commented code in the Startup.cs file.

I'll try to run it and report on how the migration went.

Kind regards Dan

danflomin avatar Dec 27 '23 19:12 danflomin

Hi @thabart

I have 2 questions:

Should this part be without a filter? I thought that the ComputedValueIndex should exist only when IsComputed is true.

var groupedAttributes = context.SCIMRepresentationAttributeLst.Include(a => a.SchemaAttribute).GroupBy(a => a.RepresentationId);

And also, do you think it'll be possible to convert this migration into a SQL script? I believe it'll be much faster.

Kind regards Dan

danflomin avatar Dec 27 '23 19:12 danflomin

When the property IsComputed is set to true, the attribute is automatically calculated by SCIM. Examples include the properties 'type' and 'display'.

The property ComputedValueIndex is a concatenation of the children's values. SCIM utilizes it to enhance the execution of the comparison algorithm between two representations.

Both properties serve different needs; therefore, the following part must be without a filter:

var groupedAttributes = context.SCIMRepresentationAttributeLst.Include(a => a.SchemaAttribute).GroupBy(a => a.RepresentationId);

While it is possible to migrate the logic into a SQL script, there is a risk of regression because all the .NET logic must be translated :)

simpleidserver avatar Dec 27 '23 21:12 simpleidserver