Create Collection takes 30-45 seconds & sometimes times out
Describe the bug This issue is reproducible on both https://sandbox.dspace.org (running pre-8.0) and https://demo.dspace.org (running 7.x). When attempting to create a new Collection, the request always takes a long time (at least 30-45 seconds) and sometimes times out with a 504 Gateway Timeout error.
This issue only occurs when creating Collections. Creating Communities or Items seems unaffected.
To Reproduce Steps to reproduce the behavior:
- Login as an Admin to either Sandbox or Demo
- In your browser, open up your DevTools -> Network tab (useful to watch the create step)
- In the Admin menu, choose "New -> Collection". Select any Community as the parent
- Enter a title and click "Save". The page will appear to not respond. Eventually either a 504 or 200 will return, but only after about 30-45 seconds.
- In your DevTools -> Network tab you'll see that a POST request was made to /server/api/core/collections?parent=[community-uuid]. It will remain "Pending" for the entire 30-45 seconds.
  - On Sandbox, the request will sometimes return as successful (200 OK), but only after about 30-45 seconds.
  - On Demo, the request almost always times out (504 error returned). However, the Collection will have been created successfully behind the scenes. So, if you search for the collection after seeing a 504 error, you will find that it exists.
  - In the backend dspace.log file, the POST request appears to succeed almost immediately. It's unclear why it stays "Pending" for so long, and whether the bug is in the frontend or backend.
Expected behavior Creating a Collection obviously should be as fast as creating a Community.
Related work
It's unclear what caused this bug, but the bug exists in both dspace-7_x and main. So, that implies it was caused by a bug fix which was applied to both branches.
Hi @tdonohue, reading the description of the issue, I think it could be related to a consumer configured to reload the submission configurations when a collection is edited/created. See:
https://github.com/DSpace/DSpace/blob/main/dspace/config/dspace.cfg#L833-L835
I think this consumer could be refactored to avoid reloading the configurations multiple times during the creation of a collection, by moving the reload process to the end() function.
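To illustrate the idea, here is a minimal, self-contained sketch of a consumer that only records that a reload is needed while events arrive and performs the reload once in end(). It is not the actual SubmissionConfigConsumer code: the `ConfigReloader` interface, the `DeferredReloadConsumer` class, and the simplified `consume(String)` signature are all hypothetical stand-ins.

```java
// Hypothetical stand-in for the expensive submission configuration reload.
interface ConfigReloader {
    void reload();
}

// Sketch only: remembers that a reload is needed while events are consumed,
// and runs the reload at most once per batch, in end().
class DeferredReloadConsumer {
    private final ConfigReloader reloader;
    private boolean reloadNeeded = false;

    DeferredReloadConsumer(ConfigReloader reloader) {
        this.reloader = reloader;
    }

    // Called once per event. Cheap: it only sets a flag.
    // The subject type is passed as a plain string here (an assumption).
    void consume(String subjectType) {
        if ("Collection".equals(subjectType)) {
            reloadNeeded = true;
        }
    }

    // Called once after all events of the batch have been consumed.
    void end() {
        if (reloadNeeded) {
            reloader.reload();
            reloadNeeded = false;
        }
    }
}
```

With this shape, several Collection events raised by a single create request would trigger one reload instead of one reload per event.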
I am not able to reproduce it locally, but I suppose that it happens because in demo and sandbox reloading the submission configurations takes more time.
I will send a PR with the proposed changes to see if it could solve or mitigate the issue.
@toniprieto : That would be wonderful! I had actually just realized myself that the likely cause is the SubmissionConfigConsumer, but I had not determined how to fix it. If you could create a quick PR to help mitigate the issues, that would be very much appreciated! I'd ensure it gets reviewed/tested quickly.
NOTE: After applying the fix in #9462, the "Create Collection" step is much faster... though it's still slightly slower than I'd like to see. It now takes ~6 seconds instead of ~30-45 seconds.
We may need to consider ways to speed this up further...or possibly run the submission config "reload" behind the scenes (instead of waiting on it to complete before returning). But, for now, the significant performance issue is fixed.
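As a rough sketch of the "behind the scenes" option, the reload could be handed to another thread so the request does not wait on it. This is only an illustration under assumptions: `BackgroundReload` and `reloadInBackground` are hypothetical names, and the reload itself is represented by a plain `Runnable`.

```java
import java.util.concurrent.CompletableFuture;

// Sketch only: starts a (hypothetical) reload task without blocking the caller.
class BackgroundReload {
    // Runs the task on the common ForkJoinPool and returns immediately.
    // The returned future can be used to observe completion or failure.
    static CompletableFuture<Void> reloadInBackground(Runnable reloadTask) {
        return CompletableFuture.runAsync(reloadTask);
    }
}
```

The trade-off is that, until the background task finishes, requests may still see the previous submission configuration.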
There is an issue, https://github.com/DSpace/DSpace/issues/9402, that is addressed by https://github.com/DSpace/DSpace/pull/9415. This PR fixes a problem that limits you to a maximum of 10 collections per entity type (the default Solr rows limit). But that PR isn't in the DSpace source yet.
By choosing Solr as the way to select a form based on the entity types defined on collections, we intended to speed up results. Even with https://github.com/DSpace/DSpace/pull/9415, 30-45 seconds seems like too much time, considering we are using Solr. Perhaps you have a big repository (with a large number of collections) and a large number of different entity types; that could explain it.
Also, does the repository change this configuration?
event.consumer.submissionconfig.filters = Collection+Modify_Metadata
Currently, with our local migrated repositories, we haven't observed this issue. With our local repositories it takes about 15 sec to create a new collection and 5 sec to actually edit it. Perhaps the issue could also be related to policy creation.
A better implementation than the one I did for forms loading, though it would take some effort, would be to reconsider how submission forms are made available. Currently, this is based on the configured collections, specifically on the collection's Handle, but perhaps it makes sense to do a runtime check based on the entity type of the provided collection.
@paulo-graca : I can verify the behavior here was on sandbox.dspace.org and demo.dspace.org. Those sites both use default settings in dspace.cfg with regards to the event.consumer.submissionconfig.filters.
I've not yet narrowed down why this submissionconfig consumer is adding so much time to the Collection creation process, but it is down to ~6 seconds now (after merging https://github.com/DSpace/DSpace/pull/9462). That's a big improvement over 30+ seconds, but I'd still like to see if we can make it faster (as Communities are created much more quickly, for example).
I'll take a closer look at https://github.com/DSpace/DSpace/pull/9415 as well to see if there are improvements there...but at a glance, I'm not sure whether that will impact performance. It does seem like an important bug fix though which we should get into 8.0 and 7.x
@paulo-graca I think it's a good idea to implement what you suggested in your last comment. In version 8.x, a change that allows configuring forms at the community level has been included (DSpace/DSpace#9259). With this change, the function that returns the Item Submission process used by a collection receives a collection as a parameter (previously it received a handle). See:
https://github.com/DSpace/DSpace/blob/main/dspace-api/src/main/java/org/dspace/app/util/SubmissionConfigReader.java#L228
Having the collection makes it simpler to retrieve the entity type and read a map with the relation between entityTypes and item submission process that could be built during the initial load. This way, there would be no need to perform a submission config reload when a collection is edited/created and the issue DSpace/DSpace#9402 should also be resolved.
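A minimal sketch of that lookup follows, assuming the relation between entity types and submission process names is available as a simple map at load time. This is not the SubmissionConfigReader API: `EntityTypeSubmissionMap`, `submissionFor`, and the example submission names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: maps an entity type to a submission process name.
// The map is built once at initial load, so creating or editing a
// collection does not require reloading the configuration.
class EntityTypeSubmissionMap {
    private final Map<String, String> byEntityType;
    private final String defaultSubmission;

    EntityTypeSubmissionMap(Map<String, String> mappings, String defaultSubmission) {
        this.byEntityType = new HashMap<>(mappings); // defensive copy
        this.defaultSubmission = defaultSubmission;
    }

    // Resolves the submission process at request time from the collection's
    // entity type; falls back to the default when the type is null or unmapped.
    String submissionFor(String entityType) {
        if (entityType == null) {
            return defaultSubmission;
        }
        return byEntityType.getOrDefault(entityType, defaultSubmission);
    }
}
```

Because the lookup happens per request, a newly created collection is resolved by its entity type without rebuilding any per-collection state.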
@paulo-graca @tdonohue I've had time these days to implement the approach described in the last comment, and I've sent a PR: DSpace/DSpace#9478. The implementation is very similar to DSpace/DSpace#9259. Could you take a look to see if this new approach is appropriate?