cms icon indicating copy to clipboard operation
cms copied to clipboard

`cmsImportUser --all` is slow for large number of users (~1000)

Open picrin opened this issue 7 years ago • 5 comments

ImportUser logic has O(n**2) performance with the --all flag, due to processing the entire list of users once per each user. This degrades performance of large user imports, with imports of about 3000 users taking 1 second/user.

picrin avatar Feb 28 '18 22:02 picrin

In theory current version isn't strictly O(n^2) it would be O(n) if format allowed easy retrieval of single user by having each user in separate file or stored them in db that can be efficiently queried.

I noticed this importer API limitation while working on lio-lv/cms#2 . User importer should have method for returning list of users. Not sure if it can be implemented in backwards compatible way without breaking existing loaders.

karliss avatar Mar 01 '18 08:03 karliss

Mmm... We could try to call a method for returning all users and, if it's not implemented by the specific loader, we could just fallback to the current behavior (repeated single user fetches)

On Thu, Mar 1, 2018 at 9:13 AM, karliss [email protected] wrote:

In theory current version isn't strictly O(n^2) it would be O(n) if format allowed easy retrieval of single user by having each user in separate file or stored them in db that can be efficiently queried.

I noticed this importer API limitation while working on lio-lv/cms#2 https://github.com/lio-lv/cms/pull/2 . User importer should have method for returning list of users. Not sure if it can be implemented in backwards compatible way without breaking existing loaders.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/cms-dev/cms/issues/874#issuecomment-369510372, or mute the thread https://github.com/notifications/unsubscribe-auth/ABOc8Yh9hSmIKqOLgaWJ1xnsidl5xvnZks5tZ62ngaJpZM4SXcWo .

wil93 avatar Mar 01 '18 08:03 wil93

Having users in separate files would be difficult for me as a cms user -- the best way for me is to automatically generate a single file in the italian format and import all users using the --all flag.

I was actually able to accomplish the import I wanted to accomplish yesterday by working out inactive users, which ended up with only 800 users, which uploaded in just a few minutes (the speed of upload was about 3 users/second).

Not sure about possible fixes, I'm not familiar with the codebase. My only contribution to the technical discussion would be entirely trivial and miss all the architectural complexity: Under the assumption that the user list fits in memory, and each user has unique username, we can load users into a python set/dictionary and retrieve them in O(1).

picrin avatar Mar 01 '18 09:03 picrin

@picrin in your case, if you are generating the contest.yaml with the users, it would be easier to avoid that step and call in your generating script cmsAddUser, which should be quick.

@karliss I think the rationale is that user importers don't know about sets of user, which is a property of a contest.

I see two ways of solving this, one is @wil93's, the other is to allow importers to keep or share state, i.e., avoid reparsing the same info again and again.

The most subtle one to avoid touching anything else, would be to have a "cache" in yaml loader (when it sees a contest path, it keeps the parsed yaml (and a dictionary of users by username) in a class dictionary).

Or (slightly touching the importers) a contest loader could return a factory of user/task loaders, and at that point the contest loader being friendly with the user loader could pass the parsed yaml and the user index.

stefano-maggiolo avatar Mar 01 '18 11:03 stefano-maggiolo