`cmsImportUser --all` is slow for large number of users (~1000)
ImportUser logic has O(n**2) performance with the --all flag, due to processing the entire list of users once per each user. This degrades performance of large user imports, with imports of about 3000 users taking 1 second/user.
In theory current version isn't strictly O(n^2) it would be O(n) if format allowed easy retrieval of single user by having each user in separate file or stored them in db that can be efficiently queried.
I noticed this importer API limitation while working on lio-lv/cms#2 . User importer should have method for returning list of users. Not sure if it can be implemented in backwards compatible way without breaking existing loaders.
Mmm... We could try to call a method for returning all users and, if it's not implemented by the specific loader, we could just fallback to the current behavior (repeated single user fetches)
On Thu, Mar 1, 2018 at 9:13 AM, karliss [email protected] wrote:
In theory current version isn't strictly O(n^2) it would be O(n) if format allowed easy retrieval of single user by having each user in separate file or stored them in db that can be efficiently queried.
I noticed this importer API limitation while working on lio-lv/cms#2 https://github.com/lio-lv/cms/pull/2 . User importer should have method for returning list of users. Not sure if it can be implemented in backwards compatible way without breaking existing loaders.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/cms-dev/cms/issues/874#issuecomment-369510372, or mute the thread https://github.com/notifications/unsubscribe-auth/ABOc8Yh9hSmIKqOLgaWJ1xnsidl5xvnZks5tZ62ngaJpZM4SXcWo .
Having users in separate files would be difficult for me as a cms user -- the best way for me is to automatically generate a single file in the italian format and import all users using the --all flag.
I was actually able to accomplish the import I wanted to accomplish yesterday by working out inactive users, which ended up with only 800 users, which uploaded in just a few minutes (the speed of upload was about 3 users/second).
Not sure about possible fixes, I'm not familiar with the codebase. My only contribution to the technical discussion would be entirely trivial and miss all the architectural complexity: Under the assumption that the user list fits in memory, and each user has unique username, we can load users into a python set/dictionary and retrieve them in O(1).
@picrin in your case, if you are generating the contest.yaml with the users, it would be easier to avoid that step and call in your generating script cmsAddUser, which should be quick.
@karliss I think the rationale is that user importers don't know about sets of user, which is a property of a contest.
I see two ways of solving this, one is @wil93's, the other is to allow importers to keep or share state, i.e., avoid reparsing the same info again and again.
The most subtle one to avoid touching anything else, would be to have a "cache" in yaml loader (when it sees a contest path, it keeps the parsed yaml (and a dictionary of users by username) in a class dictionary).
Or (slightly touching the importers) a contest loader could return a factory of user/task loaders, and at that point the contest loader being friendly with the user loader could pass the parsed yaml and the user index.