hashtags icon indicating copy to clipboard operation
hashtags copied to clipboard

Searching for a hashtag is slow

Open eggpi opened this issue 4 years ago • 8 comments

As far as I can tell, searching for any hashtag seems to take multiple seconds and even times out sometimes. We should look into optimizing this.

As a simple first step, we could log the output of explain on the queryset used in searches to see if we're doing something funny like a full-table scan. For my own future reference, here's the documentation for interpreting that output.

It would also be good to experiment with a profiler to get a breakdown of the wall-clock time of a query. I think it's likely that the bottleneck is the database query (because the rest of the code is pretty standard use of Django), but we shouldn't take that for granted and this would allow us to identify other bad spots.

@Samwalton9 would you be able to try the above in the production environment please?

eggpi avatar Jul 04 '21 21:07 eggpi

I might not find time for this at the moment, but I'd be happy to get you a copy of the production database for local testing, if that would be helpful.

Samwalton9 avatar Jul 12 '21 12:07 Samwalton9

Sure, let's try that. If I can reproduce it like that, it's even better.

eggpi avatar Jul 12 '21 14:07 eggpi

https://drive.google.com/file/d/1iyX0EEmeJhqS_6jImG1Q4gu1_xLnAbmU/view?usp=sharing

Let me know when you've downloaded the file so I can reclaim some Google Drive storage space :)

Samwalton9 avatar Jul 12 '21 16:07 Samwalton9

Downloaded, thanks!

eggpi avatar Jul 12 '21 18:07 eggpi

I think there are two main sources of slowness here:

  • Missing database indexes for some important queries.
  • Accidental evaluation of Django QuerySets. We attempt to use pagination for search results, but we end up running both a paginated query (SQL with OFFSET / LIMIT clauses) and its non-paginated, slow version.

I have a couple of fairly simple patches to address those issues and, in local experiments, they make a big difference: we essentially go from always timing out when searching for the "Trending tags", to serving the search within 3 seconds or so.

The one thing I'm not proposing we try yet is an optimization within Django's pagination (that's my f277fe8a9d84cbc84c764e5644caf1f6c9ab5dbb), as that one actually has behavior changes to the tool other than the speed.

I'll open a pull request with the more conservative patches, let's see how they behave in production.

eggpi avatar Jul 23 '21 21:07 eggpi

Per https://github.com/WikipediaLibrary/hashtags/pull/71#issuecomment-886620302 we've just made huge gains in tool speed!

Samwalton9 avatar Jul 26 '21 11:07 Samwalton9

It looks like there's still some bad performance when querying for many hashtags at once (example query).

I need to check whether this is some other unoptimized code path, but at this point we might need to consider patches that change the UI (e.g. break the pagination of results).

eggpi avatar Jul 28 '21 07:07 eggpi

Things are looking a little better now I think. I'm seeing the test query load in around 20s, which is not great, but better than consisten 502s.

eggpi avatar Jul 29 '21 11:07 eggpi