Security change to ransackable attribute leads to potentially huge overhead when using postgresql
Application Context
We have a rails application currently running in production that used to be able to handle a metric ton of jobs with ease. We recently upgraded from Rails 7.0 to 7.1 (that was it's own can of worms) and ransack to v4.0
Before deployment we rarely if ever saw this query
SELECT a.attname, format_type(a.atttypid, a.atttypmod),
pg_get_expr(d.adbin, d.adrelid), a.attnotnull, a.atttypid, a.atttypmod,
c.collname, col_description(a.attrelid, a.attnum) AS comment,
a.attidentity AS identity,
attgenerated as attgenerated
FROM pg_attribute a
LEFT JOIN pg_attrdef d ON a.attrelid = d.adrelid AND a.attnum = d.adnum
LEFT JOIN pg_type t ON a.atttypid = t.oid
LEFT JOIN pg_collation c ON a.attcollation = c.oid AND a.attcollation <> t.typcollation
WHERE a.attrelid = ?::regclass
AND a.attnum > ? AND NOT a.attisdropped
ORDER BY a.attnum
After release this query (among others) started popping up during peak hours around 15k-20k times per hour. This leads to huge performance impacts on our users when trying to navigate the website during those peak job hours. At first I thought something was wrong with resque, but eventually I came across this chain of events that I think is the culprit.
We use authorizable_ransackable_attributes for many of our models because they have no sensitive data.
-
in
authorizable_ransackable_attributesthere is a call tocolumn_names -
in
column_namesthere is a call tocolumns -
in
columnsthere is a call tocolumn_definitions -
in
column_definitionswe find the query mentioned before that started appearing exactly as we upgraded ransack
The reason why this appeared so intensely and might not have been noticed is that in a web scenario the schema caching might come into play reducing the need from rails to run the column_definitions sql query. But in jobs like Resque, which we use, connections are forked for each job => if you have 400 workers starting and ending every couple of seconds non-stop for an hour or so, you end up with a non-responsive frontend and extremely slow queries because the database seems to be saturated with these calls.
I know it sounds odd that a query as "basic" as this might cause extreme slowdowns to the database, but at this point this is the only explanation I have left as to why upgrading rails/ransack led to extreme slowdowns we are currently facing.
We're currently looking into ways to minimize the forking process whilst preserving the recommendations from resque (to fork each worker) but I'd love to have an alternative way to whitelist all columns in a table for ransack (maybe allow the use of Rails.cache to cache the authorizable_ransackable_attributes with a TTL so it uses the default rails caching which in our case is configured in to a redis cache ?)
Would be interested in knowing how other people are solving this. I'm working on upgrading a Rails 6 app to Rails 7 and taking Ransack from 2.x to 4.x, so this will probably be an issue unless we explicitly list each model's columns, which I guess is the most correct solution.
We were also hit by this wen upgrading from activeadmin 2 to version 3. We saw quite a degradation in our system, experiencing very slow load times and had to revert the change.
Basically, to maintain compatibility with the precious question we followed the recommendation of defining a ransackable_attributes method in the base application record, which called authorizable_ransackable_attributes.
Maybe authorizable_ransackable_attributes need to be cached in the adapter, just like ransackable_attributes is.