`GET /v2/projects_random` almost never returns the requested number of projects
Describe the bug
The projects_random endpoint on the Modrinth API does not return the requested number of projects. The number of projects returned is almost always lower than the requested count.
For instance, a request with the count parameter set to 1 sometimes returns no projects at all: https://api.modrinth.com/v2/projects_random?count=1
The docs describe the count parameter as "The number of random projects to return".
Steps to reproduce
- Make a GET request to the projects_random endpoint with a count parameter set to a specific number.
- See the number of returned projects in the response.
- Repeat the request several times to confirm that the number of projects returned is consistently lower than the requested count.
Expected behavior
The endpoint should return a list of random projects, where the number of projects returned should always match the specified count.
Additional context
Empty response with count=1

Response with count=20 but the number of returned projects is 3

It seems the route really lives up to its name, huh 😂. I'm having the same issue.
The issue seems like it could originate from src/routes/projects.rs, line 51 through 63:
let project_ids = sqlx::query!(
    "
    SELECT id FROM mods TABLESAMPLE SYSTEM_ROWS($1) WHERE status = ANY($2)
    ",
    count.count as i32,
    &*crate::models::projects::ProjectStatus::iterator()
        .filter(|x| x.is_searchable())
        .map(|x| x.to_string())
        .collect::<Vec<String>>(),
)
.fetch_many(&**pool)
.try_filter_map(|e| async {
    Ok(e.right().map(|m| database::models::ids::ProjectId(m.id)))
})
.try_collect::<Vec<_>>()
.await?;
My guess is that the sample itself has the right size, but the searchability filter then removes some of the sampled projects, so the response sometimes contains fewer projects than the count argument specifies.
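To make the suspected ordering concrete, here is a minimal in-memory sketch of "sample first, filter second". It is not the Labrinth code: `ModRow`, `XorShift`, and `sample_then_filter` are hypothetical names, and a simple uniform draw stands in for PostgreSQL's block-level sampling. It relies on the documented PostgreSQL behavior that a TABLESAMPLE clause is applied before the WHERE clause.

```rust
/// Hypothetical stand-in for a row of the `mods` table.
#[derive(Clone, Debug, PartialEq)]
struct ModRow {
    id: i64,
    searchable: bool,
}

/// Tiny deterministic xorshift generator so the sketch needs no external crates.
/// Not suitable for real randomness; the state must be non-zero.
struct XorShift(u64);

impl XorShift {
    fn next_u64(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
}

/// Models `TABLESAMPLE SYSTEM_ROWS(count) ... WHERE status = ANY(...)`:
/// draw `count` rows first, then drop the non-searchable ones.
fn sample_then_filter(table: &[ModRow], count: usize, seed: u64) -> Vec<i64> {
    let mut rng = XorShift(seed.max(1));
    let mut pool: Vec<&ModRow> = table.iter().collect();
    let take = count.min(pool.len());
    // Partial Fisher-Yates shuffle: the first `take` slots become the sample.
    for i in 0..take {
        let j = i + (rng.next_u64() as usize) % (pool.len() - i);
        pool.swap(i, j);
    }
    // The filter runs on the already-capped sample, so it can only shrink it.
    pool[..take]
        .iter()
        .filter(|row| row.searchable)
        .map(|row| row.id)
        .collect()
}
```

Under this model the result can never exceed `count`, and it reaches `count` only when every sampled row happens to be searchable. With `count = 1`, a single non-searchable pick already produces an empty response, which would match the behavior reported above.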
Alternatively, the issue may be caused by something in the database itself. PostgreSQL documentation states:
This table sampling method accepts a single integer argument that is the maximum number of rows to read. The resulting sample will always contain exactly that many rows, unless the table does not contain enough rows, in which case the whole table is selected.
Like the built-in SYSTEM sampling method, SYSTEM_ROWS performs block-level sampling, so that the sample is not completely random but may be subject to clustering effects, especially if only a small number of rows are requested.
Note that I don't know a lot about databases, and that this may not be the cause.
I'm encountering this issue too (it's making my doctest assert fail).
Taking a look at the relevant code, I came to the same conclusion as HIHIQY1 did. I'm pretty sure the culprit is that the count cap is applied before non-searchable mods are filtered out.
I think the only way to fix this would be to somehow make the database itself filter out non-searchable mods within the query. That should technically be possible, but I have no knowledge of how databases work so I dunno.
PostgreSQL supports TABLESAMPLE for tables and materialized views only, so we have to create a materialized view and query on that instead:
CREATE MATERIALIZED VIEW searchable_mods AS SELECT id FROM mods WHERE status = ANY(ARRAY['approved', 'archived']);
SELECT id FROM searchable_mods TABLESAMPLE SYSTEM_ROWS(5);
Seems like there are two ways to go about this:
- Create a materialized view (fast, but we have to set up a nightly cron job to refresh the view)
- ORDER BY random() (slow, and gets slower as more projects are added, but needs no maintenance)
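Both options share one property: the searchability filter runs before the row cap. A minimal in-memory sketch of that ordering, modeled on the ORDER BY random() variant, could look like the following. `ModRow`, `lcg`, and `filter_then_sample` are hypothetical names for illustration, not part of the Modrinth codebase.

```rust
/// Hypothetical stand-in for a row of the `mods` table.
struct ModRow {
    id: i64,
    searchable: bool,
}

/// Minimal linear congruential generator (constants from Knuth's MMIX).
/// Deterministic and dependency-free; not suitable for real randomness.
fn lcg(state: &mut u64) -> u64 {
    *state = (*state)
        .wrapping_mul(6364136223846793005)
        .wrapping_add(1442695040888963407);
    *state >> 33
}

/// Models `WHERE <searchable> ORDER BY random() LIMIT count`:
/// filter first, attach a random sort key to each survivor, then truncate.
fn filter_then_sample(table: &[ModRow], count: usize, seed: u64) -> Vec<i64> {
    let mut state = seed;
    let mut keyed: Vec<(u64, i64)> = table
        .iter()
        .filter(|row| row.searchable)
        .map(|row| (lcg(&mut state), row.id))
        .collect();
    // Sorting every surviving row is what makes this approach slow on big tables.
    keyed.sort_unstable();
    keyed.into_iter().take(count).map(|(_, id)| id).collect()
}
```

In this model the result falls short of `count` only when the table holds fewer searchable rows than requested. The sort touches every searchable row on each call, which illustrates why this variant slows down as projects are added, while the materialized view moves the filtering cost to refresh time.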
This issue still exists and is most likely the cause of the empty random project showcase on the landing page.