postgres_scanner icon indicating copy to clipboard operation
postgres_scanner copied to clipboard

LIKE translation is collate sensitive

Open sisidra opened this issue 5 months ago • 1 comments

What happens?

Consider query:

select endpoint from logs where endpoint like '/api/%'

It translates to following PostgreSQL queries (got from SET pg_debug_show_queries = TRUE):

BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ

COPY (SELECT "endpoint" FROM "postgres"."logs" WHERE ctid BETWEEN '(0,0)'::tid AND '(1000,0)'::tid AND ("endpoint" >= '/api/' AND "endpoint" < '/api0')) TO STDOUT (FORMAT "binary");

COPY (SELECT "endpoint" FROM "postgres"."logs" WHERE ctid BETWEEN '(1000,0)'::tid AND '(4294967295,0)'::tid AND ("endpoint" >= '/api/' AND "endpoint" < '/api0')) TO STDOUT (FORMAT "binary");

COMMIT

Looks almost reasonable, only text comparison in Postgres is collate sensitive. See:

select
    '/api/something' < '/api0' COLLATE "en_US.utf8" AS collate_en_us_wrong,
    '/api/something' < '/api0' COLLATE "C" AS collate_c_correct,
    '/api/something' < '/api0' COLLATE "unicode" AS collate_unicode_correct,
    '/api/something' LIKE '/api/%' AS like_correct

Output:

collate_en_us_wrong collate_c_correct collate_unicode_correct like_correct
f t t t

As a result we get incorrect results - for example like '/%' yields no rows and like 'a%' yields all rows for table containing absolute paths.

To Reproduce

  1. Attach to postgresql
  2. Ensure default collate is "en_US.utf8"
  3. Match path-like data with LIKE '/%

OS:

MacOS Sequoia

PostgreSQL Version:

PostgreSQL 17.5 (Debian 17.5-1.pgdg120+1) on aarch64-unknown-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit

DuckDB Version:

v1.3.2 (Ossivalis) 0b83e5d2f6

DuckDB Client:

v1.3.2 (Ossivalis) 0b83e5d2f6

Full Name:

Mārtiņš Kalvāns

Affiliation:

NymTech

Have you tried this on the latest main branch?

  • [x] I agree

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • [x] I agree

sisidra avatar Aug 25 '25 12:08 sisidra

Here is my exploration trace that might or might not help:

I have no context nor knowledge if the fix is in this repo or duckdb, or both. Nor is solution restructuring optimizations, adding collations or anything else.

Hope it helps.

sisidra avatar Aug 26 '25 09:08 sisidra