
Introduce abstractions to define custom criteria predicates

Open Wuzhengyu97 opened this issue 1 year ago • 10 comments

I am using Spring Data R2DBC with a PostgreSQL database. My goal was to use CriteriaDefinition to describe a query for a column of type text[] in PostgreSQL. Specifically, I wanted to perform array-related operations such as checking if the array contains a value (using the PostgreSQL @> operator).

For example, I expected to write something like this:

Query.query(Criteria.where("tags").contains("electronics"));

However, I could not find any built-in support for such array operators in CriteriaDefinition. This left me wondering if I have missed some part of the documentation or if this functionality is not supported.

The lack of array operator support means that for now, I would have to resort to raw SQL for such queries, like this:

String sql = "SELECT * FROM products WHERE tags @> ARRAY['electronics']::text[]";
databaseClient.sql(sql).all();
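As an aside, the raw-SQL fallback does not have to inline the literal value; the predicate can be rendered with bind markers so values are bound separately. The following is a minimal self-contained sketch (not Spring API; the class and method names are invented for illustration) of rendering the `@>` predicate with positional markers:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Illustrative only: renders a parameterized "array contains" predicate
// instead of inlining literals into the SQL string. The values themselves
// would be bound separately through the client's bind API.
final class ArrayContains {

    // Builds e.g. "tags @> ARRAY[$1, $2]::text[]" for two bound values.
    static String render(String column, List<String> values) {
        String markers = IntStream.rangeClosed(1, values.size())
                .mapToObj(i -> "$" + i)
                .collect(Collectors.joining(", "));
        return column + " @> ARRAY[" + markers + "]::text[]";
    }
}
```

For example, `ArrayContains.render("tags", List.of("electronics"))` yields `tags @> ARRAY[$1]::text[]`, ready for one bound value.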

This feels frustrating, especially because I want to leverage the abstraction of R2dbcEntityTemplate and avoid writing raw SQL. Is there currently a way to describe such queries using CriteriaDefinition, or are array-related operations not yet supported for PostgreSQL?

I would greatly appreciate any insights or guidance on this topic.

Wuzhengyu97 avatar Nov 28 '24 13:11 Wuzhengyu97

You are correct, there is currently no special support for this operator.

I'm also not sure if it makes sense to support all the operators (and functions) that one database or another supports.

Instead, we should come up with a way to define arbitrary criteria, or extend Criteria in a simple way, so that users can provide their own operators and functions.

schauder avatar Nov 28 '24 14:11 schauder

@schauder

I believe the most straightforward and simple solution would be to add a new constructor method to Criteria, allowing developers to pass in custom string conditions, which can then be parsed into the framework's internal Condition object. For example, an expression like tags @> ARRAY['electronics']::text[], which essentially represents a database comparison operation, can be mapped to the Comparison type within the framework. This way, users can conveniently utilize database-specific operators (such as PostgreSQL's @>), without breaking the existing structure and design principles of Criteria.
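To make that idea concrete, a hypothetical sketch of such a constructor method might look like the following. None of these names exist in Spring Data Relational today; they are invented purely to illustrate carrying a raw dialect-specific fragment together with its bind values:

```java
import java.util.List;

// Hypothetical sketch of the proposed escape hatch: a Criteria-like
// fragment holding raw dialect-specific SQL plus the values to bind.
// The framework would parse/wrap this into its internal Condition.
record RawCriteria(String sql, List<Object> bindings) {

    // e.g. RawCriteria.of("tags @> ARRAY[?]::text[]", "electronics")
    static RawCriteria of(String sql, Object... bindings) {
        return new RawCriteria(sql, List.of(bindings));
    }
}
```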

At its core, I believe the framework should at least provide users with an option to support custom operators, rather than limiting them to the predefined set of operators. This would not only enhance the flexibility of the framework but also meet the demands of users dealing with specific database features or complex business scenarios. At the same time, it would preserve the ability to generate and control the underlying SQL.

To be candid, I'm not very familiar with this framework yet, and I've only taken a brief look at its code. Therefore, this is just an initial idea based on intuition. If there's anything inappropriate in it, I would appreciate some feedback and guidance. I strongly believe that having a mechanism to support custom operators would be a great complement to the framework's extensibility.

Wuzhengyu97 avatar Nov 28 '24 16:11 Wuzhengyu97

Perhaps I can provide a PR to try implementing this feature? @schauder

Wuzhengyu97 avatar Dec 03 '24 09:12 Wuzhengyu97

That would be nice.

schauder avatar Dec 04 '24 07:12 schauder

Taking a step back from the concrete problem, there is a problem space around database-specific operations that we would like to surface through Criteria. Arrays in Postgres are just one instance; JSON query support is another. Some concepts would also apply to SELECT projections, but let's tackle one aspect at a time.

We cannot provide all operators through the generic Criteria API as running queries against a different database that doesn't support these would just not work.

Likely, a user would like to write something like:

Criteria.where("tags").satisfies(Postgres.contains("electronics"))

Criteria.where("tags").satisfies(Postgres.contains(Postgres.array("electronics")))

rendering into:

tags @> ARRAY[?]::text[]

Bear with me, I'm not exactly sure whether parameter binding works inside the ARRAY[…] operator; however, this depicts the general idea.

To make this work, we need predicates to apply mapping, specify binding parameters and to generate SQL text.
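A toy model of those three requirements (mapping, binding, SQL generation) could look like the following. All names here are invented for illustration; the real design would hook into Spring's binding and mapping infrastructure:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal model of what a dialect-specific predicate needs: register its
// bind values and emit SQL text with positional markers.
interface Predicate {
    // Appends this predicate's bind values to the list, returns SQL text.
    String toSql(List<Object> binds);
}

// Postgres-specific "array contains" predicate as one concrete instance.
final class PgContains implements Predicate {
    private final String column;
    private final Object value;

    PgContains(String column, Object value) {
        this.column = column;
        this.value = value;
    }

    @Override
    public String toSql(List<Object> binds) {
        binds.add(value);
        return column + " @> ARRAY[$" + binds.size() + "]::text[]";
    }
}
```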

Going further, one might want to express:

json_array_length(my_json_column) > 5

When considering an abstraction for predicates, we should also spend a bit of time to consider how the example above could be represented and what pieces it would require to make it work. Java code could look like the following:

Criteria.where(Postgres.jsonArrayLength("tags")).greaterThan(5)
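The new piece here is that the left-hand side of the comparison is a function over a column rather than a plain column. A small sketch of that shape, with invented names, might be:

```java
// Sketch: wrap a column in a SQL function so it can serve as the
// left-hand side of an ordinary comparison. Illustrative names only.
final class SqlFunction {
    private final String name;
    private final String column;

    SqlFunction(String name, String column) {
        this.name = name;
        this.column = column;
    }

    // Renders e.g. "json_array_length(tags)"
    String toSql() {
        return name + "(" + column + ")";
    }

    // Combines with an operator and a positional bind marker,
    // e.g. "json_array_length(tags) > $1"
    String compare(String operator, int bindIndex) {
        return toSql() + " " + operator + " $" + bindIndex;
    }
}
```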

So these are my thoughts around dialect-specific predicates so far.

mp911de avatar Jan 17 '25 08:01 mp911de

Thank you, @mp911de, that makes sense. And I generally agree with you.

However, we need to understand that there are many possible vendor-specific predicates, and it would certainly make sense to support them with the type-safe built-in API that you depicted. That totally makes sense to me. However, my concern is that we need to have both approaches - hear me out, please.

Sometimes an expression is just too complex, and we cannot handle it via a type-safe API. Apart from that, if we add support for Postgres arrays and JSONB, other people will request similar features; that is often the case. We can provide this solution as a final fallback when we either cannot implement a type-safe API for a particular vendor feature, or consider it not reasonable to do so. At least then we have a fallback solution and do not leave users with nothing - that is my point.

People want to get things done, and a type-safe API is definitely better, but maybe we should add it on top of this fallback solution?

mipo256 avatar Jan 20 '25 10:01 mipo256

However, my concern is that we need to have both approaches

Yeah, I'm thinking about a general foundation that enables building custom predicates; then, as we progress with our implementation, we would add the operators and predicates that we support.

The general fallback is a fully custom SQL statement for now. As we progress, we can explore alternatives that allow a smaller scope than the full statement.

mp911de avatar Jan 20 '25 10:01 mp911de

The problem with custom SQL via @Query for instance is that it is not dynamic.

I'm an active user of Spring Data JDBC myself, and so are my colleagues. Like many others, we use the Criteria API when we need to add conditions to the SQL dynamically; in other words, in some cases the SQL statement may not contain the condition tags @> ARRAY[?]::text[] at all. We do not really have anything to address that: @Query is not a good fit here, and JdbcTemplate etc. are not a good fit either.
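The dynamic-query problem can be sketched as follows: the `@>` predicate must appear only when a tag filter is actually present, which a statically declared query cannot express. All names below are invented for illustration; the bound value itself would be passed to the client separately:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Illustrates why a static @Query falls short: the array predicate is
// appended only when a tag filter is present at runtime.
final class ProductQueryBuilder {

    static String build(Optional<String> tag) {
        List<String> conditions = new ArrayList<>();
        conditions.add("active = true");
        // The tag value would be bound as $1; only the marker appears here.
        tag.ifPresent(t -> conditions.add("tags @> ARRAY[$1]::text[]"));
        return "SELECT * FROM products WHERE " + String.join(" AND ", conditions);
    }
}
```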

mipo256 avatar Jan 20 '25 10:01 mipo256

We iterated on this topic and came up with ideas for how we could express dialect-specific operators, conditions, projections, and sorting. As an initial draft, we could imagine starting with something like:

// tags @> ARRAY['electronics', 'gaming']
where("foo").as(PgSql::operators).contains(PgSql.arrayOf("electronics", "gaming"));
		
// properties#>'{a,b}' @> ARRAY['electronics', 'gaming'], specifically ARRAY[?, ?]
PgSql.where(PgSql.json("properties").path("a,b")).contains("electronics", "gaming");

// properties#>>'{a,b}' @> '["electronics", "gaming"]'
PgSql.where(PgSql.json("properties").path("a,b").asString()).contains("[\"electronics\", \"gaming\"]");


// SELECT embedding <-> '[3,1,2]' AS distance
Query.projection(Projection.of(PgSql.vector("embedding").distanceTo(L2, Vector.of()).as("distance")));

// SELECT 1 - (embedding <-> '[3,1,2]') AS distance
Query.projection(Projection.of(Literal.of(1).subtract(PgSql.vector("embedding").distanceTo(L2, Vector.of()).nest()).as("distance")));

// embedding <-> '[3,1,2]' < 0.5
PgSql.where("embedding").distanceTo(L2, Vector.of()).isLessThan(0.5);
PgSql.where("embedding", it -> it.distanceTo(L2, Vector.of())).isLessThan(0.5);

// properties -> 'active'::boolean = true
PgSql.where("properties", it -> it.field("active").asBoolean()).isTrue();

// properties -> 'active'::boolean
PgSql.where("properties", it -> it.field("active").asBoolean());

// WHERE active
where("active");

// Escape hatch: sales @> employee_data->'skills'
where("sales").satisfies((context) -> {
		// context contains the target property/column, type, access to the converter and bindings
		// context.writeValue(value)
		// context.bind(context.convert(context.getTargetType(), value))
		BindMarker bindMarker = context.bind("skills");

	return Comparison.create(context.lhs(), "@>", Expressions.just("employee_data->%s".formatted(bindMarker)));
});
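The escape-hatch context above can be modeled in miniature like this. All names are invented to mirror the sketch; the real context would also expose conversion and target-type information:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the proposed escape-hatch context: it exposes the mapped
// left-hand column and hands out positional bind markers, while the
// user's lambda assembles the final SQL fragment.
final class CriteriaContext {
    private final String column;
    private final List<Object> bindings = new ArrayList<>();

    CriteriaContext(String column) {
        this.column = column;
    }

    // Registers a value for binding and returns its positional marker.
    String bind(Object value) {
        bindings.add(value);
        return "$" + bindings.size();
    }

    // The mapped left-hand side (target column).
    String lhs() {
        return column;
    }

    List<Object> bindings() {
        return bindings;
    }
}
```

With this model, `ctx.lhs() + " @> employee_data->" + ctx.bind("skills")` renders the fragment while the value travels through the bindings list rather than the SQL text.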

We want to consider the following:

  • Reduce noise when expressing functions
  • Use as many bind markers as possible
  • Call out unsafe/unsanitized property/column usage (where("1 == 1 -- DROP BOBBY TABLES")) in the docs, for where, projections, and order by
  • Easy opt-in (where(…).as(PgSql::operators), PgSql.where(…))
  • Provide an easy escape hatch for operations that we do not provide as API

This is a broader topic than initially anticipated. We would like to cover in the first iteration:

  • Support for JSON, Arrays, Vector Search and Casting (unary operators, CAST(col AS type), foo::json)
  • Provide a foundation for dialect-specific extensions
  • Use Postgres to validate our design

Going forward, we would like to host only a reduced set of dialects so that other developers can build support for operators and expressions on top of our infrastructure as we cannot maintain all sorts of dialects.

mp911de avatar Jun 18 '25 09:06 mp911de

Glad to see the progress on this issue, thank you!

Regarding the database-specific methods, would it be easier to implement this with a dedicated API for composing SQL fragments?

vnobo avatar Jun 26 '25 06:06 vnobo