Inefficient optimization
I encountered a case where Linq.Expression.Optimizer was applied to a logical expression, but the resulting optimized expression was not fully simplified to its minimal form. After manual analysis, I found that the expression can be reduced further, which suggests that the optimizer is not performing complete logical simplifications in this case.
Original Expression The following logical expression is my original input:
c => (
(
(
(
(!c.IsRestricted && !c.IsAnonymized) &&
!c.IsDeleted
) ||
(
(c.IsRestricted && !c.IsAnonymized) &&
!c.IsDeleted
)
) ||
(
c.IsAnonymized && !c.IsDeleted
)
) ||
c.IsDeleted
)
Optimized Expression (Generated by Linq.Expression.Optimizer)
c => (
(
(
!(
(c.IsRestricted || c.IsAnonymized) ||
c.IsDeleted
) ||
(
(c.IsRestricted && !c.IsAnonymized) &&
!c.IsDeleted
)
) ||
(
c.IsAnonymized && !c.IsDeleted
)
) ||
c.IsDeleted
)
While this version is slightly rewritten, it is not fully optimized.
Manual Simplification By manually simplifying the logical structure, we can reduce it to:
c => (!c.IsDeleted || c.IsDeleted)
which always evaluates to true.
Expected Behavior The optimizer should ideally recognize that the entire expression simplifies to true and return c => true instead of keeping unnecessary logical operations.
Environment Linq.Expression.Optimizer Version: 1.0.24 .NET 9 Windows 11
Reproducible demo:
using System;
using System.Linq;
using LinqKit;
class Program
{
static void Main()
{
var statuses = Enum.GetValues(typeof(ClientStatus)).Cast<ClientStatus>();
var predicate = PredicateBuilder.New<Client>(false);
foreach (var status in statuses.Distinct())
{
predicate = status switch
{
ClientStatus.Active => predicate.Or(c => !c.IsRestricted && !c.IsAnonymized && !c.IsDeleted),
ClientStatus.Restricted => predicate.Or(c => c.IsRestricted && !c.IsAnonymized && !c.IsDeleted),
ClientStatus.Anonymized => predicate.Or(c => c.IsAnonymized && !c.IsDeleted),
ClientStatus.Deleted => predicate.Or(c => c.IsDeleted),
_ => throw new ArgumentOutOfRangeException(nameof(statuses), status, $@"Unsupported ClientStatus: {status}")
};
}
// Optimize expression
var optimized = OptimizeExtension.Optimize<Func<Client, bool>>(predicate);
Console.WriteLine("Original Expression:");
Console.WriteLine(predicate);
Console.WriteLine("\nOptimized Expression:");
Console.WriteLine(optimized);
}
}
public class Client
{
public bool IsRestricted { get; set; }
public bool IsAnonymized { get; set; }
public bool IsDeleted { get; set; }
}
public enum ClientStatus
{
Active,
Restricted,
Anonymized,
Deleted
}
Demo output
Original Expression:
c => (((((Not(c.IsRestricted) AndAlso Not(c.IsAnonymized)) AndAlso Not(c.IsDeleted)) OrElse ((c.IsRestricted AndAlso Not(c.IsAnonymized)) AndAlso Not(c.IsDeleted))) OrElse (c.IsAnonymized AndAlso Not(c.IsDeleted))) OrElse c.IsDeleted)
Optimized Expression:
c => (((Not(((c.IsRestricted OrElse c.IsAnonymized) OrElse c.IsDeleted)) OrElse ((c.IsRestricted AndAlso Not(c.IsAnonymized)) AndAlso Not(c.IsDeleted))) OrElse (c.IsAnonymized AndAlso Not(c.IsDeleted))) OrElse c.IsDeleted)
Bonus The explanation for binary logic optimization for this particular case: https://chatgpt.com/share/67d0ce83-e1dc-8004-a36d-94be9b45ab00
There is a configuration that tells which rules to use, which you can override: https://github.com/Thorium/Linq.Expression.Optimizer/blob/eddd78818ade888ebe1d0bfb771d521b5c0896bd/src/Code/ExpressionOptimizer.fs#L473
The issue here is that if you use all the rules, then there is a possibility that some expressions start to oscillate. By that, I mean if A is optimized to B, and B is optimized to A, then it's a never-ending loop.
Maybe those expressions could be in use only in as part of some larger scenario where the oscilation won't then happen.
For my use case, these are simply filters for an entity. Filters are user inputs that frequently access the same members, and my goal with this library is to reduce the number of checks performed by the database. I believe it would be beneficial if the library could determine efficient way of minimizing the expression.
Currently, I'm unsure how to improve performance due to the limited public API, and I don't see how to utilize reductionMethods to enhance behavior. I don't write in F#, so I wish I could contribute more.
Regarding the oscillation you mentioned, one possible approach is to treat all optimizations as a graph - storing already visited states during optimization to avoid re-entering the same state.
Another option is to build Perfect Conjunctive Normal Form of all Boolean variables, but in this case, it's not clear how to deal with non-Boolean expression members, functions, comparisons, etc.
I'm looking into your example code, and I'm facing a few issues:
- You construct different lambdas with
c => ... c.IsDeletedand the parametercis not the same object between different lambdas (it returns different GetHashCode). As the current expression-optimizer just has some parameter ruleswhere p = p', the parameters have different objects, and that check fails, so the rule doesn't hit. This could be fixed by saying, "ok, let's also match the expression parameter on the member name and expression level. This would work but could cause some side-effects if then produced weird queries where variables are re-used (something likefrom p in persons where p.Age > 2 group p.car by p.Age into p). Maybe no-one does that. I'll have to consider this bit more. It would go like this:
let internal propertyMatch (p:Expression) (p2:Expression) =
if p.NodeType <> p.NodeType then false
else
match p.NodeType, p, p2 with
| ExpressionType.MemberAccess, (:? MemberExpression as pe1), (:? MemberExpression as pe2) ->
(pe1.Member = pe2.Member) && (pe1.Expression = pe2.Expression)
| _ -> false
and used like when p = p' || propertyMatch p p'.
- When doing the optimization, the current not-used associate rule is just wrong (it does nothing). It should be as follows:
let associate = function
| Or' (Or' (l, r), r') -> Expression.OrElse(l, Expression.OrElse(r, r')) :> Expression
| Or' (l, Or' (l', r)) -> Expression.OrElse(Expression.OrElse(l, l'), r) :> Expression
| And' (And' (l, r), r') -> Expression.AndAlso(l, Expression.AndAlso(r, r')) :> Expression
| And' (l, And' (l', r)) -> Expression.AndAlso(Expression.AndAlso(l, l'), r) :> Expression
| noHit -> noHit
For example (a || (b || c)) -> ((a || b) || c)
However, these currently unused rules do produce another potential issue of evaluation order.
Here is an example, let's say a = y.Col1 > 1 and b = x == null and c = x.Col4 > 10
Now, the original rule would be: (y.Col1 > 1 || (x == null || x.Col4 > 10))
...but after the associate-rule run the version would be:
((y.Col1 > 1 || x == null) || x.Col4 > 10)
...and this version could potentially cause some kind of null-reference-exception, if x is used before null-check.
-
Perfect Conjunctive Normal Form could sound like a nice idea if LINQ
ORwould support a list / params-array, but it doesn't. That's why for example case!a || !b || !c || !d, with is a normal form:OR(OR(OR(NOT(a),NOT(b)),NOT(c)),NOT(d))But after deMorgan-rule the expression-tree is smaller with way fewer operations:NOT(OR(OR(OR(a,b),c),d) -
The example you set was specifically nasty, because in theory you can have e.g. 10 different .IsSomething columns and the combinatory tree of conditions grow exponentially. This means the tree traversal will take exponential time, and this expression optimiser has to be working on smaller hardware and still operate faster than the ChatGPT solver.
Some issues are addressed now in the new version. However, there is still a limit to tree-parsing.
May I ask why do you need to have your conditions in full disjunctive normal form? Like a condition "ClientStatus.Anonymized => predicate.Or(c => c.IsAnonymized && !c.IsDeleted)," I would expect the deleted flag to not affect the boolean of whether it is anonymized or not.
I agree these filters might be opinionated, but they are based on the flow of GDPR compliance. We need to anonymize user after, let's say one year, but can keep all his non-personal data for statistics purposes. The deleted user is still kept in the system just for referential integrity with other entities or for investigations.
For the filters logic every boolean flag does not depend on any other, except for Active - it is based on all others. The client could be deleted, but not anonymized. Anonymization is performed after some period of time, but deletion can be performed manually. But for administrator who observes clients, it's barely needed to see deleted or anonymized users if he is looking for Active or Restricted accounts.
Anyways these filters are part of current business logic and are subject to change in future if reconsidered. The point of this issue was all about optimization mechanics.
By the way, I tested my initial code snippet with Linq.Expression.Optimizer 1.0.26, but it made no difference from v1.0.24.
Hi, just pushed a version 1.0.28 which should do this way better.
Exciting news - thank you! This looks much better.
I’m not sure what your plans are for this project, but I see significant potential here. If you’re interested, I’d be happy to contribute more test cases (C# only) as a token of appreciation.
I’ve already started some testing and found two relatively simple cases that currently fail:
new()
{
Name = "Remove mutually exclusive condition",
Input = c => c.IsDeleted || (c.LastName == "John" && c.LastName != "John"),
Expected = c => c.IsDeleted
},
new()
{
Name = "Simplify always-true disjunction with flattened conditions",
Input = c =>
c.IncorrectLoginAttempts > 5 ||
(!c.IsRestricted && !c.IsAnonymized && !c.IsDeleted) ||
(c.IsRestricted && !c.IsAnonymized && !c.IsDeleted) ||
(c.IsAnonymized && !c.IsDeleted) ||
c.IsDeleted,
Expected = c => true
},
https://github.com/ClassTerr/Linq.Expression.Optimizer.Tests/blob/main/Issue18.Tests/ExpressionOptimizerTests.cs
It’s not a blocker for me right now, but let me know if you’d like me to continue exploring similar issues. Otherwise - feel free to close the issue.
Thanks, definitely I think we can improve this.
I would like to have Remove mutually exclusive condition but... let's see how it would work with basic OR:
c.LastName == "John" || c.LastName != "John" -> true, that's easy to see.
c.LastName == "John" || something || something2 || c.LastName != "John" -> true, ok then, it's true.
graph TB;
A((OR))-->B((OR))
A-->C((OR));
B-->E1(("=="))
B-->F([Something])
C-->H([Something2])
C-->I1(("!="))
E1-->E11(["c.LastName"])
E1-->E12(["John"])
I1-->I11(["c.LastName"])
I1-->I12(["John"])
So to support these 4 properties (3 ORs) you'd need already 17 conditions:
| Or'(left, right)
| Or'(left, Or'(right,_))
| Or'(left, Or'(_,right))
| Or'(Or' (left, _), right)
| Or'(Or' (_, left), right)
| Or'(Or' (left, _), Or'(right,_))
| Or'(Or' (_, left), Or'(right,_))
| Or'(Or' (left, _), Or'(_,right))
| Or'(Or' (_, left), Or'(_, right))
| Or'(left, Or'(Or'(right,_), _))
| Or'(left, Or'(Or'(_,right), _))
| Or'(left, Or'(_, Or'(right,_)))
| Or'(left, Or'(_, Or'(_,right)))
| Or'(Or'(Or'(left, _), _), right)
| Or'(Or'(Or'(_, left), _), right)
| Or'(Or'(_, Or'(left, _)), right)
| Or'(Or'(_, Or'(_, left)), right)
when mutuallyExclusive left right ->
We quite clearly see that we'd need a recursion to parse both sides of the binary-tree, manually typing And/Or is not sustainable:
- For all conditions it has to remmeber previous conditions
- And for all new Or/And it will need to visit both sides of the sub-trees.
Now, I know real case SQL usage where people use this by code-generators, where it generates 5000 ORs to a single SQL-clause. I don't know why not use IN, but I know this is the case. That's why it needed the "balancetree" rule.
c.LastName == "John" || x1 || x2 || ... || x4999 || x5000 || c.LastName != "John" -> true,
So this is a combinatory explosion, and the recursion needs clearly some limitations to avoid DoS attacks. Then the question is how deep binary trees we should support? That could be configurable to push the responsibility to a user, but I don't see that optimal.
The original purpose of this tool is to save time; instead of transferring big SQL over the wire to the database and then the database tries to execute the complex queries, this pre-optimises things to hit better indexing. Thus if the pre-optimisation would take longer than the other process, it'd be pointless.
What would be a proper depth to cover a real-case use, but not distract other real-case use?
Hi,
Can you test the new version, please?
If you often encounter complex clauses that you need to optimise, one possibility is using Wybe: https://github.com/lamg/wybe, which uses the Z3 engine to solve these.
Hello! Excuse me for long absence. I have tested updates a bit. There is one strange case that is still failing:
new()
{
Name = "Simplify always-true disjunction with flattened conditions",
Input = c =>
c.IncorrectLoginAttempts < 5 ||
(!c.IsRestricted && !c.IsAnonymized && !c.IsDeleted) ||
(c.IsRestricted && !c.IsAnonymized && !c.IsDeleted) ||
(c.IsAnonymized && !c.IsDeleted) ||
c.IsDeleted,
Expected = c => true
}
flowchart TD
Root["OR"] --> A["c.IncorrectLoginAttempts < 5"] & B["AND"] & C["AND"] & D["AND"] & F["c.IsDeleted"]
B --> B1["!c.IsRestricted"] & B2["!c.IsAnonymized"] & B3["!c.IsDeleted"]
C --> C1["c.IsRestricted"] & C2["!c.IsAnonymized"] & C3["!c.IsDeleted"]
D --> D1["c.IsAnonymized"] & D2["!c.IsDeleted"]
The expression stays unmodified after optimization, despite the fact it is reducible to true.
However, there is another, almost the same test that differs only with extra parentheses. This changes tree structure of the expression, but still doesn't change final reducing effect.
new()
{
Name = "Simplify always-true disjunction with grouped conditions",
Input = c =>
c.IncorrectLoginAttempts < 5 ||
(
(!c.IsRestricted && !c.IsAnonymized && !c.IsDeleted) ||
(c.IsRestricted && !c.IsAnonymized && !c.IsDeleted) ||
(c.IsAnonymized && !c.IsDeleted) ||
c.IsDeleted
),
Expected = c => true
},
flowchart TD
Root["OR"] --> A["c.IncorrectLoginAttempts < 5"] & Group["Grouped OR"]
Group --> B["AND"] & C["AND"] & D["AND"] & E["c.IsDeleted"]
B --> B1["!c.IsRestricted"] & B2["!c.IsAnonymized"] & B3["!c.IsDeleted"]
C --> C1["c.IsRestricted"] & C2["!c.IsAnonymized"] & C3["!c.IsDeleted"]
D --> D1["c.IsAnonymized"] & D2["!c.IsDeleted"]
This is the only one test case I found so far that I think is important to solve. If optimizer handles the second case correctly, then it's likely a tree shape sensitivity issue. Probably, normalizing disjunctions/conjunctions before applying simplifications will help. Not fundamentally, but will be a step in a right direction.
Additionally, I started working on an expression complicator that generates all possible expressions up to a specified tree height, but I’m currently short on time, so no success so far. I’ll let you know if I manage to finish it later or if I discover any additional cases.
Should be fixed now. (1.0.34)