Implicit Boolean Casting causes unexpected behavior in Logical Boolean Operators
Describe the bug AGE relies on implicit casting to smoothly implement PG-type booleans. AG types may be implicitly cast to a boolean type. This has proven to not be an issue in most cases, but does cause unexpected behavior in transform_BoolExpr(). transform_BoolExpr uses PG's coerce_to_boolean() to check if agtypes can be cast to a boolean type, and because AGE allows implicit castring, there are cases that do not throw an error where it normally would be expected to.
In the case of logical gates, this causes the following unexpected behavior:
SELECT * from cypher('my_graph_name', $$
RETURN true OR 1
$$) as (a agtype);
The previous query would be expected to throw a type compatibility error, but instead it returns 'TRUE'
Similarly,
SELECT * from cypher('my_graph_name', $$
RETURN FALSE AND 1
$$) as (a agtype);
would be expected to throw a type compatibility error, but instead it returns FALSE.
The inverse of these two queries, RETURN 1 OR true, and RETURN 1 and false throw an error:
ERROR: cannot cast agtype integer to type boolean
STATEMENT: SELECT * FROM cypher('empty', $$ RETURN 1 OR true $$) as (a agtype);
ERROR: cannot cast agtype integer to type boolean
Because of the associativity of logical operators (left associativity), the boolean expression is optimized to pass or fail as soon as either condition is validated. But, this is an execution phase error.
The error that coerce_to_boolean would throw is:
ERROR: argument of OR must be type boolean, not type integer at character 8
STATEMENT: SELECT 1 OR TRUE
ERROR: argument of OR must be type boolean, not type integer
which would be an error thrown during the transform phase rather than during execution. Since AGE's constraint logic (WHERE, property constraints, etc), rely on implicit boolean casting, resolving this specific edge case may require modifying logic during the execution phase.
How are you accessing AGE (Command line, driver, etc.)?
- Command Line
What data setup do we need to do? This can be run on an instance of AGE installed via command line with no data setup required.
What is the necessary configuration info needed?
- Install AGE.
What is the command that caused the error?
Denoted above, but here is one of the problem commands stated again:
SELECT * from cypher('my_graph_name', $$
RETURN FALSE AND 1
$$) as (a agtype);
Expected behavior We'd expect coerce_to_boolean or another function in the transform phase to throw a type mismatch error, something along the lines of:
ERROR: argument of OR must be type boolean, not type integer at character 8
STATEMENT: SELECT 1 OR TRUE
ERROR: argument of OR must be type boolean, not type integer
Environment (please complete the following information):
- Version: latest main branch (ver 1.2.0)
@dehowef You're right, the bug occurs when using logical gates like OR and AND with incompatible operand types in Apache AGE.
The implicit boolean casting in the transform phase causes these operands to be cast to boolean types without throwing type compatibility errors. This leads to unexpected behaviour, such as returning 'TRUE' for RETURN true OR 1 and 'FALSE' for RETURN FALSE AND 1. In contrast, SELECT 1 OR true correctly throws an error.
To resolve this, we need to modify the code to enforce type compatibility and ensuree consistent behaviour in logical boolean operators also testing that should validate the expected behaviour.
@AbdulSamad4068 Unfortunately, you can't really pre-validate the inputs to AND or OR. As an example, look at the following cypher commands -
CREATE ({field: 1});
CREATE ({field: false});
CREATE ({field: "Name"});
MATCH (u) RETURN (true OR u.field);
MATCH (u) RETURN (false AND u.field);
In the above example we have 3 vertices, each with a differently typed value for the same key: field. None of this is known until the execution phase. Specifically, the type is not known until u.field has been evaluated - after - getting a value for u.
Additionally, in the execution phase, the logic for T_BoolExpr will skip over any extra input once the result of AND or OR is determined. For OR, just one true will suffice for an output of true. For AND, just one false will suffice for an output of false.
The only way to potentially resolve this is to write a completely different path for the execution of Booleans in AGE.
However, per PostgreSQL docs, using side effects to determine other outcomes is not recommended. Meaning, true OR is always true and false AND is always false. It is not good form to have a function error out (side effect) to break this. At least, that was my understanding of what they stated.
See this link for more -
https://www.postgresql.org/docs/15/sql-expressions.html#SYNTAX-EXPRESS-EVAL
in OR operator if first argument is true then it does not check the second argument because it is always true, similarly in AND operator if first argument is false then it does not check the second argument and return false, in other cases the second arguments needs to be checked.
Yeah, I said that already @farooquememon385
@jrgemignani (to make sure I understand this correctly), as you mentioned according to PostgreSQL docs, it isn't recommended to have side effects like erroring out with true OR which is always true, however, I'm currently using PG14 and when running SELECT true OR 1; it does error out:
ERROR: argument of OR must be type boolean, not type integer
LINE 1: SELECT true OR 2
^
Is that against the docs recommendations or I have misunderstood something?
@Mghrabi The error you are getting is expected behavior. This is coerce_to_boolean catching an uncoercable type in the transform phase and throwing the appropiate error before the query reaches the execution phase.
What John was saying was that without the coerce_to_boolean function, an alternate execution phase path is likely needed, and erroring out there would be bad practice.
You can read here for more information on phases of a query:
https://www.postgresql.org/docs/current/query-path.html
@dehowef Is this still an issue that we need to look into or correct?
This issue is stale because it has been open 45 days with no activity. Remove "Abondoned" label or comment or this will be closed in 7 days.
This issue was closed because it has been stalled for further 7 days with no activity.