age icon indicating copy to clipboard operation
age copied to clipboard

Implicit Boolean Casting causes unexpected behavior in Logical Boolean Operators

Open dehowef opened this issue 2 years ago • 7 comments

Describe the bug AGE relies on implicit casting to smoothly implement PG-type booleans. AG types may be implicitly cast to a boolean type. This has proven to not be an issue in most cases, but does cause unexpected behavior in transform_BoolExpr(). transform_BoolExpr uses PG's coerce_to_boolean() to check if agtypes can be cast to a boolean type, and because AGE allows implicit castring, there are cases that do not throw an error where it normally would be expected to.

In the case of logical gates, this causes the following unexpected behavior:

SELECT * from cypher('my_graph_name', $$
 RETURN true OR 1
$$) as (a agtype);

The previous query would be expected to throw a type compatibility error, but instead it returns 'TRUE'

Similarly,

SELECT * from cypher('my_graph_name', $$
 RETURN FALSE AND 1
$$) as (a agtype);

would be expected to throw a type compatibility error, but instead it returns FALSE.

The inverse of these two queries, RETURN 1 OR true, and RETURN 1 and false throw an error:

 ERROR:  cannot cast agtype integer to type boolean
 STATEMENT:  SELECT * FROM cypher('empty', $$ RETURN 1 OR true $$) as (a agtype);
 ERROR:  cannot cast agtype integer to type boolean

Because of the associativity of logical operators (left associativity), the boolean expression is optimized to pass or fail as soon as either condition is validated. But, this is an execution phase error.

The error that coerce_to_boolean would throw is:

ERROR:     argument of OR must be type boolean, not type integer at character 8
STATEMENT:    SELECT 1 OR TRUE
ERROR: argument of OR must be type boolean, not type integer

which would be an error thrown during the transform phase rather than during execution. Since AGE's constraint logic (WHERE, property constraints, etc), rely on implicit boolean casting, resolving this specific edge case may require modifying logic during the execution phase.

How are you accessing AGE (Command line, driver, etc.)?

  • Command Line

What data setup do we need to do? This can be run on an instance of AGE installed via command line with no data setup required.

What is the necessary configuration info needed?

  • Install AGE.

What is the command that caused the error?

Denoted above, but here is one of the problem commands stated again:

SELECT * from cypher('my_graph_name', $$
 RETURN FALSE AND 1
$$) as (a agtype);

Expected behavior We'd expect coerce_to_boolean or another function in the transform phase to throw a type mismatch error, something along the lines of:

ERROR:     argument of OR must be type boolean, not type integer at character 8
STATEMENT:    SELECT 1 OR TRUE
ERROR: argument of OR must be type boolean, not type integer

Environment (please complete the following information):

  • Version: latest main branch (ver 1.2.0)

dehowef avatar Jun 23 '23 23:06 dehowef

@dehowef You're right, the bug occurs when using logical gates like OR and AND with incompatible operand types in Apache AGE.

The implicit boolean casting in the transform phase causes these operands to be cast to boolean types without throwing type compatibility errors. This leads to unexpected behaviour, such as returning 'TRUE' for RETURN true OR 1 and 'FALSE' for RETURN FALSE AND 1. In contrast, SELECT 1 OR true correctly throws an error.

To resolve this, we need to modify the code to enforce type compatibility and ensuree consistent behaviour in logical boolean operators also testing that should validate the expected behaviour.

AbdulSamad4068 avatar Jun 30 '23 16:06 AbdulSamad4068

@AbdulSamad4068 Unfortunately, you can't really pre-validate the inputs to AND or OR. As an example, look at the following cypher commands -

CREATE ({field: 1});
CREATE ({field: false});
CREATE ({field: "Name"});

MATCH (u) RETURN (true OR u.field);
MATCH (u) RETURN (false AND u.field);

In the above example we have 3 vertices, each with a differently typed value for the same key: field. None of this is known until the execution phase. Specifically, the type is not known until u.field has been evaluated - after - getting a value for u.

Additionally, in the execution phase, the logic for T_BoolExpr will skip over any extra input once the result of AND or OR is determined. For OR, just one true will suffice for an output of true. For AND, just one false will suffice for an output of false.

The only way to potentially resolve this is to write a completely different path for the execution of Booleans in AGE.

However, per PostgreSQL docs, using side effects to determine other outcomes is not recommended. Meaning, true OR is always true and false AND is always false. It is not good form to have a function error out (side effect) to break this. At least, that was my understanding of what they stated.

See this link for more -

https://www.postgresql.org/docs/15/sql-expressions.html#SYNTAX-EXPRESS-EVAL

jrgemignani avatar Jun 30 '23 16:06 jrgemignani

in OR operator if first argument is true then it does not check the second argument because it is always true, similarly in AND operator if first argument is false then it does not check the second argument and return false, in other cases the second arguments needs to be checked.

farooquememon385 avatar Jun 30 '23 18:06 farooquememon385

Yeah, I said that already @farooquememon385

dehowef avatar Jun 30 '23 20:06 dehowef

@jrgemignani (to make sure I understand this correctly), as you mentioned according to PostgreSQL docs, it isn't recommended to have side effects like erroring out with true OR which is always true, however, I'm currently using PG14 and when running SELECT true OR 1; it does error out:

ERROR:  argument of OR must be type boolean, not type integer
LINE 1: SELECT true OR 2
                                     ^

Is that against the docs recommendations or I have misunderstood something?

Mghrabi avatar Jul 07 '23 07:07 Mghrabi

@Mghrabi The error you are getting is expected behavior. This is coerce_to_boolean catching an uncoercable type in the transform phase and throwing the appropiate error before the query reaches the execution phase.

What John was saying was that without the coerce_to_boolean function, an alternate execution phase path is likely needed, and erroring out there would be bad practice.

You can read here for more information on phases of a query:

https://www.postgresql.org/docs/current/query-path.html

dehowef avatar Jul 07 '23 23:07 dehowef

@dehowef Is this still an issue that we need to look into or correct?

jrgemignani avatar Oct 21 '23 00:10 jrgemignani

This issue is stale because it has been open 45 days with no activity. Remove "Abondoned" label or comment or this will be closed in 7 days.

github-actions[bot] avatar May 11 '24 00:05 github-actions[bot]

This issue was closed because it has been stalled for further 7 days with no activity.

github-actions[bot] avatar May 20 '24 00:05 github-actions[bot]