sqlpp17 icon indicating copy to clipboard operation
sqlpp17 copied to clipboard

Type Trait Categories

Open Leon0402 opened this issue 3 years ago • 5 comments

I was wondering if we might need a clear definition and documentation of our type traits / error messages. I think it's somewhat difficult to understand in the code & error message what an expression is exactly for instance.

There are two things I don't like currently in the code that I tested:

  • is_expression_v is somewhat restrictive resulting in lines like not is_expression_v<Expression> and not std::is_integral_v<Expression> and not is_text_v<Expression>, where you explicitly allow some things again
  • The relation between is_expression_v and some other traits is unclear like has_ordable_value_v or has_numeric_value_v

My suggestion would be:

  • is_expression_v should be more general and include all valid sql: Columns, Subqueries, Alias, sqlpp:star, all kinds of literals values, ...
  • There should be type traits which are more restrictive and direct subsets of expressions like is_numeric_expression_v and is_aggregate_expression_v

The consequences would be:

  • More checks needed to rule out for instance sqlpp:star in many commands, but more specific and helpful error message -> "max(*) is not allowed"
  • "Everything is allowed by default" approach. The chance is increased that we forget some of the checks and thus allow invalid sql syntax. But we decrease the chance that we disallow valid sql syntax. This is better in my opinion.
  • Code is easier to understand
    • No "and chaining". Like in count.h, where we currently disallow every non expression except numerical literals and text literals. This also leads to confusing error messages "arg must be an expression or a numeric literal or a text literal or ... " instead of more helpful messages like "arg cannot be ..."
    • Relations between traits a very clear based on the naming. Currently I always wonder in say max.h if has_ordable_value_v really checks everything needed

What do you think? Would this work out for the complete code (my suggestions are mainly based on what I have tested so far)?

Edit: Regarding the issue that there might be too many extra checks (like checking in most cases that sqlpp:star is disallowed), we would always have the option to add more "convenience type traits" that are more specific than is_expression_v say is_value_expression. These aren't self explaining, so they would need to be documented well (tradeoff). A better solution might be investigating how we can share code, which does the same checks, but throw different asserts (or assert messages).

Leon0402 avatar Jun 29 '22 16:06 Leon0402

Thank you for starting this thread! There is a lot of value in this discussion.

I was wondering if we might need a clear definition and documentation of our type traits / error messages. I think it's somewhat difficult to understand in the code & error message what an expression is exactly for instance.

Better documentation for sure. And I am sure we can find better concepts than what's currently there :-)

My suggestion would be:

* `is_expression_v` should be more general and include all valid sql: Columns, Subqueries, Alias, sqlpp:star, all kinds of literals values, ...

I would probably omit sqlpp::star from this list (it represents one or more expressions but is not one itself, which is why you can use it in so few places).

* There should be type traits which are more restrictive and direct subsets of expressions like `is_numeric_expression_v` and `is_aggregate_expression_v`

Agreed.

The consequences would be:

* More checks needed to rule out for instance `sqlpp:star` in many commands, but more specific and helpful error message -> "max(*) is not allowed"

We could easily have this today, by

  • not making sqlpp::star an expression
  • explicitly testing for sqlpp::star That being said, I am not sure if that's really necessary.
* "Everything is allowed by default" approach. The chance is increased that we forget some of the checks and thus allow invalid sql syntax. But we decrease the chance that we disallow valid sql syntax. This is better in my opinion.

This I find dangerous. The fundamental goal of the library is to ensure that your code is correct. I'd rather risk disallowing something by accident than allowing something by accident.

* Code is easier to understand

By declaring hierarchies / relationships of traits? Yes, for sure.

  * No "and chaining". Like in `count.h`, where we currently disallow every non expression except numerical literals and text literals. This also leads to confusing error messages "arg must be an expression or a numeric literal or a text literal or ... " instead of more helpful messages like "arg cannot be ..."

I think we could re-arrange the if-clauses to get better messages and fewer chains. Following my own guidance above, it might make sense to order them like this

disallow alias with a specific message disallow aggregate with a specific message allow expression (which includes numeric and string as described above) allow star disallow everything else with a generic message.

  * Relations between traits a very clear based on the naming. Currently I always wonder in say `max.h` if `has_ordable_value_v` really checks everything needed

Agreed, documented relationships would make this easier for sure.

What do you think? Would this work out for the complete code (my suggestions are mainly based on what I have tested so far)?

Edit: Regarding the issue that there might be too many extra checks (like checking in most cases that sqlpp:star is disallowed), we would always have the option to add more "convenience type traits" that are more specific than is_expression_v say is_value_expression. These aren't self explaining, so they would need to be documented well (tradeoff). A better solution might be investigating how we can share code, which does the same checks, but throw different asserts (or assert messages).

Turning sqlpp::star into an non-expression will make things substantially simpler :-)

rbock avatar Jun 30 '22 05:06 rbock

Thank you for taking part in the discussion!

We could easily have this today, by

  • not making sqlpp::star an expression
  • explicitly testing for sqlpp::star That being said, I am not sure if that's really necessary.

Yes, that's correct. Although one needs to be careful with the arrangement. The star checks needs to be before the expression test.

I think we could re-arrange the if-clauses to get better messages and fewer chains. Following my own guidance above, it might make sense to order them like this.

I'm not sure if allowing works without chaining? It would look like this in code I believe (pseudocode :-))

if (is_alias)
     fail()
if(is_aggregate)
     fail()
if(not is_expression and not is is_star) 
// or (not (is_expression or is_star))
  fail()

Which is the same as before, except literals are include in is_expression already. The arrangement doesn't matter as is_alias and is_aggregate are subsets of is_expression. I suppose this shows another motivation of my proposal: No dependence on the arrangement. Ideally it should look like this everywhere:

// Define a broad category like is_expression_v, is_numeric_expression. The most specific category that makes sense and really only contains disallowed stuff
if(not is_broad_category) 
// Rule out specific subsets of the broad category that are not allowed
if(is_subset_1)
if(is_subset_2)
if(is_subset_3)

As "broad" category allows much by default (like literals and that stuff), there will be a few more subset checks than there are currently. But very specific and helpful error messages like "Subset x is not allowed".

The same reasoning applies generally for sqlpp::star. Not making it part of expression will lead to:

if(not is_broad_category and not is_star)

On the other hand I understand your reasoning that "star" is really kind of special. Another idea would be:

  • is_expression_v -> includes star
  • is_expression_without_star_v (there might be a better name) -> More specific category that will be used in most places

The only difference is really that the pattern is consistent with this. But generally I don't mind if slpp::star is treated as an exception, because it is.

This I find dangerous. The fundamental goal of the library is to ensure that your code is correct. I'd rather risk disallowing something by accident than allowing something by accident.

My reasoning behind this is that you always have your database / sql engine to check it's valid in the end (during runtime though). And I see sqlpp more as a linter. It should detect as many mistakes as possible, but never give false positives. Because if I use the library and it tells me: "You cannot use this expression, this is disallowed", but it is actually allowed, then I can do nothing about it. I need to file a bug and write raw sql in the meantime. I think this is very frustrating. If it's the other way around, I still file a bug, but I don't need to wait for anyone to fix it.

But so far it seems like the general idea is good and there are only discussions about some small details left (to include or not include slppstar :-))

Leon0402 avatar Jun 30 '22 06:06 Leon0402

I'm not sure if allowing works without chaining? It would look like this in code I believe (pseudocode :-))

if (is_alias)
     fail()
if(is_aggregate)
     fail()
if(not is_expression and not is is_star) 
// or (not (is_expression or is_star))
  fail()

I would have done this:

// Exclude a few expression types
if (is_alias)
    return failure;
if (is_aggregate)
    return failure;

// Allow expressions and star:
if (is_expression)
  return success;
if (is_star)
  return success;

// catch all:
  return failure;

This I find dangerous. The fundamental goal of the library is to ensure that your code is correct. I'd rather risk disallowing something by accident than allowing something by accident.

My reasoning behind this is that you always have your database / sql engine to check it's valid in the end (during runtime though). And I see sqlpp more as a linter. It should detect as many mistakes as possible, but never give false positives. Because if I use the library and it tells me: "You cannot use this expression, this is disallowed", but it is actually allowed, then I can do nothing about it. I need to file a bug and write raw sql in the meantime. I think this is very frustrating. If it's the other way around, I still file a bug, but I don't need to wait for anyone to fix it.

I wrote sqlpp11 because I am really, really bad at writing SQL strings in C++ code. Noticing my mistakes in unit tests was just super annoying. I wanted the compiler to tell me everything I did wrong. I want this to continue:

If you can construct a statement with the standard functionality from the library, the compiler will find everything you are doing wrong. You can do other stuff, too, e.g. by concatenating clauses and using verbatim, but then you don't have any guarantees.

And bear in mind: The library is fundamentally limited in the syntax it supports. You cannot fix this by being a bit more lenient in the parameters a function will accept. There are a lot of things you simply cannot do with the library. It is missing functions, it is missing constructs like INSERT INTO ... SELECT FROM ... ON DUPLICATE KEY UPDATE, etc. And that's OK. You can still use it, and in those (hopefully) rare cases construct a query out of individual clauses and sqlpp::verbatim.

When you do that, the library makes little to no promises about correctness. You are mostly on your own.

But in the "normal" case, when your code creates a statement using the DSL as is, the library will catch all mistakes (minus bugs).

rbock avatar Jun 30 '22 18:06 rbock

Thanks for the clarification of the motivation. Will do my best to reduce the "minus bugs" part with the tests :-)

I would have done this:

Ah I see. I just forgot for a moment that you can also return success obviously :-) But I have to say that I find the code still more difficult to understand. Hard to tell why, but I suppose because of the mix of failure & success & catch all requires more thought if all cases have been properly handled. But that's a very subjective opinion.

I do like the positive check for sqlpp:star though. A simple

if (is_star) 
  return success;
  
// here follows rest

This would shows very clearly that sqlpp:star is something special and therefore handled extra.

Another solution would be to handle sqlpp:star with an overload instead. I think this is what you proposed in the MR if I understood you correctly? I wasn't first sure about this, but:

SELECT count(DISTINCT *) FROM test;

is invalid. And I think that's the only exception regarding distinct. All other expressions are allowed with distinct here (and in other queries with a distinct). So this would be a good argument for using the overload approach, which I now favor. What you think?

Leon0402 avatar Jul 01 '22 06:07 Leon0402

Another solution would be to handle sqlpp:star with an overload instead. I think this is what you proposed in the MR if I understood you correctly? I wasn't first sure about this, but:

SELECT count(DISTINCT *) FROM test;

is invalid.

Huh, I did not know that :-)

And I think that's the only exception regarding distinct. All other expressions are allowed with distinct here (and in other queries with a distinct). So this would be a good argument for using the overload approach, which I now favor.

Agreed, a separate overload is probably best.

rbock avatar Jul 02 '22 05:07 rbock