efcore.pg icon indicating copy to clipboard operation
efcore.pg copied to clipboard

Does npgsql support 'IS DISTINCT FROM' and 'IS NOT DISTINCT FROM' when building query using System.Linq.Expressions?

Open cirrusone opened this issue 1 year ago • 1 comments

I intially asked this in dotnet/runtime/discussions but feedback indicates it may be a provider issue instead?

Is it possible to build a SQL query using System.Linq.Expressions so the npgsql provider uses either 'IS DISTINCT FROM' and 'IS NOT DISTINCT FROM' for simpler null handling?

The main issue I have is that inversing an expression using NOT, omits null values.

private Expression NotEquals(Type type, string value, Expression propertyExp)
{
	return Expression.Not(Equals(type, value, propertyExp));
}

cirrusone avatar May 17 '24 13:05 cirrusone

The main issue I have is that inversing an expression using NOT, omits null values.

Can you please provide more context on what you mean here, by posting a code sample and the results you're seeing? Rather than thinking about IS DISTINCT FROM, can you please try to show a LINQ query that's producing results that are unexpected or wrong?

For more detail, EF's SQL translation should already be producing SQL that behaves like IS DISTINCT FROM, by adding SQL to make the comparison behave like C# (or like IS DISTINCT FROM). For example, given the following query, where String1 and String2 are both configured as nullable:

_ = await context.Blogs.Where(b => b.String1 == b.String2).ToListAsync();

EF generates the following SQL:

SELECT b."Id", b."String1", b."String2"
FROM "Blogs" AS b
WHERE b."String1" = b."String2" OR (b."String1" IS NULL AND b."String2" IS NULL)

The b."String1" IS NULL AND b."String2" IS NULL bit is added by EF to make sure rows are returned where both are null. EF contains substantial logic to make sure this works, including around negation.

roji avatar May 17 '24 15:05 roji

Hi,

Thanks for responding, it is most likely my poor understanding rather than anything wrong. I've inherited a complex codebase which seems to have gone through a migration from SQL Server to Postgres and seems to have it's own implementation of UseDatabaseNullSemantics which is introducing lots of bugs. It may have been required at the time as Core was developed, not sure. I have over 300 fields which can be grouped into complex logic so I'm trying to go back to basics and replicate exactly what I want in Postgres.

I'm trying to recreate the exact same results as the following.

-- Assuming data [1, 2, 3, 4, 5, null]

-- Example1: Returns 1 record which equals 4
select field1 from testtable where field1 = 4;

-- Example2: Returns 4 records 1, 2, 3, 5
select field1 from testtable where field1 != 4;

-- Example3: Returns 5 records 1, 2, 3, 5, null
select field1 from testtable where field1 is distinct from 4; 

However, the issue I'm experiencing is that in Linq.Expressions, when applying NOT to the Example1, I get the result for Example3 (including nulls) rather than Example2.

My question really is, how do I represent both Example2 and Example 3 in Linq.Expressions? I do have the need to both include nulls or exclude nulls when negating equality.

Here is some test code showing this issue ->

<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net6.0</TargetFramework>
    <ImplicitUsings>disable</ImplicitUsings>
    <Nullable>enable</Nullable>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="Dapper" Version="2.1.35" />
    <PackageReference Include="Microsoft.EntityFrameworkCore" Version="6.0.7" />
    <PackageReference Include="Npgsql" Version="6.0.7" />
    <PackageReference Include="Npgsql.EntityFrameworkCore.PostgreSQL" Version="6.0.7" />
  </ItemGroup>

</Project>
using Dapper;
using Microsoft.EntityFrameworkCore;
using Npgsql;
using System;
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations.Schema;
using System.Data;
using System.Diagnostics;
using System.Linq;
using System.Linq.Expressions;
using System.Threading;
using System.Threading.Tasks;

namespace ConsoleApp1;

static class Program
{
    static async Task Main(string[] args)
    {
        var efTest = new EFTest();
        await efTest.Run();
    }
}

public class EFTest
{
    public async Task Run()
    {
        string conn = "Server=127.0.0.1;Port=5432;Database=dbtest;User Id=**;Password=**;Timeout=60;CommandTimeout=600;Include Error Detail=true;";
        int commandTimeOutSec = 600;

        // Setup table and data
        using (NpgsqlConnection connection = new NpgsqlConnection(conn))
        {
            CancellationToken cancellationToken = default(CancellationToken);
            await connection.OpenAsync(cancellationToken);

            using (var transaction = connection.BeginTransaction())
            {
                string sqlQuery = @"
                drop table if exists testtable;";
                await connection.ExecuteAsync(sqlQuery, null, transaction, commandTimeOutSec, CommandType.Text);

                sqlQuery = @"
                create table if not exists testtable (
                field1 int null
                );";
                await connection.ExecuteAsync(sqlQuery, null, transaction, commandTimeOutSec, CommandType.Text);

                sqlQuery = @"
                insert into testtable (field1)
                values
                (1),
                (2),
                (3),
                (4),
                (5),
                (null);";
                await connection.ExecuteAsync(sqlQuery, null, transaction, commandTimeOutSec, CommandType.Text);

                transaction.Commit();
            }

        }

        // Run Database Tests
        int value = 4;

        using (NpgsqlConnection connection = new NpgsqlConnection(conn))
        {
            CancellationToken cancellationToken = default(CancellationToken);
            await connection.OpenAsync(cancellationToken);

            string sqlQuery = @"select field1 from testtable where field1 = @field1;";
            var parameters = new DynamicParameters();
            parameters.Add("@field1", value);
            IEnumerable<TestTable> data = await connection.QueryAsync<TestTable>(sqlQuery, parameters, null, commandTimeOutSec, CommandType.Text);
            Debug.Assert(data.Count() == 1 && data.First().Field1 == value);

            sqlQuery = @"select field1 from testtable where field1 != @field1;";
            parameters = new DynamicParameters();
            parameters.Add("@field1", value);
            data = await connection.QueryAsync<TestTable>(sqlQuery, parameters, null, commandTimeOutSec, CommandType.Text);
            Debug.Assert(data.Count() == 4 && data.Count(x => x.Field1 == value) == 0);

            sqlQuery = @"select field1 from testtable where field1 is distinct from @field1;";
            parameters = new DynamicParameters();
            parameters.Add("@field1", value);
            data = await connection.QueryAsync<TestTable>(sqlQuery, parameters, null, commandTimeOutSec, CommandType.Text);
            Debug.Assert(data.Count() == 5 && data.Count(x => x.Field1 == value) == 0);
        }

        // Run Linq.Expressions Tests
        var options = new DbContextOptionsBuilder<ApplicationDbContext>().UseNpgsql(conn).Options;
        using (var ctx = new ApplicationDbContext(options))
        {
            // Passes.
            IEnumerable<TestTable> data = ctx.TestTable.BuildQuery(applyNOT: false);
            Debug.Assert(data.Count() == 1 && data.First().Field1 == value);

            // Fails, returns null but SQL query excludes null
            data = ctx.TestTable.BuildQuery(applyNOT: true);
            Debug.Assert(data.Count() == 4 && data.Count(x => x.Field1 == value) == 0);

            // Not sure how to replicate same result for
            // 1. select field1 from testtable where field1 != @field1;
            // 2. select field1 from testtable where field1 is distinct from @field1;
        }

    }

}

public static class Builder
{
    public static IQueryable<T> BuildQuery<T>(this IQueryable<T> queryable, bool applyNOT)
    {

        var pe = Expression.Parameter(typeof(T), "item");
        Expression propertyExp = Expression.Property(pe, "field1");
        Expression constant = Expression.Constant(4);
        Expression expressionTree = Expression.Equal(propertyExp, Expression.Convert(constant, propertyExp.Type));

        if (applyNOT)
        {
            expressionTree = Expression.Not(expressionTree);
        }

        var whereCallExpression = Expression.Call(
            typeof(Queryable),
            "Where",
            new[] { queryable.ElementType },
            queryable.Expression,
            Expression.Lambda<Func<T, bool>>(expressionTree, pe));

        var filteredResults = queryable.Provider.CreateQuery<T>(whereCallExpression);

        return filteredResults;

    }

}

public class ApplicationDbContext : DbContext
{
    public DbSet<TestTable> TestTable { get; set; }

    public ApplicationDbContext(DbContextOptions options) : base(options) { }

    protected override void OnModelCreating(ModelBuilder builder)
    {
        base.OnModelCreating(builder);

        builder.Entity<TestTable>(i =>
        {
            i.ToTable("testtable");
            i.HasNoKey();
        });
    }

}


[Table("testtable", Schema = "public")]
public class TestTable
{
    [Column("field1")]
    public int? Field1 { get; set; }
}

cirrusone avatar May 18 '24 08:05 cirrusone

Here's the query and assertion:

data = ctx.TestTable.BuildQuery(applyNOT: true);
Debug.Assert(data.Count() == 4 && data.Count(x => x.Field1 == value) == 0);

Inspecting the query tree with the debugger before execution shows that BuildQuery() produces the following tree:

[Microsoft.EntityFrameworkCore.Query.EntityQueryRootExpression].Where(item => Not((item.Field1 == Convert(4, Nullable`1))))

In other words, you're asking for all items where Field1 isn't 4. Turning on EF's SQL logging feature, we can see that EF generates the following:

SELECT t.field1
FROM testtable AS t
WHERE t.field1 <> 4 OR t.field1 IS NULL

Note the t.field1 IS NULL added by EF, to make the SQL return the same rows that the LINQ query would have returned if executed in memory, against .NET objects. Since in C# (2-value logic) terms, 4 is indeed not equal to NULL, the row with the NULL is returned (just as it would if executed against a regular .NET array); this is also the behavior you get when using IS DISTINCT FROM, which is why that SQL query above returns 5 rows rather than 4.

In other words, as far as I can tell, everything seems to be working as designed... where exactly do you see a problem?

roji avatar May 18 '24 12:05 roji

There is no problem now I understand that this is by design, sorry to have wasted your time, I think I was falsely expecting the orignal query of

SELECT t.field1
FROM testtable AS t
WHERE t.field1 = 4

to produce the following when negated

SELECT t.field1
FROM testtable AS t
WHERE t.field1 <> 4

but it actually produces the following

SELECT t.field1
FROM testtable AS t
WHERE t.field1 <> 4 OR t.field1 IS NULL

Completely understand that this aligns with C# execution. I need to do a bit more research as I will have the need to toggle between including/excluding ...OR t.field1 IS NULL but I'm assuming that will be handled within Linq.Expressions.

Thank you.

cirrusone avatar May 18 '24 13:05 cirrusone