DuckDB.NET icon indicating copy to clipboard operation
DuckDB.NET copied to clipboard

Error in large appender add

Open Cricle opened this issue 2 years ago • 11 comments

At that time I want sync mysql to duckdb, I use appender to insert ~ 50w rows.

I after checked the data consistence, some TIMESTAMP value is error.

Like that

image

But I try insert or update this row, It will be ok.

INSERT INTO "juhe1" (_id, datetime, ja1, ja2, sa3, ca4)
VALUES (10022702, '2023-07-06 18:59:03', '2010-07-30 15:07:32', 11, '327.84', 'gNcTiI06wP')
ON CONFLICT (_id)
DO UPDATE SET datetime=EXCLUDED.datetime, ja1=EXCLUDED.ja1, ja2=EXCLUDED.ja2, sa3=EXCLUDED.sa3, ca4=EXCLUDED.ca4;

The append code is

public static Action<DuckDBAppenderRow, IDataReader>[] BuildFetcher(IDataRecord record)
{
    var fetcher = new Action<DuckDBAppenderRow, IDataReader>[record.FieldCount];
    for (int i = 0; i < fetcher.Length; i++)
    {
        var index = i;
        var type = record.GetFieldType(i);
        if (type == typeof(bool))
            fetcher[i] = (row, reader) => row.AppendValue(reader.GetBoolean(index));
        else if (type == typeof(sbyte))
            fetcher[i] = (row, reader) => row.AppendValue((sbyte)reader.GetByte(index));
        else if (type == typeof(byte))
            fetcher[i] = (row, reader) => row.AppendValue(reader.GetByte(index));
        else if (type == typeof(short))
            fetcher[i] = (row, reader) => row.AppendValue(reader.GetInt16(index));
        else if (type == typeof(ushort))
            fetcher[i] = (row, reader) => row.AppendValue((ushort)reader.GetInt16(index));
        else if (type == typeof(int))
            fetcher[i] = (row, reader) => row.AppendValue(reader.GetInt32(index));
        else if (type == typeof(uint))
            fetcher[i] = (row, reader) => row.AppendValue((uint)reader.GetInt32(index));
        else if (type == typeof(long))
            fetcher[i] = (row, reader) => row.AppendValue(reader.GetInt64(index));
        else if (type == typeof(ulong))
            fetcher[i] = (row, reader) => row.AppendValue((ulong)reader.GetInt64(index));
        else if (type == typeof(float))
            fetcher[i] = (row, reader) => row.AppendValue(reader.GetFloat(index));
        else if (type == typeof(double))
            fetcher[i] = (row, reader) => row.AppendValue(reader.GetDouble(index));
        else if (type == typeof(decimal))
            fetcher[i] = (row, reader) => row.AppendValue((double)reader.GetDecimal(index));
        else if (type == typeof(DateTime))
            fetcher[i] = (row, reader) => row.AppendValue(reader.GetDateTime(index));
        else if (type == typeof(string))
            fetcher[i] = (row, reader) => row.AppendValue(reader.GetString(index));
        else
            throw new NotSupportedException(type.ToString());
    }
    return fetcher;
}
//Here is execute code
var res = 0;
var fetcher = BuildFetcher(reader);
using (var appender = duckDBConnection.CreateAppender(tableName))
{
    var fieldCount = reader.FieldCount;
    while (reader.Read())
    {
        var row = appender.CreateRow();
        for (int i = 0; i < fieldCount; i++)
        {
            fetcher[i](row, reader);
        }
        row.EndRow();
        res++;
    }
}
return Task.FromResult(res);

The probability of this occurring is approximately once every thousands of lines.

Cricle avatar Nov 17 '23 01:11 Cricle

What error do the timestamp values have?

Giorgi avatar Nov 17 '23 08:11 Giorgi

What error do the timestamp values have?

The timestamp in some rows will error for inserted.

Like inserted '2020-01-01 00:00:00', when query result will be '2030-02-03 12:01:02'. I don't know it is duckdb error or the library error. I try use insert into to insert ~50w rows data, some result is error, I try insert ~15w rows, It will be ok.

Cricle avatar Nov 18 '23 04:11 Cricle

Do you get a wrong result for the same row and column every time you run the code or is it different on each run? If it is the same every time you could put a breakpoint to see what the DuckDB.Net code does for that field to see if there is an error in converting DateTime to DuckDBTimestampStruct in DuckDBDateOnly.FromDateTime or DuckDBTimeOnly.FromDateTime

Giorgi avatar Nov 18 '23 15:11 Giorgi

Every time same row is wrong. I debug for check native api result, the DuckDBTimestamp is right, But I don't know if the others are correct.

Cricle avatar Nov 19 '23 00:11 Cricle

Does it happen if you try to import only that one row?

Giorgi avatar Nov 19 '23 08:11 Giorgi

No, I try appender or insert is ok.

Cricle avatar Nov 19 '23 11:11 Cricle

If you can create a project that reproduces the issue I can try to have a look at it. It will be easier if you export MySQL data to a CSV file and use that in the demo so that I don't have to set up a database.

Giorgi avatar Nov 19 '23 11:11 Giorgi

Ok, I will try to write a mini project and export all data.

Thanks.

Cricle avatar Nov 19 '23 12:11 Cricle

AppendFail.zip

My time zone is UTC+8.

Cricle avatar Nov 21 '23 06:11 Cricle

Can you modify source code to read from CSV instead of MySQL?

Giorgi avatar Nov 24 '23 20:11 Giorgi

AppendFail.zip Already modify to read csv

Cricle avatar Nov 28 '23 01:11 Cricle

@Cricle Can you check if you still experience this issue with the 0.10.1 version?

Giorgi avatar Mar 18 '24 19:03 Giorgi

Closing due to inactivity. If you still experience the issue feel free to reopen it.

Giorgi avatar Mar 22 '24 11:03 Giorgi