Treating the Empty String as NULL since 0.3.2-patch6
I can reproduce this misbehaviour since version 0.3.2-patch6.
Up to version 0.3.2-patch5 everything worked fine.
I poked around an bit and found out that the jdbc-driver has problems with empty strings. It seems that it handles empty strings like null values.
Before patch6 everything worked fine. This behavior still exists in patch9
This relates to #896
Unexpected error inserting/updating row in part [insertRow add batch]
...
com.clickhouse.client.data.BinaryStreamUtils.writeString (BinaryStreamUtils.java:1667)
If is replace the empty string value with any character the error is gone.
Thanks @mafiore. It sounds like empty string result in a null flag. Let me add a few more test cases to see if I can reproduce the issue - would be great if you can share a minimal test case.
thanks for your quick reply @zhicwu. My test-case is build in Apache Hop, so this is no raw code to share. but I can explain how I reproduced the behaviour.
Up to 0.3.2-patch5 everthing worked fine. When we tried to change to a higher revision, the mess began. First I thought it has something to do with the timestamp-format.
I ended up with a Simple Table like
CREATE TABLE TEST_EMPTY_STRING
(
`ID` UInt64,
`SOMETHING` String
)
ENGINE = MergeTree
PRIMARY KEY (ID)
ORDER BY (ID);
I set the NULL values within the test records to empty with SOMETHING==null?"":SOMETHING
This leads to the described error. If I set the value to e.g. "-" the record will be accepted an written.
There must be a related change between rev5 and rev6
Below test case works for me. Did I miss something?
try (ClickHouseConnection conn = newConnection(new Properties());
Statement s = conn.createStatement()) {
s.execute("drop table if exists test_read_write_strings;"
+ "create table test_read_write_strings(id Int32, s1 String, s2 Nullable(String), s3 Array(Nullable(String)))engine=Memory");
try (PreparedStatement stmt = conn
.prepareStatement("insert into test_read_write_strings")) {
stmt.setInt(1, 1);
stmt.setString(2, "");
stmt.setString(3, "");
stmt.setObject(4, new String[] { "" });
stmt.addBatch();
stmt.setInt(1, 2);
stmt.setString(2, "");
stmt.setString(3, null);
stmt.setObject(4, new String[] { null });
stmt.addBatch();
int[] results = stmt.executeBatch();
Assert.assertEquals(results, new int[] { 1, 1 });
}
ResultSet rs = s.executeQuery("select * from test_read_write_strings order by id");
Assert.assertTrue(rs.next());
Assert.assertEquals(rs.getInt(1), 1);
Assert.assertEquals(rs.getString(2), "");
Assert.assertEquals(rs.getString(3), "");
Assert.assertEquals(rs.getObject(4), new String[] { "" });
Assert.assertTrue(rs.next());
Assert.assertEquals(rs.getInt(1), 2);
Assert.assertEquals(rs.getString(2), "");
Assert.assertEquals(rs.getString(3), null);
Assert.assertEquals(rs.getObject(4), new String[] { null });
Assert.assertFalse(rs.next());
}
I set the
NULLvalues within the test records to empty withSOMETHING==null?"":SOMETHINGThis leads to the described error. If I set dthe value to e.g.
"-"the record will be accepted an written.
This might not related but what's the version of ClickHouse you're using?
22.3.2 revision 54455.
Hmm... I can only reproduce the issue when trying to write null to a non-nullable string column. Also I didn't see suspicious change in patch6. Could that be some feature in Hop to treat empty string as null?
No there is absolutely nothing like that in Hop. What I do ist to filter out the nulls and set them explicitely to an empty String to avoid Nulls. to be wriiten to a non nullable column.I tried it in every direction. leave the filed null, set the field to empty String, set it to a defaut String "-" .
The crazy is, that it works with the same setup since month, but the switch away from rev5 is breaking it.
For the moment it is ok for us to stay at rev5, but this no longterm solution.
Thanks @mafiore and sorry for the inconvenience at your end. In general, we should avoid writing null to a non-nullable column. However, it's not always true - for example, when we want to use default value of a column, we may pass null even we know the column is not nullable. This is also the reason why #935 didn't pass build check :)
I think both cases can be addressed by adding a JDBC-specific option nullParameterHandler (defaults to none for backward compatibility, and only for PrepareStatement parameters) to customize the behavior:
- convert - convert null to default value of corresponding data type(not the default value of the column)
- exception - throw SQLException instead of ugly NPE to prevent user from setting null to a non-null column
- none - do nothing
Make sense?
I agree with you @zhicwu but I don't tried write NULL to non nullable. I change NULL to an empty String which is not NULL at all and this had worked until rev5. Thank you !
I agree with you @zhicwu but I don't tried write NULL to non nullable. I change NULL to an empty String which is not NULL at all and this had worked until rev5. Thank you !
I understand. It could be a breaking change I made without noticing. But with the new option, it can simplify your ETL work too, because you'll no longer need additional step to convert null to empty string. Have you tried insert_null_as_default before? It could be a server-side solution.
@zhicwu this is a very cool helpful feature. The clickhouse-developers do a great job. I curious how it will be when the jdbc driver supports native connections (hot http based)
@mafiore, did you mean clickhouse-tcp-client? I don't think it will help much, except native data format support(for parallel deserialization & serialization). If the concern is performance, you may watch #928 as I'll provide more updates in the next following days. To put in short, the JDBC driver is very slow. Java client and streaming make things better, but still far behind ClickHouse native client.
@zhicwu It may be that JDBC ist slow compared to the native CH Client. But for our needs it is fast like a flash :-) JDBC is often the only way to interact with other systems. Maybe it could be a way to have a JDBC wrapper around the clickhouse-client. (with the caveat of the separate deployment of platform native libs). When I remember it correctly, this was done with SQL Server for a long time
Maybe it could be a way to have a JDBC wrapper around the clickhouse-client.
Exactly. There'll be a clickhouse-cli-client, which is simply a wrapper of the native command line :D
nullAsDefault connection property(defaults to 0) was added in #956, but it's only for JDBC driver(not supported in Java client for performance reason).
Example: jdbc:ch://localhost/db?nullAsDefault=1
0 - treat null as is and throw exception when trying to insert null value into non-nullable column
1 - treat null as is and disable null-check for insertion
2 - convert null to default value of corresponding data type for both insertion and query
hello @zhicwu I am sorry to say, that there is no difference in the behaviour. The Empty String ist not accepted. I am using the patch 10
Error inserting/updating row 2022/06/27 14:40:57 - store barvalue event 2.0 - Cannot set null to non-nullable column #6 [event_unit LowCardinality(String)]

Why can't this not work, like it did before patch 6 ?
I tried every combination and ended up with setting the value to "empty", which is not what I want. The empty String is a String with the length of 0, not NULL
Sorry to hear about that @mafiore. Let's reopen the issue.
Can you provide minimum pipeline for reproducing the issue? I'll then install Apache hop and debug from there.
@mafiore, I tried on hop 2.0.0 and was not able to reproduce the issue. nullAsDefault works exactly as expected. Also the exception above will only be thrown when nullAsDefault < 1:
https://github.com/ClickHouse/clickhouse-jdbc/blob/ea5aaf579b0612bcf1825eb1ec31bf9b170a7a65/clickhouse-jdbc/src/main/java/com/clickhouse/jdbc/internal/InputBasedPreparedStatement.java#L320-L323
On a side note, it looks like they started to enhance clickhouse plugin by replacing clickhouse4j to official driver, so that nobody has to use generic database plugin to access ClickHouse :D
@zhicwu Sorry, But I have a lack of time thees days. I will try this again, but I thought I tested it with different setups
On a side note, it looks like they started to enhance clickhouse plugin by replacing clickhouse4j to official driver, so that nobody has to use generic database plugin to access ClickHouse :D
Who do you think, has triggered this? :-) yes - you are right. It was me ...
@zhicwu Hey there, unfortunately this bug still occurs for us. Currenty we are testing on Apache Hop 2.1.0. We could reproduce it using driver clickhouse-jdbc-0.3.2-patch9-http.jar (which ships with Hop) and clickhouse-jdbc-0.3.2-patch11-http.jar
Independently from the value nullAsDefault is set to following error is thrown:
2022/11/24 11:02:35 - Table output.0 - Error inserting row into table [Test] with values: [null], [null]
2022/11/24 11:02:35 - Table output.0 -
2022/11/24 11:02:35 - Table output.0 - Unexpected error inserting/updating row in part [insertRow add batch]
2022/11/24 11:02:35 - Table output.0 - at java.lang.Thread.run (Thread.java:829)
2022/11/24 11:02:35 - Table output.0 - at org.apache.hop.pipeline.transform.RunThread.run (RunThread.java:51)
2022/11/24 11:02:35 - Table output.0 - at org.apache.hop.pipeline.transforms.tableoutput.TableOutput.processRow (TableOutput.java:117)
2022/11/24 11:02:35 - Table output.0 - at org.apache.hop.pipeline.transforms.tableoutput.TableOutput.writeToTable (TableOutput.java:251)
2022/11/24 11:02:35 - Table output.0 - at org.apache.hop.core.database.Database.insertRow (Database.java:1097)
2022/11/24 11:02:35 - Table output.0 - at com.clickhouse.jdbc.internal.InputBasedPreparedStatement.addBatch (InputBasedPreparedStatement.java:313)
2022/11/24 11:02:35 - Table output.0 - at com.clickhouse.client.data.ClickHouseRowBinaryProcessor$MappedFunctions.serialize (ClickHouseRowBinaryProcessor.java:486)
2022/11/24 11:02:35 - Table output.0 - at com.clickhouse.client.data.ClickHouseRowBinaryProcessor$MappedFunctions.lambda$buildMappingsForDataTypes$65 (ClickHouseRowBinaryProcessor.java:337)
2022/11/24 11:02:35 - Table output.0 - at com.clickhouse.client.data.BinaryStreamUtils.writeString (BinaryStreamUtils.java:1667)
One therory is that the empty string interferes with the check for a null character here: https://github.com/ClickHouse/clickhouse-jdbc/blob/42b1b2193931dc68b86cb5dfc86392d4c6d6601b/clickhouse-client/src/main/java/com/clickhouse/client/data/BinaryStreamUtils.java#L1758-L1764
I created an example repo to demonstrate this behaviour if you want to check it out yourself: https://github.com/uklft/clickhouse-jdbc-hop-test
edit: nullAsDefault seems to be ignored from DBeaver + ClickHouse JDBC, too.
Using the URL jdbc:clickhouse:http://localhost:18123?nullAsDefault=0.
Hi @uklft, thanks for sharing your findings. Just tried the example you provided using nullAsDefault=2 and I was able to reproduce the issue using patch9 but it works for both patch11 and nightly build. I use below table for testing:
drop table if exists test_empty_string;
create table test_empty_string(id Int32, name String)engine=MergeTree() order by tuple();
To upgrade JDBC driver:
$ cd hop/plugins/databases/clickhouse/lib
$ rm -f *.jar
$ wget https://repo1.maven.org/maven2/com/clickhouse/clickhouse-jdbc/0.3.2-patch11/clickhouse-jdbc-0.3.2-patch11-all.jar
Did I miss anything?
nullAsDefault seems to be ignored from DBeaver + ClickHouse JDBC, too. Using the URL jdbc:clickhouse:http://localhost:18123?nullAsDefault=0.
What's the expectation? The option is only useful when you pass null values to PreparedStatement, but I'm not sure how I can do that in DBeaver. If you're talking about binding variables like insert into test_empty_string values( :id, :name), it seems DBeaver does not support binding null value.
Update: submitted https://issues.apache.org/jira/browse/HOP-4629 for upgrading JDBC driver and remove http-client jar, which might be the cause of the issue.
Hey @zhicwu thanks for looking into it and creating an Issue for Hop!
You are right, the parameter works as expected with patch11. Indeed the http-client jar was present during my test, so that may be the reason the problem still occurred.
Concerning DBeaver: I expected the parameter to work also on a direct insert. Sorry for the misunderstanding.
Thanks @uklft for confirming the issue has been resolved.
@zhicwu Sorry to bother you again, but we noticed that the original problem of this issue unfortunately is not resolved.
nullAsDefault being ignored by the driver is resolved.
But the problem that (at least in Hop) empty strings are passed as null values remains.
When i send rows containing ("", "") i get this error:
2022/12/02 12:29:14 - Table output.0 - Prepared statement : INSERT INTO EmptyStringTest.Test (id, name) VALUES ( ?, ?)
2022/12/02 12:29:14 - clickhouse - Rollback on database connection [clickhouse]
2022/12/02 12:29:14 - Table output.0 - ERROR: Because of an error, this transform can't continue:
2022/12/02 12:29:14 - Table output.0 - ERROR: org.apache.hop.core.exception.HopException:
2022/12/02 12:29:14 - Table output.0 - Error inserting row into table [Test] with values: [], []
2022/12/02 12:29:14 - Table output.0 -
2022/12/02 12:29:14 - Table output.0 - Error inserting/updating row
2022/12/02 12:29:14 - Table output.0 - Cannot set null to non-nullable column #1 [id UInt64]
2022/12/02 12:29:14 - Table output.0 -
2022/12/02 12:29:14 - Table output.0 -
2022/12/02 12:29:14 - Table output.0 - at org.apache.hop.pipeline.transforms.tableoutput.TableOutput.writeToTable(TableOutput.java:383)
2022/12/02 12:29:14 - Table output.0 - at org.apache.hop.pipeline.transforms.tableoutput.TableOutput.processRow(TableOutput.java:117)
2022/12/02 12:29:14 - Table output.0 - at org.apache.hop.pipeline.transform.RunThread.run(RunThread.java:51)
2022/12/02 12:29:14 - Table output.0 - at java.base/java.lang.Thread.run(Thread.java:829)
2022/12/02 12:29:14 - Table output.0 - Caused by: org.apache.hop.core.exception.HopDatabaseException:
2022/12/02 12:29:14 - Table output.0 - Error inserting/updating row
2022/12/02 12:29:14 - Table output.0 - Cannot set null to non-nullable column #1 [id UInt64]
2022/12/02 12:29:14 - Table output.0 -
2022/12/02 12:29:14 - Table output.0 - at org.apache.hop.core.database.Database.insertRow(Database.java:1135)
2022/12/02 12:29:14 - Table output.0 - at org.apache.hop.pipeline.transforms.tableoutput.TableOutput.writeToTable(TableOutput.java:251)
2022/12/02 12:29:14 - Table output.0 - ... 3 more
2022/12/02 12:29:14 - Table output.0 - Caused by: java.sql.SQLException: Cannot set null to non-nullable column #1 [id UInt64]
2022/12/02 12:29:14 - Table output.0 - at com.clickhouse.jdbc.SqlExceptionUtils.clientError(SqlExceptionUtils.java:73)
2022/12/02 12:29:14 - Table output.0 - at com.clickhouse.jdbc.internal.InputBasedPreparedStatement.addBatch(InputBasedPreparedStatement.java:328)
2022/12/02 12:29:14 - Table output.0 - at com.clickhouse.jdbc.internal.InputBasedPreparedStatement.executeAny(InputBasedPreparedStatement.java:110)
2022/12/02 12:29:14 - Table output.0 - at com.clickhouse.jdbc.internal.InputBasedPreparedStatement.executeLargeUpdate(InputBasedPreparedStatement.java:185)
2022/12/02 12:29:14 - Table output.0 - at com.clickhouse.jdbc.internal.AbstractPreparedStatement.executeUpdate(AbstractPreparedStatement.java:135)
2022/12/02 12:29:14 - Table output.0 - at org.apache.hop.core.database.Database.insertRow(Database.java:1100)
2022/12/02 12:29:14 - Table output.0 - ... 4 more
Version is clickhouse-jdbc-0.3.2-patch11-all.jar.
I updated the example repo in case you want to try yourself.
Hi @uklft, just tried your repo again and it works for me. Did you update database configuration by setting nullAsDefault to 2? You can either do that via manual URL(e.g. jdbc:ch:http://<server>/?nullAsDefault=2) or add an option accordingly.
Thanks for looking into it.
So when you run the project as it is (with nullAsDefault=0) you get no error?
If i add nullAsDefault=2 it works as expected but that is not the point.
The problem is that the driver sends a null value although i am passing an empty string.
But i may want to set an empty string on a not nullable column and fail if it receives a null value. I would expect that behaviour by default.
I see. By default, Apache Hop treats empty string as null value, but you can change the behavior by setting variable HOP_EMPTY_STRING_DIFFERS_FROM_NULL to Y (Tools -> Edit config variables).
Oh wow, that is the key. Thank you so much for pointing that out!
today I updated clickhouse and now same python code that passed None data to clickhouse-connect and made Null values in tables, now makes empty string.