npgsql icon indicating copy to clipboard operation
npgsql copied to clipboard

Unknown message code: 3 or Unknown message code: 23

Open joecarr opened this issue 4 years ago • 15 comments

Steps to reproduce

We have been unable to reproduce the error in a console application. It is only visible within our production environment. If we take one of our applications, and direct the connection string directly to the db rather than pgbouncer, that will cause the issue to cease. As it happens relatively frequently, we can capture some diagnostics around the issue, but cannot reproduce at will.

The issue

We've read all of the previous responses that we can find to this issue, and do not believe it to be a threading issue (re-using the connection). The most common instance of the issue occurs on a short lived web request during which the application uses only 1 db connection. The Unknown message codes are always either 3 or 23.

I'm curious if you have any ideas to try, or logging information to capture which may help us track down the cause.

Relevant connection string parameters:

Pooling=false

Stacktrace

{"StackTraceString":"at Npgsql.Util.PGUtil.ValidateBackendMessageCode(BackendMessageCode code)
   at Npgsql.NpgsqlConnector.<<ReadMessage>g__ReadMessageLong|194_0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Npgsql.NpgsqlConnector.<ExecuteInternalCommand>d__249.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Npgsql.NpgsqlTransaction.<Commit>d__16.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Npgsql.NpgsqlTransaction.Commit()
   at NHibernate.Transaction.AdoTransaction.Commit()",
"Message":"Unknown message code: 3",
"HResult":-2147467259,
"RemoteStackTraceString":null,
"ClassName":"Npgsql.NpgsqlException",
"InnerException":null,
"Data":null,
"WatsonBuckets":null,
"ExceptionMethod":"8
ValidateBackendMessageCode
Npgsql, Version=5.0.10.0, Culture=neutral, PublicKeyToken=5d8b90d52f46fda7
Npgsql.Util.PGUtil
Void ValidateBackendMessageCode(Npgsql.BackendMessageCode)",
"RemoteStackIndex":0,
"HelpURL":null,
"Source":"Npgsql"}

Thank you beforehand for any assistance!

Further technical details

  • Npgsql version: 5.0.10
  • PostgreSQL version: 13.2
  • pgbouncer version: 1.9.0

Operating system:

  • App: Windows Server 2019 Datacenter
  • Db: Red Hat 4.8.5-44

Other details about my project setup: pgbouncer configured in transaction mode

joecarr avatar Nov 09 '21 19:11 joecarr

This definitely looks like a race condition, although I can't say yet what exactly makes it manifest (going to run a few test today). In the meantime, the best way forward is to try and gather some npgsql logs (you can find more info here). Just in case, are you using TransactionScope or Keepalive?

vonzshik avatar Nov 10 '21 06:11 vonzshik

Thank you very much for your reply. We do not use TransactionScope nor KeepAlive. We've got the logger integrated, and are attempting to issue with the logging enabled. Will follow up when we have!

joecarr avatar Dec 20 '21 17:12 joecarr

Hello @vonzshik , we were able to reproduce the situation with Npgsql logging enabled : https://gist.github.com/joecarr/ff4c63d16a9113f47a057460c925b855.

Sending cancellation... appears to be the source of the issue, but unsure why it would've been called.

joecarr avatar Dec 21 '21 14:12 joecarr

@joecarr okay, from the logs I can say that:

  1. The cancellation itself is most likely not a culprit (we send it whenever the timeout is reached, in your case, after 30 seconds)
  2. The query timed out because the read buffer was in a bad state (that is, there was some data left over from a previous query/action)

Was this connection (1576769187) mentioned before in the logs? If so, could you send the last 10/20 actions?

vonzshik avatar Dec 21 '21 15:12 vonzshik

  1. Understood
  2. This is interesting, because we're running an integration test which creates a new connection (via NHibernate), and executes 1 select statement. No previous query/action of which I am aware.

Unfortunately, there is no information in the log for connection 1576769187 which is not included in the gist.

We can reproduce an Unknown message code error when hitting the command timeout at will. Unfortunately, the one which affects our production instance occurs frequently within a second or less of execution.

joecarr avatar Dec 21 '21 19:12 joecarr

We've attempted to reproduce this issue with a simple console app both with and without NHibernate, and both throw the expected Npgsql.NpgsqlException (0x80004005): Exception while reading from stream. We're going to continue to step through the issue in our main application, and will follow up should we have any further questions.

joecarr avatar Jan 20 '22 01:01 joecarr

@joecarr great. If you manage to put a repro together for us, we can investigate this on our side.

roji avatar Jan 20 '22 10:01 roji

@joecarr Very sorry for the wait, was busy with personal stuff. Is it possible for you to test a few things:

  1. Try to connect to PgBouncer without SSL
  2. Use the latest version of PgBouncer
  3. All of the above combined

I was running a few tests with the latest version of PgBouncer and I didn't have any problems. So I just want to confirm whether the issue is reproduceable with the setup I have.

vonzshik avatar Jan 24 '22 10:01 vonzshik

Hi! I don't have a repro case but I can report that we get randomly code 1, 3, 7 and 23 messages in a similar fashion, but just with plain old reads.

App details:

  • Single threaded
  • C#, .NET Framework 4.8, Windows Server
  • Npgsql 6.0.2.0
  • Direct database connection (no ORM, no proxy, no load balancer, nothing)
  • PostgreSQL 12 + TimescaleDB plugin (this instance is a physical replica of another one).
  • One-off process that runs for 20-30min and closes.
  • Issues 10-20 different queries hundreds of times with different parameters.
  • Connection string parameters are the defaults (ie, no extra settings beyond host/user/pass).
  • I suspect the db instance may be ram starved but running an offending query individually via DBeaver (JDBC driver) runs fine albeit takes a while (~10s).

I very much doubt I can consistently repro any of this or test any further right now but I wanted to hand you the info I have in case it helps. I just updated the driver to the 6.0.3.0 version to try it out if it makes any difference.

Thank you for your time!

Example stack trace:

Npgsql.NpgsqlException (0x80004005): Unknown message code: 7 at Npgsql.Util.PGUtil.ValidateBackendMessageCode(BackendMessageCode code) at Npgsql.Internal.NpgsqlConnector.<<ReadMessage>g__ReadMessageLong|211_0>d.MoveNext() --- End of stack trace from previous location where exception was thrown --- at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at Npgsql.Internal.NpgsqlConnector.<<ReadMessage>g__ReadMessageLong|211_0>d.MoveNext() --- End of stack trace from previous location where exception was thrown --- at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at System.Threading.Tasks.ValueTask`1.get_Result() at Npgsql.NpgsqlDataReader.<NextResult>d__47.MoveNext() --- End of stack trace from previous location where exception was thrown --- at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at Npgsql.NpgsqlDataReader.NextResult() at Npgsql.NpgsqlCommand.<ExecuteReader>d__116.MoveNext() --- End of stack trace from previous location where exception was thrown --- at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() at Npgsql.NpgsqlCommand.<ExecuteReader>d__116.MoveNext() --- End of stack trace from previous location where exception was thrown --- at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at Npgsql.NpgsqlCommand.ExecuteReader(CommandBehavior behavior) at Npgsql.NpgsqlCommand.ExecuteDbDataReader(CommandBehavior behavior) at System.Data.Common.DbCommand.ExecuteReader() at [here our app code starts]

And another one with a different code:

Npgsql.NpgsqlException (0x80004005): Unknown message code: 1 at Npgsql.Util.PGUtil.ValidateBackendMessageCode(BackendMessageCode code) at Npgsql.Internal.NpgsqlConnector.<<ReadMessage>g__ReadMessageLong|211_0>d.MoveNext() --- End of stack trace from previous location where exception was thrown --- at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at Npgsql.Internal.NpgsqlConnector.<<ReadMessage>g__ReadMessageLong|211_0>d.MoveNext() --- End of stack trace from previous location where exception was thrown --- at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at System.Threading.Tasks.ValueTask`1.get_Result() at Npgsql.NpgsqlDataReader.<NextResult>d__47.MoveNext() --- End of stack trace from previous location where exception was thrown --- at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at Npgsql.NpgsqlDataReader.NextResult() at Npgsql.NpgsqlCommand.<ExecuteReader>d__116.MoveNext() --- End of stack trace from previous location where exception was thrown --- at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() at Npgsql.NpgsqlCommand.<ExecuteReader>d__116.MoveNext() --- End of stack trace from previous location where exception was thrown --- at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at Npgsql.NpgsqlCommand.ExecuteReader(CommandBehavior behavior) at Npgsql.NpgsqlCommand.ExecuteDbDataReader(CommandBehavior behavior) at System.Data.Common.DbCommand.ExecuteReader() at [here our app code starts]

fetchezar avatar Feb 17 '22 14:02 fetchezar

Hi, I also got an 'Unknown message code: 23' exception. This happens within my 'Postgres Notifications' implementation:

This code has worked with npgsql v5.0.10 without problems:

connectionStr = "Host=127.0.0.1;Username=user;Password=pw;Database=MYDB;Keepalive=60" var dbConnection= new NpgsqlConnection(connectionStr ); dbConnection.Open(); dbConnection.Notification += OnConnectionOnNotification;

using (var command = new NpgsqlCommand($"listen myChannel;", dbConnection)) { command.ExecuteNonQuery(); } while (!token.IsCancellationRequested) { dbConnection.Wait(TimeSpan.FromSeconds(5)); }

But with the latest Version v6.0.3 I got an exception (Unknown message code: 23) when dbConnection.Wait(TimeSpan.FromSeconds(5)); got called the 2nd time - that happens after 5 seconds.

Changing this : dbConnection.Wait(TimeSpan.FromSeconds(5)); to this : dbConnection.Wait(); caused an exception exactly after 60 seconds (see connection string 'Keepalive'.

Maybe this info can help? Try to get a repo of this....

The Stack-Trace :: at Npgsql.Util.PGUtil.ValidateBackendMessageCode(BackendMessageCode code) at Npgsql.Internal.NpgsqlConnector.<<ReadMessage>g__ReadMessageLong|211_0>d.MoveNext() at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at System.Runtime.CompilerServices.TaskAwaiter.ValidateEnd(Task task) at Npgsql.Internal.NpgsqlConnector.<Wait>d__259.MoveNext() at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at Npgsql.NpgsqlConnection.Wait(Int32 timeout) at Npgsql.NpgsqlConnection.Wait(TimeSpan timeout) at DoSubscribe(Subscription objSubscription, CancellationToken token) in MyNotification.cs:line 120

Michael-Kempe avatar Feb 21 '22 15:02 Michael-Kempe

@Michael-Kempe can you please open a new issue with a fully runnable console program that shows the error?

roji avatar Feb 21 '22 15:02 roji

Sure. My Issue. Thanks

Michael-Kempe avatar Feb 21 '22 15:02 Michael-Kempe

Hello everyone. There has been found a possible reason for this error. You can find more info here: https://github.com/npgsql/npgsql/issues/4305#issuecomment-1086728515.

vonzshik avatar Apr 02 '22 21:04 vonzshik

Hello everyone. There has been found a possible reason for this error. You can find more info here: #4305 (comment).

I am on 6.0.4 version; i am passing ConnectionTimeOut parameter to connection string as suggested.. Still i am getting the same issue described here and in #4305 ticket as well.

ZenwalkerD avatar Jun 01 '22 10:06 ZenwalkerD

@ZenwalkerD you should pass CancellationTimeout, and not ConnectionTimeOut. Are you using Npgsql with .NET Framework and SSL?

vonzshik avatar Jun 01 '22 10:06 vonzshik

Closing as no response was provided.

roji avatar Mar 09 '23 09:03 roji