CoreWCF [Bug]: Potential Race Condition When WCF Client Aborts NetTCP Channel

Duplicate ?

[X] I have searched issues/discussions and did not find other issues/discussions reporting this bug.

Product version

1.6.0

Describe expected behavior

We're using a CoreWCF service with NetTcp. This code has been operating successfully for a couple of years, but with the recent 1.6.0 CoreWCF.NetTcp NuGet package I started getting a large number of ConnectionResetExceptions. I have traced this to the case where a business rule exception is thrown from the WCF service, and I believe there is a race condition/bug in the CoreWCF code. The issue is not consistent, and I was only able to reproduce it in the debugger with breakpoints holding up the client/server threads.

Inside the service we have an ErrorHandler class that implements IErrorHandler. This has two methods (ProvideFault and HandleError). ProvideFault is executed prior to sending the response to the calling client, and HandleError is executed later. HandleError contains our logging logic to write out the information. This is where we are seeing the increased number of ConnectionResetExceptions. It is expected that the CoreWCF components cleanly handle any client disconnects that may occur without throwing an exception.
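For readers unfamiliar with the two-method split described above, here is a minimal sketch of such a handler. It is an illustration only, not our production code: the class name `LoggingErrorHandler`, the fault reason text, and the use of `Console.Error` as a stand-in logger are all assumptions.

```csharp
using System;
using CoreWCF;
using CoreWCF.Channels;
using CoreWCF.Dispatcher;

// Hypothetical handler; name and logging target are assumptions for illustration.
public sealed class LoggingErrorHandler : IErrorHandler
{
    // Runs before the reply is sent to the client.
    public void ProvideFault(Exception error, MessageVersion version, ref Message fault)
    {
        // Leave faults the service threw deliberately to the default serialization.
        if (error is FaultException)
        {
            return;
        }

        // Replace any other exception with a generic fault (assumed reason text).
        var generic = new FaultException("The request could not be processed.");
        fault = Message.CreateMessage(version, generic.CreateMessageFault(), generic.Action);
    }

    // Runs later, after the reply path; this is where logging happens.
    public bool HandleError(Exception error)
    {
        Console.Error.WriteLine(error); // stand-in for real logging
        return false;                   // false: keep the default session/instance handling
    }
}
```

Because HandleError runs after the reply has gone out, any exception raised during channel cleanup (such as the ConnectionResetException in this report) also ends up in this method.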

Describe actual behavior

What I have found is that, after passing through "ProvideFault", CoreWCF.Dispatcher.ImmutableDispatchRuntime has a method "ProcessError", and inside this method there is an "await ReplyAsync(rpc);" call. When ReplyAsync() is executed, it returns control to the calling client. If I put a breakpoint on that call and debug the process, I allow the CoreWCF code to complete the ReplyAsync, but I do not let it reach the next call, "await ProcessMessageCleanupAsync(rpc)", yet. Once the client has control, normal client-side cleanup is performed:

if (serviceChannel.State == CommunicationState.Faulted)
{
    serviceChannel.Abort();
}

if (serviceChannel.State != CommunicationState.Closed)
{
    serviceChannel.Close();
}

After the client finishes its cleanup, I let the debugger in the WCF service continue and run ProcessMessageCleanupAsync(rpc). Inside this method is where I am getting the ConnectionResetException. I believe that the "rpc" state is not receiving the client Abort() message quickly enough, so rpc shows the connection as still open, which then ultimately throws the ConnectionResetException when it attempts to access it.

If I instead pause the calling client code before .Abort() is called and let the server side finish processing first, the ConnectionResetException is not thrown.

Which binding

NetTcp

security

None

Which .NET version

.NET 6

Which os platform

Windows

Code snippet used to reproduce the issue

See above. Note: I am using .NET 8, which isn't an option in the drop-down list. Also, we did not see this issue when we used the 1.5.x versions of CoreWCF.NetTcp.

Stacktrace if any

Microsoft.AspNetCore.Connections.ConnectionResetException: An existing connection was forcibly closed by the remote host.
 ---> System.Net.Sockets.SocketException (10054): An existing connection was forcibly closed by the remote host.
   --- End of inner exception stack trace ---
   at System.IO.Pipelines.Pipe.GetReadResult(ReadResult& result)
   at System.IO.Pipelines.Pipe.ReadAsync(CancellationToken token)
   at CoreWCF.Channels.Framing.NetTcpExceptionConvertingDuplexPipe.NetTcpExceptionConvertingPipeReader.ReadAsync(CancellationToken cancellationToken)
   at CoreWCF.Channels.Framing.DuplexPipeStream.ReadAsyncInternal(Memory`1 destination, CancellationToken cancellationToken)
   at System.IO.Stream.ReadAtLeastAsyncCore(Memory`1 buffer, Int32 minimumBytes, Boolean throwOnEndOfStream, CancellationToken cancellationToken)
   at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource<TResult>.GetResult(Int16 token)
   at System.Net.Security.NegotiateStream.<ReadAsync>g__ReadAllAsync|105_0[TIOAdapter](Stream stream, Memory`1 buffer, Boolean allowZeroRead, CancellationToken cancellationToken)
   at System.Net.Security.NegotiateStream.ReadAsync[TIOAdapter](Memory`1 buffer, CancellationToken cancellationToken)
   at System.IO.Pipelines.StreamPipeReader.<ReadInternalAsync>g__Core|40_0(StreamPipeReader reader, Nullable`1 minimumSize, CancellationTokenSource tokenSource, CancellationToken cancellationToken)
   at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource<TResult>.GetResult(Int16 token)
   at CoreWCF.Channels.ServerFramingDuplexSessionChannel.ServerSessionConnectionMessageSource.ReceiveAsync(CancellationToken token)
   at CoreWCF.Channels.SynchronizedMessageSource.ReceiveAsync(CancellationToken token)
   at CoreWCF.Channels.TransportDuplexSessionChannel.EnsureInputClosedAsync(CancellationToken token)
   at CoreWCF.Channels.TransportDuplexSessionChannel.OnCloseAsync(CancellationToken token)
   at CoreWCF.Channels.ServerFramingDuplexSessionChannel.OnCloseAsync(CancellationToken token)
   at CoreWCF.Channels.CommunicationObject.CloseAsync(CancellationToken token)
   at CoreWCF.Channels.ServiceChannel.OnCloseAsync(CancellationToken token)
   at CoreWCF.Channels.CommunicationObject.CloseAsync(CancellationToken token)
   at CoreWCF.Dispatcher.MessageRpc.CloseChannelAsync()

Dec 06 '24 20:12 chekm8

To clarify, this is in the scenario where the client calls Abort() on the channel?

Jan 10 '25 06:01 mconnew

@mconnew Correct, if the client calls the "serviceChannel.Abort();" before the server has completed its cleanup.

Jan 13 '25 12:01 chekm8

Looking at the code and comparing this with .NET Framework, this does look somewhat expected. I checked the full equivalent call stack in WCF on .NET Framework and, as far as I can tell, an exception would bubble up to MessageRpc.CloseChannel. You can see the WCF code here. The difference is that you would get a CommunicationObjectAbortedException on WCF. It looks like another case needs to be added to NetTcpExceptionConvertingPipeReader to convert a ConnectionResetException into a CommunicationObjectAbortedException.
Is your issue that your fault handler is getting called, or that the exception isn't derived from CommunicationException and so is throwing off your logic? Or is it that you aren't expecting these exceptions at all? It's a matter of luck if you don't normally see these on .NET Framework, as you could theoretically see a similar thing (but with the correct exception) with WCF.

Jan 28 '25 23:01 mconnew

Small correction to my earlier comment: it would throw a CommunicationException, because the connection abort wasn't initiated on the server side.

Jan 29 '25 01:01 mconnew

I've made a change to convert it to a CommunicationException so you can better handle the types of exceptions a custom handler might be passed. I can't do much more than that as swallowing the exception would be a change in behavior from .NET Framework, even if it happens more often for CoreWCF. You can just ignore these exceptions in your own handler.

Jan 29 '25 01:01 mconnew

@mconnew, Thanks for making the updates. We did end up adjusting our code to catch and ignore the following:

// ConnectionResetException occurs when the client calls Abort() before the operation is completed.
(ex is ConnectionResetException && ex.Message.Contains("An existing connection was forcibly closed by the remote host"))
// OperationCanceledException occurs when the client calls Abort() and the server receives the notice in time to cancel.
|| (ex is OperationCanceledException && ex.Message.Contains("The operation was canceled"))

Our original Abort check was: (ex is CommunicationException && ex.Message.Contains("The socket connection was aborted"))

I looked at your changeset but could not tell whether the CommunicationException would still have a similar "The socket connection was aborted" message as noted above, or whether that might change.

Feb 03 '25 13:02 chekm8

I believe the message will be:

The socket connection was aborted. This could be caused by an error processing your message or a receive timeout being exceeded by the remote host, or an underlying network resource issue. Local socket timeout was 'unknown'.
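Since that text is long and could change between versions, one option is to match on exception type and socket error code rather than on message strings. The following is a minimal sketch under stated assumptions, not part of CoreWCF: `ClientDisconnectFilter` and `IsClientDisconnect` are hypothetical names. It relies on the reset surfacing with an inner SocketException carrying code 10054 (ConnectionReset), as in the stack trace in this issue. Whether the converted CommunicationException keeps that inner exception is not confirmed here, so the helper also accepts extra exception types to treat as disconnects.

```csharp
using System;
using System.Linq;
using System.Net.Sockets;

// Hypothetical helper for use inside a custom IErrorHandler.HandleError.
public static class ClientDisconnectFilter
{
    // Returns true when any exception in the chain looks like a client-side abort.
    // alsoIgnore: additional exception types the caller wants treated the same way.
    public static bool IsClientDisconnect(Exception ex, params Type[] alsoIgnore)
    {
        for (var current = ex; current != null; current = current.InnerException)
        {
            // The server noticed the abort in time and cancelled the pending work.
            if (current is OperationCanceledException)
            {
                return true;
            }

            // The remote host reset the connection (socket error 10054).
            if (current is SocketException { SocketErrorCode: SocketError.ConnectionReset })
            {
                return true;
            }

            // Caller-supplied types, matched by type instead of message text.
            if (alsoIgnore.Any(t => t.IsInstanceOfType(current)))
            {
                return true;
            }
        }

        return false;
    }
}
```

A handler could then call `ClientDisconnectFilter.IsClientDisconnect(error, typeof(CoreWCF.CommunicationException))` and skip logging when it returns true. Note that passing CommunicationException this way ignores every exception of that type, which is broader than the original message-based check.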

Feb 04 '25 18:02 mconnew