Our Agent is Receiving Closed Connections from the ZIS
There are many reasons that an agent may receive a closed connection when it tries to send a message to the ZIS. It is the responsibility of the agent to handle this condition properly, which means not simply resending the message immediately, and not changing a few pieces of information in the message (like the MessageId) and resending it.
Data centers face daily threats from would-be intruders, so network devices and operating systems put systems in place to detect intruder behavior and block access. An agent that mishandles closed connections as described above ends up resembling an intruder to the network or operating system, which protects its assets by continuing to close the agent's connections, and that continues the cycle.
How to Correctly Handle a Closed Connection
When your agent receives a closed connection, it gets either a "web exception" or a "runtime exception" (at least this is the terminology used by the .NET Framework). With a web exception, the status value tells you much more about what could have happened:
|Status|Description|
|---|---|
|ConnectFailure|The remote service could not be contacted at the transport level.|
|ConnectionClosed|The connection was closed prematurely.|
|KeepAliveFailure|The server closed a connection made with the Keep-alive header set.|
|NameResolutionFailure|The name service could not resolve the host name.|
|ProtocolError|The response received from the server was complete but indicated an error at the protocol level.|
|ReceiveFailure|A complete response was not received from the remote server.|
|RequestCanceled|The request was canceled.|
|SecureChannelFailure|An error occurred in a secure channel link.|
|SendFailure|A complete request could not be sent to the remote server.|
|ServerProtocolViolation|The server response was not a valid HTTP response.|
|Success|No error was encountered.|
|Timeout|No response was received within the time-out set for the request.|
|TrustFailure|A server certificate could not be validated.|
|MessageLengthLimitExceeded|A message was received that exceeded the specified limit when sending a request or receiving a response from the server.|
|Pending|An internal asynchronous request is pending.|
|PipelineFailure|This value supports the .NET Framework infrastructure and is not intended to be used directly in your code.|
|ProxyNameResolutionFailure|The name resolver service could not resolve the proxy host name.|
|UnknownError|An exception of unknown type has occurred.|
In our agent, we catch web exceptions and put the agent to sleep before resending the message. The length of the sleep depends on the type of error.
For example, if the returned status was "NameResolutionFailure" (a DNS failure), it doesn't make sense to retry right away because a human needs to make a configuration change. Instead, send a notification to an administrator and retry in 20 minutes.
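The idea of choosing a delay by error type can be sketched as a simple lookup. This is a minimal illustration in Python, not our agent's actual code: the function name `plan_retry`, the table names, and every delay other than the 20-minute DNS case are assumptions made for the example. The status strings are the ones listed in the table above.

```python
# Hypothetical delay table keyed by web exception status name.
# Only the 20-minute value for a DNS failure comes from the text above;
# all other numbers are illustrative assumptions.
RETRY_DELAY_SECONDS = {
    "NameResolutionFailure": 20 * 60,       # from the text: retry in 20 minutes
    "ProxyNameResolutionFailure": 20 * 60,  # assumption: also a configuration problem
    "TrustFailure": 20 * 60,                # assumption: certificate needs a human
    "Timeout": 30.0,                        # assumption
    "ConnectionClosed": 5.0,                # assumption
}

# Statuses assumed to need a human to change configuration before a retry can work.
NEEDS_ADMIN = {"NameResolutionFailure", "ProxyNameResolutionFailure", "TrustFailure"}

# Assumed fallback for any status not listed above.
DEFAULT_DELAY_SECONDS = 1.0


def plan_retry(status):
    """Return (delay_seconds, notify_admin) for a web exception status name."""
    delay = RETRY_DELAY_SECONDS.get(status, DEFAULT_DELAY_SECONDS)
    return delay, status in NEEDS_ADMIN
```

An agent would call `plan_retry` with the status it caught, notify an administrator when the second value is true, and sleep for the returned delay before resending.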
If a "runtime exception" was received, it is likely that the agent's own operating system is causing the failure. The message may already have been sent (and may have been processed by the ZIS), and something local may need to be addressed.
In any event, the agent should wait at least a short time (perhaps 0.2 to 0.5 seconds) before attempting any retry. In our agent, we increase the retry interval on each attempt until it reaches some maximum (perhaps 1 hour), because each failed retry makes it less likely that another retry at the same interval will succeed.
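A growing retry interval with a cap might look like the following Python sketch. It is an illustration under stated assumptions rather than our agent's implementation: the names `retry_intervals` and `send_with_backoff`, the doubling factor, and the attempt limit are hypothetical, and `send` stands in for whatever function actually posts the message to the ZIS. The 0.5-second start and 1-hour cap follow the figures suggested above.

```python
import time


def retry_intervals(initial=0.5, factor=2.0, maximum=3600.0):
    """Yield wait times in seconds that grow on each retry, capped at maximum."""
    delay = initial
    while True:
        yield min(delay, maximum)
        delay = min(delay * factor, maximum)  # doubling is an assumed growth rule


def send_with_backoff(send, max_attempts=10, sleep=time.sleep):
    """Call send() until it succeeds, sleeping longer after each closed connection.

    send is a hypothetical callable that raises ConnectionError on failure.
    The last failure is re-raised once max_attempts is used up.
    """
    for attempt, delay in zip(range(max_attempts), retry_intervals()):
        try:
            return send()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise
            sleep(delay)  # never retry immediately
```

With the defaults, the waits run 0.5, 1, 2, 4, 8 seconds and so on, and stay at 3600 seconds once the cap is reached. Passing `sleep` as a parameter makes the schedule easy to check without actually waiting.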