Using TCP keepalive to Detect Network Errors
---------------------------------------------

To detect network errors and signaling connection problems, you can enable the TCP keepalive feature. It increases the signaling bandwidth used, but because signaling channels use little bandwidth by nature, the increase should not be significant. Moreover, you can control it with the keepalive timeout.

The problem is that most systems use a keepalive timeout of 7200 seconds, which means the system is notified about a dead connection only after 2 hours. You probably want this time to be shorter, about one minute. The adjustment is done differently on each operating system.

After setting all parameters, it is recommended to check that the feature works correctly: make a test call and unplug a network cable at either side of the call, then see whether the call terminates after the configured timeout. Some hints for each system follow.

Linux systems
-------------

Use sysctl -A to get a list of available kernel variables and grep this list for net.ipv4 settings:

    sysctl -A | grep net.ipv4

The following variables should exist:

- net.ipv4.tcp_keepalive_time - time of connection inactivity after which the first keepalive probe is sent;
- net.ipv4.tcp_keepalive_probes - number of keepalive probes sent without a response before the connection is considered broken;
- net.ipv4.tcp_keepalive_intvl - time interval between keepalive probes.

You can change these settings using the following commands:

    sysctl -w net.ipv4.tcp_keepalive_time=60
    sysctl -w net.ipv4.tcp_keepalive_probes=3
    sysctl -w net.ipv4.tcp_keepalive_intvl=10

These sample commands set the TCP keepalive timeout to 60 seconds, with 3 probes and a 10-second gap between them. With this, your application will detect dead TCP connections after 90 seconds (60 + 10 + 10 + 10).
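The arithmetic above can be checked with a few lines of Python. This is only a sketch: the function names are hypothetical, and the only system detail it relies on is that Linux exposes the three variables as files under /proc/sys/net/ipv4. On other systems the second function simply returns None.

```python
from pathlib import Path

# Linux exposes the three keepalive sysctls as files under this directory.
PROC_DIR = Path("/proc/sys/net/ipv4")


def detection_seconds(keepalive_time, keepalive_probes, keepalive_intvl):
    """Idle time before the first probe plus one interval per unanswered probe."""
    return keepalive_time + keepalive_probes * keepalive_intvl


def read_current_settings(proc_dir=PROC_DIR):
    """Return (time, probes, intvl) from the running kernel, or None if unavailable."""
    names = ("tcp_keepalive_time", "tcp_keepalive_probes", "tcp_keepalive_intvl")
    try:
        return tuple(int((proc_dir / name).read_text()) for name in names)
    except (OSError, ValueError):
        return None  # not Linux, or the files are not readable


if __name__ == "__main__":
    print("sample settings:", detection_seconds(60, 3, 10), "seconds")
    current = read_current_settings()
    if current is not None:
        print("this machine:", detection_seconds(*current), "seconds")
```

Running it before and after the sysctl -w commands is a quick way to confirm that the new values took effect, although it does not replace the cable-unplug test.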
FreeBSD and MacOS X
-------------------

For the list of available TCP settings (FreeBSD 4.8 and up, and 5.4):

    sysctl -A | grep net.inet.tcp

The following variables should exist:

- net.inet.tcp.keepidle - amount of time, in milliseconds, that the TCP connection must be idle before keepalive probes (if enabled) are sent;
- net.inet.tcp.keepintvl - the interval, in milliseconds, between keepalive probes sent to remote machines. After TCPTV_KEEPCNT (default 8) probes are sent with no response, the TCP connection is dropped;
- net.inet.tcp.always_keepalive - assume that SO_KEEPALIVE is set on all TCP connections, so that the kernel periodically sends a packet to the remote host to verify that the connection is still up.

Therefore, the formula for the maximum TCP connection inactivity time is:

    net.inet.tcp.keepidle + (net.inet.tcp.keepintvl x 8)

The result is in milliseconds. For example, by setting:

    net.inet.tcp.keepidle = 10000
    net.inet.tcp.keepintvl = 5000
    net.inet.tcp.always_keepalive = 1 (must always be 1)

the system will disconnect a call when the TCP connection has been dead for:

    10000 + (5000 x 8) = 50000 msec (50 sec)

To make the system remember these settings at startup, add them to the /etc/sysctl.conf file.

Solaris
-------

For the list of available TCP settings:

    ndd /dev/tcp \?

Keepalive-related variables:

- tcp_keepalive_interval - idle timeout.
Example:

    ndd -set /dev/tcp tcp_keepalive_interval 60000

Windows
-------

Search the Knowledge Base for article ID 120642:

    http://support.microsoft.com/kb/120642/EN-US

Basically, you need to tweak some registry entries under:

    HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters

Linux
-----

    net.ipv4.tcp_keepalive_time = 300
    net.ipv4.tcp_keepalive_probes = 10
    net.ipv4.tcp_keepalive_intvl = 30

The procedures involving keepalive use three user-driven variables:

- tcp_keepalive_time - the interval between the last data packet sent (simple ACKs are not considered data) and the first keepalive probe; after the connection is marked as needing keepalive, this counter is not used any further;
- tcp_keepalive_intvl - the interval between subsequent keepalive probes, regardless of what the connection has exchanged in the meantime;
- tcp_keepalive_probes - the number of unacknowledged probes to send before considering the connection dead and notifying the application layer.
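A connection is marked as needing keepalive when the application sets SO_KEEPALIVE on its socket. The following Python sketch shows one way an application might do that. The helper name enable_keepalive is hypothetical, and the per-socket overrides (TCP_KEEPIDLE, TCP_KEEPINTVL, TCP_KEEPCNT) are an assumption about the platform: Linux provides them with values in seconds, other systems may not, so each one is applied only when Python exposes it.

```python
import socket


def enable_keepalive(sock, idle=None, interval=None, probes=None):
    """Turn on SO_KEEPALIVE; optionally override the system-wide timers for this socket."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The per-socket overrides below are platform-specific, so each one is
    # applied only if this Python build exposes the constant.
    overrides = (
        ("TCP_KEEPIDLE", idle),       # counterpart of tcp_keepalive_time
        ("TCP_KEEPINTVL", interval),  # counterpart of tcp_keepalive_intvl
        ("TCP_KEEPCNT", probes),      # counterpart of tcp_keepalive_probes
    )
    for name, value in overrides:
        option = getattr(socket, name, None)
        if value is not None and option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        enable_keepalive(s, idle=60, interval=10, probes=3)
        # Non-zero means keepalive is enabled on this socket.
        print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
```

Sockets that leave the optional arguments unset fall back to the system-wide values configured with sysctl.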