My previous blog post looked at some of the operational details of TCP SACK, from both a performance and an analysis perspective. I noted that packet retransmissions are reduced when the client performs selective acknowledgements, but server performance may still be impacted, lowering the overall throughput. In this second of two parts I’ll comment on how things look from a protocol analyzer perspective.
One of the first things that should be obvious from the discussion thus far is that analyzers should also look at SACK information from the client to determine whether or not retransmissions exist. Unfortunately, most analyzers only look for retransmissions the old-fashioned way: duplicate TCP sequence numbers. If the original packet was dropped before it reached your analyzer's insertion point in the network, the analyzer never sees the sequence number twice and the retransmission goes undetected.
One must also be careful when seeing a TCP packet with a lower sequence number than the previous packet. It is not necessarily an out-of-order packet; it may be a retransmission filling a hole the client has already reported. This can easily be determined by looking for prior SACK information, which virtually all clients and servers in use today support by default.
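To make that check concrete, here is a minimal Python sketch of how a segment with a lower sequence number could be classified using earlier SACK information. It is only an illustration under simplifying assumptions (one direction of data, no sequence number wraparound); the `Segment` type, its fields, and the `classify` function are hypothetical names of mine, not part of any of the analyzers discussed here.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Segment:
    """Simplified view of one captured TCP packet (illustrative fields only)."""
    from_server: bool
    seq: int = 0       # first data byte (server data segments)
    length: int = 0    # payload length in bytes
    ack: int = 0       # cumulative ACK (client segments)
    sack_blocks: List[Tuple[int, int]] = field(default_factory=list)

def classify(segments):
    """Label each server data segment, using the client's latest SACK report."""
    highest_end = 0       # highest byte offset seen so far from the server
    reported_holes = []   # (start, end) ranges the client last reported missing
    labels = []
    for s in segments:
        if not s.from_server:
            # A hole is any gap between the cumulative ACK and the SACK blocks.
            holes, prev = [], s.ack
            for left, right in sorted(s.sack_blocks):
                if left > prev:
                    holes.append((prev, left))
                prev = max(prev, right)
            reported_holes = holes
            continue
        if s.seq >= highest_end:
            label = "new data"
        elif any(start <= s.seq < stop for start, stop in reported_holes):
            label = "retransmission (fills SACK-reported hole)"
        else:
            label = "out of order or undetermined"
        highest_end = max(highest_end, s.seq + s.length)
        labels.append((s.seq, label))
    return labels
```

Note that the retransmitted segment is recognized even though the capture never saw its sequence number before, which is exactly the case a duplicate-sequence-number test misses.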
Finally, it is of interest to see how efficiently a server is able to process and send the missing segment or segments. The case I referred to in the previous blog showed server delay in segment recovery.
Let’s look at three analyzers in alphabetical order: Observer (Network Instruments), OmniPeek (WildPackets), and Wireshark (the Ethereal replacement spearheaded by CACE).
Note: I tested the latest shipping version of each analyzer as of this blog posting. The three I picked are all representative of today’s protocol analysis tools and all have some type of expert system. I ran the exact same trace file (TCP packet loss as discussed in Part 1) through all three analyzers, using the .enc format. The trace was originally captured from the client’s Ethernet, with the remote server located on the other side of a WAN.
Observer reported no TCP events in the expert analysis summary. Yet the TCP retrans column at the far right did show 32 retransmissions.
Bottom line: With Observer, be sure to examine the retransmission column in the connection details when troubleshooting application/TCP performance. Be careful with the Connection Dynamics feature, which identifies these packets as out of order rather than as retransmissions, as the expert's retransmission column does.
OmniPeek has a unique expert event called “TCP Slow Segment Recovery” as you can see in the screen shot below. Thus, OmniPeek looks at both the SACK information from the client and the corresponding recovered segments sent by the server. The default delay setting is 250 milliseconds. By using a little trick and lowering this to 0 milliseconds, we can catch all recovered segments. Per the screen shot, the number of segments recovered is 30.
Bottom line: Slow segment recovery may or may not be a problem, but is a strong hint. You need to analyze further to see if it goes with the flow or disrupts the flow. One hint is to look at the max, min, and average throughput reported by the expert.
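The idea behind a slow segment recovery check can be sketched in a few lines of Python. This is my own rough approximation, not OmniPeek's actual logic: I assume the delay is measured from the first SACK that reports a hole to the arrival of the segment that fills it, and the event tuples and function names are hypothetical. Only the 250 millisecond default and the 0 millisecond trick come from the discussion above.

```python
def recovery_delays(events):
    """Return (seq, delay_ms) for each hole that was reported and later filled.

    events is a time-ordered list of (kind, time_ms, seq) tuples:
      ("sack", t, seq): client SACK implies the segment starting at seq is missing
      ("data", t, seq): server segment starting at seq arrives
    """
    first_reported = {}   # seq of missing segment -> time of first SACK report
    delays = []
    for kind, time_ms, seq in events:
        if kind == "sack":
            # Keep only the first report; later SACKs repeat the same hole.
            first_reported.setdefault(seq, time_ms)
        elif kind == "data" and seq in first_reported:
            delays.append((seq, time_ms - first_reported.pop(seq)))
    return delays

def slow_recoveries(events, threshold_ms=250):
    """Recovered segments whose delay meets or exceeds the threshold."""
    return [(seq, d) for seq, d in recovery_delays(events) if d >= threshold_ms]
```

With `threshold_ms=0` every recovered segment is returned, which mirrors the trick of lowering the setting to count all recoveries rather than only the slow ones.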
As shown in the next screen shot, the Wireshark expert info correctly 'Notes' that there are TCP Retransmissions (32 total) and 'Warns' about 'Previous Segment Lost' (30 total). Thus, it took 32 retransmissions to recover 30 segments.
Bottom line: Look prior to the retransmissions to see if SACK is being used. Also, the duplicate ACKs shown in the screen shot are not really duplicates per se. As packets continue to stream in following a missing segment, the client continues to SACK each segment, extending the received block range while the cumulative ACK number stays the same. Once the missing segment or segments are received, the client reverts to normal periodic ACKs.
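A small Python simulation shows why those ACKs only look like duplicates. It is a simplified sketch under my own assumptions: the receiver ACKs every arriving segment immediately, blocks are listed in sequence order rather than most recent first, and the `receiver_acks` name is hypothetical.

```python
def receiver_acks(arrivals, expected):
    """Simulate the ACKs a SACK-capable receiver sends for each arriving segment.

    arrivals: list of (seq, length) in arrival order
    expected: next in-order byte the receiver is waiting for
    Returns a list of (cumulative_ack, sack_blocks), one per arrival.
    """
    buffered = []   # ranges received beyond the cumulative ACK point
    acks = []
    for seq, length in arrivals:
        buffered.append((seq, seq + length))
        buffered.sort()
        # Merge adjacent or overlapping ranges into contiguous blocks.
        merged = []
        for left, right in buffered:
            if merged and left <= merged[-1][1]:
                merged[-1] = (merged[-1][0], max(merged[-1][1], right))
            else:
                merged.append((left, right))
        # Advance the cumulative ACK through any block that is now in order.
        while merged and merged[0][0] <= expected:
            expected = max(expected, merged[0][1])
            merged.pop(0)
        buffered = merged
        acks.append((expected, list(buffered)))
    return acks
```

For a stream where the segment at 2000 is missing, the ACK number stays at 2000 while the SACK block grows from (3000, 4000) to (3000, 5000) and beyond. Once the missing segment arrives, the ACK jumps past the whole block and the SACK list is empty again.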
Hopefully I’ve whetted your appetite for probing further. TCP Selective ACKs work well in reducing TCP retransmissions, but watch out for how efficiently servers handle them. This is but one example of how we can analyze better, smarter, deeper.