How do packet analyzers stack up in detecting and reporting the simplest and most fundamental indication of an anomaly, the venerable TCP retransmission? I recently looked at five tools ranging from free to $80,000. I did this because I saw something suspicious as reported by the big gun.
In my book I recommend that any time you see suspect reporting or diagnostics, you verify them at least once the hard way (by hand) so that from then on you know whether they are accurate under those circumstances.
In this particular case, the “AppDoctor” of the high-end tool was informing me that “There are many packet retransmissions. The network may be heavily congested, or there may be an error-prone link.” The value it gave me was over 5%, which is certainly of concern. That’s 50+ packets out of every 1,000.
I didn't suspect at the time that the network path was congested and didn't want to chase down duplex mismatches and the like, so I wanted a second opinion and ran the trace through the free tool. It came up with all of 3 retransmissions per 1,000 packets, or roughly a third of a percent.
Why the large difference of opinion? Shouldn’t packets be cut and dried, i.e. factual? Before answering the question, I’d like to point out some of the idiosyncrasies in how a number of analyzers report TCP retransmissions.
As you may have guessed, the free tool is Wireshark. The $80,000 tool is Opnet’s ACE. I also ran the trace through the latest Network Instruments Observer, WildPackets OmniPeek, and Network General Sniffer analyzers, and focused on one section where Opnet ACE diagnosed 72 retransmissions out of some 1,100 packets.
Observer reported the same high retransmission count. It tried to be extra helpful in noting that they were also of the “too fast retransmission” variety (at or below 180 ms by default) and that they were “excessive” (2% or more of the total packets by default for the critical level). That would have been a great diagnosis if it had been correct. More on this later.
The Sniffer values were a little strange, depending on whether you look at the number of Sniffer symptom objects or at the packet summary decodes. The summary decodes contain three “Expert: Retransmission” notifications, yet the tally in the expert summary lists only two, possibly because it groups by TCP flow/conversation (i.e. two of the three retransmissions were in the same TCP flow).
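To illustrate how a per-packet count and a per-flow tally can disagree, here is a minimal sketch. The addresses and ports are invented for illustration, and the grouping rule is only my guess at what the expert summary might be doing, not documented Sniffer behavior.

```python
from collections import Counter

# Hypothetical (client, server) endpoints for the three packets flagged as
# retransmissions. These values are invented, not taken from the trace.
flagged_packets = [
    ("10.0.0.5:51000", "10.0.0.9:80"),
    ("10.0.0.5:51000", "10.0.0.9:80"),  # same flow as the first packet
    ("10.0.0.5:51002", "10.0.0.9:80"),
]

per_flow = Counter(flagged_packets)

packet_level_count = sum(per_flow.values())  # one entry per flagged packet
flow_level_count = len(per_flow)             # one entry per affected flow

print(f"per-packet count: {packet_level_count}")
print(f"per-flow count:   {flow_level_count}")
```

Both numbers describe the same three packets; they simply answer different questions.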
So the individual packet retransmission counts for each analyzer were:
- Opnet: 72
- Wireshark: 3
- Observer: 72
- OmniPeek: 3
- Sniffer: 3
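To put those counts in perspective, the sketch below converts each one to a percentage of the section and compares it with the 2% "excessive" default mentioned for Observer. The packet total of 1,100 is the approximate figure quoted above, so the percentages are approximate too.

```python
# Approximate size of the analyzed section ("some 1,100 packets").
TOTAL_PACKETS = 1100
# Observer's default critical threshold for "excessive" retransmissions.
CRITICAL_THRESHOLD_PCT = 2.0

reported = {
    "Opnet": 72,
    "Wireshark": 3,
    "Observer": 72,
    "OmniPeek": 3,
    "Sniffer": 3,
}


def retransmission_pct(retransmissions, total_packets):
    """Return retransmissions as a percentage of all packets."""
    return 100.0 * retransmissions / total_packets


rates = {tool: retransmission_pct(n, TOTAL_PACKETS) for tool, n in reported.items()}

for tool, pct in rates.items():
    verdict = "excessive" if pct >= CRITICAL_THRESHOLD_PCT else "ok"
    print(f"{tool:10s} {reported[tool]:3d} retransmissions  {pct:5.2f}%  {verdict}")
```

A count of 72 lands well above the threshold while a count of 3 lands far below it, so the two camps lead to opposite conclusions about the same trace.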
As I used to say in my classes, shall we just average the results and call it a day? Not!
The right answer, verified manually, is three, which means Wireshark, OmniPeek and Sniffer were correct in this particular scenario. That’s not to say that these tools are always correct in every situation; they aren't. Again, the purpose of this exercise is to verify your data. I'm not picking on any particular tool.
The large number of false positives in Opnet and Observer was due to their misinterpretation of the TCP close sequence.
A graceful TCP close (i.e. not a RST or reset) is a four-packet sequence: a FIN followed by an ACK closes one half of the connection, and then another FIN-ACK pair closes the second half to bring the connection to a full close (in short, FIN-ACK, FIN-ACK), as shown in the following figure (from Network General Sniffer).
Textbook TCP Close
In the trace in question, I noticed a different close sequence: FIN one direction, FIN the other direction, ACK the FIN, ACK the FIN (or in short, FIN-FIN-ACK-ACK). Also unusual was the fact that the FIN bit was set in the ACK from the server. The following shows the alternate TCP close.
Observer and Opnet are apparently fooled into thinking that the final TCP ACK packet is a retransmission because the FIN bit is set again (which is irrelevant, as the connection is already closed) and the TCP sequence number matches that of the previous FIN packet. Sniffer et al. do not report these packets as retransmissions because they are simply the last of the four packets in a TCP FIN close sequence.
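The difference can be sketched with two toy detectors. This is only an illustration under stated assumptions: the `Segment` type, the sequence numbers, and both heuristics are hypothetical and do not represent how any of the analyzers named here are actually implemented. The naive check flags any repeated sequence number; the close-aware check treats a repeated zero-payload FIN whose acknowledgment number has advanced as the closing ACK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    src: str              # sender label, e.g. "client" or "server"
    seq: int              # TCP sequence number
    ack: int              # TCP acknowledgment number
    payload_len: int = 0  # bytes of application data
    fin: bool = False     # FIN flag


def naive_retransmissions(segments):
    """Flag any sequence-consuming segment whose (src, seq) was seen before."""
    seen, flagged = set(), []
    for i, s in enumerate(segments):
        if s.payload_len > 0 or s.fin:
            key = (s.src, s.seq)
            if key in seen:
                flagged.append(i)
            seen.add(key)
    return flagged


def close_aware_retransmissions(segments):
    """Same check, but skip a repeated empty FIN that newly acknowledges the peer.

    Hypothetical heuristic; sequence-number wraparound is ignored for brevity.
    """
    seen, first_fin_ack, flagged = set(), {}, []
    for i, s in enumerate(segments):
        if not (s.payload_len > 0 or s.fin):
            continue
        key = (s.src, s.seq)
        if key in seen:
            closing_ack = (
                s.fin
                and s.payload_len == 0
                and key in first_fin_ack
                and s.ack > first_fin_ack[key]
            )
            if not closing_ack:
                flagged.append(i)
        seen.add(key)
        if s.fin:
            first_fin_ack.setdefault(key, s.ack)
    return flagged


# FIN-FIN-ACK-ACK close with invented sequence numbers.
close_sequence = [
    Segment("client", seq=1000, ack=5000, fin=True),  # FIN one direction
    Segment("server", seq=5000, ack=1000, fin=True),  # FIN the other direction
    Segment("client", seq=1001, ack=5001),            # ACK of the server's FIN
    Segment("server", seq=5000, ack=1001, fin=True),  # final ACK, FIN bit set again
]

print("naive:      ", naive_retransmissions(close_sequence))
print("close-aware:", close_aware_retransmissions(close_sequence))
```

With hundreds of short sessions in a trace, one such false positive per close adds up quickly under the naive rule.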
This particular application used hundreds of such TCP sessions and subsequent closes to transfer a relatively small amount of data, another problem in itself.
The lesson is that when in doubt, seek a second opinion from another tool or roll up your sleeves and perform manual analysis on a small test section of packet data to confirm your suspicions.