Quick, what’s the first word that comes to mind when you hear the word ‘forensics’? CSI? Despite being an IT guy, I still have morbid thoughts along the lines of a pathologist slicing into a cadaver. Or collecting stuff that might contain the perps’ DNA at a crime scene (fingerprints are so yesteryear).
Webster’s defines forensics as “The use of science and technology to investigate and establish facts in criminal or civil courts of law.”
I think this fits well with corporate needs for network forensics using analysis tools. Many vendors pitch tools with catchy taglines along the lines of “Retrospective Analysis,” “Business Forensics,” “Turn Back the Clock,” and so on. Unlike real-time analysis, the basic premise behind network forensics is to mine captured data (usually packets) and perform post-hoc analysis to reconstruct content or gather intelligence about why certain things happened. In some ways, forensics is like detailed hindsight.
There are several areas where network forensics can be applied. Some broad categories include:
- Compliance: Oops, someone sent out company-confidential financial information in an unencrypted email or used IM to gossip about a coworker's medical condition.
- Troubleshooting: Why did our network melt down this morning? Why do our CRM users often experience poor performance in the afternoon?
- Verticals: Why did the core switch peg during a critical trading hour? Why are doctors losing wireless connectivity? Is our converged VoIP operating smoothly?
Returning to the Webster’s definition, analysis tools can be used to establish the facts behind a disturbing network-related event. By network, I mean the entire infrastructure, from fabric to nodes to users – let’s not forget about the human element.
According to searchnetworking.com, Marcus Ranum is credited with saying “Network forensics is the capture, recording, and analysis of network events in order to discover the source of security attacks or other problem incidents.”
Thus, by virtually all definitions of the term, forensics is traditionally associated with crime solving. That puts us in the mindset of “when something illicit happened, what exactly happened, and how can it be prevented?” whether the source was inside or outside. Contrast this with troubleshooting as mentioned above. Forensics is criminal. A slow network is not. Although one could argue that a dead network is criminal!
There’s also a relatively new category of forensics: enterprise forensics, which focuses both on user activity and on what drives (or doesn’t drive) the business (analytics plus behavior). Is what we’re seeing on the network consistent with business objectives? Apdex (an index of end-user satisfaction with application responsiveness) is a good measure of this. Now we’re crossing the boundary into Application Performance Management (APM), along with user behavior and IT versus business expectations.
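The Apdex math is simple enough to sketch in a few lines. Below is a minimal Python rendering of the published formula, where samples at or under a target threshold T count as satisfied, samples between T and 4T count as tolerating (half credit), and anything slower counts as frustrated. The 0.5-second threshold and the sample data are my own illustrative assumptions, not recommendations.

```python
# A minimal sketch of the standard Apdex formula; the 0.5 s threshold
# and the sample response times are assumptions for illustration.

def apdex(response_times, t=0.5):
    """Samples <= T are satisfied, samples in (T, 4T] are tolerating
    (half credit), and anything slower is frustrated."""
    satisfied = sum(1 for rt in response_times if rt <= t)
    tolerating = sum(1 for rt in response_times if t < rt <= 4 * t)
    return (satisfied + tolerating / 2) / len(response_times)

# Mostly snappy responses (in seconds) with a few slow outliers.
samples = [0.2, 0.3, 0.4, 0.6, 1.1, 2.5, 0.3, 0.25]
print(f"Apdex(T=0.5s) = {apdex(samples):.2f}")  # 0.75
```

A score near 1.0 means happy users; anything sliding toward 0.5 means the network (or application) is generating grumbling, whether or not any alarm has fired.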
Some companies have, in fact, attempted to focus on the behavioral aspect of forensics, with limited success. According to the book Digital Evidence and Computer Crime, “Behavioral evidence analysis provides a systematized method of synthesizing the specific technical knowledge and general scientific methods to gain a better understanding of criminal behavior and motivation.”
The big question here is whether unusual behavior can be predicted in advance based on the characteristics of anomalous activity. So what if Johnny does a 2 GB file transfer on Friday afternoon? Maybe it’s a routine backup. Maybe it’s a one-off OS patch. Furthermore, who cares if our WAN utilization is 100% during that time, as long as it’s fair access for all, and all access for one when no one else is on at the moment? Naturally, illegal file sharing and the like is cause for alarm, but there will always be unpredictable anomalous spikes that don’t fit baselines.
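To make the dilemma concrete, here is a deliberately naive baseline check in Python. The hourly transfer volumes and the z-score cutoff of 3 are hypothetical values made up for illustration; real products baseline far more cleverly, but they face the same ambiguity.

```python
# A sketch of naive baseline alerting. The hourly volumes (GB) and the
# z-score cutoff of 3 are hypothetical illustrative values.

from statistics import mean, stdev

baseline_gb = [0.10, 0.20, 0.15, 0.30, 0.10, 0.25, 0.20, 0.18]
friday_transfer = 2.0  # Johnny's 2 GB file transfer

mu, sigma = mean(baseline_gb), stdev(baseline_gb)
z = (friday_transfer - mu) / sigma
print(f"z = {z:.1f} -> {'ANOMALY' if z > 3 else 'normal'}")
# Prints ANOMALY (z is roughly 26): the statistics alone can't tell a
# routine backup from exfiltration. Context has to come from elsewhere.
```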
I used to teach in my network analysis and troubleshooting courses that 100% utilization is not necessarily a bad thing and should not set off SNMP traps and alarms all over the place. Of course, you don’t want one user or application to consume all the bandwidth for extended periods of time. But brief bursts of 100% can actually be a good thing. After all, if no application can ever utilize 100% of the pipe, then perhaps something needs optimizing. The key is to get on and off the network as quickly as possible – big pipes (and low end-to-end latency) help achieve that. But I digress.
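To put that distinction in code rather than prose: below is a minimal sketch of alarm logic that shrugs off brief saturation bursts but flags sustained hogging. The 95% threshold and the three-interval window are assumptions you would tune, and a real monitor would poll utilization from SNMP interface counters rather than iterate over a hardcoded list.

```python
# A minimal sketch: alarm on sustained saturation, not brief bursts.
# The threshold and window are assumptions; real monitors derive
# utilization from SNMP interface counters.

def sustained_alarm(utilization, threshold=0.95, intervals=3):
    """Alarm only when utilization stays at or above the threshold
    for `intervals` consecutive polling periods."""
    streak = 0
    for u in utilization:
        streak = streak + 1 if u >= threshold else 0
        if streak >= intervals:
            return True
    return False

burst = [0.4, 1.0, 0.3, 0.5]        # healthy burst: no alarm
hog = [0.6, 1.0, 1.0, 1.0, 0.98]    # sustained saturation: alarm
print(sustained_alarm(burst), sustained_alarm(hog))  # False True
```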
Forensics tools need to provide the flexibility to blend real-time analysis with forensics and allow you to optimize the tool for your given situation. Is 100% capture to disk of massive amounts of data important to you? Where in your network do you need to capture, what is the nature of the traffic, and what are the capture bandwidth requirements? How long do you need to keep the data around? How important are the distributed aspects, and how efficiently is the data conveyed to centralized consoles or to distributed consoles shared by multiple engineers (investigators)? Do you prefer that the forensics data mining and subsequent analysis be carried out at the remote capture engines, or brought back to the console to analyze locally and/or take offline?
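On the 100% capture-to-disk question in particular, a little back-of-the-envelope arithmetic goes a long way. The sketch below assumes a single 1 Gbps link averaging 30% utilization and 48 TB of capture storage; every one of those numbers is a placeholder, and real sizing must also account for packet slicing, indexing overhead, and compression.

```python
# Back-of-the-envelope capture storage sizing. The link speed, average
# utilization, and storage figure are placeholder assumptions.

def capture_bytes_per_day(link_bps, avg_utilization):
    return link_bps / 8 * avg_utilization * 86_400  # seconds per day

daily = capture_bytes_per_day(1_000_000_000, 0.30)
print(f"{daily / 1e12:.1f} TB/day")           # ~3.2 TB/day
retention_days = 48e12 / daily                # how long 48 TB lasts
print(f"{retention_days:.0f} days on 48 TB")  # ~15 days
```

Run that math for every capture point on your list and the retention question largely answers itself.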
So slip on a mask and a pair of latex gloves and get to work!