Wednesday, February 6, 2013

Malware Analysis 101 [Part 1]

Back in July of 2012 at the Rhode Island OWASP and again in September of 2012 at "The Brain Tank", I was fortunate enough to be allocated 50 minutes during which to speak about a sort of side hobby that overlaps into my normal workflow. It's a topic that really inspired me to look into the Info-Sec industry as a career: malware. I find evil things highly entertaining, and what is more sinister than malicious software?

The goal of my original presentation was to provide an overview of the basic techniques I use on a daily basis to inform my decision-making process. Within 15 minutes of receiving a sample (regardless of the form it arrives in!) an analyst needs to be able to determine what action to take. The desired outcome is to categorize the sample as:
1) Benign
2) Suspicious
3) Malicious
With that in mind, I am going to go ahead and skip the first 2 slides, which consisted of an Introduction and an About Me section, and hop right into the meat of my presentation. 

Note: Some of my slides contain 3rd party graphics or images. These will all be linked below the image of the slide.

Image Source
Types of Malware:

Subdividing different "species" of malware is a fairly difficult task, as there is a significant number of samples which exhibit a large range of capabilities. With that in mind, a generic outline can be presented as the following:

  • Backdoor - Installed onto a host to allow access. Usually, an attacker can connect with little to no authentication to an environment in which he or she may execute commands.
  • Botnet - Similar to a backdoor; however, each machine infected with the same malware receives the same set of instructions from a command and control server (C2).
  • Downloader - Code that allows the download of additional malicious code. Generally a "gateway" type of malware installed after an attacker first gains access. This behavior is often observed in Exploit Kits (Exploit -> Downloader Trojan -> Trojan retrieves botnet or Infostealer Payload).
  • Rootkit/Bootkit - Conceals the existence of itself and other malicious code in an attempt to provide some resiliency and persistence. Generally malware in this category spans into backdoor functionality.
  • Worm/Virus - Malicious code that can copy itself and propagate. In ye olde times this was used for DoS/notoriety. Current functionality appears to be financially motivated; worms are often utilized to gain access to machines to send spam.
  • Infostealer [Trojan] - Collects information and exfiltrates it. Usually includes sniffers, hash collection, and keyloggers. A prominent example is the Zeus family of malware. 
These categories are NOT mutually exclusive!
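
One way to keep that overlap visible in triage notes is to record capabilities as combinable flags rather than a single label. The following is only a minimal sketch: the `Capability` names mirror the outline above, and the sample and the `labels` helper are hypothetical.

```python
from enum import Flag, auto

class Capability(Flag):
    """Capability flags mirroring the outline above; they can be combined."""
    BACKDOOR = auto()
    BOTNET = auto()
    DOWNLOADER = auto()
    ROOTKIT = auto()
    WORM = auto()
    INFOSTEALER = auto()

def labels(caps):
    """Return the names of every capability set on a sample, in outline order."""
    return [c.name for c in Capability if c in caps]

# Hypothetical sample: a rootkit that also provides backdoor access.
sample = Capability.ROOTKIT | Capability.BACKDOOR
print(labels(sample))
```

A single sample can carry as many flags as the analysis turns up, which avoids forcing it into one "species".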

Modern Malware:
Image Source

Malicious software is the primary vector for the majority of intrusions. It is easiest to break down the "styles" of attacks into two categories:

  1. Targeted - The result of social engineering techniques like spear phishing. This is the "precision strike" approach.
  2. Untargeted - This is commonly described as drive-by exploitation, as observed in Exploit Kits. This is the "shotgun" approach.
A key part of modern malware (actually, what makes it modern) is the inclusion of network communication components. In olden times, malware did not always utilize a host's internet connectivity to receive commands or update; this provided for autonomous, but dumb, software. Nowadays, it is exceedingly common for malicious software to include some type of network functionality. Malware authors have incorporated a number of techniques for communication ranging from the basic (HTTP GET/POST requests over plain text) to the advanced (Custom binary protocols with encryption). No matter the technique used, the general functionality remains the same: network connectivity allows for data exfiltration, remote command execution, and updating of core malware components.

Network Detection:

Image Source
With the rise of network connectivity of malware, IDS/IPS platforms have become a popular solution for detecting threats. These devices, however, are not without their limits. Some of the pitfalls of IDS/IPS devices are:

  • Analysis of traffic in context is extremely difficult to automate. Protocols and formatting within protocols make the context of traffic of paramount importance. For example, it is common to see IP -> TCP -> HTTP -> GZIP -> JavaScript in the same stream. If the IDS/IPS is unable to examine the JavaScript, it could miss potentially malicious payloads. The good news is that this is a well-known shortcoming of this type of technology, and developers are constantly adding features to make their platforms increasingly protocol aware.
  • Related to the issue of contextual analysis, IDS/IPS make decisions on one packet at a time. Evasion/Insertion techniques or malformed packets can cause unintended results.
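
To illustrate the first point, here is a rough sketch of why a plain byte search ("network grep") misses JavaScript inside a gzip-encoded HTTP body, while a check that decodes the body first does not. The signature string, the script, and both function names are made up for the example; real sensors are far more involved.

```python
import gzip

# Hypothetical signature: a byte string the sensor looks for in JavaScript.
JS_SIGNATURE = b"eval(unescape("

def naive_match(raw_body: bytes) -> bool:
    """Search the bytes exactly as they appear on the wire."""
    return JS_SIGNATURE in raw_body

def protocol_aware_match(headers: dict, raw_body: bytes) -> bool:
    """Undo the HTTP content encoding before searching."""
    if headers.get("Content-Encoding", "").lower() == "gzip":
        raw_body = gzip.decompress(raw_body)
    return JS_SIGNATURE in raw_body

# Made-up response: a script served with gzip content encoding.
script = (b"var a = unescape('%75%6e%65'); "
          b"var p = eval(unescape('%61%6c%65%72%74')); document.write(p);")
headers = {"Content-Encoding": "gzip"}
body = gzip.compress(script)

print(naive_match(body))                    # compressed bytes hide the string
print(protocol_aware_match(headers, body))  # decoded body exposes it
```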
The introduction of target-based analysis (Snort's Frag3 Preprocessor) allows the IDS/IPS to "know" how devices will handle ambiguity. Though this is a step in the right direction, human analysis is still required to compensate for the shortcomings of this technology. Unusual or one-off scenarios can still confuse IDS/IPS. 
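
The one-packet-at-a-time problem can be sketched in a few lines. This assumes simplified TCP segments represented as (offset, payload) pairs that do not overlap; the signature and the traffic are invented for illustration. Deciding which bytes win when segments do overlap is exactly the ambiguity that target-based analysis tries to resolve the same way the destination host would.

```python
# Hypothetical signature that an attacker splits across two segments.
SPLIT_SIGNATURE = b"cmd.exe /c"

def per_packet_match(segments):
    """Inspect each segment payload in isolation."""
    return any(SPLIT_SIGNATURE in payload for _, payload in segments)

def reassembled_match(segments):
    """Order segments by stream offset, join them, then inspect the stream.

    Assumes non-overlapping segments; overlap handling is out of scope here.
    """
    stream = b"".join(payload for _, payload in sorted(segments))
    return SPLIT_SIGNATURE in stream

# Made-up traffic: delivered out of order, signature split at the segment boundary.
segments = [(12, b"exe /c whoami"), (0, b"GET /?q=cmd.")]

print(per_packet_match(segments))   # neither segment contains the full string
print(reassembled_match(segments))  # the rebuilt stream does
```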

IDS have come a long way since their days of "Network Grep", but they are certainly not foolproof.

Signature Types:

Several types of signatures are used by IDS/IPS platforms to detect the activity of malicious software:

  1. Phoning Home - The goal of these types of signatures is to detect outbound connection attempts to a malicious host. This is occasionally done by inference: "Oh hey, this internal host is making tons of HTTP POST requests to a host in Kazakhstan. That probably isn't suspicious." Other signatures in this category focus on the specific formatting of requests. A recent example is the Redkit Exploit Kit utilizing a predictable .jar file name. This isn't foolproof, as services like Twitter/Facebook/Google have been used to mask command and control communications within benign websites.
  2. Third-Party communications - Malware may not necessarily check in with a C2 first. In the case of several versions of the "ZeroAccess" Trojan, the legitimate host "fling(dot)com" is queried in an attempt to geo-locate the IP of the infected host. This can be an early indicator of a compromised machine.
  3. Inference - As the name suggests, this type of signature is essentially "reading into" traffic and making a decision. This is commonly implemented in the form of DNS lookups for malicious domains. The traffic itself may only be from a local host to an internal DNS server, but the domain being requested has been identified as malicious and thus results in an alert. Another example is the TDL3/4 class of Trojans, which have consistently identifiable certificate exchanges.
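
As a toy illustration of the three signature types, the sketch below tags simplified traffic events. Every indicator in it (the file-name pattern, the hosts, the domains) is a hypothetical placeholder rather than real intelligence, and the `Event` record and `classify` function are invented for the example.

```python
import re
from dataclasses import dataclass

# All indicators below are hypothetical placeholders, not real intelligence.
CALLBACK_URI = re.compile(r"^/[a-z0-9]{3}\.jar$")  # predictable file name
THIRD_PARTY_HOSTS = {"geoip.example"}              # legitimate lookup service
BAD_DOMAINS = {"c2.example"}                       # domains flagged as malicious

@dataclass
class Event:
    kind: str      # "http" or "dns"
    host: str      # HTTP Host header or DNS query name
    uri: str = ""  # request path, used for HTTP events only

def classify(event):
    """Return the signature types that fire for one traffic event."""
    alerts = []
    if event.kind == "http" and CALLBACK_URI.match(event.uri):
        alerts.append("phoning-home")
    if event.kind == "http" and event.host in THIRD_PARTY_HOSTS:
        alerts.append("third-party")
    if event.kind == "dns" and event.host in BAD_DOMAINS:
        alerts.append("inference")
    return alerts

print(classify(Event("http", "cdn.example", "/a1b.jar")))
print(classify(Event("dns", "c2.example")))
```

A third-party hit on its own is weak evidence, since legitimate software uses the same services; it is most useful as an early indicator alongside the other two.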
Stay tuned for Part 2: Approaches to Analysis!


  1. It's important to note for signatures, one of the best signature options is the use of Yara:

    1. Absolutely! Yara is an excellent tool for those who are looking into developing signatures. Using it alongside ClamAV is quite handy.