Sunday, February 10, 2013

Malware Analysis 101 [Part 2]

Now that we've covered the different classifications of the functionality of malicious software, made the distinction between targeting vectors, and summarized how IDS/IPS devices fit into the scheme, its time to move into analysis of malware. In case you missed it, Part 1 can be located here.

Image Source
Approaches:

Generally, analysis of any sample can be broken down into two categories, static and dynamic which further subdivide into basic and advanced:

  • Static
    • Basic - Examine an executable without viewing instructions.
    • Advanced - Code analysis using disassembly tools to view Op Code.
  • Dynamic
    • Basic - Behavioral observation [Sandboxing].
    • Advanced - Code analysis using a debugger to manually control the flow of the program. This technique is used to understand the more complex aspects of a sample which static code analysis may be unable to provide.
Each of the above techniques can be utilized in synchronicity of one another in order to reveal different pieces of information about a piece of software. The goal of analysis is to take these many small pieces of information from multiple sources and form a picture of the nature of the sample.

Basic Static Analysis:

From here on, it is assumed that a sample is readily apparent and available. Finding malware is a whole different can of worms, and a worthy topic for a separate blog post series (maybe in the future). 

With a sample provided, one of the first things that can be done is AntiVirus scanning. The key is to use multiple programs and resources as this increases signature coverage. Siganture variations make a substantial difference in the rate of detection. Fortunately, instead of buying 8-10 host based antivirus products (and pitting them against each other on the same local machine) along with their accompanying subscriptions, several free web services exist which will scan a submitted sample against multiple Antivirus engines. These engines include both signature based detection and heuristic based detection. The most common/popular/wellknown example is, of course, Virustotal.com (which has register free, public, and private API functionality).

Before moving on, a word about the inherent weaknesses of Antivirus software. The two types of detection mechanisms, signature based and heuristic based, have a number of shortcomings. Simple code modifications easily bypass signature based detection, and in this era of updatable malware through network (command and control) communications, the cat and mouse game has become even more fast paced. Heuristic engines have a similar weakness: they can be completely bypassed with new or unusual code. If there is no record to compare a sample against, it is impossible for a heuristic engine to categorize.

Image Source
The image to the right was generated by using VirusTotal's main page to submit a binary retrieved from a domain listed on malwaredomainlist.com. This file is categorized as a generic password stealer, which slots nicely into the "Infostealer" category defined in part 1 of this series. Interesting note: 9/42 AV products detected this sample as malicious, although the ones that did identifity the sample, categorized it similarly [this suggests a commonality amongst their signature designs]. The grey window shows PE32 structural information, which is something we will examine a little later. Sites like VirusTotal and AntiVirus products as a whole are not withoiut their faults: benign files can and oftentimes do trigger alerts [Example being unebootin, packed with UPX which appears suspicious, and does trigger some alerts]. Thus, solely relying on multi-scanner service like this is not going to provide enough information to make an informed decision about the overall nature of a sample. Use the information gained from services like VirusTotal or NoVirusThanks to inform your decisions, not make them for you!

Sandboxing:
Image Source

Sandboxing is a basic, dynamic analysis technique in which a sample is run in an isolated environment: the goal being to "box off" potentially malicious activity from harming a local machine. This is generally achieved through virtualization ex: Cuckoo Sandbox. Sandboxes attempt to mimic common network services in order to monitor the behavior of malware when provided network connectivity. Like antivirus services, there are a number of free, web-based services that provide this functionality if one does not have the means to setup a local environment. By far the most popular is ThreatExpert. ThreatExpert performs automated analysis, and is generally very useful in initial reporting.

Automated sandbox analysis is not without its drawbacks and frequently will fail to give any meaningful output. Some of the issues are as follows:
1) Any sample that requires command line options will not be analyzed appropriately as an automated environment has no way to pass arguments via command line in the course of executing a piece of software.
2) Though network services are simulated, command and control traffic is not. Malware this is reliant on instructions from a remote host may never execute, leaving automated analysis useless.

Provided the sandbox is able to execute the provided sample, a report containing system changes, network connectivity, and basic functionality if generated. ThreatExpert has the added ability to identify some malwre through AV scans, but this functionality overlaps with services previously mentioned. Like the multi-engine antivirus services  the results of automated analysis should not be taken as definitive: the goal of sandboxing is to provide insight into the ways in which a sample may attempt to manipulate a host.

Report Source
Here is an example report for a sample submitted to ThreatExpert in 2010. Though 2010 may be a reltively ancient date at this point, it is a good demonstration of information provided by ThreatExpert post-analysis.  You can see basic summary information of the submitted sample including hash, file size  Aliases as identified by AV products and Time/Date Submitted. Of more interest is the "What's Been Found Field" which summarizes the activity which took place during the run time in an easily readable and understandable form. This information is very useful for understanding the capabilities of a sample, and can provide some insight into the Function Calls that you may see further along in the analysis process. The next section contains File System changes: files added, removed and modified and their names + directories, potentially useful for signature development or manual removal. The final section of note is the Network activity summary. Here you can see that a file was downloaded, potentially provided yet another sample for analysis in the future. You may also be able to infer that the sample submitted has some downloader Trojan Functionality, and is likely to making Windows API calls in order to download this file.

Stay Tuned for Part 3!
[Hashing, PE Headers, Linked Libraries and Functions!]

No comments:

Post a Comment