Rethinking Antivirus: Executable Analysis in the Network Cloud
←
→
Page content transcription
If your browser does not render page correctly, please read the page content below
Rethinking Antivirus: Executable Analysis in the Network Cloud Jon Oberheide, Evan Cooke, Farnam Jahanian Electrical Engineering and Computer Science Department University of Michigan, Ann Arbor, MI 48109 {jonojono, emcooke, farnam}@umich.edu A BSTRACT host-based antivirus software. Antivirus software installed on each end host in an or- ganization has become the de-facto security mechanism 1. More is better: A network service can employ a used to defend against unwanted executables. We argue cluster of servers to quickly analyze executables us- that the executable analysis currently provided by host- ing multiple techniques. For example, instead of us- based antivirus software can be more efficiently and ef- ing a single antivirus engine, a network service can fectively provided as an in-cloud network service. In- run a wide range of antivirus engines in parallel stead of running complex analysis software on every end while also performing behavioral analysis [3, 8]. host, we suggest that each end host run a lightweight pro- Additional detection engines can easily be inte- cess to acquire executables entering a system, send them grated into a network service, allowing for consider- into the network for analysis, and then run or quarantine able extensibility. Such comprehensive analysis can them based on a threat report returned by the network significantly increase the detection coverage of ma- service. An executable analysis service run inside an en- licious software. terprise network or by a service provider could integrate 2. Keep it simple stupid: By significantly lowering antivirus software, behavioral simulation, and other anal- the complexity of host-based monitoring software, ysis engines from multiple vendors providing better de- a number of advantages are realized. Clients no tection of malware and simplify client software enabling longer need to continually update their local signa- deployment on a broader range of devices. To explore ture database, reducing administrative cost. Simpli- this idea we construct a prototype composed of a Win- fying the client software also decreases the chance dows based host agent and an in-cloud analysis service that it could contain exploitable vulnerabilities [5, and evaluate it using a diverse dataset of 5066 unique 13]. Finally, lightweight client software allows the malicious executables. By correlating information be- service to be extended to mobile and resource- tween multiple detection engines, our system provides limited devices that lack sufficient processing power over 98% detection coverage of the malicious executa- but remain an enticing target for malware. bles using eight antivirus engines and two behavioral en- gines compared to a 54% to 86% detection rate using the 3. Sharing is caring: Correlating information be- latest commercial antivirus products. tween engines is enabled when multiple engines are run within a network service and can be beneficial 1 I NTRODUCTION to detection coverage. For example, if a behavioral Detecting malicious software has become an increas- system finds that the behavior of an unknown exe- ingly challenging problem. While a single antivirus en- cutable has similar behavior to an executable pre- gine may be able to detect many types of malware, 0-day viously classified as malicious by antivirus engines, threats and other obfuscated attacks [12] can frequently the unknown executable can be quarantined. evade a single engine. We argue that the executable anal- ysis service currently provided by host-based antivirus This paper investigates how to design a network ser- software can be more efficiently and effectively provided vice for executable analysis. We construct a prototype as an in-cloud network service. Instead of running com- composed of a Windows-based host agent and an in- plex analysis software on every end host, we suggest that cloud network service. Using a recent dataset of 5066 each end host run a lightweight process to acquire exe- unique malicious executables [7], we demonstrate how cutables entering a system, send them to a network ser- the system detects 98% of the samples using eight an- vice for analysis, and then run or quarantine them based tivirus engines and two behavioral engines in parallel, on a threat report returned by the network service. compared to a 54% to 86% detection rate using the lat- Centralizing the analysis of suspicious software as est commercial antivirus products. We also show that the a network service has several advantages over existing average time to analyze a legitimate or malicious exe- 1
cutable, including network latencies, is less than a sec- ware analysis service runs as an additional layer of pro- ond. tection to augment existing security systems inside an or- ganizational network such as an enterprise. 2 A PPROACH 3.1 Executable Identification and Acquisition The use of traditional signature-based approaches for the problem of detecting malicious software in the network Malicious executables can enter an organization using a and on the host has become increasingly difficult. The range of different techniques. For example, mobile de- elevating sophistication of modern malicious software vices (e.g., a USB stick), email attachments, downloads poses a significant challenge for any single vendor to de- (wanted or unwanted), and vulnerable network services velop signatures for every new piece of malicious soft- are all common entry points. Due to the broad range ware. Indeed, a recent Microsoft survey found more than of entry vectors our system includes a lightweight exe- 45,000 new variants of backdoors, trojans, and bots dur- cutable acquisition system that is run on each end host. ing the second half of 2006 [6]. Just like existing antivirus software, the client module In response, a new generation of systems has been runs on each host and inspects each executable. The exe- developed to monitor the behavior of suspicious appli- cution of any binary is trapped and diverted to a handling cations to identify malicious binaries. Sandbox services routine first. The handling routine begins by hashing the such as Norman [8] and CWSandbox [3] enable the anal- binary and checking the hash against a local and remote ysis of suspicious binaries and produce a detailed report whitelist and blacklist. If the hash does not match in the of a binary’s actions in a simulated environment. In addi- whitelist or blacklist, the entire binary is sent securely to tion, virtual machines have been used to analyze malware the in-cloud service for analysis. and track malicious behavior [1, 10, 14]. While behav- To minimize the time between when a user downloads ioral and VM-based detection techniques improve our an executable and receives a response from the network ability to detect malicious software, they can be resource service, the architecture provides a method to send a bi- intensive and are not designed to protect every end host. nary for analysis as soon as it is written or changed on the In this paper, we explore how to improve the detection end host system (e.g., via file-copy, installation, or down- of malicious software by providing executable analysis load). Doing so amortizes the transmission and analysis as an in-cloud network service. We envision a vendor- cost over the time elapsed between binary creation and agnostic service that is deployed in a high-speed, low- user-initiated execution. latency network such as inside an enterprise or by a ser- 3.2 In-Cloud Executable Analysis vice provider. Such a centralized service would integrate detection engines from many vendors and include an end The second major component of the system is the net- host component for the automated acquisition of bina- work service responsible for malware analysis. The job ries. The network service is similar to other in-cloud pro- of the analysis service is to receive an executable and re- tection mechanisms such as email [2, 4] and HTTP [11] turn a threat report with information about whether the filtering, except that we use a dedicated end host agent executable is safe to run. which enables our system to protect against executables 3.2.1 Executable Analysis that arrive using different protocols and devices. Rather then choosing one single technique to detect mali- 3 A RCHITECTURE cious executables, we suggest that a network service run multiple engines in parallel. Unlike host-based analysis Performing the analysis of executables using a network systems that must meet relatively tight storage and com- service is not a simple task. Suspicious executables must putational constraints, a network service can easily scale be acquired and isolated so that they can be sent to an up- to meet the demand of multiple analysis engines. stream analysis service. The analysis service must be ef- While the specific backend analysis techniques used ficient, and the execution of malicious executables must are independent of the proposed architecture, two gen- be prevented. eral classes seem particularly well-suited to the proposed To solve these problems we envision an architecture approach. that includes two major components. The first host-based component acquires executables and sends them to the • Antivirus Engines: There is a wide range of an- network service, and the second network-based compo- tivirus products commercially available today, and nent identifies malicious executables and returns a threat as we will show, they differ significantly in their de- report to the end host. Figure 1 shows a high level de- tection coverage. By analyzing each candidate exe- sign of the system architecture. It is important to point cutable with multiple antivirus engines, the network out that we do not see our system replacing existing an- service is able to provide better detection of mali- tivirus or intrusion detection solutions. Rather, our mal- cious software. 2
Executables Obtained from Executable Executable Analysis Engines HTTP • Antivirus EXE Threat Report • Behavioral Analysis FTP Email • Black/White Lists P2P IM Network Cloud Figure 1: Architectural approach for in-cloud executable analysis. • Behavioral Analyzers: A second major method of When the client receives the threat report, it will either testing executables is behavioral analysis. A behav- begin execution of the binary or prevent execution and ioral system executes suspicious executables in a present the report to the user. This, of course, assumes sandboxed environment [3, 8] or virtual machine [1] that the client module has not been compromised. Just as and records host state changes and network activity. with antivirus software, we assume that integrity of the client is maintained as a trusted module. While performing analysis in parallel is far more effi- cient then serial execution, delay can still exist between 4 I MPLEMENTATION AND E VALUATION when an executable is submitted and when the result is We constructed a prototype implementation of the pro- returned. The architecture employs a caching mechanism posed architecture that includes an executable acquisi- to better deal with associated latencies. tion system for the Windows platform and an in-cloud The caching mechanism is simply a whitelist/blacklist network service for analysis. This section describes how database of hashes of previously analyzed binaries. The we implemented the prototype and explores the detection idea is that executables run in an organization will fre- capabilities and performance of the system. quently be similar across different hosts so keeping a cache of previously analyzed binaries can significantly 4.1 System Implementation improve system performance. A cache hit in either the whitelist or blacklist would therefore result in an imme- 4.1.1 Client Module diate server response indicating whether the client should We implemented a lightweight client module to detect execute the binary. executables written to disk and then send them to the network service for analysis. The notification for file sys- 3.2.2 Executable Threat Report tem writes is provided by the ReadDirectoryChangesW The results from the executable analysis engines are syn- Win32 API call, a mechanism similar to inotify on thesized into a threat report returned to the client over Linux. Each file written is scanned for a valid Portable a secure and authenticated channel. The report includes Executable (PE) header to verify whether it is an exe- three distinct sections. cutable. The executable is then hashed using the SHA- 1 algorithm and compared to both the local and remote • Execution Directive: a boolean indicating whether whitelist and blacklist. Local whitelists are seeded with the executable should be run. The formula used to hashes of the default set of executables included with set this value is customizable as described above but common Microsoft Windows installations to reduce re- the simplest mode is to set the execution directive mote whitelist lookups. to “do not run” if any detection engine classifies an The client module also prevents the execution of bi- executable as malware. naries that have been identified as malicious. By hooking the CreateProcess Win32 API call, we interpose on the • Family/Variant Labels: a list of family/variant la- creation of new processes and halt the execution of un- bels assigned to the malware by the different analy- wanted code. sis engines. • Behavioral Analysis: a list of host and network be- 4.1.2 Network Service haviors observed during simulation. This may in- The second component of the prototype is a network ser- clude information about processes spawned, files vice responsible for analyzing executables using multiple and registry keys modified, network activity, and detection engines and returning a comprehensive threat other state changes. report on each executable. Incoming executables from 3
AV Vendor Version Signatures Detection Results # Detected Antivirus Products Avast 4.7.1001 000740-0 84.7% 1 86.59% F-Secure ClamAV 0.90.2 3224 59.7% 2 92.93% Trend, Avast F-Prot 6.0.7.0 4.3.3 79.9% 3 94.63% Trend, F-Secure, Avast F-Secure 7.01.128 N/A 86.6% 4 95.34% ClamAV, Symantec, Trend, Avast Kaspersky 6.0.2.621 N/A 85.3% 5 95.85% ClamAV, Symantec, Trend, F-Secure, Avast McAfee 5100.0194 5027.0000 54.9% 6 96.15% F-Prot, ClamAV, Symantec, Trend, F-Secure, Avast Symantec 14.0.3.3 N/A 81.9% 7 96.23% Mcafee, F-Prot, ClamAV, Symantec, Trend, Kaspersky, Avast Trend Micro 15.00.1433 4.459.00 82.0% 8 96.23% all Table 1: The antivirus protection engines, the versions used in the pro- Table 2: Detection coverage of the malware samples using between one totype, and the percentage of malware samples detected. and eight antivirus products. end hosts are assigned to a request broker which is re- engines in our network service, we were able to detect sponsible for delivering the executable to each analysis 4875 executables as malicious, resulting in a detection engine, collecting the results, and returning the threat re- rate over 96%. The dataset used in the paper only in- port back to the client. cludes known malicious executables and so exploration We use eight antivirus and two behavioral engines in of false positives is an open research question. our prototype. The eight antivirus engines are listed in While the eight antivirus engines were able to iden- Table 1, and the behavioral analyzers include Norman tify 96% of the executables as malicious, 191 executa- Sandbox Analyzer [8] and the behavioral profiling sys- bles were not flagged. This last 4% is arguably the most tem described in [1]. The detection engines are run in difficult for current detection systems to identify. For- parallel in virtualized environments. tunately, the prototype also includes behavioral analy- Our prototype also includes an optimization aimed at sis and the capability to share information between de- reducing the latency perceived by end users when run- tection engines. Behavioral reports on the 191 executa- ning newly obtained executables. We implemented a net- bles revealed that 92 had similar behavior to known ma- work stream sensor that promiscuously monitors a net- licious binaries already identified by the antivirus soft- work tap (e.g., a switch span port) to acquire executables. ware. Such information could be used to classify those 92 By deploying such a component, executables can be ex- binaries as malicious resulting in a detection rate of over tracted out of network transmissions on the fly and ana- 98%, potentially extending the coverage to polymorphic lyzed by the in-cloud network service before even reach- and 0-day malware that have evaded the antivirus en- ing the destination end host. Clearly this approach does gines. Such sharing of information between detection en- not speed up the analysis of all executables as network gines illustrates one interesting advantage of combining traffic can be encrypted and the sensor currently handles heterogeneous detection systems into one service. only a handful of common network protocols. 4.2.2 Performance 4.2 Deployment and Results By moving analysis into the network, we incur the net- 4.2.1 Detection Coverage work latency of sending the binary to the service and To evaluate the detection capabilities of the prototype the latency of using a multiple engines to analyze the we obtained over 5000 malware executables from Ar- binary. In this subsection, we show that this latency bor Network’s Arbor Malware Library (AML) [7]. The is small on average and is acceptable for clients con- AML contains malware which has been captured in the nected to the high-speed, low-latency (
ware samples. The average size of the samples was 366 Finally, since our system integrates other detection KB and the average analysis time per executable for the engines, it also shares their limitations. For example, antivirus engines was 0.48 seconds. Symantec was the a behavioral profiler might not detect malware that de- slowest taking 0.91 seconds per executable on average. lays exhibiting behavior. However, since the architecture Again, the analysis times and executable sizes indicate is modular, improved detection engines are easily inte- perceived execution latencies of a second or two. grated into the system. Thus far we have assumed that the executable under We are currently working to refine our prototype and analysis has never been observed before. Our prototype are collaborating with different enterprises and service includes both a local and remote cache of threat reports providers to collect information on executable usage and of previously analyzed executables to significantly speed further evaluate our approach. up execution of previously analyzed binaries. While we ACKNOWLEDGMENTS do not have data in this paper on the frequency of legit- imate executables used in an organization, it would be This work was supported in part by the Department of Home- land Security (DHS) under contract number NBCHC060090 logical that many employees within an organization use and by the National Science Foundation (NSF) under contract a similar set of applications making caching an impor- number CNS 0627445. We would like to thank Jose Nazario tant performance enhancement. We were able to obtain from Arbor Networks and Georg Wicherski from the mwcol- data from the mwcollect Alliance [9] on the frequency lect Alliance. of malicious executables. Over a two month period on a /18 network they observed 213 distinct executables over R EFERENCES 2.5 million times. Of the 213 executables, 164 of them [1] Michael Bailey, Jon Oberheide, Jon Andersen, Z. Morley Mao, Farnam Jahanian, and Jose Nazario. Automated classification and (77%) were seen multiple times i.e., 49 were only ob- analysis of internet malware. To appear in Proceedings of the served once. Given the vast number of duplicate encoun- 10th International Symposium on Recent Advances in Intrusion tered, a blacklist cache for these 213 executables would Detection (RAID’07). have a cache hit rate of over 99.99%, resulting in a near- [2] Barracuda Networks. Barracuda spam firewall. http://www. barracudanetworks.com, 2007. instant response from the service. [3] Carsten Willems and Thorsten Holz. Cwsandbox. http://www. cwsandbox.org/, 2007. 5 D ISCUSSION [4] Cloudmark. Cloudmark authority anti-virus. http://www. This paper has presented a vision for moving from a host- cloudmark.com, 2007. centric antivirus paradigm to providing executable anal- [5] Abhishek Kumar, Vern Paxson, and Nicholas Weaver. Exploit- ing underlying structure for detailed reconstruction of an internet- ysis as an in-cloud network service. We constructed a scale event. Proceedings of the USENIX/ACM Internet Measure- prototype and our initial results show potential for a sig- ment Conference, October 2005. nificant improvement in the detection of real malicious [6] Microsoft. Microsoft security intelligence report: July-december software. However, there are several areas that require 2006. http://www.microsoft.com/technet/security/ default.mspx, May 2007. further investigation. [7] Arbor Networks. Arbor malware library (AML). http://www. First and foremost is the question of disconnected op- arbornetworks.com, 2007. eration. When a mobile user is not connected to the net- [8] Norman Solutions. Norman sandbox whitepaper. work, has a high-latency, low-bandwidth connection, or http://download.norman.no/whitepapers/whitepaper_ is the victim of a denial of service attack, sending ex- Norman_SandBox.pdf, 2003. ecutables to the network service may not be feasible. [9] Paul Bacher and Markus Kotter and Georg Wicherski. The mw- collect alliance. http://www.mwcollect.org, 2007. While certain organizations may select a policy requir- [10] Honeynet Project. Know Your Enemy: Learning with VMware. ing that users wait for network connectivity before run- 2003. ning new applications, others may desire more flexibility. [11] Niels Provos. Spybye. http://www.monkey.org/˜provos/ One solution is to have mobile users also run traditional spybye, 2007. antivirus software as a fail-over backup. Therefore, in the [12] Paul Royal, Mitch Halpin, David Dagon, Robert Edmonds, and disconnected state, users will have at least the same de- Wenke Lee. PolyUnpack: Automating the Hidden-Code Extrac- tion of Unpack-Executing Malware. In The 22th Annual Com- tection coverage as today. puter Security Applications Conference (ACSAC 2006), Miami Another issue is that malicious software comes in Beach, FL, December 2006. many forms. While our prototype currently operates only [13] Symantec Corporation. Symantec security advisory (sym06- on Win32 Portable Executables, we are working on ex- 010). http://www.symantec.com/avcenter/security/ Content/2006.05.25.html, 2006. tending it to handle external DLLs, interpreted scripting [14] Michael Vrable, Justin Ma, Jay Chen, David Moore, Erik Van- languages, malicious web content, and other file types. dekieft, Alex C. Snoeren, Geoffrey M. Voelker, and Stefan Sav- Although our system cannot detect shellcode injected age. Scalability, fidelity and containment in the Potemkin virtual into memory, such shellcode often fetches an executable honeyfarm. In Proceedings of the 20th ACM Symposium on Op- erating System Principles (SOSP), Brighton, UK, October 2005. which can be detected. 5
You can also read