Semantics-Aware Android Malware Classification Using Weighted Contextual API Dependency Graphs
Mu Zhang  Yue Duan  Heng Yin  Zhiruo Zhao
Department of EECS, Syracuse University, Syracuse, NY, USA
{muzhang,yuduan,heyin,zzhao11}@syr.edu

ABSTRACT

The drastic increase of Android malware has led to a strong interest in developing methods to automate the malware analysis process. Existing automated Android malware detection and classification methods fall into two general categories: 1) signature-based and 2) machine learning-based. Signature-based approaches can be easily evaded by bytecode-level transformation attacks. Prior learning-based works extract features from application syntax, rather than program semantics, and are also subject to evasion. In this paper, we propose a novel semantic-based approach that classifies Android malware via dependency graphs. To battle transformation attacks, we extract a weighted contextual API dependency graph as program semantics to construct feature sets. To fight against malware variants and zero-day malware, we introduce graph similarity metrics to uncover homogeneous application behaviors while tolerating minor implementation differences. We implement a prototype system, DroidSIFT, in 23 thousand lines of Java code. We evaluate our system using 2200 malware samples and 13500 benign samples. Experiments show that our signature detection can correctly label 93% of malware instances; our anomaly detector is capable of detecting zero-day malware with a low false negative rate (2%) and an acceptable false positive rate (5.15%) for a vetting purpose.

Categories and Subject Descriptors

D.2.4 [Software Engineering]: Software/Program Verification—Validation; D.4.6 [Operating Systems]: Security and Protection—Invasive software

General Terms

Security

Keywords

Android; Malware classification; Semantics-aware; Graph similarity; Signature detection; Anomaly detection

1. INTRODUCTION

The number of new Android malware instances has grown exponentially in recent years. McAfee reports [3] that 2.47 million new mobile malware samples were collected in 2013, which represents a 197% increase over 2012. Greater and greater amounts of manual effort are required to analyze the increasing number of new malware instances. This has led to a strong interest in developing methods to automate the malware analysis process.

Existing automated Android malware detection and classification methods fall into two general categories: 1) signature-based and 2) machine learning-based. Signature-based approaches [18, 37] look for specific patterns in the bytecode and API calls, but they are easily evaded by bytecode-level transformation attacks [28]. Machine learning-based approaches [5, 6, 27] extract features from an application's behavior (such as permission requests and critical API calls) and apply standard machine learning algorithms to perform binary classification. Because the extracted features are associated with application syntax, rather than program semantics, these detectors are also susceptible to evasion.

To directly address malware that evades automated detection, prior works distill program semantics into graph representations, such as control-flow graphs [11], data dependency graphs [16, 21] and permission event graphs [10]. These graphs are checked against manually-crafted specifications to detect malware. However, these detectors tend to seek an exact match for a given specification and therefore can potentially be evaded by polymorphic variants. Furthermore, the specifications used for detection are produced from known malware families and cannot be used to battle zero-day malware.
CCS'14, November 3–7, 2014, Scottsdale, Arizona, USA.
Copyright 2014 ACM 978-1-4503-2957-6/14/11. http://dx.doi.org/10.1145/2660267.2660359

In this paper, we propose a novel semantic-based approach that classifies Android malware via dependency graphs. To battle transformation attacks [28], we extract a weighted contextual API dependency graph as program semantics to construct feature sets. The subsequent classification then depends on more robust semantic-level behavior rather than program syntax. It is much harder for an adversary to use an elaborate bytecode-level transformation to evade such a training system. To fight against malware variants and zero-day malware, we introduce graph similarity metrics to uncover homogeneous essential application behaviors while tolerating minor implementation differences. Consequently, new or polymorphic malware that has a unique implementation, but performs common malicious behaviors, cannot evade detection.

To our knowledge, when compared to traditional semantics-aware approaches for desktop malware detection, we are the first to examine program semantics within the context of Android malware classification. We also take a step further to defeat malware variants and zero-day malware by comparing the similarity of these programs to that of known malware at the behavioral level.
We build a database of behavior graphs for a collection of Android apps. Each graph models the API usage scenario and program semantics of the app that it represents. Given a new app, a query is made for the app's behavior graphs to search for the most similar counterpart in the database. The query result is a similarity score which sets the corresponding element in the feature vector of the app. Every element of this feature vector is associated with an individual graph in the database.

We build graph databases for two sets of behaviors: benign and malicious. Feature vectors extracted from these two sets are then used to train two separate classifiers for anomaly detection and signature detection. The former is capable of discovering zero-day malware, and the latter is used to identify malware variants.

We implement a prototype system, DroidSIFT, in 23 thousand lines of Java code. Our dependency graph generation is built on top of Soot [2], while our graph similarity query leverages a graph matching toolkit [29] to compute graph edit distance. We evaluate our system using 2200 malware samples and 13500 benign samples. Experiments show that our signature detection can correctly label 93% of malware instances; our anomaly detector is capable of detecting zero-day malware with a low false negative rate (2%) and an acceptable false positive rate (5.15%) for vetting purposes.

In summary, this paper makes the following contributions:

• We propose a semantic-based malware classification to accurately detect both zero-day malware and unknown variants of known malware families. We model program semantics with weighted contextual API dependency graphs, introduce graph similarity metrics to capture homogeneous semantics in malware variants or zero-day malware samples, and encode similarity scores into graph-based feature vectors to enable classifiers for anomaly and signature detections.

• We implement a prototype system, DroidSIFT, that performs static program analysis to generate dependency graphs. DroidSIFT builds graph databases and produces graph-based feature vectors by performing graph similarity queries. It then uses the feature vectors of benign and malware apps to construct two separate classifiers that enable anomaly and signature detections, respectively.

• We evaluate the effectiveness of DroidSIFT using 2200 malware samples and 13500 benign samples. Experiments show that our signature detection can correctly label 93% of malware instances; our anomaly detector is capable of detecting zero-day malware with a low false negative rate (2%) and an acceptable false positive rate (5.15%) for vetting purposes.

Paper Organization. The rest of the paper is organized as follows. Section 2 describes the deployment of our learning-based detection mechanism and gives an overview of our malware classification technique. Section 3 defines our graph representation of Android program semantics and explains our graph generation methodology. Section 4 elaborates on graph similarity queries, the extraction of graph-based feature vectors, and learning-based anomaly & signature detection. Section 5 shows the experimental results of our approach. Discussion and related work are presented in Sections 6 and 7, respectively. Finally, the paper concludes with Section 8.
2. OVERVIEW

In this section, we present an overview of the problem and the deployment and architecture of our proposed solution.

2.1 Problem Statement

An effective vetting process for discovering malicious software is essential for maintaining a healthy ecosystem in the Android app markets. Unfortunately, existing vetting processes are still fairly rudimentary. As an example, consider the Bouncer [22] vetting system that is used by the official Google Play Android market. Though the technical details of Bouncer are not publicly available, experiments by Oberheide and Miller [24] show that Bouncer performs dynamic analysis to examine apps within an emulated Android environment for a limited period of time. This method of analysis can be easily evaded by apps that perform emulator detection, contain hidden behaviors that are timer-based, or otherwise avoid triggering malicious behavior during the short time period when the app is being vetted. Signature detection techniques adopted by the current anti-malware products have also been shown to be trivially evaded by simple bytecode-level transformations [28].

Figure 1: Deployment of DroidSIFT (a developer submits an app to the market; offline graph database construction & training produce the database and classifiers used by online vetting, which returns a report)

We propose a new technique, DroidSIFT, illustrated in Figure 1, that addresses these shortcomings and can be deployed as a replacement for existing vetting techniques currently in use by the Android app markets. This technique is based on static analysis, which is immune to emulation detection and is capable of analyzing the entirety of an app's code. Further, to defeat bytecode-level transformations, our static analysis is semantics-aware and extracts program behaviors at the semantic level. More specifically, we achieve the following design goals:

• Semantic-based Detection. Our approach detects malware instances based on their program semantics. It does not rely on malicious code patterns, external symptoms, or heuristics. Our approach is able to perform program analysis for both the interpretation and demonstration of inherent program dependencies and execution contexts.

• High Scalability. Our approach is scalable to cope with millions of unique benign and malicious Android app samples. It addresses the complexity of static program analysis, which can be considerably expensive in terms of both time and memory resources, to perform precisely.

• Variant Resiliency. Our approach is resilient to polymorphic variants. It is common for attackers to implement malicious functionalities in slightly different manners and still be able to perform the expected malicious behaviors. This malware polymorphism can defeat detection methods that are based on exact behavior matching, which is the method prevalently adopted by existing signature-based detection and graph-based model checking. To address this, our technique is able to measure the similarity of app behaviors and tolerate such implementation variants by using similarity scores.

Consequently, we are able to conduct two kinds of classifications: anomaly detection and signature detection. Upon receiving a new app submission, our vetting process will conduct anomaly detection to determine whether it contains behaviors that significantly deviate from the benign apps within our database. If such a deviation is discovered, a potential malware instance is identified. Then, we conduct further signature detection on it to determine if this app falls into any malware family within our signature database. If so, the app is flagged as malicious and bounced back to the developer immediately.

If the app passes this hurdle, it is still possible that a new malware species has been found. We bounce the app back to the developer with a detailed report when suspicious behaviors that deviate from benign behaviors are discovered, and we request justification for the deviation. The app is approved only after the developer makes a convincing justification for the deviation. Otherwise, after further investigation, we may confirm it to indeed be a new malware species. By placing this information into our malware database, we further improve our signature detection to detect this new malware species in the future.

It is also possible to deploy our technique via a more ad-hoc scheme. For example, our detection mechanism can be deployed as a public service that allows a cautious app user to examine an app prior to its installation. An enterprise that maintains its own private app repository could utilize such a security service. The enterprise service conducts vetting prior to adding an app to the internal app pool, thereby protecting employees from apps that contain malware behaviors.
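To make the two-stage vetting flow concrete, the following is a minimal sketch (not from the paper); the type names VettingPipeline, AnomalyDetector, SignatureClassifier, App, and the verdict values are hypothetical names introduced here for illustration.

```java
// Hypothetical facades over the two classifiers described above.
interface AnomalyDetector { boolean isAbnormal(App app); }
interface SignatureClassifier { String classify(App app); } // null if no family matches
final class App { /* the APK under vetting */ }

enum Verdict { BENIGN, KNOWN_FAMILY, SUSPECTED_NEW_SPECIES }

final class VettingPipeline {
    private final AnomalyDetector anomalyDetector;
    private final SignatureClassifier signatureClassifier;

    VettingPipeline(AnomalyDetector a, SignatureClassifier s) {
        this.anomalyDetector = a;
        this.signatureClassifier = s;
    }

    Verdict vet(App app) {
        // Stage 1: does any behavior graph deviate from known benign ones?
        if (!anomalyDetector.isAbnormal(app)) {
            return Verdict.BENIGN;            // approve the submission
        }
        // Stage 2: does the anomaly match a known malware family?
        if (signatureClassifier.classify(app) != null) {
            return Verdict.KNOWN_FAMILY;      // bounce back immediately
        }
        // Otherwise request justification from the developer; a confirmed
        // anomaly becomes a new signature in the malware database.
        return Verdict.SUSPECTED_NEW_SPECIES;
    }
}
```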
2.2 Architecture Overview

Figure 2: Overview of DroidSIFT (Android apps flow through behavior graph generation, scalable graph similarity query against bucketed databases, graph-based feature vector extraction, and anomaly & signature detection)

Figure 2 depicts the workflow of our graph-based Android malware classification. It takes the following steps:

(1) Behavior Graph Generation. Our malware classification considers graph similarity as the feature vector. To this end, we first perform static program analysis to transform Android bytecode programs into their corresponding graph representations. Our program analysis includes entry point discovery and call graph analysis to better understand the API calling contexts, and it leverages both forward and backward dataflow analysis to explore API dependencies and uncover any constant parameters. The result of this analysis is expressed via Weighted Contextual API Dependency Graphs that expose security-related behaviors of Android apps.

(2) Scalable Graph Similarity Query. Having generated graphs for both benign and malicious apps, we then query the graph database for the one graph most similar to a given graph. To address the scalability challenge, we utilize a bucket-based indexing scheme to improve search efficiency. Each bucket contains graphs bearing APIs from the same Android packages, and it is indexed with a bitvector that indicates the presence of such packages. Given a graph query, we can quickly seek to the corresponding bucket index by matching the package's vector to the bucket's bitvector. Once a matching bucket is located, we further iterate through this bucket to find the best-matching graph. Finding the best-matching graph, instead of an exact match, is necessary to identify polymorphic malware.

(3) Graph-based Feature Vector Extraction. Given an app, we attempt to find the best match for each of its graphs from the database. This produces a similarity feature vector. Each element of the vector is associated with an existing graph in the database. This vector bears a non-zero similarity score in one element only if the corresponding graph is the best match to one of the graphs for the given app.

(4) Anomaly & Signature Detection. We have implemented a signature classifier and an anomaly detector. We have produced feature vectors for malicious apps, and these vectors are used to train the classifier for signature detection. The anomaly detection discovers zero-day Android malware, and the signature detector uncovers the type (family) of the malware.
3. WEIGHTED CONTEXTUAL API DEPENDENCY GRAPH

In this section, we describe how we capture the semantic-level behaviors of Android malware in the form of graphs. We start by identifying the key behavioral aspects that must be captured, present a formal definition, and then present a real example to demonstrate these aspects.

3.1 Key Behavioral Aspects

We consider the following aspects as essential when describing the semantic-level behaviors of an Android malware sample:

1) API Dependency. API calls (including reflective calls to the private framework functions) indicate how an app interacts with the Android framework. It is essential to capture what API calls an app makes and the dependencies among those calls. Prior works on semantic- and behavior-based malware detection and classification for desktop environments all make use of API dependency information [16, 21]. Android malware shares the same characteristics.

2) Context. An entry point of an API call is a program entry point that directly or indirectly triggers the call. From a user-awareness point of view, there are two kinds of entry points: user interfaces and background callbacks. Malware authors commonly exploit background callbacks to enable malicious functionalities without the user's knowledge. From a security analyst's perspective, it is a suspicious behavior when a typically user-interactive API (e.g., AudioRecord.startRecording()) is called stealthily [10]. As a result, we must pay special attention to APIs activated from background callbacks.

3) Constant. Constants convey semantic information by revealing the values of critical parameters and uncovering fine-grained API semantics. For instance, Runtime.exec() may execute varied shell commands, such as ps or chmod, depending upon the input string constant. Constant analysis also discloses the data dependencies of certain security-sensitive APIs whose benignness is dependent upon whether an input is constant. For example, a sendTextMessage() call taking a constant premium-rate phone number as a parameter is a more suspicious behavior than a call to the same API receiving the phone number from user input via getText(). Consequently, it is crucial to extract information about the usage of constants for security analysis (a brief code illustration follows after this list).

Once we look at app behaviors from these three perspectives, we perform similarity checking, rather than seeking an exact match, on the behavioral graphs. Since each individual API node plays a distinctive role in an app, it contributes differently to the graph similarity. With regard to malware detection, we emphasize security-sensitive APIs combined with critical contexts or constant parameters. We assign weights to different API nodes, giving greater weights to the nodes containing critical calls, to improve the "quality" of behavior graphs when measuring similarity. Moreover, the weight generation is automated. Thus, similar graphs have higher similarity scores by design.
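To make aspect 3 concrete, here is a minimal sketch (our own example, not from the paper) contrasting a constant-parameter call with one whose argument flows from user input; the destination number and message strings are hypothetical.

```java
import android.telephony.SmsManager;
import android.widget.EditText;

class ConstantExample {
    void send(EditText input) {
        SmsManager sms = SmsManager.getDefault();

        // Constant destination: the number is a string literal, so constant
        // analysis can attach it to the sendTextMessage() node in the
        // WC-ADG. A constant premium-rate number here is highly suspicious.
        sms.sendTextMessage("1065502709", null, "subscribe", null, null);

        // Non-constant destination: the number comes from user input via
        // getText(), so the very same API call is far less suspicious.
        String dest = input.getText().toString();
        sms.sendTextMessage(dest, null, "hello", null, null);
    }
}
```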
3.2 Formal Definition

To address all of the aforementioned factors, we describe app behaviors using Weighted Contextual API Dependency Graphs (WC-ADG). At a high level, a WC-ADG consists of API operations, where some of the operations have data dependencies. A formal definition is presented as follows.

Definition 1. A Weighted Contextual API Dependency Graph is a directed graph G = (V, E, α, β) over a set of API operations Σ and a weight space W, where:

• The set of vertices V corresponds to the contextual API operations in Σ;
• The set of edges E ⊆ V × V corresponds to the data dependencies between operations;
• The labeling function α : V → Σ associates nodes with the labels of the corresponding contextual API operations, where each label is comprised of 3 elements: API prototype, entry point and constant parameter;
• The labeling function β : V → W associates nodes with their corresponding weights, where ∀w ∈ W, w ∈ R, and R represents the space of real numbers.
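A minimal in-memory representation of Definition 1 might look like the following; this is an illustrative sketch rather than DroidSIFT's actual classes, and all field names are our own.

```java
import java.util.*;

// One vertex: a contextual API operation, i.e., Definition 1's label
// alpha(v) plus its weight beta(v).
final class ApiNode {
    final String apiPrototype;      // e.g. "SmsManager.sendTextMessage(...)"
    final Set<String> entryPoints;  // e.g. {"BroadcastReceiver.onReceive"}
    final Set<String> constants;    // constant parameter set, may be empty
    double weight = 1.0;            // beta(v); the default weight is 1

    ApiNode(String proto, Set<String> entries, Set<String> consts) {
        this.apiPrototype = proto;
        this.entryPoints = entries;
        this.constants = consts;
    }
}

// The graph itself: vertices plus directed data-dependency edges E ⊆ V × V.
final class WcAdg {
    final List<ApiNode> nodes = new ArrayList<>();
    final Map<ApiNode, Set<ApiNode>> dataDeps = new HashMap<>();

    void addDependency(ApiNode from, ApiNode to) {
        dataDeps.computeIfAbsent(from, k -> new HashSet<>()).add(to);
    }
}
```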
3.3 A Real Example

Zitmo is a class of banking trojan malware that steals a user's SMS messages to discover banking information (e.g., mTANs). Figure 3 presents an example WC-ADG that depicts the malicious behavior of a Zitmo malware sample in a concise, yet complete, manner.

Figure 3: WC-ADG of Zitmo (nodes createFromPdu, getOriginatingAddress, getMessageBody, setEntity, and execute, each annotated with the entry point BroadcastReceiver.onReceive; the execute node carries the constant set Setc = {"http://softthrifty.com/security.jsp"})

This graph contains five API call nodes. Each node contains the call's prototype, a set of any constant parameters, and the entry points of the call. Dashed arrows that connect a pair of nodes indicate that a data dependency exists between the two calls in those nodes.

By combining the knowledge of API prototypes with the data dependency information shown in the graph, we know that the app is forwarding an incoming SMS to the network. Once an SMS is received by the mobile phone, Zitmo creates an SMS object from the raw Protocol Data Unit by calling createFromPdu(byte[]). It extracts the sender's phone number and message content by calling getOriginatingAddress() and getMessageBody(). Both strings are encoded into a UrlEncodedFormEntity object and enclosed in an HttpEntityEnclosingRequestBase by using the setEntity() call. Finally, this HTTP request is sent to the network via AbstractHttpClient.execute().

Zitmo variants may also exploit various other communication-related API calls for the sending purpose. Another Zitmo instance uses SmsManager.sendTextMessage() to deliver the stolen information as a text message to the attacker's phone. Such variations motivate us to consider graph similarity metrics, rather than an exact matching of API call behavior, when determining whether a sample app is benign or malicious.

The context provided by the entry points of these API calls informs us that the user is not aware of this SMS forwarding behavior. These consecutive API invocations start within the entry point method onReceive() with a call to createFromPdu(byte[]). onReceive() is a broadcast receiver registered by the app to receive incoming SMS messages in the background. Therefore, createFromPdu(byte[]) and the subsequent API calls are activated from a non-user-interactive entry point and are hidden from the user.

Constant analysis of the graph further indicates that the forwarding destination is suspicious. The parameter of execute() is neither the sender (i.e., the bank) nor any familiar party from the contacts. It is a constant URL belonging to an unknown third party.

3.4 Graph Generation

We have implemented a graph generation tool on top of Soot [2] in 20k lines of code. This tool examines an Android app to conduct entry point discovery and perform context-sensitive, flow-sensitive, and interprocedural dataflow analyses. These analyses locate API call parameters and return values of interest, extract constant parameters, and determine the data dependencies among the API calls.

Entry Point Discovery.

Entry point discovery is essential to revealing whether the user is aware that a certain API call has been made. However, this identification is not straightforward. Consider the callgraph seen in Figure 4. This graph describes a code snippet that registers an onClick() event handler for a button. From within the event handler, the code starts a thread instance by calling Thread.start(), which invokes the run() method implementing Runnable.run(). The run() method passes an android.os.Message object to the message queue of the hosting thread via Handler.sendMessage(). A Handler object created in the same thread is then bound to this message queue, and its Handler.handleMessage() callback will process the message and later execute sendTextMessage().

Figure 4: Callgraph for asynchronously sending an SMS message. "e" and "a" stand for "event handler" and "action", respectively.

The sole entry point to the graph is the user-interactive callback onClick(). However, prior work [23] on the identification of program entry points does not consider asynchronous calls and recognizes all three callbacks in the program as individual entry points. This confuses the determination of whether the user is aware that an API call has been made in response to a user-interactive callback. To address this limitation, we propose Algorithm 1 to remove any potential entry points that are actually part of an asynchronous call chain with only a single entry point.
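The code snippet behind Figure 4 is not shown in the paper; a minimal reconstruction of that pattern, under our own naming and with a hypothetical destination number, could look like this:

```java
import android.os.Handler;
import android.os.Message;
import android.telephony.SmsManager;
import android.view.View;

class AsyncSmsExample implements View.OnClickListener {
    // Handler bound to the creating thread's message queue; its
    // handleMessage() callback is the third "entry point" a naive
    // analysis would report.
    private final Handler handler = new Handler() {
        @Override public void handleMessage(Message msg) {
            SmsManager.getDefault()
                      .sendTextMessage((String) msg.obj, null, "...", null, null);
        }
    };

    // The only true (user-interactive) entry point.
    @Override public void onClick(View v) {
        new Thread(new Runnable() {           // Runnable.run() is the second
            @Override public void run() {     // callback a naive analysis flags
                Message msg = handler.obtainMessage();
                msg.obj = "5554";             // hypothetical destination number
                handler.sendMessage(msg);
            }
        }).start();                           // Thread.start() triggers run()
    }
}
```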
Table 1: Calling Convention of Asynchronous Calls

Top-level Class   Run Method          Start Method
Runnable          run()               start()
AsyncTask         onPreExecute()      execute()
AsyncTask         doInBackground()    onPreExecute()
AsyncTask         onPostExecute()     doInBackground()
Message           handleMessage()     sendMessage()

Algorithm 1 Entry Point Reduction for Asynchronous Callbacks

Mentry ← {possible entry point callback methods}
CMasync ← {pairs of (BaseClass, RunMethod) for asynchronous calls in the framework}
RSasync ← {map from RunMethod to StartMethod for asynchronous calls in the framework}
for mentry ∈ Mentry do
    c ← the class declaring mentry
    base ← the top-level base class of c
    if (base, mentry) ∈ CMasync then
        mstart ← Lookup(mentry) in RSasync
        for ∀ call to mstart do
            r ← "this" reference of the call
            PointsToSet ← PointsToAnalysis(r)
            if c ∈ PointsToSet then
                Mentry ← Mentry − {mentry}
                BuildDependencyStub(mstart, mentry)
            end if
        end for
    end if
end for
output Mentry as the reduced entry point set

Algorithm 1 accepts three inputs and provides one output. The first input is Mentry, which is a set of possible entry points. The second is CMasync, which is a set of (BaseClass, RunMethod) pairs. BaseClass represents a top-level asynchronous base class (e.g., Runnable) in the Android framework, and RunMethod is the asynchronous call target (e.g., Runnable.run()) declared in this class. The third input is RSasync, which maps RunMethod to StartMethod. RunMethod and StartMethod are the callee and caller in an asynchronous call (e.g., Runnable.run() and Runnable.start()). The output is a reduced Mentry set.

We compute the Mentry input by applying the algorithm proposed by Lu et al. [23], which discovers all reachable callback methods defined by the app that are intended to be called only by the Android framework. To further consider the logical order between Intent senders and receivers, we leverage Epicc [25] to resolve the inter-component communications and then remove the Intent receivers from Mentry.

Through examination of the Android framework code, we generate a list of 3-tuples consisting of BaseClass, RunMethod, and StartMethod. For example, we capture the Android-specific calling convention of AsyncTask with AsyncTask.onPreExecute() being triggered by AsyncTask.execute(). When a new asynchronous call is introduced into the framework code, this list is updated to include the change. Table 1 presents our current model for the calling convention of the top-level base asynchronous classes in the Android framework.

Given these inputs, our algorithm iterates over Mentry. For every method mentry in this set, it finds the class c that declares this method and the top-level base class base that c inherits from. Then, it searches for the pair of base and mentry in the CMasync set. If a match is found, the method mentry is a "callee" by convention. The algorithm thus looks up mentry in the map RSasync to find the corresponding "caller" mstart. Each call to mstart is further examined, and a points-to analysis is performed on the "this" reference making the call. If class c of method mentry belongs to the points-to set, we can ensure the calling relationship between the caller mstart and the callee mentry and remove the callee from the entry point set.

To indicate the data dependency between these two methods, we introduce a stub which connects the parameters of the asynchronous call to the corresponding parameters of its callee. Figure 5 depicts the example stub code for AsyncTask, where the parameter of execute() is first passed to doInBackground() through the stub executeStub(), and then the return value from this asynchronous execution is further transferred to onPostExecute() via onPostExecuteStub().

public class AsyncTask {
    public AsyncTask execute(Params... params) {
        executeStub(params);
    }
    // Stub: connects execute()'s parameters to doInBackground() and
    // forwards the result to onPostExecute().
    public AsyncTask executeStub(Params... params) {
        onPreExecute();
        Result result = doInBackground(params);
        onPostExecuteStub(result);
    }
    public void onPostExecuteStub(Result result) {
        onPostExecute(result);
    }
}

Figure 5: Stub code for dataflow of AsyncTask.execute
Once the algorithm has reduced the number of entry point methods in Mentry, we explore all code reachable from those entry points, including both synchronous and asynchronous calls. We further determine the user interactivity of an entry point by examining its top-level base class. If the entry point callback overrides a counterpart declared in one of the three top-level UI-related interfaces (i.e., android.graphics.drawable.Drawable.Callback, android.view.accessibility.AccessibilityEventSource, and android.view.KeyEvent.Callback), we then consider the derived entry point method to be a user interface.

Constant Analysis.

We conduct constant analysis for any critical parameters of security-sensitive API calls. These calls may expose security-related behaviors depending upon the values of their constant parameters. For example, Runtime.exec() can directly execute shell commands, and file or database operations can interact with distinctive targets by providing the proper URIs as input parameters.

To understand these semantic-level differences, we perform backward dataflow analysis on selected parameters and collect all possible constant values on the backward trace. We generate a constant set for each critical API argument and mark the parameter as "Constant" in the corresponding node on the WC-ADG. While a more complete string constant analysis is also possible, the computation of regular expressions is fairly expensive for static analysis. The substring set currently generated effectively reflects the semantics of a critical API call and is sufficient for further feature extraction.
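As an illustration of this step, here is a simplified worklist sketch of backward constant collection. The Value and DefUse interfaces are hypothetical stand-ins for the underlying dataflow framework; DroidSIFT builds on Soot, whose actual API differs.

```java
import java.util.*;

// Hypothetical minimal dataflow facade: each value is either a string
// constant or has a set of reaching operands to walk backwards over.
interface Value { boolean isStringConstant(); String constantValue(); }
interface DefUse { Set<Value> reachingOperands(Value v); }

final class ConstantCollector {
    // Collect all string constants that may flow into one critical API
    // argument, walking def-use chains backwards from the call site.
    static Set<String> constantsFor(Value arg, DefUse du) {
        Set<String> constants = new HashSet<>();
        Deque<Value> worklist = new ArrayDeque<>();
        Set<Value> visited = new HashSet<>();
        worklist.push(arg);
        while (!worklist.isEmpty()) {
            Value v = worklist.pop();
            if (!visited.add(v)) continue;          // avoid cycles
            if (v.isStringConstant()) {
                constants.add(v.constantValue());   // leaf of the backward trace
            } else {
                worklist.addAll(du.reachingOperands(v));
            }
        }
        return constants;  // attached to the WC-ADG node as its constant set
    }
}
```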
API Dependency Construction.

We perform global dataflow analysis to discover data dependencies between API nodes and build the edges of the WC-ADG. However, it is very expensive to analyze every single API call made by an app. To address computational efficiency and our interest in security analysis, we choose to analyze only the security-related API calls. Permissions are strong indicators of security sensitivity in Android systems, so we leverage the API-permission mapping from PScout [8] to focus on permission-related API calls.

Our static dataflow analysis is similar to the "split"-based approach used by CHEX [23]. Each program split includes all code reachable from a single entry point. Dataflow analysis is performed on each split, and then cross-split dataflows are examined. The difference between our analysis and that of CHEX lies in the fact that we compute larger splits due to the consideration of asynchronous calling conventions.

We make a special consideration for reflective calls within our analysis. In Android programs, reflection is realized by calling the method java.lang.reflect.Method.invoke(). The "this" reference of this API call is a Method object, which is usually obtained by invoking either getMethod() or getDeclaredMethod() on a java.lang.Class object. The class is often acquired in a reflective manner too, through Class.forName(). This API call resolves a string input and retrieves the associated Class object.

We consider any reflective invoke() call as a sink and conduct backward dataflow analysis to find any prior data dependencies. If such an analysis reaches string constants, we are able to statically resolve the class and method information. Otherwise, the reflective call is not statically resolvable. However, statically unresolvable behavior is still represented within the WC-ADG as nodes which contain no constant parameters. Instead, this reflective call may have several preceding APIs, from a dataflow perspective, which are the sources of its metadata.
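For concreteness, the reflective call pattern described above looks like the following; this is our own example, and the class/method names passed in are hypothetical targets.

```java
import java.lang.reflect.Method;

class ReflectionExample {
    Object call(String className, String methodName, Object arg)
            throws Exception {
        // Class.forName() resolves a string into a Class object ...
        Class<?> clazz = Class.forName(className);
        // ... getMethod() resolves a Method from another string ...
        Method m = clazz.getMethod(methodName, Object.class);
        // ... and invoke() is the sink the backward analysis starts from.
        // If className/methodName trace back to string constants, the
        // target is statically resolvable; otherwise the node stays in
        // the WC-ADG without constant parameters.
        return m.invoke(clazz.newInstance(), arg);
    }
}
```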
4. ANDROID MALWARE CLASSIFICATION

We generate WC-ADGs for both benign and malicious apps. Each unique graph is associated with a feature that we use to classify Android malware and benign applications.

4.1 Graph Matching Score

To quantify the similarity of two graphs, we first compute a graph edit distance. To our knowledge, all existing graph edit distance algorithms treat nodes and edges uniformly. However, in our case, the graph edit distance calculation must take into account the different weights of different API nodes. At present, we do not consider assigning different weights to edges because this would lead to prohibitively high complexity in graph matching. Moreover, to emphasize the differences between two nodes with different labels, we do not seek to relabel them. Instead, we delete the old node and insert the new one. This is because the node "relabeling" cost, in our context, is not the string edit distance between the API labels of two nodes; it is the cost of deleting the old node plus that of adding the new node.

Definition 2. The Weighted Graph Edit Distance (WGED) of two Weighted Contextual API Dependency Graphs G and G', with a uniform weight function β, is the minimum cost to transform G into G':

    wged(G, G', β) = min( Σ_{v_I ∈ {V'−V}} β(v_I) + Σ_{v_D ∈ {V−V'}} β(v_D) + |E_I| + |E_D| )    (1)

where V and V' are respectively the vertices of the two graphs, v_I and v_D are individual vertices inserted into and deleted from G, while E_I and E_D are the edges added to and removed from G.

WGED represents the absolute difference between two graphs. This implies that wged(G, G') is roughly proportional to the sum of the graph sizes, and therefore two larger graphs are likely to be more distant from one another. To eliminate this bias, we normalize the resulting distance and further define Weighted Graph Similarity based upon it.

Definition 3. The Weighted Graph Similarity of two Weighted Contextual API Dependency Graphs G and G', with a weight function β, is

    wgs(G, G', β) = 1 − wged(G, G', β) / ( wged(G, ∅, β) + wged(∅, G', β) )    (2)

where ∅ is an empty graph. wged(G, ∅, β) + wged(∅, G', β) then equates to the maximum possible edit cost to transform G into G'.

4.2 Weight Assignment

Instead of manually specifying the weights for different APIs (in combination with their attributes), we wish to find a near-optimal weight assignment.

Selection of Critical API Labels.

Given a large number of API labels (unique combinations of API names and attributes), it is unrealistic to automatically assign weights to every one of them. Our goal is malware classification, so we concentrate on assigning weights to labels for the security-sensitive APIs and critical combinations of their attributes. To this end, we perform concept learning to discover critical API labels. Given a positive example set (PES) containing malware graphs and a negative example set (NES) containing benign graphs, we seek a critical API label (CA) based on two requirements: 1) frequency(CA, PES) > frequency(CA, NES), and 2) frequency(CA, NES) is less than the median frequency of all critical API labels in NES. The first requirement guarantees that a critical API label is more sensitive to a malware sample than a benign one, while the second requirement ensures the infrequent presence of such an API label in the benign set. Consequently, we have selected 108 critical API labels. Our goal becomes the assignment of appropriate weights to these 108 labels while assigning a default weight of 1 to all remaining API labels.

Weight Assignment.

Intuitively, if two graphs come from the same malware family and share one or more critical API labels, we must maximize the similarity between the two. We call such a pair of graphs a "homogeneous pair". Conversely, if one graph is malicious and the other is benign, even if they share one or more critical API labels, we must minimize the similarity between the two. We call such a pair of graphs a "heterogeneous pair". Therefore, we cast the problem of weight assignment as an optimization problem.

Definition 4. The Weight Assignment is an optimization problem to maximize the result of an objective function for a given set of graph pairs {⟨G, G'⟩}:

    max f({⟨G, G'⟩}, β) = Σ_{⟨G,G'⟩ is a homogeneous pair} wgs(G, G', β) − Σ_{⟨G,G'⟩ is a heterogeneous pair} wgs(G, G', β)

    s.t.  1 ≤ β(v) ≤ θ, if v is a critical node;  β(v) = 1, otherwise.    (3)

where β is the weight function that requires optimization, and θ is the upper bound of a weight. Empirically, we set θ to be 20.
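To illustrate Equations 2 and 3, a direct transcription into code might read as follows. The wged() computation is assumed to come from the weight-aware bipartite graph matcher described in Section 4.3; GraphPair is our own illustrative type, and WcAdg is reused from the sketch after Definition 1.

```java
import java.util.List;

final class GraphPair {
    final WcAdg g, gPrime; final boolean homogeneous;
    GraphPair(WcAdg a, WcAdg b, boolean h) { g = a; gPrime = b; homogeneous = h; }
}

final class Similarity {
    // Assumed: weighted graph edit distance from the bipartite matcher.
    interface Wged { double of(WcAdg g1, WcAdg g2, double[] beta); }

    // Equation 2: similarity normalized into [0, 1] by the maximum
    // possible edit cost (deleting all of G, inserting all of G').
    static double wgs(WcAdg g, WcAdg gPrime, double[] beta, Wged wged) {
        double maxCost = wged.of(g, new WcAdg(), beta)       // empty graph
                       + wged.of(new WcAdg(), gPrime, beta);
        return 1.0 - wged.of(g, gPrime, beta) / maxCost;
    }

    // Equation 3's objective: reward homogeneous pairs, penalize
    // heterogeneous ones.
    static double objective(List<GraphPair> pairs, double[] beta, Wged wged) {
        double f = 0.0;
        for (GraphPair p : pairs) {
            double s = wgs(p.g, p.gPrime, beta, wged);
            f += p.homogeneous ? s : -s;
        }
        return f;
    }
}
```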
Figure 6: A Feedback Loop to Solve the Optimization Problem (homogeneous and heterogeneous graph pairs feed f({⟨G,G'⟩}, β), which a Hill Climber iteratively improves by adjusting β)

To achieve the optimization of Equation 3, we use the Hill Climbing algorithm [30] to implement a feedback loop that gradually improves the quality of the weight assignment. Figure 6 presents such a system, which takes two sets of graph pairs and an initial weight function β as inputs. β is a discrete function which is represented as a weight vector. At each iteration, Hill Climbing adjusts a single element in the weight vector and determines whether the change improves the value of the objective function f({⟨G,G'⟩}, β). Any change that improves f({⟨G,G'⟩}, β) is accepted, and the process continues until no change can be found that further improves the value.
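A minimal sketch of that feedback loop, reusing objective() and the types from the previous sketch, might look like this; the step size and sweep order are our assumptions, since the paper only specifies single-element adjustments and the bound θ = 20.

```java
import java.util.List;

final class HillClimber {
    static final double THETA = 20.0;  // upper bound on critical weights
    static final double STEP  = 1.0;   // assumed adjustment granularity

    // beta[i] is the weight of the i-th critical API label; all other
    // labels implicitly keep the default weight of 1.
    static double[] optimize(double[] beta, List<GraphPair> pairs,
                             Similarity.Wged wged) {
        double best = Similarity.objective(pairs, beta, wged);
        boolean improved = true;
        while (improved) {                       // stop at a local optimum
            improved = false;
            for (int i = 0; i < beta.length; i++) {
                for (double delta : new double[] {STEP, -STEP}) {
                    double old = beta[i];
                    // adjust one element, clamped to [1, theta]
                    beta[i] = Math.max(1.0, Math.min(THETA, old + delta));
                    double f = Similarity.objective(pairs, beta, wged);
                    if (f > best) { best = f; improved = true; }
                    else          { beta[i] = old; }   // reject the change
                }
            }
        }
        return beta;
    }
}
```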
database, we limit the comparison to only the graphs within a par- Given an app, the detector provides a binary result that indicates ticular bucket that possesses graphs containing a corresponding set whether the app is abnormal or not. To achieve this goal, we build of critical APIs. Critical APIs generally have higher weights than a graph database for benign apps. The detector then attempts to regular APIs, so graphs in other buckets will not be very similar to match the WC-ADGs of the given app against the ones in database. the input graph and are safe to ignore. However, API-based bucket If a sufficiently similar one for any of the behavior graphs is not indexing may be overly strict because APIs from the same package found, an anomaly is reported by the detector. We have set the usually share similar functionality. For instance, both getDeviceId() similarity threshold to be 70% per our empirical studies. and getSubscriberId() are located in TelephonyManager pack- age, and both retrieve identity-related information. Therefore, we Signature Detection. instead index buckets based on the package names of critical APIs. We next use a classifier to perform signature detection. Our sig- More specifically, to build a graph database, we must first build nature detector is a multi-label classifier designed to identify the an API package bitvector for all existing graphs within the database. malware families of unknown malware instances. Such a bitvector has n elements, each of which indicates the pres- To enable classification, we first build a malware graph database. ence of a particular Android API package. For example, a graph To this end, we conduct static analysis on the malware samples that calls sendTextMessage() and getDeviceId() will set the from the Android Malware Genome Project [1,36] to extract WC-ADGs. corresponding bits for the android.telephony.SmsManager and In order to consider only the unique graphs, we remove any graphs android.telephony.TelephonyManager packages. Graphs that that have a high level of similarity to existing ones. With experi- share the same bitvector (i.e., the same API package combination) mental study, we consider a high similarity to be greater than 80%. are then placed into the same bucket. When querying a new graph Further, to guarantee the distinctiveness of malware behaviors, we against the database, we encode its API package combination into compare these malware graphs against our benign graph set and a bitvector and compare that bitvector against each database index. remove the common ones. Notice that, to ensure the scalability, we implement the bucket- Next, given an app, we generate its feature vector for classifi- based indexing with a hash map where the key is the API package cation purpose. In such a vector, each element is associated with bitvector and the value is a corresponding graph set. a graph in our database. And, in turn, all the existing graphs are Empirically, we found this one-level indexing efficient enough projected to a feature vector. In other words, there exists a one-to- for our problem. If the database grows much larger, we can tran- one correspondence between the elements in a feature vector and
4.5 Malware Classification

Anomaly Detection.

We have implemented a detector to conduct anomaly detection. Given an app, the detector provides a binary result that indicates whether the app is abnormal or not. To achieve this goal, we build a graph database for benign apps. The detector then attempts to match the WC-ADGs of the given app against the ones in the database. If a sufficiently similar counterpart cannot be found for any of the behavior graphs, an anomaly is reported by the detector. We have set the similarity threshold to 70% per our empirical studies.

Signature Detection.

We next use a classifier to perform signature detection. Our signature detector is a multi-label classifier designed to identify the malware families of unknown malware instances.

To enable classification, we first build a malware graph database. To this end, we conduct static analysis on the malware samples from the Android Malware Genome Project [1, 36] to extract WC-ADGs. In order to consider only the unique graphs, we remove any graphs that have a high level of similarity to existing ones. Based on experimental study, we consider a high similarity to be greater than 80%. Further, to guarantee the distinctiveness of malware behaviors, we compare these malware graphs against our benign graph set and remove the common ones.

Next, given an app, we generate its feature vector for classification purposes. In such a vector, each element is associated with a graph in our database, and, in turn, all the existing graphs are projected onto a feature vector. In other words, there exists a one-to-one correspondence between the elements in a feature vector and the existing graphs in the database. To construct the feature vector of a given app, we produce its WC-ADGs and then query the graph database for all the generated graphs. For each query, a best-matching graph is found. The similarity score is then put into the feature vector at the position corresponding to this best-matching graph. Specifically, the feature vector of a known malware sample is attached with its family label so that the classifier can understand the discrepancy between different malware families.

Figure 8: An Example of Feature Vectors

              G1   G2   G3   G4   G5   G6   G7   G8   ...  G861 G862
ADRD          0    0    0    0    0    0.8  0.9  0    ...  0    0
DroidDream    0.9  0    0    0    0.8  0.7  0.7  0    ...  0    0
DroidKungFu   0    0.7  0    0    0.6  0    0.6  0    ...  0    0.9

Figure 8 gives an example of feature vectors. With our malware graph database of 862 graphs, a feature vector of 862 elements is constructed for each app. The two behavior graphs of ADRD are most similar to graphs G6 and G7, respectively, from the database. The corresponding elements of the feature vector are set to the similarity scores of those matches. The rest of the elements remain set to zero.

Once we have produced the feature vectors for the training samples, we can use them to train a classifier. We select the Naïve Bayes algorithm for the classification. In fact, we could choose different algorithms for the same purpose. However, since our graph-based features are fairly strong, even Naïve Bayes can produce satisfying results. Naïve Bayes also has several advantages: it requires only a small amount of training data; parameter adjustment is straightforward and simple; and runtime performance is favorable.
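The following sketch ties these pieces together: best-match queries produce the feature vector, and the same best-match scores drive the 70% anomaly threshold. Types are reused from the earlier sketches, and bestMatch() is an assumed helper wrapping the bucket lookup plus the similarity metric.

```java
import java.util.List;

final class FeatureExtractor {
    static final double ANOMALY_THRESHOLD = 0.70;

    // Assumed helper: best similarity between g and any database graph,
    // returning the matched graph's database position and its score.
    interface Matcher { double[] bestMatch(WcAdg g); /* {index, score} */ }

    // One element per database graph; non-zero only at best-match slots.
    static double[] featureVector(List<WcAdg> appGraphs, int dbSize,
                                  Matcher matcher) {
        double[] vector = new double[dbSize];
        for (WcAdg g : appGraphs) {
            double[] m = matcher.bestMatch(g);
            int idx = (int) m[0];
            vector[idx] = Math.max(vector[idx], m[1]);
        }
        return vector;
    }

    // Anomaly detection against the *benign* database: report an anomaly
    // if any behavior graph has no sufficiently similar benign counterpart.
    static boolean isAbnormal(List<WcAdg> appGraphs, Matcher benignMatcher) {
        for (WcAdg g : appGraphs) {
            if (benignMatcher.bestMatch(g)[1] < ANOMALY_THRESHOLD) return true;
        }
        return false;
    }
}
```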
5. EVALUATION

5.1 Dataset & Experiment Setup

We collected 2200 malware samples from the Android Malware Genome Project [1] and from McAfee, Inc., a leading antivirus company. To build a benign dataset, we received a number of benign samples from McAfee, and we downloaded a variety of popular, highly ranked apps from Google Play. To further sanitize this benign dataset, we sent these apps to the VirusTotal service for inspection. The final benign dataset consisted of 13500 samples. We performed the behavior graph generation, graph database creation, graph similarity query, and feature vector extraction using this dataset. We conducted the experiments on a test machine equipped with an Intel(R) Xeon(R) E5-2650 CPU (20M cache, 2GHz) and 128GB of physical memory. The operating system is Ubuntu 12.04.3 (64-bit).

5.2 Summary of Graph Generation

Figure 9: Graph Generation Summary. (a) Graphs per Benign App. (b) Graphs per Malware. (c) Nodes per Benign Graph. (d) Nodes per Malware Graph.

Figure 9 summarizes the characteristics of the behavior graphs generated from both benign and malicious apps. Figures 9a and 9b illustrate the number of graphs generated from benign and malicious apps. On average, 7.8 graphs are computed from each benign app, while 9.8 graphs are generated from each malware instance. Most apps focus on limited functionalities and do not produce a large number of behavior graphs. In 92% of benign samples and 98% of malicious ones, no more than 20 graphs are produced from an individual app.

Figures 9c and 9d present the number of nodes in benign and malicious behavior graphs. A benign graph, on average, has 15 nodes, while a malicious graph carries 16.4. Again, most of the activities are not intensive, so the majority of these graphs have a small number of nodes. Statistics show that 94% of the benign graphs and 91% of the malicious ones carry fewer than 50 nodes. These facts serve as the basic requirements for the scalability of our approach, since the runtime performance of graph matching and query is largely dependent upon the number of nodes and graphs, respectively.
5.3 Classification Results

Signature Detection.

We use multi-label classification to identify the malware family of unrecognized malicious samples. Therefore, we expect to include in the database only those malware behavior graphs that are well labeled with family information. To this end, we rely on the malware samples from the Android Malware Genome Project and use them to construct the malware graph database. Consequently, we built a database of 862 unique behavior graphs, with each graph labeled with a specific malware family.

We then selected 1050 malware samples from the Android Malware Genome Project and used them as a training set. Next, we would like to collect testing samples from the rest of our collection. However, a majority of the malware samples from McAfee are not strongly labeled. Over 90% of the samples are coarsely labeled as "Trojan" or "Downloader" but in fact belong to a specific malware family (e.g., DroidDream). Moreover, even VirusTotal cannot provide reliable malware family information for a given sample, because the antivirus products used by VirusTotal seldom reach a consensus. This fact tells us two things: 1) it is a non-trivial task to collect evident samples as ground truth in the context of multi-label classification; 2) multi-label malware detection or classification is, in general, a challenging real-world problem.

Despite the difficulty, we obtained 193 samples, each of which is detected as the same malware by the major AVs. We then used those samples as testing data. The experimental result shows that our classifier can correctly label 93% of these malware instances.

Among the successfully labeled malware samples are two types of Zitmo variants. One uses HTTP for communication, and the other uses SMS. While the former is present in our malware database, the latter was not. Nevertheless, our signature detector is still able to capture this variant. This indicates that our similarity metrics effectively tolerate variations in behavior graphs.

We further examined the 7% of the samples that were mislabeled. It turns out that the mislabeled cases can be roughly put into two categories. First, DroidDream samples are labeled as DroidKungFu. DroidDream and DroidKungFu share multiple malicious behaviors, such as gathering privacy-related information and hidden network I/O. Consequently, there exists a significant overlap between their WC-ADGs. Second, Zitmo, Zsone, and YZHC instances are labeled as one another. These three families are SMS Trojans. Though their behaviors are slightly different from each other, they all exploit sendTextMessage() to deliver the user's information to an attacker-specified phone number. Despite the mislabeled cases, we still manage to successfully label 93% of the malware samples with a Naïve Bayes classifier. Applying a more advanced classification algorithm could further improve the accuracy.

Anomaly Detection.

Since we wish to perform anomaly detection using our benign graph database, the coverage of this database is essential. In theory, the more benign apps the database collects, the more benign behaviors it covers. However, in practice, it is extremely difficult to retrieve benign apps exhaustively. Luckily, different benign apps may share the same behaviors. Therefore, we can focus on unique behaviors (rather than unique apps). Moreover, as more and more apps are fed into the benign database, the database size grows more and more slowly. Figure 10 depicts our discovery. When the number of apps increases from 3000 to 4000, there is a sharp increase (2087) in unique graphs. However, when the number of apps grows from 10000 to 11000, only 220 new unique graphs are generated, and the curve begins to flatten.

Figure 10: Convergence of Unique Graphs in Benign Apps (number of unique graphs vs. number of benign apps, 3000 to 11000)

We built a database of 10420 unique graphs from 11400 benign apps. Then, we tested 2200 malware samples against the benign classifier. The false negative rate was 2%, which indicates that 42 malware instances were not detected. However, we noted that most of the missed samples are exploits or Downloaders.
In these cases, their bytecode programs do not bear significant API-level behaviors, and therefore the generated WC-ADGs do not necessarily look abnormal when compared to benign ones. At this point, we have only considered the presence of constant parameters in an API call; we did not further differentiate API behaviors based upon constant values. Therefore, we cannot distinguish the behaviors of Runtime.exec() calls or network I/O APIs with varied string inputs. Nevertheless, if we create a custom filter for these string constants, we will then be able to identify these malware samples, and the false negative rate will drop to 0.

Next, we used the remaining 2100 benign apps as test samples to evaluate the false positive rate of our anomaly detector. The result shows that 5.15% of clean apps are mistakenly recognized as suspicious during anomaly detection. This means that, if our anomaly detector were applied to Google Play, among the approximately 1200 new apps per day [4], around 60 apps would be mislabeled as containing anomalies and be bounced back to the developers. We believe that this is an acceptable ratio for vetting purposes. Moreover, since we do not reject the suspicious apps immediately, but rather ask the developers for justifications, we can further eliminate these false positives during this interactive process. In addition, as we add more benign samples into the dataset, the false positive rate will further decrease.

Further, we would like to evaluate our anomaly detector with malicious samples from new malware families. To this end, we retrieved a new piece of malware called Android.HeHe, which was first found and reported in January 2014 [12]. Android.HeHe exercises a variety of malicious functionalities, such as SMS interception, information stealing, and command-and-control. This new malware family does not appear in the samples from the Android Malware Genome Project, which were collected from August 2010 to October 2011, and therefore it cannot be labeled via signature detection. DroidSIFT generates 49 WC-ADGs for this sample and, once these graphs are presented to the anomaly detector, a warning is raised indicating the abnormal behaviors expressed by these graphs.

Detection of Transformation Attacks.

We collected 23 DroidDream samples, which are all intentionally obfuscated using a transformation technique [28], and 2 benign apps that are deliberately disguised as malware instances by applying the same technique. We ran these samples through our anomaly detection engine and then sent the detected abnormal ones through the signature detector. The result shows that while the 23 true malware instances are flagged as abnormal in anomaly detection, the 2 clean ones correctly pass the detection without raising any warnings. We then compared our signature detection results with antivirus products. To obtain the detection results of antivirus software, we sent these samples to VirusTotal and selected the 10 anti-virus (AV) products (i.e., AegisLab, F-Prot, ESET-NOD32, DrWeb, AntiVir, CAT-QuickHeal, Sophos, F-Secure, Avast, and Ad-Aware) that bear the highest detection rates. Notice that we consider the detection to be successful only if the AV can correctly flag a piece of malware as DroidDream or one of its variants. In fact, from our observation, many AV products can provide partial detection results based upon the native exploit code included in the app package or common network I/O behaviors. As a result, they usually recognize these DroidDream samples as "exploits" or "Downloaders" while missing many other important malicious behaviors.

Figure 11: Detection Ratio for Obfuscated Malware (true and false positive counts per detector)

Figure 11 presents the detection ratios for "DroidDream" across the different detectors. While none of the antivirus products can achieve a detection rate higher than 61%, DroidSIFT successfully flags all the obfuscated samples as DroidDream instances. In addition, we also notice that AV2 produces a relatively high detection ratio (52.17%), but it also mistakenly flags the two clean samples as malicious apps. Since the disguising technique simply renames the benign app package to one commonly used by DroidDream (and thus confuses this AV detector), such false positives again demonstrate that external symptoms are not robust and reliable features for malware detection.

5.4 Runtime Performance

Figure 12 illustrates the runtime performance of DroidSIFT. Specifically, it demonstrates the cumulative time consumption of graph