EPSRC Reference: |
EP/K032585/1 |
Title: |
Se-Ma-Match: Semantic Malware Matching |
Principal Investigator: |
King, Professor A |
Other Investigators: |
|
Researcher Co-Investigators: |
|
Project Partners: |
|
Department: |
Sch of Computing |
Organisation: |
University of Kent |
Scheme: |
Standard Research |
Starts: |
14 October 2013 |
Ends: |
13 October 2016 |
Value (£): |
245,525
|
EPSRC Research Topic Classifications: |
Fundamentals of Computing |
|
|
EPSRC Industrial Sector Classifications: |
|
Related Grants: |
|
Panel History: |
Panel Date | Panel Name | Outcome |
23 Jan 2013
|
EPSRC Research Institute APA & V
|
Announced
|
|
Summary on Grant Application Form |
The flood of malware samples is predicted to grow into a deluge in
2012, making the problem of maintaining a database of malware
signatures ever more difficult. For each new sample, it is important
to determine the threat that it poses.
In response to this, dynamic malware analysis
tools have been designed that execute the sample in a sandbox,
monitoring the actions of a sample. If these actions are similar
to those of malware that has been already indexed in the database,
then one might draw conclusions regarding provenance and severity
of the threat posed. If the sample does not match against known
malware, then it can be subject to manual scrutiny, using a dissembler
such as IDA Pro.
This Linnaean approach to malware analysis is both natural and
convenient: it is natural to group malware into families that share
common attributes; and it is provides a convenient way of assessing
threat. Yet the whole methodology is predicated on the accuracy
with which samples are characterised by their signatures. If a
sample is assigned a signature that does not express its behaviour,
then samples that are behaviourally distinct can be erroneously
grouped together. Conversely, samples which behave the same, but
appear different, can be accidentally placed in different groups.
The main problem with dynamic malware analysis tools is that they
execute the binary for a limited time, typically considering just
one path through the binary. This limits the actions that can be
observed, rendering the signature inaccurate for programs that
reveal their true behaviour later. In addition, the dynamic approach
can miss infrequent actions or logic bombs. The dynamic approach is
also susceptible to timing attacks that detect a tracer to turn off
some action. Above all, the signatures are based solely and only
on those actions that are encountered during the trace.
More static approaches have been applied too, at one extreme using
the call graph of the binary itself for classification, and at the
other deploying model checking techniques to search the paths through
call graph for signature behaviours that characterise known malware
families. Yet graph matching techniques are sensitive to control-flow
obfuscation and model checking requires the signature behaviours
to be known up-front and distilled into a temporal formula or an
automata.
A middle ground is offered by abstract interpretation since it
provides a way to systematically consider all paths, while monitoring
a program for actions that inform the construction of the signature.
Abstract interpretation provides a way to break the dichotomy between
the purely dynamic and the purely static approach to malware analysis
into a graduated continuum. Formally, purely static approach
(a.k.a. a static analysis) can be derived from the purely dynamic
approach (a.k.a. a tracer) by compositing a sequence of abstractions:
if all n abstractions are applied the result is the static analysis;
but if the first m < n abstractions are applied the result is a
hybrid. The challenge is to find the hybrid that provides sufficient
path coverage to undercover logic bombs yet is sufficiently robust
to be used by practitioners in the security sector. The proposed
project will discover this sweet point by following two complementary
lines of inquiry. Concrete traces will be abstracted to cover more
paths and more actions (at UCL). Static analyses, which covers all
paths, will be refined to avoid paths and actions that do not
actually occur (at Kent). Thus UCL will add missing information
to signatures (converging on the ideal signature from below) whilst
Kent will remove excess information from signatures (converging on
the ideal signature from above). By reflecting on the relative
merits of these approaches, we will draw conclusions on how to
construct robust signatures for malware classification and thereby
advance the whole field.
|
Key Findings |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
|
Potential use in non-academic contexts |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
|
Impacts |
Description |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk |
Summary |
|
Date Materialised |
|
|
Sectors submitted by the Researcher |
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
|
Project URL: |
|
Further Information: |
|
Organisation Website: |
http://www.kent.ac.uk |