A Power Law for Cybercrime - Deepak Chandarana & Richard Overill Department of Computer Science King's College London

A Power Law for Cybercrime

 Deepak Chandarana & Richard Overill
   Department of Computer Science
       King’s College London
      richard.overill@kcl.ac.uk
Overview
•   Introduction to cybercrime
•   Power law characterisation
•   Examples of power law relationships
•   Data collection
•   Results of analysis
•   Interpretation of results
•   Conclusion

Introduction
•   Cybercrime refers to Internet- and computer-related crime
    -   Credit card fraud, financial fraud, identity fraud,
        cyber-extortion, cyber-sabotage, cyber-espionage, ...
    -   Viruses, worms, logic bombs, Trojan horses, RATs,
        rootkits, denial-of-service attacks, phishing attacks, ...
•   A growing and evolving form of crime
•   Cost estimated at £1.5 trillion per annum worldwide
•   Poses many challenges for organisations,
    governments and law enforcement

Who carries out Cybercrime?

•   Insiders (employees)
•   Hackers (cyber-mercenaries)
•   Criminals (serious & organised crime)
•   Terrorists (sub-state groups)
•   Corporations (commercial espionage)
•   Government agencies (counterintelligence)

Their Motives

•   There are many motives:
    -   Revenge, ideology, competition, money, influence

•   Two main classes:
    -   Intrinsic: motivated by internal factors
    -   Extrinsic: motivated by external factors

•   These motivations may also be combined

Power Law Characterisation
•   The probability of measuring a particular value of some
    quantity varies inversely as a power of that value:
        $p(x) = C x^{-\alpha}$
    -   α is the exponent of the power law
    -   C is the probability normalisation constant

•   If logarithms are taken:
        $\log p(x) = -\alpha \log x + \log C$
•   This has the straight-line form $y = mx + c$, with gradient $m = -\alpha$

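As a quick numerical check of the straight-line form, the Python sketch below evaluates p(x) at two points and measures the gradient on log-log axes. The function names, the lower cut-off `x_min` and the continuous-variable normalisation C = (α-1)·x_min^(α-1) are assumptions made for illustration; the slides state only that C normalises the probability.

```python
import math

def power_law_pdf(x, alpha, x_min):
    """Density p(x) = C * x**(-alpha) for x >= x_min, assuming alpha > 1."""
    # Assumed normalisation for a continuous variable with lower cut-off x_min.
    c = (alpha - 1.0) * x_min ** (alpha - 1.0)
    return c * x ** (-alpha)

def loglog_gradient(x1, x2, alpha, x_min):
    """Gradient of log p(x) plotted against log x between x1 and x2."""
    y1 = math.log(power_law_pdf(x1, alpha, x_min))
    y2 = math.log(power_law_pdf(x2, alpha, x_min))
    return (y2 - y1) / (math.log(x2) - math.log(x1))

gradient = loglog_gradient(10.0, 1000.0, alpha=2.5, x_min=1.0)
print(round(gradient, 6))  # -2.5
```

The measured gradient equals -α whichever two points are chosen, which is what makes a log-log plot a convenient first test for power-law behaviour.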
Histograms - 1
•   The number of values that fall into each data range (bin) is
    counted
•   The midpoint of the range is plotted on the x-axis
•   The frequency of each range is plotted on the y-axis
•   Problems with histograms:
    -   Trends can be hidden
    -   What bin size to use?
    -   Noise in the tail (due to low-frequency values)

Histograms - 2
•   Logarithmic scales
•   Problem of noise in the tail (Newman, 2005)

•   Noise is partially overcome by using logarithmic binning:
    the sizes of the bins are varied using a fixed multiplier

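Logarithmic binning can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure used for the talk: the function name, the default multiplier of 2, the use of the geometric midpoint and the sample values are all assumptions. Counts are divided by the bin width so that bins of different sizes remain comparable.

```python
import bisect
import math

def log_binned_histogram(values, multiplier=2.0):
    """Histogram whose bin widths grow by a fixed multiplier.

    Returns (geometric_midpoint, count, density) tuples, where
    density = count / (n * bin_width).
    """
    if min(values) <= 0 or multiplier <= 1:
        raise ValueError("values must be positive and multiplier > 1")
    # Edges x_min, x_min*m, x_min*m^2, ... until the maximum is covered.
    edges = [float(min(values))]
    while edges[-1] <= max(values):
        edges.append(edges[-1] * multiplier)
    counts = [0] * (len(edges) - 1)
    for v in values:
        # Bin k holds values with edges[k] <= v < edges[k + 1].
        counts[bisect.bisect_right(edges, v) - 1] += 1
    n = len(values)
    result = []
    for k, count in enumerate(counts):
        lo, hi = edges[k], edges[k + 1]
        result.append((math.sqrt(lo * hi), count, count / (n * (hi - lo))))
    return result

# Hypothetical values, for illustration only.
sample = [1, 1, 1, 1, 2, 2, 3, 5, 9, 20]
bins = log_binned_histogram(sample, multiplier=2.0)
for mid, count, density in bins:
    print(f"{mid:8.3f} {count:3d} {density:.5f}")
```

Because the bins widen towards the tail, each one collects more of the sparse large values, which reduces the statistical noise there.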
Cumulative Distribution Function (CDF)
•   The CDF (strictly the complementary CDF, or CCDF) defines the
    probability P(X ≥ x) that X has a value greater than or equal to x
•   X is plotted on the x-axis (the abscissa)
•   CDF/CCDF is plotted on the y-axis (the ordinate)
•   Advantage 1: the CDF is well-defined for values of X which
    have low probability (the tail)
•   Advantage 2: for a power law, the CDF on a log-log plot is a
    straight line with gradient -(α-1)

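The empirical version of this curve can be computed directly from the sorted data. The sketch below is illustrative only (the function name and sample values are assumptions): for each distinct value x it reports the fraction of observations that are greater than or equal to x. The gradient -(α-1) arises because integrating C·x^(-α) from x to infinity lowers the power by one.

```python
def empirical_ccdf(values):
    """Return (x, P(X >= x)) for each distinct x, in ascending order."""
    xs = sorted(values)
    n = len(xs)
    points = []
    for i, x in enumerate(xs):
        # At the first occurrence of x, exactly n - i values are >= x.
        if i == 0 or x != xs[i - 1]:
            points.append((x, (n - i) / n))
    return points

# Hypothetical values, for illustration only.
print(empirical_ccdf([1, 2, 2, 4]))  # [(1, 1.0), (2, 0.75), (4, 0.25)]
```

No binning is involved, so every observation in the tail contributes a point.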
Calculating the Exponent α
•   A line of best fit (linear regression) introduces
    serious inaccuracies (Goldstein et al., 2004)
•   Use a maximum likelihood estimate (MLE) formula for α (Newman, 2005):
        $\alpha = 1 + n \left[ \sum_{i=1}^{n} \ln \frac{x_i}{x_{\min}} \right]^{-1}$
    -   n is the number of points
    -   x_i, i = 1…n, are the measured values of x
    -   x_min is the minimum value of x for which the power
        law behaviour holds (power laws diverge as x
        approaches zero)

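The MLE formula translates almost line for line into Python. The sketch below is a minimal illustration: the function names, the random seed and the synthetic sample (drawn by inverse-transform sampling from a continuous power law with α = 2.5) are assumptions used to check that the estimator recovers a known exponent, and are not data from the study.

```python
import math
import random

def mle_alpha(values, x_min):
    """Estimate alpha = 1 + n / sum(ln(x_i / x_min)) over values >= x_min."""
    tail = [x for x in values if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum <= 0:
        raise ValueError("need at least one value strictly above x_min")
    return 1.0 + len(tail) / log_sum

def sample_power_law(n, alpha, x_min, rng):
    """Draw n values from a continuous power law by inverse-transform sampling."""
    # rng.random() lies in [0, 1), so 1 - u lies in (0, 1] and is never zero.
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
            for _ in range(n)]

rng = random.Random(0)  # fixed seed so the run is repeatable
synthetic = sample_power_law(20000, alpha=2.5, x_min=1.0, rng=rng)
estimate = mle_alpha(synthetic, x_min=1.0)
print(f"estimated alpha = {estimate:.3f}")
```

With 20,000 points the estimate should land close to 2.5; values below `x_min` are ignored, which is why choosing `x_min` well matters.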
Calculating xmin
•   The distribution deviates from the power law below x_min
•   Solar flare example (Newman, 2005):

•   x_min obtained by inspection of the graph is not accurate
•   Use the Kolmogorov-Smirnov D-statistic to determine x_min
    from a goodness-of-fit test against the empirical CDF:
        $D = \max_i \left( P(X \ge x_i) - \frac{n-i}{n},\ \frac{n-i+1}{n} - P(X \ge x_i) \right)$

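One way to use the D-statistic is to try each observed value as a candidate x_min, fit α to the tail above it, and keep the candidate with the smallest D. The Python sketch below assumes that selection rule, a continuous power-law model P(X ≥ x) = (x/x_min)^(-(α-1)), and a minimum tail size of 10; none of these details are given in the slides, and the function names and synthetic data are hypothetical.

```python
import math
import random

def mle_alpha(tail, x_min):
    """MLE exponent for values that are all >= x_min."""
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)

def ks_distance(tail, x_min, alpha):
    """D = max_i(P(X>=x_i) - (n-i)/n, (n-i+1)/n - P(X>=x_i)), x_i ascending."""
    xs = sorted(tail)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        model = (x / x_min) ** (-(alpha - 1.0))  # assumed model CCDF
        d = max(d, model - (n - i) / n, (n - i + 1) / n - model)
    return d

def choose_x_min(values, min_tail=10):
    """Return (D, x_min, alpha) for the candidate x_min with the smallest D."""
    xs = sorted(values)
    best = None
    for x_min in sorted(set(xs)):
        tail = [x for x in xs if x >= x_min]
        if len(tail) < min_tail:
            break  # tails only get shorter from here on
        if tail[-1] == x_min:
            continue  # all tail values equal x_min: alpha is undefined
        alpha = mle_alpha(tail, x_min)
        d = ks_distance(tail, x_min, alpha)
        if best is None or d < best[0]:
            best = (d, x_min, alpha)
    return best

# Synthetic power-law sample (alpha = 2.5, x_min = 1), for illustration only.
rng = random.Random(1)
data = [(1.0 - rng.random()) ** (-1.0 / 1.5) for _ in range(500)]
d, x_min, alpha = choose_x_min(data)
print(f"D = {d:.4f}, x_min = {x_min:.3f}, alpha = {alpha:.3f}")
```

Each candidate needs a full pass over the tail, so the cost grows quadratically with the number of points; that is acceptable for data sets of the size discussed here.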
Fatal Quarrels

•   Lewis Fry Richardson (1948) studied the statistics of
    fatal quarrels from 1820 to 1945
•   The data were placed into ranges and plotted on
    logarithmic scales

Conventional War

•   Newman (2005) considered the cumulative distribution
    of the intensity of 119 wars from 1816 to 1980

•   He calculated the exponent to be 1.80

Terrorism
•   Clauset & Young (2005) considered terrorist
    attacks from 1968 to 2004
•   They divided events into two categories:
    -   Attacks in G7 countries follow a power law with
        exponent 1.71
    -   Attacks in non-G7 countries have an exponent of 2.5

Why different Exponents?

•   Terrorist attacks in industrialised nations are
    relatively rare but tend to be large when they
    do occur (higher levels of security)

•   Attacks in the less industrialised world tend to
    be smaller, but more frequent, events (lower
    levels of security)

Aims of this research

•   Investigate whether cybercrime conforms to a
    power law model
•   Compare with conventional war and terrorism
    models

Collecting & Selecting the Data
•   Many data sources were initially considered
    (UK DTI, UK NHTCU, ACCSS, etc.)
•   The Computer Security Institute / Federal Bureau of
    Investigation (CSI/FBI) Annual Computer
    Crime and Security Survey was finally selected
•   It is the most complete historical data set (1997-2006)
•   x-value = total amount of money lost from an
    attack (direct + collateral losses) in US$
•   Crimes for which the historical data set is
    incomplete (e.g. web-site defacement) are
    omitted, but are used in re-sampling tests

Cumulative Distribution Function

•   The log-log plot produces a curve, not a straight line, indicating
    that a single power law relationship does not exist

Dividing the Curve - 1

Dividing the Curve - 2
•   The graph is divided into left and right sides
•   To get an overall fit, the weighted Pearson product-moment
    correlation coefficient is optimised with respect to the
    position of the dividing point:
      n_l = the number of points on the left side of the graph
      n_r = the number of points on the right side of the graph
      r_l^2 = squared correlation coefficient for the left side
      r_r^2 = squared correlation coefficient for the right side
      n = n_l + n_r
      r^2 = weighted mean of the squared correlation coefficients of the two sides
        $r^2 = \frac{n_l r_l^2 + n_r r_r^2}{n}$

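The search for the dividing point can be sketched as a scan over every possible split, scoring each with the weighted r² defined above. The Python below is an illustrative assumption about how such a scan might be written: the function names, the minimum of three points per side and the piecewise-linear test data are hypothetical, and the inputs are taken to be the log-log coordinates of the cumulative distribution sorted by x.

```python
def pearson_r_squared(xs, ys):
    """Squared Pearson product-moment correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # correlation is undefined for a constant series
    return sxy * sxy / (sxx * syy)

def best_split(xs, ys, min_points=3):
    """Return (n_l, r2) maximising r2 = (n_l*r_l^2 + n_r*r_r^2) / n."""
    n = len(xs)
    best_k, best_r2 = None, -1.0
    for k in range(min_points, n - min_points + 1):
        r2_left = pearson_r_squared(xs[:k], ys[:k])
        r2_right = pearson_r_squared(xs[k:], ys[k:])
        weighted = (k * r2_left + (n - k) * r2_right) / n
        if weighted > best_r2:
            best_k, best_r2 = k, weighted
    return best_k, best_r2

# Hypothetical log-log points with a kink between x = 4 and x = 5.
log_x = [float(v) for v in range(10)]
log_y = [-0.6 * x if x <= 4 else -2.7 - 1.55 * (x - 4.5) for x in log_x]
n_left, r2 = best_split(log_x, log_y)
print(n_left, round(r2, 6))
```

On this test data the scan places five points on the left side, since that is the only split for which both sides are exactly straight.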
Dividing the Curve - 3

Division of Crimes - 1
•   Calculate the percentage of data points that each type
    of crime represents
•   To identify the most prevalent crimes:
    -   Absolute test: a crime that represents less than
        10% of the data is not considered
    -   Relative test: if a crime appears on both sides and
        its percentage on one side is less than half its
        percentage on the other side, then the smaller
        percentage is not considered

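The two tests can be expressed as a small filter. The Python sketch below rests on assumptions the slides leave open: the absolute test is applied to each side separately, it is applied before the relative test, and the crime labels and percentages are invented purely to exercise the logic.

```python
def prevalent_crimes(left_pct, right_pct, absolute_cutoff=10.0, relative_ratio=0.5):
    """Apply the absolute and relative tests to per-side percentages.

    left_pct and right_pct map crime name -> percentage of that side's points.
    Returns the sorted crime names retained on the left and right sides.
    """
    # Absolute test: drop crimes below the cutoff (assumed to apply per side).
    left = {c: p for c, p in left_pct.items() if p >= absolute_cutoff}
    right = {c: p for c, p in right_pct.items() if p >= absolute_cutoff}
    # Relative test: for crimes on both sides, drop the smaller percentage
    # when it is less than half of the larger one.
    for crime in set(left) & set(right):
        if left[crime] < relative_ratio * right[crime]:
            del left[crime]
        elif right[crime] < relative_ratio * left[crime]:
            del right[crime]
    return sorted(left), sorted(right)

# Hypothetical percentages, for illustration only.
left_side = {"A": 30.0, "B": 8.0, "C": 40.0, "D": 22.0}
right_side = {"A": 12.0, "C": 35.0, "E": 53.0}
kept_left, kept_right = prevalent_crimes(left_side, right_side)
print(kept_left)   # ['A', 'C', 'D']
print(kept_right)  # ['C', 'E']
```

Here crime B fails the absolute test, crime A is dropped from the right side by the relative test (12% is less than half of 30%), and crime C stays on both sides because its two percentages are comparable.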
Division of Crimes - 2
                       Total Annual Losses ($)
                       --------------------------------
Crimes on Left Side    Insider Abuse of Net Access
                       Laptop Theft
                       Sabotage of Data or Networks
                       System Penetration
                       Telecom Fraud
                       Unauthorised Insider Access

Crimes on Right Side   Financial Fraud
                       Insider Abuse of Net Access
                       Theft of Proprietary Information
                       Malware: viruses, worms, Trojans

Division of Crimes - 3
•   Left side = intrinsic (and combined) crimes
•   Right side = extrinsic crimes
    -   Primarily money motivated
    -   Follow a more targeted and organised approach
    -   Indicate the involvement of organised crime in cybercrime
    -   Organised crime may also be an element of crimes on the left side
•   Why is organised crime involved in cybercrime?
    -   Anonymity of the Internet
    -   Trans-border in nature
    -   Weak international laws
    -   Big money to be made!

What does the Exponent tell us? - 1

              Cybercrime    Cybercrime   Conventional   G7          Non-G7
              (90 subset)   (122 set)    war            Terrorism   Terrorism
Left side     1.78          1.60
                                         1.80           1.71        2.5
Right side    2.60          2.55

What does the Exponent tell us? - 2
•   Crimes on the right side are targeted against
    larger organisations with stronger defensive
    measures
•   The attacks succeed less frequently, but are
    large events when they do happen
•   The attacks on the left side are smaller in
    scale and carried out on organisations with
    weaker defences
•   The attacks succeed with greater frequency,
    but have a smaller financial impact

Adapting the Model - 1
•   Johnson et al. (2005) analysed the ongoing conflicts in
    Colombia and Iraq between 1988 and 2004

•   They considered how the power law exponent changed
    over time

Adapting the Model - 2
•   They put forward a model of insurgent warfare to
    explain the power law behaviour of conventional war
    and terrorism
•   We adapt this model to the domain of cybercrime
•   Attack unit: a group of people that can organise
    themselves to act as a single unit
•   Attack strength: the amount of money lost due to an
    event carried out by this attack unit
•   The strength of the attack unit depends on the skill of its
    members and the electronic weapons they possess

Adapting the Model - 3
•   The left side gives a similar picture to war:
    -   There can be a wide distribution of attack units
    -   Crimes such as System Penetration or
        Unauthorised Insider Access can be carried out
        by a single person or a group of attackers
    -   These attack units have different attack strengths
    -   As a result there is a wider variation in crimes that
        occur on the left side compared to the right side

Adapting the Model - 4
•   The right side is comparable to terrorism in
    non-G7 countries
    -   More organised nature of the crimes
    -   Consider an organised crime group bringing
        together a number of hackers to form an attack
        unit of a specific strength
    -   The group carries out the attack, then disperses to
        help avoid detection
    -   More transient attack units whose attack strengths
        change dynamically due to their continual
        fragmentation and coalescence

Implications
•   Organisations, governments and law
    enforcement agencies are fighting enemies
    with two different attack styles and motivations
•   The left side contains crimes of a more
    intrinsic nature
    -   Variations in the size and strength of attack units
    -   Attack units are more static in structure
•   The right side contains crimes of a more
    extrinsic nature
    -   Attack units are more dynamic in structure

Summary & Conclusions
•   Reviewed the power law relationships found in
    warfare
•   For cybercrime (in the USA) a single power law
    relationship does not exist
•   There is evidence that a double power law
    relationship holds
•   The left side is characteristic of conventional warfare
•   The right side is characteristic of non-G7 terrorism

References
•   L F Richardson (1948) J Amer Stat Assoc 43 523-546
•   L F Richardson (1960) Statistics of Deadly Quarrels, Boxwood
    Press
•   L-E Cederman (2003) Amer Polit Sci Rev 97 135-150
•   M L Goldstein et al. (2004) Eur Phys J B 41 255-258
•   M E J Newman (2005) Contemp Phys 46 323-351
•   A Clauset & M Young (2005) arxiv.org/abs/physics/0502014/
•   N Johnson et al. (2005) arxiv.org/abs/physics/0506213/
•   D Chandarana & R E Overill (2007) J Information Warfare
    (to be submitted)

Questions
