Open Source Intelligence - Giorgio Fumera source: Davide Ariu-UniCa
Pattern Recognition and Applications Lab
Open Source Intelligence
Giorgio Fumera – fumera@unica.it
source: Davide Ariu – davide.ariu@pluribus-one.it
University of Cagliari, Italy
Department of Electrical and Electronic Engineering
Intelligence
• Definition: the process and product of identifying, collecting, analyzing and refining information to make it useful to policymakers in making decisions, specifically about potential threats to national security
• Intelligence gathering can rely on
  – clandestine operations and secret or covert means, known only at the highest levels of government
  – information that is widely available
• Can be used for both legitimate and nefarious purposes
Source: https://www.fbi.gov/about-us/intelligence
http://pralab.diee.unica.it
Intelligence collection disciplines (*INT)
Five intelligence collection disciplines:
1. HUMan INTelligence (HUMINT): the process of gaining intelligence from human sources by analyzing their behavioral responses through direct interaction
2. SIGnal INTelligence (SIGINT): electronic transmissions that can be collected by ships, planes, ground sites, or satellites
  – Communications Intelligence (COMINT): interception of communications between two parties
3. IMagery INTelligence (IMINT), or PHOTo INTelligence (PHOTINT)
4. Measurement And Signatures INTelligence (MASINT): advanced processing and use of data gathered from overhead and airborne IMINT and SIGINT collection systems (e.g., identifying chemical weapons)
  • TELemetry INTelligence (TELINT): data relayed by weapons during tests
  • ELectronic INTelligence (ELINT): electronic emissions picked up from modern weapons and tracking systems
5. Open Source INTelligence (OSINT): the process of gathering intelligence from publicly available resources (including the Internet)
OSINT – Definitions
• Open Source Information (OSINF): publicly available data, not necessarily free
• OSINF collection: monitoring, selecting, retrieving, tagging, cataloguing, visualising & disseminating data
• Open Source Intelligence (OSINT): proprietary intelligence recursively derived from OSINF, as a result of expert analysis
Slide credit: C.H. Best, JRC – European Commission
OSINT – Origins
• The term 'OSINT' originates from Security Services
• The practice of using OSINT to build intelligence is not new
  – Italy: OVRA (Organizzazione per la Vigilanza e la Repressione dell'Antifascismo, the Organisation for Vigilance and Repression of Anti-Fascism) used OSINF from 1930 onwards
  – Cold War: American and German secret services vs Russia
• HUMINT, SIGINT and classified information were largely preferred
• Paradigm change:
  – 9/11: OSINT could have been used to foresee the attacks
  – fast growth of the Internet, appearance of Social Networks
• The 9/11 Commission Report: "The need to restructure the intelligence community grows out of six problems that have become apparent before and after 9/11":
  – Structural barriers to performing joint intelligence work
  – Lack of common standards and practices across the foreign-domestic divide
  – Divided management of national intelligence capabilities
  – Weak capacity to set priorities and to move resources
  – Too many jobs
  – Too complex and secret
OSINT – Who is involved?
[Figure: roles involved in OSINT (Minister, General, Commissioner, CEO; Analyst; Collector/Researcher; Tool Builder/Developer) and information types (OSINF, Classified Information)]
Slide credit: C.H. Best, JRC – European Commission
Who uses OSINT?
• Security Services, Law Enforcement Agencies and Military Bodies
• Governmental organisations
  – EU, NATO, AU Situation Centre
  – IAEA (nuclear safeguards)
  – UN Department for Peacekeeping Operations
  – World Health Organisation
• NGOs
• Large companies
OSINT – Sources of information
• Media
  – newspapers, magazines, radio, television, etc.
• The Internet
  – news web sites, Social Networks, blogs, video sharing sites, thematic sites, etc.
  – deep Web (not indexed by traditional search engines): dynamic web pages, sites behind a log-in, sites that exclude crawlers through a properly configured robots.txt file
  – Dark Nets/Web (Tor, I2P)
• Subscription services
  – LexisNexis (http://www.lexisnexis.com): a corporation providing computer-assisted legal research, business research and risk management services; in the 1970s it pioneered electronic access to legal and journalistic documents
  – Factiva (http://www.dowjones.com/products/product-factiva/): a Dow Jones news and business-information service, with access to thousands of premium news and information sources on more than 22 million public and private companies
  – Jane's Information Group (www.janes.com): a British publishing company specialising in military, aerospace and transportation topics
  – BBC Monitoring (https://monitoring.bbc.co.uk): news, information and comment gathered from mass media around the world for service subscribers
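Part of the deep Web consists of pages that site owners ask search engines not to index through robots.txt. A minimal sketch, using Python's standard urllib.robotparser, of how to check whether a URL is excluded by such rules; the robots.txt content, the crawler name and the example.org URLs are hypothetical. Note that robots.txt is only a request honoured by compliant crawlers, not an access control.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt of an example site (not taken from a real one)
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /search
"""

def is_crawlable(robots_txt: str, url: str, user_agent: str = "osint-demo-bot") -> bool:
    """Return True if the robots.txt rules allow `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse rules from text, no network access
    return parser.can_fetch(user_agent, url)

for url in ("https://example.org/news/today",
            "https://example.org/private/report.pdf",
            "https://example.org/search?q=osint"):
    # Disallowed URLs are usually absent from compliant search-engine indexes
    print(url, is_crawlable(ROBOTS_TXT, url))
```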
OSINT – Sources of information
• Commercial Satellites
  – http://www.euspaceimaging.com/applications/fields/security-defense-intelligence
  – https://www.maxar.com
• Public Data
  – government reports, budgets, demographics, hearings, legislative debates, press conferences, speeches, marine and aeronautical safety warnings, environmental impact statements and contract awards
• Professional and Academic
  – conferences, professional associations (e.g., IEEE, ACM), academic papers, and subject matter experts
• Open Data
  – https://open-data.europa.eu/en/data
  – http://www.dati.gov.it
  – http://www.datiopen.it
• Geospatial Data Providers
  – for an extensive list see: https://en.wikipedia.org/wiki/List_of_GIS_data_sources
*INT targets: individuals
Potentially interesting information:
– physical locations
– OSN (online social network) profiles, for checking relationships, contacts, content sharing, preferred web sites, etc.
– e-mail addresses, user handles and aliases available on the Internet, including infrastructure owned by the individual such as domain names and servers
– associations and historical perspective of the work performed, including background details, criminal records, owned licenses, registrations, etc.:
  • public data provided by official databases
  • private data provided by professional organizations
– released intelligence such as content on blogs, journal papers, news articles, and conference proceedings
– mobile information including phone numbers, device type, applications in use, etc.
Source: Targeted Cyber Attacks: Multi-staged Attacks Driven by Exploits and Malware, Elsevier, 2014
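E-mail addresses and user handles often appear in public text such as posts, papers and web pages. A minimal sketch of extracting them with regular expressions; the patterns are deliberately simplified assumptions (real e-mail syntax is more permissive, and handle formats differ across platforms), and the sample text is hypothetical.

```python
import re

# Simplified patterns (assumptions): not a full RFC 5322 e-mail grammar
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# "@name" not preceded by a word character, '@' or '.', so e-mail addresses are not matched
HANDLE_RE = re.compile(r"(?<![\w@.])@(\w+)")

def extract_identifiers(text: str) -> dict:
    """Collect distinct e-mail addresses and @handles mentioned in a piece of text."""
    return {
        "emails": sorted(set(EMAIL_RE.findall(text))),
        "handles": sorted(set(HANDLE_RE.findall(text))),
    }

sample = "Contact j.doe@example.org or follow @jdoe_sec; backup: info@example.net."
print(extract_identifiers(sample))
# {'emails': ['info@example.net', 'j.doe@example.org'], 'handles': ['jdoe_sec']}
```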
*INT targets: corporations and organisations
Potentially interesting information:
– determining the nature of the business and work performed by target corporations and organizations, to understand their market vertical
– fingerprinting infrastructure, including IP address ranges, network peripheral devices for security and protection, deployed technologies and servers, web applications, informational web sites, etc.
– extracting information from exposed devices on the network, such as CCTV cameras, routers, and servers belonging to specific organizations
– mapping hierarchical information about the target organizations to understand the complete layout of employees at different layers, including ranks, e-mail addresses, nature of work, service lines, products, public releases, meetings, etc.
– collecting information about the different associations, including business clients and business partners
– extracting information from released documents about business, marketing, financial, and technology aspects
– gathering information about the financial standing of the organization from financial reports, trade reports, market caps, value history, etc.
Source: Targeted Cyber Attacks: Multi-staged Attacks Driven by Exploits and Malware, Elsevier, 2014
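Fingerprinting often starts by attributing observed hosts to the address blocks an organisation holds. A minimal sketch with Python's standard ipaddress module; the address blocks and observed IPs are hypothetical, drawn from the RFC 5737 documentation ranges rather than from any real organisation (real blocks would come from sources such as WHOIS/registry records).

```python
import ipaddress

# Hypothetical address blocks attributed to the target (RFC 5737 documentation ranges)
ORG_RANGES = [ipaddress.ip_network("192.0.2.0/24"), ipaddress.ip_network("198.51.100.0/25")]

def attribute_hosts(ips, ranges=ORG_RANGES):
    """Split observed IP addresses into those inside and outside the organisation's ranges."""
    inside, outside = {}, []
    for ip in map(ipaddress.ip_address, ips):
        net = next((n for n in ranges if ip in n), None)  # first block containing the IP
        if net is None:
            outside.append(str(ip))
        else:
            inside.setdefault(str(net), []).append(str(ip))
    return inside, outside

observed = ["192.0.2.10", "198.51.100.200", "198.51.100.7", "203.0.113.5"]
inside, outside = attribute_hosts(observed)
print(inside)   # {'192.0.2.0/24': ['192.0.2.10'], '198.51.100.0/25': ['198.51.100.7']}
print(outside)  # ['198.51.100.200', '203.0.113.5']
```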
*INT target modes: investigating cyber-attacks
Domains registered by criminals for:
• counterfeit goods
• data exfiltration
• exploit attacks
• illegal pharma
• infrastructure (e-crime name resolution)
• malware command and control (C&C)
• malware distribution, ransomware
• phishing, business email compromise
• scams (419 advance-fee fraud, reshipping, stranded traveler…)
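One common step when investigating such domains is to look for names that imitate a legitimate brand (typosquatting), as used for phishing or counterfeit sales. A minimal sketch using difflib string similarity; the brand, the candidate domains, the 0.8 threshold and the "compare the first label only" rule are all assumptions, and more thorough approaches also consider homoglyphs and other top-level domains.

```python
from difflib import SequenceMatcher

def lookalikes(brand: str, domains, threshold: float = 0.8):
    """Return (domain, similarity) for domains whose first label imitates `brand`."""
    brand = brand.lower()
    hits = []
    for domain in domains:
        label = domain.lower().split(".")[0]  # crude assumption: compare the first label only
        score = SequenceMatcher(None, brand, label).ratio()
        # flag near-identical spellings (typosquatting) or labels embedding the brand
        if label != brand and (score >= threshold or brand in label):
            hits.append((domain, round(score, 2)))
    return sorted(hits, key=lambda hit: -hit[1])

candidates = ["examp1ebank.com", "examplebank-login.net", "weather.org", "examplebank.com"]
print(lookalikes("examplebank", candidates))
# [('examp1ebank.com', 0.91), ('examplebank-login.net', 0.79)]
```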
OSINT processes
[Figure: OSINT process stages Collect, Transform, Analyse, Visualise, Collaborate & Report ("connecting the dots", "generating actionable intelligence"); techniques shown include Search, Crawl, News feeds, Multilingual Information Retrieval, Machine Translation, Translation, Geo-tagging, Entity Extraction, Entity Resolution, Geolinking, Relations, Relationships, Link Analysis, Networks, Trends, Time graphs, Statistics, Maps, Intel Wiki, IM, Case DB, Publish]
Technical issues:
• data mapping
• data deduping
• data cleansing
• data conversion
• data linking
• data normalisation
Slide credit: C.H. Best, JRC – European Commission
Information collection issues
How to search for textual information?
– search engines
  • generic, e.g.: Google, Yahoo, Bing, Baidu (Chinese, Japanese), Sogou (Chinese), Soso.com (Chinese)
  • thematic, e.g.: computers and devices (Shodan), maps (Bing, Google, Nokia, Yahoo!), people (Spokeo), source code (Koders, Krugle, Google Code Search)
– libraries (e.g.: LexisNexis, IEEE Xplore, ACM Digital Library)
How to extract textual information?
– API: information access is constrained and subject to change; APIs are platform-specific
– scraping: requires ad-hoc source code for each platform; noise has to be removed; open solutions exist (but their results need to be merged)
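Scraped pages mix the content of interest with navigation menus, scripts and other noise. A minimal sketch of noise removal using only Python's standard html.parser; which tags count as noise is an assumption that would need tuning for each platform, in line with the ad-hoc nature of scrapers, and the sample page is hypothetical.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping tags that usually hold non-content noise."""
    NOISE_TAGS = {"script", "style", "nav", "footer", "header"}  # assumption: typical noise containers

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a noise tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.NOISE_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.NOISE_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())

def extract_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)

page = """<html><head><style>p{color:red}</style></head><body>
<nav>Home | Login</nav><h1>Harbour closed</h1><p>All ships were diverted.</p>
<script>track()</script><footer>(c) 2024</footer></body></html>"""
print(extract_text(page))  # Harbour closed All ships were diverted.
```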
Information collection issues
Non-textual information (extraction is difficult to automate):
• Images
  – people (who)? places? text? objects?
• Videos
  – people (who)? places? text? objects? (same as for images)
  – if the video contains audio: transcription, translation, speaker identification
• Audio traces
  – transcription
  – translation
• Other files
  – e.g., executable files, proprietary formats
Information collection: language issues
• Culture • Hi-Tech • Sport • Magazine • Art • Health-Environment • Politic • Short news • Religion-Thought • Society • International • University • Economy • Provinces • Howzeh • Markets • Photo • Video
Information collection: language issues
English version: https://en.mehrnews.com/
Slide Credit: C.H. Best, JRC – European Commission
Information collection: language issues
Persian version: https://www.mehrnews.com/
• News • Culture • Literature • Religion • University • Social • Economic • Political • International • Sport • Nuclear • Photo
Slide Credit: C.H. Best, JRC – European Commission
Information collection issues: Social Network analysis
• Privacy restrictions
  – third-party application developers create applications that ask for unneeded permissions to gain additional information
• Platform restrictions
  – based on social relationships, user-based privacy settings, rate limiting, activity monitoring, and IP-address-based restrictions
• Data availability
  – users have not provided the information
  – the target information exists, but is "hidden" by privacy and platform restrictions
• Data longevity
  – relationship dynamics change frequently, profiles are updated constantly
  – each data access is a snapshot of the social graph at collection time
• Legal issues
  – platforms disallow screen scrapers and other data mining tools through their Terms of Service (ToS), but the legal enforceability of such terms remains unclear
• Ethical issues
  – crawling social networks for personal information is an ethically sensitive area
Source: B. R. Holland, Enabling Open Source Intelligence (OSINT) in private social networks
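Since each collection is only a snapshot of the social graph, repeated collections have to be compared to see how relationships evolve. A minimal sketch that diffs two snapshots of an undirected friendship graph stored as edge lists; the account names are hypothetical.

```python
def _normalize(edges):
    """Undirected edges: ('a', 'b') and ('b', 'a') are the same relationship."""
    return {frozenset(edge) for edge in edges}

def diff_snapshots(old_edges, new_edges):
    """Compare two crawls of the same social graph taken at different times."""
    old, new = _normalize(old_edges), _normalize(new_edges)
    return {
        "added": sorted(tuple(sorted(e)) for e in new - old),
        "removed": sorted(tuple(sorted(e)) for e in old - new),
        "stable": len(old & new),
    }

monday = [("alice", "bob"), ("bob", "carol"), ("carol", "dave")]
friday = [("bob", "alice"), ("carol", "dave"), ("alice", "erin")]
print(diff_snapshots(monday, friday))
# {'added': [('alice', 'erin')], 'removed': [('bob', 'carol')], 'stable': 2}
```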
Information transformation: linguistic requirements
Foreign language skills and knowledge proficiency:
– transcription: both listening and writing proficiency in the source language are essential
– interpretation: both listening proficiency in the source language and speaking proficiency in the target language are essential
– translation: bilingual competence is a prerequisite; linguists must be able to
  • read and comprehend the source language
  • write comprehensibly in the target language
  • choose the equivalent expression in the target language that fully conveys and best matches the meaning intended in the source language
Source: Open-Source Intelligence, Federation of American Scientists, 2012
Information transformation: entity resolution
• The problem of identifying and linking/grouping different manifestations of the same real-world object
• Examples of manifestations and objects:
  – different ways of addressing the same person in text (names, email addresses, Facebook accounts)
  – web pages with differing descriptions of the same business
  – different photos of the same object
Source: L. Getoor, A. Machanavajjhala - Entity Resolution Tutorial
Information transformation: entity resolution
Enzo Ferrari and his drivers
"A master of the ironworks." That was the peculiar, yet in some ways affectionate, way in which Clay Regazzoni liked to describe Enzo Ferrari. And that the Drake was a true master is a fact that none of the drivers who passed through Maranello can dispute. It was he, Ferrari, who decided likes and dislikes, orders and concessions, salaries and commissions. On one thing only did he allow no leeway, not even to himself: the worth of those who raced for him. Provided, however, that the driver's name did not overshadow, in popularity, the name of the cars. Coming to more recent times, the driver who most fascinated Enzo Ferrari was Niki Lauda. Extremely thrifty and terribly skilled at financial negotiation, in which the Drake also excelled, Niki recalls that at some point Ferrari saddled him with a curious nickname: "He called me a Jew, probably because he also considered me a good trader of my professional skills." At the end of July 1977, when the former world champion had already signed for Brabham Alfa Romeo, Ferrari revealed something Lauda had admitted to him. "As long as you are alive, I will drive for you": this is what Niki had said to the Drake, who by then had been an honorary engineer (ingegnere honoris causa) for ten years. But at the end of August, Lauda went to Maranello and told Ferrari that he would no longer drive his cars. "If Lauda had stayed with us, he would at least have equalled Fangio's record of five world titles," Ferrari confessed some time later. He never forgave Lauda, and did not take him back at Ferrari when the Austrian offered himself. Forgiveness came years later, shortly before the Drake's death. The last driver in Enzo Ferrari's ranking of technical loves was Gilles Villeneuve. The Grand Old Man was moody: when Lauda left him, he made a bet with himself, to take a nobody and bring him to the world title.
Information transformation: entity resolution
[Figure: Before / After entity resolution]
Slide Credit: L. Getoor, A. Machanavajjhala - Entity Resolution Tutorial
Information transformation: entity resolution
Traditional challenge: name/attribute ambiguity, i.e., the same name may refer to different real-world people (e.g., "Tom Cruise", "Michael Jordan")
Slide Credit: L. Getoor, A. Machanavajjhala - Entity Resolution Tutorial
Information transformation: entity resolution
Other challenges:
– errors due to data entry
– changing attributes
– abbreviations/data truncation: e.g., "V. Rossi" may stand for Valentino Rossi, Vasco Rossi or Valeria Rossi
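The abbreviation example can be made concrete with a simple compatibility test between name mentions. A minimal sketch, assuming names are compared token by token and that a single-letter token may stand for any token starting with that letter; it shows that "V. Rossi" is compatible with several distinct people, so other attributes are needed to resolve it.

```python
import unicodedata

def normalize(name: str) -> list[str]:
    """Lower-case, strip accents and trailing punctuation, split into tokens."""
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    return [t.strip(".,").lower() for t in ascii_name.split() if t.strip(".,")]

def compatible(short: str, full: str) -> bool:
    """True if `short` (possibly using initials, e.g. 'V. Rossi') could denote `full`."""
    a, b = normalize(short), normalize(full)
    if len(a) != len(b):
        return False
    # each token must match exactly, or be an initial of the corresponding token
    return all(x == y or (len(x) == 1 and y.startswith(x)) for x, y in zip(a, b))

candidates = ["Valentino Rossi", "Vasco Rossi", "Valeria Rossi", "Paolo Rossi"]
print([c for c in candidates if compatible("V. Rossi", c)])
# ['Valentino Rossi', 'Vasco Rossi', 'Valeria Rossi']
```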
Information transformation: entity resolution
Abstract problem statement
Slide Credit: L. Getoor, A. Machanavajjhala - Entity Resolution Tutorial
Information transformation: entity resolution
Deduplication: clustering the record mentions that correspond to the same entity
Slide Credit: L. Getoor, A. Machanavajjhala - Entity Resolution Tutorial
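Deduplication is often implemented as pairwise matching followed by clustering. A minimal sketch, assuming a simple matching rule (difflib string similarity of at least 0.85 between mentions) and taking connected components of the resulting match graph with a union-find structure; the mentions are hypothetical, and real systems typically use richer features and learned matchers.

```python
from difflib import SequenceMatcher
from itertools import combinations

def similar(a: str, b: str, threshold: float = 0.85) -> bool:
    # Pairwise matching rule (assumption): string similarity above a threshold
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

def deduplicate(mentions):
    """Cluster mentions: link similar pairs, then take connected components (union-find)."""
    parent = list(range(len(mentions)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i, j in combinations(range(len(mentions)), 2):
        if similar(mentions[i], mentions[j]):
            parent[find(i)] = find(j)  # merge the two clusters

    clusters = {}
    for i, m in enumerate(mentions):
        clusters.setdefault(find(i), []).append(m)
    return list(clusters.values())

records = ["Enzo Ferrari", "Enzo  Ferrari", "Enzo Ferari", "Niki Lauda", "Nikki Lauda"]
print(deduplicate(records))
```

Taking connected components enforces transitivity: if A matches B and B matches C, all three end up in one cluster even when A and C would not match directly, which can merge distinct entities along chains of near-matches.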
Information transformation: entity resolution
Deduplication: clustering the record mentions that correspond to the same entity
Computing each cluster's representative
Slide Credit: L. Getoor, A. Machanavajjhala - Entity Resolution Tutorial
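Once a cluster is formed, a single canonical record can be derived from its members. A minimal sketch, assuming records are dictionaries and using a simple rule (for each attribute, the most frequent non-empty value, ties broken in favour of the longest value); the records and the rule are hypothetical, not the tutorial's method.

```python
from collections import Counter

def representative(cluster):
    """Build a canonical record from the duplicate records (dicts) of one entity."""
    keys = {k for record in cluster for k in record}
    rep = {}
    for k in sorted(keys):
        values = [r[k] for r in cluster if r.get(k)]  # ignore missing/empty values
        if values:
            counts = Counter(values)
            # most frequent value; on ties prefer the longest (most complete) one
            rep[k] = max(counts, key=lambda v: (counts[v], len(v)))
    return rep

cluster = [  # hypothetical duplicate records of one person
    {"name": "V. Rossi", "email": "v.rossi@example.org", "phone": ""},
    {"name": "Valentino Rossi", "email": "v.rossi@example.org"},
    {"name": "Valentino Rossi", "phone": "+39 000 0000"},
]
print(representative(cluster))
# {'email': 'v.rossi@example.org', 'name': 'Valentino Rossi', 'phone': '+39 000 0000'}
```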
Information transformation: entity resolution
Linking records that match across databases
Slide Credit: L. Getoor, A. Machanavajjhala - Entity Resolution Tutorial
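Linking records across two databases can be sketched as follows. The sketch assumes "blocking" on birth year (only records sharing that attribute are compared, which avoids comparing every pair but misses matches whose year was entered wrongly) and accepts the best name match with difflib similarity of at least 0.75; the records, identifiers and threshold are hypothetical.

```python
from difflib import SequenceMatcher

def link_records(db_a, db_b, threshold=0.75):
    """Match each record of db_a to the most similar record of db_b in the same block."""
    blocks = {}
    for rec in db_b:  # blocking key (assumption): birth year
        blocks.setdefault(rec["birth_year"], []).append(rec)

    links = []
    for a in db_a:
        best, best_score = None, threshold
        for b in blocks.get(a["birth_year"], []):
            score = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
            if score >= best_score:
                best, best_score = b, score
        if best is not None:
            links.append((a["id"], best["id"], round(best_score, 2)))
    return links

registry = [{"id": "A1", "name": "Niki Lauda", "birth_year": 1949},
            {"id": "A2", "name": "Gilles Villeneuve", "birth_year": 1950}]
press = [{"id": "B7", "name": "Nikki Lauda", "birth_year": 1949},
         {"id": "B9", "name": "G. Villeneuve", "birth_year": 1950},
         {"id": "B3", "name": "Niki Lauda", "birth_year": 1979}]
print(link_records(registry, press))
# [('A1', 'B7', 0.95), ('A2', 'B9', 0.8)]
```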
Information transformation: entity resolution
Slide Credit: L. Getoor, A. Machanavajjhala - Entity Resolution Tutorial
Open source information reliability
Two types of sources are used to collect information:
– primary sources: a document or physical object that was written or created during the time under study
  • original documents (excerpts or translations) such as diaries, constitutions, research journals, speeches, manuscripts, letters, oral interviews, news film footage, autobiographies, and official records
  • creative works such as poetry, drama, novels, music, and art
  • relics or artifacts such as pottery, furniture, clothing, and buildings
  • personal narratives and memoirs
  • persons with direct knowledge
– secondary sources
  • journals that interpret findings
  • textbooks
  • magazine articles
  • commentaries
  • histories
  • criticism
  • encyclopedias
Source: Open-Source Intelligence, Federation of American Scientists, 2012
Open source information reliability
Source: Open-Source Intelligence, Federation of American Scientists, 2012
Open source information credibility
Source: Open-Source Intelligence, Federation of American Scientists, 2012
Link analysis
• Basic problem for intelligence analysts: putting information together in an organized way, often in a graphical format, so that its meaning is easier to extract
• Link analysis can be applied to relationships among identified entities:
1. assemble all raw data
2. determine the focus of the chart
3. construct an association matrix
4. code the associations in the matrix
5. determine the number of links for each entity
6. draw a preliminary chart (not covered in these slides)
7. clarify and re-plot the chart (not covered in these slides)
Source: Criminal Intelligence – United Nations Office on Drugs and Crime
Link analysis
Source: Criminal Intelligence – United Nations Office on Drugs and Crime
Link analysis – Example
1. Assemble all raw data
  – assemble all relevant files, field reports, informant reports, records, etc.
2. Determine the focus of the chart
  – identify the entities that will be the focus of your chart (names of people and/or organizations, vehicle license plate numbers, addresses, etc.)
3. Construct an association matrix
  – an essential interim step to identify associations between entities
Source: Criminal Intelligence – United Nations Office on Drugs and Crime
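Steps 1 to 3 can be sketched as follows, assuming (as a simplification) that two entities are associated whenever they are mentioned in the same raw report, with each cell counting the shared reports. The reports and entity names are hypothetical; in practice the analyst decides which associations are meaningful.

```python
from itertools import combinations

def association_matrix(reports):
    """Build a symmetric entity-by-entity matrix from raw reports (sets of entities)."""
    entities = sorted({e for r in reports for e in r})
    index = {e: i for i, e in enumerate(entities)}
    matrix = [[0] * len(entities) for _ in entities]
    for r in reports:
        for a, b in combinations(sorted(set(r)), 2):  # every pair mentioned together
            matrix[index[a]][index[b]] += 1
            matrix[index[b]][index[a]] += 1
    return entities, matrix

reports = [  # hypothetical field/informant reports, reduced to the entities they mention
    {"Smith", "Jones", "AB-123"},
    {"Jones", "Garage Ltd"},
    {"Smith", "Jones"},
]
entities, m = association_matrix(reports)
for name, row in zip(entities, m):
    print(f"{name:>10}", row)
```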
Link analysis – Example
4. Code the associations in the matrix
Source: Criminal Intelligence – United Nations Office on Drugs and Crime
Link analysis – Example
5. Determine the number of links for each entity
Source: Criminal Intelligence – United Nations Office on Drugs and Crime
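Steps 4 and 5 can be sketched in the same spirit. The coding scheme below ("C" for a confirmed association, "S" for a suspected one, empty for none) is an assumption for illustration, not necessarily the notation of the UNODC manual; counting the links per entity, optionally only confirmed ones, points to the most connected entities, which are natural candidates for the centre of the preliminary chart.

```python
def count_links(entities, matrix, codes=("C", "S")):
    """Count, for each entity, the associations coded with one of `codes`; most-linked first."""
    counts = {e: sum(1 for cell in row if cell in codes) for e, row in zip(entities, matrix)}
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical coded matrix: "C" = confirmed, "S" = suspected, "" = no association
entities = ["AB-123", "Garage Ltd", "Jones", "Smith"]
matrix = [
    ["",  "",  "S", "C"],
    ["",  "",  "C", ""],
    ["S", "C", "",  "C"],
    ["C", "",  "C", ""],
]
print(count_links(entities, matrix))
# [('Jones', 3), ('AB-123', 2), ('Smith', 2), ('Garage Ltd', 1)]
print(count_links(entities, matrix, codes=("C",)))  # confirmed associations only
# [('Jones', 2), ('Smith', 2), ('AB-123', 1), ('Garage Ltd', 1)]
```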
Preparation & Tools
Questions to ask before an investigation
• Should you hide your activities from bad actors?
  – criminals may block the IPs of known investigators
  – they may also monitor activity
• Do you want to leave crumbs associated with investigations that are traceable back to you?
  – log records, metadata at third-party intelligence sources
• Do you want the resources you use to leave crumbs on your devices?
  – cookies, plug-ins, or worse…
OnionWRT: Tor router
1. Buy a $20 micro router or a Raspberry Pi
2. Install OpenWRT and OnionWRT
3. Investigate over Tor from behind the router
4. Put all your devices behind your router (with WiFi encryption)
http://www.securityskeptic.com/2016/01/how-to-turn-a-nexx-wt3020-router-into-a-tor-router.html
Software to anonymize traffic
• https://www.torproject.org/projects/projects.html.en
  – The Amnesic Incognito Live System (TAILS), a Linux distribution
  – Tor Browser
• Disposable, anonymous inboxes
  – https://mailinator.com/
• Browser tricks
  – incognito/private mode can still be tracked
  – User-Agent changes (can be done with cURL as well)
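Changing the User-Agent header avoids announcing a scripting tool to the visited site. A minimal sketch with Python's standard urllib; the browser string and the example.org URL are illustrative assumptions, and changing the header alone does not hide the IP address or other fingerprinting signals.

```python
import urllib.request

# Hypothetical browser-like User-Agent string (assumption)
UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:115.0) Gecko/20100101 Firefox/115.0"

def build_request(url: str, user_agent: str = UA) -> urllib.request.Request:
    """Prepare a GET request that sends a custom User-Agent instead of Python's default."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})

req = build_request("https://example.org/")
print(req.get_header("User-agent"))  # urllib stores header names capitalised: "User-agent"
# urllib.request.urlopen(req) would send it; the cURL equivalent is: curl -A "<user agent>" <url>
```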
Recon-ng https://github.com/lanmaster53/recon-ng
• A full-featured, open-source Web reconnaissance framework written in Python, e.g. for
  – geolocating an IP address
  – finding the domains associated with a given email address
  – ...
• A completely modular framework with independent modules, database interaction and built-in convenience functions
Recon-ng modules
– Discovery
  • discovery/info_disclosure/interesting_files
– Exploitation
  • exploitation/injection/command_injector
  • exploitation/injection/xpath_bruter
– Import
  • import/csv_file
  • import/list
– Recon (60 modules)
  • recon/companies-multi/whois_miner
  • recon/domains-credentials/pwnedlist/leak_lookup
  • recon/hosts-hosts/ipinfodb
  • recon/profiles-profiles/twitter
– Reporting
  • reporting/csv
  • reporting/html
  • reporting/json
  • reporting/list
Case Study
Maltego
Open Source Intelligence tool Maltego (Community Edition, CE)
https://www.maltego.com