February 22nd 2019 Joint work with Pascal Dupré, Gili Rusak, Giancarlo Pellegrino and Dan Boneh - Perceptual Ad-Blocking: Meet Adversarial Machine ...
←
→
Page content transcription
If your browser does not render page correctly, please read the page content below
Perceptual Ad-Blocking: Meet Adversarial Machine Learning Florian Tramèr Palo Alto Networks February 22nd 2019 Joint work with Pascal Dupré, Gili Rusak, Giancarlo Pellegrino and Dan Boneh
The Future of Ad-Blocking? easylist.txt …markup… …URLs… ??? This is an ad Human distinguishability of ads > Legal requirement (U.S. FTC, EU E-Commerce) > Industry self-regulation on ad-disclosure 2
Perceptual Ad-Blocking § Ad Highlighter by Storey et al. > Visually detects ad-disclosures > Traditional Computer Vision techniques > Simplified version implementable in Adblock Plus § Sentinel by Adblock Plus > Locates ads in Facebook screenshots using neural networks > Not yet deployed 3
Perceptual Ad-Blocking § Ad Highlighter by Storey et al. > Visually detects ad-disclosures > Traditional Computer Vision techniques § Sentinel by Adblock Plus > Locates ads in Facebook screenshots using neural networks > Not yet deployed 4
How Secure is Perceptual Ad-Blocking? … so that Tom’s post gets blocked Jerry uploads malicious content … 5
Outline § Perceptual ad-blockers: how they work § Attacking perceptual ad-blockers § Why defending is hard 6
Outline § Perceptual ad-blockers: how they work § Attacking perceptual ad-blockers § Why defending is hard 7
How does a Perceptual Ad-Blocker Work? https://www.example.com Ad Disclosure Ad Classifier Classifier Data Collection and Training Page Segmentation Classification Action Template matching, OCR, DNNs, Object detector networks Ø Element-based (e.g., find all tags) [Storey et al. 2017] Ø Frame-based (segment rendered webpage into “frames”) Ø Page-based (unsegmented screenshots à-la-Sentinel) 8
Building a Page-Based Perceptual Ad-Blocker § Sentinel is not yet deployed so we rolled our own § Let’s aim bigger than just Facebook! (data collection on FB is a pain / privacy issue anyways...) § We trained an object detector neural network (YOLO-v3) on news websites from all G20 nations > Use filter lists to create a labelled dataset for training > Crop & replace ads for data augmentation (increase data diversity) 9
Outline § Perceptual ad-blockers: how they work § Attacking perceptual ad-blockers § Why defending is hard 11
The Current State of ML ML works well on average ≠ ML works well on adversarial data * *as long as there is no adversary 12
Adversarial Examples Szegedy et al., 2014 Goodfellow et al., 2015 - ≈ 2⁄255 § How? > Training ⟹ “tweak model parameters such that !( ) = %&'(&” > Attacking ⟹ “tweak input pixels such that !( ) = )*++,'” 13
(Meaningful) Defenses 14
Adversarial Examples for Perceptual Ad-Blockers 15
Attacking Page-Based Classifiers § Challenge 1: Input of model is a webpage screenshot > Attack must be implemented in HTML § Challenge 2: ! can’t control or predict the full input > E.g., publishers can’t modify or know contents of ad frames Classifier 16
Ad-Block Evasion § Goal: Make ads unrecognizable by ad-blocker § ! = Website publisher > Abilities: Inspect ad-blocker classifier(s) offline Change page DOM, CSS, JavaScript... Cannot modify content of ad frames § ! = Ad network (or advertisers) > Abilities: Inspect ad-blocker classifier(s) offline Arbitrary changes to content of ad frames 17
Evasion 1: Universal Transparent Overlay § Web publisher perturbs every rendered pixel Use HTML tiling to minimize perturbation size (20 KB) Ø 100% success rate on 20 webpages not used to create the overlay Ø The attack is universal: the overlay is computed once and works for all (or most) websites 18
Evasion 2: Perturbed Ads § Ad network perturbs served ads > Creating a single perturbation that works for every ad on every website is hard > Target a specific domain Alternative attack Ø Publisher perturbs background below ad frame Ø 100% success in evading ads Original With adversarial ad Ø 100% success rate for ads served on BBC.com Ø No CSS: Ad image is directly perturbed on the server Ø The perturbation is universal: It works for all ads (on this domain) 19
Ad-Block Detection § Goal: Trigger ad-blocker on “honeypot” content > Detect ad-blocking in client-side JavaScript or on server > Applicability of these attacks depends on ad-blocker type § ! = Website publisher > Abilities: Inspect ad-blocker classifier(s) offline Change page DOM, CSS Use client-side JavaScript to detect DOM changes 20
Detection: Perturb fixed page layout § Publisher adds honeypot in page-region with fixed layout > E.g., page header original With honeypot header 21
New Threats: Privilege Abuse § Ad-block evasion & detection is a well-known arms race. But there’s more! … so that Tom’s post gets blocked Jerry uploads malicious content … What happened? Ø Object detector model generates box predictions from full page inputs Ø Content from one user can affect predictions anywhere on page Ø Model’s segmentation is not aligned with web-security boundaries 22
Outline § Perceptual ad-blockers: how they work § Attacking perceptual ad-blockers § Why defending is hard 23
A Challenging Threat Model https://www.example.com Ad Disclosure Ad Classifier Classifier Data Collection and Training Page Segmentation Classification Action Ø DOM Obfuscation Adversarial Privilege Data Poisoning Examples Abuse Ø Resource Exhaustion Ø ! has white-box access to ad-blocker Ø ! can exploit False Negatives and False Positives in classification pipeline Ø ! prepares attacks offline ó The ad-blocker must defend against attacks in real-time in the user’s browser Ø ! can take part in crowd-sourced data collection 24
Defense Strategy 1: Obfuscate the Model § Attacks are easy if ! has access to the ML model > Hide model from adversary? § Obfuscate the ad-blocker? > It isn’t hard to create adversarial examples for black-box classifiers https://www.example.com https://www.example.com Ad Disclosure Ad Disclosure Ad Ad Classifier Classifier Classifier § Randomize the ad-blocker? (1) Pageand Data Collection (2) Classification Segmentation Training (1) Page Segmentation (3) Action(2) Classification (3) Action > Deploy different models - Adversarial examples that work against multiple models > Randomly change page before classifying - Adversarial examples robust to random transformations 25
Defense Strategy 2: Anticipate and Adapt § If ad-blocker is attacked (evasion or detection), collect adversarial samples and re-train the model > Or train on adversarial examples proactively § This is called Adversarial Training > New arms-race: ! finds new attacks and ad-blocker re-trains > Mounting a new attack is much easier than 1600 citations, 800 in 2018! updating the model > On-going research: so far ! always wins! Broke 7 defenses, a few days after they were accepted for publication 26
Defense Strategy 3: Simplify the Problem § Storey et al: recognize ad-disclosures > Simpler computer vision problem than full-page ad-detection > Light-weight and mature techniques (OCR, perceptual hashing, SIFT) § Adversarial Examples still exist 27
Take Away § Emulating human-like detection of ads could be the end-game for ad-blockers § But very hard with current computer vision techniques > Resisting adversarial examples is one of the most challenging open problems in ML security § Perceptual ad-blockers have to survive a strong threat model > Evasion & detection with adversarial examples > Privilege abuse attacks from arbitrary content providers > Similar threats for other ML-based ad-blockers (e.g., AdGraph?) Ø Train a page-based ad-blocker Ø Download pre-trained models Ø Attack demos http://arxiv.org/abs/1811.03194 https://github.com/ftramer/ad-versarial 28
You can also read