Adversarial Machine Learning for Protecting Against Online Manipulation
Stefano Cresci, IIT-CNR, 56124, Pisa, Italy
Marinella Petrocchi, IIT-CNR, 56124, Pisa, Italy, and also Scuola IMT Alti Studi Lucca, 55100, Lucca, Italy
Angelo Spognardi, Sapienza Università di Roma, 00161, Rome, Italy
Stefano Tognazzi, Konstanz University, 78464, Konstanz, Germany

Adversarial examples are inputs to a machine learning system that result in an incorrect output from that system. Attacks launched through this type of input can cause severe consequences: for example, in the field of image recognition, a stop signal can be misclassified as a speed limit indication. However, adversarial examples also represent the fuel for a flurry of research directions in different domains and applications. Here, we give an overview of how they can be profitably exploited as powerful tools to build stronger learning models, capable of better withstanding attacks, for two crucial tasks: fake news and social bot detection.

The year was 1950, and in his paper "Computing Machinery and Intelligence," Alan Turing asked this question to his audience: "Can a machine think rationally?" A question partly answered by the machine learning (ML) paradigm, whose traditional definition is as follows: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E" [1]. If we define the experience E as "what data to collect," the task T as "what decisions the software needs to make," and the performance measure P as "how we will evaluate its results," then it becomes possible to evaluate the capability of the program to complete the task correctly—that is, to recognize the type of data, evaluating its performance.

To date, ML helps us achieve multiple goals: it provides recommendations to customers based on their previous purchases, or it weeds out spam from the inbox based on spam received previously, just to name a couple of examples. In the image recognition field, such programs are trained by feeding them different images, thus learning to distinguish among them.

However, in 2014, Google and NYU researchers showed how it was possible to fool a classifier, specifically an ensemble of convolutional neural networks (ConvNets), by adding noise to the image of a panda. The program classified the panda plus the added noise as a gibbon, with 99% confidence. The modified image is called an adversarial example [2]. Formally, given a data distribution $p(x, y)$ over images $x$ and labels $y$, and a classifier $f$ such that $f(x) = y$, an adversarial example is a modified input $\tilde{x} = x + \delta$ such that $\delta$ is a very small (human-imperceptible) perturbation and $f(\tilde{x}) \neq y$; that is, $\tilde{x}$ is misclassified while $x$ was not. Still in visual recognition, it is possible to "perturb" a road sign, e.g., a stop sign, by placing small stickers on it, so that the classifier identifies it as a speed limit sign [3] [see Figure 1(a)]. Another noteworthy attack exploits so-called "adversarial patches": the opponent does not even need to know the target that the classifier has been trained to recognize. Simply adding a patch to the input can lead the system to decide that what it has been given to classify is exactly what the patch represents. The case of a trained model mistaking a banana for a toaster, after an adversarial patch was placed next to the banana, became popular [4].
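In practice, the perturbation $\delta$ is often found by following the gradient of the model's loss with respect to the input. The snippet below is a minimal sketch of the fast gradient sign method (FGSM) introduced in the work cited above [2], written in PyTorch; the model, the images, and the epsilon budget are illustrative placeholders rather than details taken from the cited experiments.

```python
import torch
import torch.nn.functional as F

def fgsm_example(model, x, y, epsilon=0.01):
    """Craft an adversarial example x_adv = x + delta with FGSM:
    delta = epsilon * sign(grad_x loss)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)     # loss w.r.t. the correct label
    loss.backward()                         # gradient with respect to the input
    delta = epsilon * x.grad.sign()         # small, human-imperceptible step
    x_adv = (x + delta).clamp(0.0, 1.0)     # keep pixels in a valid range
    return x_adv.detach()

# Usage sketch: `model` is any differentiable image classifier,
# `x` a batch of images in [0, 1], and `y` their true labels.
# model(fgsm_example(model, x, y)).argmax(dim=1) may now differ from y.
```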
These few examples demonstrate the risks caused by the vulnerability of ML systems to intentional data manipulations in the field of computer vision.

In recent years, ML has also been at the core of a plethora of efforts in other domains. In particular, some of the domains that have seen massive application of ML and AI are intrinsically adversarial—that is, they feature the natural and inevitable presence of adversaries motivated to fool the ML systems. In fact, adversarial examples are extremely relevant in all security-sensitive applications, where any misclassification induced by an attacker represents a security threat. A paramount example of such applications is the fight against online abuse and manipulation, which often come in the form of fake news and social bots.
FIGURE 1. Adversarial examples and their consequences for a few notable ML tasks. (a) Computer vision: images can be modified by adding adversarial patches so as to fool image classification systems (e.g., those used by autonomous vehicles). (b) Automatic speech recognition: adding adversarial noise to a speech waveform may result in wrong textual transcriptions. (c) Social bot detection: similarly to computer vision and automatic speech recognition, adversarial attacks can alter the features of social bots, without impacting their activity, thus allowing them to evade detection. (d) Fake news detection: tampering with the textual content of an article, or even with its comments, may yield wrong article classifications.

So how can we defend against attackers who try to deceive the model through adversarial examples? It turns out that adversarial examples are not exclusively a threat to the reliability of ML models. Instead, they can also be leveraged as a very effective means to strengthen the models themselves. A "brute force" approach, the so-called adversarial training, sees the model designers pretend to be attackers: they generate several adversarial examples against their own model and then train the model not to be fooled by them.
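To make the "pretend to be attackers" idea concrete, here is a hedged sketch of one adversarial training epoch that crafts FGSM perturbations, as in the previous snippet, against the model being trained and mixes them with clean data. The 50/50 loss weighting, the optimizer, and the data loader are assumptions for illustration, not a recipe prescribed by the works cited here.

```python
import torch
import torch.nn.functional as F

def adversarial_training_epoch(model, loader, optimizer, epsilon=0.01):
    """One epoch of adversarial training: for every clean batch, craft FGSM
    adversarial counterparts and train on both clean and perturbed inputs,
    so the model learns not to be fooled by its own adversarial examples."""
    model.train()
    for x, y in loader:
        # Attack step: craft adversarial examples against the current model.
        x_pert = x.clone().detach().requires_grad_(True)
        F.cross_entropy(model(x_pert), y).backward()
        x_adv = (x_pert + epsilon * x_pert.grad.sign()).clamp(0.0, 1.0).detach()

        # Defense step: train on both clean and adversarial inputs.
        optimizer.zero_grad()
        loss = 0.5 * (F.cross_entropy(model(x), y) +
                      F.cross_entropy(model(x_adv), y))
        loss.backward()
        optimizer.step()
```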
Along these lines, in the rest of this article, we briefly survey relevant literature on adversarial examples and adversarial machine learning (AML). AML aims at understanding when, why, and how learning models can be attacked, and the techniques that can mitigate such attacks. We consider two phenomena whose detection is polluted by adversaries and that are bound to play a crucial role in the coming years for the security of our online ecosystems: fake news and social bots.

Contrary to computer vision, the adoption of AML in these fields is still in its infancy, as shown in Figure 2, despite its many advantages. Current endeavors mainly focus on the identification of adversarial attacks, and only seldom on the development of solutions that leverage adversarial examples for improving detection systems. The application of AML in these fields is still largely untapped, and its study will provide valuable insights for driving future research efforts and obtaining practical advantages.

FAKE NEWS AND SOCIAL BOTS

Fake news is often defined as "fabricated information that mimics news media content in form but not in organizational process or intent" [5], [6]. Its presence has been documented in several contexts, such as politics, vaccinations, food habits, and financial markets.

False stories circulated for centuries before the Internet. One thinks, for instance, of the maneuvers carried out by espionage and counterespionage to get wrong information to the enemy. If fake stories have always existed, why are we so concerned about them now? The advent of the Internet, while undoubtedly facilitating access to news, has lowered the editorial standards of journalism, and its open nature has led to a proliferation of user-generated content, unscreened by any moderator [7].

Usually, fake news is published on some little-known outlet and amplified through social media posts, quite often using so-called social bots. These are software algorithms that can closely mimic the behavior of a genuine account and maliciously generate artificial hype [8], [9].
FIGURE 2. Adversarial machine learning has led to a rise of adversarial approaches for the detection of manipulated multimedia, fake news, and social bots.

In recent years, research has intensified efforts to combat both the creation and spread of fake news, as well as the use of social bots. Currently, the most common detection methods for both social bots (as tools for spreading) and fake news are based on supervised ML algorithms. In many cases, these approaches achieve very good performance on the considered test cases. Unfortunately, state-of-the-art detection techniques suffer from attacks that critically degrade the performance of the learning algorithms. In a classical adversarial game, social bots have evolved over time [8]: while early bots in the late 2000s were easily detectable just by looking at static account information or simple indicators of activity, sophisticated bots are nowadays almost indistinguishable from genuine accounts. We can observe the same adversarial game in fake news. Recent studies show that it is possible to subtly act on the title, content, or source of the news to invert the result of a classifier: from true to false news, and vice versa [10].

ADVERSARIAL FAKE NEWS DETECTION

Learning algorithms have been adopted with the aim of detecting false news by, e.g., using textual features, such as the title and content of the article. It has also been shown that users' comments and replies can be valid features for unveiling textual content of low or no reputability [11].

Regrettably, these algorithms can be fooled. As an example, TextBugger [12] is a general attack framework for generating adversarial text that can trick sentiment analysis classifiers, such as the one by Zhang et al. [13], into erroneous classifications via marginal modifications of the text, such as adding or removing individual words, or even single characters. Moreover, a fake news classifier can be fooled not only by tampering with part of the news, but also by acting on comments and replies. Figure 1(d) exemplifies the attack: a detector correctly identifies a real article as indeed real. Unfortunately, by inserting a fake comment as part of its inputs, the same detector is misled into predicting the article as fake instead. Fooling fake news detectors via adversarial comment generation has been demonstrated feasible by Le et al. [14]. Leveraging the alteration of social responses, such as comments and replies, to fool the classifier prediction is advantageous because the attacker does not have to own the published piece (in order to be able to modify it after publication), and the passage from human-written to self-generated text is less susceptible to detection by the naked eye. In fact, comments and replies are usually accepted even if written in an informal style and with scarce quality.

Le et al.'s work [14] also shows how it is possible to generate adversarial comments of high quality and relevance to the original news, even at the level of whole sentences.
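To give an intuition of how marginal, TextBugger-style edits can flip a prediction, the toy sketch below tries single-character substitutions with visually similar symbols and keeps the first one that changes a black-box classifier's label. The `classify` callable and the substitution table are illustrative assumptions; TextBugger's actual algorithm [12] is considerably more refined.

```python
from typing import Callable

def single_char_attack(text: str, classify: Callable[[str], int]) -> str:
    """Toy black-box text attack: try one visually similar character swap
    at a time and return the first variant that flips the classifier's label."""
    swaps = {"o": "0", "l": "1", "a": "@", "e": "3", "i": "1"}
    original_label = classify(text)
    words = text.split()

    for idx, word in enumerate(words):
        for pos, ch in enumerate(word):
            if ch.lower() not in swaps:
                continue
            perturbed = word[:pos] + swaps[ch.lower()] + word[pos + 1:]
            candidate = " ".join(words[:idx] + [perturbed] + words[idx + 1:])
            if classify(candidate) != original_label:
                return candidate  # a single-character edit was enough
    return text  # no single edit flipped the label within this toy search

# Usage sketch: `classify` could wrap any sentiment or fake news detector
# that returns a 0/1 label; the attack succeeds if the returned text differs
# from the input while receiving a different label.
```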
Recent advances in text generation even make it possible to generate coherent paragraphs of text. This is the case, for example, of GPT-2 [15], a language model trained on a dataset of 8M web pages. Being trained on a myriad of different subjects, GPT-2 generates surprisingly high-quality texts, outperforming other language models with domain-specific training (e.g., on news or books). Furthermore, Mosallanezhad et al. [16] studied how to preserve a topic in synthetic news generation. Contrary to GPT-2, which selects the most probable word from the vocabulary as the next word to generate, their reinforcement learning agent tries to select words that optimize the match with a given topic.

Achievements in text generation have positive practical implications, such as, e.g., translation. Concerns, however, have arisen because malicious actors can exploit these generators to produce false news automatically. While most online disinformation today is manually written, as progress continues in natural language text generation, the creation of propaganda and realistic-looking hoaxes will grow at scale [17]. Zellers et al. [18], for example, presented Grover, a model for controllable text generation, with the aim of defending against fake news. Given a headline, Grover can generate the rest of the article, and vice versa. Interestingly, when the level of credibility is investigated, machine-generated articles with a propagandist tone turn out to be more credible to human readers than articles with the same tone written by humans. If, on the one hand, this shows how to exploit text generators to obtain "reliable fake news," on the other hand, it is a double-edged sword that also allows the reinforcement of the model. Quoting from Zellers et al.'s work [18]: "the best defense against Grover turns out to be Grover itself," as it is able to achieve 92% accuracy in discriminating between human-written and auto-generated texts. Grover is just one of several news generators that obtain noticeable results: a vast majority of the news generated by Grover and four other generators can fool human readers, as well as a neural network classifier specifically trained to detect fake news [16]. Texts generated by GPT-3 [19], the recent upgrade of GPT-2, achieve even more impressive results in resembling hand-written stories.

Finally, Miller et al. [20] consider the discrimination between true and fake news very challenging: it is enough, e.g., to change a verb from the positive to the negative form to completely change the meaning of a sentence. They therefore see the study of the news source as a possible way of combating this type of attack.
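The contrast drawn above, greedy "most probable next word" decoding versus reward-driven word selection, can be illustrated with a few lines of greedy decoding on top of a causal language model. The Hugging Face checkpoint name and the prompt are assumptions for illustration; a topic-preserving generator in the spirit of Mosallanezhad et al. [16] would replace the argmax with a learned, reward-driven choice.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")      # assumed checkpoint
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def greedy_continue(prompt: str, new_tokens: int = 30) -> str:
    """Greedy decoding: at each step append the single most probable token,
    mirroring the 'most probable next word' strategy described in the text."""
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        for _ in range(new_tokens):
            logits = model(ids).logits[:, -1, :]        # scores for the next token
            next_id = logits.argmax(dim=-1, keepdim=True)
            ids = torch.cat([ids, next_id], dim=-1)
    return tokenizer.decode(ids[0], skip_special_tokens=True)

# print(greedy_continue("Breaking news:"))
```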
ADVERSARIAL SOCIAL BOT DETECTION

The roots of adversarial bot detection date back to 2011 and almost coincide with the initial studies on bots themselves [8]. Between 2011 and 2013—that is, soon after the first efforts for detecting automated online accounts—several scholars became aware of the evolutionary nature of social bots. In fact, while the first social bots that inhabited our online ecosystems around 2010 were extremely simple and visibly untrustworthy accounts, those that emerged in subsequent years featured increased sophistication. This change was the result of the development efforts put in place by botmasters and puppeteers to create automated accounts capable of evading early detection techniques [21]. Comparative studies between the first bots and subsequent ones, such as the one by Yang et al. [22], unveiled the evolutionary nature of social bots and laid the foundations for adversarial bot detection. Notably, bot evolution still goes on, fueled by the latest advances in powerful computational techniques that allow mimicking human behavior better than ever before [23].

Based on these initial findings, since 2011 some early solutions were proposed for detecting evolving social bots [22]. These techniques, however, were still based on traditional approaches to the task of social bot detection, such as those based on general-purpose, supervised ML algorithms [8]. Regarding the methodological approach, the novelty of this body of work mainly revolved around the identification of those ML features that seemed capable of allowing the detection of the sophisticated bots. The test of time, however, proved such assumptions wrong. In fact, the features that initially seemed capable of identifying the sophisticated bots started yielding unsatisfactory performance soon after their proposal [21].

It was not until 2017 that adversarial social bot detection really ignited. Since then, several approaches have been proposed in rapid succession for testing the detection capabilities of existing bot detectors when faced with artfully created adversarial examples. Among the first adversarial examples of social bots, there were accounts that did not exist yet, but whose behaviors and characteristics were simulated, as done in Cresci et al.'s work [24], [25]. There, the authors used genetic algorithms to "optimize" the sequence of actions of groups of bots so that they could achieve their malicious goals while being largely misclassified as legitimate, human-operated accounts. Similarly, He et al. [26] trained a text-generation deep learning model based on latent user representations (i.e., embeddings) to create adversarial fake posts that would allow malicious users to escape Facebook's detector TIES [27]. Other adversarial social bot examples were accounts developed and operated ad hoc for the sake of evaluating the detection capabilities of existing bot detectors, as done in Grimme et al.'s work [28]. Experimentation with such examples helped scholars understand the weaknesses of existing bot detection systems, as a first step toward improving them. However, this early body of work on adversarial social bot examples still suffered from a major drawback: all such works adopted ad hoc solutions for generating artificial bots, thus lacking broad applicability. Indeed, some solutions were tailored to testing specific detectors [24], [25], while others relied on manual interventions, thus lacking scalability and generality [28].
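To give a flavor of the genetic-algorithm evasion idea described above, the sketch below evolves a bot's action sequence so that a surrogate detector's bot score drops while a simple proxy for the bot's goal (the share of spam-like actions) stays high. The action alphabet, fitness function, and operators are illustrative assumptions and do not reproduce the actual procedure of Cresci et al. [24], [25].

```python
import random
from typing import Callable, List

ACTIONS = ["tweet", "retweet", "reply", "pause"]  # assumed action alphabet

def evolve_bot_sequence(detector_score: Callable[[List[str]], float],
                        length: int = 50, pop_size: int = 40,
                        generations: int = 200) -> List[str]:
    """Toy genetic algorithm: evolve action sequences that keep many 'tweet'
    actions (the bot's goal) while minimizing a surrogate detector's bot score."""
    def fitness(seq: List[str]) -> float:
        goal = seq.count("tweet") / len(seq)   # proxy for malicious utility
        return goal - detector_score(seq)      # high goal, low detectability

    def mutate(seq: List[str]) -> List[str]:
        child = seq[:]
        child[random.randrange(len(child))] = random.choice(ACTIONS)
        return child

    def crossover(a: List[str], b: List[str]) -> List[str]:
        cut = random.randrange(1, len(a))
        return a[:cut] + b[cut:]

    population = [[random.choice(ACTIONS) for _ in range(length)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]   # keep the fittest half
        children = [mutate(crossover(random.choice(parents),
                                     random.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=fitness)

# Usage sketch: `detector_score` would wrap any bot classifier that maps an
# account's action sequence to a probability of being a bot.
```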
With the widespread recognition of AML as an extremely powerful learning paradigm also came new, state-of-the-art approaches for adversarial social bot detection. A paramount example of this spillover is the work proposed by Wu et al. [29]. There, the authors leverage a generative adversarial network (GAN) to artificially generate a large number of adversarial bot examples with which they train downstream bot detectors. Results demonstrated that this approach augments the training phase of the bot detector, thus significantly boosting its detection performance. Similarly, a GAN is also used by Zheng et al. [30] to generate latent representations of malicious users solely based on the representations of benign ones. The representations of the real benign users are leveraged in combination with the artificial representations of the malicious users to train a discriminator for distinguishing between benign and malicious users.
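As a hedged sketch of this GAN-based augmentation idea, the code below trains a generator to produce synthetic bot feature vectors and mixes them with real data when training a downstream detector. Network sizes, losses, and feature dimensionality are illustrative assumptions and do not reproduce the specific architectures of Wu et al. [29] or Zheng et al. [30].

```python
import torch
import torch.nn as nn

FEAT_DIM, NOISE_DIM = 32, 16  # assumed feature and noise dimensionality

generator = nn.Sequential(nn.Linear(NOISE_DIM, 64), nn.ReLU(), nn.Linear(64, FEAT_DIM))
critic = nn.Sequential(nn.Linear(FEAT_DIM, 64), nn.ReLU(), nn.Linear(64, 1))
detector = nn.Sequential(nn.Linear(FEAT_DIM, 64), nn.ReLU(), nn.Linear(64, 1))

bce = nn.BCEWithLogitsLoss()
g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
c_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(detector.parameters(), lr=1e-3)

def gan_step(real_bots: torch.Tensor) -> None:
    """One GAN step: the critic learns to separate real from synthetic bot
    features, and the generator learns to make its synthetic bots realistic."""
    n = real_bots.size(0)
    fake = generator(torch.randn(n, NOISE_DIM))

    c_loss = (bce(critic(real_bots), torch.ones(n, 1)) +
              bce(critic(fake.detach()), torch.zeros(n, 1)))
    c_opt.zero_grad()
    c_loss.backward()
    c_opt.step()

    g_loss = bce(critic(fake), torch.ones(n, 1))  # try to fool the critic
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()

def detector_step(humans: torch.Tensor, bots: torch.Tensor) -> None:
    """Augment the bot class with synthetic examples, then train the detector."""
    synthetic = generator(torch.randn(bots.size(0), NOISE_DIM)).detach()
    x = torch.cat([humans, bots, synthetic])
    y = torch.cat([torch.zeros(humans.size(0), 1),
                   torch.ones(bots.size(0) + synthetic.size(0), 1)])
    d_loss = bce(detector(x), y)
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()
```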
We concentrated on opportunities,” ACM Comput. Surv., vol. 53, no. 5, the recognition of false news and false accounts and Sep. 2020, Art. no. 109. we highlighted how, despite the antagonistic nature 10. B. D. Horne, J. Nùrregaard, and S. Adali, “Robust fake of the examples, scholars are moving proactively to news detection over time and attack,” ACM Trans. let attack patterns be curative and reinforce the Intell. Syst. Technol., vol. 11, no. 1, 2019, Art. no. 7. learning machines. Outside computer vision, these 11. K. Shu, L. Cui, S. Wang, D. Lee, and H. Liu, “dEFEND: efforts are still few and far between. Improvements Explainable fake news detection,” in Proc. ACM KDD, along this direction are thus much needed, especially 2019, pp. 395–405. in those domains that are naturally polluted by 12. J. Li, S. Ji, T. Du, B. Li, and T. Wang, “Textbugger: Generating adversaries. adversarial text against real-world applications,” in Proc. 26th Annu. Netw. Distrib. Syst. Security Symp., 2019. [Online]. Available: https://www.ndss-symposium.org/ ACKNOWLEDGEMENT ndss-paper/textbugger-generating-adversarial-text- The work of Marinella Petrocchi was supported by against-real-world-applications/ H2020 MEDINA under Grant 952633. The work of Angelo 13. X. Zhang, J. Zhao, and Y. LeCun, “Character-level Spognardi was supported in part by MIUR (Italian Minis- convolutional networks for text classification,” in Proc. try of Education, University, and Research) under Grant Neural Inf. Process. Syst., 2015, pp. 649–657. March/April 2022 IEEE Internet Computing 51
REFERENCES

1. T. Mitchell, Machine Learning. New York, NY, USA: McGraw-Hill, 1997.
2. I. J. Goodfellow, J. Shlens, and C. Szegedy, "Explaining and harnessing adversarial examples," in Proc. 3rd Int. Conf. Learn. Representations, 2015. [Online]. Available: http://arxiv.org/abs/1412.6572
3. K. Eykholt et al., "Robust physical-world attacks on deep learning visual classification," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2018, pp. 1625–1634.
4. T. B. Brown, D. Mané, A. Roy, M. Abadi, and J. Gilmer, "Adversarial patch," in Proc. Neural Inf. Process. Syst. Workshops, 2017. [Online]. Available: https://arxiv.org/abs/1712.09665
5. D. M. J. Lazer et al., "The science of fake news," Science, vol. 359, no. 6380, pp. 1094–1096, 2018.
6. T. Quandt, L. Frischlich, S. Boberg, and T. Schatto-Eckrodt, "Fake news," in The International Encyclopedia of Journalism Studies. Hoboken, NJ, USA: Wiley, 2019, pp. 1–6.
7. C. Gangware and W. Nemr, Weapons of Mass Distraction: Foreign State-Sponsored Disinformation in the Digital Age. Park Advisors, 2019. [Online]. Available: https://www.park-advisors.com/disinforeport
8. S. Cresci, "A decade of social bot detection," Commun. ACM, vol. 63, no. 10, pp. 72–83, 2020.
9. X. Zhou and R. Zafarani, "A survey of fake news: Fundamental theories, detection methods, and opportunities," ACM Comput. Surv., vol. 53, no. 5, Sep. 2020, Art. no. 109.
10. B. D. Horne, J. Nørregaard, and S. Adali, "Robust fake news detection over time and attack," ACM Trans. Intell. Syst. Technol., vol. 11, no. 1, 2019, Art. no. 7.
11. K. Shu, L. Cui, S. Wang, D. Lee, and H. Liu, "dEFEND: Explainable fake news detection," in Proc. ACM KDD, 2019, pp. 395–405.
12. J. Li, S. Ji, T. Du, B. Li, and T. Wang, "TextBugger: Generating adversarial text against real-world applications," in Proc. 26th Annu. Netw. Distrib. Syst. Security Symp., 2019. [Online]. Available: https://www.ndss-symposium.org/ndss-paper/textbugger-generating-adversarial-text-against-real-world-applications/
13. X. Zhang, J. Zhao, and Y. LeCun, "Character-level convolutional networks for text classification," in Proc. Neural Inf. Process. Syst., 2015, pp. 649–657.
14. T. Le, S. Wang, and D. Lee, "MALCOM: Generating malicious comments to attack neural fake news detection models," in Proc. IEEE Int. Conf. Data Mining, 2020, pp. 282–291.
15. A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever, "Language models are unsupervised multitask learners," OpenAI, vol. 1, no. 8, p. 9, 2019.
16. A. Mosallanezhad, K. Shu, and H. Liu, "Topic-preserving synthetic news generation: An adversarial deep reinforcement learning approach," 2020, arXiv:2010.16324.
17. G. Da San Martino et al., "A survey on computational propaganda detection," in Proc. Int. Joint Conf. Artif. Intell., 2020, pp. 4826–4832.
18. R. Zellers et al., "Defending against neural fake news," in Proc. Neural Inf. Process. Syst., 2019, pp. 9054–9065.
19. T. B. Brown et al., "Language models are few-shot learners," in Proc. Neural Inf. Process. Syst., 2020, pp. 1877–1901.
20. D. J. Miller, Z. Xiang, and G. Kesidis, "Adversarial learning targeting deep neural network classification: A comprehensive review of defenses against attacks," Proc. IEEE, vol. 108, no. 3, pp. 402–433, Mar. 2020.
21. S. Cresci, R. Di Pietro, M. Petrocchi, A. Spognardi, and M. Tesconi, "The paradigm-shift of social spambots: Evidence, theories, and tools for the arms race," in Proc. ACM WWW, 2017, pp. 963–972.
22. C. Yang, R. Harkreader, and G. Gu, "Empirical evaluation and new design for fighting evolving Twitter spammers," IEEE Trans. Inf. Forensics Secur., vol. 8, no. 8, pp. 1280–1293, Aug. 2013.
23. D. Boneh, A. J. Grotto, P. McDaniel, and N. Papernot, "How relevant is the Turing test in the age of sophisbots?" IEEE Secur. Privacy, vol. 17, no. 6, pp. 64–71, Nov./Dec. 2019.
24. S. Cresci, M. Petrocchi, A. Spognardi, and S. Tognazzi, "On the capability of evolved spambots to evade detection via genetic engineering," Online Social Netw. Media, vol. 9, pp. 1–16, 2019.
25. S. Cresci, M. Petrocchi, A. Spognardi, and S. Tognazzi, "Better safe than sorry: An adversarial approach to improve social bot detection," in Proc. ACM WebSci, 2019, pp. 47–56.
26. B. He, M. Ahamad, and S. Kumar, "PETGEN: Personalized text generation attack on deep sequence embedding-based classification models," in Proc. ACM KDD, 2021, pp. 575–584.
27. N. Noorshams, S. Verma, and A. Hofleitner, "TIES: Temporal interaction embeddings for enhancing social media integrity at Facebook," in Proc. ACM KDD, 2020, pp. 3128–3135.
28. C. Grimme, M. Preuss, L. Adam, and H. Trautmann, "Social bots: Human-like by means of human control?" Big Data, vol. 5, no. 4, pp. 279–293, 2017.
29. B. Wu, L. Liu, Y. Yang, K. Zheng, and X. Wang, "Using improved conditional generative adversarial networks to detect social bots on Twitter," IEEE Access, vol. 8, pp. 36664–36680, 2020.
30. P. Zheng, S. Yuan, X. Wu, J. Li, and A. Lu, "One-class adversarial nets for fraud detection," in Proc. AAAI Conf. Artif. Intell., 2019, pp. 1286–1293.
STEFANO CRESCI is currently a researcher at IIT-CNR, 56124, Pisa, Italy. His research interests broadly fall at the intersection of web science and data science, with a focus on information disorder and online misbehavior, social media analysis, and crisis informatics. Stefano received his Ph.D. degree in information engineering from the University of Pisa, Pisa, Italy. He is a Member of IEEE and ACM. Contact him at s.cresci@iit.cnr.it.

MARINELLA PETROCCHI is currently a senior researcher at IIT-CNR, 56124, Pisa, Italy, and a guest scholar at IMT Scuola Alti Studi Lucca, 55100, Lucca, Italy. Her research focuses on the detection of fake news. She is the Work Package (WP) leader of H2020 MEDINA and one of the principal investigators of the Integrated Activity Project TOFFEe (TOols for Fighting FakEs). Contact her at m.petrocchi@iit.cnr.it.

ANGELO SPOGNARDI is currently an associate professor with the Computer Science Department, Sapienza Università di Roma, 00161, Rome, Italy. His main research interests include social networks modeling and analysis, and network protocol security and privacy. Contact him at spognardi@di.uniroma1.it.

STEFANO TOGNAZZI is currently a postdoc at the Centre for the Advanced Study of Collective Behaviour, University of Konstanz, 78464, Konstanz, Germany, working on formal modeling of collective behavior. Tognazzi received his Ph.D. degree in computer science and system engineering from the IMT School for Advanced Studies Lucca. Contact him at stefano.tognazzi@uni-konstanz.de.