COMPRESSIVE ANALYSIS AND THE FUTURE OF PRIVACY
A Preprint

Suyash Shandilya
Defence Research and Development Organisation
New Delhi - 110054
suyash@ece.ism.ac.in

arXiv:2006.03835v1 [cs.CY] 6 Jun 2020

June 9, 2020

Abstract

Compressive analysis is the name given to the family of techniques that map raw data to a smaller representation. Broadly, this includes data compression, data encoding, data encryption, and hashing. In this paper, we analyse the prospects of such technologies in realising customisable individual privacy. We set out the dire need to establish privacy preserving frameworks and policies, and how individuals can achieve a trade-off between the comfort of an intuitive digital service ensemble and their privacy. We examine the technologies currently being implemented, and suggest the crucial advantages of compressive analysis.

Keywords Privacy · Data crumbling · compressive analysis · hashing · encryption · security

1 Introduction

As more and more technologies grow around models that require large amounts of minute data, at some point we will need to step back and evaluate their repercussions on individual privacy.

All of us are secure or none of us are - Edward Snowden.

Even if only a sufficiently large subset of the complex mesh of our social data is revealed, it compromises the privacy even of those who were never quite a part of it. We need to start developing frameworks for a system in which there are limits to what an analytic software can and cannot know, and in which each individual has a throttle over their own data. Simply ensuring secure and trustworthy systems won't solve the impending problem at large. In the following sections we re-evaluate the exigency of privacy preserving protocols. Key areas which are either booming or on the verge of a technological revolution, and which also need special privacy attention, are then identified. Following that, current solutions to the issue are discussed. Towards the end we state the significance of compressive analysis as a holistic solution to the privacy problem.

2 Pivotal trends and their need for privacy

Comfort is the compass for consumer technology. This includes the convenience of instructing the system (parsing natural language), easier interfaces, bundled service suites, intuitive suggestions, personalised automations, ad infinitum. Additionally, comfort can also be defined as a peaceful state of mind: a state where a person is not agitated by prying advertisements or obsequious suggestions; where they can enjoy the repose of a private space, knowing well enough that their personal information is secure and entirely under their control - the leisure of privacy. Here we consider four major fields which will (continue to) grow at an accelerated pace in the coming times.

2.1 Era of Data Analytics

There is no denying the rise of the data analytics industry over the past couple of decades, and this trend is far from slowing down. Analytics helps in understanding user preferences, perceptions, etc. More specifically, it is about quantitatively defining a user, whether an individual or a firm: knowing their daily schedule, monitoring the
quantity and nature of their daily subscription services like mail, contacts, etc. In the personal domain, it even includes logging health data, finances, IoT device logs, and the like. Typically, a user can benefit a lot from this data: they can improve their spending, health, work efficiency, and what not. The technology is now mature enough to predict your future 'footprints', in terms of decisions and resource consumption, to a high degree of accuracy. Users can use these data to make more informed, statistically wiser decisions. But this wisdom comes as a compromise of privacy. The user has to allow the services to access their raw data to benefit from the analysis. The service may assure non-distribution of access to other parties, but that only limits who else sees the data, not what the service itself collects and infers.

2.2 Human-Computer Interaction

Human-Computer Interaction (HCI) has always been the face of futuristic technologies. It covers all domains and aspects of how a (human) user interacts with a computer system: common input devices like the mouse and keyboard, more advanced modules like motion capture and Brain-Computer Interfaces (BCI), and even completely non-physical interfaces like an interactive AI assistant. They all intend to make your interaction with the computer easier, and in the process learn the fine intricacies of your commands - your voice, your way of speech, your physical gestures, your facial expressions, etc. Their efficacy as interfaces hinges on the data they retain about you and their ability to extract relevant features. Given that these human actions undergo subtle changes with time (and across users in a multi-user scenario), it is difficult to ask such a system to stop logging data without compromising efficiency. This makes HCI evolution particularly fraught with privacy concerns. Huge logs of raw BCI data mean that the computer would know the thought-action sequence of a user more objectively than the user themselves. Unfortunately, there has been very limited research concerning privacy preserving HCI methods.

2.3 Emotional Intelligence

As of today, we have an outlandish and ever increasing amount of computation power; any feasible problem which has a procedural solution can be solved. For most of our common problems, the IQ of our current systems is quite satisfactory. Construing human emotions, however, has been an elusive task. To break into the rooms of every bloke in the world, our systems will need to connect with us at a less logical and more emotional level. The primary bottleneck has been the lack of rigorous definitions of emotional boundaries. The scientific understanding, detection, and classification of 'emotion' or 'emotional response' has largely been based on physiological signals like ECG [1], EEG [2], breathing rate [3], etc. Typically one could not access these signals in a mundane scenario without bundling the subject with multiple probes. However, it is demonstrated in [3] that such inferences are possible wirelessly, with remarkable accuracy. Wireless sensing [4] has enabled a completely unintrusive sensing scheme, which also leads to fewer errors as the subjects don't feel the physical agitation and the mental sense of being under observation. While beneficial from a health perspective, such technology is a serious threat to privacy: it can be very conveniently deployed to read any unaware individual's precise breathing and pulse rates.
Humans normally don't construe emotions by reading each other's breathing rate or heartbeat; we simply rely on our empirical understanding of facial expressions, even though these are less objective than physiological signals. But implementing such a system in the public domain would be a paragon of privacy violation. Nevertheless, emotional intelligence is an imperative course for the future. Society still has a long way to go towards mainstreaming mental health issues, but things are on a favourable course. There are many projects aimed at examining a subject's emotions and providing tailored therapeutic treatment. [5] addresses the privacy problem using a multiparty computation model, and [6] makes an attempt at reading emotions via a system of individual participation. Other solutions to the problem are discussed in Section 4.2.

2.4 Internet of Things

The Internet of Things (IoT) is another major technology that has already started manifesting in our lives. It consists of all the hardware and software in an individual's ambience collaborating towards an optimised routine. Despite being industrialised at a fast pace, it remains a popular target not just for privacy violations, but for security violations as well [7]. The sheer number of these primitive yet garrulous IoT devices makes security guarantees difficult to enforce. Nevertheless, there have been remarkable developments in the field of IoT security and privacy [8]. Notwithstanding the many roadblocks to achieving privacy preservation, trust in a secure system can ensure that the data being collected by the system is not shared unsolicitedly with third parties for profit. In fact, there are good reasons for secure centralised data collection, such as healthcare and policy planning. Additionally, most 'intelligent' systems cannot afford to run locally (i.e. as completely standalone, non-communicating machines): the manufacturers require data from vast sources to develop more robustly responsive systems, and it is seemingly impossible to develop better systems without regular feedback from our existing ones. Having said that, centralised data collection by any organisation - no matter how secure or trustworthy - must be impeded. There are many fictional
dystopian descriptions of societies where certain organisations controlled the micromovements of their subjects by virtue of large-scale data access. Although exaggerated and dramatised, these scenarios provide reference marks for policy makers and technology designers to consider while developing modern systems.

3 State of the art solutions

Security is the parent condition for privacy: there can be no question of privacy if there is no security. Before discussing the current solutions, one should be acquainted with the idea of differential privacy. It is a privacy guarantee which requires that computations be formally indistinguishable when run with and without any one record, almost as if each participant had opted out of the data set [9]. Implementations achieve this by adding a calculated amount of random noise, either to the input data or to the inference accessed from the database [10] (a minimal sketch appears at the end of this section).

The idea of homomorphic encryption is to "preserve the shape" of the plaintext in the ciphertext. This means that the ciphertext can be operated on (with limited operators like addition and multiplication [11]) to form inferences, without the need for decryption (also sketched at the end of this section). This property makes it perfect for privacy preserving applications, and it has rightly found many applications in e-voting [12], biometric and medical data processing [5, 13], recommender systems [14, 15], and multiparty computation systems [16, 6]. Interestingly, it is a lattice-based cryptography technique [17], which places it among the forerunners for post-quantum cryptography algorithms [18]. Thus, along with privacy preservation, it is well suited as an advanced security solution as well. A similar approach to securing data is conceptualised in CryptDB [19], where the authors devise a system in which not only the database but also the queries and the results are encrypted.

Decentralisation as a solution is about as globally accepted as the privacy preservation problem itself. By far its most popular implementation is cryptocurrency: what had been centralised for centuries was conceived and implemented as a decentralised, abstract entity. This has often been purported to be so extreme a solution to privacy that it disturbed the authorities, who see the resulting lack of control as another serious problem.

Despite being the most notoriously data-hungry of all modern machine learning models, neural networks are still the rage in industry. Their incomparable dexterity in establishing landmarks in the field of machine learning ensures that the trend won't recede anytime soon. Nevertheless, their large appetite for data raises serious privacy concerns. Split learning [20] and federated learning [21] have emerged as the most popular solutions for privacy preserving learning; both achieve this by implementing standard learning in decentralised models. [22] provides an excellent comparison between the two models of learning.
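To make the noise calibration concrete, the following is a minimal sketch of the Laplace mechanism of [10]; the toy dataset, the query, and the parameter values are our own illustrative assumptions, not drawn from the cited works.

```python
import numpy as np

rng = np.random.default_rng(0)

def laplace_mechanism(true_answer, sensitivity, epsilon):
    # Release the answer with epsilon-differential privacy by adding
    # Laplace noise whose scale is calibrated to the query's sensitivity [10].
    return true_answer + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# A counting query ("how many records satisfy P?") has sensitivity 1:
# adding or removing any one participant changes the true answer by at most 1.
ages = [34, 29, 41, 52, 38]              # toy records
true_count = sum(a > 35 for a in ages)   # = 3
print(laplace_mechanism(true_count, sensitivity=1, epsilon=0.5))
```

A smaller epsilon means more noise and a stronger guarantee; this is exactly the kind of privacy/utility dial that the rest of this paper argues should be handed to the individual.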
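The "shape-preserving" property of homomorphic encryption can likewise be demonstrated with a toy implementation of the additively homomorphic Paillier scheme; this sketch uses tiny, insecure demonstration primes and merely stands in for the far more capable schemes cited above [11].

```python
import math, random

# Tiny demonstration primes -- nowhere near secure; real keys use ~1024-bit primes.
p, q = 293, 433
n = p * q
n2 = n * n
lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
g = n + 1                                          # standard generator choice
mu = pow(lam, -1, n)     # modular inverse (Python 3.8+); valid since gcd(lam, n) = 1

def L(u):
    return (u - 1) // n

def encrypt(m):
    r = random.randrange(2, n)                     # fresh randomness per ciphertext
    while math.gcd(r, n) != 1:
        r = random.randrange(2, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return (L(pow(c, lam, n2)) * mu) % n

# The homomorphic property: multiplying ciphertexts adds the plaintexts,
# so a server can compute on data it cannot read.
c1, c2 = encrypt(20), encrypt(22)
assert decrypt((c1 * c2) % n2) == 42
```

Because multiplying ciphertexts adds plaintexts, a server can, for instance, tally encrypted votes [12] or aggregate encrypted readings without ever decrypting an individual record.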
Despite all the technical support the industry might extend, privacy remains fundamentally a policy level problem. Digital security solutions are not the immediate concern of our time, as current systems have been working satisfactorily well and any breach can be quickly located and patched. But if a secure system is asked to share the information it houses - through proper channel and authentication - it will comply. It is up to the service agreement between individuals and their service providers to specify what data will be shared, with whom, and when. Unfortunately, due to lack of awareness, people don't heed the service agreement much and simply consent to whatever is being offered. Often there aren't many choices provided by the service provider, and even for a genuinely aware user, it is not easy to parse the legalese of each agreement they sign with each of their service providers. Projects like the Usable Privacy project [23] help users parse their agreements more conveniently, but stronger policy level enforcement will still be required if the problem is to be addressed globally.

4 Compressive analysis

Compressive analysis is a way to draw relevant inferences from compressed data, without decompressing it. The compressed data is also encrypted in some way. Typically, neither decryption nor decompression of any kind is expected, or even made possible. This means that a lot of data is already lost in the process of storage or acquisition (more on this later). Ideally, one can expect that only the parts of the data required for drawing relevant inferences are retained in the compression. This can be achieved by training the compression algorithm on large amounts of dummy data which contain the raw information along with the sensitive labels. We can then attempt simultaneous analysis and private-information extraction on the compressed data. These will work as adversarial networks: the analyser function will try to minimise the number of components required, while the devious function will attempt to extract private information from the retained components (a sketch of this setup follows below). Additionally, this gives the user a throttle to determine their own degree of privacy against different analyser software. It is also particularly useful where expurgation cannot be a privacy solution. The challenge will be to guarantee differential privacy against auxiliary information which is available to the attacker via ulterior sources [9].
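A minimal sketch of that adversarial setup, assuming a PyTorch implementation; the dimensions, module names, and trade-off weight lam are invented for illustration, and only the analyser/devious roles come from the description above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

dim_x, dim_z = 128, 8                   # compress 128 raw features to 8 components
encoder  = nn.Linear(dim_x, dim_z)      # the learned compression
analyser = nn.Linear(dim_z, 2)          # draws the useful inference
devious  = nn.Linear(dim_z, 2)          # tries to recover the sensitive label

opt_main = torch.optim.Adam(
    list(encoder.parameters()) + list(analyser.parameters()), lr=1e-3)
opt_adv = torch.optim.Adam(devious.parameters(), lr=1e-3)
ce = nn.CrossEntropyLoss()

def train_step(x, y_task, y_sensitive, lam=1.0):
    # Step 1: the devious function trains to extract the sensitive label
    # from the compressed representation (encoder held fixed via detach).
    z = encoder(x).detach()
    opt_adv.zero_grad()
    ce(devious(z), y_sensitive).backward()
    opt_adv.step()
    # Step 2: encoder and analyser train for the task while *increasing*
    # the devious function's loss; lam is the user's privacy throttle.
    z = encoder(x)
    loss = ce(analyser(z), y_task) - lam * ce(devious(z), y_sensitive)
    opt_main.zero_grad()
    loss.backward()
    opt_main.step()

# Dummy batch: 32 samples, each with a task label and a sensitive label.
x = torch.randn(32, dim_x)
train_step(x, torch.randint(2, (32,)), torch.randint(2, (32,)))
```

The flipped sign on the devious loss in step 2 pushes the retained components towards usefulness for the analyser and uselessness for the attacker, with lam acting as the user's privacy throttle.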
4.1 Hash analysis

Hashing is a tool which takes in a large chunk of data and returns a small digest of fixed bit length. The function is one-way, so it is infeasible to reconstruct the original data from the hash. It is designed to be incoherent enough that different data have different hashes, and hash collisions are extremely rare. The most commonly known hashes, like SHA-256 and the SHA-3 family, are cryptographic hashes. They exhibit the avalanche effect: changing any single bit of the input results in a large change in the digest. Perceptual hashes [24], however, are commonly used in industry to classify similar (same, but transformed in some way) images together. They are used to detect duplicates of copyrighted images, and often for fast image searches as well (a toy perceptual hash is sketched at the end of this section). These perceptual hashes can also be used for specific feature detection when the original data source does not want its data to be seen. However, this method requires that the data is read during the hashing process. The following is a solution which ensures that the data is never stored digitally, starting from the acquisition stage.

4.2 Compressive Acquisition

Compressive sensing is a relatively young technology which posits that accurate reconstruction (or erroneous reconstruction, where the error is limited by the level of noise in sampling) is possible via random linear compressed samples [25]. Interested readers may refer to a well summarised yet basic introduction [26]. This reconstitutes compression as an embedded acquisition step instead of a post-processing step. It has found some applications in the privacy preservation domain as well. One of the earliest works I encountered was [27], where the authors use compressed sensing as a matrix masking [28] technique and prove theoretically that regression analysis can be deployed on a transformed dataset as effectively as on its raw form. In a sense, this is similar to a homomorphic transformation. Related works can be found in [29, 30].

The theoretical foundations of compressive sensing were first laid out by Donoho [31] and Candes and Tao [32], but the most popular practical demonstration was given by Duarte et al. in [33]: a single-pixel camera which acquires a few random compressed samples to reconstruct images. In [34], the concept of smashed filters is introduced, showing that 'target classification' can be achieved using a maximum likelihood estimator on the compressed samples themselves. These ideas and subsequent demonstrations give rise to the pertinent concept of compressive acquisition. If the process of acquisition itself can be designed to be random and lossy, while nevertheless allowing for reasonable reconstruction, much of the sensitive data will never even be digitised, let alone stolen. The first privacy oriented demonstration of this concept was on print error detection in [35]. It is shown that even when the degree of compression is less than 1%, very high classification accuracy can be achieved. Further, even if the decryption key (the sensing matrix, in this case) is known and is used to reconstruct the image via the best possible algorithm, the result is practically unusable. Such techniques can in general be deployed in various systems; for example, a set of Monte Carlo linear measurements (in the spirit of compressive sensing) from a wireless sensor network can be used instead of continuous values to ensure privacy preservation.
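Returning to the perceptual hashes of Section 4.1, here is a toy 'average hash', one of the simplest perceptual hashes; the block-mean downsampling, image sizes, and test images are illustrative assumptions rather than the specific methods surveyed in [24].

```python
import numpy as np

def average_hash(gray, hash_size=8):
    # Block-mean downsample to hash_size x hash_size, then threshold each
    # block against the global mean: 64 bits that survive small perturbations.
    h, w = gray.shape
    gray = gray[:h - h % hash_size, :w - w % hash_size].astype(float)
    blocks = gray.reshape(hash_size, gray.shape[0] // hash_size,
                          hash_size, gray.shape[1] // hash_size).mean(axis=(1, 3))
    return (blocks > blocks.mean()).flatten()

def hamming(h1, h2):
    return int(np.count_nonzero(h1 != h2))   # small distance => 'same' image

rng = np.random.default_rng(0)
img = np.outer(np.arange(64.0), np.arange(64.0)) / 64   # synthetic test image
noisy = img + rng.normal(0, 1, img.shape)               # mildly distorted copy
other = rng.uniform(0, 63, img.shape)                   # unrelated image
print(hamming(average_hash(img), average_hash(noisy)))  # small (typically near 0)
print(hamming(average_hash(img), average_hash(other)))  # large (around half of 64)
```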
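And a sketch of classification performed directly on compressed acquisitions, in the spirit of the smashed filter [34]; the templates, dimensions, and noise level are invented for illustration. The point is that the full-resolution signal is never recorded, yet matching against templates projected through the same sensing matrix still classifies correctly.

```python
import numpy as np

rng = np.random.default_rng(7)
N, M = 1024, 32                              # only ~3% as many measurements as samples
Phi = rng.normal(size=(M, N)) / np.sqrt(M)   # random sensing matrix, fixed per device

# Hypothetical class templates (e.g., 'nominal print' vs 'print error' patterns).
t = np.linspace(0.0, 1.0, N)
templates = [np.sin(2 * np.pi * 5 * t),
             np.sign(np.sin(2 * np.pi * 5 * t))]

def classify(y):
    # Smashed-filter idea [34]: match the measurements against the templates
    # projected through the same Phi; no full-resolution signal is ever needed.
    return int(np.argmin([np.linalg.norm(y - Phi @ s) for s in templates]))

x = templates[1] + 0.1 * rng.normal(size=N)  # a noisy instance of class 1
y = Phi @ x                                  # all that is ever acquired or stored
print(classify(y))                           # -> 1
```

With M/N around 3% and no strong sparsity prior, recovering x from y is practically hopeless, which is precisely the privacy property exploited in [35].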
Thus, among all the possible solutions discussed in this section, compressive acquisition is the most promising privacy preserving approach, as it eliminates the existence of private information in a dataset altogether.

5 Conclusion

We began by acknowledging the dire need for privacy in current times. We noted four major fields that are going to grow at an accelerated pace and will have serious privacy concerns, namely: data analytics, Human-Computer Interaction, emotion recognition, and the Internet of Things. We then looked at some of the common approaches currently being deployed to combat the privacy problem and noted that most of them are centred around security and decentralisation. We then introduced compressive analysis, the family of approaches which attempt to remove all privacy sensitive information from a dataset before storing it for analytics. Among these were hashes, and my novel idea of compressive acquisition, which is based on the principle of compressive sensing and aims to acquire data in a randomised, linearly compressed manner so that no meaningful reconstruction is possible while satisfactory analytical results are still guaranteed. We concluded that compressive acquisition, whenever applicable, is the most effective solution for assuring privacy.

References

[1] F. Agrafioti, D. Hatzinakos, and A. K. Anderson. ECG pattern analysis for emotion detection. IEEE Transactions on Affective Computing, 3(1):102–115, Jan 2012.
[2] Panagiotis C. Petrantonakis and Leontios J. Hadjileontiadis. Emotion recognition from EEG using higher order crossings. IEEE Transactions on Information Technology in Biomedicine, 14(2):186–197, 2009.

[3] Mingmin Zhao, Fadel Adib, and Dina Katabi. Emotion recognition using wireless signals. In Proceedings of the 22nd Annual International Conference on Mobile Computing and Networking, pages 95–108. ACM, 2016.

[4] Axel Scherer, Muhammad Mujeeb-U-Rahman, Meisam Honavar Nazari, and Muhammad Musab Jilani. Minimally invasive wireless sensing devices and methods, January 8 2019. US Patent 10,172,520.

[5] Zekeriya Erkin, Martin Franz, Jorge Guajardo, Stefan Katzenbeisser, Inald Lagendijk, and Tomas Toft. Privacy-preserving face recognition. In Ian Goldberg and Mikhail J. Atallah, editors, Privacy Enhancing Technologies, pages 235–253, Berlin, Heidelberg, 2009. Springer Berlin Heidelberg.

[6] Zeki Erkin, Jie Li, Arnold P. O. S. Vermeeren, and Huib de Ridder. Privacy-preserving emotion detection for crowd management. In International Conference on Active Media Technology, pages 359–370. Springer, 2014.

[7] Manos Antonakakis, Tim April, Michael Bailey, Matt Bernhard, Elie Bursztein, Jaime Cochran, Zakir Durumeric, J. Alex Halderman, Luca Invernizzi, Michalis Kallitsis, et al. Understanding the Mirai botnet. In 26th USENIX Security Symposium (USENIX Security 17), pages 1093–1110, 2017.

[8] Kevin Kiningham, Mark Horowitz, Philip Levis, and Dan Boneh. CESEL: Securing a mote for 20 years. In Proceedings of the 2016 International Conference on Embedded Wireless Systems and Networks, EWSN '16, pages 307–312, USA, 2016. Junction Publishing.

[9] Cynthia Dwork. Differential privacy. Automata, Languages and Programming, pages 1–12, 2006.

[10] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In Shai Halevi and Tal Rabin, editors, Theory of Cryptography, pages 265–284, Berlin, Heidelberg, 2006. Springer Berlin Heidelberg.

[11] Craig Gentry. A fully homomorphic encryption scheme. PhD thesis, Stanford University, 2009. crypto.stanford.edu/craig.

[12] Martin Hirt. Receipt-free k-out-of-l voting based on ElGamal encryption. In Towards Trustworthy Elections, pages 64–82. Springer, 2010.

[13] Mauro Barni, Pierluigi Failla, Riccardo Lazzeretti, Ahmad-Reza Sadeghi, and Thomas Schneider. Privacy-preserving ECG classification with branching programs and neural networks. IEEE Transactions on Information Forensics and Security, 6(2):452–468, 2011.

[14] Z. Erkin, T. Veugen, T. Toft, and R. L. Lagendijk. Generating private recommendations efficiently using homomorphic encryption and data packing. IEEE Transactions on Information Forensics and Security, 7(3):1053–1066, June 2012.

[15] Arjan J. P. Jeckmans, Michael Beye, Zekeriya Erkin, Pieter Hartel, Reginald L. Lagendijk, and Qiang Tang. Privacy in Recommender Systems, pages 263–281. Springer London, London, 2013.

[16] R. L. Lagendijk, Z. Erkin, and M. Barni. Encrypted signal processing for privacy protection: Conveying the utility of homomorphic encryption and multiparty computation. IEEE Signal Processing Magazine, 30(1):82–105, Jan 2013.

[17] Oded Regev. Lattice-based cryptography. In Annual International Cryptology Conference, pages 131–141. Springer, 2006.

[18] Daniel J. Bernstein. Introduction to post-quantum cryptography. In Post-Quantum Cryptography, pages 1–14. Springer, 2009.

[19] Raluca Ada Popa, Catherine Redfield, Nickolai Zeldovich, and Hari Balakrishnan. CryptDB: protecting confidentiality with encrypted query processing. In Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles, pages 85–100. ACM, 2011.

[20] Praneeth Vepakomma, Otkrist Gupta, Tristan Swedish, and Ramesh Raskar. Split learning for health: Distributed deep learning without sharing raw patient data. arXiv preprint arXiv:1812.00564, 2018.

[21] Jakub Konečný, H. Brendan McMahan, Felix X. Yu, Peter Richtárik, Ananda Theertha Suresh, and Dave Bacon. Federated learning: Strategies for improving communication efficiency. arXiv preprint arXiv:1610.05492, 2016.

[22] Abhishek Singh, Praneeth Vepakomma, Otkrist Gupta, and Ramesh Raskar. Detailed comparison of communication efficiency of split learning and federated learning, 2019.

[23] Abhilasha Ravichander, Alan W. Black, Shomir Wilson, Thomas Norton, and Norman Sadeh. Question answering for privacy policies: Combining computational and legal perspectives. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 4946–4957, Hong Kong, China, November 2019. Association for Computational Linguistics.

[24] Xia-mu Niu and Yu-hua Jiao. An overview of perceptual hashing. Acta Electronica Sinica, 36(7):1405–1411, 2008.

[25] Richard G. Baraniuk. Compressive sensing. IEEE Signal Processing Magazine, 24(4), 2007.

[26] E. J. Candes and M. B. Wakin. An introduction to compressive sampling. IEEE Signal Processing Magazine, 25(2):21–30, March 2008.

[27] Shuheng Zhou, John Lafferty, and Larry Wasserman. Compressed and privacy-sensitive sparse regression. IEEE Transactions on Information Theory, 55(2):846–866, 2009.

[28] George T. Duncan, Robert W. Pearson, et al. Enhancing access to microdata while protecting confidentiality: Prospects for the future. Statistical Science, 6(3):219–232, 1991.

[29] C. Wang, B. Zhang, K. Ren, and J. M. Roveda. Privacy-assured outsourcing of image reconstruction service in cloud. IEEE Transactions on Emerging Topics in Computing, 1(1):166–177, June 2013.

[30] Cai Chen, Manyuan Zhang, Huanzhi Zhang, Zhenyun Huang, and Yong Li. Privacy-preserving sensory data recovery. In 2018 17th IEEE International Conference on Trust, Security and Privacy in Computing and Communications / 12th IEEE International Conference on Big Data Science and Engineering (TrustCom/BigDataSE), pages 1646–1650. IEEE, 2018.

[31] D. L. Donoho. Compressed sensing. IEEE Transactions on Information Theory, 52(4):1289–1306, April 2006.

[32] E. J. Candes, J. Romberg, and T. Tao. Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Transactions on Information Theory, 52(2):489–509, Feb 2006.

[33] Marco F. Duarte, Mark A. Davenport, Dharmpal Takhar, Jason N. Laska, Ting Sun, Kevin F. Kelly, and Richard G. Baraniuk. Single-pixel imaging via compressive sampling. IEEE Signal Processing Magazine, 25(2):83–91, 2008.

[34] Mark A. Davenport, Marco Duarte, Michael B. Wakin, Jason N. Laska, Dharmpal Takhar, Kevin Kelly, and Richard Baraniuk. The smashed filter for compressive classification and target recognition, art. no. 64980H. Proceedings of SPIE, 6498, February 2007.

[35] Suyash Shandilya. Minimizing acquisition maximizing inference - a demonstration on print error detection. Springer, 2019.