Twitter Sentiment Analysis of the Indian Union Budget 2020
←
→
Page content transcription
If your browser does not render page correctly, please read the page content below
International Journal of Advanced Science and Technology Vol. 29, No. 4s, (2020), pp. 2282 - 2288 Twitter Sentiment Analysis of the Indian Union Budget 2020 Rupinder Kaur1, Rajvir Kaur2, Manpreet Singh3, Dr. Sandeep Ranjan4* 1,2,3 M.Tech Scholar, GNA University, Phagwara (India) 4 Assistant Professor, GNA University, Phagwara (India) Abstract The presented study conducts a real-time sentiment analysis of the public reaction towards the announcement of the Indian Union Budget 2020. On social media platforms, the general public vents their opinions regarding popular issues. A total of 6000 tweets were mined to gauge the quick response of the public sentiment about the union budget which was presented in the Indian Parliament on February 1, 2020, at 11 am. The instantaneous reaction of the general public in the form of tweets was mined during the budget announcement hours. A sentiment analysis based model was used to analyze the dataset. Subjectivity and polarity values obtained for the dataset were used to calculate the sentiment for individual tweets. The experiment produced an overall positive score of +149.3387 based on the Twitter user’s collective opinions. Keywords: Instant reaction, Sentiment Analysis, TextBlob, Twitter, Union budget. I. INTRODUCTION Events like general elections, movie releases, annual budget announcement, and sports tournaments attract the attention of the general public. With the advancement in Internet and telecommunication technologies, social media has emerged as a popular platform for conversations [1]. Twitter is one such platform where users express their opinions about topics and issues of concern. India is the second-most populous country in the world with a population of about 1375 million [2] and is the 8th leading country based on the number of active Twitter users. The number of active Twitter users in India is approximately 34 million [3]. Twitter can be exploited to analyze the instant reaction of the general public about important events. The present experiment studies a dataset of tweets related to the Indian Union Budget 2020 presented by the Honorable finance minister of India, Ms. Nirmala Sitharaman on 1st February 2020, at 11 AM IST. The general public is very curious about the budget as it strongly affects every aspect of life including the financial sector, health, education, and travel, etc. This paper contains instantaneous reactions of the public towards the budget as expressed on Twitter. In this experiment, a total of 6000 tweets posted during the announcement hours of the union budget were mined from Twitter. For this purpose, the Twitter Application Program Interface (API) was used for fetching tweets using hashtag “Budget2020”. To remove noise and other anomalies from data, text processing was performed which includes removing non-relevant words and unwanted blank spaces. Thereafter, sentiment analysis was done on the classification of these tweets. Sentiment analysis techniques enable us to make sense of data present in social media for understanding social or political events, movie releasing or product marketing and to make more informed decisions [4]. To illustrate the sentiment analysis, two parameters are used; subjectivity and polarity [5]. As a result of which it can be concluded that whether the response is positive, neutral or negative. The summation of the sentiment analysis score for the entire dataset represents the collective opinion of Twitter users about the union budget. II. LITERATURE REVIEW In the past few years, a lot of work has been done on sentiment analysis. Sentiment analysis of tweets related to the Indian Union Budget 2016 was performed [6]. Lexicon-based sentiment analysis along with a dictionary of positive and negative words was used. The researchers worked on the co-occurrence of words in the dataset. Taxation related terms had the highest frequency which reflects the opinions of the general public. It was concluded that youth and the rural population had a little contribution to the conversations on Twitter. Public opinion on large hydro projects was studied using sentiment analysis [7]. A lexicon-based sentiment analysis model was proposed by the researchers. To show the efficiency of the model, the Three Gorges Project (a ISSN: 2005-4238 IJAST 2282 Copyright ⓒ 2020 SERSC
International Journal of Advanced Science and Technology Vol. 29, No. 4s, (2020), pp. 2282 - 2288 large hydro project) was used as a case study that assisted leaders in decision making related to the project. With the help of the model, users' opinions were mined from social network sites and the sentiment analysis showed that about 50% of the responses contained negative sentiment towards the project and the remaining messages were either neutral or positive. The public response to the demonetization policy of the Indian Government to ban INR 500 and 1000 currency notes with effect from 8th Nov 2016 was studied by analyzing tweets [8]. The experiment concluded that people from 21 out of 30 states and union territories of India considered in the study showed support to this decision of the union government. Social media user feedback can be used by governments in public policymaking [9]. The feedback of the general public was seen against the government public policy using sentiment analysis. The research studied Indonesian language tweets about public policy criticism. The European Union (EU) Cohesion Policy in the mass media was analyzed by applying Computational Text Analysis using three media sources: user-generated content, news media and social media [10]. Sentiment Analysis of news media revealed that international media publishes a higher negative sentiment compared to the EU media. User-generated comments from the UK showed a much higher bias towards negative sentiment while in Spain it is mainly neutral or positive. Sentiment analysis of social media concluded that the majority of tweets and posts mostly were expressed as objective statements and are largely neutral. Sentiment analysis was performed to study international negotiations (Brexit negotiations) to contribute towards government decision making [11],[12]. The results of the studies provided real-time input for sensitive policymaking. The European Union constituent country’s policies about the common bonds triggered market sentiments. These sentiments were analyzed in the context of the Brexit referendum of 2016. Government-citizen interactions on important issues were studied for 5 Latin American countries [13]. The results were validated for work, health, environment, education and social development sectors. A social opinion gold standard for the Malta Government Budget 2018 has been formulated by researchers [14]. Data from various social media platforms like Facebook and Twitter; newswires like MaltaToday, Times of Malta was fetched. Malta has about 90% population on social media platforms and 80% of the population reads online news. The research claims to be significant for the government in building intelligent tools for the economy. III. METHODOLOGY Figure 1. Research model flow Chart ISSN: 2005-4238 IJAST 2283 Copyright ⓒ 2020 SERSC
International Journal of Advanced Science and Technology Vol. 29, No. 4s, (2020), pp. 2282 - 2288 Figure 1 shows the flow chart of the research model implemented in this study. The initial steps shown in the flow chart include fetching tweets from Twitter using Tweepy, the Twitter API. These steps form part of the data mining process. Often tweets contain uniform resource locators (URLs), emojis, hashtags (#) and spelling mistakes. The text preprocessing step removes noise from the dataset and rectifies spelling and punctuation errors. Sentiment analysis is performed on the processed dataset to calculate the subjectivity and polarity of individual tweets. The weight of individual tweets is calculated using the formulae of the model and summation of the weights of all the tweets gives a final score of the dataset. This score helps in ascertaining whether the instant reactions of the general public were for the budget announcements or against them. A. Data Collection Twitter API Tweepy was used to retrieve the Indian Union budget 2020 related tweets. A total of 6000 tweets were fetched for the hashtag “Budget2020” starting at 11:00 AM IST 1st February 2020, the day the union budget was presented by the Honorable Finance Minister of India, Ms. Nirmala Sitharaman in the Indian Parliament. The tweets were exported to a Comma Separated Values (csv) file. Figure 2 shows a snapshot of the mined tweets. Figure 2. Mined Tweets B. Text Preprocessing Text pre-processing is the stage, where the raw data containing noise and impurities is transformed into a refined form that a classifier can read [15]. The tweets downloaded from Twitter generally result in the noisy and obscure dataset because of the casual nature of people’s usage of social media. This dataset contains special characters such as hashtags, user mentions, Uniform Resource Locators (URLs), etc which need to be removed before doing any analysis. A number of preprocessing steps are applied for cleaning noise and other anomalies from data. Some basic steps such as removing numbers, punctuations, stopwords such as “of”, “in”, “and”, “the”, etc are applied to standardize the data. It includes removing unwanted symbols such as (“@”, “|”, “#”, “/”), etc. and replace them with blank space. URLs are not significant for sentiment analysis models as they lead to inadequate features and inaccurate classification [16]. URLs in tweets are replaced with white space. Figure 3 shows a snapshot of tweets after preprocessing extracted from the preprocessed column of the csv file. Finally, 4732 tweets were obtained after the preprocessing phase. ISSN: 2005-4238 IJAST 2284 Copyright ⓒ 2020 SERSC
International Journal of Advanced Science and Technology Vol. 29, No. 4s, (2020), pp. 2282 - 2288 Figure 3. Preprocessed Tweets C. Sentiment Analysis Sentiment analysis provides valuable insights by detecting emotions or opinions from a large volume of data [17], [18]. Sentiment analysis can be treated as a branch of computational linguistics, machine learning, natural language processing, data mining and borrows elements from psychology and sociology [19],[20]. In the presented study, polarity and subjectivity have been computed for each tweet of the dataset using the Python TextBlob library. Polarity ranges from -1 to +1 and subjectivity ranges from 0 to 1. Subjectivity value close to 0 indicates a highly objective phrase while the value close to 1 indicates a highly subjective phrase. Polarity value -1 indicates a negative statement, 0 indicates neutral and +1 indicates a positive statement. The sentiment is calculated as the product of subjectivity and polarity for individual tweets as shown in table 1. Table 1. Sentiment computation Polarity Subjectivity Sentiment Cleaned tweet (p) (s) (p*s) Today Budget India going very important economy moves according to govt already done -0.525 1 -0.525 Insurance Bank Deposits raised from lakh rupees 0.6786 0 0 Here India gets money spends 0.7 0 0 allocation approved budget Mission expected provide piped water 0.5 0.4 0.2 budget speech political rally speech 0.45 0.1 0.045 Airports 0 0 0 Setting large solar panel capacity alongside railway tracks land owned railways 0 0.4286 0 ISSN: 2005-4238 IJAST 2285 Copyright ⓒ 2020 SERSC
International Journal of Advanced Science and Technology Vol. 29, No. 4s, (2020), pp. 2282 - 2288 Words having a high occurrence frequency depict their significance in the topic of discussion. Table 2 shows the most common words in the dataset with their frequency of appearance. Table 2 Word Frequency S.No. Words Frequency 1 Budget 2606 2 Speech 536 3 Finance 423 4 Government 377 5 Minister 375 6 Jobs 332 7 India 317 8 Youth 311 9 Income 277 10 Union 180 Wordcloud representation is an effective visualization technique to understand the dataset in view of the most popular words. Figure 4 shows the word cloud of high-frequency words in the dataset. Figure 4. Word Cloud ISSN: 2005-4238 IJAST 2286 Copyright ⓒ 2020 SERSC
International Journal of Advanced Science and Technology Vol. 29, No. 4s, (2020), pp. 2282 - 2288 IV. RESULTS A sentiment analysis calculation was done for each of the 4732 tweets. Considering only the polarity values, the number of negative tweets was 506, positive was 1690 and neutral was 2537. The mean subjectivity of the dataset was 0.28 with 2277 tweets having 0 subjectivity and 3344 tweets had subjectivity less than 0.5. The final sentiment score obtained has been calculated as the summation of sentiment scores of all the tweets. Final sentiment score of the dataset = +149.3387 V. CONCLUSIONS The study focused on the sentiment analysis of the instant reaction of Twitter users to the Union Budget of India for the year 2020. A total of 6000 tweets were mined for the hashtag “Budget2020”. After preprocessing, a total of 4732 tweets are analyzed for sentiment analysis. The final sentiment score of the dataset is +149.3387 which implies that the overall sentiment of the general public towards the Indian Union budget of 2020 is positive. The results of the study are limited due to a large number of neutral sentiment tweets that can be improved by training the model with large and labeled datasets and building a custom dictionary of words for better sentiment analysis. REFERENCES [1] S. Ranjan and S. Sood, Exploring Twitter for Large Data Analysis, International Journal of Advanced Research in Computer Science and Software Engineering, 6(7), 2016, 325–330. [2] India Population (2020) - Worldometer. [Online]. Available: https://www.worldometers.info/world- population/india-population/. [Accessed: 18-Feb-2020]. [3] Twiter users in India 2019 | Statista. [Online]. Available: https://www.statista.com/statistics/381832/twitter- users-india/. [Accessed: 18-Feb-2020]. [4] C. A. Iglesias and A. Moreno, Sentiment analysis for social media, Applied Sciences (Switzerland), 9(23), 2019, 1-4. [5] S. Ranjan and S. Sood, Investor community sentiment analysis for predicting stock price trends, International Journal of Management, Technology and Engineering, 9(5), 2019, 6012-6020. [6] M. Shakeel and V. Karwal, Lexicon-based sentiment analysis of Indian Union Budget 2016-17, Proc. 2016 International Conference on Signal Processing and Communication, 2016, 299-302. [7] H. Jiang, P. Lin, and M. Qiang, Public-opinion sentiment analysis for large hydro projects, Journal of Construction Engineering and Management, 142(2), 2016, 1-12. [8] P. Singh, R. S. Sawhney, and K. S. Kahlon, Sentiment analysis of demonetization of 500 & 1000 rupee banknotes by Indian government, ICT Express, 4(3), 2018, 124-129. [9] Y. Watequlis Syaifudin and D. Puspitasari, Twitter Data Mining for Sentiment Analysis on Peoples Feedback Against Government Public Policy, MATTER: International Journal of Science and Technology, 3(1), 2017, 110-122. [10] J. M. Carrascosa, C. Mendez, and V. Triga, EU Cohesion policy in the media : A computational text analysis of online news, user comments and social media, Cyprus University of Technology, University of Strathclyde, European Policies Research Centre. [online]. Available from: http://www.cohesify.eu/wpcontent/uploads/2018/04/CTA_Report_ReseasrchPaper12_CUTfinal.pdf 693127, 2020. [11] E. Georgiadou, S. Angelopoulos, and H. Drake, Big data analytics and international negotiations: Sentiment analysis of Brexit negotiating outcomes, International Journal of Information Management, October, 2019, 102048. [12] P. Schwendner, M. Schüle, and M. Hillebrand, Sentiment Analysis of European Bonds 2016-2018, Frontiers in Artificial Intelligence, 2, 2019, 20. [13] R. B. Hubert, E. Estevez, A. Maguitman, and T. Janowski, Examining government-citizen interactions on twitter using visual and sentiment analysis, Proc. ACM International Conference Proceeding Series, 2018, 1- 10. [14] K. Cortis and B. Davis, A Social Opinion Gold Standard for the Malta Government Budget 2018, Proc. 5th Workshop on Noisy User-generated Text (W-NUT 2019), 2019, 364-369. ISSN: 2005-4238 IJAST 2287 Copyright ⓒ 2020 SERSC
International Journal of Advanced Science and Technology Vol. 29, No. 4s, (2020), pp. 2282 - 2288 [15] J. P. Pinto and V. Murari, Real Time Sentiment Analysis of Political Twitter Data Using Machine Learning Approach, International Research Journal of Engineering and Technology(IRJET), 6(4), 2019, 4124-4129. [16] S. Joshi and D. Deshpande, Twitter Sentiment Analysis System, International Journal of Computer Application, 180(47), 2018, 35-39. [17] A. Hasan, S. Moin, A. Karim, and S. Shamshirband, Machine Learning-Based Sentiment Analysis for Twitter Accounts, Mathematical and Computational Application, 23(1), 2018, 11. [18] U. Yaqub, N. Sharma, R. Pabreja, S. A. Chun, V. Atluri, and J. Vaidya, Analysis and visualization of subjectivity and polarity of twitter location data, Proc. ACM International Conference, 2018, 1-10. [19] L. Yue, W. Chen, X. Li, W. Zuo, and M. Yin, A survey of sentiment analysis in social media, Knowledge and Information Systems, 60(2), 2019, 617-663. [20] A. Kumar and A. Jaiswal, Systematic literature review of sentiment analysis on Twitter using soft computing techniques, Concurrency Computation, 32(1), 2020, 1-29. ISSN: 2005-4238 IJAST 2288 Copyright ⓒ 2020 SERSC
You can also read