Forecasting And Prediction Of Air Pollution Levels To Protect Human Beings From Health Hazards
INTERNATIONAL JOURNAL OF SCIENTIFIC & TECHNOLOGY RESEARCH VOLUME 9, ISSUE 01, JANUARY 2020 ISSN 2277-8616

S. Suganya, Professor Dr. T. Meyyappan

S. SUGANYA is currently pursuing a Ph.D. in the Department of Computer Science, Alagappa University, India, PH-9786383278. E-mail: suganyasudhakar04@gmail.com
DR. T. MEYYAPPAN is currently working as a Professor in the Department of Computer Science, Alagappa University, India. E-mail: meyyappant@alagappauniversity.ac.in

Abstract: Prevention and control of air pollution has become an essential activity in many cities. Air in cities is polluted to unacceptable levels by industries and heavy vehicular traffic, which affects human health to a great extent. Forecasting, predicting and controlling air pollution is the need of the hour to protect human beings from health hazards. Air pollution threatens not only humans but also entire flora and fauna. The prime objective of this paper is to propose a new method to predict air pollution using data collected on a monthly basis and to provide recommendations to prevent and control air pollution. This research work comprises two phases. The first phase preprocesses the chosen dataset using Python. The second phase analyzes the preprocessed data to predict air pollution levels. The proposed method is applied to a Kaggle dataset containing monthly air pollution data collected over the period 2000 to 2010. Predictions for a future month are made by computing the Air Quality Index (AQI) metric and a threshold value from the previous two months. The proposed method shows acceptable accuracy.

Index Terms: Air Pollution, Air Quality Index, Analysis, Pollution Forecasting, Prediction, Prevention, Control.

1. INTRODUCTION
Today, air pollution occurs when harmful or excessive quantities of substances, including gases, are present in the air. They may cause severe health problems, diseases, allergies and even death in humans. Air pollution plays a significant role in weakening the skin, eyes and other human organs, which reduces human lifetime. Flora and fauna are also affected by polluted air. The healthy life of future generations is under threat from ever-increasing air pollution at the global level due to industrialisation and the increasing use of petroleum products. These problems motivated this research on air pollution data for accurate and early prediction. Storing, processing and analyzing the pollution data set using traditional techniques is complex due to its huge volume. Hence, there is a need to move to Big Data analytic techniques. Among Big Data techniques, HDFS and HBase can store high volumes of data, Hive and Pig can process both semi-structured and unstructured data, and MapReduce can be used to analyze the dataset. The proposed work analyzes the air pollution data set to predict air pollution for all the months of any future year. The results obtained can be used by policy makers to control and prevent air pollution in future. In this work, the most dangerous air pollutants considered are Carbon Monoxide (CO), Nitrogen Dioxide (NO2), Sulphur Dioxide (SO2) and Ozone (O3). The main sources of these pollutants are vehicles, traffic, smoke, burning plastics, burning electronic waste, and industries. When human beings inhale polluted air, these pollutants enter their blood and cause dangerous diseases. Children and aged people are affected faster by polluted air.

The Air Quality Index (AQI) is a single number used to characterize the quality of the air at a particular time in a given location. The proposed research work predicts air pollution for a future date using the AQI of past months.

2 RELATED WORKS
Much work has been reported in the literature on learning, analysis and prediction of air pollution, as well as on forecasting future trends. Four active researchers in this field are Polaiah Bojja, Yi-Ting Tsai, Ranjana Waman Gore and Ling Wang.

Polaiah Bojja [6] studied pollution forecasting for PM10 and SO2 and their effect on the ecosystem using Artificial Neural Networks (ANN) and a Fuzzy Logic Controller. The level of air pollution due to the increasing number of vehicles in India and Andhra Pradesh is determined with MATLAB code, and accuracy is ensured by evaluating the minimum forecasting error. Two soft computing approaches, a Feed-forward Back Propagation Network (BPN) model and a Mamdani Fuzzy Inference model, are trained and tested using five years of past meteorological data.

Yi-Ting Tsai [7] proposed an approach to forecast PM2.5 concentration using a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM). The training data is retrieved from the Environmental Protection Administration (EPA) of Taiwan for the years 2012 to 2016 and combined into 20-dimensional data; the forecasting test data is from the year 2017. Experiments are conducted to evaluate the forecast PM2.5 concentration for the next four hours at 66 stations around Taiwan. Keras, a high-level neural networks API written in Python, is used in their research work.

Ranjana Waman Gore [8] analyzed how air pollution affects people. The classes based on the Air Quality Index (AQI) are good, moderate, unhealthy for sensitive groups, unhealthy and very unhealthy. The paper analyzes air quality based on the available data of air pollutants such as NO2, SO2, CO and O3 with the corresponding AQI values. Naïve Bayes and Decision Tree J48 algorithms are adopted for predicting the health concern.

Ling Wang [9] proposed a model named "Prediction of Air Pollution Based on FCM-HMM Multi-model". It analyzes the relationships between the air pollution index (API) and meteorological factors using correlation analysis and principal component regression. A multi-model framework is constructed with FCM-HMM clustering and TS fuzzy inference. First, the fuzzy c-means (FCM) clustering algorithm is adopted to obtain the initial clusters of the observation sequences, which are used for predicting the air pollution index. Compared to nonlinear regression, grey models and ANN, the HMM offers a powerful framework for temporal modeling of features extracted from time series data. The strategy derives a predictive model for air pollution index values in urban areas, implemented as a multi-model method based on FCM-HMM.

Many researchers have adopted clustering, the Fuzzy C-means clustering algorithm, Hidden Markov Models and PCA-based observation sequence generation. Many researchers have also analyzed and monitored air quality and air pollution in Delhi, Agra and the USA. In our work, we have introduced a new formula to compute the Air Quality Index. The new AQI formula uses weightages of 30%, 30%, 20% and 20% for NO2, CO, SO2 and O3 respectively. The air pollution for a given period is related to the air pollution levels of past months. Hence, the AQI for a future month is computed based on the AQI values of the past two months.

3 METHODOLOGY

3.1 Data Set Used
The proposed research work uses an air pollution data set downloaded from the Kaggle website. This data set contains data with four attributes collected during the years 2000 to 2010. The air pollutant parameters are the Air Quality Index (AQI) values of NO2, SO2, CO and O3. The data set is preprocessed and analysed using Big Data analytic techniques to predict air pollution on a monthly basis. Python is used to implement the techniques adopted. Sample data from the data set is shown in Fig. 1.

3.2 Process Flow in the Proposed Work:

Fig. 2. The Steps in the Proposed Air Pollution Prediction Model

Steps in Analysis:
1. In the preprocessing step, the parameters NO2, CO, SO2 and O3 are separated on a monthly basis.
2. The following new formula is computed to find the value of the Air Quality Index for every month in a year:
   AQI = 0.3 * NO2 + 0.3 * CO + 0.2 * SO2 + 0.2 * O3
3. The average of the AQI values over the 24 hours of a day is computed for each day.
4. The average of the AQI values of all the days in the chosen month is computed as the threshold:
   $T = \frac{1}{n}\sum_{i=1}^{n} \mathrm{AQI}_i$, where n is the number of days in the month.
5. Compute the difference between the threshold and the actual AQI values of all the days in the chosen month:
   $\mathrm{DIFF}_i = T - \mathrm{AQI}_i \quad (i = 1, 2, \ldots, n)$, where n is the number of days in the month.
   If DIFF_i < 0 then
       Air pollution is present
   Else
       Air pollution is not present
6. Stop.

The Classification Accuracy Rate and Error Rate for the month are computed from the confusion matrix as follows:
Classification Accuracy Rate = (TP + TN) / (TP + TN + FP + FN)
Classification Error Rate = (FP + FN) / (TP + TN + FP + FN)
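Steps 2 to 5 of the analysis above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the input layout (a list of `(day, {pollutant: value})` readings) and the helper names `weighted_aqi`, `daily_aqi` and `classify_month` are hypothetical. The pollutant values are combined exactly as in the weighted formula, with no unit conversion, and a day is flagged as polluted when DIFF_i = T - AQI_i is negative.

```python
from collections import defaultdict
from statistics import mean

# Weightages from Step 2: 30% NO2, 30% CO, 20% SO2, 20% O3.
WEIGHTS = {"NO2": 0.3, "CO": 0.3, "SO2": 0.2, "O3": 0.2}


def weighted_aqi(reading):
    """Step 2: combine one reading's pollutant values into a single AQI value."""
    return sum(weight * reading[pollutant] for pollutant, weight in WEIGHTS.items())


def daily_aqi(readings):
    """Step 3: average the weighted AQI of all readings taken on the same day.

    `readings` is a list of (day, {pollutant: value}) pairs; this layout is a
    hypothetical choice for the sketch, not the Kaggle file format.
    """
    by_day = defaultdict(list)
    for day, values in readings:
        by_day[day].append(weighted_aqi(values))
    return {day: mean(aqis) for day, aqis in sorted(by_day.items())}


def classify_month(daily):
    """Steps 4-5: T is the mean daily AQI; DIFF_i = T - AQI_i < 0 means pollution present."""
    threshold = mean(daily.values())
    labels = {day: (threshold - aqi) < 0 for day, aqi in daily.items()}
    return threshold, labels


# Usage with made-up readings (two readings on day 1, one on day 2).
readings = [
    (1, {"NO2": 10, "CO": 2, "SO2": 5, "O3": 20}),
    (1, {"NO2": 12, "CO": 2, "SO2": 5, "O3": 22}),
    (2, {"NO2": 4, "CO": 1, "SO2": 2, "O3": 8}),
]
threshold, labels = classify_month(daily_aqi(readings))
print(round(threshold, 2), labels)  # 6.3 {1: True, 2: False}
```

In a real run, each day would hold up to 24 hourly readings, matching Step 3.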
The proposed work is carried out in two phases. Phase I computes the AQI (Air Quality Index) value, which is used in Phase II to make predictions. Phase II makes predictions based on the computed AQI values and threshold value.

Phase I
Step 1: Pre-processing
Step 2: AQI Value Computation

Phase II
Step 1: Analysis of New AQI
Step 2: Threshold Computation
Step 3: Prediction

Fig. 1. Example Dataset

PHASE I: Pre-Processing and AQI Value Calculation
In this proposed work, data sets for the years 2000 to 2010 are collected from the Kaggle website and preprocessed using big data analytics and Python. After preprocessing, the AQI values of NO2, CO, SO2 and O3 are computed on a monthly basis. Then the New AQI is calculated using formula (1) for every month in the years 2000 to 2010.

Aggregation of Air Quality Index:
New AQI = 0.3 * NO2 + 0.3 * CO + 0.2 * SO2 + 0.2 * O3    (1)

Fig. 3. AQI Aggregation Index

PHASE II: Analysis of New AQI
After calculating the New AQI for every month, air pollution is predicted for every month using the AQI values of the previous two months. For example, to make a prediction for April, the AQI values of the previous two months, February and March, are used, and the average of these two months' AQI values is calculated.

Threshold Calculation
The threshold value is computed as the average of all the AQI values of the chosen month.

Prediction
Prediction for the chosen month is made by comparing the actual AQI value with the threshold value. Specifically, the threshold value is compared with each of the average values of the previous two months' AQI. An average value less than the threshold value indicates the absence of air pollution; an average value greater than or equal to the threshold value indicates the presence of air pollution. The process is repeated for all the months in a year.
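The Phase II procedure can be sketched as below. It is an illustration under stated assumptions, not the authors' implementation: the function name `predict_month` is hypothetical, the three months are given as lists of daily AQI values aligned by day number, and the two-month average is taken day by day (the per-day February, March and Average (A) columns of Table 1 suggest this reading). The comparison follows the Prediction paragraph, so an average equal to the threshold counts as polluted.

```python
from statistics import mean


def predict_month(actual, prev1, prev2):
    """Phase II sketch: compare each day's two-month average AQI with the threshold.

    `actual` holds the chosen month's daily AQI values (e.g. April); `prev1` and
    `prev2` hold the two preceding months (e.g. February and March), aligned by
    day number. zip() stops at the shortest list, so days missing from a shorter
    month are skipped in this sketch.
    """
    threshold = mean(actual)  # Threshold Calculation: mean of the chosen month's AQI
    rows = []
    for day, (a1, a2) in enumerate(zip(prev1, prev2), start=1):
        avg = (a1 + a2) / 2          # Average (A) of the previous two months
        diff = threshold - avg       # Difference (T - A)
        polluted = avg >= threshold  # A >= T: air pollution present
        rows.append((day, avg, diff, polluted))
    return threshold, rows


# Usage with made-up three-day months.
threshold, rows = predict_month([4.0, 5.0, 6.0], [2.0, 6.0, 5.0], [4.0, 6.0, 5.0])
for day, avg, diff, polluted in rows:
    print(day, avg, diff, "present" if polluted else "not present")
# 1 3.0 2.0 not present
# 2 6.0 -1.0 present
# 3 5.0 0.0 present
```

Note the handling of a tie (A = T): it is treated as pollution present, as the Prediction paragraph states.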
4 RESULT AND DISCUSSION
Table 1 shows the difference between the calculated and actual AQI values for a given month. The threshold value is 4.38, computed from the actual AQI values. The threshold value is compared with each of the average values of the days in the previous two months, and the presence or absence of air pollution is determined from the difference value compared with the threshold value.

It is evident from the data shown in Table 1 that, for the given month (April), the difference value is less than the threshold value for 10 days. Hence, it is concluded that air pollution is not present on those days and is present on the remaining 20 days.

Table 1. Prediction of Air Pollution for a Month Based on AQI Value
Threshold T = 4.380556 (average for the month of April), i.e. T = 4.38
Day | Actual AQI Value (April) | February | March | Average (A) | Difference (T - A) | Prediction: If (T-A) >=0 or (T-A)

Confusion Matrix
The confusion matrix holds the information on the actual class and the predicted class. The performance of the proposed work is evaluated using the data in this matrix.

Table 2. Actual Class and Predicted Class - Illustration
                     Actual TRUE   Actual FALSE
Predicted TRUE            14             8
Predicted FALSE            4             4

Accuracy Rate:
Accuracy Rate is the proportion of the total number of predictions that are correct. It is determined by the following equation:
Classification Accuracy Rate = (TP + TN) / (TP + TN + FP + FN)
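The accuracy and error rates can be computed from paired daily labels with a few lines of Python. This is a sketch, and the helper names `confusion_counts` and `accuracy_and_error` are hypothetical. It uses the usual definitions, accuracy = (TP + TN) / N and error = (FP + FN) / N. The example labels are arranged only to reproduce the counts in Table 2 (TP = 14, FP = 8, FN = 4, TN = 4 over 30 days). They are not the paper's actual day-by-day data.

```python
def confusion_counts(predicted, actual):
    """Count TP, FP, FN, TN from paired boolean labels (True = pollution present)."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    return tp, fp, fn, tn


def accuracy_and_error(tp, fp, fn, tn):
    """Accuracy = (TP + TN) / N and Error = (FP + FN) / N."""
    total = tp + fp + fn + tn
    return (tp + tn) / total, (fp + fn) / total


# Day labels arranged to reproduce Table 2 (14 TP, 8 FP, 4 FN, 4 TN over 30 days).
predicted = [True] * 22 + [False] * 8
actual = [True] * 14 + [False] * 8 + [True] * 4 + [False] * 4
counts = confusion_counts(predicted, actual)
accuracy, error = accuracy_and_error(*counts)
print(counts, accuracy, error)  # (14, 8, 4, 4) 0.6 0.4
```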
Accuracy Rate for the month of April = (14 + 4) / 30 = 18/30 = 0.6

Error Rate:
Error Rate is the proportion of the total number of predictions that are incorrect. It is determined by the following equation:
Classification Error Rate = (FP + FN) / (TP + TN + FP + FN)
Error Rate for the month of April = (8 + 4) / 30 = 12/30 = 0.4

5 CONCLUSION
Air pollution is dangerous for nature as well as for human beings, and prediction and remedial action are the need of the hour. In this research work, the data set chosen from the Kaggle website is first preprocessed to separate the pollutant parameters NO2, CO, SO2 and O3. The prediction of air pollution is performed in two phases. The first phase computes AQI (Air Quality Index) values for all the days in a month. The second phase computes the threshold value of AQI as an average of the previous months' average AQI values. Air pollution for the days in the chosen month is predicted by comparing the threshold value with the average of the previous two months' values. Big data analytics are used to handle huge data volumes, and Python is used to implement the computational procedures. The prediction accuracy (0.6) and error rate (0.4) are computed, and the results are found to be encouraging. Further research work is in progress to include other environmental parameters.

ACKNOWLEDGMENT
This article has been written with the financial support of the RUSA-Phase 2.0 grant sanctioned vide Letter No. F.24-51/2014-U, Policy (TN Multi-Gen), Dept. of Edn., Govt. of India, Dt. 09.10.2018.

REFERENCES
[1] https://en.wikipedia.org/wiki/Air_pollution.
[2] Shweta Taneja, Dr. Nidhi Sharma, "Predicting Trends in air pollution in Delhi using data mining," 2016 IEEE.
[3] Peijiang Zhao, Koji Zettsu, "Convolution Recurrent Neural Networks Based Dynamic Transboundary Air Pollution Prediction," 2019 the 4th IEEE International Conference on Big Data Analytics.
[4] HOW can affect the human being atmospheric And environment pollution.
[5] https://en.wikipedia.org/wiki/Air_pollution.
[6] Polaiah Bojja, Vivith Kumar Karumuri, "Development and Evaluation of Pollution Forecasting Model Using Soft-Computing Methods for PM10 and SO2 in Ambient Air," IEEE WiSPNET 2016 conference.
[7] Yi-Ting Tsai, Dept. of Computer Science and Information Engineering, National Taipei University, "Air pollution forecasting using RNN with LSTM," 2018 IEEE 16th Int. Conference.
[8] Ranjana Waman Gore, "An Approach for Classification of Health Risks Based on Air Quality Levels," 978-1-5090-4264-7/17/$31.00 ©2017 IEEE.
[9] Ling Wang, "Prediction of Air Pollution Based on FCM-HMM Multi-model," Proceedings of the 35th Chinese Control Conference, July 27-29, 2016, Chengdu, China.