CHARACTER DETECTION IN BNI INTERNET BANKING CAPTCHA IMAGE USING TEMPLATE MATCHING CORRELATION - ESQ Business School
IJTB | International Journal of Technology And Business

Deni Sutaji
Universitas Muhammadiyah Gresik/Informatika, Gresik, 61121, Indonesia
Email: sutaji.deni@umg.ac.id

Abstract

Captcha images are already used in Internet Banking as a security device in the authentication process, to determine whether the active user is a human or a robot. BNI Bank is one of the banks that applies a captcha image, which contains four numeric characters. In this study, we propose a way to read the characters in the captcha image using the Template Matching Correlation method. In a later study it can be used as part of an automatic login system for Micro, Small and Medium Enterprise (MSME) players who accept transfer payments to BNI Bank accounts. The first step of recognition is pre-processing, followed by segmentation and labeling of each character, and the last step is matching against a reference template dataset using Template Matching Correlation. On 100 test images, the system reaches an accuracy of 100%. This method is therefore suitable for identifying the characters of the captcha image on the BNI Internet Banking login page.

Keywords: captcha, internet banking, template matching correlation

Deni Sutaji ©2019 IJTB All rights reserved.

INTRODUCTION

In Indonesia, the economy is dominated by Micro, Small and Medium Enterprises (MSMEs). MSMEs have a vital and strategic task in helping national economic development. MSMEs also provide employment, which further benefits Indonesia's economic growth. Up to 2015, the number of entrepreneurs recorded by the Directorate General of Taxes was 56,539,560, of which 99.9% were MSMEs [14]. According to the World Bank, MSMEs are grouped into 3 categories, namely: Micro Enterprises (number of employees: 10); Small Businesses (number of employees: 30); and Medium Businesses (number of employees: up to 300 persons) [14].

In carrying out their business, many MSMEs use transfer facilities, both from ATMs and Internet Banking, one of which is BNI Bank Internet Banking. MSME entrepreneurs naturally check the transfers on their accounts to see whether the payments made by their customers have entered the account or not. To use BNI Internet Banking, MSME entrepreneurs must enter a username, a password and the characters of a captcha image. This becomes a routine activity carried out at all times by the entrepreneur, or even by administrative staff. It takes time and concentration and carries the risk of errors that result in a blocked Internet Banking account, so it is better to have a system that checks account transactions (mutations) automatically for MSME entrepreneurs. One sub-system of such an automatic system is the automatic recognition of the characters in the captcha image, because the characters are randomly generated by BNI Internet Banking every time the web page is reloaded (for login or logout).

The captcha on the BNI Internet Banking login form is used as a security code consisting of a row of numeric characters generated randomly on the login page in the form of an image. The aim is to prevent the form contents from being copied and pasted. Internet Banking users are required to type the number sequence into the field in order to be allowed to log in on the Internet Banking page.

A popular method for recognizing characters in various types of letters and numbers is the Optical Character Recognition (OCR) model with Template Matching. Kurniawan in 2016 succeeded in recognizing the characters of vehicle number plate images using the Template Matching method; with 30 sample images, the system recognized 238 characters with an accuracy of 80.25% [1]. Research conducted by Sutaji in 2018 using the same method recognized the captcha images of the BRI Internet Banking login with an accuracy of 93.5% [11]. Using an artificial intelligence approach, Ye Wang and Mi Lu proposed a novel adaptive algorithm that recognized characters in captcha images with an average accuracy of 70.78% [13]. A sliding window based on a neural network was also applied by Hussain et al., with a success rate of 95.5% in character recognition [4].

In this article, we discuss how to recognize the captcha image characters on the BNI Internet Banking login page using Template Matching Correlation.

THEORY/CALCULATION

a. Captcha Image

Captcha stands for Completely Automated Public Turing test to tell Computers and Humans Apart; it is a fully automated public test to identify whether a user is a computer or a human [3] [4]. The point is that a Captcha serves to ensure that the sender of the data is a human and not a script, program or robot that automatically sends data continuously.

Generally a Captcha takes the form of an image containing a code that can easily be read by humans but that a computer will have difficulty reading (a computer reads a code more easily in text form). For a computer, an image is a collection of color intensity values, one for each pixel, so a complex process is needed to recognize objects in the image, let alone to know the meaning of the image [12].

For humans, however, it is very easy to read the code in the image and enter it in a text input as a condition for sending data, so that in this way only humans can be expected to continue sending data while a computer or robot cannot [6].

b. Internet Banking

Internet Banking, according to its constituent words, is a combination of two words, namely internet and bank. The internet is a network system that connects computers in global coverage throughout the world [2]. According to Bank Indonesia, Internet Banking is one of the services that make it easier for customers to obtain information, communicate and carry out banking transactions with the help of the internet network. There are three types of Internet Banking services, namely [2]:
a) Informational Internet Banking, which is a service provided by the Bank to customers in the form of information through the internet network, with no transactions being carried out.
b) Communicative Internet Banking, which is a service provided by a bank to a customer in the form of communication, characterized by interaction with a Bank that provides Internet Banking services in a limited scope, with no transactions being carried out.
c) Transaction Internet Banking, which is a service provided by the Bank to customers to execute transactions through the internet network.

In the third type, the captcha image is embedded in the login authentication system on the BNI Internet Banking login page, which is the object of this research.

c. Optical Character Recognition

Optical Character Recognition (OCR) is a system that recognizes letter and number characters so that they can be converted into text files. This letter or number recognition system can be used to increase the flexibility, ability and intelligence of computer systems [8][13]. A smart character recognition system can help humans in activities that are currently carried out by many parties, namely the digitalization of information and knowledge, for example the making of digital library collections, digital collections of ancient literature, automated screening of notes, etc. [8]. The OCR algorithm can be seen in Figure 1.

Fig. 1. OCR Flowchart (Start -> File input -> Pre-Processing -> Segmentation -> Normalisation -> Feature Extraction -> Recognition -> Finish)
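The chain of stages in Fig. 1 can be illustrated with a minimal Python sketch. It is not the implementation used in this study: the function names, the 5x3 toy glyphs `ONE` and `ZERO`, the fixed threshold of 128, the split at empty columns and the pixel-agreement score are all assumptions chosen only to keep the example small and self-contained.

```python
from typing import Dict, List

Image = List[List[int]]  # rows of pixel values

# Hypothetical toy glyphs ("1" = ink), standing in for real template images.
ONE = ["010", "110", "010", "010", "111"]
ZERO = ["111", "101", "101", "101", "111"]


def draw(glyphs: List[List[str]]) -> Image:
    """File input (toy): paste glyphs side by side as grayscale (ink=0, paper=255)."""
    rows = []
    for y in range(5):
        line = "0".join(g[y] for g in glyphs)  # one blank column between glyphs
        rows.append([0 if ch == "1" else 255 for ch in line])
    return rows


def preprocess(gray: Image, threshold: int = 128) -> Image:
    """Pre-processing: binarise with a fixed threshold (dark ink -> 1)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def segment(binary: Image) -> List[Image]:
    """Segmentation: cut the image at columns that contain no ink."""
    has_ink = [any(row[x] for row in binary) for x in range(len(binary[0]))]
    glyphs, start = [], None
    for x, ink in enumerate(has_ink + [False]):  # sentinel closes the last glyph
        if ink and start is None:
            start = x
        elif not ink and start is not None:
            glyphs.append([row[start:x] for row in binary])
            start = None
    return glyphs


def normalise(glyph: Image, size: int = 5) -> Image:
    """Normalisation: crop empty top/bottom rows, resample to size x size."""
    inked = [y for y, row in enumerate(glyph) if any(row)]
    rows = glyph[inked[0]:inked[-1] + 1]
    h, w = len(rows), len(rows[0])
    return [[rows[y * h // size][x * w // size] for x in range(size)]
            for y in range(size)]


def extract_features(glyph: Image) -> List[int]:
    """Feature extraction: flatten the normalised glyph into one vector."""
    return [px for row in glyph for px in row]


def recognise(features: List[int], templates: Dict[str, List[int]]) -> str:
    """Recognition: choose the template with the most agreeing pixels."""
    return max(templates,
               key=lambda label: sum(a == b for a, b in zip(features, templates[label])))


def features_of(gray: Image) -> List[List[int]]:
    """Run every stage before recognition on one grayscale image."""
    return [extract_features(normalise(g)) for g in segment(preprocess(gray))]


def ocr(gray: Image, templates: Dict[str, List[int]]) -> str:
    """Full chain of Fig. 1 for one image."""
    return "".join(recognise(f, templates) for f in features_of(gray))


TEMPLATES = {"0": features_of(draw([ZERO]))[0], "1": features_of(draw([ONE]))[0]}
print(ocr(draw([ONE, ZERO, ZERO, ONE]), TEMPLATES))  # prints 1001
```

Each function corresponds to one box of the flowchart, so a stage can be swapped (for example, a different thresholding rule or a different matching score) without touching the others.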
d. Template Matching Correlation

Template Matching is basically a simple process for character recognition in a captcha image. The Template Matching algorithm starts with an input image that contains letters or numbers. The image is then compared with the template images stored in the database: a template is placed at the center of the part of the image to be compared, and a calculation is made of how many pixel points agree with the template image [9]. These steps are repeated for every part of the input image that is to be compared. The highest agreement value between the pixels of the input image and those of a template image indicates that this template is the one that best fits the input image [4]. An illustration of this algorithm can be seen in Figure 2.

Fig. 2. Template Matching Illustration

e. Connected Component Labeling

Connected Component Labeling is an image segmentation technique that can also be used to classify regions or objects in a digital image. This technique uses the theory of pixel connectivity in the image. Pixels that belong to a region are called connected (indicating that there is connectivity between them) when they satisfy the adjacency rules (rules of proximity of pixels) [9]. This pixel proximity rule utilizes the neighboring properties of pixels: pixels that are connected are adjacent to each other because they still have neighboring relationships. Suppose a symbol V denotes the pixel intensity value, taken from the range (0,1). Keep in mind that the images that can be processed using this method are binary images. Neighbors must be at a distance of 1 unit (pixel directly next to pixel, without any intermediate) [4] [5].

EXPERIMENTAL METHOD

The problem is the absence of a system that can recognize the Captcha number characters. In the e-banking application the user is required to enter the Captcha number characters by typing them on the keyboard; in doing so the user can make an error, and the worst risk is that the Internet Banking account will be blocked. A system is therefore needed that can read and recognize Captcha number characters.

In this study, the collection of the template image dataset started by downloading Captcha images from the ibank.bni.co.id test page. Each image was then cut and resized to the character size, and converted into a binary image to obtain character templates for the numbers 0 to 9, which are later used as the reference template dataset.

In this study, the system is divided into 2 main stages: the first is pre-processing of the initial data, and the second is character segmentation and labeling followed by character recognition using the Template Matching algorithm to recognize the pattern of the Captcha character images. The system design can be seen in Figure 3.

Fig. 3. Flowchart of the character recognition system using Template Matching (Start -> Captcha Images -> Pre-Processing -> Detection Character using Template Matching -> Finish)

The following is the flowchart of each stage:

1. Pre-Processing

Pre-processing is needed in this study. The first step converts the RGB captcha image from the Internet Banking page to a grayscale image. The second step is image adjustment, in which the contrast and brightness of the grayscale image are improved. After the image is adjusted, the third step converts the image to a binary image using Otsu thresholding. In this step the threshold value is important to the success of this study. After a binary image is obtained, the process goes to the second stage, namely character recognition for each number contained in the image. The flowchart of the pre-processing process can be seen in Figure 4 and the result of the pre-processing step is shown in Figure 5.

Fig. 4. Flowchart of the Pre-Processing Process (Start -> Captcha Images -> Convert to Grayscale -> Image Adjustment -> Convert to Binary Image (Thresholding) -> Binary Images -> Finish)

Fig. 5. Pre-Processing Process

2. The detection process

At this stage, the processed image generally contains only the pixels of the characters. This stage is the final and main stage of the system; the previous stage can be regarded as an auxiliary or initial stage only. This stage begins with the labeling process, followed by cutting out the pixels of each label and matching the pixel patterns with the available datasets. The results of the segmentation of each character can be seen in Figure 6.

Fig. 6. Labeled Segmentation of Each Character

Next, in the matching process with the Template Matching Correlation algorithm, each separated character image is tested against the patterns available in the dataset to obtain the character information. The process is repeated for each of the 4 character labels.

RESULTS AND DISCUSSION

This section explains the testing of the digital image processing application that detects captcha image characters on Internet Banking using the Template Matching Correlation method. Testing is done by looking for a value that is close to, or even the same as, a match between the pixel values of the test image and those of the template images provided beforehand, using the Template Matching Correlation algorithm.

In the pre-processing step, the threshold used to convert grayscale images into binary images with the Otsu method was chosen by trial and error to get the best character results. If the characters are too thick they cannot match the template, and vice versa. From the trials, the best threshold value obtained is 0.4. The results for the other threshold values tested, from 0.1 to 0.8, can be seen in Figure 7.

In the character detection step, the character segmentation process is an important part, because each character resulting from this segmentation is matched with the database templates for the numbers 0 to 9. An advantage of the captcha images from BNI Bank is that the number characters they contain are separated from one another, so the segmentation process has no difficulty in extracting the 4 number characters in the captcha image.

The test used 100 sample captcha images downloaded from the ibank.bni.co.id website. The purpose of this test is to determine the success and failure rates of the system that was built and, most importantly, to measure its accuracy, so that conclusions can be drawn from the observations. Examples of the test results can be seen in Table 1.

Fig. 7. Otsu threshold values from 0.1 to 0.8, panels (a) to (h) respectively; 0.4 is the best threshold value

Figure 7 shows that the 0.1 threshold value produces a poor binary image. Every character has missing pixels, so the number formed is not complete, which affects the segmentation and character detection processes. The case is different for threshold values of 0.2 and 0.3: the character looks thinner than the original number, which does not affect the segmentation process but does affect the character detection process.

For threshold values of 0.5 and 0.6, the character looks fatter than in the original image; this does not affect the segmentation process but leads to errors in the detection process. For the last threshold values, 0.7 and 0.8, there are additional pixels in the character area, which results in errors in both the segmentation and the character detection processes. The most appropriate threshold value in this study is therefore 0.4.

Table 1. Character Detection Results for 10 captcha images

No   Image     Read   Correct   Failed
1    [image]   1002   4         0
2    [image]   5772   4         0
3    [image]   1437   4         0
4    [image]   7418   4         0
5    [image]   2349   4         0
6    [image]   1626   4         0
7    [image]   2689   4         0
8    [image]   1234   4         0
9    [image]   3331   4         0
10   [image]   5069   4         0

Table 2. Confusion matrix for system evaluation

                                        Detection result
Original class                 Correct char      Failed
Read as number                 TP = 100          FN = 0
Failed to read as number       FP = 0            TN = 0

After the 10 example trials, the test was continued for all the test data. Over the whole test the system recognized the numbers in the captcha images properly. This is because the number characters in each captcha image do not contain overlapping pixel areas. Each number character in the image can be identified correctly, as shown in Table 2.

The accuracy evaluation is stated in the following equation:

\[ \text{Accuracy} = \frac{100}{100} \times 100\% = 100\% \]

Based on the results of the accuracy evaluation above, we can conclude that the level of accuracy on the 100 test captcha images is 100%.

CONCLUSION

Based on the research that has been done, it can be concluded that the Template Matching Correlation method can be used to detect the number characters in captcha images from BNI Bank Internet Banking by comparing the number of identical pixels between the reference dataset template and the input template.

As a suggestion from this study, the processed character images should come not only from the ibank.bni.co.id test page but also from other test pages such as those of BRI, BCA and others. This research can be developed by combining it with a grabbing method for data collection, so that it can be used as an automatic system for reading account mutations by MSME players in Indonesia.

ACKNOWLEDGMENT

The author sincerely thanks the Ministry of Research, Technology and Higher Education of Indonesia for the opportunity and trust given to research this topic under the Penelitian Dosen Pemula scheme.

REFERENCES

[1] Bayu S., Kurniawan. 2016. Aplikasi Pengenalan Citra Nomor Kendaraan Bermotor Menggunakan Metode Template Matching. Tugas Akhir, Teknik Informatika, Universitas Sam Ratulangi, Manado.
[2] Budi Agus R. 2005. Aspek Hukum Internet Banking. Jakarta: PT. Raja Grafindo Persada.
[3] Dani Rohpandi. 2010. Aplikasi Pengenalan Citra Dalam Huruf Ngalena Menggunakan Matlab. STMIK Tasik Malaya.
[4] Hussain R., Gao Hui, Ahmed S. R., Parveen S. S. 2016. "Recognition Based Segmentation of Connected Characters in Text Based," 8th IEEE International Conference on Communication Software and Networks.
[5] Raden, S.B., Irfan M. 2012. Perbandingan Algoritma Template Matching dan Feature Extraksion Pada Optical Character Recognition. Fakultas Teknik dan Ilmu Komputer Indonesia, Jln. Dipati Ukur No. 112-116, Bandung.
[6] Kusuma, W.A., Sutaji, D. 2017. "Segmentasi Pembuluh Darah Pada Citra Retina Menggunakan Multi-Scale Line Detector (MSLD) dan Adaptive Morphology," Jurnal Register, vol. 3, pp. 49–56.
[7] Kusumanto R.D. 2011. Pengolahan Citra Digital Untuk Deteksi Obyek Menggunakan Pengolahan Warna Model RGB. Jurusan Teknik Komputer, Politeknik Negeri Sriwijaya, Palembang.
[8] Prasetyo, E. 2011. Pengolahan Citra Digital dan Aplikasinya Menggunakan Matlab. Yogyakarta: Andi Publisher.
[9] Hartanto, S., Sugianto, A., dan Endah, S.N. 2014. Optical Character Recognition Menggunakan Algoritma Template Matching Correlation. Jurnal Masyarakat Informatika, Vol. 5 No. 9, pp. 1-12.
[10] Sutaji D., Fatichah C., dan Adni, N.D. 2016. Segmentasi Pembuluh Darah Retina Pada Citra Fundus Menggunakan Gradient Based Adaptive Thresholding Dan Region Growing. Jurnal Register, vol. 2, pp. 105–116.
[11] Sutaji D., Husenti N. 2018. Deteksi Karakter Pada Citra Captcha Login Internet Banking Menggunakan Template Matching. Prosiding SNTE 2018, vol. 4, pp. 37–40.
[12] Louis V.A., Manual B., and John L. 2004. Telling Humans and Computers Apart Automatically. Communications of the ACM, 47(2):57-60.
[13] Ye Wang and Mi Lu. 2016. "A self-adaptive algorithm to defeat text-based CAPTCHA," IEEE International Conference on Industrial Technology (ICIT).
[14] Kerjasama LPPI dengan Bank Indonesia. 2015. Profil Bisnis Usaha Mikro, Kecil dan Menengah (UMKM).
[15] Sakkatos P., Theerayut W., Nuttapol V. and Surapong P. 2014. Analysis of text-based CAPTCHA images using template matching correlation technique. JICTEE 2014 - 4th Jt. Int. Conf. Inf. Commun. Technol. Electron. Electr. Eng., 5–9.
[16] Zou H., Zhang B., Tao Z. and Wang X. 2016. A Finger Vein Identification Method Based on Template Matching. J. Phys. Conf. Ser. 680.
[17] Gonzalez, R.C.; Woods, R.E.; Eddins, S.L. 2004. Digital Image Processing Using MATLAB. Pearson LPE.
[18] Gonzalez, R.C.; Woods, R.E. 2002. Digital Image Processing. Prentice Hall.
[19] Jahne, B. 2002. Digital Image Processing. Berlin: Springer-Verlag.

AUTHOR'S PROFILE

Deni Sutaji was born in Gresik on October 11th, 1984. He earned his Master's degree in Informatics from Institut Teknologi Sepuluh November Surabaya in October 2016. He currently works as a lecturer at Muhammadiyah Gresik University.