PLAGIARISM DETECTION TECHNIQUES AND LITERATURE SURVEY
International Journal of Computer Engineering and Applications, Volume XV, Issue V, May 2021, www.ijcea.com ISSN 2321-3469

Kapil Vilasrao Gawande1, Dr. Piyush Pratap Singh2
1 Center of Informatics and Language Engineering, Mahatma Gandhi Antarrashtriya Hindi VishwaVidyalaya, Wardha, Maharashtra, India.
2 Associate Professor, School of Computer & System Science, Jawaharlal Nehru University, New Delhi, India.

ABSTRACT:
Literature is a form of intellectual knowledge, and the theft of literature is a growing subject of debate. New technical tools are being created to address this problem. It is said that “Money can be stolen, some goods can be stolen but knowledge cannot be stolen”. Yet once knowledge is written down as literature, it can be stolen in that form. In recent years, many online tools have become able to identify potential plagiarism in research work. This paper covers the dimensions and techniques of plagiarism, the NLP problems faced by plagiarism identifiers, and problems at the sentence level.

Keywords: Plagiarism detection, curevin, Plagiarism of Code, NLP Methodology, Text Similarity

INTRODUCTION
Plagiarism is a challenging problem for publishers, researchers, universities and educational institutions. Presenting another person's words or thoughts as one's own is plagiarism, whether the work is written text, audio, video, music, or a picture. Plagiarism has been defined in various dictionaries. [13] The University of Oxford defines it as follows: “Plagiarism is presenting someone else’s work or ideas as your own, with or without their consent, by incorporating it into your work without full acknowledgement. All published and unpublished material, whether in manuscript, printed or electronic form, is covered under this definition.”

Using others' text, pictures, audio, or video without permission is plagiarism. Anyone who uses the literature of others must credit the source through citations and references. Translating material from another language and including it in one's own text, images, audio, or video without credit is also considered plagiarism. Earlier, text was simply copied verbatim. Today, plagiarism is committed both by copying text and by paraphrasing it.

There are many types of plagiarism; the most basic and common is text similarity plagiarism, which includes copy-paste similarity, code similarity, and translation similarity. For example, a student may translate someone else's English text into Hindi and submit it as an assignment to gain marks without effort.

REVIEW
1. Plagiarism in Document
2. Plagiarism in Code
3. Citation and References
4. Plagiarism Methodology
5. Survey of Papers

1. PLAGIARISM IN DOCUMENT
There are two types of system for checking plagiarism in documents:
1. Web embedded System
2. Stand-alone System

1.1 Web embedded System
[2] Web-enabled systems are more commonly used because they make it easier to search the World Wide Web for plagiarized sources and are more reliable. There are two types of such system:
* curvein Intelligent Identification System, in which the submitted document is compared with the work of previous students and with other international databases to determine whether it is plagiarized.
* Secure identity: searches or checks the submitted paper against the following databases:
i) Internet database
ii) Documents already published
iii) Data warehouse or global database
Examples: Plagiarism Checker by Grammarly, Quetext: Plagiarism Checker, CopyScape, ProWritingAid, Copyleaks, etc.

1.2 Stand-alone Systems
This is a system that needs to be installed on the computer. There are two types, as follows:
* Verification System: [2] This system works only when connected to the Internet. It searches the Internet to match the sentences of the query document against suspicious websites.
* CopyFind: [2] This system detects plagiarism between two or more documents.
Examples: Plagiarism Checker X, Turnitin, AntiPlagiarism, etc.

1.3 Plagiarism text Similarity
1.3.1 Lexical text Similarity: Lexical similarity measures how much the words of two sentences overlap, regardless of meaning. The two sentences below share most of their words, so they are lexically similar even though their meanings differ.
[14] E.g. The cat ate the mouse. The mouse ate the cat food.
1.3.2 Semantic text Similarity: Semantic similarity measures how close two sentences are in meaning, even when their wording differs.
E.g. Modiji declared the lockdown on 22 March. The Prime Minister of India declared Lockdown on 22 March.
1.3.3 Monolingual: [3] In the context of plagiarism, stealing text from a document in the same language, for example from Hindi text to Hindi text, without a reference or with a wrong reference.
1.3.4 Crosslingual: [3] In the context of plagiarism, stealing text from a document in a different language, for example from Hindi text to English text, without a reference or with an incorrect reference.

2 PLAGIARISM OF CODE:
[15] C#, Java, Python, HTML, XML, Go, C, C++, JavaScript, Swift, Ruby, PHP, Perl, Scala and many more programming languages are available to learn. A variety of approaches have been introduced to detect shared logic and code in source code written in C, C++, Java, C#, or .NET. Programming is the language of the future, so it attracts more students every year. With more and more students learning to code, a growing number are facing plagiarism allegations.
[3] Code plagiarism can be classified into the following levels:
Level 0 - Original program without modifications
Level 1 - Only comments are changed
Level 2 - Identifier names are replaced
Level 3 - Positions of variables are changed
Level 4 - Constants and functions are changed
Level 5 - Program loops are replaced
Level 6 - Control structures are replaced with equivalent forms that use different control structures

3 Citation and References
It is important to check the citations and references given in the literature. They show whether the author obtained the information accurately and where it came from. Citations can be checked against the Internet and against a corpus:
1. Most documents are available on the Internet and in many e-libraries, which can be used for checking citations and references.
2. Books can be checked against the corpus data warehouse in a stand-alone system.

4 PLAGIARISM DETECTION METHODOLOGY:
Many plagiarism detection tools have been built and many techniques have been used, but text comparison still relies on NLP pre-processing methods.

4.1.1 Pre-Processing and NLP methodology
i. Tokenization
A document is broken into tokens or words, where a token is the smallest usable unit of the document.
ii. Stop word remover
Stop words are words that carry little meaning in themselves; they are used to give structure to a sentence. They can be removed after tokenization without affecting the accuracy of the similarity comparison.
iii. Lemmatization
Words can take different forms as a result of adding suffixes and prefixes to their original forms. These affixes can be removed by lemmatization, so that different forms of the same word are reduced to the same word.
iv. Stemming
The meaning of a root word is modified by attaching a prefix or suffix. Stemming is the process of finding the root word of such a word.
Example: रेलगाड़ी ("train") = रेल + गाड़ी, where रेल is the prefix and गाड़ी is the root word.
v. Synonym Replacement
A plagiarist never wants to be caught, so they may insert or delete parts of a sentence or simply paraphrase it. At this stage each word is matched together with all its synonyms, so that the algorithm can detect paraphrasing.
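The pre-processing steps listed in 4.1.1 can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the method of any tool named in this paper: the stop-word set, the suffix list, and the synonym table are tiny hypothetical samples, the tokenizer handles English text only, and crude suffix stripping stands in for both lemmatization and stemming.

```python
import re

# Illustrative, assumed resources: real systems use full language-specific
# stop-word lists, stemmers, and thesauri rather than these tiny samples.
STOP_WORDS = {"the", "a", "an", "of", "on", "in", "is", "to"}
SUFFIXES = ("ing", "ed", "es", "s")   # checked in this order
SYNONYMS = {"announc": "declar"}      # keyed by stemmed form


def tokenize(text):
    """Split lower-cased text into word tokens (English-only regex)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that only give structure to the sentence."""
    return [t for t in tokens if t not in STOP_WORDS]


def stem(token):
    """Crude suffix stripping; keeps at least three characters of the word."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def replace_synonyms(tokens):
    """Map each token to one canonical representative of its synonym group."""
    return [SYNONYMS.get(t, t) for t in tokens]


def preprocess(text):
    """Tokenize, remove stop words, stem, then replace synonyms."""
    tokens = tokenize(text)
    tokens = remove_stop_words(tokens)
    tokens = [stem(t) for t in tokens]
    return replace_synonyms(tokens)


original = preprocess("The minister declared the lockdown.")
paraphrase = preprocess("Ministers announced lockdowns")
print(original)
print(paraphrase)
```

With these sample resources, both sentences reduce to the same token list, which is why a comparison step run after pre-processing can match a paraphrase that a raw string comparison would miss.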
4.1.2 Document frequency Comparison method
[1] The vector space model is a generic model, often applied to information retrieval, translation, or other text processing tasks. To detect plagiarism, the vector space model can be viewed as a global similarity measurement method. Sentences extracted from the suspect and source documents are treated as mutually independent groups. Using the vector space model, the term frequencies of a text are derived and then matched against the term frequencies of other texts. The resulting similarity is measured between 0 and 1.

4.1.3 Multinomial Naive Bayes
[1] The Naive Bayes classifier is suited to pattern recognition and can therefore be used to detect plagiarism. Let $S$ be a sentence and let $t_1, t_2, t_3, \ldots, t_n$ be the word features observed in it.
Applying Bayes' theorem:
$$P(S \mid t_1, t_2, \ldots, t_n) = \frac{P(S)\,P(t_1, t_2, \ldots, t_n \mid S)}{P(t_1, t_2, \ldots, t_n)}$$
Applying conditional probability, and noting that the denominator does not depend on $S$:
$$P(S \mid t_1, t_2, \ldots, t_n) \propto P(S)\,P(t_1, t_2, \ldots, t_n \mid S)$$
Under the naive assumption that the features are independent given $S$, the right-hand side factorizes as $P(S)\prod_{i=1}^{n} P(t_i \mid S)$.

5 SURVEY OF PAPERS:
SR. NO  DESCRIPTION OF PAPER
1  [2] The research paper by Prasanth.S, Rajshree.R and Saravana Balaji.B entitled "A Survey on Plagiarism Detection" covers the types and techniques of text similarity in plagiarism.
2  [4] The research paper by Vítor T. Martins, Daniela Fonte, Pedro Rangel Henriques, and Daniela da Cruz entitled "Plagiarism Detection: A Tool Survey and Comparison" presents plagiarism detection tools and compares them and their accuracy.
3  [5] The research paper by Ali Bukar Maina, Mahmoud Bukar Maina and SuleimanSalihu Jauro titled "PLAGIARISM: A PERSPECTIVE FROM A CASE OF A NORTHERN NIGERIAN UNIVERSITY" surveys work on plagiarism.
4  [6] The 2012 research paper by A. S. Bin-Habtoor and M. A. Zaher entitled "A Survey on Plagiarism Detection Systems" surveys the tools and techniques of plagiarism detection.
5  [3] The paper by Hussain A Chowdhury and Dhruba K Bhattacharyya entitled "Plagiarism: Taxonomy, Tools and Detection Techniques" discusses the types of plagiarism, plagiarism detection methods, NLP-related problems, techniques and tools.
6  [7] The research paper by Yuehong (Helen) ZHANG and Xiaoyan JIA titled "A survey on the use of CrossCheck for detecting plagiarism in journal articles" re-examines the use of CrossCheck and presents the results of the survey.
7  [8] The research paper by P.Rubini and Ms. S.Leela entitled "A SURVEY ON PLAGIARISM DETECTION IN TEXT MINING" illustrates the techniques of plagiarism and its identification.
8  [1] The research paper by Harshall Lamba and Sharvari Govilkar entitled "A Survey on Plagiarism Detection Techniques for Indian Regional Languages" introduces plagiarism and explains the following detection techniques: Candidate Document Retrieval, Document Comparison Techniques, Multinomial Naïve Bayes, Semantic Role Labeling, Fingerprinting based Plagiarism Detection, Latent Semantic Analysis (LSA) and Fuzzy Semantic Similarity Techniques.
9  [12] The research paper by Jens Lykkesfeldt entitled "Strategies for Using Plagiarism Software in the Screening of Incoming Journal Manuscripts: Recommendations Based on a Recent Literature Survey" surveys software-based screening and presents the results.
10  [11] The research paper by Hermann Maurer, Frank Kappe and Bilal Zaka entitled "Plagiarism - A Survey" introduces plagiarism, text similarity tools and their techniques. [6]

CONCLUSION
This survey has covered plagiarism techniques and text-related difficulties, including the NLP problems encountered at the word and sentence level. Preventing plagiarism requires new algorithms, so that new knowledge and research can be produced and theft can be curbed.
REFERENCES
[1] Harshall Lamba, Sharvari Govilkar, 4, April 2017. “A Survey on Plagiarism Detection Techniques for Indian Regional Languages”, Vol. 164, International Journal of Computer Applications, pp. 44-50.
[2] Prasanth.S, Rajshree.R, Saravana Balaji.B, 19, January 2014. “A Survey on Plagiarism Detection”, Vol. 86, International Journal of Computer Applications, pp. 21-23.
[3] Hussain A Chowdhury, Dhruba K Bhattacharyya, “Plagiarism: Taxonomy, Tools and Detection Techniques”, arxiv.org (5, Feb 2021).
[4] Vítor T. Martins, Daniela Fonte, Pedro Rangel Henriques, and Daniela da Cruz, “Plagiarism Detection: A Tool Survey and Comparison”, OASIcs, Dagstuhl Publishing, Germany, pp. 143-158.
[5] Ali Bukar Maina, Mahmoud Bukar Maina and SuleimanSalihu Jauro, December 2014. “Plagiarism: A Perspective From A Case Of A Northern Nigerian University”, Vol. 1, IJIRR, Issue 12, pp. 225-230.
[6] A. S. Bin-Habtoor, M. A. Zaher, April 2012. “A Survey on Plagiarism Detection Systems”, Vol. 4, No. 2, International Journal of Computer Theory and Engineering, pp. 185-188.
[7] Yuehong (Helen) Zhang and Xiaoyan Jia, October 2012. “A survey on the use of CrossCheck for Detecting Plagiarism in journal articles”, Vol. 25, No. 4, Learned Publishing, 5:292-307.
[8] P. Rubini, Ms. S. Leela, December 2013. “A Survey on Plagiarism Detection In Text Mining”, Vol. 1, International Journal of Research in Computer Applications and Robotics, Issue 9, pp. 117-119.
[9] Martin Potthast, Andreas Eiselt, Alberto Barron-Cedeno, “Overview of the 3rd International Competition on Plagiarism Detection”, PAN (webis.de) (18, April 2021).
[10] 25th Annual Conference of the Spanish Society for Natural Language Processing, SEPLN 2009, 3rd PAN Workshop, Uncovering plagiarism, authorship and social software misuse.
[11] Hermann Maurer, Frank Kappe, Bilal Zaka, 25, Aug 2006. “Plagiarism - A Survey”, Vol. 12, No. 8, Journal of Universal Computer Science, pp. 1050-1084.
[12] Jens Lykkesfeldt, February 2016. “Strategies for Using Plagiarism Software in the Screening of Incoming Journal Manuscripts: Recommendations Based on Recent Literature Survey”, BCPT, pp. 161-164.
[13] https://www.ox.ac.uk/students/academic/guidance/skills/plagiarism#:~:text=Plagiarism%20is%20presenting%20someone%20else's,is%20covered%20under%20this%20definition (06, April 2021)
[14] https://kavita-ganesan.com/what-is-text-similarity/#.YHrCTa8zZPY (17, April 2021)
[15] https://copyleaks.com/code-plagiarism-checker (20, April 2021)