Cross-Language Plagiarism Detection: Methods, Tools, and Challenges: A Systematic Review

Miguel Botto-Tobar; Alexander Serebrenik; Mark G.J. van den Brand

doi:10.18517/ijaseit.12.2.14711

Abstract

Plagiarism is one of the most serious academic offenses. However, some people try to evade detection by, for example, translating excerpts from one language into another. This form of plagiarism is difficult to recognize unless the reader fully understands the other language. Researchers have developed approaches for detecting plagiarism in a variety of languages. However, most methods created in the past have proved effective for detecting plagiarism in papers published in a single language, most notably English. This paper therefore provides a systematic literature review (SLR) of cross-language plagiarism detection (CLPD) methods in a natural language context. The study is based on an extensive search for relevant literature, conducted through an SLR and snowballing. We present an overview of (i) cross-language plagiarism detection techniques; (ii) the artifacts and aspects considered in the evaluation phase; and (iii) the lack of guidelines and tools for their implementation. The contribution of this review lies in highlighting emerging trends in cross-language plagiarism detection techniques. Further, we identify uses of these techniques in other domains, for instance, software engineering.

Keywords

Cross-language; plagiarism detection; SLR; snowballing.

Published by INSIGHT - Indonesian Society for Knowledge and Human Development

International Journal on Advanced Science, Engineering and Information Technology
