Computing Reviews
A comparison of code similarity analysers
Ragkhitwetsagul C., Krinke J., Clark D. Empirical Software Engineering 23(4): 2464-2519, 2018. Type: Article
Date Reviewed: Jan 11 2019

Measuring the similarity of source code is essential for detecting clones, duplicated code, plagiarism, and software copyright violations. Over the years, researchers have proposed various approaches to detecting code similarity, for example, token-based, graph-based, and metrics-based ones. There have been some attempts to compare the many code similarity analyzers available; however, these comparisons were carried out on different datasets, so their results were of limited value.

The authors of this paper propose a framework for comparing 30 code similarity analyzers on common datasets of Java source code, and they evaluate the analyzers in five experimental scenarios. Three scenarios measure how the analyzers perform under (1) pervasive modifications (global transformations throughout a file) [1], (2) boilerplate code modifications (local transformations within a function or block) [2], and (3) combined pervasive and boilerplate modifications. In the fourth scenario, the authors create normalized representations of the code by compilation or decompilation before applying the modifications. For example, compilation normalizes control structures: source-level while and for loops are both compiled to the same bytecode pattern of conditional and unconditional jumps (if, goto). In the fifth scenario, the authors compare the analyzers using a weighted mean of each analyzer's precision and recall.
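The review does not name the weighted mean used in the fifth scenario. Assuming it is the standard F-measure (an assumption on our part), the quantities involved can be written out from the true positive (TP), false positive (FP), and false negative (FN) counts that the framework tallies. Here $\beta$ is the weight that controls how much recall counts relative to precision:

```latex
\begin{aligned}
P &= \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN},\\[4pt]
\frac{1}{F_\beta} &= \frac{1}{1+\beta^2}\cdot\frac{1}{P} + \frac{\beta^2}{1+\beta^2}\cdot\frac{1}{R}
\quad\Longrightarrow\quad
F_\beta = \frac{(1+\beta^2)\,P\,R}{\beta^2 P + R},\\[4pt]
F_1 &= \frac{2\,P\,R}{P + R} \quad (\beta = 1,\ \text{the unweighted harmonic mean}).
\end{aligned}
```

With $\beta > 1$ the score rewards recall more heavily (missing a real clone costs more). With $\beta < 1$ it rewards precision (a false alarm costs more).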

The comparison framework itself has five steps. First, Java source code is collected to build the datasets. Second, pervasive and boilerplate modifications are applied using source-level and bytecode-level obfuscation. Third, both the original and the modified source code are normalized using pretty printing [3] and decompilation. Fourth, the similarity analyzers are run over the normalized code. Finally, the authors tally the true positives, true negatives, false positives, and false negatives for each analyzer on each dataset, and compute a similarity score.
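The last step above turns raw similarity scores into error counts. The following is a minimal sketch of one way to do this, assuming each analyzer outputs a score in [0, 1] per file pair and that a pair is reported as "similar" when its score reaches a threshold. The function names, the threshold search, and the F-score (standing in for the weighted mean of precision and recall) are illustrative assumptions, not the authors' implementation:

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts of correct and incorrect similarity decisions."""
    tp: int = 0
    fp: int = 0
    tn: int = 0
    fn: int = 0


def confusion_at_threshold(scores, truth, threshold):
    """Classify each pair as similar when its score >= threshold.

    scores: dict mapping (file_a, file_b) -> similarity score in [0, 1]
    truth:  dict mapping the same pairs -> True if one file really is a
            modified copy of the other (hypothetical ground truth).
    """
    c = Confusion()
    for pair, score in scores.items():
        predicted = score >= threshold
        actual = truth[pair]
        if predicted and actual:
            c.tp += 1
        elif predicted:
            c.fp += 1
        elif actual:
            c.fn += 1
        else:
            c.tn += 1
    return c


def f_score(c, beta=1.0):
    """Weighted harmonic mean of precision and recall (F-beta)."""
    precision = c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0
    recall = c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def best_threshold(scores, truth, thresholds):
    """Return the candidate threshold with the highest F-score."""
    return max(thresholds,
               key=lambda t: f_score(confusion_at_threshold(scores, truth, t)))


if __name__ == "__main__":
    # Hypothetical scores for four file pairs.
    scores = {("A.java", "A_mod.java"): 0.92, ("A.java", "B.java"): 0.40,
              ("B.java", "B_mod.java"): 0.55, ("A_mod.java", "B.java"): 0.35}
    truth = {("A.java", "A_mod.java"): True, ("A.java", "B.java"): False,
             ("B.java", "B_mod.java"): True, ("A_mod.java", "B.java"): False}
    print(best_threshold(scores, truth, [0.3, 0.5, 0.9]))  # prints 0.5
```

Searching over thresholds matters because each analyzer's raw scores live on its own scale. Tuning the cut-off per analyzer before comparing F-scores keeps the comparison from favouring tools whose default threshold happens to suit the dataset.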

The authors instantiate the framework on 259 pieces of Java source code, apply 100 modifications, and report results for every experimental scenario. CCFinderX [4] performs best on pervasive modifications, and jplag-text [5] performs best on boilerplate modifications. The experimental results also confirm that normalization through compilation/decompilation can improve similarity detection.
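Part of the reason a token-based tool copes with pervasive modifications such as global identifier renaming is that it can abstract identifiers away before comparing. The sketch below illustrates that idea only. It is not CCFinderX's actual algorithm, which matches token sequences with a suffix tree. Here a simple Dice coefficient over token n-grams stands in for the matching, and the tokenizer regex, the keyword subset, and the placeholder tokens `ID` and `NUM` are all hypothetical choices:

```python
import re
from collections import Counter

# A subset of Java keywords and literals kept verbatim (assumption: enough for a demo).
JAVA_KEYWORDS = {
    "abstract", "boolean", "break", "byte", "case", "catch", "char", "class",
    "continue", "default", "do", "double", "else", "extends", "final",
    "finally", "float", "for", "if", "implements", "import", "instanceof",
    "int", "interface", "long", "new", "package", "private", "protected",
    "public", "return", "short", "static", "super", "switch", "this", "throw",
    "throws", "try", "void", "while", "true", "false", "null",
}

# Identifiers, integer literals, or any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def normalized_tokens(source):
    """Tokenize Java-like source, mapping identifiers to 'ID' and numbers to 'NUM'."""
    tokens = []
    for tok in TOKEN_RE.findall(source):
        if tok in JAVA_KEYWORDS:
            tokens.append(tok)
        elif tok[0].isalpha() or tok[0] == "_":
            tokens.append("ID")
        elif tok[0].isdigit():
            tokens.append("NUM")
        else:
            tokens.append(tok)  # operators and punctuation, one char each
    return tokens


def ngram_similarity(a, b, n=3):
    """Dice coefficient over the multisets of token n-grams of a and b."""
    grams_a = Counter(tuple(a[i:i + n]) for i in range(len(a) - n + 1))
    grams_b = Counter(tuple(b[i:i + n]) for i in range(len(b) - n + 1))
    total = sum(grams_a.values()) + sum(grams_b.values())
    if total == 0:
        return 1.0 if a == b else 0.0
    shared = sum((grams_a & grams_b).values())
    return 2 * shared / total


if __name__ == "__main__":
    original = "int sum(int[] xs) { int total = 0; for (int x : xs) { total += x; } return total; }"
    renamed = "int add(int[] values) { int acc = 0; for (int v : values) { acc += v; } return acc; }"
    print(ngram_similarity(TOKEN_RE.findall(original), TOKEN_RE.findall(renamed)))
    print(ngram_similarity(normalized_tokens(original), normalized_tokens(renamed)))  # prints 1.0
```

On raw tokens the renamed copy scores below 1.0. After normalization the two token streams are identical, which is the kind of robustness to pervasive renaming the results above attribute to the best token-based detector.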

Reviewer: Partha Pratim Das    Review #: CR146376 (1904-0122)
1) Chuda, D.; Navrat, P.; Kovacova, B.; Humay, P. The issue of (software) plagiarism: a student view. IEEE Transactions on Education 55, 1 (2012), 22–28.
2) Kapser, C.; Godfrey, M. W. “Cloning considered harmful” considered harmful. In 13th Working Conference on Reverse Engineering, IEEE, 2006, 19–28.
3) Roy, C. K.; Cordy, J. R. NICAD: accurate detection of near-miss intentional clones using flexible pretty-printing and code normalization. In 16th IEEE International Conference on Program Comprehension, IEEE, 2008, 172–181.
4) Kamiya, T.; Kusumoto, S.; Inoue, K. CCFinder: a multilinguistic token-based code clone detection system for large scale source code. IEEE Transactions on Software Engineering 28, 7 (2002), 654–670.
5) Prechelt, L.; Malpohl, G.; Philippsen, M. Finding plagiarisms among a set of programs with JPlag. Journal of Universal Computer Science 8, 11 (2002), 1016–1038.
 
General (D.2.0)
 
Other reviews under "General":
Development of distributed software
Shatz S. (ed), Macmillan Publishing Co., Inc., Indianapolis, IN, 1993. Type: Book (9780024096111)
Aug 1 1994
Fundamentals of software engineering
Ghezzi C., Jazayeri M., Mandrioli D., Prentice-Hall, Inc., Upper Saddle River, NJ, 1991. Type: Book (013820432)
Jul 1 1992
Software engineering
Sodhi J., TAB Books, Blue Ridge Summit, PA, 1991. Type: Book (9780830633425)
Feb 1 1992

Reproduction in whole or in part without permission is prohibited. Copyright 1999-2024 ThinkLoud®