Computing Reviews
BCGen: a comment generation method for bytecode
Huang Y., Huang J., Chen X., He K., Zhou X. Automated Software Engineering 30(1): 1-1, 2022. Type: Article
Date Reviewed: Dec 6 2023

The authors have undertaken a project to make bytecode more readable by interspersing it with machine-generated comments. There are two salient questions regarding this project: Did they (at least mostly) succeed? And to the extent that they did succeed, will these comments actually prove useful?

An important factor to consider here is that a bad comment may often be more harmful than nine good comments are helpful. In the absence of a good comment, a programmer can at least read the code and see what it does. But a bad comment may actively mislead programmers who, thinking they know what is going on, then misunderstand the code. Thus, even if automatic comment generation is, say, 90 percent accurate, it may still do more harm than good.
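
The arithmetic behind this claim can be made explicit. The symbols below are assumptions introduced only for illustration: g is the benefit of one good comment, h the harm of one bad comment, and p the fraction of generated comments that are good.

```latex
% Expected net value per generated comment (illustrative model, not from the paper)
E = p\,g - (1 - p)\,h

% With p = 0.9 and the premise h > 9g:
E < 0.9\,g - 0.1 \cdot 9g = 0

% In general, generation is a net benefit only when
E > 0 \iff p > \frac{h}{g + h}
```

Under this simple model, 90 percent accuracy is exactly the break-even point when a bad comment costs as much as nine good ones gain, and any higher cost tips the balance toward harm.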

The authors spend some time explaining how they tested their project. They searched Maven (a build tool for Java) repositories; after rejecting certain repositories as inappropriate due to a lack of either bytecode or source code with comments, the authors also worked to eliminate most templates since they are not truly independent instances.

But the paper is less than clear on what they did to detect templates:

Assuming that <> represents any character, then the sentences “get the type of error” and “get the type of event” can be represented by the template “get the type of <>.”

I must admit that this sentence is entirely opaque to me.
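
One possible reading, offered only as a guess, is that comments differing in a single word are treated as instances of one pattern, with <> standing in for the word that varies. The sketch below illustrates that reading. The function name `find_templates` and the `min_support` parameter are hypothetical, and nothing here is taken from the paper.

```python
from collections import defaultdict


def find_templates(comments, min_support=2):
    """Group comments that differ in exactly one word position.

    Assumption: "<>" is a wildcard for a single word. The paper's own
    definition may well differ.
    """
    groups = defaultdict(set)
    for comment in comments:
        words = comment.split()
        # Replace each word in turn with the wildcard to form candidate templates.
        for i in range(len(words)):
            template = " ".join(words[:i] + ["<>"] + words[i + 1:])
            groups[template].add(comment)
    # Keep only templates matched by at least `min_support` distinct comments.
    return {t: sorted(c) for t, c in groups.items() if len(c) >= min_support}


comments = ["get the type of error", "get the type of event", "close the stream"]
templates = find_templates(comments)
```

On these three comments, the only template that survives is "get the type of <>", which covers the two sentences quoted above.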

In any case, 55,130 bytecode-comment pairs were collected for machine learning. The authors extracted information from the tokens contained in the bytecode. They used a clever method of reducing the number of tokens they considered by taking advantage of Java’s typical camel-case naming convention for identifiers.
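
To see why camel case helps, consider splitting each identifier into its component words, so that many distinct identifiers share a small vocabulary of subtokens. The following is a minimal sketch of that idea, not the authors' implementation. The helper name `split_camel_case` and the sample identifiers are assumptions.

```python
import re

# Matches, in order: an acronym followed by a capitalized word (e.g. "HTTP" in
# "HTTPResponse"), a capitalized or lowercase word, a trailing acronym, digits.
_SUBTOKEN = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+")


def split_camel_case(identifier):
    """Split a camel-case identifier into lowercase subtokens."""
    return [part.lower() for part in _SUBTOKEN.findall(identifier)]


identifiers = ["getErrorType", "getEventType", "parseHTTPResponse"]
subtokens = [split_camel_case(name) for name in identifiers]
vocabulary = {tok for toks in subtokens for tok in toks}
```

Here the first two identifiers share the subtokens "get" and "type", so the vocabulary grows more slowly than the number of distinct identifiers.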

The bytecode could be treated as plaintext, but this would lose much structural information. To deal with this fact, the authors introduce a control-flow-graph representation of the bytecode.
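
For readers unfamiliar with the idea, a control-flow graph groups instructions into basic blocks and records which block can follow which. The sketch below builds one for a toy instruction list. It is a simplification under stated assumptions: jump targets are list indices rather than real byte offsets, only a handful of JVM opcodes are recognized, and `build_cfg` is a hypothetical helper, not the representation the paper uses.

```python
# Small illustrative subset of JVM opcodes; real bytecode has many more.
BRANCHES = {"ifeq", "ifne", "goto"}
TERMINATORS = {"return", "ireturn"}


def build_cfg(instructions):
    """Return {block_start: [successor_block_starts]}.

    `instructions` is a list of (opcode, target_index_or_None) pairs.
    """
    n = len(instructions)
    if n == 0:
        return {}
    # A leader starts a basic block: the first instruction, any jump target,
    # and any instruction that follows a branch or a return.
    leaders = {0}
    for i, (op, target) in enumerate(instructions):
        if op in BRANCHES:
            leaders.add(target)
            if i + 1 < n:
                leaders.add(i + 1)
        elif op in TERMINATORS and i + 1 < n:
            leaders.add(i + 1)
    starts = sorted(leaders)
    # Each block runs from its leader to just before the next leader.
    ends = {
        s: (starts[k + 1] - 1 if k + 1 < len(starts) else n - 1)
        for k, s in enumerate(starts)
    }
    edges = {s: [] for s in starts}
    for s in starts:
        op, target = instructions[ends[s]]
        if op in BRANCHES:
            edges[s].append(target)  # edge to the jump target
        if op != "goto" and op not in TERMINATORS and ends[s] + 1 < n:
            edges[s].append(ends[s] + 1)  # fall-through edge
    return edges


# Toy method body: return 1 if the argument is nonzero, else 0.
method = [
    ("iload_1", None),
    ("ifeq", 4),
    ("iconst_1", None),
    ("ireturn", None),
    ("iconst_0", None),
    ("ireturn", None),
]
cfg = build_cfg(method)
```

The branch structure that a flat token sequence would hide shows up here as two outgoing edges from the first block.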

The authors take some time describing their experimental setup, in which they compare their project against several other natural language processing models. The evaluation criterion is how similar the comments each method generates are to the original source code comments. They find their model performs significantly better than the state-of-the-art baselines. And when the comments generated were rated by human programmers, the BCGen model also outperformed its rivals. Nevertheless, neither the similarity scores between the generated and original comments nor the scores assigned by humans were particularly high, meaning many of the comments were not good.
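
As a rough illustration of what a similarity score of this kind measures, and not a claim about the metric the authors actually used, the sketch below computes the fraction of words in a generated comment that also appear in the reference comment. The function name `unigram_precision` and the example sentences are assumptions.

```python
from collections import Counter


def unigram_precision(candidate, reference):
    """Fraction of candidate words found in the reference, with clipped counts."""
    cand_words = candidate.lower().split()
    if not cand_words:
        return 0.0
    ref_counts = Counter(reference.lower().split())
    # Each candidate word is credited at most as often as it occurs in the reference.
    matched = sum(
        min(count, ref_counts[word]) for word, count in Counter(cand_words).items()
    )
    return matched / len(cand_words)


score = unigram_precision("get the type of event", "get the type of error")
```

A measure like this rewards shared wording, which is one reason a comment can score fairly well while still naming the wrong thing, as "event" versus "error" does here.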

So this brings us back to the second question asked at the beginning of the review: Is the activity of automated comment generation actually helpful to working engineers? Unfortunately, the authors do not address this point. Furthermore, for the activities in which bytecode analysis is used, they do not show that interspersing bytecode with comments is even potentially helpful. For instance, in detecting malware, one is dealing with code that is intentionally deceptive. Comments automatically generated from such code would seem likely to be derivatively deceptive, and thus of little help in the task at hand.

In summary, this is a very interesting project that has improved the state of the art in automated comment generation. But whether it will have practical benefits remains to be seen.

Reviewer: Eugene Callahan
Review #: CR147673
General (D.2.0)
Software Process (K.6.3 ...)
Software Architectures (D.2.11)