Computing Reviews
A survey of machine learning for big code and naturalness
Allamanis M., Barr E., Devanbu P., Sutton C. ACM Computing Surveys 51(4): 1-37, 2018. Type: Article
Date Reviewed: Nov 4 2021

There is a rising demand for effective software tools that can help developers build reliable and maintainable software systems, and abundant research has gone into helping developers track bugs, verify program properties, and refactor code. Recently, widely used open-source projects have been made available to the public, with not only the source code but also important metadata such as commit logs, bug fix summaries, authorship details, and process documents. This whole collection (popularly referred to as “big code”) has spurred a new research direction to aid software development and maintenance, based on a data-driven approach that analyzes programs and uncovers common software characteristics.

The authors study the available literature on probabilistic machine learning and natural language processing (NLP) models for code and its associated metadata (big code), mostly in three areas:

(1) Code generating models focus on modeling how code is written, learning a distribution over code that can then be used to generate code in applications such as code migration, pseudocode generation, code synthesis, and code completion. For this, researchers have developed language models, machine translation models, and multi-modal models that use the structure of a programming language along with its correlation to metadata, for example, comments, commits, and design documents.
(2) Representational models learn intermediate characterizations of code constructs and of their relations and properties, mostly based on distributed representations of those constructs in a vector space, coupled with structured predictions using sequence models. This representation helps in program analysis, feature location, code search, and data and control traceability.
(3) Pattern mining models are used to mine resolvable patterns from source code and mostly help with code summarization, documentation generation, and bug fixing.
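
As a rough illustration of the code generating idea in item (1), the sketch below trains a toy bigram model over code tokens and uses it to suggest the most likely next token. It is not taken from the survey: the function names, the tiny pre-tokenized corpus, and the choice of a bigram model with plain relative-frequency estimates are all assumptions made for illustration, and practical systems use far richer models.

```python
from collections import Counter, defaultdict


def train_bigram_model(token_sequences):
    """Count how often each token follows each preceding token."""
    counts = defaultdict(Counter)
    for tokens in token_sequences:
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def next_token_distribution(counts, prev):
    """Estimate P(next | prev) by relative frequency; empty dict if prev is unseen."""
    followers = counts.get(prev)
    if not followers:
        return {}
    total = sum(followers.values())
    return {tok: n / total for tok, n in followers.items()}


def complete(counts, prev):
    """Suggest the most probable next token, or None if prev was never seen."""
    dist = next_token_distribution(counts, prev)
    return max(dist, key=dist.get) if dist else None


# Hypothetical toy corpus of already-tokenized code lines.
corpus = [
    ["for", "i", "in", "range", "(", "n", ")", ":"],
    ["for", "x", "in", "items", ":"],
    ["for", "i", "in", "range", "(", "10", ")", ":"],
]
model = train_bigram_model(corpus)
suggestion = complete(model, "in")
```

Here the model has learned a distribution from the corpus and uses it to complete code, which is the pattern item (1) describes; the same counts could also be sampled from to generate token sequences.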

The authors review around 200 papers that aim to develop probabilistic models of code and use them effectively in constructing software. The major applications of these models are to enable code autocompletion and migration, infer coding conventions, mine code defects, and facilitate code translation and copying.

Reviewer:  Partha Pratim Das Review #: CR147381
