Computing Reviews
Advanced structured prediction
Nowozin S., Gehler P., Jancsary J., Lampert C., The MIT Press, Cambridge, MA, 2014. 456 pp. Type: Book (978-0-262-02837-0)
Date Reviewed: Apr 17 2015

Machine learning has received a lot of attention in recent years and has become common in an ever-growing list of application domains. The simplest forms of learning are concerned with algorithms that predict a single output, such as a class label, for a given input (for instance, predicting whether an email message is spam or not). Structured prediction, which is the theme of this book, is concerned with much more difficult tasks, where the goal is to predict multiple interrelated values (for example, predicting the most likely parse tree for a natural language sentence). Structured prediction poses a huge computational challenge, not only because of the complexity of the learning algorithms, but also because of the need to handle large amounts of training data.
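
The contrast between the two settings can be made concrete with a small sketch (an assumed illustration, not taken from the book): in sequence labeling, a classic structured task, the jointly best label sequence under the Viterbi algorithm can differ from what independent per-position classification would predict, precisely because the output values are interrelated.

```python
import numpy as np

def viterbi(unary, trans):
    """unary: (T, K) per-position label scores; trans: (K, K) transition scores.
    Returns the jointly highest-scoring label sequence of length T."""
    T, K = unary.shape
    score = unary[0].copy()
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + trans + unary[t][None, :]  # (K, K) candidate scores
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    seq = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        seq.append(int(back[t][seq[-1]]))
    return seq[::-1]

# Position 1 slightly favors label 1 on its own, but the transitions
# favor agreement, so joint decoding flips it.
unary = np.array([[2.0, 0.0], [0.0, 0.5], [2.0, 0.0]])
trans = np.array([[1.0, -1.0], [-1.0, 1.0]])

print(viterbi(unary, trans))  # → [0, 0, 0]
# Independent per-position argmax would instead give [0, 1, 0].
```

Here the "structure" is nothing more than a transition matrix coupling adjacent labels, yet it already changes the prediction.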

This book presents several recent developments that tackle the complexity of structured learning from different angles, serving as an excellent survey of the state of the art, as well as a good tutorial on various algorithmic techniques. Although targeted at experts and researchers in the field, the book is quite accessible to a larger audience familiar with the basics of learning. The editors and the authors of the individual chapters did an excellent job of keeping the material accessible and of providing a very thorough discussion of previous and related work. A long list of application examples spread throughout goes a long way toward making the material concrete.

Apart from the introduction (chapter 1) and two chapters (10 and 11) devoted to a theoretical understanding of structured prediction, the book is organized into three major, fairly independent, sections:

Chapters 2 to 6 cover linear programming (LP) relaxations for structured prediction, which is an approach that has become very popular recently. Chapter 2 provides both a tutorial to this approach, laying out the fundamental ideas involved, and hardness results, showing the difficulty of the problem. The other chapters in this group present practical approximation algorithms and applications, covering a lot of ground among them. For example, chapter 3, which can be used as a tutorial on its own, introduces a technique with very strong convergence guarantees; chapter 4 builds on a message-passing algorithm that is among the fastest for the problem and develops various properties of these algorithms; and chapter 5 focuses on the dual LP, which is structurally simpler than the primal, giving precise convergence rates, as well as an experimental validation of the idea.
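
To give a flavor of the LP relaxation approach, here is a minimal sketch (a hypothetical toy example, not from the book): MAP inference for a two-node, two-state pairwise model written as a linear program over the local polytope, where the variables are pseudomarginals constrained to normalize and to be mutually consistent.

```python
import numpy as np
from scipy.optimize import linprog

# Pseudomarginal variables, in order:
# [mu1(0), mu1(1), mu2(0), mu2(1), mu12(00), mu12(01), mu12(10), mu12(11)]
theta = np.array([1.0, 0.0,            # node 1 potentials
                  0.0, 1.0,            # node 2 potentials
                  2.5, 0.0, 0.0, 2.0]) # edge potentials (favor agreement)

A_eq = np.array([
    [1, 1, 0, 0, 0, 0, 0, 0],   # mu1 sums to 1
    [0, 0, 1, 1, 0, 0, 0, 0],   # mu2 sums to 1
    [-1, 0, 0, 0, 1, 1, 0, 0],  # edge marginal consistent with mu1(0)
    [0, -1, 0, 0, 0, 0, 1, 1],  # ... with mu1(1)
    [0, 0, -1, 0, 1, 0, 1, 0],  # ... with mu2(0)
    [0, 0, 0, -1, 0, 1, 0, 1],  # ... with mu2(1)
])
b_eq = np.array([1, 1, 0, 0, 0, 0])

# linprog minimizes, so negate the score to maximize it.
res = linprog(-theta, A_eq=A_eq, b_eq=b_eq, bounds=[(0, 1)] * 8, method="highs")
print(round(-res.fun, 3))  # → 3.5, the MAP score of x1 = x2 = 0
```

Because this tiny graph is a tree, the relaxation is tight and the LP optimum is attained at an integral vertex; on loopy graphs the same LP gives an upper bound, and much of this section of the book is about solving it efficiently at scale.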

Chapters 7 to 9 are dedicated to modeling issues, including how to define a probabilistic model and how to use nonlinear model components to enhance its capacity. One particularly interesting approach is that of chapter 7, which covers the recently introduced “perturb-and-MAP” generative probabilistic random-field model, in which random noise is injected into the system before a maximum a posteriori (MAP) prediction step. If done right, perturb-and-MAP has good theoretical guarantees. The chapter gives an overview of the approach and contrasts it with the standard MAP approach, Monte Carlo approaches, and variational Bayes. Chapter 8 discusses an interesting approach, called herding, aimed at departing from the standard train-and-test divide, showing very promising results.
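
In its simplest single-variable form, the perturb-and-MAP idea reduces to the well-known Gumbel-max trick, which the sketch below illustrates (an assumed illustration, not the book's construction): adding independent Gumbel noise to the log-potentials and taking an argmax (a MAP call) yields exact samples from the corresponding Gibbs distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.array([1.0, 2.0, 0.5])  # log-potentials of a 3-state variable
n = 200_000

# Perturb: add i.i.d. Gumbel(0, 1) noise; MAP: take the argmax per draw.
gumbel = rng.gumbel(size=(n, theta.size))
samples = np.argmax(theta + gumbel, axis=1)

freq = np.bincount(samples, minlength=theta.size) / n
gibbs = np.exp(theta) / np.exp(theta).sum()  # target distribution p(x) ∝ exp(theta_x)
print(np.round(freq, 2), np.round(gibbs, 2))  # empirical vs. exact, closely matching
```

The appeal for structured models is that the expensive sampling step is replaced by (possibly many) calls to a MAP solver, for which fast combinatorial algorithms often exist; in the general multivariate case low-dimensional perturbations only approximate the Gibbs distribution, which is where the theoretical analysis in chapter 7 comes in.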

Another good chunk of the book (chapters 12 to 16) is devoted to other practical aspects of learning. The work discussed in chapter 12 aims high: to enable learning from “cheap data,” or data that is partially labeled or labeled by nonexpert humans. This is another fairly self-contained chapter that can be used as a good tutorial/introduction for a more general audience. Chapter 13 focuses on feature selection, which is often a bottleneck for structured prediction. Segment-based support vector machines (SVMs) are featured in chapter 14, which discusses a framework for event detection. Two noteworthy properties of the segment-based SVMs in that chapter are linear time inference and “early” event detection, which has immediate applications in video surveillance.

As far as applications are concerned, the book covers such a wide range that listing them all would far exceed the space of this review. Instead, here is just a small sample. Chapter 3 covers natural language processing, with the very promising turbo parsing algorithm, as well as text summarization. Chapter 7 focuses on computer vision, which is also the subject of the herding approach in chapter 8, which touches on the game of Go. Chapter 13 covers handwriting recognition and human pose estimation in video. Numerous computer vision tasks are discussed in chapter 14.

Reviewer: Denilson Barbosa
Review #: CR143361 (1507-0561)
Learning (I.2.6)
Model Development (I.6.5)
Model Validation And Analysis (I.6.4)
Other reviews under "Learning":

Learning in parallel networks: simulating learning in a probabilistic system
Hinton G. (ed), BYTE 10(4): 265-273, 1985. Type: Article (Nov 1 1985)

Macro-operators: a weak method for learning
Korf R., Artificial Intelligence 26(1): 35-77, 1985. Type: Article (Feb 1 1986)

Inferring (mal) rules from pupils’ protocols
Sleeman D., Progress in artificial intelligence, Orsay, France, 1985. Type: Proceedings (Dec 1 1985)
