Computing Reviews
Next-generation big data : a practical guide to Apache Kudu, Impala, and Spark
Quinto B., Apress, New York, NY, 2018. 557 pp. Type: Book (978-1-484231-46-3)
Date Reviewed: Apr 10 2019

Since the introduction of Apache Hadoop nearly a decade ago, new tools and methods for analyzing large datasets have evolved rapidly, dramatically improving performance and providing easier and more powerful query languages.

This trend continues, as outlined in Quinto’s guide to three recent advances in data analytics: Kudu, Impala, and enhancements to Spark, all of which are open-source Apache Software Foundation projects. These technologies exploit concepts from relational databases and include SQL or SQL-like interfaces, which encourages faster learning of their query syntax.

The author first introduces Kudu, a columnar database that supports both Hadoop and Spark infrastructures and uses SQL for transaction queries. He presents numerous well-explained examples and use cases for creating and managing structured Kudu data.

Impala is a high-performance SQL engine for Hadoop. Quinto carefully describes the Impala architecture and again gives many examples of standard SQL queries that take advantage of that architecture’s parallelism.

Several chapters are devoted to describing and using Apache Spark, an in-memory analytical framework that includes SQL support as well as application programming interfaces (APIs) for familiar languages such as Java, R, and Python. Spark is rapidly becoming the most frequently used big data processing framework due to its orders-of-magnitude performance advantages over traditional Hadoop implementations.

The book includes additional chapters on data governance and management, topics often overlooked in other big data references, and adds helpful discussions on data management in cloud computing systems like AWS, Azure, and Google.

In the final chapter, Quinto briefly summarizes six big data case studies, each of which makes use of the technologies presented earlier. For example, British Telecom built a large Hadoop-based data analytics platform that included Spark and Impala, significantly increasing query performance and throughput.

Each chapter of Quinto’s book concludes with extensive references leading to greater detail on the topics discussed. The book assumes familiarity with basic data analytics methods and tools, especially Hadoop, and presents high-level introductions to emerging technologies. It thus serves as a helpful guide for data analytics professionals seeking to keep pace with this dynamic and quickly advancing field.

Reviewer: Harry J. Foxwell
Review #: CR146523 (1906-0215)
General (H.2.0)
 