Computing Reviews
Distributed computing in big data analytics : concepts, technologies and applications
Mazumder S., Bhadoria R., Deka G., Springer International Publishing, New York, NY, 2017. 162 pp. Type: Book (978-3-319598-33-8)
Date Reviewed: Jun 6 2018

Day by day the prominence of big data analytics is increasing. Not only does it impact the way we work, but it is gradually impacting the way we live our lives. Yet it is also an area filled with a myriad of opportunities for ethical and moral missteps, as attested to by the popular press in recent months. As the various examples of big data analytics are brought to the public’s attention, having an easy-to-understand, easy-to-read text on the subject matter would be very timely.

Unfortunately, this is not that book. For the most part poorly written and for the whole part badly edited, it leaves the reader disappointed. It could have been so much better.

The first chapter defines NoSQL as “Not Only SQL,” contrasts the genre against traditional relational database management systems, and then provides a few variant implementation styles that fall within the broad definition. As expected from an introductory chapter, it also mentions Brewer’s consistency, availability, and partition tolerance (CAP) theorem; the Vs (volume, velocity, and variety) that form the typical definition of big data; and a paragraph each on the open-source products Hadoop, YARN, MapReduce, and Spark.

Chapter 2 very briefly describes a number of concepts that are encountered when exploring distributed computing and big data. Starting with differences between multithreading and multiprocessing, vector processing, and distributed processing architectures, it discusses scalability, synchronous and asynchronous communication, fault tolerance, and load balancing, among a few others. It also expands on the CAP theorem mentioned in the prior chapter. Synchronous and asynchronous communication are covered again in the following chapter, which uses the concepts to describe how the Hadoop MapReduce product works. The remainder of that chapter discusses the challenges of building geographically distributed computation clusters and the use of cloud-based service providers, both as a source of compute resources and as providers of data redundancy and resilience.

The chapter on distributed computing technologies once more describes the CAP theorem and the three main types of NoSQL databases, namely key-value, document-based, and column-oriented data stores. This is followed by a brief overview of the Hadoop Distributed File System (HDFS) and how the MapReduce product utilizes it by collocating processing with the data, before listing the key improvements that the Spark product provides. The author then briefly describes machine learning platforms, search systems, messaging, and caching before finishing off with visualization tools. In the chapter on security, the authors detail how the distributed nature of the typical big data analytics platform creates additional challenges to securing communication, ensuring privacy and data integrity, and performing intrusion detection.

Chapter 6, on applications in climate science, describes the computational complexity of climate simulation and how an ever-increasing resolution in modeling is creating a demand for distributed analytics to improve on existing predictive capabilities. The chapter on cognitive analytics talks about machine learning, briefly itemizing a number of requirements, before proposing three use cases as suitable domains in which machine learning could be applied. These are healthcare, the Internet of Things (IoT), and customer relationship management.

The chapter on social media analytics is the first of only two chapters that not only have substantive content but are also well written. Riemer takes the reader through the state of the art in social media analytics, right down to the open-source and commercial products used in each implementation. First, Apache Spark’s GraphX is applied to identifying the prominence of various individuals and commercial news agencies during the Wimbledon tennis championships. The author then details social polling, looking at sentiment analysis through supervised deep learning of neural networks, topic monitoring, and user segmentation. Finally, the chapter ends by discussing how mining social media information can be used for product demand planning.

The final chapter is on building language-agnostic semantic knowledge bases, or to put it differently, implementing search capabilities that use contextual information to provide more relevant search results. Starting with an overview of search techniques, the authors describe inverted indices, the sharding of data across multiple computational nodes to improve performance, data replication for resilience, and the use of a denormalized data model. The authors also remember to mention how the problems of distributed aggregation caused by sharding the inverted index are overcome. Next they discuss how to build semantic relationships between search terms using the probabilistic graphical model for massive hierarchical data. Having built relationships between terms, the next stage is to resolve ambiguity caused by terms that have different meanings in different contexts. These concepts are then applied to create a knowledge graph that can be used for both the discovery and the scoring of semantic relationships.

Each chapter is by a different author or set of authors, so quality does vary significantly between chapters. If the first seven chapters were of the same quality as the final two, I would wholeheartedly recommend the book. However, the editorial quality is poor throughout.

Reviewer: Bernard Kuc
Review #: CR146067 (1808-0416)
Content Analysis And Indexing (H.3.1)
Data Mining (H.2.8 ...)
Distributed Databases (C.2.4 ...)
Distributed Systems (C.2.4)