Computing Reviews
Data-intensive workflow management: for clouds and data-intensive and scalable computing environments
de Oliveira D., Liu J., Pacitti E., Morgan & Claypool Publishers, San Rafael, CA, 2019. 180 pp. Type: Book (978-1-681735-57-3)
Date Reviewed: Sep 16 2020

Data-intensive workflows arise in scientific domains that apply the most current information technologies. What distinguishes scientific workflows from business workflows (the “differentia specifica”) is the importance of provenance data rather than the volume of processed data. Even so, data volume plays a significant role, and handling huge volumes of data efficiently and effectively is the central theme of the book.

Chapter 1 outlines the structure of the book and presents motivating examples that clarify the authors’ purpose.

Chapter 2 reviews the background needed to follow the rest of the book. It begins with a formal description of workflows and the related standards, and discusses the properties of scientific workflow management systems (SWfMSs). The chapter reviews and compares existing solutions, and outlines the subject matter of later chapters, that is, the scheduling problems and cost calculations of SWfMSs.

Chapter 3 discusses SWfMS performance issues and approaches in the case of a single-site cloud. The authors scrutinize scheduling algorithms and their cost models in a general sense, including execution time and financial costs related to cloud computing service providers.

Chapter 4 repeats the previous chapter’s investigation in a more complex environment, namely the execution of SWfMSs in a multi-site cloud. The performance problem can be formulated as follows: any scheduling algorithm and cost calculation model must take into account intra-site and inter-site execution as well as the communication between sites. In a multi-site cloud computing environment, the essential issue is handling and distributing data and metadata, not only balancing the processing load among the sites and among the servers within a site. Activation scheduling in SWfMSs is crucial for reducing execution time. Scheduling becomes difficult when it has multiple objectives to consider, for example, total execution time and financial cost; the authors therefore concentrate on reducing the execution time of the whole workflow. Efficient scheduling of the activities within a workflow requires knowledge of how the data is distributed; this information can be obtained from the metadata and should be available to the multi-site activity scheduler. The chapter analyzes scheduling solutions for fine-grained and coarse-grained workflow executions. Fine-grained SWfMSs execute activities at different sites to work with the distributed data. In coarse-grained SWfMSs, the input data is not distributed among different sites; for that reason, the problem resembles single-site scheduling. The chapter contains a comparative study of the performance of the various scheduling algorithms.
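
To make the trade-off concrete, the following is a minimal sketch of a greedy, data-location-aware scheduler that scores each candidate site by a weighted sum of estimated execution time and financial cost. It is an illustration only and not an algorithm taken from the book: the names `Site`, `Activity`, `estimate`, and `schedule`, the fixed inter-site bandwidth, the pricing by elapsed seconds, and the weights are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    name: str
    speed: float          # work units processed per second (assumed)
    price_per_sec: float  # money charged per second of use (assumed)


@dataclass(frozen=True)
class Activity:
    name: str
    work: float      # compute demand in work units
    data_site: str   # site where the input data currently resides
    data_gb: float   # size of the input data in GB


def estimate(activity, site, bandwidth_gb_per_sec=0.1):
    """Return (seconds, money) for running `activity` at `site`.

    Data must be transferred only when it resides at another site.
    Simplification: the site is billed for transfer time as well.
    """
    if activity.data_site == site.name:
        transfer = 0.0
    else:
        transfer = activity.data_gb / bandwidth_gb_per_sec
    seconds = transfer + activity.work / site.speed
    money = seconds * site.price_per_sec
    return seconds, money


def schedule(activities, sites, w_time=0.5, w_money=0.5):
    """Greedily map each activity to the site with the lowest weighted cost."""
    plan = {}
    for act in activities:
        def cost(site, act=act):
            seconds, money = estimate(act, site)
            return w_time * seconds + w_money * money
        plan[act.name] = min(sites, key=cost).name
    return plan


if __name__ == "__main__":
    sites = [Site("A", speed=10, price_per_sec=1.0),
             Site("B", speed=20, price_per_sec=1.0)]
    activities = [
        Activity("large_input", work=100, data_site="A", data_gb=10),
        Activity("heavy_compute", work=1000, data_site="A", data_gb=0.1),
    ]
    print(schedule(activities, sites))
```

In this sketch, an activity with a large input stays at the site that holds its data, while a compute-heavy activity with a small input moves to the faster site. Setting `w_money=0` reduces the score to execution time alone, which corresponds to the single objective the review says the authors focus on.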

Chapter 5 deals with data-intensive scalable computing (DISC) frameworks. The chapter discusses the Apache Spark solution in detail, along with the use of provenance data in this environment. One of the chapter’s contributions is the description of a provenance data server called SAMbA (the acronym is written this way to set it apart). SpaCE supports fine-tuning the hundreds of configuration parameters of the Spark system. The TARDIS system can be coupled with Spark to create optimal schedules for the activities of workflows defined within Spark. The benefit of DISC frameworks is in-memory data management, which speeds up the processing of big data.

The book should interest researchers and professionals working in various domains. Those concerned with performance in cloud computing environments will find the cost and execution models useful. Researchers who investigate the different problem areas of workflows (modeling, managing, and model checking) can use the book as a sound empirical starting point that contains good practices and workflow examples.

Reviewer:  Bálint Molnár Review #: CR147062 (2101-0002)
 
 
Frameworks (D.3.3 ... )
Cloud Computing (C.2.4 ... )
Environments (I.6.7 ... )
Languages (D.2.1 ... )
Management (D.2.9)
 

Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®