Processing diverse big data workflows reveals new and, so far, poorly understood patterns and performance requirements. This forces a search for new processing models and management techniques that can support every aspect of big data workflow design: infrastructure (hardware), platforms (software), and efficient methods for scheduling and deploying workflows. These substantial scientific, technological, and organizational problems lead to at least three important challenges (research questions) that are expanded and developed in this paper:
- (1) A description of “the different models and fundamental requirements of big data workflow applications”;
- (2) The new challenges that cloud and edge data centers pose for this type of workflow application; and
- (3) The known “approaches, techniques, tools, and technologies” for developing “a new big data orchestration system.”
In successive sections, the authors present the research challenges, a survey of existing knowledge and approaches, and possible future directions for orchestrating big data analysis workflows. They give a detailed overview of issues related to workflow orchestration, big data workflow requirements in the cloud, a new classification of big data workflow applications together with a research taxonomy, currently used techniques and approaches, example systems with data workflow support, and the challenges that remain open in the field.
The paper is supported by many valuable literature references, which outline the state of the art in computer systems organization and computing methodologies for big data. This survey is a comprehensive analysis of the field's central and ancillary issues, and it suggests we may be facing an important paradigm shift in computer systems processing.