Large enterprises with massive amounts of data require reliable and efficient algorithms for exploring and forecasting trends in supplies, user purchases, and advertisements. The scalability of the MapReduce approach makes it well suited to the parallel processing of such data, while the statistical efficiency of online streaming approaches is also valuable for big data. But how should these two computational strategies be combined to achieve both fast computation and sound statistical inference on very large datasets? Hector et al. present their parallel-and-stream accelerator (PASA), which integrates the strengths of parallel and sequential computing to overcome the computational bottleneck of supervised learning and inference with big data.
In the hybrid PASA methodology, a huge dataset is first split arbitrarily into data blocks that are processed in parallel. Within each block, the data are then processed sequentially with an online estimation and inference technique, which keeps computing costs low while preserving statistical efficiency. PASA includes algorithms for streaming and combining the parameter estimates from the data blocks, providing fast learning and inference for generalized linear models (GLMs), which are widely used for regression analysis of big datasets.
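The split-stream-combine structure can be illustrated with a minimal Python sketch. It is not the authors' PASA implementation: it assumes the simplest GLM, linear regression with Gaussian errors and a common variance, for which the running sums X^T X and X^T y are sufficient statistics, so online updating within a block loses nothing and an information-weighted combination of the block estimates reproduces the full-data least-squares fit. The helper names stream_block and split_stream_combine, the number of blocks, and the batch size are hypothetical choices for illustration. For logistic or other nonlinear GLMs, the within-block updates would need iterative (e.g., Newton-type) steps, and the combined estimate would only approximate the full-data one.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def stream_block(X_block, y_block, batch_size):
    """Stream one data block in mini-batches (hypothetical helper).

    For linear regression, the running sums X^T X and X^T y are sufficient
    statistics, so batch-by-batch updating loses no information.
    """
    p = X_block.shape[1]
    xtx = np.zeros((p, p))  # accumulated information matrix (up to the error variance)
    xty = np.zeros(p)
    for start in range(0, X_block.shape[0], batch_size):
        Xb = X_block[start:start + batch_size]
        yb = y_block[start:start + batch_size]
        xtx += Xb.T @ Xb
        xty += Xb.T @ yb
    beta_block = np.linalg.solve(xtx, xty)  # block-level least-squares estimate
    return beta_block, xtx


def split_stream_combine(X, y, n_blocks=4, batch_size=100):
    """Split, stream each block, then combine (illustrative, not PASA itself)."""
    X_blocks = np.array_split(X, n_blocks)
    y_blocks = np.array_split(y, n_blocks)
    # Blocks are independent, so a pool of workers can process them in parallel.
    with ThreadPoolExecutor(max_workers=n_blocks) as pool:
        results = list(pool.map(stream_block, X_blocks, y_blocks,
                                [batch_size] * n_blocks))
    # Weight each block estimate by its information matrix and pool them.
    info_total = sum(info for _, info in results)
    weighted_sum = sum(info @ beta for beta, info in results)
    return np.linalg.solve(info_total, weighted_sum)


# Usage: simulated data with an intercept and two covariates.
rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)
beta_hat = split_stream_combine(X, y)
print(beta_hat)
```

Because the block statistics add up to the full-data X^T X and X^T y, the combined estimate in this linear case matches an ordinary least-squares fit on all of the data. Threads stand in for MapReduce-style workers only to keep the sketch self-contained.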
The authors clearly present the relevant method-of-moments and maximum likelihood equations for parameter estimation in PASA. They evaluate the effectiveness of PASA in forecasting gains related to building remodeling. The experimental and simulation results are reliable enough to recommend PASA for computationally fast supervised learning.
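As a point of reference for the maximum likelihood side, block-level estimation for a GLM can be written in a generic textbook form. This is a sketch under stated assumptions, not the authors' exact formulation: it assumes a canonical link with mean \mu_i(\beta) = g^{-1}(x_i^\top \beta), a variance function v(\cdot), a dispersion parameter equal to 1, and denotes the observations in block k by \mathcal{D}_k. Each block estimate \hat\beta_k solves its own score equation, and the K block estimates are pooled by weighting each with its observed information. For linear regression (v \equiv 1) the pooled estimate equals the full-data least-squares estimate; for logistic regression (v(\mu) = \mu(1-\mu)) it is an approximation.

```latex
% Score equation and observed information on block k (canonical link, dispersion = 1)
U_k(\beta) = \sum_{i \in \mathcal{D}_k} x_i \{ y_i - \mu_i(\beta) \} = 0,
\qquad
J_k(\beta) = \sum_{i \in \mathcal{D}_k} v\{\mu_i(\beta)\}\, x_i x_i^{\top}

% Information-weighted combination of the K block estimates
\hat{\beta} = \Big\{ \sum_{k=1}^{K} J_k(\hat{\beta}_k) \Big\}^{-1}
              \sum_{k=1}^{K} J_k(\hat{\beta}_k)\, \hat{\beta}_k
```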
Are there valuable application examples of PASA? Students and practitioners in data science and artificial intelligence (AI) will find the paper's efficient linear and logistic regression examples useful. Students in theoretical computer science courses will also find the paper helpful for illustrating algorithmic efficiency and the computational complexity of algorithms.