The sheer complexity of the storage and processing methods deployed by big data applications has long delayed the arrival of a deep learning framework that works effectively with big data. This paper presents such a framework, describing the project, its execution model, and real-world case studies.
Ease of use comes from implementing the BigDL framework as a library, which users invoke from standard Apache Spark programs running on clusters. The open-source project, available at https://github.com/intel-analytics/BigDL, claims rich deep learning support, better performance, and enhanced scale-out capability.
A key innovation in the BigDL execution model is data-parallel training: gradients are computed by a "model forward-backward" job and are then used to update "the parameters of the neural network model [using] a 'parameter synchronization' job." The paper illustrates this technique in detail. By adopting a "coarse-grained functional compute model," which is the state of the art in big data systems, BigDL claims to be "a viable design alternative for distributed model training."
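The two-phase pattern can be sketched in plain Python. This is an illustrative toy, not BigDL's actual code: each data partition computes a local gradient (standing in for the "model forward-backward" job), and the gradients are then aggregated and applied in a separate step (standing in for the "parameter synchronization" job). The single scalar weight and least-squares loss are assumptions made for brevity.

```python
def local_gradient(weight, partition):
    # Toy "forward-backward": gradient of mean squared error for y = w * x
    grad = 0.0
    for x, y in partition:
        pred = weight * x          # forward pass
        grad += 2 * (pred - y) * x # backward pass
    return grad / len(partition)

def train_step(weight, partitions, lr=0.01):
    # "Model forward-backward" job: one local gradient per data partition
    grads = [local_gradient(weight, p) for p in partitions]
    # "Parameter synchronization" job: aggregate gradients, update parameters
    avg_grad = sum(grads) / len(grads)
    return weight - lr * avg_grad

# Example: fit y = 2x from data split across 4 partitions
data = [(x, 2.0 * x) for x in range(1, 9)]
partitions = [data[i::4] for i in range(4)]
w = 0.0
for _ in range(200):
    w = train_step(w, partitions)
```

In BigDL itself these two phases run as Spark jobs over RDD partitions; the sketch only shows the control flow of computing gradients in parallel and synchronizing parameters between iterations.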
An evaluation comparing computing performance on the BigDL and Caffe frameworks reveals that BigDL is 3.83 times faster than Caffe, although the two systems ran on slightly different hardware platforms. A separate scalability evaluation shows a speedup of 5.3 times when the node count is increased sixfold, from 16 to 96. The solution was observed "to scale reasonably well up to 256 nodes."
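As a quick sanity check on the reported numbers, a 5.3x speedup on 6x the nodes corresponds to a scaling efficiency of roughly 88 percent. A minimal calculation (the function name is ours, not from the paper):

```python
def scaling_efficiency(speedup, nodes_before, nodes_after):
    # Fraction of ideal linear speedup actually achieved
    return speedup / (nodes_after / nodes_before)

# Reported scaling: 5.3x speedup going from 16 to 96 nodes
eff = scaling_efficiency(5.3, 16, 96)
print(round(eff, 2))  # about 0.88
```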
The paper concludes by highlighting several case studies in which BigDL was deployed. These include (i) an end-to-end object detection and feature detection model, (ii) a precipitation nowcasting (weather prediction) application that uses a convolutional long short-term memory (LSTM) network, and (iii) a real-time streaming speech classification application.
The paper and the project have extended the reach of deep learning frameworks into domains hitherto unavailable for study and exploration. This is sure to attract more contributors, researchers, and application developers.