Deep learning architectures, tools, and algorithms are generally not adapted to the storage and computation resources of mobile devices. Mobile deployment therefore calls for better hardware as well as learning and inference algorithms with smaller footprints. This survey paper gives an overview of the technologies and applications of deep learning (DL) for mobile multimedia and covers all aspects of this problem.
The paper starts with a history of deep neural networks (DNNs), tracing how they came into being in the 1960s to 1980s with shallow artificial neural networks (ANNs), backpropagation, and the introduction of hidden layers. The authors describe how NNs took a back seat in the 1990s and early 2000s, when non-neural methods such as support vector machines (SVMs) became popular, but came back in the 2010s with the invention of deep belief networks (DBNs) (and restricted Boltzmann machines), which achieved great results in handwriting classification. The authors then provide an overview of machine learning software frameworks such as Caffe, Neon, Torch, and TensorFlow. Hardware options for DL, such as graphics processing units (GPUs), field-programmable gate arrays (FPGAs), vision processing units (VPUs), and high-performance computing (HPC), are also covered. This overview gives context to the sections that follow.
As they did for generic DL, the authors provide an overview of mobile-specific algorithms, frameworks, and hardware. They describe two ways in which inference can be performed for mobile platforms: on the mobile device itself, or in a cloud that receives the sensed data and returns the inferred output. The authors cover various techniques used to reduce the footprint of DNNs so that inference can be done on mobile platforms. These techniques fall into two groups: those that reduce the number of NN weights and those that use compression. They reduce storage requirements by a factor of 35 on ImageNet (compared to AlexNet) without losing too much accuracy.
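To make the two groups of techniques concrete, the following is a minimal sketch of what reducing the number of weights and compressing the remainder might look like on a toy weight vector. It is not the method of the paper or of any system it covers: the helper names `prune_by_magnitude` and `quantize_uniform`, the 10 percent keep fraction, the 4-bit code width, and the 16-bit index cost are all assumptions chosen for illustration.

```python
import random


def prune_by_magnitude(weights, keep_fraction):
    """Zero out all but the largest-magnitude weights (hypothetical helper)."""
    k = max(1, int(len(weights) * keep_fraction))
    # The threshold is the k-th largest absolute value in the vector.
    threshold = sorted((abs(w) for w in weights), reverse=True)[k - 1]
    return [w if abs(w) >= threshold else 0.0 for w in weights]


def quantize_uniform(weights, bits):
    """Map each weight to one of 2**bits evenly spaced levels (hypothetical helper)."""
    lo, hi = min(weights), max(weights)
    levels = 2 ** bits - 1
    step = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((w - lo) / step) for w in weights]   # small integer codes
    decoded = [lo + c * step for c in codes]            # values used at inference
    return codes, decoded


random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(1000)]  # stand-in for a layer

# Group 1: reduce the number of weights (assumed keep fraction of 10%).
pruned = prune_by_magnitude(weights, keep_fraction=0.1)
nonzero = [w for w in pruned if w != 0.0]

# Group 2: compress the surviving weights (assumed 4-bit codes).
codes, decoded = quantize_uniform(nonzero, bits=4)

# Toy storage estimate: 32-bit floats versus a 4-bit code plus an assumed
# 16-bit position index for each surviving weight.
dense_bits = len(weights) * 32
compressed_bits = len(nonzero) * (4 + 16)
print(f"kept {len(nonzero)} of {len(weights)} weights")
print(f"approximate storage ratio: {dense_bits / compressed_bits:.1f}x")
```

The ratio printed by this sketch depends entirely on the assumed keep fraction and bit widths, so it should not be compared with the factor of 35 reported in the paper, which comes from real networks and also depends on how accuracy is preserved.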
Next, the authors present software frameworks supported on mobile platforms. Some of the generic software frameworks, such as MXNet, TensorFlow, Android Caffe, and Torch, have specific support for mobile platforms. CNNdroid, DeepSense, Boda-RTC, and others are covered as examples of frameworks built specifically for mobile platforms. A summary of hardware technologies for mobile DNNs follows, including FPGAs, application-specific integrated circuits (ASICs), and custom mobile chips. Finally, the authors cover a set of mobile applications that implement deep learning in the areas of health (estimating the calorie value of food, human activity monitoring, and helping physicians with diagnosis); security (malware detection using deep belief networks); ambient intelligence (places of interest detection and mobi-ear); and translation and speech recognition (Google Translate and Amazon Polly).
Overall, this is a comprehensive survey paper that covers a lot of ground. It could have done better in a few places: (1) the authors could have covered some video processing applications, (2) the paper does not cover cases where DL models are augmented (trained) on mobile devices, and (3) the differences between DNNs, ANNs, DBNs, and so on are not outlined. Despite these weaknesses, I highly recommend this paper.