Computing Reviews
Resource characterisation of personal-scale sensing models on edge accelerators
Antonini M., Vu T., Min C., Montanari A., Mathur A., Kawsar F.  AIChallengeIoT 2019 (Proceedings of the First International Workshop on Challenges in Artificial Intelligence and Machine Learning for Internet of Things, New York, NY, Nov 10-13, 2019) 49-55, 2019. Type: Proceedings
Date Reviewed: Aug 18 2021

Edge computing is going through an exciting phase where several chip vendors are looking to infiltrate the ecosystem with accelerators. As compute requirements at the edge increase, choosing the right compute architecture coupled with a suitable accelerator will be key. Domain-specific acceleration will be needed to maximize gains.

Deep learning models are key to inferring activities from sensory inputs such as motion, audio, and vision at the edge. A comprehensive study of how different accelerators handle different models is needed. This paper surveys a representative set of accelerators for deep learning inference: Coral (RPi 4B) and Coral Dev by Google, NCS2 (RPi 4B) by Intel, and Jetson Nano (GPU) and Jetson Nano (RT) by NVIDIA.

The experimental design evaluates the following deep learning models to infer motion, audio, and vision activities:

  • Motion: Aroma;
  • Audio: Emotion, deep keyword spotting (DKWS); and
  • Vision: MobileNet V1, EfficientNet-EdgeTPU, Inception V1, DenseNet121.

The authors evaluate the execution of these models on the accelerators to study the following metrics:

  • Memory footprint;
  • Execution time;
  • Energy consumption; and
  • Overall performance.

They analyze these metrics during model load, warmup (first inference), and subsequent inferences.
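The three phases can be separated in a simple timing harness. The following is a minimal sketch only, not the authors' toolkit: `profile_phases`, `fake_load`, and `fake_infer` are hypothetical names, and the stand-in "model" is a plain Python list rather than a network running on an accelerator.

```python
import time
from statistics import mean


def profile_phases(load_model, run_inference, sample, n_steady=20):
    """Time model load, the first (warmup) inference, and later inferences."""
    # Phase 1: model load.
    start = time.perf_counter()
    model = load_model()
    load_s = time.perf_counter() - start

    # Phase 2: warmup, i.e. the first inference after loading.
    start = time.perf_counter()
    run_inference(model, sample)
    warmup_s = time.perf_counter() - start

    # Phase 3: subsequent inferences, timed one by one and averaged.
    steady = []
    for _ in range(n_steady):
        start = time.perf_counter()
        run_inference(model, sample)
        steady.append(time.perf_counter() - start)

    return {"load_s": load_s, "warmup_s": warmup_s, "steady_mean_s": mean(steady)}


# Hypothetical stand-ins so the sketch runs without any accelerator attached.
def fake_load():
    return {"weights": [0.5] * 1000}


def fake_infer(model, x):
    return sum(w * x for w in model["weights"])


result = profile_phases(fake_load, fake_infer, 2.0, n_steady=5)
print(result)
```

On real hardware, `load_model` and `run_inference` would wrap the vendor runtime calls; memory footprint and energy would need separate instrumentation.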

The authors find that devices with dedicated on-chip memory coupled with software pipelines (including compiler optimizations and TensorFlow runtimes) can reduce memory footprint requirements. This works in favor of the Coral/Coral Dev and NCS2 accelerators, compared to the Jetson Nano, which shares memory between the central processing unit (CPU) and the graphics processing unit (GPU). Also, the on-chip memory on the Coral Dev tensor processing unit (TPU) results in faster execution times, except when large models like DenseNet121 can't fit into the on-chip memory. This supports their hypothesis that on-chip memory makes a difference in memory requirements and execution time.

They find some exceptions to this pattern, however, with the NCS2 Movidius chip when models have a large kernel size in their first convolution layer. Also, the lower loading and warmup times of the Coral Dev TPU make it better suited for reactive situations in which multiple models have to be loaded and unloaded as the sensory input changes in a dynamic usage model.

With their power measurements, they discover a few important patterns:

(1) The Coral/Coral Dev boards consistently draw less power than the other accelerators;
(2) TensorFlow optimizations affect energy efficiency; and
(3) Self-contained accelerators draw less power than the ones that need a host interface.

Finally, the authors compare RPi 3B+ based accelerators with RPi 4B accelerators and find that RPi 4B accelerators generally perform better (due to the USB 3.0 interface) but tend to consume more energy. This has implications for battery capacity.
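The battery implication follows from simple arithmetic: energy per inference is average power multiplied by execution time, and a battery's capacity divided by that energy bounds the number of inferences per charge. The sketch below illustrates this under stated assumptions; the function names are hypothetical and the numbers are placeholders, not measurements from the paper.

```python
import math


def energy_per_inference_j(avg_power_w, exec_time_s):
    """Energy in joules for one inference: average power (W) x time (s)."""
    return avg_power_w * exec_time_s


def inferences_per_charge(battery_wh, energy_j):
    """Upper bound on inferences per charge (1 Wh = 3600 J), ignoring idle draw."""
    return math.floor(battery_wh * 3600 / energy_j)


# Placeholder values chosen only for illustration.
energy = energy_per_inference_j(avg_power_w=2.0, exec_time_s=0.25)
count = inferences_per_charge(battery_wh=10.0, energy_j=energy)
print(energy, count)
```

A faster device can still drain a battery sooner if its average power draw rises more than its execution time falls.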

In summary, this paper describes a well-designed experimental methodology for analyzing edge accelerators and their efficacy for inference models. The authors have automated this methodology into a toolkit so that additional edge accelerators can be analyzed in the future.

Reviewer:  Shyamkumar Iyer Review #: CR147335 (2111-0268)
