Draft:Embedded Machine Learning

Embedded Machine Learning is the field of study where embedded systems and machine learning intersect. Machine learning models typically consume substantial resources in terms of processing power, memory, and inference speed during both the training and inference phases. In contrast, embedded systems such as microcontrollers, ECUs, wearable devices and edge devices have limited computing resources (memory, processor speed, etc.). Enabling such large models to run (perform inference) on these devices is the main goal of this field. Various techniques, such as hardware acceleration and model optimisation, are used to achieve this goal.

ML models can be trained on larger computing systems, such as cloud or server infrastructure, but deploying (downloading/flashing) those models onto embedded devices is challenging. A further challenge is running those models (the inference phase) on embedded systems to make predictions without a significant loss in accuracy.

Hardware-based methods

Hardware acceleration techniques leverages specialized hardware components, such as Digital Signal Processors (DSPs), Graphics Processing Units (GPUs), Field-Programmable Gate Arrays (FPGAs), and dedicated Neural Network Accelerators (NNAs), to accelerate the inference process and improve the efficiency of embedded machine learning algorithms.
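
As a rough illustration, the sketch below shows how inference can be routed to an accelerator through a delegate, assuming the TensorFlow Lite runtime is installed and a model has already been compiled for a Coral Edge TPU; the model filename is a placeholder.

    # Minimal sketch: delegate TFLite inference to an Edge TPU accelerator.
    import numpy as np
    import tflite_runtime.interpreter as tflite

    interpreter = tflite.Interpreter(
        model_path="model_edgetpu.tflite",  # placeholder model file
        experimental_delegates=[tflite.load_delegate("libedgetpu.so.1")],
    )
    interpreter.allocate_tensors()

    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]

    # Run one inference on dummy data; supported ops execute on the accelerator.
    interpreter.set_tensor(inp["index"], np.zeros(inp["shape"], dtype=inp["dtype"]))
    interpreter.invoke()
    prediction = interpreter.get_tensor(out["index"])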

Software-based methods

The model optimization (compression) techniques described below are used to compress or alter an ML model so that it takes less space and makes predictions faster without a significant loss in accuracy.

Pruning

Removes less important connections and parameters from the model, resulting in reduced size and complexity.
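
A minimal sketch of magnitude-based pruning, using PyTorch's pruning utilities on an illustrative model (the layer sizes and pruning ratio are assumptions):

    import torch
    import torch.nn as nn
    import torch.nn.utils.prune as prune

    model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

    # Zero out the 50% of weights with the smallest absolute value per layer.
    for module in model.modules():
        if isinstance(module, nn.Linear):
            prune.l1_unstructured(module, name="weight", amount=0.5)
            prune.remove(module, "weight")  # make the pruning permanent

    # Pruned weights are exactly zero and can be stored in a sparse format.
    sparsity = (model[0].weight == 0).float().mean().item()
    print(f"layer 0 sparsity: {sparsity:.0%}")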

Quantization

Reduces the precision of parameters by using lower-bit representation (e.g., from 32 bits to 8 bits), leading to a smaller model size and faster inference.

Quantization can be applied during training (quantization-aware training) or after training (post-training quantization).
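
A minimal sketch of post-training dynamic quantization with PyTorch (the model is an illustrative assumption):

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

    # Convert the weights of Linear layers from 32-bit floats to 8-bit integers.
    quantized = torch.quantization.quantize_dynamic(
        model, {nn.Linear}, dtype=torch.qint8
    )

    x = torch.randn(1, 128)
    print(quantized(x).shape)  # inference works as before, with a smaller model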

Knowledge Distillation

Transfers knowledge from a large, pre-trained teacher network to a smaller student network, resulting in a smaller, more efficient model with comparable accuracy.
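
A minimal sketch of a distillation loss in PyTorch, combining softened teacher targets with ordinary hard-label training (the temperature and mixing weight are illustrative assumptions):

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels,
                          temperature=4.0, alpha=0.7):
        # Soft targets: match the teacher's softened output distribution.
        soft = F.kl_div(
            F.log_softmax(student_logits / temperature, dim=-1),
            F.softmax(teacher_logits / temperature, dim=-1),
            reduction="batchmean",
        ) * temperature ** 2
        # Hard targets: standard cross-entropy against the true labels.
        hard = F.cross_entropy(student_logits, labels)
        return alpha * soft + (1.0 - alpha) * hard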

Low-Rank Factorization

Decomposes weight matrices into lower-rank factors, reducing the number of parameters without significant loss of accuracy.
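
A minimal sketch of factorizing a linear layer with a truncated singular value decomposition in PyTorch (the layer size and rank are illustrative assumptions):

    import torch
    import torch.nn as nn

    layer = nn.Linear(256, 256)            # 256 * 256 = 65,536 weights
    rank = 32

    # W ≈ U_r diag(S_r) V_r^T, keeping only the top-r singular values.
    U, S, Vh = torch.linalg.svd(layer.weight.data, full_matrices=False)
    A = U[:, :rank] * S[:rank]             # shape (256, 32)
    B = Vh[:rank, :]                       # shape (32, 256)

    # Replace one wide layer with two thin ones: 2 * 256 * 32 = 16,384 weights.
    factored = nn.Sequential(nn.Linear(256, rank, bias=False),
                             nn.Linear(rank, 256))
    factored[0].weight.data = B
    factored[1].weight.data = A
    factored[1].bias.data = layer.bias.data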

Neural Architecture Search (NAS)

Optimizes network architectures for both accuracy and efficiency, potentially leading to smaller models with high performance.
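
Real NAS systems use sophisticated search strategies such as reinforcement learning, evolutionary algorithms, or gradient-based relaxations. The toy sketch below only illustrates the basic idea of sampling candidate architectures and scoring them; the candidate space and scoring function are illustrative assumptions.

    import random
    import torch.nn as nn

    def build(widths, n_in=32, n_out=10):
        layers, prev = [], n_in
        for w in widths:
            layers += [nn.Linear(prev, w), nn.ReLU()]
            prev = w
        return nn.Sequential(*layers, nn.Linear(prev, n_out))

    def score(model):
        # Placeholder: a real search trains each candidate briefly and measures
        # validation accuracy; here we only penalize parameter count.
        return -sum(p.numel() for p in model.parameters())

    candidates = [build([random.choice([16, 32, 64])
                         for _ in range(random.randint(1, 3))])
                  for _ in range(10)]
    best = max(candidates, key=score)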

Parameter Sharing

Forces multiple parameters to share a single value (weight sharing), so that only the set of shared values and a short index per weight need to be stored, reducing model size. A sketch follows below.
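
A minimal sketch of weight sharing via k-means clustering of a weight matrix (the matrix size and number of clusters are illustrative assumptions):

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    weights = rng.standard_normal((64, 64))    # a layer's weight matrix

    # Cluster all weights into 16 groups; each group shares one centroid value,
    # so only 16 floats plus a 4-bit index per weight need to be stored.
    kmeans = KMeans(n_clusters=16, n_init=10).fit(weights.reshape(-1, 1))
    shared = kmeans.cluster_centers_[kmeans.labels_].reshape(weights.shape)

    print("unique values before:", np.unique(weights).size)
    print("unique values after:", np.unique(shared).size)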

References

[1]

  1. ^ Ajani, Taiwo Samuel; Imoize, Agbotiname Lucky; Atayero, Aderemi A. (2021). "An Overview of Machine Learning within Embedded and Mobile Devices–Optimizations and Applications". Sensors. 21 (13): 4412. Bibcode:2021Senso..21.4412A. doi:10.3390/s21134412. ISSN 1424-8220. PMC 8271867. PMID 34203119.