Literature Review of Deep Network Compression

Ali Alqahtani, Xianghua Xie and Mark W. Jones.

Deep networks often possess a vast number of parameters, and their significant redundancy in parameterization has become a widely recognized property. This presents substantial challenges and restricts many deep learning applications, motivating a focus on reducing the complexity of models while maintaining their powerful performance. In this paper, we present an overview of popular methods and review recent works on compressing and accelerating deep neural networks. We consider not only pruning methods but also quantization and low-rank factorization methods. This review also aims to clarify these major concepts and highlight their characteristics, advantages, and shortcomings.


Ali Alqahtani, Xianghua Xie and Mark W Jones, Literature Review of Deep Network Compression, Informatics 8(4):77 (2021). https://dx.doi.org/10.3390/informatics8040077





1. Introduction

In recent years, deep learning has rapidly grown and begun to show its robust ability in representation learning, achieving remarkable success in diverse applications. This achievement has been possible through its ability to discover, learn, and perform automatic representation by transforming raw data into an abstract representation. The process of deep learning utilizes a hierarchical level of neural networks of different kinds, such as multilayer perceptron (MLP), convolutional neural networks (CNNs), and recurrent neural networks (RNNs). This hierarchical representation allows models to learn features at multiple abstraction levels, meaning that complicated concepts can be learned from simpler ones. Neurons in earlier layers of a network learn low-level features, while neurons in later layers learn more complex concepts [1].

The achievement of neural networks in a variety of applications is accompanied by a dramatic increase in computational costs and memory requirements. With sufficient data and advanced computing power available, neural networks have grown into wider and deeper architectures, driving state-of-the-art performance in a wide range of applications. Despite their great success, neural networks have a massive number of parameters, and their significant redundancy in parameterization has become a widely-recognized property [2]. The over-parameterized and redundant nature of neural networks incurs expensive computational costs and high storage requirements. To classify a single image, the VGG-16 model [3], for instance, requires more than 30 billion floating-point operations (FLOPs) and contains about 138 million parameters requiring more than 500 MB of storage space. This presents significant challenges and restricts many CNN applications. Recognizing the importance of network units can help to reduce model complexity by discarding less essential units.

Most of the computational complexity originates in the convolutional layers due to their massive multiplication and addition operations, even though these layers contain fewer parameters thanks to parameter sharing. The number of FLOPs is a popular metric for estimating the complexity of CNN models. The FLOPs in a convolutional layer are calculated as follows [4]:

(1) FLOPs = 2HW(C_in K² + 1)C_out,

where H and W denote the height and width of the output feature map, C_in is the number of input channels, K is the kernel width (assuming square kernels), and C_out is the number of output channels. The factor of 2 counts multiplications and additions as separate operations, and the +1 accounts for the bias term.
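As a concrete sanity check, Equation (1) can be evaluated directly for a single layer; the dimensions below (the first convolutional layer of VGG-16, which preserves the 224×224 spatial size) are used for illustration:

```python
def conv_flops(h_out, w_out, c_in, k, c_out):
    """FLOPs of one convolutional layer: 2*H*W*(C_in*K^2 + 1)*C_out.

    The factor of 2 counts each multiply-accumulate as two operations;
    the +1 accounts for the bias term.
    """
    return 2 * h_out * w_out * (c_in * k ** 2 + 1) * c_out

# First VGG-16 layer: 224x224 output, 3 input channels,
# 3x3 kernels, 64 output channels.
flops = conv_flops(224, 224, 3, 3, 64)  # 179,830,784, i.e. ~0.18 GFLOPs
```

Summing this quantity over all convolutional layers (plus the fully-connected layers) gives the >30 billion FLOPs figure cited above for VGG-16.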

These complexities present significant challenges and restrict many applications. For instance, deploying sizeable deep learning models to resource-limited devices leads to various constraints, as on-device memory is limited [8]. Therefore, reducing computational costs and storage requirements is critical to widening the applicability of deep learning models to a broader range of applications (e.g., mobile devices, autonomous agents, embedded systems, and real-time applications). Reducing the complexity of models while maintaining their powerful performance creates unprecedented opportunities for researchers to tackle major challenges in deploying deep learning systems to resource-limited devices. Network pruning focuses on discarding unnecessary parts of neural networks to reduce the computational costs and memory requirements associated with deep models. Pruning approaches have received considerable attention as a way to tackle over-parameterization and redundancy. Consequently, over-parameterized networks can be efficiently compressed, yielding a small subset of the whole model that represents the reference model with fewer parameters [9]. There is no authoritative guide for choosing the best network architecture; a model may require a certain level of redundancy during training to guarantee excellent performance [10]. Hence, decreasing the size of a model after training can be an effective solution.

Pruning approaches were conceptualized in the late 1980s and early 1990s, and can be applied to any part of a deep neural network [11,12,13,14,15,16,17]. Optimal Brain Damage (OBD) by LeCun et al. [13] and Optimal Brain Surgeon (OBS) by Hassibi et al. [14] are considered pioneering works of network pruning, demonstrating that many unimportant weights can be removed from a trained network with little accuracy loss. Due to their expensive computational cost, however, these methods are not applicable to today's deep models. Obtaining a sub-network with fewer parameters without reducing accuracy is the main goal of pruning algorithms. The pruned version, a subset of the whole model, can represent the reference model at a smaller size or with a smaller number of parameters. Over-parameterized networks can therefore be efficiently compressed while maintaining the property of better generalization [18].

In this paper, we present an overview of popular methods and review recent works on compressing and accelerating deep neural networks, which have received considerable attention from the deep learning community and have already achieved remarkable progress. The types of compression methods discussed below are intended to provide an overview of popular techniques used in the research of deep neural network compression and acceleration.

The rest of this paper is organized as follows. Section 2 describes the methodology used to collect related research papers and the scope of the literature. Section 3 presents a detailed review of deep network compression, derived from our general classification for deep network compression and acceleration. Section 4 summarizes and discusses the future challenges reported within our collection. Finally, concluding remarks and summary are provided in Section 5.

2. Methodology

2.1. Survey Search Methodology

A variety of concepts and methods are involved in obtaining a sub-network with fewer parameters without reducing accuracy. Our methodology was to collect, study, and analyze a large number of papers in the field of deep network compression and network pruning. In our search of the literature, we started by looking at each individual journal and conference in the computer vision and deep learning communities. We performed keyword searches, e.g., 'network compression', 'network pruning', 'network acceleration', 'model compression and acceleration', or 'compact network architectures'. We list all the literature sources searched in Table 2.

2.2. Survey Scope

In scope: To fulfil the scope of our survey, we selected papers that focus on deep network compression and model pruning approaches. We found and collected 57 papers to include in our deep network survey. For each paper, we pay attention to the compression method, the pruning level, and whether the model is pre-trained or trained from scratch.

Out of scope: We restrict our literature to papers that include a review of deep network compression approaches. Papers that focus on data compression are out of our survey's scope. Unlike model compression, data compression (i.e., text compression [19], genomic compression [20], and image compression [21,22,23]) plays a central role in handling the bottlenecks of data storage, transmission, and processing.

2.3. Survey Classification

The recently advanced approaches for deep network compression and acceleration presented in this work can be classified into three categories: pruning methods, quantization methods, and low-rank factorization methods.

3. Deep Network Compression

3.1. Pruning Methods

This section illustrates approaches that have been proposed to prune non-informative parts from heavy, over-parameterized deep models, including weights (i.e., parameters or connections) and units (i.e., neurons or filters). The core of network pruning is eliminating unimportant, redundant, or unnecessary parts according to their level of importance. Pruning methods can be applied to pre-trained models or during training from scratch, and are further categorized into two classes according to pruning level: the weight level and the unit level. Weight-based pruning eliminates unnecessary, low-weight connections between layers of a neural network, while unit-based methods remove all weight connections to a specific unit, i.e., both incoming and outgoing weights are removed.

3.1.1. Weight-Based Methods

Several weight-based methods have been proposed to prune non-informative connections. Recently, Han et al. [24] introduced a pruning method to remove connections whose absolute values are smaller than a predefined threshold value calculated using the standard deviation of a layer’s weights. The network is then retrained to account for the drop in accuracy. Although Han’s framework received significant attention and has become a typical method of network pruning, it focuses on the magnitude of weights, relies on iterative pruning and fine-tuning, and requires a particular software/hardware accelerator not supported by off-the-shelf libraries. Moreover, the reliance on a predefined threshold is not practical and too inflexible for some applications.
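A minimal sketch of this style of magnitude pruning, assuming NumPy weight matrices; the `quality` multiplier on the layer's standard deviation is a hypothetical knob, and in practice the resulting mask is kept fixed while the network is retrained to recover accuracy:

```python
import numpy as np

def prune_by_magnitude(weights, quality=1.0):
    """Zero connections whose magnitude falls below a threshold
    derived from the standard deviation of the layer's weights."""
    threshold = quality * np.std(weights)
    mask = np.abs(weights) >= threshold
    return weights * mask, mask

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 128))
pruned, mask = prune_by_magnitude(w, quality=0.5)
print(f"fraction pruned: {1 - mask.mean():.2f}")
```

For unit-normal weights and `quality=0.5`, roughly a third of the connections fall below the threshold; real layers are then iteratively pruned and fine-tuned.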

Liu et al. [25] showed that the retraining phase can be replaced by randomly reinitializing the pruned network and training it from scratch, which delivers equal accuracy with comparable training time. Furthermore, Mocanu et al. [26] replaced the fully-connected layers with sparsely-connected layers, applying an initial topology based on the Erdős–Rényi random graph. During training, a fraction of the smallest weights is iteratively removed and replaced with new random weights. Applying an initial topology allows a sparse architecture to be found before training; however, this requires expensive training steps and relies on iterative random reinitialization. The random connectivity of such non-structured sparse models can also cause poor cache locality and jumping memory access, which severely limits practical acceleration [27].
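The prune-and-regrow step can be sketched as follows (a simplified NumPy-only illustration of the idea in [26]; the `zeta` fraction and the scale of newly grown weights are assumptions):

```python
import numpy as np

def prune_and_regrow(weights, zeta=0.3, rng=None):
    """One evolution step: remove the zeta fraction of smallest-magnitude
    nonzero weights, then regrow the same number of new random weights at
    currently empty positions, keeping the overall sparsity constant."""
    rng = rng or np.random.default_rng()
    w = weights.copy()
    nonzero = np.flatnonzero(w)
    n_swap = int(zeta * nonzero.size)
    weakest = nonzero[np.argsort(np.abs(w.flat[nonzero]))[:n_swap]]
    w.flat[weakest] = 0.0                      # drop the weakest connections
    empty = np.flatnonzero(w == 0)
    grown = rng.choice(empty, size=n_swap, replace=False)
    w.flat[grown] = rng.normal(scale=0.01, size=n_swap)  # regrow at random
    return w
```

Repeating this step after each training epoch lets the sparse topology evolve while the parameter budget stays fixed.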

Through an iterative pruning technique, Frankle et al. [28] found that over-parameterized networks contain small sub-networks (winning tickets) that reach test accuracy comparable to the original network. The obtained sparse network can be trained from scratch using the same initialization as the original model to achieve the same level of accuracy. Their core idea was to find, during the training phase, a smaller architecture better suited to the target task. In a follow-up study, Frankle et al. [29] found that pruning at initialization does not work well for deeper architectures, and suggested setting the weights to those obtained at an early epoch of training. Various extensions have been developed for further improvement and to experimentally analyze the existence of the lottery ticket hypothesis in other types of networks [30,31,32,33].
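The train-prune-rewind loop can be sketched schematically as below, with training abstracted into a caller-supplied `train_fn`; the per-round pruning fraction is an assumption:

```python
import numpy as np

def find_winning_ticket(init_weights, train_fn, rounds=3, prune_frac=0.2):
    """Iterative magnitude pruning with rewinding: train the masked
    network, prune the smallest surviving weights, rewind the survivors
    to their original initialization, and repeat."""
    mask = np.ones_like(init_weights)
    for _ in range(rounds):
        trained = train_fn(init_weights * mask)      # train the masked net
        alive = np.abs(trained)[mask == 1]
        cutoff = np.quantile(alive, prune_frac)      # prune lowest fraction
        mask = mask * (np.abs(trained) >= cutoff)
    return init_weights * mask, mask                 # rewound winning ticket
```

With `prune_frac=0.2` and three rounds, roughly 0.8³ ≈ 51% of the weights survive; the returned ticket keeps the original initialization at the surviving positions.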

To overcome the weaknesses associated with unstructured pruning, strategies corresponding to group-wise sparsity-based network pruning have been explored. Wen et al. [27] proposed the Structured Sparsity Learning (SSL) method, which imposes group-wise sparsity regularization on CNNs, applying the sparsity at different levels of their structure (filters, channels, and layers) to construct compressed networks. Lebedev et al. [34] also employed group-wise sparsity regularization to shrink individual weights toward zero so they can be effectively ignored. Furthermore, Zhou et al. [35] incorporated sparsity constraints on network weights during the training stage, aiming to build pruned DNNs. Although this proved successful in such sparse solutions, it results in damage to the original network structure and there is still a need to adopt special libraries or use particular sparse matrix multiplication to accelerate the inference speed in real applications.
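A group-wise sparsity penalty of this kind can be illustrated as a group lasso over the output filters of a convolutional layer (a sketch only; SSL applies the same construction at the channel and layer levels as well, and the `lam` coefficient is an assumption):

```python
import numpy as np

def group_lasso_penalty(conv_weights, lam=1e-3):
    """Group-lasso regularizer over the output filters of a conv layer.

    conv_weights: array of shape (C_out, C_in, K, K). Each filter forms
    one group; the penalty is lam times the sum of the groups' L2 norms,
    which drives whole filters (not individual weights) toward zero."""
    flat = conv_weights.reshape(conv_weights.shape[0], -1)
    return lam * np.sqrt((flat ** 2).sum(axis=1)).sum()
```

Added to the task loss during training, this term zeroes out entire filters, so the pruned network stays dense and needs no sparse-matrix kernels at inference time.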

It can be argued that weight-based methods suffer from certain limitations. Removing low-weight connections means that important neurons whose activation does not contribute enough, due to low-magnitude incoming or outgoing connections, could be ignored. Moreover, the overall impact of weight-based pruning on network compression is lower than that of neuron-based methods: pruning a neuron eliminates entire rows or columns of the weight matrices in both the preceding and following layers connected to that neuron, while weight-based methods only prune individual low-weight connections between layers. To process the resulting sparse weight matrices, some methods also require a particular software/hardware accelerator that off-the-shelf libraries do not support. Despite these drawbacks, weight-based methods can be applied in combination with unit-based methods to add extra compression value.

3.1.2. Unit-Based Methods (Neurons, Kernels, and Filters)

Unit-based methods represent a pruning approach proposed to eliminate the least important units. He et al. [36] developed a simple unit-based pruning strategy that involves evaluating the importance of a neuron by summing the output weights of each one, and eliminating unimportant nodes based on this. They also apply neuron-based pruning utilizing the entropy of neuron activation. Their entropy function evaluates the activation distribution of each neuron based on a predefined threshold, which is only suitable with a sigmoid activation function. Since this method damages the network’s accuracy, additional fine-tuning is required to obtain satisfactory performance. Alqahtani et al. [37] proposed a majority voting technique to compare the activation values among neurons and assign a voting score to quantitatively evaluate their importance, which helps to effectively reduce model complexity by eliminating the less influential neurons. Their method simultaneously identifies the critical neurons and prunes the model during training without involving any pre-training or fine-tuning procedures.
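The summed-output-weights criterion can be sketched for a fully-connected hidden layer as follows (a minimal NumPy illustration in the spirit of He et al. [36]; the keep ratio is an assumption):

```python
import numpy as np

def prune_neurons_by_output_weights(w_in, w_out, keep_ratio=0.5):
    """Score each hidden neuron by the summed magnitude of its outgoing
    weights and keep only the top fraction, removing the corresponding
    row/column from both weight matrices.

    w_in:  (n_hidden, n_prev)  incoming weights
    w_out: (n_next, n_hidden)  outgoing weights
    """
    scores = np.abs(w_out).sum(axis=0)
    n_keep = int(keep_ratio * scores.size)
    keep = np.sort(np.argsort(scores)[::-1][:n_keep])
    return w_in[keep, :], w_out[:, keep], keep
```

Note that pruning one neuron shrinks both adjacent weight matrices, which is why unit-level pruning compresses more aggressively than removing individual connections.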

Srinivas et al. [38] also introduced a unit-based pruning method that evaluates the weight similarity of neurons in a layer: a neuron is removed when its weights are similar to those of another neuron in the same layer. Mariet et al. [39] introduced Divnet, which selects a subset of diverse neurons and merges similar neurons into one. The subset is selected based on activation patterns, by defining a probability measure over subsets of neurons. As with other methods, these pruning techniques require software/hardware accelerators that are unsupported by off-the-shelf libraries, as well as a multi-step procedure to prune neurons.

Filter-level pruning strategies have been widely studied. Their aim is to evaluate the importance of intermediate units, pruning those with the lowest scores. Li et al. [40] suggested such a pruning method based on the sum of absolute weights, and Liu et al. [41] proposed a pruning method based on the mean gradient of feature maps in each layer, which reflects the importance of the features extracted by convolutional kernels. Other data-driven pruning methods have been developed to prune non-informative filters. For instance, Polyak et al. [42] designed a statistical pruning method that removes filters based on channel variance, applying the variance of feature-map activations to identify critical filters. Unimportant filters can also be pruned according to their level of importance: Luo's [43] pruning method uses the entropy of each channel's output to evaluate the importance of its filters, pruning those with the lowest output entropy, while Hu et al. [44] evaluated the importance of filters based on the average percentage of zero activations (APoZ) in their output feature maps.
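The APoZ criterion of Hu et al. [44], for example, can be computed directly from post-ReLU feature maps (a minimal sketch; score aggregation over a validation set and the pruning threshold are left to the caller):

```python
import numpy as np

def apoz_scores(activations):
    """Average Percentage of Zeros per filter: the fraction of zero
    entries in each filter's post-ReLU feature maps over a batch.
    Filters with a high APoZ rarely activate and are pruning candidates.

    activations: array of shape (batch, channels, height, width)."""
    return (activations == 0).mean(axis=(0, 2, 3))
```

Because the score depends only on activations, it can be measured on held-out data without touching the weights.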

Furthermore, Luo et al. [10] proposed the ThiNet method, which applies a greedy strategy for channel selection. It prunes the target layer by greedily selecting the input channels whose removal causes the smallest increase in reconstruction error; a least-squares approach is applied to identify a subset of input channels with the smallest impact on approximating the output feature map. A general channel pruning approach is also presented by Liu et al. [45], where a layer-grouping algorithm finds coupled channels automatically, and a unified metric based on Fisher information is derived to evaluate the importance of single and coupled channels. These methods tend to compress networks by adopting straightforward selection criteria based on statistical information. However, dealing with individual CNN filters requires selective and semantically meaningful criteria for filter selection, since each convolutional filter responds to a specific high-level concept associated with different semantic parts. The most recent work is a CNN pruning method inspired by neural network interpretability. Yeom et al. [46] combined the two previously disconnected research lines of interpretability and model compression by basing a pruning method on layer-wise relevance propagation (LRP) [47], where weights or filters are pruned based on their relevance scores. Alqahtani et al. [48] proposed a framework to measure the importance of individual hidden units by computing a measure of relevance to identify the most critical filters, introducing the use of feature-map activations to detect valuable information and the essential semantic parts when evaluating the importance of feature maps.

It could be argued that compressing a network via the training process may provide more effective solutions. Ding et al. [49] presented an optimization method that forces correlated filters to converge to the same values, creating identical filters of which the redundant ones can be safely eliminated during training. He et al. [50] proposed a filter pruning method that prunes convolutional filters during the training phase: after each training epoch, the importance of filters is measured by their L2 norm, and the least essential filters are set to zero. He et al. [51] later iteratively measured filter importance by calculating the distance between each convolution kernel and the origin or the geometric median, based on which redundant kernels are identified and pruned during training. Liu et al. [52] trained an auxiliary network to predict the weights of the pruned network and estimate the performance of the remaining filters. Moreover, Zhonghui et al. [53] cast compression as the task of learning a scaling factor associated with each filter, estimating its importance by evaluating the change in the loss function. AutoPruner [54] embedded the pruning phase into an end-to-end trainable framework: after each activation, an extra layer is added to estimate a scaling effect on the activation, which is then binarized for pruning. A significant drawback of iterative pruning is its extensive computational cost; pruning procedures based on training iterations often change the optimization function and even introduce hyper-parameters that make training more challenging to converge.
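The per-epoch step of the L2-norm-based soft pruning in [50] can be sketched as follows (the pruning ratio is an assumption; "soft" means the zeroed filters remain trainable and may recover in later epochs):

```python
import numpy as np

def soft_prune_filters(conv_weights, prune_ratio=0.3):
    """After a training epoch, zero the filters with the smallest L2
    norms. Zeroed filters are not removed from the network and keep
    receiving gradient updates, so they may grow back.

    conv_weights: array of shape (C_out, C_in, K, K)."""
    w = conv_weights.copy()
    norms = np.sqrt((w.reshape(w.shape[0], -1) ** 2).sum(axis=1))
    n_prune = int(prune_ratio * norms.size)
    w[np.argsort(norms)[:n_prune]] = 0.0
    return w
```

Only after training finishes are the persistently zeroed filters physically removed, which sidesteps the irreversibility of hard pruning.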

3.2. Quantization Methods

Network quantization is a deep network compression procedure in which quantization, low-precision, or binary representations are used to reduce the number of bits required to represent each weight. Typical deep networks use floating-point (e.g., 32-bit) precision for training and inference, which entails high computational costs, memory, and storage requirements. Several works [55,56,57] introduced low bit-width models with a high level of accuracy, considering both activation and weight quantization. In the parameter space, Gong et al. [58] and Wu et al. [8] applied k-means clustering to the weight values for quantization. As a result, the network weights are stored in a compressed format after the training process completes, reducing storage requirements and computational complexity. 8-bit quantization of the parameters has been shown to achieve significant speedup with minimal accuracy loss [59]. Suyog et al. [60] showed that truncating all parameters to 16 bits can significantly reduce memory usage and floating-point operations without compromising accuracy.
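Weight sharing via one-dimensional k-means, in the spirit of [58], can be sketched as follows (the cluster count and iteration budget are assumptions; after clustering, only the small codebook plus low-bit per-weight indices need to be stored):

```python
import numpy as np

def kmeans_quantize(weights, n_clusters=16, n_iter=20):
    """Replace each weight with its 1-D k-means centroid, so a layer
    stores a codebook of n_clusters floats plus log2(n_clusters)-bit
    indices instead of full-precision values."""
    flat = weights.ravel()
    # Initialize centroids uniformly over the weight range.
    centroids = np.linspace(flat.min(), flat.max(), n_clusters)
    for _ in range(n_iter):
        # Assign each weight to its nearest centroid, then update.
        idx = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
        for k in range(n_clusters):
            if np.any(idx == k):
                centroids[k] = flat[idx == k].mean()
    return centroids[idx].reshape(weights.shape), centroids
```

With 16 clusters, each weight index needs only 4 bits, roughly an 8× reduction relative to 32-bit floats before any entropy coding.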

Others have proposed simultaneously pruning and quantizing the weights of a trained neural network. Han et al. [61] iteratively eliminated unnecessary weight connections and quantized the remaining weights, which were then encoded with Huffman coding for further compression; this achieved state-of-the-art compression with no drop in model accuracy. Soft weight-sharing [62] was also developed to combine quantization and pruning in one retraining procedure. Chen et al. [63] introduced the HashedNets model, which applies a random hash function to the connection weights to force weights to share identical values, reducing the number of trainable parameters by grouping them into hash buckets. These approaches typically produce connection-level pruning in CNNs. In the extreme case, 1-bit quantization is used to represent each weight. A number of binary-based methods exist to directly train networks with binary weights (e.g., BinaryNet [64], BinaryConnect [65], and XNOR-Networks [55]), which share the idea of learning binary weights or activations during the training process.

The disadvantages of binary networks include significant performance drops when dealing with larger CNNs, and these methods often ignore the impact of binarization on accuracy loss. To overcome this, Hou et al. [66] employed a proximal Newton algorithm with a diagonal Hessian approximation to minimize the overall loss associated with binary weights, and Lin et al. [67] quantized the representations at each layer when computing parameter gradients, converting multiplications into binary shifts by constraining neuron values to powers of two.
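The forward-pass approximation used by XNOR-style binarization [55] amounts to the following (a sketch only; during training the real-valued weights are retained and updated, with the binarized copy used for the forward and backward passes):

```python
import numpy as np

def binarize_xnor(weights):
    """XNOR-Net-style binarization: approximate W by alpha * sign(W),
    where the scalar alpha is the mean absolute weight. Storing only
    signs plus one scale reduces each weight to a single bit."""
    alpha = np.abs(weights).mean()
    return alpha * np.sign(weights), alpha
```

The per-tensor scale `alpha` is what distinguishes this scheme from plain sign binarization and recovers much of the lost accuracy.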

3.3. Low-Rank Factorization Methods

Low-rank approximation (factorization) is applied to determine the informative parameters using matrix or tensor decomposition: a weight matrix is factorized into a product of two smaller matrices that performs a similar function to the original weight matrix. In deep CNNs, the greatest computational cost results from convolution operations, so compressing the convolutional layers would improve the overall speedup and compression rate. The weights of a convolutional layer can be viewed as a 4D tensor, and the significant amount of redundancy in this tensor motivates tensor decomposition as an effective way to eliminate it.

Low-rank factorization has been utilized for model compression and acceleration to achieve further speedup and obtain smaller CNN models. Rigamonti et al. [68] post-processed the learned filters by employing a shared set of separable 1D filters to approximate convolutional filters with low-rank filters, and Denton et al. [2] used low-rank approximation and clustering schemes to reduce the computational complexity of CNNs. Jaderberg et al. [69] suggested using different tensor decomposition schemes, achieving a twofold speedup for a single convolutional layer with little drop in model accuracy. Low-rank factorization has also been used to exploit low-rankness in fully-connected layers. Denil et al. [9] used a low-rank decomposition of the weight matrices, learned from an auto-encoder, to reduce the number of dynamic parameters, while Sainath et al. [70] showed that a low-rank factorization of the final weight layer significantly reduces the number of parameters. Lu et al. [71] adopted SVD to decompose the fully-connected layer when designing compact multi-task deep learning architectures. Low-rank approximation is typically carried out layer by layer: each layer is fine-tuned against a reconstruction objective while all other layers are kept fixed. Following this approach, Lebedev et al. [72] applied a non-linear least-squares algorithm to compute a Canonical Polyadic Decomposition (CPD) of the convolution kernels' weight tensors. Tai et al. [73] introduced a closed-form solution for the low-rank decomposition, training constrained CNNs from scratch with Batch Normalization (BN) used to normalize the activations of the hidden layers; this procedure has proved effective for learning low-rank constrained networks.
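As a concrete special case, a rank-1 (separable) approximation of a single 2D kernel follows directly from the SVD, in the spirit of the separable-filter schemes above. The example below is illustrative only; the function name is our own.

```python
import numpy as np

def separable_approx(kernel):
    """Rank-1 separable approximation of a 2D convolution kernel: a k x k
    filter is replaced by a k x 1 followed by a 1 x k filter, so the cost
    per output pixel drops from k*k to 2*k multiply-accumulates."""
    U, s, Vt = np.linalg.svd(kernel)
    v = U[:, 0] * np.sqrt(s[0])   # vertical 1D filter
    h = Vt[0, :] * np.sqrt(s[0])  # horizontal 1D filter
    return v, h

# A Gaussian blur kernel is exactly rank 1, so the approximation is exact:
g = np.outer([1.0, 2.0, 1.0], [1.0, 2.0, 1.0]) / 16.0
v, h = separable_approx(g)
assert np.allclose(np.outer(v, h), g)
```

For kernels of higher rank, the truncated reconstruction is only approximate, which is why the cited methods fine-tune the network after decomposition.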

Low-rank factorization approaches are computationally expensive because they involve decomposition operations, and they cannot perform global parameter compression since the approximation is carried out layer by layer [74]. Extensive retraining is also required to reach accuracy comparable to the original model. Despite these downsides, such approaches can be integrated with conventional pruning methods to obtain further compression.

4. Discussion of Challenges and Future Directions

Having reviewed network compression and acceleration works and classified them into three categories, we now highlight opportunities and potential future directions. Reducing the complexity of models while maintaining their high performance creates unprecedented opportunities for researchers to tackle the challenges of deploying deep learning systems on resource-limited devices and broadens the applicability of deep network models. Choosing a well-suited compression and acceleration method depends on the application and its requirements. For instance, pruning and low-rank factorization-based methods may present effective solutions when dealing with pre-trained models; in particular tasks (e.g., object detection), low-rank factorization may be better suited to accelerating convolutional layers, while pruning can be adopted to compress fully-connected layers.

Applying an initial topology or random connectivity to sparse models allows a sparse architecture to be found. Although this approach has proved successful [25,26,28,29,30,31,32,33], it remains a relatively young and emerging field. Most of the proposed methods damage the original network structure, so special libraries or particular sparse matrix multiplication routines are needed to accelerate inference in real applications. Random connectivity also causes cache and memory access issues, so the acceleration achieved even by highly sparse models is very limited. Further theoretical analysis is therefore needed to better understand how to improve sparse models and to introduce more effective methods.

The effectiveness of deep representations has been shown to extend to network pruning [46,48]. For instance, the pruning method presented in [48] quantifies the importance of latent representations to compress and accelerate CNNs for image classification tasks, including CIFAR object recognition, CUB-200 fine-grained classification, and ImageNet large-scale object classification. Applying such pruning methods to real applications across other computer vision tasks, including object detection, semantic segmentation, image generation, image retrieval, and style transfer, is a fertile avenue for future research: these tasks require richer knowledge and more abstract feature representations than image classification, so pruned models may face a sharp reduction in performance [75,76]. Research could explore how such applications, particularly semantic segmentation and image generation, can make use of representation-based pruning.

Several strategies proposed for CNN compression and acceleration focus on filter-level pruning [4,10,40,41,44,48], where removing an unimportant filter in its entirety does not alter the network structure. This leaves room for further compression and acceleration by other approaches, such as parameter quantization and low-rank factorization. Although those approaches are computationally expensive and cannot perform global parameter compression, integrating them with filter-level pruning methods would yield more compressed networks. It would also be fruitful to explore hybrid schemes for network compression, in which the advantages of each compression category are exploited to prune models further.

There are also several challenges and extensions that we perceive as useful research directions. The first is to extend the multi-step filter-level pruning framework and combine it with an iterative pruning method to explore the problem more deeply and achieve effective CNN compression and acceleration, as pruning a network during training may provide more effective solutions. Secondly, most pruning methods are data-driven, so their speed is a significant concern: although pruning methods inspired by neural network interpretability achieve better results, their process can be time-consuming. Even when only a few images per category are selected to form the evaluation set used to find the optimal channel subset, the method of [48] still requires more than seven minutes to estimate IoU scores and MV values for a single block of ResNet-50 on ImageNet. Parallel implementation could therefore be a promising solution, as CNN-based methods are well suited to efficient parallelization on both CPUs and GPUs. Considering a set of nodes, filters, and layers for pruning jointly, instead of one by one in a greedy manner, is also worthwhile to study in future work.

Overall, the potential for deep network compression is vast; the field has many open problems to understand and explore. The remarkable advancement of neural network interpretability should encourage the development of efficient methods for network compression and acceleration to facilitate the deployment of advanced deep networks.

5. Conclusions

The over-parametrized and redundant nature of network models incurs expensive computational costs and high storage requirements, presenting significant challenges and restricting many applications. Reducing the complexity of models while maintaining their powerful performance is therefore always desirable. This paper has discussed the necessary background for deep network compression and presented a comprehensive, detailed review of recent works on compressing and accelerating deep neural networks, covering the popular pruning, quantization, and low-rank factorization methods. We hope this paper can act as a keystone for future research on deep network compression.

Conceptualization, methodology, validation, formal analysis, investigation, writing—original draft preparation, writing—review and editing: All. All authors have read and agreed to the published version of the manuscript.

This work was supported by the Deanship of Scientific Research, King Khalid University, Kingdom of Saudi Arabia, under research grant number RGP1/207/42.


The authors declare no conflict of interest.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Summary of modern CNNs with their performance, computational, and parameter complexities on the ImageNet database. M/B indicates million/billion, respectively.

                                        Performance            Computational Complexity      Parameter Complexity
Year  Network  Layers (#)  Size     Top-1 (%)  Top-5 (%)   FLOPs    Conv (%)  FC (%)    Par. (#)  Conv (%)  FC (%)
2012  [ ]      8           240 MB   36.70      15.30       724 M    91.9      8.1       61 M      3.8       96.2
2014  [ ]      16          528 MB   23.70      6.80        15.5 B   99.2      0.8       138 M     10.6      89.4
2014  [ ]      22          88 MB    22.10      6.30        1.6 B    99.9      0.1       6.9 M     85.1      14.9
2015  [ ]      50          98 MB    20.74      5.25        3.9 B    100       0         25.5 M    100       0

A list of literature sources searched for Deep Network Compression. We mainly use IEEE Xplore, the ACM Digital Library, the Elsevier Library, the Springer Library, and Google Scholar to search for literature.

Conferences and Journals Papers
Advances in Neural Information Processing Systems 13
International Conference on Learning Representations 12
IEEE Conference on Computer Vision and Pattern Recognition 5
CoRR 6
International Conference on Machine Learning 3
European Conference on Computer Vision 2
International Conference on Acoustics, Speech and Signal Processing 2
British Machine Vision Conference 2
Pattern Recognition 2
IEEE Transactions on Pattern Analysis and Machine Intelligence 1
IEEE International Conference on Computer Vision 1
Computer Vision and Image Understanding 1
International Conference on Pattern Recognition 1
Nature communications 1
International Conference on Applications of Intelligent Systems 1
Signal Processing 1
IEEE Access 1
IEEE International Joint Conference on Neural Networks 1
International Joint Conference on Artificial Intelligence 1
Total 57

1. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning ; MIT Press: Cambridge, MA, USA, 2016.

2. Denton, E.L.; Zaremba, W.; Bruna, J.; LeCun, Y.; Fergus, R. Exploiting linear structure within convolutional networks for efficient evaluation. Proceedings of the Advances in Neural Information Processing Systems ; Montreal, QC, Canada, 8–13 December 2014; pp. 1269-1277.

3. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. Proceedings of the International Conference on Learning Representations ; San Diego, CA, USA, 7–9 May 2015.

4. Molchanov, P.; Tyree, S.; Karras, T.; Aila, T.; Kautz, J. Pruning convolutional neural networks for resource efficient inference. Proceedings of the International Conference on Learning Representations ; Toulon, France, 24–26 April 2017.

5. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Proceedings of the Advances in Neural Information Processing Systems ; Lake Tahoe, NV, USA, 3–6 December 2012; pp. 1097-1105.

6. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition ; Boston, MA, USA, 7–12 June 2015; pp. 1-9.

7. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition ; Las Vegas, NV, USA, 27–30 June 2016; pp. 770-778.

8. Wu, J.; Leng, C.; Wang, Y.; Hu, Q.; Cheng, J. Quantized convolutional neural networks for mobile devices. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition ; Las Vegas, NV, USA, 27–30 June 2016; pp. 4820-4828.

9. Denil, M.; Shakibi, B.; Dinh, L.; Ranzato, M.; De Freitas, N. Predicting parameters in deep learning. Proceedings of the Advances in Neural Information Processing Systems ; Lake Tahoe, NV, USA, 5–10 December 2013; pp. 2148-2156.

10. Luo, J.; Zhang, H.; Zhou, H.; Xie, C.; Wu, J.; Lin, W. ThiNet: Pruning CNN Filters for a Thinner Net. IEEE Trans. Pattern Anal. Mach. Intell. ; 2019; 41 , pp. 2525-2538. [DOI: https://dx.doi.org/10.1109/TPAMI.2018.2858232] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/30040622]

11. Mozer, M.C.; Smolensky, P. Skeletonization: A technique for trimming the fat from a network via relevance assessment. Proceedings of the Advances in Neural Information Processing Systems ; Denver, CO, USA, 1988; Volume 1 , pp. 107-115.

12. Reed, R. Pruning algorithms: A survey. IEEE Trans. Neural Netw. ; 1993; 4 , pp. 740-747. [DOI: https://dx.doi.org/10.1109/72.248452] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/18276504]

13. LeCun, Y.; Denker, J.S.; Solla, S.A. Optimal brain damage. Proceedings of the Advances in Neural Information Processing Systems ; Denver, CO, USA, 26–29 November 1990; pp. 598-605.

14. Hassibi, B.; Stork, D.G. Second order derivatives for network pruning: Optimal brain surgeon. Proceedings of the Advances in Neural Information Processing Systems ; Denver, CO, USA, 1993; pp. 164-171.

15. Weigend, A.S.; Rumelhart, D.E.; Huberman, B.A. Generalization by weight-elimination applied to currency exchange rate prediction. Proceedings of the IEEE International Joint Conference on Neural Networks ; Seattle, WA, USA, 8–12 July 1991; pp. 2374-2379.

16. Hanson, S.; Pratt, L. Comparing biases for minimal network construction with back-propagation. Proceedings of the Advances in Neural Information Processing Systems ; Denver, CO, USA, 1988; pp. 177-185.

17. Weigend, A.S.; Rumelhart, D.E.; Huberman, B.A. Back-propagation, weight-elimination and time series prediction. Connectionist Models ; Morgan Kaufmann: Burlington, MA, USA, 1991; pp. 105-116. [DOI: https://dx.doi.org/10.1016/B978-1-4832-1448-1.50016-0]

18. Arora, S.; Ge, R.; Neyshabur, B.; Zhang, Y. Stronger generalization bounds for deep nets via a compression approach. Proceedings of the International Conference on Machine Learning ; Stockholm, Sweden, 10–15 July 2018; pp. 254-263.

19. Li, Z.; Zhang, Z.; Zhao, H.; Wang, R.; Chen, K.; Utiyama, M.; Sumita, E. Text Compression-aided Transformer Encoding. IEEE Trans. Pattern Anal. Mach. Intell. ; 2021; 1. [DOI: https://dx.doi.org/10.1109/TPAMI.2021.3058341] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/33577448]

20. Amich, M.; Luca, P.D.; Fiscale, S. Accelerated implementation of FQSqueezer novel genomic compression method. Proceedings of the International Symposium on Parallel and Distributed Computing ; Warsaw, Poland, 5–8 July 2020; pp. 158-163.

21. Weinberger, M.; Seroussi, G.; Sapiro, G. The LOCO-I lossless image compression algorithm: Principles and standardization into JPEG-LS. IEEE Trans. Image Process. ; 2000; 9 , pp. 1309-1324. [DOI: https://dx.doi.org/10.1109/83.855427] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/18262969]

22. Nagoor, O.; Whittle, J.; Deng, J.; Mora, B.; Jones, M.W. MedZip: 3D Medical Images Lossless Compressor Using Recurrent Neural Network (LSTM). Proceedings of the International Conference on Pattern Recognition ; Milan, Italy, 10–15 January 2021; pp. 2874-2881.

23. Nagoor, O.; Whittle, J.; Deng, J.; Mora, B.; Jones, M.W. Lossless Compression For Volumetric Medical Images Using Deep Neural Network With Local Sampling. Proceedings of the IEEE International Conference on Image Processing ; Abu Dhabi, United Arab Emirates, 25–28 October 2020; pp. 2815-2819.

24. Han, S.; Pool, J.; Tran, J.; Dally, W. Learning both weights and connections for efficient neural network. Proceedings of the Advances in Neural Information Processing Systems ; Montreal, QC, Canada, 7–12 December 2015; pp. 1135-1143.

25. Liu, Z.; Sun, M.; Zhou, T.; Huang, G.; Darrell, T. Rethinking the value of network pruning. Proceedings of the International Conference on Learning Representations ; New Orleans, LA, USA, 6–9 May 2019.

26. Mocanu, D.C.; Mocanu, E.; Stone, P.; Nguyen, P.H.; Gibescu, M.; Liotta, A. Scalable training of artificial neural networks with adaptive sparse connectivity inspired by network science. Nat. Commun. ; 2018; 9 , 2383. [DOI: https://dx.doi.org/10.1038/s41467-018-04316-3] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/29921910]

27. Wen, W.; Wu, C.; Wang, Y.; Chen, Y.; Li, H. Learning structured sparsity in deep neural networks. Proceedings of the Advances in Neural Information Processing Systems ; Barcelona, Spain, 5–10 December 2016; pp. 2074-2082.

28. Frankle, J.; Carbin, M. The lottery ticket hypothesis: Finding sparse, trainable neural networks. Proceedings of the International Conference on Learning Representations ; New Orleans, LA, USA, 6–9 May 2019.

29. Frankle, J.; Dziugaite, G.K.; Roy, D.M.; Carbin, M. Stabilizing the lottery ticket hypothesis. arXiv ; 2019; arXiv: 1903.01611

30. Morcos, A.; Yu, H.; Paganini, M.; Tian, Y. One ticket to win them all: Generalizing lottery ticket initializations across datasets and optimizers. Proceedings of the Neural Information Processing Systems ; Vancouver, BC, Canada, 10–12 December 2019; pp. 4932-4942.

31. Hubens, N.; Mancas, M.; Decombas, M.; Preda, M.; Zaharia, T.; Gosselin, B.; Dutoit, T. An Experimental Study of the Impact of Pre-Training on the Pruning of a Convolutional Neural Network. Proceedings of the International Conference on Applications of Intelligent Systems ; Las Palmas de Gran Canaria, Spain, 7–12 January 2020; pp. 1-6.

32. Zhou, H.; Lan, J.; Liu, R.; Yosinski, J. Deconstructing lottery tickets: Zeros, signs, and the supermask. Proceedings of the Neural Information Processing Systems ; Vancouver, BC, Canada, 10–12 December 2019; pp. 3597-3607.

33. Yu, H.; Edunov, S.; Tian, Y.; Morcos, A.S. Playing the lottery with rewards and multiple languages: Lottery tickets in RL and NLP. arXiv ; 2020; arXiv: 1906.02768

34. Lebedev, V.; Lempitsky, V. Fast ConvNets using group-wise brain damage. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition ; Las Vegas, NV, USA, 27–30 June 2016; pp. 2554-2564.

35. Zhou, H.; Alvarez, J.M.; Porikli, F. Less is more: Towards compact CNNs. Proceedings of the European Conference on Computer Vision ; Amsterdam, The Netherlands, 11–14 October 2016; pp. 662-677.

36. He, T.; Fan, Y.; Qian, Y.; Tan, T.; Yu, K. Reshaping deep neural network for fast decoding by node-pruning. Proceedings of the International Conference on Acoustics, Speech and Signal Processing ; Florence, Italy, 4–9 May 2014; pp. 245-249.

37. Alqahtani, A.; Xie, X.; Essa, E.; Jones, M.W. Neuron-based Network Pruning Based on Majority Voting. Proceedings of the International Conference on Pattern Recognition ; Milan, Italy, 10–15 January 2021; pp. 3090-3097.

38. Srinivas, S.; Babu, R.V. Data-free Parameter Pruning for Deep Neural Networks. Proceedings of the British Machine Vision Conference ; Swansea, UK, 7–10 September 2015; pp. 31.1-31.12.

39. Mariet, Z.; Sra, S. Diversity networks: Neural network compression using determinantal point processes. Proceedings of the International Conference on Learning Representations ; San Juan, Puerto Rico, 2–4 May 2016.

40. Li, H.; Kadav, A.; Durdanovic, I.; Samet, H.; Graf, H.P. Pruning filters for efficient ConvNets. Proceedings of the International Conference on Learning Representations ; Toulon, France, 24–26 April 2017.

41. Liu, C.; Wu, H. Channel pruning based on mean gradient for accelerating Convolutional Neural Networks. Signal Process. ; 2019; 156 , pp. 84-91. [DOI: https://dx.doi.org/10.1016/j.sigpro.2018.10.019]

42. Polyak, A.; Wolf, L. Channel-level acceleration of deep face representations. IEEE Access ; 2015; 3 , pp. 2163-2175. [DOI: https://dx.doi.org/10.1109/ACCESS.2015.2494536]

43. Luo, J.H.; Wu, J. An entropy-based pruning method for cnn compression. arXiv ; 2017; arXiv: 1706.05791

44. Hu, H.; Peng, R.; Tai, Y.W.; Tang, C.K. Network trimming: A data-driven neuron pruning approach towards efficient deep architectures. arXiv ; 2016; arXiv: 1607.03250

45. Liu, L.; Zhang, S.; Kuang, Z.; Zhou, A.; Xue, J.; Wang, X.; Chen, Y.; Yang, W.; Liao, Q.; Zhang, W. Group Fisher Pruning for Practical Network Compression. Proceedings of the International Conference on Machine Learning ; Virtual, Vienna, Austria, 18–24 July 2021; pp. 7021-7032.

46. Yeom, S.K.; Seegerer, P.; Lapuschkin, S.; Wiedemann, S.; Müller, K.R.; Samek, W. Pruning by Explaining: A Novel Criterion for Deep Neural Network Pruning. Pattern Recognit. ; 2021; 115 , 107899. [DOI: https://dx.doi.org/10.1016/j.patcog.2021.107899]

47. Bach, S.; Binder, A.; Montavon, G.; Klauschen, F.; Muller, K.R.; Samek, W. On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLoS ONE ; 2015; 10 , e0130140. [DOI: https://dx.doi.org/10.1371/journal.pone.0130140] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/26161953]

48. Alqahtani, A.; Xie, X.; Jones, M.W.; Essa, E. Pruning CNN filters via quantifying the importance of deep visual representations. Comput. Vis. Image Underst. ; 2021; 208 , 103220. [DOI: https://dx.doi.org/10.1016/j.cviu.2021.103220]

49. Ding, X.; Ding, G.; Guo, Y.; Han, J. Centripetal SGD for pruning very deep convolutional networks with complicated structure. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition ; Long Beach, CA, USA, 15–20 June 2019; pp. 4943-4953.

50. He, Y.; Kang, G.; Dong, X.; Fu, Y.; Yang, Y. Soft filter pruning for accelerating deep convolutional neural networks. Proceedings of the International Joint Conference on Artificial Intelligence ; Stockholm, Sweden, 13–19 July 2018; pp. 2234-2240.

51. He, Y.; Liu, P.; Wang, Z.; Hu, Z.; Yang, Y. Filter pruning via geometric median for deep convolutional neural networks acceleration. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition ; Long Beach, CA, USA, 15–20 June 2019; pp. 4340-4349.

52. Liu, Z.; Mu, H.; Zhang, X.; Guo, Z.; Yang, X.; Cheng, K.T.; Sun, J. Metapruning: Meta learning for automatic neural network channel pruning. Proceedings of the IEEE International Conference on Computer Vision ; Seoul, Korea, 27–28 October 2019; pp. 3296-3305.

53. You, Z.; Yan, K.; Ye, J.; Ma, M.; Wang, P. Gate decorator: Global filter pruning method for accelerating deep convolutional neural networks. Proceedings of the Neural Information Processing Systems ; Vancouver, BC, Canada, 10–12 December 2019; pp. 2133-2144.

54. Luo, J.H.; Wu, J. Autopruner: An end-to-end trainable filter pruning method for efficient deep model inference. Pattern Recognit. ; 2020; 107 , 107461. [DOI: https://dx.doi.org/10.1016/j.patcog.2020.107461]

55. Rastegari, M.; Ordonez, V.; Redmon, J.; Farhadi, A. XNOR-Net: Imagenet classification using binary convolutional neural networks. Proceedings of the European Conference on Computer Vision ; Amsterdam, The Netherlands, 11–14 October 2016; pp. 525-542.

56. Zhao, Y.; Gao, X.; Bates, D.; Mullins, R.; Xu, C.Z. Focused quantization for sparse CNNs. Proceedings of the Neural Information Processing Systems ; Vancouver, BC, Canada, 8–14 December 2019; pp. 5584-5593.

57. Zhou, A.; Yao, A.; Guo, Y.; Xu, L.; Chen, Y. Incremental network quantization: Towards lossless CNNs with low-precision weights. arXiv ; 2017; arXiv: 1702.03044

58. Gong, Y.; Liu, L.; Yang, M.; Bourdev, L. Compressing deep convolutional networks using vector quantization. arXiv ; 2014; arXiv: 1412.6115

59. Vanhoucke, V.; Senior, A.; Mao, M.Z. Improving the speed of neural networks on CPUs. Proceedings of the NIPS Workshop on Deep Learning and Unsupervised Feature Learning ; Grenada, Spain, 16 December 2011.

60. Gupta, S.; Agrawal, A.; Gopalakrishnan, K.; Narayanan, P. Deep learning with limited numerical precision. Proceedings of the International Conference on Machine Learning ; Lille, France, 6–11 July 2015; pp. 1737-1746.

61. Han, S.; Mao, H.; Dally, W.J. Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. Proceedings of the International Conference on Learning Representations ; San Juan, Puerto Rico, 2–4 May 2016.

62. Ullrich, K.; Meeds, E.; Welling, M. Soft weight-sharing for neural network compression. Proceedings of the International Conference on Learning Representations ; Toulon, France, 24–26 April 2017.

63. Chen, W.; Wilson, J.; Tyree, S.; Weinberger, K.; Chen, Y. Compressing neural networks with the hashing trick. Proceedings of the International Conference on Machine Learning ; Lille, France, 6–11 July 2015; pp. 2285-2294.

64. Courbariaux, M.; Hubara, I.; Soudry, D.; El-Yaniv, R.; Bengio, Y. Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or −1. arXiv ; 2016; arXiv: 1602.02830

65. Courbariaux, M.; Bengio, Y.; David, J.P. Binaryconnect: Training deep neural networks with binary weights during propagations. Proceedings of the Advances in Neural Information Processing Systems ; Montreal, QC, Canada, 7–12 December 2015; pp. 3123-3131.

66. Hou, L.; Yao, Q.; Kwok, J.T. Loss-aware binarization of deep networks. Proceedings of the International Conference on Learning Representations ; Toulon, France, 24–26 April 2017.

67. Lin, Z.; Courbariaux, M.; Memisevic, R.; Bengio, Y. Neural networks with few multiplications. Proceedings of the International Conference on Learning Representations ; San Juan, Puerto Rico, 2–4 May 2016.

68. Sironi, A.; Tekin, B.; Rigamonti, R.; Lepetit, V.; Fua, P. Learning Separable Filters. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition ; Portland, OR, USA, 23–28 June 2013; pp. 2754-2761.

69. Jaderberg, M.; Vedaldi, A.; Zisserman, A. Speeding up Convolutional Neural Networks with Low Rank Expansions. Proceedings of the British Machine Vision Conference ; Nottingham, UK, 1–5 September 2014.

70. Sainath, T.; Kingsbury, B.; Sindhwani, V.; Arisoy, E.; Ramabhadran, B. Low-rank matrix factorization for Deep Neural Network training with high-dimensional output targets. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing ; Prague, Czech Republic, 22–27 May 2013; pp. 6655-6659.

71. Lu, Y.; Kumar, A.; Zhai, S.; Cheng, Y.; Javidi, T.; Feris, R. Fully-Adaptive Feature Sharing in Multi-Task Networks with Applications in Person Attribute Classification. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition ; Honolulu, HI, USA, 21–26 July 2017; pp. 1131-1140.

72. Lebedev, V.; Ganin, Y.; Rakhuba, M.; Oseledets, I.; Lempitsky, V. Speeding-up Convolutional Neural Networks Using Fine-tuned CP-Decomposition. Proceedings of the International Conference on Learning Representations ; San Diego, CA, USA, 7–9 May 2015.

73. Tai, C.; Xiao, T.; Wang, X.; Weinan, E. Convolutional neural networks with low-rank regularization. Proceedings of the International Conference on Learning Representations ; San Juan, Puerto Rico, 2–4 May 2016.

74. Cheng, Y.; Wang, D.; Zhou, P.; Zhang, T. Model compression and acceleration for deep neural networks: The principles, progress, and challenges. IEEE Signal Process. Mag. ; 2018; 35 , pp. 126-136. [DOI: https://dx.doi.org/10.1109/MSP.2017.2765695]

75. Zeng, D.; Zhao, F.; Shen, W.; Ge, S. Compressing and accelerating neural network for facial point localization. Cogn. Comput. ; 2018; 10 , pp. 359-367. [DOI: https://dx.doi.org/10.1007/s12559-017-9506-0]

76. Ge, S. Efficient deep learning in network compression and acceleration. Digital Systems ; IntechOpen: London, UK, 2018.


© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Deep networks often possess a vast number of parameters, and their significant redundancy in parameterization has become a widely-recognized property. This presents significant challenges and restricts many deep learning applications, making the focus on reducing the complexity of models while maintaining their powerful performance. In this paper, we present an overview of popular methods and review recent works on compressing and accelerating deep neural networks. We consider not only pruning methods but also quantization methods, and low-rank factorization methods. This review also intends to clarify these major concepts, and highlights their characteristics, advantages, and shortcomings.

VIAFID ORCID Logo

Suggested sources

  • About ProQuest
  • Terms of Use
  • Privacy Policy
  • Cookie Policy

Grab your spot at the free arXiv Accessibility Forum

Help | Advanced Search

Computer Science > Machine Learning

Title: a survey on deep neural network compression: challenges, overview, and solutions.

Abstract: Deep Neural Network (DNN) has gained unprecedented performance due to its automated feature extraction capability. This high order performance leads to significant incorporation of DNN models in different Internet of Things (IoT) applications in the past decade. However, the colossal requirement of computation, energy, and storage of DNN models make their deployment prohibitive on resource constraint IoT devices. Therefore, several compression techniques were proposed in recent years for reducing the storage and computation requirements of the DNN model. These techniques on DNN compression have utilized a different perspective for compressing DNN with minimal accuracy compromise. It encourages us to make a comprehensive overview of the DNN compression techniques. In this paper, we present a comprehensive review of existing literature on compressing DNN model that reduces both storage and computation requirements. We divide the existing approaches into five broad categories, i.e., network pruning, sparse representation, bits precision, knowledge distillation, and miscellaneous, based upon the mechanism incorporated for compressing the DNN model. The paper also discussed the challenges associated with each category of DNN compression techniques. Finally, we provide a quick summary of existing work under each category with the future direction in DNN compression.
Comments: 19 pages, 9 figures
Subjects: Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)


A comprehensive survey of deep learning-based lightweight object detection models for edge devices

  • Open access
  • Published: 10 August 2024
  • Volume 57, article number 242 (2024)


  • Payal Mittal 1  


This study concentrates on deep learning-based lightweight object detection models for edge devices. Designing such lightweight object recognition models is more difficult than ever due to the growing demand for accurate, fast, low-latency models on various edge devices. The most recent deep learning-based lightweight object detection methods are comprehensively described in this work, and the lightweight backbone architectures used by these object detectors are listed. The training and inference processes for deep learning applications on edge devices are discussed. To raise readers' awareness of this developing domain, a variety of applications for deep learning-based lightweight object detectors and related utilities are presented, and designing potent, lightweight object detectors based on deep learning is suggested as a response to these challenges. On well-known datasets such as MS-COCO and PASCAL-VOC, we thoroughly examine the performance of representative deep learning-based lightweight object detectors.


1 Introduction

The advancement of effective deep learning-based object detectors has been influenced by Internet of Things (IoT)-based technologies. Although many object detectors attain outstanding accuracy and perform inference in real time, most deep object detection models demand too much Central Processing Unit (CPU) power and cannot be used on edge devices (Wang et al. 2021a, 2021b, 2021c, 2022). Exciting outcomes have already been achieved using a variety of strategies. Strategies for deploying deep learning-based applications on edge devices include (Wang et al. 2020a, 2020b, 2020c, 2021a, 2021b, 2021c; Véstias et al. 2020; Li and Ye 2023; Subedi et al. 2021):

Using a partitioning technique, since various layers may execute at different times. For example, in a fully connected or convolutional network, divide the processing graph into offloadable tasks so that the execution time of each composite task unit is roughly the same.

Large-scale analytics platforms require intermediate resource standardisation for data manageability and low latency, as opposed to standalone applications on mobile devices. With intermediate resources provisioned, a deep learning-based analytics platform can determine the proportion of local processing, provided there is a mechanism to divide the load between buffering and memory loading. Offloaded execution through efficient partitioning can reduce cost, latency, or any other objective of interest.
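The partitioning idea above can be sketched as a greedy split of per-layer execution times into composite tasks of roughly equal cost. This is an illustrative sketch, not a method from the surveyed works; the layer timings and the function name are hypothetical.

```python
def partition_layers(layer_times, num_tasks):
    """Greedily group consecutive layers into `num_tasks` offloadable
    composite tasks whose total execution times are roughly balanced."""
    total = sum(layer_times)
    # cumulative-time thresholds at which to close each task
    boundaries = [total * k / num_tasks for k in range(1, num_tasks)]
    tasks, current, cum, b = [], [], 0.0, 0
    for i, t in enumerate(layer_times):
        current.append(i)
        cum += t
        if b < len(boundaries) and cum >= boundaries[b]:
            tasks.append(current)   # this composite task reached its share
            current = []
            b += 1
    if current:
        tasks.append(current)
    return tasks

# Hypothetical per-layer execution times (ms) for a small CNN
times = [4.0, 3.5, 6.0, 2.0, 5.5, 3.0]
print(partition_layers(times, 3))  # [[0, 1, 2], [3, 4], [5]]
```

Each returned group is a list of layer indices forming one offloadable unit; a scheduler could then place units on the edge device or the cloud.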

Moreover, a detailed study of these strategies is provided in Sect. 4.6 of the manuscript. In recent years, a new field of study, lightweight object detection, has emerged with the goal of developing compact, effective networks for IoT deployments that frequently take place in low-computing or resource-constrained settings. The research community has long worked to identify the most accurate detection models through advanced architectural searches, as designing a deep learning-based lightweight network architecture is a difficult procedure. When using these models on edge devices, such as high-performance embedded processors, the question arises of how to run high-end innovative applications with fewer resources. It is still not entirely possible to perform detection on a smartphone or edge device: models available today are capable of the task, but their precision is insufficient and undesirable in real-time settings.

Edge computing, according to Gartner, is a component of a distributed computing architecture where data processing resides near the edge, where devices or individuals generate or consume that data (Hua et al. 2023). Because of the constant growth in data created by the IoT, edge computing was first deployed to reduce bandwidth costs for data travelling long distances. The emergence of real-time applications that require processing at the edge, however, is driving the current technological advancements. Among many other benefits, data minimization at the network level can prevent bottlenecks and significantly reduce energy, bandwidth, and storage expenses. While a single device can send data across a network without difficulty, problems occur when hundreds of devices send data at once: quality drops due to delay, bandwidth expenses rise, and bottlenecks form that can result in cost spikes. By acting as a local source of data processing and storage, edge computing services and offerings help fix this problem. An edge gateway likewise minimizes bandwidth requirements by processing data from an edge device and sending only the pertinent data back through the cloud (Jin et al. 2021). Edge devices are a key element in modern integrated real-world Artificial Intelligence (AI) systems. Initially, IoT devices could only gather data and send it to the cloud for processing. By putting services closer to the network's edge, edge computing extends the possibilities of cloud computing and enables a wider range of AI services and machine learning applications. IoT computing devices, mobile devices, embedded computers, smart TVs, and other connected gadgets can all serve as edge devices. Real-time application development and deployment can be accelerated by edge computing devices through high-speed networking technologies such as 5G.
Robotics, image and video processing, intelligent video analytics, self-driving cars, medical imaging, machine vision, and industrial inspection are examples of such applications (Véstias et al. 2020).

Edge computing can be applied to devices that are directly connected to sensors, to routers or gateways that transfer data, or to small servers installed locally in a closet. There is an increasing number of edge computing use cases, as well as smart devices capable of performing various activities at the edge. The range of applications for edge computing is expanding in tandem with the development of AI capabilities, and applications spanning a wide range utilise edge computing (Xu et al. 2020). Additionally, there is a good deal of overlap among the various use cases. In particular, edge computing functionality in traffic management systems is closely related to that of autonomous vehicles, as briefly discussed below:

Industrial infrastructure

Predictive maintenance and failure detection in industry are supported by edge computing. When a machine or component is about to break down, the capability kicks in, enabling factory workers to fix the issue or replace the part in advance and save money by preventing lost output. An edge computing architecture can handle large amounts of data from sensors and programmable logic controllers, and can facilitate effective communications across extremely complex supervisory control and data acquisition systems.

Retail

Huge amounts of data are produced by retail applications from point-of-sale systems, item stocking procedures, and other company operations. Edge computing can assist in analysing this vast quantity of data and locating problems that require quick resolution. Additionally, edge computing provides a way to handle consumer data locally, preventing it from leaving the customer's premises, a privacy concern that is becoming more pressing.

Healthcare

In order to give medical practitioners precise, timely information about a patient's status, the healthcare and medical industries gather patient data from sensors, monitors, wearable technology, and other devices. Edge computing solutions can feed dashboards with such data so users can see all the key indicators in one convenient place. AI-enabled edge computing solutions can recognise anomalous data, allowing medical personnel to respond to patient needs quickly and with minimal false alarms. Furthermore, edge computing devices can help address patient confidentiality and data privacy concerns by processing data locally.

Global energy

Cities and smart grid systems can monitor public buildings and facilities for improved energy efficiency in areas like lighting, heating, and clean energy use by deploying edge computing devices. For example, intelligent lighting controls use edge computing devices to regulate individual lights for optimal efficiency and public-space safety; embedded edge computing devices in solar fields detect changes in the weather and adjust panel positions; and wind farms use edge computing to send sensor data to substations and link to cell towers.

Public transit systems

Edge computing systems deployed in buses, passenger rail systems, and paratransit vehicles can collect and transmit only the data necessary to support in-vehicle activities and dispatcher insights in public transportation applications.

Travel transport utilities

In order to increase convenience and safety, edge computing can control when traffic signals turn on and off, open and close additional lanes of traffic, make sure that communications are maintained in the event of a public emergency, and do other real-time tasks. The adoption of autonomous vehicles will be significantly influenced by sophisticated traffic management systems, as was previously indicated.

Advanced industries

In advanced industries, vehicle sensors and cameras can feed data to edge computing devices, which make choices in milliseconds without latency. This fast decision making is necessary in autonomous vehicles for safety reasons. Self-parking apps and lane-departure warning are two examples of edge computing services that are already readily accessible. Furthermore, as more cars communicate with their surroundings, a fast and responsive network will be required. Electric vehicles require constant monitoring to support predictive maintenance, and edge computing can be used to manage the resulting data: it supports data aggregation and reports actionable data for maintenance and performance. A multitude of industries is investing in the applicability of edge devices, including travel, transport and logistics; cross-vertical; retail; public-sector utilities; global energy and materials; banking and insurance; and infrastructure and agriculture. Their shares of deployment across edge computing devices are shown in Fig. 1a (Chabas et al. 2018): travel, transport and logistics holds the maximum share at 24.2%, followed by 13.1% for global energy markets, 10.1% each for retail and advanced industries, and smaller shares for the other industries. We also compare minimum and maximum hardware costs of edge computing devices for these industries. The hardware value includes opportunity across the tech stack in sensors, on-device firmware, storage, and processors. By 2025, edge computing devices represent $175 to $215 billion of potential hardware value: approximately $35 to $43 billion in travel, transport and logistics; an estimated $32 to $40 billion cross-vertical; $20 to $28 billion in the retail sector; $16 to $24 billion in public-sector utilities; $9 to $17 billion in global energy and materials; and $4 to $11 billion in infrastructure and agriculture, as depicted in Fig. 1b (Chabas et al. 2018). There is a dire need to focus on advancing the development of lightweight object detection models to boost their employability on heterogeneous edge devices. This survey analyses state-of-the-art deep learning-based lightweight object detection models in order to attain excellent performance on edge devices. At equivalent accuracy, powerful lightweight object detection models offer these advantages (Kim et al. 2023; Huang et al. 2022):

Lightweight deep learning-based object detection models require less communication between edge devices during distributed training.

Less bandwidth will be needed to export a cutting-edge detection model from the cloud to a particular application.

Deploying lightweight detectors on Field Programmable Gate Arrays (FPGAs) and other hardware with limited resources is more practical.

figure 1

a Share representation of various industries embedded in edge computing devices. b Comparison of hardware costs in case of edge computing devices

1.1 Motivation

Object detection is the core concept in deploying innovative edge-device applications such as face detection (Li et al. 2015), object tracking (Nguyen et al. 2019), video surveillance (Yang and Rothkrantz 2011), and pedestrian detection (Brunetti et al. 2018). The powerful capabilities of deep learning boost detection performance in these applications, but generic deep learning-based object detection models carry computational burdens such as extensive use of platform resources, high bandwidth, and large data processing pipelines (Jiao et al. 2019; Zhao et al. 2019). Moreover, a detection network can use three orders of magnitude more Floating Point Operations (FLOPs) than a classification network, making its deployment on an edge device much more difficult (Ren et al. 2018). Generic deep object detectors often use many network layers, which requires extensive parameter tuning; deeper networks also lose position and feature information over successive layers, making it harder to detect small targets. Finally, overly large parameter counts can damage a model's effectiveness and make it challenging to deploy on smart mobile terminals.

For the development of lightweight object detection on edge devices, a comprehensive assessment of the related research directions is necessary, particularly for researchers interested in pursuing this line of inquiry. Assessing the usefulness of deep learning-based lightweight object detection on edge devices requires more than a basic review of the literature, and the proposed survey offers such a comprehensive examination. No recent survey in the literature evaluates deep learning-based lightweight detection: there are generic and application-specific surveys dedicated to deep learning-based object detectors (Jiao et al. 2019; Zou et al. 2023; Liu et al. 2020a, 2020b, 2020c, 2020d; Mittal et al. 2020; Han et al. 2018; Zhou et al. 2021a, 2021b), but no consolidated study specifically on lightweight detectors for edge devices, as summarized in Table 1. To raise readers' understanding of this developing subject, deep learning-based lightweight object detectors on edge devices are investigated in this work, advancing the study of lightweight detection models with regard to various backbone architectures and diverse applications on edge devices. The key objectives of the survey are as follows:

To provide a taxonomy of deep learning-based lightweight object detection algorithms on edge devices

To provide an analysis of deep learning-based lightweight backbone architectures for edge devices

To report literature findings on applications deployed through lightweight object detectors

To compare lightweight object detectors by analyzing results on leading detection datasets

The research paper is organized as follows: Sect. 2 elaborates the work related to the development of deep learning-based object detectors, which are further categorized into two-stage, one-stage, and advanced-stage detectors. Section 3 describes the materials and methods required for deep learning-based lightweight detection models on edge devices, including the architectural details of training and inference as well as the crucial properties and performance milestones of lightweight object detection methods. Section 4 discusses backbone architectures commonly utilized in deep learning-based lightweight object detection models, their applications, and recommendations for designing powerful lightweight models. The final section concludes the study and outlines some crucial implications for further research.

2 Background

Recent developments in deep learning-based object detectors have mostly concentrated on raising state-of-the-art accuracy on benchmark datasets, which has caused an explosion in model size and parameters. Researchers have, on the other hand, shown interest in lighter, smaller, and smarter networks that minimise parameters while keeping cutting-edge performance (Nguyen et al. 2023). The next section provides a brief summary of the categorization of generic object detection models.

2.1 Taxonomy of deep learning-based object detectors

In recent years, there has been a rapid and successful expansion of the lightweight object detection research domain, which has grown by adopting the latest machine and deep learning methods and developing new representations. Generic deep learning-based object detection models are classified into two-stage, one-stage, and advanced-stage models, each based on different concepts.

2.1.1 Two-stage object detection models

Two-stage algorithms have two distinct stages: region proposal and detection head. The first stage calculates RoI proposals using anchors in external region-proposal techniques such as Edge Boxes (Zitnick and Dollár 2014) or Selective Search (Uijlings et al. 2013). The second stage processes the extracted RoIs into final bounding boxes, coordinate values, and class labels. Examples of two-stage algorithms include Faster RCNN (Ren et al. 2015), Cascade RCNN (Cai and Vasconcelos 2018), and R-FCN (Dai et al. 2016). The advantages of two-stage object detectors include better analysis of objects through the given stages, a multi-stage architecture that regresses bounding-box values efficiently, and better handling of class imbalance in datasets. Later two-stage detectors adopted a deep neural Region Proposal Network (RPN) together with a detection head. Even though Light-Head R-CNN (Li et al. 2017) used a lightweight detection head, the backbone and detection part become imbalanced when such a head is combined with a small backbone; this mismatch increases the danger of overfitting and causes redundant computation.
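The two-stage flow described above can be sketched as follows. This is a minimal illustrative sketch, not any specific library's API: `rpn` and `head` are hypothetical stand-in callables for the region-proposal stage and the detection head.

```python
def two_stage_detect(image, rpn, head, top_n=300):
    """Two-stage detection flow: stage 1 proposes scored RoIs,
    stage 2 assigns class labels and refines the boxes."""
    proposals = rpn(image)                               # stage 1: candidate RoIs
    proposals = sorted(proposals, key=lambda p: p["score"], reverse=True)[:top_n]
    return [head(image, p) for p in proposals]           # stage 2: label + refine

# Dummy stand-ins, purely to illustrate the data flow
dummy_rpn = lambda img: [{"box": (10, 10, 50, 50), "score": 0.9},
                         {"box": (5, 5, 20, 20), "score": 0.4}]
dummy_head = lambda img, p: {"box": p["box"], "label": "object", "score": p["score"]}
print(two_stage_detect(None, dummy_rpn, dummy_head, top_n=1))
```

The `top_n` cut on ranked proposals is what keeps the second stage tractable; the imbalance noted for Light-Head R-CNN arises when this second stage dominates a small backbone.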

2.1.2 One-stage object detection models

Two-stage detectors helped deep learning-based object detection get off to a good start, but these systems struggled with speed. Due to their flexibility in satisfying demands such as fast speed and minimal memory needs, one-stage detectors were ultimately adopted by researchers. One-stage algorithms eliminated the region-proposal stage of two-stage detectors by treating object detection as a regression problem: instead of sending portions of the image to a fixed grid-based CNN, the entire image is processed at once, anchors assist in identifying specific region proposals, and bounding-box coordinates are regressed for each detected area of the picture. Examples of one-stage detectors include YOLO (Redmon et al. 2016), SSD (Liu et al. 2016), and RetinaNet (Lin et al. 2017a, 2017b). The YOLO series outperforms two-stage models in terms of efficiency and accuracy.
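To make the regression view concrete, the sketch below decodes one grid cell's raw outputs into an absolute box using the YOLO-style parameterization (center offset via a sigmoid, size as an exponential scaling of an anchor). The anchor sizes and stride are illustrative values, not taken from any specific detector configuration.

```python
import math

def decode_cell(tx, ty, tw, th, cell_x, cell_y, anchor_w, anchor_h, stride):
    """Decode one grid cell's raw regression outputs (tx, ty, tw, th)
    into an absolute (x1, y1, x2, y2) box, YOLO-style."""
    sigmoid = lambda v: 1.0 / (1.0 + math.exp(-v))
    cx = (cell_x + sigmoid(tx)) * stride      # box center in image coordinates
    cy = (cell_y + sigmoid(ty)) * stride
    w = anchor_w * math.exp(tw)               # box size scaled from the anchor
    h = anchor_h * math.exp(th)
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

# Raw outputs of 0 place the box center mid-cell at exactly the anchor's size
box = decode_cell(0.0, 0.0, 0.0, 0.0, cell_x=3, cell_y=4,
                  anchor_w=32, anchor_h=32, stride=16)
print(box)  # (40.0, 56.0, 72.0, 88.0)
```

Every cell of every feature map runs this decode for each of its anchors, which is why one-stage detectors emit all boxes in a single forward pass.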

2.1.3 Advanced-stage object detection models

Recently emerged advanced-stage object detectors removed the anchor concept used by one-stage detectors. CornerNet (Law and Deng 2018) detected objects as paired keypoints and introduced a new corner pooling layer to better localize corners. CenterNet (Duan et al. 2019) detected each object as a triplet, rather than a pair, of keypoints. Foveabox (Kong et al. 2020a, 2020b) predicted category-sensitive semantic maps and a category-agnostic bounding box for each object. Advanced-stage detectors still struggle to locate multiple small targets against complex backgrounds and can suffer from slow detection speed. Among one-stage methods, some (Bochkovskiy et al. 2020; Qin et al. 2019) utilized predefined anchor boxes while others adopted anchor-free concepts (Duan et al. 2019) for predicting bounding boxes.

2.1.4 Light-weight object detection models

Lightweight object detectors are those with low computation in terms of bandwidth and resource utilization; examples include ThunderNet (Qin et al. 2019), PP-YOLO (Long et al. 2020a, 2020b), YOLObile (Cai et al. 2021), Trident-YOLO (Wang et al. 2022a, 2022b, 2022c, 2022d), YOLOv4-tiny (Jiang et al. 2020), and Trident FPN (Picron and Tuytelaars 2021).

The deep learning-based object detection algorithms categorized into two-stage, one-stage, advanced-stage, and lightweight detectors are highlighted in Fig. 2. Algorithms such as Faster RCNN (Ren et al. 2015), Mask RCNN (He et al. 2017), Cascade RCNN (Cai and Vasconcelos 2018), FPN (Lin et al. 2017a, 2017b), and R-FCN (Dai et al. 2016) fall under two-stage detectors, whereas YOLO (Redmon and Farhadi 2018), SSD (Liu et al. 2016), RefineDet (Zhang et al. 2018a, 2018b), and RetinaNet (Lin et al. 2017a, 2017b) are one-stage detectors. Advanced-stage detectors such as CornerNet (Law and Deng 2018), Objects as Points (Zhou et al. 2019a), and Foveabox (Kong et al. 2020a, 2020b) are also listed in Fig. 2. However, the algorithms listed above often include a large number of channels and convolutional layers, which demand substantial computing power and hinder deployment on edge devices. The deep learning-based lightweight object detectors presented in Fig. 2 are specifically designed for contexts with limited resources. Due to their efficiency and compactness, the one-stage and advanced-stage pipelines are the industry standard for designing lightweight object detectors.

figure 2

Taxonomy of recent deep learning-based object detection algorithms

3 Deep learning-based lightweight object detection models for edge devices

Numerous computer vision tasks, such as autonomous driving, robot vision, intelligent transportation, industrial quality inspection, object tracking, etc., have used deep learning-based object detection to a large extent. Deep models typically improve performance, but the deployment of real-world applications onto edge devices is constrained by their resource-intensive network. Lightweight mobile object detectors have drawn growing research interest as a solution to this issue, with the goal of creating extremely effective object detection. Deep learning-based lightweight object detectors have recently been developed for situations with limited computer resources, such as mobile devices.

The necessity to execute backbone designs on edge devices with constrained memory and processing power stimulates research and development of deep learning-based lightweight object detection models. A number of efficient lightweight backbone architectures have been proposed in recent years, for example MobileNet (Howard et al. 2017), ShuffleNet (Zhang et al. 2018a, 2018b), and DetNaS (Chen et al. 2019). However, all of these architectures depend heavily on depth-wise separable convolution-based methodologies (Ding et al. 2019). In the following sections we describe the methodology of deep learning-based lightweight object detection models and each of their components in depth, beginning with an architectural breakdown. These lightweight detection models are heavily influenced by existing simple and complex object detection models.

3.1 Architecture methodology of lightweight object detection models

The building blocks of deep learning-based lightweight object detection algorithms on edge devices consist of four components: input, backbone, neck, and detector head. The definition and details of each component are tabulated in Table 2. The input to a lightweight object detector, an image, patch, or pyramid, is initially fed into a lightweight backbone architecture such as CSPDarkNet (Redmon and Farhadi 2018), ShuffleNet (Zhang et al. 2018a, 2018b), MobileNet (Qian et al. 2021), or PeleeNet (Wang et al. 2018) for the calculation of feature maps. The backbone is the part of the architecture that converts an image to feature maps, whereas the neck transforms the feature maps by connecting the backbone to the detector head. The backbone may be a pre-trained network or a neural network built from scratch with the aim of feature extraction; the neck then transforms its feature maps into the feature vectors required by the detection challenges of the particular application. The lightweight detector head can be visualized as a deep neural network focusing on the extraction of RoIs; a pooling layer then fixes the size of the calculated RoIs to produce the final features of the detected objects. These final features are passed to classification and regression loss functions to assign class labels and regress the coordinate values of the bounding boxes, and the process is repeated until the final regressed bounding boxes with the required class labels are obtained. As presented in Fig. 3, deep learning-based lightweight object detection consists of three parts, i.e., backbone architecture, neck components, and lightweight head prediction. Input images are fed to the backbone, which converts them into feature maps; for lightweight models the backbone architecture should be selected from the categories given in Table 2. A fundamental convolutional module of Conv2D + batch normalization + ReLU activation makes up the backbone; by eliminating redundant gradient information from the CNN's optimization process and integrating gradient changes into the feature map, it lowers input parameters and model size (Wang et al. 2020a, 2020b, 2020c). In the bottleneck cross-stage-partial darknet model, for instance, a 640 × 640-pixel image is sliced into four 320 × 320-pixel images, which are combined and passed through 32 convolutional kernels to form a 320 × 320 × 32 feature map. Additionally, an SPP module adds features of various scales and increases the network's receptive field. The neck alters the feature maps by enhancing the information flow between the backbone architecture and the detection head. The PANet neck is built on an FPN topology, providing strong semantic features from top to bottom (Wang et al. 2019), while the bottom-up FPN layers convey important positional features.

figure 3

Methodology of deep learning-based lightweight object detection model
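The slicing step in the backbone described above (a 640 × 640 image divided into four 320 × 320 interleaved copies and recombined along the channel axis) can be sketched with NumPy as a space-to-depth operation; the function name is ours, and the final 32-channel projection is only noted in a comment:

```python
import numpy as np

def focus_slice(img):
    """Space-to-depth slicing as used in CSPDarknet-style backbones:
    split the image into four interleaved half-resolution copies and
    stack them along the channel axis."""
    return np.concatenate(
        [img[0::2, 0::2], img[1::2, 0::2], img[0::2, 1::2], img[1::2, 1::2]],
        axis=-1,
    )

img = np.zeros((640, 640, 3), dtype=np.float32)   # H x W x C input
sliced = focus_slice(img)
print(sliced.shape)  # (320, 320, 12)
# A convolution with 32 kernels over these 12 channels would then
# produce the 320 x 320 x 32 feature map described in the text.
```

No pixel is discarded: resolution is traded for channel depth, which is what lets the subsequent convolution see a larger effective receptive field cheaply.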

Furthermore, PANet encourages the transmission of low-level characteristics and the use of precise localization signals in the bottom layers, which improves the position accuracy of target objects. The prediction layer, sometimes referred to as the detection layer, creates multiple feature maps in order to accomplish multiscale prediction, allowing the model to classify and detect objects of various sizes. Each feature map yields regression bounding boxes at each position, and the predicted output of the model, with bounding boxes, is then rendered as the detection result. The three steps mentioned above combine into the training model of the lightweight object detector. After model training, test data is passed through the fine-tuned lightweight model, as shown in Fig. 3. Parameters in the context of deep learning-based lightweight models are discussed below:

To train an edge-cloud-based deep learning model, edge devices and cloud servers must share model parameters and other data, and more data must be transferred between them as the training model gets bigger. A number of methods have been proposed to lower the cost of communication during training, including Edge Stochastic Gradient Descent (eSGD), which can reduce a CNN model's gradient size by up to 90% by communicating only the most important gradients, and intermediate edge aggregation prior to federated-learning server aggregation. Two further components of training deep learning-based lightweight detection models are early exit, in which input data need not complete a full forward pass through every layer of a neural network distributed over heterogeneous nodes, and binarized neural networks, which reduce the memory and compute load on resource-constrained end devices (Koubaa et al. 2021; Dey and Mukherjee 2018).
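The "communicate only the most important gradients" idea behind eSGD can be sketched as top-k magnitude selection. This is an illustrative sketch of the general sparsification mechanism, not the published eSGD algorithm itself (which also handles residual accumulation and other details):

```python
import numpy as np

def sparsify_gradient(grad, keep_fraction=0.1):
    """Keep only the largest-magnitude `keep_fraction` of gradient entries
    (~10% here, i.e. up to a 90% reduction in communicated gradient size)."""
    flat = grad.ravel()
    k = max(1, int(len(flat) * keep_fraction))
    idx = np.argpartition(np.abs(flat), -k)[-k:]   # indices of top-k magnitudes
    return idx, flat[idx]                          # send only (index, value) pairs

rng = np.random.default_rng(0)
grad = rng.normal(size=1000)
idx, values = sparsify_gradient(grad, keep_fraction=0.1)
print(len(values))  # 100 of 1000 entries communicated
```

The server would scatter the received (index, value) pairs back into a zero tensor before applying the update; entries not sent are treated as zero for that round.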

Researchers have created a novel architecture known as Agile Condor that carries out real-time computer vision tasks using machine learning methods; at the network edge, close to the data sources, Agile Condor can be utilised for autonomous target detection (Isereau et al. 2017). Precog is a method that lowers latency for mobile applications by prefetching and caching: it anticipates the subsequent classification request and uses end-device caching to store essential portions of a trained classifier. As a result, fewer offloads to the cloud occur, and edge servers calculate the likelihood that connected end devices will make a given request in the future. These prefetched modules function as smaller models that minimise network traffic and cloud processing while accelerating inference on the end devices (Drolia et al. 2017). Another example is ECHO, a feature-rich, thoroughly tested framework for implementing data analytics in a distributed hybrid Edge-Fog-Cloud configuration. ECHO offers services such as virtualized application status monitoring, resource discovery, deployment, and interfaces to data analytics runtime engines (Ogden and Guo 2019).

When feasible, distributed deep network designs enable deployment on edge-cloud infrastructure to support local inference on edge devices. A distributed neural network model’s ability to function effectively depends on minimising inter-device communication costs. Inference on the end-edge-cloud architecture is a dynamic problem because of evolving network conditions (Subedi et al. 2021). Static approaches, such as remote-only or on-device-only inference, are likewise suboptimal. Ogden and Guo have created a distributed architecture that provides a flexible answer to this problem for mobile deep inference. A centralised model manager houses many deep learning models, and the inference environment (memory, bandwidth, and power) is used to dynamically determine which model should run on which device. If resources are scarce in the inference environment, one of the compressed models may be employed; otherwise, an uncompressed model with higher accuracy is used. Edge servers handle remote inference when networks are sluggish.
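The dynamic model-selection logic described above can be sketched as a toy policy. The thresholds, model names, and the `InferenceEnv` fields below are illustrative assumptions, not Ogden and Guo's actual interface.

```python
from dataclasses import dataclass

@dataclass
class InferenceEnv:
    free_memory_mb: float
    bandwidth_mbps: float
    battery_pct: float

def select_model(env: InferenceEnv) -> str:
    """Pick an execution target from the observed inference environment,
    in the spirit of a centralised model manager."""
    if env.bandwidth_mbps < 1.0:
        # Network too sluggish for remote inference; run locally,
        # compressed if memory is scarce.
        return "compressed_on_device" if env.free_memory_mb < 512 else "full_on_device"
    if env.free_memory_mb < 512 or env.battery_pct < 20:
        return "remote_uncompressed"   # offload to an edge server
    return "full_on_device"

print(select_model(InferenceEnv(free_memory_mb=256, bandwidth_mbps=0.5, battery_pct=80)))
```

A resource-starved device on a slow link thus falls back to a compressed local model, while a well-provisioned one keeps the higher-accuracy uncompressed model.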

Privacy and security

Edge devices can be used to filter personally identifiable information prior to data transfer in order to enhance user privacy and security when processing data remotely (Xu et al. 2020 ; Hu et al. 2023a , 2023b ). Since data generated by end devices is not available to a central location, training deep learning models across several edge devices in a distributed way leads to more privacy. Personally identifiable information in photographs and videos can be removed at the edge before being uploaded to an external server, enhancing user privacy. The privacy of critical training data becomes an issue when training is conducted remotely. When applying local and global privacy techniques, it is imperative to watch for any decline in accuracy, keep computing overheads low, and provide resilience to communication errors and delays (Abou et al. 2023 ; Makkar et al. 2021 ).

3.2 Comprehensive analysis of lightweight object detection models

Small, portable object detectors that deliver highly effective detection results have garnered increasing scientific attention. With the use of efficient components and compression techniques such as pruning, quantization, and hashing, among others, the effectiveness of deep learning lightweight object detection models has grown. Distillation, in which a large network is used to train smaller models, has produced some surprising results as well. A comprehensive list containing multiple details of deep learning-based lightweight object detection models in recent years is presented in Tables 3 , 4 . Lightweight object detectors can be categorized as anchor-based or anchor-free. Anchor-based methods are the mechanism for extracting RoIs employed in object detection models such as Fast R-CNN (Girshick 2015 ). The anchor boxes come at various scales, can be viewed as RoIs, and serve as priors when regressing bounding box coordinates. Detectors including YOLOv2 (Redmon and Farhadi 2017 ), YOLOv3 (Redmon and Farhadi 2018 ), YOLOv4 (Bochkovskiy et al. 2020 ), RetinaNet (Lin et al. 2017a , 2017b ), RefineDet (Zhang et al. 2018a , 2018b ), EfficientDet (Tan et al. 2020 ), Faster R-CNN (Ren et al. 2015 ), Cascade R-CNN (Cai and Vasconcelos 2018 ), and Trident-Net (Li et al. 2019 ), belonging to the one- and two-stage families, use an anchor mechanism to elevate the performance of deep learning-based object detection. Anchor-free detectors, by contrast, have recently received more attention in academia and research, with a large number of new anchor-free methods being proposed. Earlier works such as YOLOv1 (Redmon et al. 2016 ), DenseBox (Huang et al. 2015 ) and UnitBox (Yu et al. 2016 ) can be considered early anchor-free detectors. Anchor-free methods perform detection using either anchor points or key points.
The former approach performs object bounding box regression based on anchor points instead of anchor boxes, as in FCOS (Detector 2022 ) and FoveaBox (Kong et al. 2020a , 2020b ), whereas the latter reformulates object detection as a keypoint localization problem, as in CornerNet (Law and Deng 2018 ; Law et al. 2019 ), CenterNet (Duan et al. 2019 ), ExtremeNet (Zhou et al. 2019b ) and RepPoint (Yang et al. 2019 ). By eliminating the restrictions of handcrafted anchors, anchor-free techniques show great promise for handling extremely large and small objects. The anchor-based detectors shown in Table  3 can compete with some newly proposed anchor-free lightweight object detectors in terms of performance. Further, input image type, code link and published sources are also mentioned in Table  3 , while Table  4 reports crucial details such as AP, description, and loss function for each deep learning-based lightweight detector.

Tiny-DSOD (Li et al. 2018 ), a lightweight object detector inspired by the deeply supervised object detection framework, has been proposed for resource-constrained applications. With only 0.95 M parameters and 1.06B FLOPs, it uses a depth-wise dense block as backbone architecture and a depth-wise FPN in the neck, by far the most advanced result with such a small resource demand. ThunderNet (Qin et al. 2019 ), a lightweight two-stage detector, uses a context enhancement module and a spatial attention module as backbone architectural blocks to produce more discriminative feature representations, with an efficient RPN in a compact detection head. ThunderNet outperforms earlier lightweight one-stage detectors, operating at 24.1 frames per second with 19.2 AP on COCO on an ARM-based smartphone. One of the most recent, cutting-edge lightweight object detection algorithms, PP-YOLO (Long et al. 2020a , 2020b ) employs MobileNetV3 (Qian et al. 2021 ), a practical backbone architecture for edge devices. The depth-wise separable convolutions used by PPYOLOtiny’s detection head make it better suited for mobile devices; PPYOLOtiny adopts the optimisation techniques of the PP-YOLO algorithms but drops techniques that have a large impact on model size and performance. Block-punched pruning and a mobile acceleration unit with a mobile GPU-CPU collaboration approach are provided by YOLObile (Cai et al. 2021 ). Trident-YOLO (Wang et al. 2022a , 2022b , 2022c , 2022d ) is an upgrade to YOLOv4-tiny (Jiang et al. 2020 ), designed for mobile devices with limited computing power. In the neck, Trident FPN (Picron and Tuytelaars 2021 ) improves the recall and accuracy of basic object recognition methods by reorganising the network topology. Trident-YOLO proposes fewer cross-stage partial RFBs and smaller cross-stage partial SPPs, enlarging the receptive field of the network with the fewest FLOPs.
Conversely, Trident-FPN significantly enhances lightweight object detection performance, increasing computational complexity by only a limited number of FLOPs while producing a multi-scale feature map. To simplify computation, YOLOv4-tiny (Jiang et al. 2020 ) uses two ResBlock-D modules in place of two CSPBlock modules of the ResNet-D network. To extract more feature information about the object, such as global features, channel, and spatial attention, it also creates an auxiliary residual network block with consecutive 3 × 3 convolutions that is utilized to obtain 5 × 5 receptive fields, with the goal of reducing detection error. Optimizing the original YOLOv4 (Bochkovskiy et al. 2020 ), Slim YOLOv4 (Ding et al. 2022 ) changes the backbone architecture from CSPDarknet53 to MobileNetv2 (Sandler et al. 2018 ). Separable convolutions and depth-wise over-parameterized convolutional layers were chosen to minimize computation and enhance the performance of the detection network. Based on YOLOv2 (Redmon and Farhadi 2017 ; Wang et al. 2022a ), YOLO-LITE (Huang et al. 2018 ; Wang et al. 2021a ) offers a quicker, more effective lightweight variant for mobile devices. With only 7 layers and 482 million FLOPs, YOLO-LITE runs at roughly 21 frames per second on a PC without a GPU and 10 frames per second when deployed on a website. Fully Convolutional One-Stage (FCOS) object detection (Detector 2022 ) addresses the issue of label overlap within the ground-truth data and, unlike previous anchor-free detectors, requires no complex hyper-parameter adjustment. Anchor-free detectors are, in general, mostly large-scale server detectors; NanoDet (Li et al. 2020a , 2020b ) and YOLOX-Nano (Ge et al. 2021 ) are the two notable anchor-free detectors for mobile devices. The issue is that compact anchor-free detectors typically struggle to strike a good balance between efficiency and accuracy.
To choose positive and negative samples, NanoDet, an FCOS-style method, employs Adaptive Training Sample Selection (ATSS) (Zhang et al. 2020a , 2020b , 2020c ) and uses generalised focal loss as the loss function for classification and bounding box regression. This loss function eliminates the centerness branch of FCOS and the numerous convolutions on that branch, which lowers the computational cost of the detection head. A lightweight detector dubbed L-DETR (Li et al. 2022a ), built on DETR and PP-LCNet, balances efficiency and accuracy. With the new backbone, L-DETR has fewer parameters than DETR; it computes the global context to arrive at the final prediction, and its enhanced normalisation and FFN raise the precision of frame detection. Table  5 highlights well-known metrics for measuring the performance of lightweight object detection models. FLOPs are frequently used to determine how computationally complex deep learning models are, providing a quick and simple way to estimate how many arithmetic operations a particular computation requires. They offer helpful insight into computational cost and energy consumption, which is particularly valuable for edge computing, and are useful whenever the total number of arithmetic operations must be estimated, typically when computing efficiency is being measured. As highlighted, YOLOv7-x has the highest FLOPs (189.9G) among the mentioned detectors. One of the more important considerations when deploying a deep network architecture is the network latency or inference time. The majority of real-world applications need quick inference times, from a few milliseconds to a second, and measuring a neural network’s inference time accurately requires in-depth knowledge.
The time it takes a deep learning algorithm to process fresh input and produce a prediction is known as the inference time. The number of layers, the complexity of the network, and the number of neurons in each layer can all affect this time, and inference times typically rise with network complexity and scale. In our analysis, YOLOv3-Tiny has the lowest inference time, 4.5 ms. Frames Per Second (FPS) measures how rapidly a deep learning model can handle frames; it specifies how quickly an object detection model will process photos and videos and produce the desired results. YOLOv4-Tiny has the highest FPS among those presented in Table  5 . Weights and biases are the model parameters in deep learning: characteristics of the training data that are discovered throughout the learning process. The total number of parameters, a common indicator of a model’s complexity, is the sum of all the weights and biases in the neural network. YOLOX-Nano has the fewest learnable parameters compared with the others. Moreover, for each lightweight object detector, a prediction regarding its deployability in real-time applications has been made on the basis of the AP values highlighted in Table  4 . MobileNet-SSD, MobileNetV2-SSDLite, Tiny-DSOD, Pelee, YOLO-Lite, MnasNet-A1 + SSDLite, YOLOv3-Tiny, NanoDet and Mini YOLO are not efficient when deployed.

Additionally, in recent years, one-stage YOLO-based lightweight object detectors have been developed, as listed in Table  6 . In 2024, DSP-YOLO (Zhang et al. 2024 ) and YOLO-NL (Zhou 2024 ) emerged but are not yet ready to be deployed in real-life applications. On the contrary, EL-YOLO (Hu et al. 2023a , 2023b ), YOLO-S (Betti and Tucci 2023 ), GCL-YOLO (Cao et al. 2023 ), Light YOLO (Yin et al. 2023 ), Edge YOLO (Li and Ye 2023 ), GL-YOLO-Lite (Dai and Liu 2023 ) and LC-YOLO (Cui et al. 2023 ) can be integrated into real-life computing applications. Further, we have added performance parameters in terms of FLOPs, inference time, FPS and number of parameters for each recent YOLO-based lightweight object detector. YOLO-S uses the fewest FLOPs (34.59B), whereas Light YOLO has the maximum FPS of 102.04 and GCL-YOLO has the fewest parameters, as depicted in Table  6 .

3.3 Backbone architecture for deep learning-based lightweight object detection models

Deep learning-based models for image processing have advanced to the point of decisively outperforming more conventional methods at object classification (Krizhevsky et al. 2012 ). The most effective deep learning object classification architectures have been Convolutional Neural Networks (CNNs), which function loosely like human brains, with neurons that react to their input in real time (Makantasis et al. 2015 ; Fernández-Delgado et al. 2014 ). Well-known deep learning-based CNN architectures have been used as feature extractors for object classification, with classifiers fine-tuned on top. Training proceeds by forward propagation, with filters and parameters initialised from random seeds. However, due to severely resource-constrained conditions, notably in memory bandwidth, the development of specialised CNN architectures for lightweight object detection models has received less attention than expected. In this section, we summarize backbones, i.e., feature extractors, for deep learning-based lightweight object detection models. Backbone architectures extract the features for lightweight object detection tasks: an image is provided as input and a feature map is produced as output. The majority of backbone architectures for detection tasks are essentially networks for classification problems, minus their final fully connected layers. The DetNaS convolutional neural network, shown block by block in Fig.  4 , helps illustrate how backbone architectures function in the context of lightweight object detection models. The blue and green blocks consist of ShuffleNetv2 5×5 and 7×7 blocks, while kernel sizes for the blue blocks are 3. The peach-coloured blocks are Xception ShuffleNetv2 blocks, in contrast to the pink 3×3 ShuffleNetv2 blocks (Ma et al. 2018 ). Each stage has eight blocks, and the total number of blocks is forty.
In the lightweight DetNAS architecture, large-kernel blocks are found in low-level layers, while deep blocks are found in high-level layers. Blocks with large kernels (5×5, 7×7) are present in the low-level layers of DetNAS’ stages 1 and 2, whereas the pink-coloured blocks have 3×3 kernels. Stages 3 and 4 are composed of peach and pink blocks, as shown in the centre of Fig.  4 ; six of these eight blocks, the Xception ShuffleNetv2 blocks, are deeper than standard 3×3 ShuffleNetv2 blocks. These results lead us to the conclusion that lightweight object detection networks differ visually from conventional detection and classification networks. In the next section, a brief introduction to deep learning-based lightweight backbone architectures is given:

figure 4

Architectural details of backbone architecture DetNaS (Chen et al. 2019 )

3.3.1 MobileNet (Howard et al. 2017 )

MobileNet introduced an efficient network architecture built from 28 layers of depth-wise separable convolutions, factorising each standard convolution into a depth-wise convolution and a 1 × 1 point-wise convolution. By filtering each input channel with its own kernel in the depth-wise step and then merging the features with a point-wise convolution, the computing cost and model size were reduced. Two further model-shrinking hyperparameters, the width and resolution multipliers, were added to trade accuracy for model size and speed. The model’s oversimplification and linearity, which left fewer channels for gradient flow, were corrected in later versions.
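The cost saving of this factorisation is easy to verify by counting multiply–accumulate operations, following the cost analysis in the MobileNet paper. This is a back-of-the-envelope sketch; bias terms and activations are ignored, and the feature-map size is an arbitrary example.

```python
def conv_costs(h, w, c_in, c_out, k):
    """Multiply-add counts for a standard k x k convolution versus a
    depth-wise separable one (depth-wise k x k + 1 x 1 point-wise)."""
    standard = h * w * c_in * c_out * k * k
    depthwise = h * w * c_in * k * k        # one k x k filter per input channel
    pointwise = h * w * c_in * c_out        # 1 x 1 conv mixes channels
    return standard, depthwise + pointwise

std, sep = conv_costs(h=14, w=14, c_in=256, c_out=256, k=3)
print(f"standard: {std:,}  separable: {sep:,}  ratio: {std / sep:.1f}x")
```

For a 3 × 3 kernel the separable form costs roughly 8–9 × less, which is where MobileNet's efficiency comes from.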

3.3.2 MobileNetV2 (Sandler et al. 2018 )

A novel module, the inverted residual with linear bottleneck, was added in MobileNetv2 to speed up calculations and improve accuracy. MobileNetv2 consists of two convolutional layers and 19 bottleneck modules. The computationally efficient MobileNetv2 feature extractor was adopted by the SSD authors for object detection; the resulting detector, known as SSDLite, touted an 8 × reduction in parameters with respect to the original SSD. It is simple to construct, generalises well to different datasets, and consequently garnered positive feedback from the community.
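The channel progression through one inverted residual block can be sketched as follows. The expansion factor of 6 follows the MobileNetv2 paper; the helper function and example channel counts are illustrative.

```python
def inverted_residual_shapes(c_in, c_out, expansion=6):
    """Channel progression through a MobileNetV2 inverted residual:
    narrow -> wide (ReLU6) -> narrow (linear), the reverse of a
    classic ResNet bottleneck."""
    hidden = c_in * expansion
    return [
        ("1x1 expand + ReLU6", c_in, hidden),
        ("3x3 depthwise + ReLU6", hidden, hidden),
        ("1x1 project (linear)", hidden, c_out),  # no activation: linear bottleneck
    ]

for name, cin, cout in inverted_residual_shapes(24, 24):
    print(f"{name}: {cin} -> {cout}")
```

The final projection is deliberately linear: applying a non-linearity in the narrow bottleneck would destroy information, which is the "linear bottleneck" part of the design.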

3.3.3 MobileNetv3 (Qian et al. 2021 )

In MobileNetv3, unneeded portions of the network were iteratively removed during an automated platform-aware search in a factorised hierarchical search space, and the resulting design proposal was then modified to improve the desired metrics. Since the architecture’s filters regularly mirror one another, accuracy can be maintained even if half of them are discarded, which reduces the need for further processing. MobileNetv3 combined hard-swish and ReLU activations, hard-swish being computationally more efficient than the original swish while preserving accuracy.
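The hard-swish activation referred to above has a simple closed form, h-swish(x) = x · ReLU6(x + 3)/6, which replaces the sigmoid in swish with a piecewise-linear gate that is cheap on mobile hardware:

```python
def relu6(x):
    # ReLU capped at 6, the activation MobileNet variants standardise on.
    return min(max(x, 0.0), 6.0)

def hard_swish(x):
    # h-swish(x) = x * ReLU6(x + 3) / 6: a piecewise-linear
    # approximation of swish(x) = x * sigmoid(x).
    return x * relu6(x + 3.0) / 6.0

for x in (-4.0, -1.0, 0.0, 1.0, 4.0):
    print(x, round(hard_swish(x), 3))
```

Below −3 the output is exactly 0 and above +3 it is exactly x, so only the narrow middle region costs more than a ReLU.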

3.3.4 ShuffleNet (Zhang et al. 2018a , 2018b )

According to the authors, many efficient networks lose their effectiveness as they scale down because of expensive 1 × 1 convolutions. ShuffleNet is a highly computationally efficient neural network design created especially for mobile devices. To overcome the resulting restriction of information flow, it uses group convolution together with channel shuffling. The ShuffleNet unit, like the ResNet block, substitutes a point-wise group convolution for the 1 × 1 layer and a depth-wise convolution for the 3 × 3 layer.
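The channel shuffle that lets information cross group boundaries is just a reshape–transpose–reshape, sketched here in NumPy for an NCHW tensor:

```python
import numpy as np

def channel_shuffle(x: np.ndarray, groups: int) -> np.ndarray:
    """ShuffleNet channel shuffle: view channels as (groups, c // groups),
    swap the two axes, and flatten back so that subsequent group
    convolutions see channels from every group."""
    n, c, h, w = x.shape
    return (x.reshape(n, groups, c // groups, h, w)
             .transpose(0, 2, 1, 3, 4)
             .reshape(n, c, h, w))

x = np.arange(8).reshape(1, 8, 1, 1)        # 8 channels labelled 0..7
shuffled = channel_shuffle(x, groups=2)
print(shuffled.flatten())                   # channels interleaved across the 2 groups
```

With two groups the channel order [0..3 | 4..7] becomes [0, 4, 1, 5, 2, 6, 3, 7], so the next grouped convolution mixes both original groups.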

3.3.5 ShuffleNetv2 (Ma et al. 2018 )

ShuffleNetv2 advocated using speed or latency as a direct measure of computational complexity rather than indirect metrics such as FLOPs. Four guiding principles served as its foundation: equal channel widths to lower memory access cost, moderate use of group convolution chosen with the target platform in mind, limited network fragmentation to preserve parallelism, and reduced element-wise operations. In this model, a channel split layer divides the input in half; one half passes through three convolutional layers, is concatenated with the residual link, and is then sent through a channel shuffle layer. ShuffleNetv2 outperformed other cutting-edge models of comparable complexity.

3.3.6 PeleeNet (Wang et al. 2018 )

PeleeNet is an inventive and effective architecture based on conventional convolutions, created using a number of computation-saving strategies. Its design comprises four stages of modified dense and transition layers, followed by the classification layer. The two-way dense layer helps obtain receptive fields at different scales, which makes it simpler to identify larger objects, while a stem block minimises information loss. Although PeleeNet’s performance did not match modern object detectors on mobile and edge devices, it demonstrated how even seemingly small design decisions can have a substantial impact on overall performance.

3.3.7 mNASNet (Tan et al. 2019 )

mNASNet was created using automated Neural Architecture Search (NAS). It framed the search as a multi-objective optimisation problem with a dual focus on latency and accuracy. Unlike previous models that stacked identical blocks, this allowed individual blocks to be designed: the search space was factorised by dividing the CNN into distinct blocks and then searching for the operations and connections in each block separately. mNASNet was roughly twice as fast as MobileNetv2 and more accurate.
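The latency-aware objective can be illustrated with the soft-constraint reward used in the MnasNet paper, reward = ACC(m) · (LAT(m)/T)^w. The example accuracies, latencies, and target below are made up for illustration.

```python
def mnasnet_reward(accuracy: float, latency_ms: float,
                   target_ms: float = 75.0, w: float = -0.07) -> float:
    """Multi-objective reward: instead of hard-filtering models that
    miss the latency target T, penalise them smoothly so the search
    can trade a little accuracy for a lot of speed."""
    return accuracy * (latency_ms / target_ms) ** w

fast = mnasnet_reward(accuracy=0.74, latency_ms=60)
slow = mnasnet_reward(accuracy=0.75, latency_ms=120)
print(f"fast: {fast:.4f}  slow: {slow:.4f}")
```

Here the slightly less accurate but much faster candidate earns the higher reward, which is exactly the trade-off the search is steered toward.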

3.3.8 Once for all (OFA) (Cai et al. 2019 )

In recent years, modern models have been constructed using NAS for architecture design; nonetheless, training each sampled model is computationally costly. The OFA model needs to be trained only once, after which sub-networks can be extracted from it as required. Thanks to the OFA network, such sub-networks can vary in the four key dimensions of a convolutional neural network: depth, width, kernel size, and input resolution. Training proceeds by progressive shrinking: the full network is trained first, and progressively smaller sub-networks nested within it are then fine-tuned.

3.3.9 MobileViT (Mehta and Rastegari 2021 )

Combining the benefits of CNNs and Vision Transformers (ViT), MobileViT is a transformer-based detector that is lightweight, portable, and compatible with edge devices. It successfully captures both short- and long-range dependencies by utilising a novel MobileViT block, alongside which MobileNetv2 modules (Sandler et al. 2018 ) are arranged in series. Unlike previous transformer-based networks, it uses transformers as convolutions, which automatically incorporates spatial bias, so positional encoding is not necessary. MobileViT performed well on complex problems, supporting its claim to be a general-purpose backbone for various vision applications. Despite the constraints transformers face on mobile devices, it attained better accuracy with a smaller parameter budget.

3.3.10 SqueezeNet (Iandola et al. 2016 )

SqueezeNet attempts to maintain network accuracy while using far fewer parameters. Its design strategies were replacing 3 × 3 filters with smaller 1 × 1 filters, decreasing the number of input channels to the remaining 3 × 3 filters, and placing the down-sampling layers late in the network. SqueezeNet’s core module, the Fire module, consists of a squeeze layer and an expand layer, each followed by a ReLU activation. Eight Fire modules are stacked between the convolution layers to form the SqueezeNet architecture. A variant with residual connections, inspired by ResNet (He et al. 2016 ), increased accuracy over the base model. SqueezeNet stands out as a serious contender for boosting the hardware efficiency of neural network topologies.
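The parameter saving of the Fire module is easy to verify by counting weights. This is a sketch: bias terms are omitted, and the layer sizes follow the fire2 configuration reported in the SqueezeNet paper (96 input channels, squeeze to 16, expand to 64 + 64).

```python
def fire_module_params(c_in, squeeze, expand1x1, expand3x3):
    """Weight count of a SqueezeNet Fire module: a 1x1 squeeze layer
    feeding parallel 1x1 and 3x3 expand layers (biases omitted)."""
    squeeze_p = c_in * squeeze * 1 * 1
    expand_p = squeeze * expand1x1 * 1 * 1 + squeeze * expand3x3 * 3 * 3
    return squeeze_p + expand_p

fire2 = fire_module_params(c_in=96, squeeze=16, expand1x1=64, expand3x3=64)
plain = 96 * 128 * 3 * 3    # a plain 3x3 conv with the same 128 output channels
print(fire2, "vs", plain, "parameters")
```

The squeeze layer throttles the channel count seen by the expensive 3 × 3 filters, which is where nearly all of the roughly 9 × saving over the plain convolution comes from.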

Table  7 elaborates, for each backbone architecture, the year and initial usage, the number of parameters, merits, and top-1 accuracy. According to research on deep learning-based backbone architectures, SqueezeNet (Iandola et al. 2016 ) and ShuffleNetv2 (Ma et al. 2018 ) are the most widely used lightweight backbones in edge devices today. The MobileNet series (Howard et al. 2017 ; Qian et al. 2021 ; Sandler et al. 2018 ) progressively enhances model performance through depth-wise separable convolutions, inverted residual topologies with linear bottlenecks, and automatic complementary search structures.

4 Performance analysis of deep learning-based lightweight object detectors

In this section, a comprehensive analysis is made of the above-discussed lightweight object detectors and related backbone architectures. Deep learning-based lightweight object detectors strike a balance between accuracy and efficiency: although the lightweight detectors from the previous sections have fast inference rates, their accuracy is not always sufficient for some tasks. As shown in Fig.  5 , which evaluates deep learning-based lightweight object detectors in terms of mAP on the MS-COCO dataset, YOLOv7-x performs best among the mentioned detectors. The backbone architectures in deep learning-based lightweight object detectors play a vital role in determining model accuracy, and convolutional architectures specifically designed for edge devices with limited bandwidth are the natural choice for embedding in detection models. The top-1 accuracy comparison of deep learning-based lightweight backbone architectures in detection models is presented in Fig.  6 . The backbone architecture ShuffleNetv2 attains 70.9% top-1 accuracy, a large jump from the SqueezeNet (Iandola et al. 2016 ) results. A marginal accuracy increase can be seen in architectures such as PeleeNet (Wang et al. 2018 ), DetNas (Chen et al. 2019 ), mNASNet (Tan et al. 2019 ) and GhostNet (Han et al. 2020a , 2020b ), but the recently emerged transformer-based architecture MobileViT (Mehta and Rastegari 2021 ) achieves the best state-of-the-art results. Moreover, for the years 2017 to 2023, Fig.  7 summarises the literature in terms of the number of publications on deep learning-based lightweight backbone architectures. As shown there, SqueezeNet has been the most popular architecture in lightweight detectors over the years, while the GhostNet (Paoletti et al. 2021 ) and MobileViT (Mehta and Rastegari 2021 ) backbones account for more of the literature in 2022 and 2023.
As mentioned above, state-of-the-art object detection works are either accuracy-oriented, using a large model size (Ren et al. 2015 ; Liu et al. 2016 ; Bochkovskiy et al. 2020 ), or speed-oriented, using a lightweight model but sacrificing accuracy (Wang et al. 2018 ; Sandler et al. 2018 ; Li et al. 2018 ; Liu and Huang 2018 ). It is difficult for any existing lightweight detector to meet the accuracy and latency requirements of real-world applications on mobile and edge devices at the same time. We therefore require mobile device solutions that accomplish both high accuracy and low latency when deploying lightweight object detection models.

figure 5

mAP Performance evaluation of major deep learning-based lightweight object detectors

figure 6

Accuracy comparison of deep learning-based lightweight backbone architectures in detection models

figure 7

Year-wise literature summary of backbone architectures in case of lightweight detection models

4.1 Benchmark detection databases for light-weight object detection models

In this section, the most popular datasets concerning deep learning-based lightweight object detectors are discussed. Datasets are essential for lightweight object detection because they allow standard comparisons of competing algorithms and the establishment of objectives for solutions.

4.1.1 PASCAL VOC (Everingham et al. 2010 )

This is the most well-known object detection dataset. The PASCAL-VOC versions VOC2007 and VOC2012 are frequently used in papers. VOC2007 comprises 2501 training, 2510 validation, and 5011 testing images, while VOC2012 comprises 10,991 training, 5823 validation, and 5717 testing images. The PASCAL VOC datasets include 11,000 images spread across 20 visual object classes, which can be divided into four broad categories: animals, vehicles, people, and household objects. Additionally, classes with semantic similarities, such as trucks and buses, raise the difficulty of detection. The dataset is available at http://host.robots.ox.ac.uk/pascal/VOC/ .

4.1.2 MS-COCO (Lin et al. 2014 )

MS-COCO (Microsoft Common Objects in Context) is a sizable image dataset containing 328,000 photographs of everyday objects and people, and is now one of the most popular and challenging object detection datasets. It has 897,000 tagged objects in 164,000 photos across 80 categories, with 118,287, 5000, and 40,670 photos in the training, validation, and testing sets, respectively. The distribution of objects in MS-COCO is closer to real-world circumstances; no annotations are released for the MS-COCO testing set. MS-COCO offers annotations for captioning, keypoints, panoptic segmentation, dense pose, and object detection, and provides a wide range of realistic images showing cluttered scenes with varied backgrounds, overlapping objects, and so on. The dataset is available at http://cocodataset.org .

4.1.3 KITTI (Geiger et al. 2013 )

KITTI is a well-known dataset for traffic scene analysis and includes 7481 labelled photos for training and 7518 for testing. There are 100,000 pedestrian instances and 6000 IDs, with an average of one person per photograph. The human class in KITTI has two subclasses: pedestrian and cyclist. Object labels are divided into easy, moderate, and hard levels based on how much the objects are occluded and truncated, and models trained on the dataset are assessed using three criteria that differ in minimum bounding box height and maximum occlusion level. The dataset can be downloaded at http://www.cvlibs.net/datasets/kitti/index.php .

We present the performance of deep learning-based lightweight detection models on the above-discussed detection datasets in Fig.  8 . The lightweight object detector YOLOv4-dense achieves an mAP of 84.30 on KITTI and 71.60 on PASCAL VOC. The L4Net detector attains an mAP of 71.68 on KITTI, 82.30 on PASCAL VOC and 42.90 on MS-COCO, while the RefineDet-lite detector achieves an mAP of 26.80 on MS-COCO. Comparing performance further, FE_YOLO performs best on KITTI as presented in Fig.  8 , whereas L4Net performs best on MS-COCO and the lightweight YOLO-Compact detector outperforms the other detectors on PASCAL VOC.

figure 8

Performance evaluation of deep learning-based lightweight models on leading datasets

4.2 Evaluation parameters

Deep learning-based lightweight object detection models use the same evaluation criteria as generic object detection models. Accuracy is the proportion of objects predicted correctly out of all predictions made. When dealing with class-imbalanced data, where the number of instances differs between classes, accuracy can be quite deceptive because it emphasises learning the majority classes over the minority classes. Therefore, mean Average Precision (mAP), Frames Per Second, and the size of the model weight file serve as the primary evaluation indices for the effectiveness of a lightweight object detection model. The ground-truth labels of each image provide the precise number of objects of each category in the image. Intersection over Union (IoU) quantifies the similarity between the ground-truth and predicted bounding boxes to evaluate how good the predicted bounding box is, as represented in Eq. ( 1 ):
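In its standard form, Eq. (1) defines IoU for a predicted box \(B_p\) and a ground-truth box \(B_{gt}\) as:

```latex
\mathrm{IoU} = \frac{\mathrm{area}(B_p \cap B_{gt})}{\mathrm{area}(B_p \cup B_{gt})} \tag{1}
```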

The IoU value is calculated between each prediction box and the ground-truth data. The largest IoU value is then taken and, based on the IoU threshold, the number of True Positives (TP) and False Positives (FP) is calculated for each object category in an image. From this, the Precision of each category is calculated according to Eq. ( 2 ):
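In its standard form, Eq. (2) is:

```latex
\mathrm{Precision} = \frac{TP}{TP + FP} \tag{2}
```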

Once the correct number of TP is obtained, the number of False Negatives (FN) is accounted for through Recall, as in Eq. ( 3 ).
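In its standard form, Eq. (3) is:

```latex
\mathrm{Recall} = \frac{TP}{TP + FN} \tag{3}
```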

By computing recall rates and the associated precision rates for each category, a PR curve can be plotted per category. Under the evaluation criteria of the PASCAL VOC 2010 object detection competition, the value of AP equals the area enclosed by the PR curve. Precision, recall, and average precision are thus the three metrics used to assess a model’s detection accuracy. MS-COCO averages mAP over a range of IoU thresholds with a step of 0.05 (0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, and 0.95). The main metric used to judge competitors, denoted “mAP”, averages AP over all 80 COCO dataset categories and all 10 thresholds. A higher AP score under the COCO evaluation criteria denotes more precise bounding box localization of the detected objects. Performance is additionally measured using AP50 and AP75 at fixed IoU thresholds, and APS, APM, and APL on small, medium, and large objects. The primary metric, AP(IoU) = 0.50:0.05:0.95, is determined by averaging over all 10 IoU thresholds across all categories with a uniform step size of 0.05.
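The IoU computation and the COCO-style threshold sweep described above can be sketched as follows. This is a minimal illustration for axis-aligned boxes; the box coordinates are made up.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

pred, gt = (0, 0, 10, 10), (5, 0, 15, 10)
iou = box_iou(pred, gt)                             # overlap 50 / union 150
thresholds = [0.5 + 0.05 * i for i in range(10)]    # COCO: 0.50, 0.55, ..., 0.95
matches = [iou >= t for t in thresholds]
print(round(iou, 3), sum(matches))
```

With an IoU of about 0.33, this prediction would count as a false positive at every COCO threshold, even the loosest one of 0.5.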

4.3 A summary of edge device-based platforms for lightweight object detectors

In the upcoming years, vast amounts of data will be produced by mobile users and IoT devices. This data growth will bring new problems, such as latency. Additionally, traditional methods cannot be relied upon for long if intelligence is to be derived from deep learning-based object detection and recognition algorithms in real time. Edge computing devices have drawn a lot of interest as a result of prominent firms' efforts to make powerful computing affordable. As the IoT, 5G, and portable-processing eras approach, it is vital to enable developers to swiftly design and deploy edge applications from lightweight detection models. Following advancements in deep learning, numerous enhancements to object detection models aimed at edge device applications have been presented. DeepSense, TinyML, DeepThings, and DeepIoT are just a few of the frameworks published in recent years with the intention of compressing deep models for IoT edge devices. To satisfy the processing demands of deep learning-based lightweight object detectors, a model must overcome several constraints, such as a limited battery, high energy consumption, limited computational capability, and constrained memory, while maintaining a sufficient level of accuracy. The primary goal should be a framework that allows machine learning models to be quickly implemented on Internet of Things devices. The well-known TinyML frameworks TensorFlow Lite from Google, ELL from Microsoft, ARM-NN and CMSIS-NN from ARM, STM32Cube-AI from STMicroelectronics, and AIfES from Fraunhofer IMS enable the use of deep learning at the edge. When combined with other microcontroller-based tasks, low-latency, low-power, and low-bandwidth AI algorithms can function as part of an intelligent system at low cost thanks to TinyML on a microcontroller.
The DeepIoT framework compresses neural network structures into less dense matrices while preserving the performance of sensing applications, by determining the minimal number of non-redundant hidden elements, such as filters and dimensions, needed by each layer. Another well-known framework that supports deep learning-based lightweight object recognition is TensorFlow: TensorFlow Lite (TFLite) is a fast, lightweight, cross-platform framework for mobile and IoT devices that scales down massive models. The majority of lightweight models employ TensorFlow Lite quantization, which is easy to deploy on edge devices.

4.3.1 Mobile phones

The limitations imposed by mobile devices may be the reason why less research addresses the deployment of object detectors on mobile phones than on other embedded platforms. Smartphone complexity and capability are rising quickly, but their size and weight are also likely to decrease. Few studies in the literature have attempted implementation on smartphone-based devices (Lan et al. 2019 ; Liu et al. 2020a , 2020b , 2020c , 2020d ; Liu et al. 2021a , 2021b , 2021c ; Li et al. 2021a , 2021b , 2021c ; Paluru et al. 2021 ). This places a heavy burden on creating models that are small, light, and require a minimal number of computations. It is advisable to test novel ideas for deep learning inference optimization on portable models that are regularly used with cellphones (Xu et al. 2019 ). Either the spatial or the temporal complexity of deep learning models can be reduced to the point where they can be fully implemented on mobile devices. However, many security issues may need to be addressed (Steimle et al. 2017 ). Although deep learning for smartphone object detection appears to be a promising field of study, success will require many more contributions (Wang et al. 2022a , 2022b , 2022c , 2022d ).

4.3.2 IoT edge devices

One way to enable deep learning on IoT edge devices is to transfer model inference to a cloud server. Another way to boost the power of these inexpensive devices is to add an accelerator; the price of such accelerators is a major drawback, though. Some edge devices, like the Raspberry Pi, may require an extra accelerator, whereas others, like the Coral Dev Board, already have edge TPU accelerators built in. Deep learning can more easily run locally or remotely using a distributed design that links computationally weak front-end devices with more powerful back-end devices, such as a cloud server or accelerator (Ran et al. 2017 ).

4.3.3 Embedded boards

To provide the widest range of design options, processor-FPGA combinations and FPGAs with hard processor cores embedded into their fabric are widely used. Lattice Semiconductor, Xilinx, Microchip, and Intel (Altera) are the well-known manufacturers. The literature suggests that the Xilinx board family is the one most frequently utilized for deep learning-based applications. An additional accelerator is often needed when employing FPGA devices to achieve acceptable performance (Saidi et al. 2021 ). Due to Integrated Development Environment (IDE) and high-level language support, the Arduino and Spark-based boards at the top of the device family allow for more software-level programming (Kondaveeti et al. 2021 ).

4.4 Applications specific to deep learning-based lightweight object detectors

In the sections above, we have discussed the architectural details and leading datasets of deep learning-based lightweight object detection models. These models serve a multitude of applications, such as remote sensing (Xu and Wu 2021 ; Ma et al. 2023 ), aerial images (Xu and Wu 2021 ; Zhou et al. 2022 ), traffic monitoring (Jiang et al. 2023 ; Zheng et al. 2023 ), fire detection (Chen et al. 2023 ), indoor robots (Jiang et al. 2022 ), and pedestrian detection (Jian et al. 2023 ). A summary of literature findings supporting the applications of deep learning-based lightweight object detection models is listed in Table  6 . In (Zhou et al. 2019 ), a lightweight detection network called YOLO-RD was proposed for Range Doppler (RD) radar images, and a new lightweight mini-RD dataset was created for effective network training. On the mini-RD dataset, YOLO-RD produced effective results with a smaller memory budget and a detection accuracy of 97.54%. Addressing both the algorithmic and hardware-resource aspects of object detection, (Ding et al. 2019 ) introduced REQ-YOLO, a resource-aware, systematic weight quantization framework for object detection. It applied the block-circulant matrix approach and proposed a heterogeneous weight quantization for non-convex optimisation problems on FPGAs. The outcomes demonstrated that the REQ-YOLO framework can greatly reduce the size of the YOLO model with only a slight reduction in accuracy. For autonomous vehicles, L4Net (Wu et al. 2021 ) locates object proposals by integrating a keypoint-detection backbone with a co-attention strategy, attaining lower computation costs with improved detection accuracy under a variety of resource constraints. To generate more precise prediction boxes, the backbone captures context-wise information and the co-attention method combines the strengths of both class-agnostic and semantic attention.
With a 13.7 M model size, L4Net achieved 71.68% mAP at speeds of 149 FPS on an NVIDIA TX and 30.7 FPS on a Qualcomm-based device, respectively. The development of effective object detectors for CPU-only hardware is pressing because of huge data-processing demands and resource-constrained scenarios on GPUs. With three orthogonal training strategies, an IoU-guided loss, a class-aware weighting method, and a balanced multi-task training approach, (Chen et al. 2020a , 2020b ) proposed a lightweight backbone and light-head detection component. On a single-thread CPU, the proposed RefineDetLite obtained 26.8 mAP at a pace of 130 ms per image. LiraNet, a compact CNN, was suggested by (Long et al. 2020a , 2020b ) for the recognition of marine ship objects in radar images. LiraNet was mounted on the existing detection framework Darknet by creating Lira-YOLO, a compact model that is simple to deploy on mobile devices. Additionally, a lightweight dataset of distant Doppler-domain radar images, known as mini-RD, was created to test the performance of the proposed model. Studies reveal that Lira-YOLO's network complexity is a low 2.980 BFLOPs and its parameter size a reduced 4.3 MB, while it maintains a high detection accuracy of 83.21%. (Lu et al. 2020 ) developed a successful YOLO-compact network for real-time object detection in the single-person category. The downsampling layer was separated in this network, which facilitated the modular design by enhancing the remaining bottleneck block. YOLO-compact's AP is 86.85% and its model size is 9 MB, making it smaller than tiny-yolov3, tiny-yolov2, and YOLOv3. Focusing on small targets and background complexity, (Xu and Wu 2021 ) presented FE-YOLO for deep learning-based target detection in remote sensing photos. Analyses on remote sensing datasets demonstrate that FE-YOLO outperformed existing cutting-edge target detection methods.
A new YOLOv4-dense model was put forth by (Jiang et al. 2023 ) for real-time object recognition on edge devices. To address the issue of losing small objects and to further minimize computational complexity, a dense block was devised. With 20.3 M parameters, YOLOv4-dense obtained 84.3% mAP and 22.6 FPS. To improve the detection of small and medium-sized objects in aerial photos, (Zhou et al. 2022 ) developed the Dense Feature Fusion Path Aggregation Network (DFF-PANet). Trials conducted on the HRSC2016 and DOTA datasets yielded 71.5% mAP with a 9.2 M lightweight model. To help an indoor mobile robot solve the problem of object detection and recognition, (Jiang et al. 2022 ) presented ShuffleNet-SSD. The proposed model used deep separable convolution, point-by-point grouping convolution, and channel rearrangement, and a dataset was created for the mobile robot in indoor scenes. For the detection of dead trees, (Wang et al. 2022a , 2022b , 2022c , 2022d ) suggested a novel, lightweight architecture called LDS-YOLO based on the YOLO framework; timely replacement of dead trees allows the ecosystem to remain stable and efficiently withstand catastrophic disasters. With the addition of the SoftPool approach in Spatial Pyramid Pooling (SPP), a unique feature-extraction module reuses features from earlier layers to ensure that small targets are not ignored. The approach was assessed on UAV-captured photos, and the experimental findings show that the LDS-YOLO architecture performs well, with an AP of 89.11% and a parameter size of 7.6 MB. Table  8 categorizes several applications of lightweight object detectors with respect to image type, such as remote-sensing, aerial, medical, and video streams, and application type, such as healthcare, medical, military, and industrial use.

4.5 Discussion and contributions

According to the above analysis of deep learning-based lightweight object detectors, focus is needed on developing detectors for edge devices that strike a good balance between speed and accuracy. Furthermore, real-time deployment of these detectors on edge devices is also needed, reaching the accuracy of lightweight detectors without compromising precision. In 2022, the lightweight backbone architectures ShuffleNet and SqueezeNet had the most publications with respect to lightweight object detectors. In 2023, the transformer-based MobileViT started getting the attention of researchers, with a top-1 accuracy of 78.4% achieved, and MobileNet backbone architectures were the most employed compared with others. As shown in Table  8 , with respect to input type, video streams have the greatest employability in deep learning-based lightweight object detectors. With respect to diverse applications, traffic- and pedestrian-related detection problems, obstacles, and driving assistance have the most studies, whereas all other existing applications have few lightweight detectors on edge devices. As we witnessed, the majority of presented lightweight models are from the YOLO family, where a growing number of deep network layers and parameters accounts for the improved accuracy. Therefore, the most important question when a model migrates from a cloud device to an edge device is how to lower the parameters of a deep learning-based lightweight model. Numerous approaches being used to address this are described in the next section.

4.6 Recommendations for designing powerful deep learning-based lightweight models

Researchers have created new training methods that decrease the memory footprint on the edge device and speed up training on low-resource devices, in addition to specialized hardware for training deep learning models at the network edge. The techniques discussed in this section, pruning, quantization, knowledge distillation, and low-rank decomposition, are the four key categories used to compress pre-trained networks (Kamath and Renuka 2023 ) and are listed in the following (Koubaa et al. 2021 ; Makkar et al. 2021 ; Wang et al. 2020a , 2020b , 2020c ):

4.6.1 Pruning

Network pruning is a useful technique for reducing the size of the object detection model and speeding up model reasoning. By cutting out connections between neurons that are irrelevant to the application, this method lowers the amount of computations needed to analyse fresh input. In addition to eliminating connections, it can also eliminate neurons that are deemed irrelevant when the majority of their weights are low in relation to the deep neural network’s overall context. With the use of this method, a deep neural network with reduced size, greater speed, and improved memory efficiency can be used in low-resource devices, such as edge devices.
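As a hedged illustration (a sketch of the general idea, not code from the survey), magnitude-based pruning zeroes out weights whose absolute value falls below a threshold, producing a sparse layer that needs fewer computations:

```python
# Minimal sketch of magnitude-based weight pruning (illustrative only).

def prune_weights(weights, threshold):
    """Zero out every weight whose magnitude is below the threshold."""
    return [0.0 if abs(w) < threshold else w for w in weights]

def sparsity(weights):
    """Fraction of weights that are exactly zero after pruning."""
    return sum(1 for w in weights if w == 0.0) / len(weights)

layer = [0.8, -0.05, 0.02, -0.9, 0.01, 0.4, -0.03, 0.6]
pruned = prune_weights(layer, threshold=0.1)
print(pruned)            # small-magnitude weights zeroed
print(sparsity(pruned))  # → 0.5
```

In practice, pruning is usually followed by fine-tuning so the remaining weights can compensate for the removed connections, and structured variants remove whole filters or channels rather than individual weights.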

4.6.2 Weights quantization

The weight quantization approach, which trades precision for speed, shrinks the model's storage footprint by reducing the precision of the floating-point parameters. Without quantization, every weight is stored as a separate full-precision value. The weight quantization technique compresses these values to integers, or to numbers occupying as few bits as possible, by clustering similar weight values into a single value. Consequently, the weights are re-adjusted, implying a modification of precision as well. This results in a cyclical implementation in which the weights are quantized after each training pass.
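A minimal sketch (illustrative, not the survey's own method) of symmetric linear quantization, mapping float weights to 8-bit integers with a single per-tensor scale:

```python
# Symmetric linear quantization of float weights to int8 (sketch).

def quantize(weights, num_bits=8):
    """Map floats to signed integers using a single per-tensor scale."""
    qmax = 2 ** (num_bits - 1) - 1          # 127 for int8
    scale = max(abs(w) for w in weights) / qmax
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the integer codes."""
    return [v * scale for v in q]

weights = [0.5, -1.0, 0.25, 0.75]
q, scale = quantize(weights)
approx = dequantize(q, scale)
print(q)  # → [64, -127, 32, 95]
```

Storing the integer codes plus one scale per tensor cuts storage roughly fourfold versus 32-bit floats, at the cost of the small rounding error visible when dequantizing.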

4.6.3 Knowledge distillation

Knowledge distillation presents itself as a mode of transfer learning. This technique extracts knowledge from a big, well-trained deep neural network, dubbed the teacher, into a reduced deep network, called the student. By doing this, the student network can learn to achieve outcomes close to those of the teacher network while decreasing in size and increasing processing speed. Through the process of knowledge distillation, information is transferred from a large, thoroughly trained end-to-end detection network to numerous, quicker sub-models.
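The common recipe (the standard softened-softmax formulation of distillation, not something specific to this survey) softens the teacher's logits with a temperature T and trains the student to match the resulting distribution; a pure-Python sketch:

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Softened class probabilities: higher T spreads mass across classes."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=4.0):
    """Cross-entropy between softened teacher and student distributions."""
    p = softmax_with_temperature(teacher_logits, temperature)
    q = softmax_with_temperature(student_logits, temperature)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))

teacher = [8.0, 2.0, 1.0]   # confident teacher logits (hypothetical values)
student = [5.0, 2.5, 1.5]   # smaller student, less confident
print(distillation_loss(teacher, student))
```

In full training, this soft-target loss is usually combined with the ordinary cross-entropy on the hard ground-truth labels.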

4.6.4 Training tiny networks

In the low-rank decomposition method, the deep neural network's initial convolution kernel is mostly broken down using matrix decomposition, though the accuracy of the results can degrade noticeably. Directly training tiny networks instead can drastically reduce the network's accuracy loss and speed up inference.
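To see why low-rank factorization shrinks a model, consider a dense m×n weight matrix replaced by two factors of rank r: the parameter count drops from m·n to r·(m+n). A back-of-envelope sketch with illustrative numbers (not figures from the survey):

```python
# Parameter savings from factorizing a dense m-by-n layer into
# an m-by-r and an r-by-n factor (rank-r approximation).

def factorized_params(m, n, r):
    """Parameter counts before and after a rank-r factorization."""
    dense = m * n
    low_rank = r * (m + n)
    return dense, low_rank

dense, low_rank = factorized_params(m=1024, n=1024, r=64)
print(dense, low_rank)                                # → 1048576 131072
print(f"compression ratio: {dense / low_rank:.1f}x")  # → compression ratio: 8.0x
```

The savings are real only when r is well below min(m, n); choosing r too small is what causes the accuracy degradation noted above.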

4.6.5 Federated learning and model partition

Distributed learning and federated learning are two possible training approaches for dealing with complicated tasks or training over large amounts of data. The data is broken into smaller groups that are distributed among the nodes of the edge network. Each node trains on the data it receives as part of the final deep neural network, enabling active learning capabilities at the network edge. Model partitioning is a strategy that applies the same methodology in the inference phase: to divide the burden, a separate node computes each layer of the deep neural network. This approach also makes scaling simple.
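As an illustrative sketch (FedAvg-style averaging, a standard technique rather than the survey's own algorithm), each edge node trains on its local data and a server combines the resulting weights, weighted by the nodes' sample counts:

```python
# Federated averaging sketch: the server combines locally trained
# weight vectors, weighting each client by its number of samples.

def federated_average(client_weights, client_sizes):
    """Sample-weighted average of per-client weight vectors."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    averaged = [0.0] * n_params
    for weights, size in zip(client_weights, client_sizes):
        for i, w in enumerate(weights):
            averaged[i] += w * size / total
    return averaged

# Two edge nodes with different amounts of local data (hypothetical values).
clients = [[1.0, 2.0], [3.0, 4.0]]
sizes = [100, 300]   # node 2 holds 3x the data, so it dominates the average
print(federated_average(clients, sizes))  # → [2.5, 3.5]
```

Only the weight vectors travel to the server, never the raw data, which is what makes the approach attractive for privacy-constrained edge deployments.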

Moreover, to boost the flow of information in a constrained amount of time, multi-scale feature learning in lightweight detection models, comprising single feature maps, pyramidal feature hierarchies, and integrated features, may be used. Feature pyramid networks and their variations, such as feature fusion, feature pyramid generation, and multi-scale fusion modules, help overcome object detection difficulties. Additionally, to boost the effectiveness of lightweight object detection models, researchers also work on improved activation functions and normalization for various applications. The above-mentioned techniques accelerate the adoption of deep learning models on edge devices. Deep learning-based lightweight object detection models have not yet achieved results comparable to generic object detection; to close this gap, powerful and innovative lightweight detectors must be designed. Some recommendations for designing powerful lightweight deep learning-based detectors are given in this section.

Incorporation of FPNs - The bidirectional FPN can be utilized to improve semantic information while incorporating feature fusion operations (Wang et al. 2023a , 2023b ). To collect bottom-up and top-down features more successfully than FPN, an effective feature-preserving and refining module can be introduced (Tang et al. 2020a , 2020b ). Deep learning-based lightweight detectors can be designed with cross-layer connections and the extraction of features at various scales while using depth-wise separable convolution. A multi-scale FPN architecture with a lightweight backbone can be exploited to extract features from the input image.

Transformer-based Solutions - To increase the precision of the transformer-based lightweight detectors, group normalisation can be implemented in the encoder-and-decoder module and h-sigmoid activation function in the multi-layer perceptron (Li, Wang and Zhang 2022).

Receptive Fields Enlargement - The multi-branch block involving various receptive fields improves both the expressive capacity of single-scale features and their detectability at a single scale. The network width may increase and performance may be slightly enhanced with the use of several network branches (Liu et al. 2022).

Feature Fusion Operation - To combine several feature maps of the backbone and assemble multi-scale features into a feature pyramid, the fusion operation offers a concatenation model (Mao et al. 2019 ). To improve the extraction of information in a proposed lightweight model, the weights of the feature maps' various channels can be reassigned. Furthermore, performance may improve from integrating an attention module and data augmentation techniques (Li et al. 2022a , 2022b ). The smooth fusion of semantic information from a low-resolution scale to the neighbouring high-resolution scale is made possible by incorporating an FPN into the lightweight detector architecture (Li et al. 2018 ).

Effect of Depth-wise Separable Convolution - The optimal design principle for lightweight object detection models consists of fewer channels with more convolutional layers (Kim et al. 2016 ). Researchers can concentrate on network scaling that adjusts width, resolution, and network structure to reduce or balance the size of the feature maps, keep the number of channels constant after convolution, and minimise convolutional input and output (Wang et al. 2021b ). The typical convolution in the network structure can be replaced with an over-parameterized depth-wise convolutional layer, which significantly reduces computation and boosts network performance. To increase numerical resolution, ReLU6 can be used in place of the Leaky ReLU activation function (Ding et al. 2022 ).

Increase in Semantic Information - To keep semantic features and high-level feature maps in the deep lightweight object network, the proposal of smaller cross-stage partial SPPs and RFBs facilitates the integration of high-level semantic information with low-level feature maps (Wang et al. 2022a , 2022b , 2022c , 2022d ). The architectural additions of the context enhancement and spatial attention module can be employed to generate more discriminative feature representation (Qin et al. 2019 ).

Pruning Strategy - Block-punched pruning uses a fine-grained structured pruning method to maximise structural flexibility and minimise accuracy loss. High hardware parallelism can be achieved using the block-punched pruning strategy if the block size is suitable and compiler-level code generation is used (Cai et al. 2021 ).

Assignment Strategy - To improve the training of deep learning-based lightweight object detectors, the SimOTA dynamic label assignment method can be used. When creating lightweight detection models, the combination of the FCOS-based regression method, dynamic and learnable sample assignment, and varifocal loss handling class imbalance works better (Yu et al. 2021 ). Designing lightweight object detectors using the anchor-free approach has been successful when combined with other cutting-edge detection methods using decoupled heads and the top label assignment strategy SimOTA (Ge et al. 2021 ).

There are two ways to deploy deep learning-based lightweight models on edge devices. In the first, a lightweight model or compressed data is employed to match the compute capability of the limited edge device; this holds for on-board object detection, and the compromise between compression ratio and detection accuracy is its drawback. In the second, the model is distributed and data is exchanged: computations are spread over several devices, and a cloud server may handle part of them. In this case, privacy and security are the primary issues (Zhang et al. 2020a , 2020b , 2020c ). Care must be taken when establishing device coordination in this scenario, as it may introduce extra overhead, to avoid the edge devices being overworked while running the collaborative learning algorithm. Whatever the plan, all of these deployment methods rely on edge devices and must deal with the problems edge devices present. The primary causes are data disparity in real-world scenarios and the need to manage real-time sensor data while performing numerous deep learning tasks. Limited processing units, the high computing requirements of deep learning models, and short battery life make validating lightweight models tough. In the future, we will strive to create such standards-compliant lightweight detection deployment models.

5 Conclusion

This study asserted that deep learning-based lightweight object detection models are a good candidate for improving the hardware efficiency of neural network architectures. This survey has examined and presented the most recent models for lightweight edge devices. The backbone architectures commonly utilized in deep learning-based lightweight object detection methods have also been stated, among which ShuffleNet and MobileNetV2 are the most employed. Some critical aspects have been discussed after analyzing current state-of-the-art deep learning-based lightweight object detection models on edge devices. A comparison has been drawn between emerging lightweight object detection models on the basis of COCO-based mAP scores, and a summary of heterogeneous applications for lightweight object detection models has been presented that takes into account diverse image types and application categories. This study also gives information on edge platforms for deploying portable detector models. A few recommendations are also given for creating a potent deep learning-based lightweight model, including multi-scale and multi-branch FPNs, federated learning, partitioning strategies, pruning, knowledge distillation, and label assignment algorithms. The lightweight detectors still fall more than 50% short of delivering such outcomes, despite having demonstrated significant potential by approaching the classification errors of the thorough models.

Abou El Houda Z, Brik B, Ksentini A, Khoukhi L (2023) A MEC-based architecture to secure IOT applications using federated deep learning. IEEE Internet Things Mag 6(1):60–63


Agarwal S, Terrail JOD, Jurie F (2018) Recent advances in object detection in the age of deep convolutional neural networks. arXiv preprint arXiv:1809.03193

Alfasly S, Liu B, Hu Y, Wang Y, Li CT (2019) Auto-zooming CNN-based framework for real-time pedestrian detection in outdoor surveillance videos. IEEE Access 7:105816–105826

Bai X, Zhou J (2020) Efficient semantic segmentation using multi-path decoder. Appl Sci 10(18):6386

Betti A, Tucci M (2023) YOLO-S: a lightweight and accurate YOLO-like network for small target detection in aerial imagery. Sensors 23(4):1865

Bochkovskiy A, Wang CY, Liao HYM (2020) Yolov4: optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934

Brunetti A, Buongiorno D, Trotta GF, Bevilacqua V (2018) Computer vision and deep learning techniques for pedestrian detection and tracking: a survey. Neurocomputing 300:17–33

Cai Z, Vasconcelos N (2018) Cascade r-cnn: delving into high quality object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 6154–6162)

Cai H, Gan C, Wang T, Zhang Z, Han S (2019) Once-for-all: Train one network and specialize it for efficient deployment. arXiv preprint arXiv:1908.09791

Cai Y, Li H, Yuan G, Niu W, Li Y, Tang X, Ren B, Wang Y (2021) Yolobile: real-time object detection on mobile devices via compression-compilation co-design. In Proceedings of the AAAI conference on artificial intelligence (Vol. 35, No. 2, pp. 955–963)

Cao J, Bao W, Shang H, Yuan M, Cheng Q (2023) GCL-YOLO: a GhostConv-based lightweight yolo network for UAV small object detection. Remote Sens 15(20):4932

Chabas JM, Chandra G, Sanchi G, Mitra M (2018) New demand, new markets: What edge computing means for hardware companies. https://www.mckinsey.com/industries/technology-media-and-telecommunications/our-insights/new-demand-new-markets-what-edge-computing-means-for-hardware-companies

Chang L, Zhang S, Du H, You Z, Wang S (2021) Position-aware lightweight object detectors with depthwise separable convolutions. J Real-Time Image Proc 18:857–871

Chen Y, Yang T, Zhang X, Meng G, Xiao X, Sun J (2019) Detnas: backbone search for object detection. Adv Neural Inf Process Syst. https://doi.org/10.48550/arXiv.1903.10979

Chen L, Ding Q, Zou Q, Chen Z, Li L (2020b) DenseLightNet: a light-weight vehicle detection network for autonomous driving. IEEE Trans Industr Electron 67(12):10600–10609

Chen C, Yu J, Lin Y, Lai F, Zheng G, Lin Y (2023) Fire detection based on improved PP-YOLO. SIViP 17(4):1061–1067

Chen C, Liu M, Meng X, Xiao W, Ju Q (2020) Refinedetlite: a lightweight one-stage object detection framework for cpu-only devices. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops (pp. 700–701)

Cheng Y, Li G, Wong N, Chen HB, Yu H (2020) DEEPEYE: a deeply tensor-compressed neural network for video comprehension on terminal devices. ACM Trans Embed Comput Syst (TECS) 19(3):1–25

Cho C, Choi W, Kim T (2020) Leveraging uncertainties in Softmax decision-making models for low-power IoT devices. Sensors 20(16):4603

Cui B, Dong XM, Zhan Q, Peng J, Sun W (2021) LiteDepthwiseNet: a lightweight network for hyperspectral image classification. IEEE Trans Geosci Remote Sens 60:1–15


Cui M, Gong G, Chen G, Wang H, Jin M, Mao W, Lu H (2023) LC-YOLO: a lightweight model with efficient utilization of limited detail features for small object detection. Appl Sci 13(5):3174

Dai Y, Liu W (2023) GL-YOLO-Lite: a novel lightweight fallen person detection model. Entropy 25(4):587

Dai W, Li D, Tang D, Jiang Q, Wang D, Wang H, Peng Y (2021) Deep learning assisted vision inspection of resistance spot welds. J Manuf Process 62:262–274

Dai J, Li Y, He K, Sun J (2016) R-fcn: Object detection via region-based fully convolutional networks. Advances in neural information processing systems, 29

Tian Z, Shen C, Chen H, He T (2022) FCOS: a simple and strong anchor-free object detector. IEEE Trans Pattern Anal Mach Intell 44(4)

Dey S, Mukherjee A (2018) Implementing deep learning and inferencing on fog and edge computing systems. In 2018 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops) (pp. 818–823). IEEE

Ding P, Qian H, Chu S (2022) Slimyolov4: lightweight object detector based on yolov4. J Real-Time Image Proc 19(3):487–498

Ding C, Wang S, Liu N, Xu K, Wang Y, Liang Y (2019) REQ-YOLO: a resource-aware, efficient quantization framework for object detection on FPGAs. In proceedings of the 2019 ACM/SIGDA international symposium on field-programmable gate arrays (pp. 33–42)

Drolia U, Guo K, Narasimhan P (2017) Precog: prefetching for image recognition applications at the edge. In Proceedings of the Second ACM/IEEE Symposium on Edge Computing (pp. 1–13)

Duan K, Bai S, Xie L, Qi H, Huang Q, Tian Q (2019) Centernet: keypoint triplets for object detection. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 6569–6578)

Everingham M, Van Gool L, Williams CK, Winn J, Zisserman A (2010) The pascal visual object classes (VOC) challenge. Int J Comput vis 88:303–338

Fernández-Delgado M, Cernadas E, Barro S, Amorim D (2014) Do we need hundreds of classifiers to solve real world classification problems? J Mach Learn Res 15(1):3133–3181


Gadosey PK, Li Y, Agyekum EA, Zhang T, Liu Z, Yamak PT, Essaf F (2020) SD-UNET: stripping down U-net for segmentation of biomedical images on platforms with low computational budgets. Diagnostics 10(2):110

Gagliardi A, de Gioia F, Saponara S (2021) A real-time video smoke detection algorithm based on Kalman filter and CNN. J Real-Time Image Proc 18(6):2085–2095

Ge Z, Liu S, Wang F, Li Z, Sun J (2021) Yolox: exceeding yolo series in 2021. arXiv preprint arXiv:2107.08430

Geiger A, Lenz P, Stiller C, Urtasun R (2013) Vision meets robotics: the kitti dataset. Int J Robot Res 32(11):1231–1237

Girshick R (2015) Fast r-cnn. In Proceedings of the IEEE international conference on computer vision (pp. 1440–1448)

Guo W, Li W, Li Z, Gong W, Cui J, Wang X (2020) A slimmer network with polymorphic and group attention modules for more efficient object detection in aerial images. Remote Sens 12(22):3750

Han J, Zhang D, Cheng G, Liu N, Xu D (2018) Advanced deep-learning techniques for salient and category-specific object detection: a survey. IEEE Signal Process Mag 35(1):84–100

Han S, Yoo J, Kwon S (2019) Real-time vehicle-detection method in bird-view unmanned-aerial-vehicle imagery. Sensors 19(18):3958

Han S, Liu X, Han X, Wang G, Wu S (2020b) Visual sorting of express parcels based on multi-task deep learning. Sensors 20(23):6785

Han K, Wang Y, Tian Q, Guo J, Xu C, Xu C (2020) Ghostnet: More features from cheap operations. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 1580–1589)

Haque WA, Arefin S, Shihavuddin ASM, Hasan MA (2021) DeepThin: a novel lightweight CNN architecture for traffic sign recognition without GPU requirements. Expert Syst Appl 168:114481

He W, Huang Y, Fu Z, Lin Y (2020) Iconet: a lightweight network with greater environmental adaptivity. Symmetry 12(12):2119

He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770–778)

He K, Gkioxari G, Dollár P, Girshick R (2017) Mask r-cnn. In Proceedings of the IEEE international conference on computer vision (pp. 2961–2969)

Hou Y, Li Q, Han Q, Peng B, Wang L, Gu X, Wang D (2021) MobileCrack: object classification in asphalt pavements using an adaptive lightweight deep learning. J Trans Eng Part B: Pavements 147(1):04020092

Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, Adam H (2017) Mobilenets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861

Hu X, Yang W, Wen H, Liu Y, Peng Y (2021) A lightweight 1-D convolution augmented transformer with metric learning for hyperspectral image classification. Sensors 21(5):1751

Hu M, Li Z, Yu J, Wan X, Tan H, Lin Z (2023b) Efficient-lightweight yolo: improving small object detection in yolo for aerial images. Sensors 23(14):6423

Hu B, Wang Y, Cheng J, Zhao T, Xie Y, Guo X, Chen Y (2023) Secure and efficient mobile DNN using trusted execution environments. In Proceedings of the 2023 ACM Asia Conference on Computer and Communications Security (pp. 274–285)

Hua H, Li Y, Wang T, Dong N, Li W, Cao J (2023) Edge computing with artificial intelligence: a machine learning perspective. ACM Comput Surv 55(9):1–35

Huang Z, Yang S, Zhou M, Gong Z, Abusorrah A, Lin C, Huang Z (2022) Making accurate object detection at the edge: review and new approach. Artif Intell Rev 55(3):2245–2274

Huang L, Yang Y, Deng Y, Yu Y (2015) Densebox: unifying landmark localization with end to end object detection. arXiv preprint arXiv:1509.04874

Huang R, Pedoeem J, Chen C (2018) YOLO-LITE: a real-time object detection algorithm optimized for non-GPU computers. In 2018 IEEE international conference on big data (big data) (pp. 2503–2510). IEEE

Huang X, Wang X, Lv W, Bai X, Long X, Deng K, Dang Q, Han S, Liu Q, Hu X, Yu D (2021) PP-YOLOv2: a practical object detector. arXiv preprint arXiv:2104.10419

Huyan L, Bai Y, Li Y, Jiang D, Zhang Y, Zhou Q, Wei J, Liu J, Zhang Y, Cui T (2021) A lightweight object detection framework for remote sensing images. Remote Sens 13(4):683

Iandola FN, Han S, Moskewicz MW, Ashraf K, Dally WJ, Keutzer K (2016) SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size. arXiv preprint arXiv:1602.07360

Isereau D, Capraro C, Cote E, Barnell M, Raymond C (2017) Utilizing high-performance embedded computing, agile condor, for intelligent processing: An artificial intelligence platform for remotely piloted aircraft. In 2017 Intelligent Systems Conference (IntelliSys) (pp. 1155–1159). IEEE

Jain DK, Zhao X, González-Almagro G, Gan C, Kotecha K (2023) Multimodal pedestrian detection using metaheuristics with deep convolutional neural network in crowded scenes. Inf Fus 95:401–414

Jeong M, Park M, Nam J, Ko BC (2020) Light-weight student LSTM for real-time wildfire smoke detection. Sensors 20(19):5508

Jiang S, Li H, Jin Z (2021) A visually interpretable deep learning framework for histopathological image-based skin cancer diagnosis. IEEE J Biomed Health Inform 25(5):1483–1494

Jiang L, Nie W, Zhu J, Gao X, Lei B (2022) Lightweight object detection network model suitable for indoor mobile robots. J Mech Sci Technol 36(2):907–920

Jiang Y, Li W, Zhang J, Li F, Wu Z (2023) YOLOv4-dense: a smaller and faster YOLOv4 for real-time edge-device based object detection in traffic scene. IET Image Proc 17(2):570–580

Jiang Z, Zhao L, Li S, Jia Y (2020) Real-time object detection method based on improved YOLOv4-tiny. arXiv preprint arXiv:2011.04244

Jiao L, Zhang F, Liu F, Yang S, Li L, Feng Z, Qu R (2019) A survey of deep learning-based object detection. IEEE Access 7:128837–128868

Jin R, Lin D (2019) Adaptive anchor for fast object detection in aerial image. IEEE Geosci Remote Sens Lett 17(5):839–843

Jin Y, Cai J, Xu J, Huan Y, Yan Y, Huang B, Guo Y, Zheng L, Zou Z (2021) Self-aware distributed deep learning framework for heterogeneous IoT edge devices. Futur Gener Comput Syst 125:908–920

Kamal KC, Yin Z, Wu M, Wu Z (2019) Depthwise separable convolution architectures for plant disease classification. Comput Electron Agric 165:104948

Kamath V, Renuka A (2023) Deep learning based object detection for resource constrained devices: systematic review, future trends and challenges ahead. Neurocomputing 531:34–60

Kang H, Zhou H, Wang X, Chen C (2020) Real-time fruit recognition and grasping estimation for robotic apple harvesting. Sensors 20(19):5670

Ke X, Lin X, Qin L (2021) Lightweight convolutional neural network-based pedestrian detection and re-identification in multiple scenarios. Mach Vis Appl 32:1–23

Kim W, Jung WS, Choi HK (2019) Lightweight driver monitoring system based on multi-task mobilenets. Sensors 19(14):3200

Kim K, Jang SJ, Park J, Lee E, Lee SS (2023) Lightweight and energy-efficient deep learning accelerator for real-time object detection on edge devices. Sensors 23(3):1185

Kim KH, Hong S, Roh B, Cheon Y, Park M (2016) Pvanet: Deep but lightweight neural networks for real-time object detection. arXiv preprint arXiv:1608.08021

Kondaveeti HK, Kumaravelu NK, Vanambathina SD, Mathe SE, Vappangi S (2021) A systematic literature review on prototyping with Arduino: applications, challenges, advantages, and limitations. Comput Sci Rev 40:100364

Kong T, Sun F, Liu H, Jiang Y, Li L, Shi J (2020a) Foveabox: beyound anchor-based object detection. IEEE Trans Image Process 29:7389–7398

Kong Z, Xiong F, Zhang C, Fu Z, Zhang M, Weng J, Fan M (2020b) Automated maxillofacial segmentation in panoramic dental X-ray images using an efficient encoder-decoder network. IEEE Access 8:207822–207833

Koubaa A, Ammar A, Kanhouch A, AlHabashi Y (2021) Cloud versus edge deployment strategies of real-time face recognition inference. IEEE Trans Netw Sci Eng 9(1):143–160

Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. Adv Neural Inf Process Syst 25

Kyrkou C (2020) YOLOpeds: efficient real-time single-shot pedestrian detection for smart camera applications. IET Comput Vision 14(7):417–425

Kyrkou C (2021) C 3 Net: end-to-end deep learning for efficient real-time visual active camera control. J Real-Time Image Proc 18(4):1421–1433

Kyrkou C, Theocharides T (2020) EmergencyNet: efficient aerial image classification for drone-based emergency monitoring using atrous convolutional feature fusion. IEEE J Sel Top Appl Earth Observ Remote Sens 13:1687–1699

Lai CY, Wu BX, Shivanna VM, Guo JI (2021) MTSAN: multi-task semantic attention network for ADAS applications. IEEE Access 9:50700–50714

Lan H, Meng J, Hundt C, Schmidt B, Deng M, Wang X, Liu W, Qiao Y, Feng S (2019) FeatherCNN: fast inference computation with TensorGEMM on ARM architectures. IEEE Trans Parallel Distrib Syst 31(3):580–594

Law H, Deng J (2018) Cornernet: detecting objects as paired keypoints. In Proceedings of the European conference on computer vision (ECCV) (pp. 734–750)

Law H, Teng Y, Russakovsky O, Deng J (2019) Cornernet-lite: efficient keypoint based object detection. arXiv preprint arXiv:1904.08900

Li J, Ye J (2023) Edge-YOLO: lightweight infrared object detection method deployed on edge devices. Appl Sci 13(7):4402

Li X, Wang W, Wu L, Chen S, Hu X, Li J, Tang J, Yang J (2020a) Generalized focal loss: learning qualified and distributed bounding boxes for dense object detection. Adv Neural Inf Process Syst 33:21002–21012

Li P, Han L, Tao X, Zhang X, Grecos C, Plaza A, Ren P (2020b) Hashing nets for hashing: a quantized deep learning to hash framework for remote sensing image retrieval. IEEE Trans Geosci Remote Sens 58(10):7331–7345

Li Y, Li M, Qi J, Zhou D, Zou Z, Liu K (2021a) Detection of typical obstacles in orchards based on deep convolutional neural network. Comput Electron Agric 181:105932

Li Z, Liu X, Zhao Y, Liu B, Huang Z, Hong R (2021b) A lightweight multi-scale aggregated model for detecting aerial images captured by UAVs. J Vis Commun Image Represent 77:103058

Li C, Fan Y, Cai X (2021c) PyConvU-Net: a lightweight and multiscale network for biomedical image segmentation. BMC Bioinf 22:1–11

Li T, Wang J, Zhang T (2022a) L-DETR: a light-weight detector for end-to-end object detection with transformers. IEEE Access 10:105685–105692

Li S, Yang Z, Nie H, Chen X (2022b) Corn disease detection based on an improved YOLOX-Tiny network model. Int J Cognit Inform Nat Intell (IJCINI) 16(1):1–8

Li H, Lin Z, Shen X, Brandt J, Hua G (2015) A convolutional neural network cascade for face detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 5325–5334)

Li Z, Peng C, Yu G, Zhang X, Deng Y, Sun J (2017) Light-head r-cnn: In defense of two-stage object detector. arXiv preprint arXiv:1711.07264

Li Y, Li J, Lin W, Li J (2018) Tiny-DSOD: lightweight object detection for resource-restricted usages. arXiv preprint arXiv:1807.11013

Li Y, Chen Y, Wang N, Zhang Z (2019) Scale-aware trident networks for object detection. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 6054–6063)

Liang L, Wang G (2021) Efficient recurrent attention network for remote sensing scene classification. IET Image Proc 15(8):1712–1721

Lin TY, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL (2014) Microsoft coco: Common objects in context. In Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6–12, 2014, Proceedings, Part V 13 (pp. 740–755). Springer International Publishing

Lin TY, Goyal P, Girshick R, He K, Dollár P (2017) Focal loss for dense object detection. In Proceedings of the IEEE international conference on computer vision (pp. 2980–2988)

Lin TY, Dollár P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2117–2125)

Liu L, Ouyang W, Wang X, Fieguth P, Chen J, Liu X, Pietikäinen M (2020a) Deep learning for generic object detection: a survey. Int J Comput Vis 128:261–318

Liu X, Liu B, Liu G, Chen F, Xing T (2020b) Mobileaid: a fast and effective cognitive aid system on mobile devices. IEEE Access 8:101923–101933

Liu J, Li Q, Cao R, Tang W, Qiu G (2020c) MiniNet: an extremely lightweight convolutional neural network for real-time unsupervised monocular depth estimation. ISPRS J Photogramm Remote Sens 166:255–267

Liu X, Li Y, Shuang F, Gao F, Zhou X, Chen X (2020d) ISSD: improved SSD for insulator and spacer online detection based on UAV system. Sensors 20(23):6961

Liu Y, Sun P, Wergeles N, Shang Y (2021a) A survey and performance evaluation of deep learning methods for small object detection. Expert Syst Appl 172:114602

Liu S, Guo B, Ma K, Yu Z, Du J (2021b) AdaSpring: context-adaptive and runtime-evolutionary deep model compression for mobile applications. Proc ACM Interact Mobile Wearable Ubiquitous Technol 5(1):1–22

Liu Z, Ma J, Weng J, Huang F, Wu Y, Wei L, Li Y (2021c) LPPTE: a lightweight privacy-preserving trust evaluation scheme for facilitating distributed data fusion in cooperative vehicular safety applications. Inf Fus 73:144–156

Liu Y, Zhang C, Wu W, Zhang B, Zhou F (2022a) MiniYOLO: a lightweight object detection algorithm that realizes the trade-off between model size and detection accuracy. Int J Intell Syst 37(12):12135–12151

Liu T, Wang J, Huang X, Lu Y, Bao J (2022b) 3DSMDA-Net: an improved 3DCNN with separable structure and multi-dimensional attention for welding status recognition. J Manuf Syst 62:811–822

Liu S, Huang D (2018) Receptive field block net for accurate and fast object detection. In Proceedings of the European conference on computer vision (ECCV) (pp. 385–400)

Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu CY, Berg AC (2016) Ssd: single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14 (pp. 21–37). Springer International Publishing

Long F (2020) Microscopy cell nuclei segmentation with enhanced U-Net. BMC Bioinf 21(1):8

Zhou L, Wei S, Cui Z, Fang J, Yang X, Wei D (2020b) Lira-YOLO: a lightweight model for ship detection in radar images. J Syst Eng Electron 31(5):950–956

Long X, Deng K, Wang G, Zhang Y, Dang Q, Gao Y, Shen H, Ren J, Han S, Ding E, Wen S (2020) PP-YOLO: An effective and efficient implementation of object detector. arXiv preprint arXiv:2007.12099

Lu Y, Zhang L, Xie W (2020) YOLO-compact: an efficient YOLO network for single category real-time object detection. In 2020 Chinese control and decision conference (CCDC) (pp. 1931–1936). IEEE

Luo X, Zhu J, Yu Q (2019) Efficient convNets for fast traffic sign recognition. IET Intel Transport Syst 13(6):1011–1015

Ma N, Yu X, Peng Y, Wang S (2019) A lightweight hyperspectral image anomaly detector for real-time mission. Remote Sens 11(13):1622

Ma M, Ma W, Jiao L, Liu X, Li L, Feng Z, Yang S (2023) A multimodal hyper-fusion transformer for remote sensing image classification. Inf Fus 96:66–79

Ma N, Zhang X, Zheng HT, Sun J (2018) Shufflenet v2: practical guidelines for efficient CNN architecture design. In Proceedings of the European conference on computer vision (ECCV) (pp. 116–131)

Makantasis K, Karantzalos K, Doulamis A, Doulamis N (2015) Deep supervised learning for hyperspectral data classification through convolutional neural networks. In 2015 IEEE international geoscience and remote sensing symposium (IGARSS) (pp. 4959–4962). IEEE

Makkar A, Ghosh U, Rawat DB, Abawajy JH (2021) Fedlearnsp: preserving privacy and security using federated learning and edge computing. IEEE Consumer Electron Mag 11(2):21–27

Mansouri SS, Kanellakis C, Kominiak D, Nikolakopoulos G (2020) Deploying MAVs for autonomous navigation in dark underground mine environments. Robot Auton Syst 126:103472

Mao QC, Sun HM, Liu YB, Jia RS (2019) Mini-YOLOv3: real-time object detector for embedded applications. IEEE Access 7:133529–133538

Mehta S, Rastegari M (2021) Mobilevit: light-weight, general-purpose, and mobile-friendly vision transformer. arXiv preprint arXiv:2110.02178

Mittal P, Singh R, Sharma A (2020) Deep learning-based object detection in low-altitude UAV datasets: a survey. Image vis Comput 104:104046

Muhammad K, Hussain T, Del Ser J, Palade V, De Albuquerque VHC (2019) DeepReS: a deep learning-based video summarization strategy for resource-constrained industrial surveillance scenarios. IEEE Trans Industr Inf 16(9):5938–5947

Nguyen HD, Na IS, Kim SH, Lee GS, Yang HJ, Choi JH (2019) Multiple human tracking in drone image. Multimedia Tools Appl 78:4563–4577

Nguyen TV, Tran AT, Dao NN, Moon H, Cho S (2023) Information fusion on delivery: a survey on the roles of mobile edge caching systems. Inf Fus 89:486–509

Ogden SS, Guo T (2019) Characterizing the deep neural networks inference performance of mobile applications. arXiv preprint arXiv:1909.04783

Ophoff T, Van Beeck K, Goedemé T (2019) Exploring RGB+ Depth fusion for real-time object detection. Sensors 19(4):866

Ouyang Z, Niu J, Liu Y, Guizani M (2019) Deep CNN-based real-time traffic light detector for self-driving vehicles. IEEE Trans Mob Comput 19(2):300–313

Paluru N, Dayal A, Jenssen HB, Sakinis T, Cenkeramaddi LR, Prakash J, Yalavarthy PK (2021) Anam-Net: anamorphic depth embedding-based lightweight CNN for segmentation of anomalies in COVID-19 chest CT images. IEEE Trans Neural Netw Learn Syst 32(3):932–946

Panero Martinez R, Schiopu I, Cornelis B, Munteanu A (2021) Real-time instance segmentation of traffic videos for embedded devices. Sensors 21(1):275

Pang J, Li C, Shi J, Xu Z, Feng H (2019) R2-CNN: fast tiny object detection in large-scale remote sensing images. arXiv preprint arXiv:1902.06042

Paoletti ME, Haut JM, Pereira NS, Plaza J, Plaza A (2021) Ghostnet for hyperspectral image classification. IEEE Trans Geosci Remote Sens 59(12):10378–10393

Picron C, Tuytelaars T (2021) Trident pyramid networks: the importance of processing at the feature pyramid level for better object detection. arXiv preprint arXiv:2110.04004

Ping P, Huang C, Ding W, Liu Y, Chiyomi M, Kazuya T (2023) Distracted driving detection based on the fusion of deep learning and causal reasoning. Inf Fus 89:121–142

Qian S, Ning C, Hu Y (2021) MobileNetV3 for image classification. In 2021 IEEE 2nd International Conference on Big Data, Artificial Intelligence and Internet of Things Engineering (ICBAIE) (pp. 490–497). IEEE

Qin Z, Li Z, Zhang Z, Bao Y, Yu G, Peng Y, Sun J (2019) ThunderNet: towards real-time generic object detection on mobile devices. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 6718–6727)

Qin S, Liu S (2020) Efficient and unified license plate recognition via lightweight deep neural network. IET Image Proc 14(16):4102–4109

Quang TN, Lee S, Song BC (2021) Object detection using improved bi-directional feature pyramid network. Electronics 10(6):746

Ran X, Chen H, Liu Z, Chen J (2017) Delivering deep learning to mobile devices via offloading. In Proceedings of the Workshop on Virtual Reality and Augmented Reality Network (pp. 42–47)

Rani E (2021) LittleYOLO-SPP: a delicate real-time vehicle detection algorithm. Optik 225:165818

Redmon J, Farhadi A (2017) YOLO9000: better, faster, stronger. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 7263–7271)

Redmon J, Farhadi A (2018) Yolov3: an incremental improvement. arXiv preprint arXiv:1804.02767

Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 779–788)

Ren S, He K, Girshick R, Sun J (2015) Faster r-cnn: towards real-time object detection with region proposal networks. Adv Neural Inf Process Syst. https://doi.org/10.1109/TPAMI.2016.2577031

Ren J, Guo Y, Zhang D, Liu Q, Zhang Y (2018) Distributed and efficient object detection in edge computing: challenges and solutions. IEEE Netw 32(6):137–143

Rodriguez-Conde I, Campos C, Fdez-Riverola F (2021) On-device object detection for more efficient and privacy-compliant visual perception in context-aware systems. Appl Sci 11(19):9173

Rui Z, Zhaokui W, Yulin Z (2019) A person-following nanosatellite for in-cabin astronaut assistance: system design and deep-learning-based astronaut visual tracking implementation. Acta Astronaut 162:121–134

Saidi A, Othman SB, Dhouibi M, Saoud SB (2021) FPGA-based implementation of classification techniques: a survey. Integration 81:280–299

Samore A, Rusci M, Lazzaro D, Melpignano P, Benini L, Morigi S (2020) BrightNet: a deep CNN for OLED-based point of care immunofluorescent diagnostic systems. IEEE Trans Instrum Meas 69(9):6766–6775

Sandler M, Howard A, Zhu M, Zhmoginov A, Chen LC (2018) Mobilenetv2: inverted residuals and linear bottlenecks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4510–4520)

Sharma VK, Mir RN (2020) A comprehensive and systematic look up into deep learning based object detection techniques: a review. Comput Sci Rev 38:100301

Shi C, Wang T, Wang L (2020) Branch feature fusion convolution network for remote sensing scene classification. IEEE J Sel Top Appl Earth Observ Remote Sens 13:5194–5210

Shoeibi A, Khodatars M, Jafari M, Ghassemi N, Moridian P, Alizadehsani R, Ling SH, Khosravi A, Alinejad-Rokny H, Lam HK, Fuller-Tyszkiewicz M (2023) Diagnosis of brain diseases in fusion of neuroimaging modalities using deep learning: a review. Inf Fus 93:85–117

Silva SH, Rad P, Beebe N, Choo KKR, Umapathy M (2019) Cooperative unmanned aerial vehicles with privacy preserving deep vision for real-time object identification and tracking. J Parallel Distrib Comput 131:147–160

Song S, Jing J, Huang Y, Shi M (2021) EfficientDet for fabric defect detection based on edge computing. J Eng Fibers Fabr 16:15589250211008346

Steimle F, Wieland M, Mitschang B, Wagner S, Leymann F (2017) Extended provisioning, security and analysis techniques for the ECHO health data management system. Computing 99:183–201

Subedi P, Hao J, Kim IK, Ramaswamy L (2021) AI multi-tenancy on edge: concurrent deep learning model executions and dynamic model placements on edge devices. In 2021 IEEE 14th International Conference on Cloud Computing (CLOUD) (pp. 31–42). IEEE

Sun Y, Pan B, Fu Y (2021) Lightweight deep neural network for real-time instrument semantic segmentation in robot assisted minimally invasive surgery. IEEE Robot Autom Lett 6(2):3870–3877

Tan M, Chen B, Pang R, Vasudevan V, Sandler M, Howard A, Le QV (2019) Mnasnet: Platform-aware neural architecture search for mobile. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 2820–2828)

Tan M, Pang R, Le QV (2020) Efficientdet: scalable and efficient object detection. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 10781–10790)

Tang Q, Li J, Shi Z, Hu Y (2020) Lightdet: a lightweight and accurate object detection network. In ICASSP 2020–2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 2243–2247). IEEE

Tang Z, Liu X, Shen G, Yang B (2020) Penet: object detection using points estimation in aerial images. arXiv preprint arXiv:2001.08247

Tsai WC, Lai JS, Chen KC, Shivanna V, Guo JI (2021) A lightweight motional object behavior prediction system harnessing deep learning technology for embedded adas applications. Electronics 10(6):692

Tzelepi M, Tefas A (2020) Improving the performance of lightweight CNNs for binary classification using quadratic mutual information regularization. Pattern Recogn 106:107407

Uijlings JR, Van De Sande KE, Gevers T, Smeulders AW (2013) Selective search for object recognition. Int J Comput Vis 104:154–171

Ullah A, Muhammad K, Ding W, Palade V, Haq IU, Baik SW (2021) Efficient activity recognition using lightweight CNN and DS-GRU network for surveillance applications. Appl Soft Comput 103:107102

Véstias MP, Duarte RP, de Sousa JT, Neto HC (2020) Moving deep learning to the edge. Algorithms 13(5):125

Wang RJ, Li X, Ling CX (2018) Pelee: a real-time object detection system on mobile devices. Adv Neural Inf Process Syst. https://doi.org/10.48550/arXiv.1804.06882

Wang X, Han Y, Leung VC, Niyato D, Yan X, Chen X (2020a) Convergence of edge computing and deep learning: a comprehensive survey. IEEE Commun Surv Tutor 22(2):869–904

Wang F, Xie F, Shen S, Huang L, Sun R, Le Yang J (2020c) A novel multiface recognition method with short training time and lightweight based on ABASNet and H-softmax. IEEE Access 8:175370–175384

Wang T, Wang P, Cai S, Zheng X, Ma Y, Jia W, Wang G (2021a) Mobile edge-enabled trust evaluation for the Internet of Things. Inf Fus 75:90–100

Wang J, Huang R, Guo S, Li L, Zhu M, Yang S, Jiao L (2021c) NAS-guided lightweight multiscale attention fusion network for hyperspectral image classification. IEEE Trans Geosci Remote Sens 59(10):8754–8767

Wang D, Ren J, Wang Z, Zhang Y, Shen XS (2022a) PrivStream: a privacy-preserving inference framework on IoT streaming data at the edge. Inf Fus 80:282–294

Wang G, Ding H, Li B, Nie R, Zhao Y (2022b) Trident-YOLO: improving the precision and speed of mobile device object detection. IET Image Proc 16(1):145–157

Wang Y, Wang J, Zhang W, Zhan Y, Guo S, Zheng Q, Wang X (2022c) A survey on deploying mobile deep learning applications: a systemic and technical perspective. Digit Commun Netw 8(1):1–17

Wang X, Zhao Q, Jiang P, Zheng Y, Yuan L, Yuan P (2022d) LDS-YOLO: a lightweight small object detection method for dead trees from shelter forest. Comput Electron Agric 198:107035

Wang C, Wang Z, Li K, Gao R, Yan L (2023b) Lightweight object detection model fused with feature pyramid. Multimedia Tools Appl 82(1):601–618

Wang K, Liew JH, Zou Y, Zhou D, Feng J (2019) Panet: few-shot image semantic segmentation with prototype alignment. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 9197–9206)

Wang CY, Liao HYM, Wu YH, Chen PY, Hsieh JW, Yeh IH (2020) CSPNet: a new backbone that can enhance learning capability of CNN. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops (pp. 390–391).

Wang CY, Bochkovskiy A, Liao HYM (2021) Scaled-yolov4: scaling cross stage partial network. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 13029–13038)

Wang CY, Bochkovskiy A, Liao HYM (2023) YOLOv7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 7464–7475)

Wu Q, Wang H, Liu Y, Zhang L, Gao X (2019) SAT: single-shot adversarial tracker. IEEE Trans Industr Electron 67(11):9882–9892

Wu X, Sahoo D, Hoi SC (2020) Recent advances in deep learning for object detection. Neurocomputing 396:39–64

Wu Y, Feng S, Huang X, Wu Z (2021) L4Net: an anchor-free generic object detector with attention mechanism for autonomous driving. IET Comput Vision 15(1):36–46

Xiao Y, Tian Z, Yu J, Zhang Y, Liu S, Du S, Lan X (2020) A review of object detection based on deep learning. Multimedia Tools Appl 79:23729–23791

Xu D, Wu Y (2021) FE-YOLO: a feature enhancement network for remote sensing target detection. Remote Sens 13(7):1311

Xu Z, Liu W, Huang J, Yang C, Lu J, Tan H (2020) Artificial intelligence for securing IoT services in edge computing: a survey. Secur Commun Netw 2020(1):8872586

Xu C, Zhu G, Shu J (2021) A lightweight and robust lie group-convolutional neural networks joint representation for remote sensing scene classification. IEEE Trans Geosci Remote Sens 60:1–15

Xu M, Liu J, Liu Y, Lin F X, Liu Y, Liu X (2019) A first look at deep learning apps on smartphones. In The World Wide Web Conference (pp. 2125–2136)

Xu S, Wang X, Lv W, Chang Q, Cui C, Deng K, Wang G, Dang Q, Wei S, Du Y, Lai B (2022) PP-YOLOE: an evolved version of YOLO. arXiv preprint arXiv:2203.16250

Yang Z, Rothkrantz L (2011) Surveillance system using abandoned object detection. In Proceedings of the 12th international conference on computer systems and technologies (pp. 380–386)

Yang Z, Liu S, Hu H, Wang L, Lin S (2019) Reppoints: point set representation for object detection. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 9657–9666)

Yi Z, Yongliang S, Jun Z (2019) An improved tiny-yolov3 pedestrian detection algorithm. Optik 183:17–23

Yin R, Zhao W, Fan X, Yin Y (2020) AF-SSD: an accurate and fast single shot detector for high spatial remote sensing imagery. Sensors 20(22):6530

Yin T, Chen W, Liu B, Li C, Du L (2023) Light “You Only Look Once”: an improved lightweight vehicle-detection model for intelligent vehicles under dark conditions. Mathematics 12(1):124

Yu J, Jiang Y, Wang Z, Cao Z, Huang T (2016) Unitbox: an advanced object detection network. In Proceedings of the 24th ACM International Conference on Multimedia (pp. 516–520)

Yu G, Chang Q, Lv W, Xu C, Cui C, Ji W, Dang Q, Deng K, Wang G, Du Y, Lai B, Ma Y (2021) PP-PicoDet: a better real-time object detector on mobile devices. arXiv preprint arXiv:2111.00902

Yuan F, Zhang L, Wan B, Xia X, Shi J (2019) Convolutional neural networks based on multi-scale additive merging layers for visual smoke recognition. Mach Vis Appl 30:345–358

Zaidi S, Ansari SA, Aslam MS, Kanwal N, Asghar M, Lee B (2022) A survey of modern deep learning based object detection models. Digit Sig Process 126:103514

Zhang S, Wang X, Lei Z, Li SZ (2019a) Faceboxes: a CPU real-time and accurate unconstrained face detector. Neurocomputing 364:297–309

Zhang Y, Liu M, Chen Y, Zhang H, Guo Y (2019b) Real-time vision-based system of fault detection for freight trains. IEEE Trans Instrum Meas 69(7):5274–5284

Zhang X, Lin X, Zhang Z, Dong L, Sun X, Sun D, Yuan K (2020b) Artificial intelligence medical ultrasound equipment: application of breast lesions detection. Ultrason Imaging 42(4–5):191–202

Zhang S, Li Y, Liu X, Guo S, Wang W, Wang J, Ding B, Wu D (2020c) Towards real-time cooperative deep inference over the cloud and edge end devices. Proc ACM Interact Mobile Wearable Ubiquitous Technol 4(2):1–24

Zhang Y, Zhang H, Huang Q, Han Y, Zhao M (2024) DsP-YOLO: an anchor-free network with DsPAN for small object detection of multiscale defects. Expert Syst Appl 241:122669

Zhang S, Wen L, Bian X, Lei Z, Li SZ (2018) Single-shot refinement neural network for object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4203–4212)

Zhang X, Zhou X, Lin M, Sun J (2018) Shufflenet: an extremely efficient convolutional neural network for mobile devices. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 6848–6856)

Zhang S, Chi C, Yao Y, Lei Z, Li SZ (2020) Bridging the gap between anchor-based and anchor-free detection via adaptive training sample selection. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 9759–9768)

Zhao ZQ, Zheng P, Xu ST, Wu X (2019) Object detection with deep learning: a review. IEEE Trans Neural Netw Learn Syst 30(11):3212–3232

Zhao H, Zhou Y, Zhang L, Peng Y, Hu X, Peng H, Cai X (2020a) Mixed YOLOv3-LITE: a lightweight real-time object detection method. Sensors 20(7):1861

Zhao Z, Zhang Z, Xu X, Xu Y, Yan H, Zhang L (2020b) A lightweight object detection network for real-time detection of driver handheld call on embedded devices. Comput Intell Neurosci 2020(1):6616584

Zhao Y, Yin Y, Gui G (2020c) Lightweight deep learning based intelligent edge surveillance techniques. IEEE Trans Cognit Commun Netw 6(4):1146–1154

Zheng G, Chai WK, Duanmu JL, Katos V (2023) Hybrid deep learning models for traffic prediction in large-scale road networks. Inf Fus 92:93–114

Zhou Y (2024) A YOLO-NL object detector for real-time detection. Expert Syst Appl 238:122256

Zhou T, Fan DP, Cheng MM, Shen J, Shao L (2021a) RGB-D salient object detection: a survey. Comput Visual Media 7:37–69

Zhou X, Li X, Hu K, Zhang Y, Chen Z, Gao X (2021b) ERV-Net: an efficient 3D residual neural network for brain tumor segmentation. Expert Syst Appl 170:114566

Zhou L, Rao X, Li Y, Zuo X, Qiao B, Lin Y (2022) A lightweight object detection method in aerial images based on dense feature fusion path aggregation network. ISPRS Int J Geo Inf 11(3):189

Zhou X, Wang D, Krähenbühl P (2019) Objects as points. arXiv preprint arXiv:1904.07850

Zhou X, Zhuo J, Krahenbuhl P (2019) Bottom-up object detection by grouping extreme and center points. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 850–859)

Zhou L, Wei S, Cui Z, Ding W (2019) YOLO-RD: a lightweight object detection network for range doppler radar images. In IOP Conference Series: Materials Science and Engineering (Vol. 563, No. 4, p. 042027). IOP Publishing

Zhu Z, He X, Qi G, Li Y, Cong B, Liu Y (2023) Brain tumor segmentation based on the fusion of deep semantics and edge information in multimodal MRI. Inf Fus 91:376–387

Zitnick CL, Dollár P (2014) Edge boxes: locating object proposals from edges. In Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6–12, 2014, Proceedings, Part V 13 (pp. 391–405). Springer International Publishing

Zou Z, Chen K, Shi Z, Guo Y, Ye J (2023) Object detection in 20 years: a survey. Proc IEEE 111(3):257–276

Author information

Authors and affiliations

CSED, Thapar Institute of Engineering & Technology, Patiala, India

Payal Mittal


Contributions

Payal Mittal is the sole author of this manuscript.

Corresponding author

Correspondence to Payal Mittal.

Ethics declarations

Competing interests

The author declares no competing interests.

Consent for publication

During the preparation of this work, the author did not use generative AI or AI-assisted technologies in the writing of this manuscript. The author reviewed and edited the content manually and takes full responsibility for the content of the publication.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access. This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article

Mittal, P. A comprehensive survey of deep learning-based lightweight object detection models for edge devices. Artif Intell Rev 57, 242 (2024). https://doi.org/10.1007/s10462-024-10877-1

Accepted: 25 July 2024

Published: 10 August 2024

DOI: https://doi.org/10.1007/s10462-024-10877-1

  • Deep learning
  • Lightweight networks
  • Object detection
  • Computer vision
  • Edge devices
  • Computing power



Related literature

  1. Literature Review of Deep Network Compression

    We presented a comprehensive, detailed review of recent works on compressing and accelerating deep neural networks. Popular methods such as pruning methods, quantization methods, and low-rank factorization methods were described. We hope this paper can act as a keystone for future research on deep network compression.

  6. Deep neural networks compression: A comparative survey and choice

    The state-of-the-art performance for several real-world problems is currently reached by deep and, in particular, convolutional neural networks (CNN). Such learning models exploit recent results in the field of deep learning, leading to highly performing, yet very large neural networks with typically millions to billions of parameters.

  8. A Survey on Deep Neural Network Compression: Challenges, Overview, and Solutions

    After a thorough review of the existing literature on DNN compression, we come up with five broad categories, i.e., network pruning, sparse representation, bits precision, knowledge distillation, and miscellaneous techniques.

  9. [2010.03954] A Survey on Deep Neural Network Compression: Challenges

    A Survey on Deep Neural Network Compression: Challenges, Overview, and Solutions. Rahul Mishra, Hari Prabhat Gupta, Tanima Dutta. Deep Neural Network (DNN) has gained unprecedented performance due to its automated feature extraction capability. This high order performance leads to significant incorporation of DNN models in different Internet of Things (IoT) applications in the past decade.

  10. A Survey of Deep Neural Network Compression

    This paper reviews and summarizes current mainstream methods of compressing deep neural networks. We divide these methods into three categories: weight compression, local compression, and global compression. In addition, we compare and analyze the results of different compression methods on the dataset.

  13. Compression of deep neural networks: bridging the gap between

    Recently, many studies have been carried out on model compression to handle the high computational cost and high memory footprint brought by the implementation of deep neural networks. In this paper, model compression of convolutional neural networks is constructed as a multiobjective optimization problem with two conflicting objectives, reducing the model size and improving the performance. A ...

  15. An Overview of Deep Neural Network Model Compression

    An Overview of Deep Neural Network Model Compression. Abstract: In recent years, with the rapid development of machine learning, deep neural networks have achieved great success in the fields of computer vision and natural language processing. However, this remarkable performance has come with a huge number of parameters and high computational cost.

  17. Implications of Deep Compression with Complex Neural Networks

    In this paper, we apply the principles of deep compression to multiple complex networks using Keras with TensorFlow 2 in order to make the models more suitable for deployment on embedded devices and other devices with limited resources. We apply deep compression to three complex neural networks, including CNN (Convolutional Neural Network) and RNN (Recurrent Neural Network).

  18. Neural Network Compression via Sparse Optimization

    Literature Review: There have been numerous efforts devoted to network compression (Buciluă et al., 2006) to achieve the speedup and efficient model inference, where the studies are largely evolved into (i) weight pruning, (ii) quantization, and (iii) knowledge distillation.

  19. Deep Neural Networks Model Compression and Acceleration: A Survey

    In this paper, we present a comprehensive survey of recent approaches in deep neural networks model compression and acceleration. We classify these approaches into five categories: network quantization, network pruning, low-rank approximation, knowledge distillation and compact network design. In general, the computational complexity of deep neural networks is dominated by the convolutional layers.

  20. An Overview of Neural Network Compression

    An Overview of Neural Network Compression. Overparameterized networks trained to convergence have shown impressive performance in domains such as computer vision and natural language processing. Pushing the state of the art on salient tasks within these domains corresponds to these models becoming larger.

  23. Deep Architectures for Image Compression: A Critical Review

    The paper aimed to review over a hundred recent state-of-the-art techniques exploiting mostly lossy image compression using deep learning architectures. These deep learning algorithms consist of various architectures like CNN, RNN, GAN, autoencoders and variational autoencoders.

  24. A comprehensive survey of deep learning-based lightweight object

    Assessing the usefulness of deep learning-based lightweight object detection on edge devices requires more than a basic review of the literature; this survey therefore offers a comprehensive examination of existing work toward these objectives.

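Several of the surveys above group compression techniques into network pruning, quantization, and low-rank factorization. As a concrete illustration of the simplest of these, one-shot magnitude pruning, here is a minimal NumPy sketch; the function name and the threshold rule are illustrative, not taken from any of the surveyed papers:

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude weights so that roughly
    `sparsity` fraction of the entries become zero."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)                  # number of weights to drop
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]   # k-th smallest magnitude
    mask = np.abs(weights) > threshold             # keep only larger weights
    return weights * mask

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4))
pruned = magnitude_prune(w, sparsity=0.5)
print(float(np.mean(pruned == 0)))   # fraction of zeroed weights, here 0.5
```

In practice, as the surveys note, pruning is usually followed by fine-tuning to recover accuracy and is often applied iteratively rather than in one shot.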
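The "bits precision" (quantization) family mentioned above reduces storage by representing each weight with fewer bits. A minimal sketch of uniform affine quantization to 8 bits, assuming NumPy; the helper names are illustrative:

```python
import numpy as np

def quantize_uint8(x: np.ndarray):
    """Map floats in [min(x), max(x)] onto the integers 0..255."""
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    q = np.round((x - lo) / scale).astype(np.uint8)   # 1 byte per weight
    return q, scale, lo

def dequantize(q: np.ndarray, scale: float, lo: float) -> np.ndarray:
    """Recover an approximation of the original floats."""
    return q.astype(np.float64) * scale + lo

rng = np.random.default_rng(1)
w = rng.normal(size=1000)
q, scale, lo = quantize_uint8(w)
w_hat = dequantize(q, scale, lo)
print(np.max(np.abs(w - w_hat)) <= scale)  # rounding error bounded by one step
```

Relative to 32-bit floats this stores 4x fewer bytes per weight; practical schemes described in the surveys additionally calibrate per-channel scales or quantize activations.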
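Low-rank factorization, also covered by several surveys above, replaces a dense weight matrix W with a product of two thin matrices. A sketch using truncated SVD, assuming NumPy; the function name is illustrative:

```python
import numpy as np

def low_rank_factorize(W: np.ndarray, rank: int):
    """Approximate W (m x n) as A @ B with A (m x rank), B (rank x n)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]   # absorb singular values into A
    B = Vt[:rank, :]
    return A, B

rng = np.random.default_rng(2)
W = rng.normal(size=(64, 32))
A, B = low_rank_factorize(W, rank=8)
# parameter count drops from 64*32 = 2048 to (64 + 32)*8 = 768
print(A.shape, B.shape)   # (64, 8) (8, 32)
```

In a network, a fully connected layer with weights W would then be replaced by two smaller layers (apply B, then A), trading a small amount of accuracy for fewer parameters and multiply-accumulates.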