• Research article
  • Open access
  • Published: 15 February 2021

Systematic literature review of machine learning methods used in the analysis of real-world data for patient-provider decision making

  • Alan Brnabic 1 &
  • Lisa M. Hess   ORCID: orcid.org/0000-0003-3631-3941 2  

BMC Medical Informatics and Decision Making, volume 21, Article number: 54 (2021)


Background

Machine learning is a broad term encompassing a number of methods that allow the investigator to learn from the data. These methods may permit large real-world databases to be more rapidly translated to applications to inform patient-provider decision making.

Methods

This systematic literature review was conducted to identify published observational research that employed machine learning to inform decision making at the patient-provider level. The search strategy was implemented and studies meeting eligibility criteria were evaluated by two independent reviewers. Relevant data related to study design, statistical methods, and strengths and limitations were identified; study quality was assessed using a modified version of the Luo checklist.

Results

A total of 34 publications from January 2014 to September 2020 were identified and evaluated for this review. Diverse methods, statistical packages, and approaches were used across the identified studies. The most common methods included decision tree and random forest approaches. Most studies applied internal validation, but only two conducted external validation. Most studies utilized one algorithm, and only eight studies applied multiple machine learning algorithms to the data. Seven items on the Luo checklist failed to be met by more than 50% of published studies.

Conclusions

A wide variety of approaches, algorithms, statistical software, and validation strategies were employed in the application of machine learning methods to inform patient-provider decision making. There is a need to ensure that multiple machine learning approaches are used, that the model selection strategy is clearly defined, and that both internal and external validation are performed, so that decisions for patient care are made with the highest quality evidence. Future work should routinely employ ensemble methods incorporating multiple machine learning algorithms.


Background

Traditional methods of analyzing large real-world databases (big data) and other observational studies focus on outcomes that inform at the population level. The findings from real-world studies are relevant to populations as a whole, but the ability to predict or provide meaningful evidence at the patient level is much less well established, due to the complexity of clinical decision making and the variety of factors taken into account by the health care provider [1, 2]. Using traditional methods that produce population estimates and measures of variability, it is very challenging to accurately predict how any one patient will fare, even when applying findings from subgroup analyses. The care of patients is nuanced, and multiple non-linear, interconnected factors must be taken into account in decision making. When the available data are relevant only at the population level, health care decision making is less informed as to the optimal course of care for a given patient.

Clinical prediction models are an approach to utilizing patient-level evidence to help inform healthcare decision makers about patient care. These models, also known as prediction rules or prognostic models, have been used for decades by health care professionals [3]. Traditionally, these models combine patient demographic, clinical, and treatment characteristics in the form of a statistical or mathematical model (usually regression, classification, or neural networks), but deal with a limited number of predictor variables (usually below 25). The Framingham Heart Study is a classic example of the use of longitudinal data to build a traditional decision-making model. Multiple risk calculators and estimators have been built to predict a patient's risk of a variety of cardiovascular outcomes, such as atrial fibrillation and coronary heart disease [4, 5, 6]. In general, these studies use multivariable regression to evaluate risk factors identified in the literature. Based on these findings, a scoring system is derived for each factor to predict the likelihood of an adverse outcome based on a patient's score across all risk factors evaluated.

With the advent of more complex data collection and readily available datasets for patients in routine clinical care, both sample sizes and the number of potential predictor variables (such as genomic data) can exceed the tens of thousands, establishing the need for alternative approaches that can rapidly process large amounts of information. Artificial intelligence (AI), and particularly machine learning methods (a subset of AI), are increasingly being utilized in clinical research for prediction models, pattern recognition, and deep-learning techniques that combine complex information, such as genomic and clinical data [7, 8, 9]. In the health care sciences, these methods are applied in place of a human expert to perform tasks that would otherwise take considerable time and expertise and would likely be prone to error. The underlying concept is that a machine will learn by trial and error from the data itself, making predictions without a pre-defined set of rules for decision making. Put simply, machine learning can be understood as "learning from data" [8].

There are two types of learning from data: unsupervised and supervised. Unsupervised learning draws inferences from datasets consisting of input data without labelled responses; the most common unsupervised learning method is cluster analysis, which is used in exploratory data analysis to find hidden patterns or groupings in data. Supervised learning involves making a prediction based on a set of pre-specified input and output variables. A number of statistical tools are used for supervised learning. Some examples include traditional statistical prediction methods such as regression models (e.g., regression splines, projection pursuit regression, penalized regression) that involve fitting a model to data, evaluating the fit, and estimating parameters that are later used in a predictive equation. Other tools include tree-based methods (e.g., classification and regression trees [CART] and random forests), which successively partition a dataset based on the relationships between predictor variables and a target (outcome) variable. Further examples include neural networks, discriminant functions and linear classifiers, and support vector classifiers and machines. Often, predictive tools are built using various forms of model aggregation (or ensemble learning) that may combine models based on resampled or re-weighted datasets; these different types of models can be fitted to the same data using model averaging.
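To make these paradigms concrete, the following minimal sketch (illustrative only, using scikit-learn on synthetic data rather than any dataset from the reviewed studies) contrasts unsupervised cluster analysis with supervised tree-based methods such as CART and random forests:

```python
# Minimal sketch: unsupervised vs. supervised learning with scikit-learn.
# Synthetic data; all parameter choices here are illustrative assumptions.
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic "cohort": 500 patients, 10 candidate predictors, binary outcome
X, y = make_classification(n_samples=500, n_features=10, n_informative=5,
                           random_state=0)

# Unsupervised: cluster analysis finds structure without using the outcome y
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Supervised: CART partitions the data on predictor-outcome relationships;
# a random forest aggregates many such trees (a form of ensemble learning)
cart = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
print(cart.score(X, y), forest.score(X, y))  # in-sample accuracy only
```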

Classical statistical regression methods used for prediction modeling are well understood in the statistical sciences and by the scientific community that employs them. These methods tend to be transparent and are usually hypothesis driven, but they can overlook complex associations and offer limited flexibility when a large number of variables are investigated. In addition, when using classic regression modeling, choosing the 'right' model is not straightforward. Machine learning algorithms and approaches may overcome some of these limitations of classical regression models in this new era of big data, but they are not a complete solution, as they must be considered in the context of the limitations of the data used in the analysis [2].

While machine learning methods can be used for population-based models as well as for informed patient-provider decision making, it is important to note that the data, model, and outputs used to inform the care of an individual patient must meet the highest standards of research quality, as the choice made will likely have an impact on both short- and long-term patient outcomes. While a range of uncertainty can be expected for population-based estimates, the risk of error for patient-level models must be minimized to ensure quality patient care. The risks and concerns of utilizing machine learning for individual patient decision making have been raised by ethicists [10]. These risks include, but are not limited to, a lack of transparency, limited data regarding the confidence of the findings, and the potential reduction of patient autonomy in choice by relying on data that may foster a more paternalistic model of healthcare. These are all important and valid concerns; therefore, machine learning for patient care must meet the highest standards to ensure that shared, not simply informed, evidence-based decision making is supported by these methods.

A systematic literature review published in 2018 evaluated the statistical methods that have been used to enable large, real-world databases to inform decisions at the patient-provider level [11]. Briefly, that study identified a total of 115 articles, most commonly evaluating the use of logistic regression (n = 52, 45.2%), Cox regression (n = 24, 20.9%), and linear regression (n = 17, 14.8%). Interestingly, however, several studies were observed to utilize novel statistical approaches, such as machine learning, recursive partitioning, and the development of mathematical algorithms to predict patient outcomes. More recently, publications are emerging that describe the use of Individualized Treatment Recommendation algorithms and Outcome Weighted Learning for personalized medicine using large observational databases [12, 13]. This systematic literature review was therefore designed to pursue this observation further, to more comprehensively evaluate the use of machine learning methods to support patient-provider decision making, and to critically evaluate the strengths and weaknesses of these methods. For the purposes of this work, data supporting patient-provider decision making were defined as those providing information specifically on a treatment or intervention choice; while both population-based and risk estimator data are certainly valuable for patient care and decision making, this study was designed to evaluate data that would specifically inform a choice made by the patient with the provider. The overarching goal is to provide evidence of how large datasets can be used to inform decisions at the patient level using machine learning-based methods, and to evaluate the quality of such work in supporting informed decision making.

Methods

This study originated from a systematic literature review that was conducted in MEDLINE and PsychInfo; a refreshed search was conducted in September 2020 to obtain newer publications (Table 1). Eligible studies were those that analyzed prospective or retrospective observational data, reported quantitative results, and described statistical methods specifically applicable to patient-level decision making. Specifically, patient-level decision making referred to studies that provided data for or against a particular intervention at the patient level, so that the data could be used to inform decision making at the patient-provider level. Studies did not meet this criterion if only population-based estimates, mortality risk predictors, or satisfaction with care were evaluated. Additionally, studies designed to improve diagnostic tools and those evaluating health care system quality indicators did not meet the patient-provider decision-making criterion. Eligible statistical methods for this study were limited to machine learning-based approaches.

Eligibility was assessed by two reviewers and any discrepancies were discussed; a third reviewer was available to serve as a tie breaker in case of differing opinions. The final set of eligible publications was then abstracted into a Microsoft Excel document. Study quality was evaluated using a modified Luo scale, which was developed specifically as a tool to standardize high-quality publication of machine learning models [14]. In the modified version used for this study, the optional item was removed, and three items were renamed to state more succinctly what was being evaluated under each criterion: item 6 (define the prediction problem) was redefined as "define the model," item 7 (prepare data for model building) was renamed "model building and validation," and item 8 (build the predictive model) was renamed "model selection."

Data were abstracted, and both the extracted data and the Luo checklist items were reviewed and verified by a second reviewer to ensure data comprehensiveness and quality. In all cases of differences in eligibility assessment or data entry, the reviewers met and reached agreement on the final set of data to be included in the database for data synthesis, with a third reviewer utilized as a tie breaker in case of discrepancies. Data were summarized descriptively and qualitatively, based on the following categories: publication and study characteristics; patient characteristics; statistical methodologies used, including statistical software packages; strengths and weaknesses; and interpretation of findings.

Results

The search strategy was run on September 1, 2020 and identified a total of 34 publications that utilized machine learning methods for individual patient-level decision making (Fig. 1). The most common reason for study exclusion, as expected, was that the study did not meet the patient-level decision making criterion. A summary of the characteristics of eligible studies and the patient data is included in Table 2. Most of the real-world data sources were retrospective databases or designs (n = 27, 79.4%), primarily utilizing electronic health records. Six analyses utilized prospective cohort studies and one utilized data from a cross-sectional study.

Figure 1: PRISMA diagram of screening and study identification

General approaches to machine learning

The types of classification or prediction machine learning algorithms are reported in Table 2. These included decision tree/random forest analyses (19 studies) [15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33] and neural networks (19 studies) [24, 25, 26, 27, 28, 29, 30, 32, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44]. Other approaches included latent growth mixture modeling [45], support vector machine classifiers [46], LASSO regression [47], boosting methods [23], and a novel Bayesian approach [26, 40, 48]. Within the analytical approaches to support machine learning, a variety of methods were used to evaluate model fit, such as the Akaike Information Criterion, the Bayesian Information Criterion, and the Lo-Mendell-Rubin likelihood ratio test [22, 45, 47]. While most studies reported the area under the curve (AUC) of receiver operating characteristic (ROC) curves (Table 3), analyses also included sensitivity/specificity [16, 19, 24, 30, 41, 42, 43], positive predictive value [21, 26, 32, 38, 40, 41, 42, 43], and a variety of less common approaches such as the geometric mean [16], the Matthews correlation coefficient (which ranges from −1.0, completely erroneous prediction, to +1.0, perfect prediction) [46], defining true/false negatives/positives by means of a confusion matrix [17], calculating the root mean square error of the predicted versus original outcome profiles [37], and identifying the model with the best average training performance and cross-validation performance [36].
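As a hedged illustration of how these performance measures relate to one another, the sketch below computes the metrics most often reported across the reviewed studies (AUC, sensitivity/specificity, positive predictive value, and the Matthews correlation coefficient) for a toy set of predictions; the values are arbitrary and for demonstration only:

```python
# Sketch: common performance measures computed with scikit-learn for a toy
# set of predictions; y_true and y_prob are arbitrary illustrative values.
from sklearn.metrics import (confusion_matrix, matthews_corrcoef,
                             precision_score, recall_score, roc_auc_score)

y_true = [0, 0, 1, 1, 1, 0, 1, 0]                   # observed outcomes
y_prob = [0.1, 0.4, 0.8, 0.7, 0.3, 0.2, 0.9, 0.6]   # predicted probabilities
y_pred = [int(p >= 0.5) for p in y_prob]            # classes at a 0.5 cutoff

auc = roc_auc_score(y_true, y_prob)                 # AUC of the ROC curve
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = recall_score(y_true, y_pred)          # tp / (tp + fn)
specificity = tn / (tn + fp)
ppv = precision_score(y_true, y_pred)               # positive predictive value
mcc = matthews_corrcoef(y_true, y_pred)             # ranges from -1.0 to +1.0
```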

Statistical software packages

The statistical programs used to perform machine learning varied widely across these studies; no consistencies were observed (Table 2). As noted above, one study using decision tree analysis used Quinlan's C5.0 decision tree algorithm [15], while a second used an earlier version of this program (C4.5) [20]. Other decision tree analyses utilized various versions of R [18, 19, 22, 24, 27, 47], International Business Machines (IBM) Statistical Package for the Social Sciences (SPSS) [16, 17, 33, 47], or the Azure Machine Learning Platform [30], or programmed the model in Python [23, 25, 46]. Artificial neural network analyses used Neural Designer [34] or Statistica V10 [35]. Six studies did not report the software used for analysis [21, 31, 32, 37, 41, 42].

Families of machine learning algorithms

As also summarized in Table 2, more than one third of all publications (n = 13, 38.2%) applied only one family of machine learning algorithms to model development [16, 17, 18, 19, 20, 34, 37, 41, 42, 43, 46, 48], and only four studies utilized five or more methods [23, 25, 28, 45]. One study applied an ensemble of six different algorithms, with the software set to run 200 iterations [23], and another ran seven algorithms [45].

Internal and external validation

Evaluation of study publication quality identified the most common gap in publications as the lack of external validation, which was conducted by only two studies [ 15 , 20 ]. Seven studies predefined the success criteria for model performance [ 20 , 21 , 23 , 35 , 36 , 46 , 47 ], and five studies discussed the generalizability of the model [ 20 , 23 , 34 , 45 , 48 ]. Six studies [ 17 , 18 , 21 , 22 , 35 , 36 ] discussed the balance between model accuracy and model simplicity or interpretability, which was also a criterion of quality publication in the Luo scale [ 14 ]. The items on the checklist that were least frequently met are presented in Fig.  2 . The complete quality assessment evaluation for each item in the checklist is included in Additional file 1 : Table S1.

Figure 2: Least frequently met study quality items, modified Luo scale [14]

There were a variety of approaches taken to validate the models developed (Table 3). Internal validation was performed in all studies, most commonly by splitting the data into training, testing, and/or validation datasets. Cohort splitting was conducted in multiple ways: a 2:1 split [26], a 60/40 split [21, 36], a 70/30 split [16, 17, 22, 30, 33, 35], a 75/25 split [27, 40], an 80/20 split [46], a 90/10 split [25, 29], splitting the data based on site of care [48], a 2/1/1 split for training, testing, and validation [38], and a 60/20/20 split, where the third group was used for model selection prior to validation [34]. Nine studies did not specifically mention the splitting approach used [15, 18, 19, 20, 24, 29, 39, 45, 47], but most of those noted the use of k-fold cross-validation. One training set corresponded to 90% of the sample [23], whereas a second study was less clear, as input data were at the observation level with multiple observations per patient, and 3 of the 15 patients were included in the training set [37]. The remaining studies did not specifically state that the data were split into testing and validation samples, but most specified that they performed five-fold cross-validation (including one that generally mentioned cohort splitting) [18, 45] or ten-fold cross-validation [15, 19, 20, 28].
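For readers unfamiliar with these designs, the sketch below (synthetic data; split ratios taken from the studies above, everything else an illustrative assumption) shows how a 70/30 split and a 60/20/20 training/validation/testing design can be implemented:

```python
# Sketch: cohort-splitting designs reported in the reviewed studies,
# implemented with scikit-learn on synthetic data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)

# 70/30 split into training and testing sets, stratified on the outcome
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0)

# 60/20/20 design: hold out 20% first, then split the rest 75/25,
# leaving 60% for training, 20% for model selection, and 20% for testing
X_rest, X_val, y_rest, y_val = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_rest, y_rest, test_size=0.25, stratify=y_rest, random_state=0)
```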

External validation was conducted in only two studies (5.9%). Hische and colleagues conducted a decision tree analysis designed to identify patients with impaired fasting glucose [20]. Their model was developed in a cohort of patients from the Berlin Potsdam Cohort Study (n = 1527) and was found to have a positive predictive value of 56.2% and a negative predictive value of 89.1%. The model was then tested on an independent cohort from the Dresden Cohort (n = 1998) with a family history of type II diabetes. In external validation, the positive predictive value was 43.9% and the negative predictive value was 90.4% [20]. Toussi and colleagues conducted both internal and external validation in their decision tree analysis evaluating individual physician prescribing behaviors using a database of 463 patient electronic medical records [15]. For the internal validation step, the cross-validation option of Quinlan's C5.0 decision tree learning algorithm was used, as the study sample was too small to split into testing and validation samples; external validation was conducted by comparing outcomes to published treatment guidelines. Unfortunately, they found little concordance between physician behavior and guidelines, potentially because the timing of the data did not match the time period in which the guidelines were implemented, emphasizing the need for a contemporaneous external control [15].

Handling of missing values

Missing values were addressed in most studies (n = 21, 61.8%) in this review, but the thirteen remaining studies did not mention whether there were missing data or how they were handled (Table 3). Among those that reported methods related to missing data, a wide variety of approaches were used for real-world datasets. The full information maximum likelihood method was used to estimate model parameters in the presence of missing data during model development by Hertroijs and colleagues, but patients with missing covariate values at baseline were excluded from the validation of the model [45]. One study included missing covariate values in models as a discrete category [48]. Four studies removed patients with missing data from the model [17, 36, 46, 47], resulting in the loss of 16-41% of the sample in three of these studies [17, 36, 47]. In a study of diabetes, missing data for primary outcome variables were reported for 59% (men) and 70% (women) of the sample [16]. In that study, single imputation was used: CART (IBM SPSS Modeler V14.2.03) for continuous variables and the weighted k-nearest neighbor approach in RapidMiner (V5) for categorical variables [16]. Other studies reported exclusion but not its specific impact on sample size [29, 31, 38, 44]. Imputation was conducted in a variety of ways in studies with missing data [22, 25, 28, 33]. Single imputation was used in the study by Bannister and colleagues, followed by multiple imputation in the final model to evaluate differences in model parameters [22]. One study imputed with a standard last-observation-carried-forward approach [26]. Spline techniques were used to impute missing data in the training set of one study [37]. Missingness was largely retained as an informative variable, with only variables missing for 85% or more of participants excluded, by Alaa et al. [23], while Hearn et al. used a combination of imputation and exclusion strategies [40]. Lastly, missing or incomplete data were imputed using a model-based approach by Toussi et al. [15] and an optimal-impute algorithm by Bertsimas et al. [21].
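To illustrate the flavor of these strategies, the sketch below shows single mean imputation and a distance-weighted k-nearest-neighbor imputation using scikit-learn; this is an analogy to, not a reproduction of, the CART and weighted KNN imputations reported above, and the data values are arbitrary:

```python
# Sketch: two imputation strategies analogous to those reported in the
# reviewed studies; the array values are arbitrary illustrative data.
import numpy as np
from sklearn.impute import KNNImputer, SimpleImputer

X = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [7.0, np.nan],
              [8.0, 9.0]])

# Single imputation: replace missing values with the column mean
mean_imputed = SimpleImputer(strategy="mean").fit_transform(X)

# Weighted k-nearest-neighbor imputation (neighbors weighted by distance)
knn_imputed = KNNImputer(n_neighbors=2, weights="distance").fit_transform(X)
```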

Strengths and weaknesses noted by authors

Publications summarized the strengths and weaknesses of the machine learning methods employed. The low complexity and simplicity of machine learning-based models were noted as strengths of this approach [15, 20]. Machine learning approaches were described as both powerful and efficient methods to apply to large datasets [19]. One study noted that parameters that were significant at the patient level were included even though, under traditional population-level regression-based model development, they would not have been significant and would therefore have been excluded [34]. One publication noted that the value of machine learning is highly dependent on the model selection strategy and parameter optimization, and that machine learning in and of itself will not provide better estimates unless these steps are conducted properly [23].

Even when properly planned, machine learning approaches are not without issues that deserve attention in future studies employing these techniques. Within the eligible publications, noted weaknesses included overfitting the model through the inclusion of too much detail [15]. Additional limitations stem from the data sources used for machine learning, such as the lack of availability of all desired variables and missing data, both of which can affect the development and performance of these models [16, 34, 36, 48]. The lack of all relevant variables was noted as a particular concern for retrospective database studies, where the investigator is limited to what has been recorded [26, 28, 29, 38, 40]. Importantly, and consistent with the observations of this review, the lack of external validation was stated as a limitation within the included studies themselves [28, 30, 38, 42].

Limitations can also arise on the part of the research team: both clinical and statistical expertise are needed in the development and execution of studies using machine learning-based methodology, and users are warned against applying these methods blindly [22]. The importance of including clinical and statistical experts on the research team was noted in one study and highlighted as a strength of that work [21].

Discussion

This study systematically reviewed and summarized the methods and approaches used for machine learning as applied to observational datasets that can inform patient-provider decision making. Machine learning methods have been applied much more broadly across observational studies than in the context of individual decision making, so the summary of this work does not necessarily apply to all machine learning-based studies. The focus of this work is on an area that remains largely unexplored: how to use large datasets in a manner that can inform and improve patient care in a way that supports shared decision making with reliable evidence applicable to the individual patient. Multiple publications cite the limitations of using population-based estimates for individual decisions [49, 50, 51]. Specifically, a summary statistic at the population level does not apply to each person in that cohort; population estimates represent a point on a potentially wide distribution, and any one patient could fall anywhere within that distribution, far from the point estimate. At the other extreme, case reports or case series provide very specific individual-level data but are not generalizable to other patients [52]. This review and summary provides guidance and suggestions for best practices to improve, and hopefully increase, the use of these methods to provide data and models that inform patient-provider decision making.

It was common for single modeling strategies to be employed within the identified publications. It has long been known that single-algorithm approaches to estimation can produce a fair amount of uncertainty and variability [53]. To overcome this limitation, multiple algorithms and multiple iterations of the models need to be performed. This, combined with the more powerful analytics available in recent years, provides a new standard for machine learning algorithm choice and development. While in some cases a single model may fit the data well and provide an accurate answer, the certainty of the model can be supported through novel approaches, such as model averaging [54]. Few studies in this review combined multiple families of modeling strategies with multiple iterations of the models. This should become a best practice and is recommended as an additional criterion for assessing study quality among machine learning-based modeling [54].
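As a sketch of what combining multiple families of algorithms can look like in practice, the following example averages predicted probabilities across three model families via soft voting, one simple form of model averaging (scikit-learn, synthetic data, arbitrary hyperparameters; not the procedure of any reviewed study):

```python
# Sketch: combining multiple algorithm families; soft voting averages the
# predicted probabilities across models (a simple form of model averaging).
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, random_state=0)

ensemble = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
                ("gb", GradientBoostingClassifier(random_state=0))],
    voting="soft")  # average predicted probabilities across model families

print(cross_val_score(ensemble, X, y, cv=5, scoring="roc_auc").mean())
```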

External validation is critical to ensure model accuracy, but it was rarely conducted in the publications included in this review. The reasons could be many, such as a lack of appropriate datasets or a lack of awareness of the importance of external validation [55]. As model development using machine learning increases, there is a need for external validation prior to the application of models in any patient-provider setting. The generalizability of models is largely unknown without these data. Publications that did not conduct external validation also did not note the need for it to be completed: generalizability was discussed in only five studies, one of which had also conducted external validation. Of the remaining four studies, only one noted generalizability in terms of the need for future external validation [48]. Another review, conducted more broadly to evaluate machine learning methods, similarly found a low rate of external validation (6.6% versus 5.9% in this study) [56]; it also showed lower prediction accuracy under external validation than under cross-validation alone. The current review, with its focus on machine learning to support decision making at a practical level, suggests external validation is an important gap that should be filled before these models are used for patient-provider decision making.

Luo and others suggest that k-fold validation may be used, with proper stratification of the response variable, as part of the model selection strategy [14, 55]. The studies identified in this review generally conducted five- or ten-fold cross-validation. There is no formal rule for the selection of the value of k, which is typically based on the size of the dataset; as k increases, bias is reduced, but variance increases in turn. While this tradeoff has to be accounted for, k = 5-10 has been found to be reasonable for most study purposes [57].
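The bias-variance behavior described here can be inspected directly: the sketch below runs stratified k-fold cross-validation for k = 5 and k = 10 on synthetic data and reports the mean and spread of the fold-level AUCs (all modeling choices are illustrative assumptions):

```python
# Sketch: stratified k-fold cross-validation for two values of k.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=500, random_state=0)
model = LogisticRegression(max_iter=1000)

for k in (5, 10):
    cv = StratifiedKFold(n_splits=k, shuffle=True, random_state=0)
    scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
    print(f"{k}-fold AUC: mean={scores.mean():.3f} sd={scores.std():.3f}")
```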

The evidence from identified publications suggests that the ethical concerns of lack of transparency and failure to report confidence in the findings are largely warranted. These limitations can be addressed through the use of multiple modeling approaches (to clarify the ‘black box’ nature of these approaches) and by including both external and high k-fold validation (to demonstrate the confidence in findings). To ensure these methods are used in a manner that improves patient care, the expectations of population-based risk prediction models of the past are no longer sufficient. It is essential that the right data, the right set of models, and appropriate validation are employed to ensure that the resulting data meet standards for high quality patient care.

This study did not evaluate the quality of the underlying real-world data used to develop, test, or validate the algorithms. While not directly part of the evaluation in this review, researchers should be aware that all limitations of real-world data sources apply regardless of the methodology employed. When observational datasets are used for machine learning-based research, the investigator should be aware of the extent to which the methods depend on the data structure and availability, and should evaluate a proposed data source to ensure it is appropriate for the machine learning project [45]. Importantly, databases should be evaluated to fully understand the variables included, as well as those variables that may have prognostic or predictive value but are not included in the dataset. The lack of important variables remains a concern with the use of retrospective databases for machine learning. Concerns with confounding (particularly unmeasured confounding), bias (including immortal time bias), and the patient selection criteria for inclusion in the database must also be evaluated [58, 59]. These factors should be considered prior to implementing these methods, but they are not always at the forefront of consideration when applying machine learning approaches. The Luo checklist is a valuable tool to ensure that any machine learning study meets high research standards for patient care and, importantly, includes the evaluation of missing or potentially incorrect data (i.e., outliers) and generalizability [14]. It should be supplemented by a thorough evaluation of the potential data source prior to the modeling work, and by ensuring that multiple modeling methods are applied.

Conclusions

This review found a wide variety of approaches, methods, statistical software, and validation strategies employed in the application of machine learning methods to inform patient-provider decision making. Based on these findings, there is a need to ensure that multiple modeling approaches are employed in the development of machine learning-based models for patient care, which requires the highest research standards to reliably support shared evidence-based decision making. Models should be evaluated with clear criteria for model selection, and both internal and external validation are needed before these models are applied to inform patient care. Few studies have yet reached that bar of evidence to inform patient-provider decision making.

Availability of data and materials

All data generated or analyzed during this study are included in this published article and its supplementary information files.

Abbreviations

AI: Artificial intelligence

AUC: Area under the curve

CART: Classification and regression trees

LASSO: Logistic least absolute shrinkage and selection operator

References

1. Steyerberg EW, Claggett B. Towards personalized therapy for multiple sclerosis: limitations of observational data. Brain. 2018;141(5):e38.

2. Fröhlich H, Balling R, Beerenwinkel N, Kohlbacher O, Kumar S, Lengauer T, et al. From hype to reality: data science enabling personalized medicine. BMC Med. 2018;16(1):150.

3. Steyerberg EW. Clinical prediction models. Berlin: Springer; 2019.

4. Schnabel RB, Sullivan LM, Levy D, Pencina MJ, Massaro JM, D'Agostino RB Sr, et al. Development of a risk score for atrial fibrillation (Framingham Heart Study): a community-based cohort study. Lancet. 2009;373(9665):739–45.

5. D'Agostino RB, Wolf PA, Belanger AJ, Kannel WB. Stroke risk profile: adjustment for antihypertensive medication. The Framingham Study. Stroke. 1994;25(1):40–3.

6. Framingham Heart Study: Risk Functions. 2020. https://www.framinghamheartstudy.org/.

7. Gawehn E, Hiss JA, Schneider G. Deep learning in drug discovery. Mol Inf. 2016;35:3–14.

8. Vamathevan J, Clark D, Czodrowski P, Dunham I, Ferran E, Lee G, et al. Applications of machine learning in drug discovery and development. Nat Rev Drug Discov. 2019;18(6):463–77.

9. Marcus G. Deep learning: a critical appraisal. arXiv preprint arXiv:1801.00631. 2018.

10. Grote T, Berens P. On the ethics of algorithmic decision-making in healthcare. J Med Ethics. 2020;46(3):205–11.

11. Brnabic A, Hess L, Carter GC, Robinson R, Araujo A, Swindle R. Methods used for the applicability of real-world data sources to individual patient decision making. Value Health. 2018;21:S102.

12. Fu H, Zhou J, Faries DE. Estimating optimal treatment regimes via subgroup identification in randomized control trials and observational studies. Stat Med. 2016;35(19):3285–302.

13. Liang M, Ye T, Fu H. Estimating individualized optimal combination therapies through outcome weighted deep learning algorithms. Stat Med. 2018;37(27):3869–86.

14. Luo W, Phung D, Tran T, Gupta S, Rana S, Karmakar C, et al. Guidelines for developing and reporting machine learning predictive models in biomedical research: a multidisciplinary view. J Med Internet Res. 2016;18(12):e323.

15. Toussi M, Lamy J-B, Le Toumelin P, Venot A. Using data mining techniques to explore physicians' therapeutic decisions when clinical guidelines do not provide recommendations: methods and example for type 2 diabetes. BMC Med Inform Decis Mak. 2009;9(1):28.

16. Ramezankhani A, Hadavandi E, Pournik O, Shahrabi J, Azizi F, Hadaegh F. Decision tree-based modelling for identification of potential interactions between type 2 diabetes risk factors: a decade follow-up in a Middle East prospective cohort study. BMJ Open. 2016;6(12):e013336.

17. Pei D, Zhang C, Quan Y, Guo Q. Identification of potential type II diabetes in a Chinese population with a sensitive decision tree approach. J Diabetes Res. 2019;2019:4248218.

18. Neefjes EC, van der Vorst MJ, Verdegaal BA, Beekman AT, Berkhof J, Verheul HM. Identification of patients with cancer with a high risk to develop delirium. Cancer Med. 2017;6(8):1861–70.

19. Mubeen AM, Asaei A, Bachman AH, Sidtis JJ, Ardekani BA, Alzheimer's Disease Neuroimaging Initiative. A six-month longitudinal evaluation significantly improves accuracy of predicting incipient Alzheimer's disease in mild cognitive impairment. J Neuroradiol. 2017;44(6):381–7.

20. Hische M, Luis-Dominguez O, Pfeiffer AF, Schwarz PE, Selbig J, Spranger J. Decision trees as a simple-to-use and reliable tool to identify individuals with impaired glucose metabolism or type 2 diabetes mellitus. Eur J Endocrinol. 2010;163(4):565.

21. Bertsimas D, Dunn J, Pawlowski C, Silberholz J, Weinstein A, Zhuo YD, et al. Applied informatics decision support tool for mortality predictions in patients with cancer. JCO Clin Cancer Inform. 2018;2:1–11.

22. Bannister CA, Halcox JP, Currie CJ, Preece A, Spasic I. A genetic programming approach to development of clinical prediction models: a case study in symptomatic cardiovascular disease. PLoS ONE. 2018;13(9):e0202685.

23. Alaa AM, Bolton T, Di Angelantonio E, Rudd JHF, van der Schaar M. Cardiovascular disease risk prediction using automated machine learning: a prospective study of 423,604 UK Biobank participants. PLoS ONE. 2019;14(5):e0213653.

24. Baxter SL, Marks C, Kuo TT, Ohno-Machado L, Weinreb RN. Machine learning-based predictive modeling of surgical intervention in glaucoma using systemic data from electronic health records. Am J Ophthalmol. 2019;208:30–40.

25. Dong Y, Xu L, Fan Y, Xiang P, Gao X, Chen Y, et al. A novel surgical predictive model for Chinese Crohn's disease patients. Medicine (Baltimore). 2019;98(46):e17510.

26. Hill NR, Ayoubkhani D, McEwan P, Sugrue DM, Farooqui U, Lister S, et al. Predicting atrial fibrillation in primary care using machine learning. PLoS ONE. 2019;14(11):e0224582.

27. Kang AR, Lee J, Jung W, Lee M, Park SY, Woo J, et al. Development of a prediction model for hypotension after induction of anesthesia using machine learning. PLoS ONE. 2020;15(4):e0231172.

28. Karhade AV, Ogink PT, Thio Q, Cha TD, Gormley WB, Hershman SH, et al. Development of machine learning algorithms for prediction of prolonged opioid prescription after surgery for lumbar disc herniation. Spine J. 2019;19(11):1764–71.

29. Kebede M, Zegeye DT, Zeleke BM. Predicting CD4 count changes among patients on antiretroviral treatment: application of data mining techniques. Comput Methods Programs Biomed. 2017;152:149–57.

30. Kim I, Choi HJ, Ryu JM, Lee SK, Yu JH, Kim SW, et al. A predictive model for high/low risk group according to oncotype DX recurrence score using machine learning. Eur J Surg Oncol. 2019;45(2):134–40.

31. Kwon JM, Jeon KH, Kim HM, Kim MJ, Lim S, Kim KH, et al. Deep-learning-based out-of-hospital cardiac arrest prognostic system to predict clinical outcomes. Resuscitation. 2019;139:84–91.

32. Kwon JM, Lee Y, Lee Y, Lee S, Park J. An algorithm based on deep learning for predicting in-hospital cardiac arrest. J Am Heart Assoc. 2018;7(13):26.

33. Scheer JK, Smith JS, Schwab F, Lafage V, Shaffrey CI, Bess S, et al. Development of a preoperative predictive model for major complications following adult spinal deformity surgery. J Neurosurg Spine. 2017;26(6):736–43.

34. Lopez-de-Andres A, Hernandez-Barrera V, Lopez R, Martin-Junco P, Jimenez-Trujillo I, Alvaro-Meca A, et al. Predictors of in-hospital mortality following major lower extremity amputations in type 2 diabetic patients using artificial neural networks. BMC Med Res Methodol. 2016;16(1):160.

35. Rau H-H, Hsu C-Y, Lin Y-A, Atique S, Fuad A, Wei L-M, et al. Development of a web-based liver cancer prediction model for type II diabetes patients by using an artificial neural network. Comput Methods Programs Biomed. 2016;125:58–65.

36. Ng T, Chew L, Yap CW. A clinical decision support tool to predict survival in cancer patients beyond 120 days after palliative chemotherapy. J Palliat Med. 2012;15(8):863–9.

37. Pérez-Gandía C, Facchinetti A, Sparacino G, Cobelli C, Gómez E, Rigla M, et al. Artificial neural network algorithm for online glucose prediction from continuous glucose monitoring. Diabetes Technol Therapeut. 2010;12(1):81–8.

38. Azimi P, Mohammadi HR, Benzel EC, Shahzadi S, Azhari S. Use of artificial neural networks to decision making in patients with lumbar spinal canal stenosis. J Neurosurg Sci. 2017;61(6):603–11.

39. Bowman A, Rudolfer S, Weller P, Bland JDP. A prognostic model for the patient-reported outcome of surgical treatment of carpal tunnel syndrome. Muscle Nerve. 2018;58(6):784–9.

40. Hearn J, Ross HJ, Mueller B, Fan CP, Crowdy E, Duhamel J, et al. Neural networks for prognostication of patients with heart failure. Circ Heart Fail. 2018;11(8):e005193.

41. Isma'eel HA, Cremer PC, Khalaf S, Almedawar MM, Elhajj IH, Sakr GE, et al. Artificial neural network modeling enhances risk stratification and can reduce downstream testing for patients with suspected acute coronary syndromes, negative cardiac biomarkers, and normal ECGs. Int J Cardiovasc Imaging. 2016;32(4):687–96.

42. Isma'eel HA, Sakr GE, Serhan M, Lamaa N, Hakim A, Cremer PC, et al. Artificial neural network-based model enhances risk stratification and reduces non-invasive cardiac stress imaging compared to Diamond-Forrester and Morise risk assessment models: a prospective study. J Nucl Cardiol. 2018;25(5):1601–9.

43. Jovanovic P, Salkic NN, Zerem E. Artificial neural network predicts the need for therapeutic ERCP in patients with suspected choledocholithiasis. Gastrointest Endosc. 2014;80(2):260–8.

44. Zhou HF, Huang M, Ji JS, Zhu HD, Lu J, Guo JH, et al. Risk prediction for early biliary infection after percutaneous transhepatic biliary stent placement in malignant biliary obstruction. J Vasc Interv Radiol. 2019;30(8):1233–41.e1.

45. Hertroijs DF, Elissen AM, Brouwers MC, Schaper NC, Köhler S, Popa MC, et al. A risk score including body mass index, glycated haemoglobin and triglycerides predicts future glycaemic control in people with type 2 diabetes. Diabetes Obes Metab. 2018;20(3):681–8.

46. Oviedo S, Contreras I, Quiros C, Gimenez M, Conget I, Vehi J. Risk-based postprandial hypoglycemia forecasting using supervised learning. Int J Med Inf. 2019;126:1–8.

47. Khanji C, Lalonde L, Bareil C, Lussier MT, Perreault S, Schnitzer ME. Lasso regression for the prediction of intermediate outcomes related to cardiovascular disease prevention using the TRANSIT quality indicators. Med Care. 2019;57(1):63–72.

48. Anderson JP, Parikh JR, Shenfeld DK, Ivanov V, Marks C, Church BW, et al. Reverse engineering and evaluation of prediction models for progression to type 2 diabetes: an application of machine learning using electronic health records. J Diabetes Sci Technol. 2016;10(1):6–18.

49. Patsopoulos NA. A pragmatic view on pragmatic trials. Dialogues Clin Neurosci. 2011;13(2):217–24.

50. Lu CY. Observational studies: a review of study designs, challenges and strategies to reduce confounding. Int J Clin Pract. 2009;63(5):691–7.

51. Morgenstern H. Ecologic studies in epidemiology: concepts, principles, and methods. Annu Rev Public Health. 1995;16(1):61–81.

52. Vandenbroucke JP. In defense of case reports and case series. Ann Intern Med. 2001;134(4):330–4.

53. Buckland ST, Burnham KP, Augustin NH. Model selection: an integral part of inference. Biometrics. 1997;53:603–18.

54. Zagar A, Kadziola Z, Lipkovich I, Madigan D, Faries D. Evaluating bias control strategies in observational studies using frequentist model averaging. 2020 (submitted).

55. Kang J, Schwartz R, Flickinger J, Beriwal S. Machine learning approaches for predicting radiation therapy outcomes: a clinician's perspective. Int J Radiat Oncol Biol Phys. 2015;93(5):1127–35.

56. Scott IM, Lin W, Liakata M, Wood J, Vermeer CP, Allaway D, et al. Merits of random forests emerge in evaluation of chemometric classifiers by external validation. Anal Chim Acta. 2013;801:22–33.

57. Kuhn M, Johnson K. Applied predictive modeling. Berlin: Springer; 2013.

58. Hess L, Winfree K, Muehlenbein C, Zhu Y, Oton A, Princic N. Debunking myths while understanding limitations. Am J Public Health. 2020;110(5):E2.

59. Thesmar D, Sraer D, Pinheiro L, Dadson N, Veliche R, Greenberg P. Combining the power of artificial intelligence with the richness of healthcare claims data: opportunities and challenges. PharmacoEconomics. 2019;37(6):745–52.


Acknowledgements

Not applicable.

Funding

No funding was received for the conduct of this study.

Author information

Authors and Affiliations

Eli Lilly and Company, Sydney, NSW, Australia

Alan Brnabic

Eli Lilly and Company, Indianapolis, IN, USA

Lisa M. Hess


Contributions

AB and LMH contributed to the design, implementation, analysis and interpretation of the data included in this study. AB and LMH wrote, revised and finalized the manuscript for submission. AB and LMH have both read and approved the final manuscript.

Corresponding author

Correspondence to Lisa M. Hess .

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

Authors are employees of Eli Lilly and Company and receive salary support in that role.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1.

Table S1. Study quality of eligible publications, modified Luo scale [14].

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article.

Brnabic, A., Hess, L.M. Systematic literature review of machine learning methods used in the analysis of real-world data for patient-provider decision making. BMC Med Inform Decis Mak 21 , 54 (2021). https://doi.org/10.1186/s12911-021-01403-2


Received : 07 July 2020

Accepted : 20 January 2021

Published : 15 February 2021

DOI : https://doi.org/10.1186/s12911-021-01403-2


Keywords

  • Machine learning
  • Decision making
  • Decision tree
  • Random forest
  • Automated neural network


Methodology

  • Open access
  • Published: 28 April 2022

An intelligent literature review: adopting inductive approach to define machine learning applications in the clinical domain

  • Renu Sabharwal   ORCID: orcid.org/0000-0001-9728-8001 1 &
  • Shah J. Miah 1  

Journal of Big Data, volume 9, Article number: 53 (2022)


Abstract

Big data analytics utilizes different techniques to transform large volumes of big datasets. These analytics techniques utilize various computational methods, such as Machine Learning (ML), for converting raw data into valuable insights. ML assists individuals in performing work activities intelligently, which empowers decision-makers. Since academics and industry practitioners have growing interests in ML, various existing review studies have explored different applications of ML for enhancing knowledge about specific problem domains. However, in most cases, existing studies suffer from limitations in employing a holistic, automated approach. While several researchers have developed techniques to automate the systematic literature review process, these also tend to lack transparency and guidance for future researchers. This research aims to promote the utilization of intelligent literature reviews by introducing a step-by-step automated framework. We offer an intelligent literature review to obtain in-depth analytical insight into ML applications in the clinical domain in order to (a) develop the intelligent literature framework using a traditional literature review and Latent Dirichlet Allocation (LDA) topic modeling, (b) analyze research documents using a traditional systematic literature review revealing ML applications, and (c) identify topics from documents using LDA topic modeling. We used a PRISMA framework for the review to harness samples sourced from four major databases (IEEE, PubMed, Scopus, and Google Scholar) published between 2016 and 2021 (September). The framework comprises two stages: (a) a traditional systematic literature review consisting of three stages (planning, conducting, and reporting) and (b) LDA topic modeling consisting of three steps (pre-processing, topic modeling, and post-processing). The intelligent literature review framework transparently and reliably reviewed 305 sample documents.

Introduction

Organizations are continuously harnessing the power of big data by adopting different ML techniques. Insights captured from big data may create a greater impact in reshaping their business operations and processes. As a vital set of techniques, big data analytics methods are used to transform complicated and huge amounts of data, known as 'Big Data', in order to uncover hidden patterns, new learning, untold facts or associations, anomalies, and other perceptions [41]. Big Data alludes to the enormous amount of data that a traditional database management system cannot handle; in most cases, traditional software functions would be inadequate to analyze or process them. Big data are characterized by the 5 V's: volume, variety, velocity, veracity, and value [22]. ML is a vital approach for designing useful big data analytics techniques and is a rapidly growing sub-field of the information sciences that deals with all of these characteristics. ML employs numerous methods for machines to learn from past experiences (e.g., past datasets), reducing the extra burden of writing code as in traditional programming [7, 26]. Clinical care enterprises face a huge challenge due to the increasing demand for big data processing to improve clinical care outcomes. For example, an electronic health record contains a huge amount of patient information, drug administration records, and imaging data from various modalities. The variety and quantity of data in the clinical domain make it an ideal subject for appraising the value of ML in research.

Among existing ML approaches, Oala et al. [35] proposed an algorithmic framework that gives a path towards the effective and reliable application of ML in the healthcare domain. In conjunction with such systematic reviews, our research offers a smart literature review that consolidates a traditional literature review following the PRISMA framework guidelines with topic modeling using LDA, focusing on the clinical domain. Most of the existing literature focused on the healthcare domain [14, 42, 49] is more inclusive and of a broader scope, covering the full range of medical activities, whereas our research focus is primarily clinical, which assists in diagnosing and treating patients and includes the clinical aspects of medicine.

As clinical research has developed, the area has become increasingly attractive to clinical researchers, in particular for learning insights about ML applications in clinical practice. This is because of its practical pertinence to patients, professionals, clinical application designers, and other specialists, supported by the omnipresence of clinical disease management techniques. Although benefits are presumed for the target audiences, such as improved self-management abilities (self-efficacy and investment behavior) and physical or mental quality of life for long-term ill patients, and, for clinical care specialists, further developing independent decision making and providing care support to patients, ML in clinical care has not previously been assessed and conceptualized as a well-defined and essential sub-field of health care research. It is therefore important to portray similar studies that have utilized different types of review approaches to examine the utilization of ML/DL and its value. Table 1 presents examples of existing studies with various aims and review approaches in the domain.

Although the existing studies included in Table 1 give an understanding of designated aspects of ML/DL utilization in clinical care, they show a lack of focus on how the key topics addressed in existing ML/DL research are developing. Further, they indicate a clear need for an understanding of the multidisciplinary affiliations and profiles of ML/DL that could provide significant knowledge to new specialists or professionals in this space. For instance, Brnabic and Hess [8] recommended a direction for future research by stating that "Future work should routinely employ ensemble methods incorporating various applications of machine learning algorithms" (p. 1).

ML tools have become a central focus of modern biomedical research because of better access to large datasets, exponential growth in processing power, and key algorithmic developments allowing ML models to handle increasingly challenging data [19]. Different ML approaches can analyze huge amounts of data, including difficult and abnormal patterns. Most studies have focused on ML and its impacts on clinical practices [2, 9, 10, 24, 26, 34, 43]. Fewer studies have examined the utilization of ML algorithms [11, 20, 45, 48] for more holistic benefits to clinical researchers.

ML is an interdisciplinary science that integrates computer science, mathematics, and statistics, and a methodology for building the smart machines of artificial intelligence. Its applications comprise algorithms, assortments of instructions crafted to perform specific tasks by learning independently from data without human intervention. Over time, ML algorithms improve their prediction accuracy without the need for reprogramming. On this basis, we offer an intelligent literature review using a traditional literature review and Latent Dirichlet Allocation (LDA) topic modeling in order to meet knowledge demands in the clinical domain. Previous literature provides a strong theoretical foundation for future information systems (IS) researchers to investigate ML in the clinical sector. The main aim of this study is to develop an intelligent literature framework using a traditional literature review. For this purpose, we employed four digital databases (IEEE, Google Scholar, PubMed, and Scopus) and then performed LDA topic modeling, which may assist healthcare or clinical researchers in analyzing many documents intelligently with little effort and in a small amount of time.

Traditional systematic literature review is destined to become obsolete: it is time-consuming, and its restricted processing capacity results in fewer sample documents being investigated. Academic and practitioner-researchers are frequently required to discover, organize, and comprehend new and unexplored research areas. When a traditional literature review involves an enormous number of papers, the researcher must choose either to restrict the number of documents to review a priori or to analyze the studies using other methods.

The proposed intelligent literature review approach consists of Part A and Part B, a combination of a traditional systematic literature review and topic modeling, which may assist future researchers in using appropriate technology, producing accurate results, and saving time. The framework is presented in Fig. 1 below.

figure 1

Proposed intelligent literature review framework

The traditional literature review identified 534,327 articles across Scopus (24,498), IEEE (2,558), PubMed (11,271), and Google Scholar (496,000). These went through three stages (planning the review, conducting the review, and reporting the review), after which 305 articles were analyzed and subjected to LDA topic modeling.

We follow traditional systematic literature review methodologies [ 25 , 39 , 40 ] including a PRISMA framework [ 37 ]. We review four digital databases and deliberately develop three stages entailing planning, conducting, and reporting the review (Fig.  2 ).

figure 2

Traditional literature review three stages

Planning the review

Research articles : the research articles were identified using the keywords listed below in Tables 2 , 3 .

Digital database : Four databases (IEEE, PubMed, Scopus, and Google Scholar) were used to collect details for reviewing research articles.

Review protocol development : We first used Scopus to search for information and found many studies relevant to this review. We then searched PubMed, IEEE, and Google Scholar and extracted only the papers matching our keywords and review context, based on their full-text availability.

Review protocol evaluation : To support the selection of research articles and the inclusion and exclusion criteria, the quality of the articles was explored and assessed to appraise their suitability and impartiality [ 44 ]. Only articles with the keywords “machine learning” and “clinical” in the document title and abstract were selected.

Conducting the review

The second step is conducting the review, which includes a description of the search syntax and the data synthesis.

Search syntax : Table 4 details the syntax used to select research articles.

Data synthesis

We used a qualitative meta-synthesis technique to understand the methodologies, algorithms, applications, qualities, results, and current research impediments. Qualitative meta-synthesis is a coherent approach for analyzing data across qualitative studies [ 4 ]. Our first search identified 534,327 papers, comprising Scopus (24,498), IEEE (2,558), PubMed (11,271), and Google Scholar (496,000) articles matching the selected keywords. After subjecting this dataset to our inclusion and exclusion criteria, the articles were reduced to Scopus (181), IEEE (62), PubMed (37), and Google Scholar (46) (Fig.  3 ).

figure 3

PRISMA framework of traditional literature review

Reporting the review

This section displays the results of the traditional literature review.

Demonstration of findings

A search including linear literature searching and citation chaining was performed in the digital databases, and the resulting papers were thoroughly analyzed to choose only the most pertinent articles; ultimately, 305 articles were included for the Part B review. Information from these articles was classified, organized, and presented to show the findings.

Report the findings

A word cloud of the 305 selected research articles gives an overview of word frequency within them (Fig. 4 ). The chosen articles move to the next step, the conversion of PDF files to text documents for LDA topic modeling.

figure 4

Word cloud on 305 articles
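
As an aside, a word cloud like the one in Fig. 4 can be generated in a few lines of Python. This is a minimal sketch, assuming the combined text of the 305 articles is already available in a string named corpus_text; the wordcloud package is a common choice, not necessarily the authors' exact tooling:

# Sketch: word cloud over the combined corpus text (cf. Fig. 4)
from wordcloud import WordCloud
import matplotlib.pyplot as plt

wc = WordCloud(width=800, height=400, background_color="white",
               max_words=200).generate(corpus_text)  # corpus_text is assumed input
plt.imshow(wc, interpolation="bilinear")
plt.axis("off")
plt.show()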

Conversion of pdf files to a text document

Python code, shared on GitHub at https://github.com/MachineLearning-UON/Topic-modeling-using-LDA.git , is used to convert the PDF files. A single text document is prepared from the 305 research papers collected in the traditional literature review.
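
The authors' exact conversion script is in the GitHub repository above; a minimal equivalent, assuming the downloaded PDFs sit in a local folder named papers/ and using the pdfminer.six library, might look like this:

# Sketch: merge all PDFs in a folder into one text document
from pathlib import Path
from pdfminer.high_level import extract_text  # pip install pdfminer.six

with open("corpus.txt", "w", encoding="utf-8") as out:
    for pdf in sorted(Path("papers").glob("*.pdf")):
        out.write(extract_text(str(pdf)) + "\n")  # append each paper's text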

Topic modelling for intelligent literature review

Our intelligent literature review is developed using a combination of traditional literature review and topic modeling [ 22 ]. We use topic modeling, a probabilistic text-mining technique widely used in computer science for text mining and information retrieval. Topic modeling has been used in numerous papers for analysis [ 1 , 5 , 17 , 36 ] through various ML algorithms [ 38 ] such as Latent Semantic Indexing (LSI), Latent Semantic Analysis (LSA), Latent Dirichlet Allocation (LDA), Non-Negative Matrix Factorization (NMF), Parallel Latent Dirichlet Allocation (PLDA), and the Pachinko Allocation Model (PAM). We built the methodological framework on LDA because it is the most widely and easily used [ 13 , 17 , 21 ] and a very elementary [ 6 ] approach. LDA is an unsupervised and probabilistic ML algorithm that discovers topics by calculating patterns of word co-occurrence across many documents, or a corpus [ 16 ]. Each LDA topic is distributed across each document as a probability.

While there are numerous ways of conducting a systematic literature review, most strategies require a large investment of time and prior knowledge of the area. One study examined the cost of various text categorization strategies, analyzing the assumptions and cost of each [ 5 ]. Interestingly, except for manually reading the articles and topic modeling, all of the strategies require prior knowledge of the articles' categories and high pre-examination costs. Topic modeling, however, can be automated, freeing researchers' time, which makes it a perfect match for use as part of an intelligent literature review. Topic modeling has been used in a few papers to categorize research papers, presented in Table 5 .

The articles analyzed in the above table are speeches, web documents, web posts, press releases, and newspapers. None of them, however, developed a framework that performs a traditional literature review over digital databases and then uses topic modeling to save time. One study points out the utilization of LDA in academia and explores four parameters: text pre-processing, model parameter selection, reliability, and validity [ 5 ]. Topic modeling identifies patterns of repeated words across a corpus of documents; patterns of word co-occurrence are conceived as hidden 'topics' present in the corpus. First, documents must be modified to be machine-readable, with only their most informative features used for topic modeling. We modify documents in a three-stage process entailing pre-processing, topic modeling, and post-processing, as defined in Fig.  1 earlier.

The utilization of topic modeling presents an opportunity for researchers to use advanced technology in the literature review process. Topic modeling as available online requires considerable statistical skill, which not all researchers have. We have therefore shared the code on GitHub with default parameters for future researchers.

Pre-processing

Székely and vom Brocke [ 46 ] explained that pre-processing is a seven-step process, explored below and shown in Fig.  1 as Part B (a condensed code sketch follows the list):

Load data—The text data file is imported using a Python command.

Optical character recognition—characters in the source documents are recognized as machine-readable text.

Filtering non-English words—non-English words are removed.

Document tokenization—Split the text into sentences and the sentences into words. Lowercase the words and remove punctuation.

Text cleaning—the text is cleaned using PorterStemmer.

Word lemmatization—words in the third person are changed to the first person, and past and future verb tenses are changed into the present.

Stop word removal—All stop words are removed.
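
A condensed sketch of the tokenization, cleaning, lemmatization, and stop-word steps is shown below; it assumes the NLTK stop-word and WordNet resources have been downloaded, and the exact cleaning settings used in the study are those in the shared GitHub code:

# Sketch: core pre-processing steps applied to one document
import re
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer

stop_words = set(stopwords.words("english"))
stemmer, lemmatizer = PorterStemmer(), WordNetLemmatizer()

def preprocess(document):
    tokens = re.findall(r"[a-z]+", document.lower())             # tokenize, lowercase, drop punctuation
    tokens = [t for t in tokens if t not in stop_words]          # remove stop words
    tokens = [lemmatizer.lemmatize(t, pos="v") for t in tokens]  # normalize verb tenses
    return [stemmer.stem(t) for t in tokens]                     # clean with PorterStemmer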

Topic modelling using LDA

The research articles selected as explained in Table 5 are used to run LDA topic modeling. The LDA model results present the coherence score for the selected topics and a list of the most frequently used words for each.

Post-processing

The goal of the post-processing stage is to identify and label the topics relevant for use in the literature review. The result of the LDA model is a list of topics and probabilities for each document (paper). This list is used to assign each paper to a topic by sorting on the highest probability of each paper for each topic, so that each topic contains documents that are similar to each other. To reduce the risk of error in topic identification, a combination of inspecting the most frequent words for each topic and viewing the papers is used. After this review, the topics are presented in the literature review.

Following the intelligent literature review, the results of the LDA model should be approved or validated by statistical, semantic, or predictive means. Statistical validation applies mutual information tests of the results' fit to model assumptions. Semantic validation requires hand-coding to decide whether the meaning of specific words varies significantly, and as expected, with assignment to different topics; this is the approach used in the current study to validate the LDA model results. Predictive validation checks whether events that ought to have increased the prevalence of a particular topic, if our interpretations are right, did so [ 6 , 21 ].

LDA assumes that each word in each document comes from a topic, and that each topic is a distribution over the vocabulary of keywords. This gives two matrices:

\(\Theta_{td} = P(t \mid d)\), the probability distribution of topics in documents, and

\(\Phi_{wt} = P(w \mid t)\), the probability distribution of words in topics.

The probability of a word given a document is then

\(P(w \mid d) = \sum_{t=1}^{T} P(w \mid t, d)\, P(t \mid d)\),

where T is the total number of topics and, likewise, W is the number of keywords across all the documents. If we assume conditional independence, then \(P(w \mid t, d) = P(w \mid t)\), and hence

\(P(w \mid d) = \sum_{t=1}^{T} \Phi_{wt}\, \Theta_{td}\),

that is, the dot product of \(\Theta_{td}\) and \(\Phi_{wt}\) over the topics t.
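
As a toy numeric check of this identity (hypothetical values, with T = 2 topics and W = 3 words for a single document):

# Toy example: P(w|d) as the product of the two LDA matrices
import numpy as np

phi = np.array([[0.6, 0.1],     # P(w|t): rows are words, columns are topics
                [0.3, 0.2],
                [0.1, 0.7]])
theta = np.array([0.25, 0.75])  # P(t|d): topic mixture of one document

p_w_given_d = phi @ theta       # sum over topics, i.e. the dot product
print(p_w_given_d)              # [0.225 0.225 0.55], a valid distribution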

Our systematic literature review identified 305 research papers after the traditional literature review. After executing LDA topic modeling, only 115 articles showed relevance to our topic 'machine learning applications in the clinical domain'. The following stages present the LDA topic modeling process.

The 305 research papers were loaded into a Python environment and converted into a single text file, after which the seven steps described earlier in Pre-processing were carried out.

Topic modeling

The two main inputs to the LDA topic model are the dictionary (id2word) and the corpus (doc_term_matrix). The LDA model is created by running:

# Create the LDA model class from the gensim library
import gensim

LDA = gensim.models.ldamodel.LdaModel

# Build the LDA model
lda_model = LDA(corpus=doc_term_matrix, id2word=dictionary,
                num_topics=20, random_state=100,
                chunksize=1000, passes=50, iterations=100)

In this model, ‘num_topics’ = 20, ‘chunksize’ is the number of documents used in each training chunk, and ‘passes’ is the total number of training passes.

Firstly, the LDA model is built with 20 topics; each topic is represented by a combination of 20 keywords, with each keyword contributing a certain weight to the topic. Topics can be viewed and interpreted in the LDA model; for example, Topic 0 is represented as below:

(0, '0.005*"analysis" + 0.005*"study" + 0.005*"models" + 0.004*"prediction" + 0.003*"disease" + 0.003*"performance" + 0.003*"different" + 0.003*"results" + 0.003*"patient" + 0.002*"feature" + 0.002*"system" + 0.002*"accuracy" + 0.002*"diagnosis" + 0.002*"classification" + 0.002*"studies" + 0.002*"medicine" + 0.002*"value" + 0.002*"approach" + 0.002*"variables" + 0.002*"review"'),

Our approach to finding the ideal number of topics is to construct LDA models with different numbers of topics K and select the model with the highest coherence value. Selecting the K value at the end of the rapid growth of topic coherence ordinarily offers significant and interpretable topics. Picking a considerably higher value can provide more granular sub-topics, but if K is too large, keywords are repeated across multiple topics.
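
A sketch of this selection loop, assuming the doc_term_matrix and dictionary from the earlier steps and the tokenized documents in tokenized_docs:

# Sketch: pick K by the highest c_v topic coherence
from gensim.models import CoherenceModel, LdaModel

scores = {}
for k in range(2, 21):
    model = LdaModel(corpus=doc_term_matrix, id2word=dictionary,
                     num_topics=k, random_state=100, passes=10)
    cm = CoherenceModel(model=model, texts=tokenized_docs,
                        dictionary=dictionary, coherence="c_v")
    scores[k] = cm.get_coherence()

best_k = max(scores, key=scores.get)  # K = 9 was selected in this study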

Model perplexity and topic coherence values are − 8.855378536321144 and 0.3724024189689453, respectively; for perplexity, the lower the value, the better the model. Topics and associated keywords were then examined in an interactive chart using the pyLDAvis package. With 20 topics, the chart presents the most salient terms, but the 20 topics overlap each other, as shown in Fig.  5 : keywords are repeated across topics. We therefore decided to use num_topics = 9 and present the resulting pyLDAvis figure below. Each bubble on the left-hand plot represents a topic; the bigger the bubble, the more predominant that topic is. A good topic model has fairly big, non-overlapping bubbles dispersed throughout the graph rather than grouped in one quadrant. A topic model with too many topics will typically have many overlapping, small bubbles clustered in one region of the graph, as shown in Fig.  6 .

figure 5

PyLDAvis graph with 20 topics in the clinical domain

figure 6

PyLDAvis graph with nine vital topics in the clinical domain
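
A minimal sketch of producing this interactive chart, assuming the fitted lda_model, doc_term_matrix, and dictionary from above; note that recent pyLDAvis releases expose the gensim adapter as pyLDAvis.gensim_models (older releases used pyLDAvis.gensim):

# Sketch: interactive topic visualization with pyLDAvis
import pyLDAvis
import pyLDAvis.gensim_models as gensimvis

vis = gensimvis.prepare(lda_model, doc_term_matrix, dictionary)
pyLDAvis.save_html(vis, "lda_topics.html")  # open the HTML file in a browser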

Each bubble represents a generated topic. The larger the bubble, the higher the percentage of keywords in the corpus that belong to that topic (this can be seen in the GitHub file). Blue bars represent the overall occurrence of each word in the corpus; if no topic is selected, the blue bars of the most frequently used words are displayed, as depicted in Fig.  6 .

The further the bubbles are from each other, the more the topics differ. For example, we can tell that topic 1 is about patient information and studies that utilized deep learning to analyze disease, as can be seen in the GitHub code ( https://github.com/MachineLearning-UON/Topic-modeling-using-LDA.git ) and presented in Fig.  7 .

figure 7

PyLDAvis graph with topic 1

Red bars give the estimated number of times a given topic produced a given term. As can be seen from Fig.  7 , the word 'analysis' appears around 4000 times in the corpus, of which around 1000 occurrences fall within topic 1. The word with the longest red bar is the one used most by the keywords belonging to that topic.

A good topic model will have big, non-overlapping bubbles dispersed throughout the chart; as we can see from Fig.  6 , the bubbles here are clustered within one place. One practical application of topic modeling is discovering the topic of a given document by finding the topic number with the highest percentage contribution in that document, as shown in Fig.  8 .

figure 8

Dominant topics with topic percentage contribution
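
A sketch of how the dominant topic per document shown in Fig. 8 can be extracted from the fitted model:

# Sketch: dominant topic (highest percentage contribution) per document
dominant = []
for doc_bow in doc_term_matrix:
    doc_topics = lda_model.get_document_topics(doc_bow)     # [(topic_id, probability), ...]
    topic_id, prob = max(doc_topics, key=lambda tp: tp[1])  # highest-probability topic
    dominant.append((topic_id, round(prob, 3)))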

The next stage is to process the discoveries and find a satisfactory description of the topics, using a combination of the most frequent words to distinguish each topic. For example, the most frequent words for the papers in topic 2 are "study" and "analysis", which are frequent words for ML usage in the clinical domain.

The topic names, with topic numbers from 0 to 8, are presented in Table 6 , which includes the topic number and the topic words.

The results show the percentage of each topic across all documents: topic 0 and topic 6 have the highest percentages, being dominant in 58 and 57 of the 115 documents, respectively. The outcome of this research is an overview of the research areas within the paper corpus, represented by 9 topics.

This paper has presented a new methodology that is uncommon in scholarly publications. The methodology utilizes ML to investigate sample papers and distinguish research directions. Even though the ML-based methodology has its restrictions, its outcomes and ease of use promise a future for topic modeling-based systematic literature reviews.

The principal benefit of the methodological framework is that it gives information about an enormous number of papers, with little effort on the researcher's part, before time-consuming manual work has to be done. By utilizing the framework, it is possible to rapidly explore a wide range of paper corpora and assess where the researcher's time and concentration should be spent. This is particularly significant for a junior researcher with minimal prior knowledge of a research field. If default parameters and cleaning settings can be found for the steps in the framework, a completely automatic gathering of papers could be enabled; so far, limited work has been introduced to accomplish such an overview of research directions.

From a literature review viewpoint, the advantage of the proposed framework is that the inclusion and exclusion of papers for a literature review is delayed to a later stage, where more information is available, resulting in a more informed decision-making process. The framework enables reproducibility, as every step in the systematic review process can be reproduced, which ultimately provides transparency. The whole process has been demonstrated as a proof of concept on GitHub for use by future researchers.

This study has introduced an intelligent literature review framework that uses ML to analyze existing research documents. We demonstrate how topic modeling can assist a literature review by reducing the manual screening of huge quantities of literature, making more efficient use of researcher time. The LDA algorithm is provided with default parameters and data cleaning steps, reducing the effort required to review the literature. An additional advantage of our framework is that it offers accurate results in little time, combining the traditional way of analyzing literature with LDA topic modeling.

This framework is constructed in a step-by-step manner, and researchers can use it efficiently because it requires less technical knowledge than other ML approaches. There is no restriction on the number of research papers it can process. This research extends knowledge from similar studies in this field [ 12 , 22 , 23 , 26 , 30 , 46 ] that present topic modeling. The study acknowledges the inspiring concept of the smart literature review defined by Asmussen and Møller [ 3 ], who previously provided a brief description of how LDA is utilized in topic modeling. Our research followed this basic idea but broadened its scale and focused on a specific domain, the clinical domain, to produce insights from existing research articles. For instance, Székely and vom Brocke [ 46 ] utilized natural language processing to analyze 9514 sustainability reports published between 1999 and 2015; they identified 42 topics but did not develop any framework for future researchers, which was considered a significant gap in the research. Similarly, Kushwaha et al. [ 22 ] used a network analysis approach to analyze ten years of papers without providing a clearly transparent outcome (e.g., how the research produces an outcome step by step). Likewise, Asmussen and Møller [ 3 ] developed a smart literature review framework that was limited to analyzing 650 sample articles through a single method. In our research, by contrast, we developed an intelligent literature review that combines the traditional review and LDA topic modeling, so that future researchers can gain effective knowledge of a literature as it becomes state of the art in their research domains.

Our research developed a more effective intelligent framework, combining traditional literature review and LDA topic modeling, which provides more accurate and transparent results. The results are shared via public access on GitHub at https://github.com/MachineLearning-UON/Topic-modeling-using-LDA.git .

This paper focused on creating a methodological framework to empower researchers, diminishing the requirement for manually scanning documents and making it possible to examine a practically limitless number of papers. It helps capture insights from an enormous number of papers more quickly, more transparently, and more reliably. The proposed framework utilizes the LDA topic model, which gathers related documents into topics.

A framework employing topic modeling to rapidly and reliably investigate a practically limitless number of papers, reducing the need to read each individually, has been developed. Topic modeling using the LDA algorithm can assist future researchers, who often need an outline of various research fields with minimal pre-existing knowledge. The proposed framework can enable researchers to review more papers in less time with more accuracy. Our intelligent literature review framework includes a holistic literature review process (planning, conducting, and reporting the review) and LDA topic modeling (pre-processing, topic modeling, and post-processing stages), concluding that 115 research articles are relevant to the search.

The automation of topic modeling with default parameters could also be explored, to help non-technical researchers explore topics or related keywords in any problem domain. For future directions, two principal points should be addressed. First, researchers in other fields should apply the proposed framework to acquire information about its practical usage and gain ideas for further advancement of the framework. Second, research into how to automatically specify model parameters could greatly enhance the ease of use of topic modeling for non-specialized researchers, as the selection of model parameters strongly affects the outcome of the framework.

Future research may also utilize more ML analytics tools as complete solution artifacts to analyze different forms of big data, for example by adopting design science research methodologies to benefit design researchers interested in building ML-based artifacts [ 15 , 28 , 29 , 31 , 32 , 33 ].

Availability of data and materials

Data will be supplied upon request.

Footnote 1: LDA is a probabilistic method for topic modeling in text analysis, providing both a predictive and a latent topic representation.

Abbreviations

IEEE: The Institute of Electrical and Electronics Engineers

ML: Machine learning

LDA: Latent Dirichlet Allocation

Organizational Capacity

LSI: Latent Semantic Indexing

LSA: Latent Semantic Analysis

NMF: Non-Negative Matrix Factorization

PLDA: Parallel Latent Dirichlet Allocation

PAM: Pachinko Allocation Model

Abuhay TM, Kovalchuk SV, Bochenina K, Mbogo G-K, Visheratin AA, Kampis G, et al. Analysis of publication activity of computational science society in 2001–2017 using topic modelling and graph theory. J Comput Sci. 2018;26:193–204.


Adlung L, Cohen Y, Mor U, Elinav E. Machine learning in clinical decision making. Med. 2021;2(6):642–65.

Asmussen CB, Møller C. Smart literature review: a practical topic modeling approach to exploratory literature review. J Big Data. 2019;6(1):1–18.

Beck CT. A meta-synthesis of qualitative research. MCN Am J Mater Child Nurs. 2002;27(4):214–21.

Behera RK, Bala PK, Dhir A. The emerging role of cognitive computing in healthcare: a systematic literature review. Int J Med Informatics. 2019;129:154–66.

Blei DM. Probabilistic topic models. Commun ACM. 2012;55(4):77–84.

Blei DM, Ng AY, Jordan MI. Latent Dirichlet allocation. J Mach Learn Res. 2003;3:993–1022.


Brnabic A, Hess LM. Systematic literature review of machine learning methods used in the analysis of real-world data for patient-provider decision making. BMC Med Inform Decis Mak. 2021;21(1):1–19.

Cabitza F, Locoro A, Banfi G. Machine learning in orthopedics: a literature review. Front Bioeng Biotechnol. 2018;6:75.

Chang C-H, Lin C-H, Lane H-Y. Machine learning and novel biomarkers for the diagnosis of Alzheimer’s disease. Int J Mol Sci. 2021;22(5):2761.

Connor KL, O’Sullivan ED, Marson LP, Wigmore SJ, Harrison EM. The future role of machine learning in clinical transplantation. Transplantation. 2021;105(4):723–35.

Dias R, Torkamani A. Artificial intelligence in clinical and genomic diagnostics. Genome Med. 2019;11(1):1–12.

DiMaggio P, Nag M, Blei D. Exploiting affinities between topic modeling and the sociological perspective on culture: application to newspaper coverage of US government arts funding. Poetics. 2013;41(6):570–606.

Forest P-G, Martin D. Fit for Purpose: Findings and recommendations of the external review of the Pan-Canadian Health Organizations: Summary Report: Health Canada Ottawa, ON; 2018.

Genemo H, Miah SJ, McAndrew A. A design science research methodology for developing a computer-aided assessment approach using method marking concept. Educ Inf Technol. 2016;21(6):1769–84.

Greene D, Cross JP. Exploring the political agenda of the european parliament using a dynamic topic modeling approach. Polit Anal. 2017;25(1):77–94.

Grimmer J. A Bayesian hierarchical topic model for political texts: measuring expressed agendas in Senate press releases. Polit Anal. 2010;18(1):1–35.

Grimmer J, Stewart BM. Text as data: the promise and pitfalls of automatic content analysis methods for political texts. Polit Anal. 2013;21(3):267–97.

Hassan N, Slight R, Weiand D, Vellinga A, Morgan G, Aboushareb F, et al. Preventing sepsis; how can artificial intelligence inform the clinical decision-making process? A systematic review. Int J Med Inform. 2021;150:104457.

Hirt R, Koehl NJ, Satzger G. An end-to-end process model for supervised machine learning classification: from problem to deployment in information systems. In: Designing the Digital Transformation: DESRIST 2017 Research in Progress; Proceedings of the 12th International Conference on Design Science Research in Information Systems and Technology, Karlsruhe, Germany, 30 May–1 June 2017. Karlsruhe: Karlsruher Institut für Technologie (KIT); 2017.

Koltsova O, Koltcov S. Mapping the public agenda with topic modeling: the case of the Russian live journal. Policy Internet. 2013;5(2):207–27.

Kushwaha AK, Kar AK, Dwivedi YK. Applications of big data in emerging management disciplines: a literature review using text mining. Int J Inf Manag Data Insights. 2021;1(2):100017.


Li S, Wang H. Traditional literature review and research synthesis. The Palgrave handbook of applied linguistics research methodology. 2018:123–44.

Magrabi F, Ammenwerth E, McNair JB, De Keizer NF, Hyppönen H, Nykänen P, et al. Artificial intelligence in clinical decision support: challenges for evaluating AI and practical implications. Yearb Med Inform. 2019;28(01):128–34.

Maier D, Waldherr A, Miltner P, Wiedemann G, Niekler A, Keinert A, et al. Applying LDA topic modeling in communication research: toward a valid and reliable methodology. Commun Methods Meas. 2018;12(2–3):93–118.

Mårtensson G, Ferreira D, Granberg T, Cavallin L, Oppedal K, Padovani A, et al. The reliability of a deep learning model in clinical out-of-distribution MRI data: a multicohort study. Med Image Anal. 2020;66:101714.

Mendo IR, Marques G, de la Torre DI, López-Coronado M, Martín-Rodríguez F. Machine learning in medical emergencies: a systematic review and analysis. J Med Syst. 2021;45(10):1–16.

Miah SJ. An ontology based design environment for rural business decision support. Nathan: Griffith University Nathan; 2008.

Miah SJ. A new semantic knowledge sharing approach for e-government systems. In: 4th IEEE International Conference on Digital Ecosystems and Technologies; 2010. IEEE.

Miah SJ, Camilleri E, Vu HQ. Big Data in healthcare research: a survey study. J Comput Inf Syst. 2021. https://doi.org/10.1080/08874417.2020.1858727 .

Miah SJ, Gammack J, Kerr D. Ontology development for context-sensitive decision support. In: Third International Conference on Semantics, Knowledge and Grid (SKG 2007); 2007. IEEE.

Miah SJ, Gammack JG. Ensemble artifact design for context sensitive decision support. Australas J Inf Syst. 2014. https://doi.org/10.3127/ajis.v18i2.898 .

Miah SJ, Gammack JG, McKay J. A metadesign theory for tailorable decision support. J Assoc Inf Syst. 2019;20(5):4.

Mimno D, Blei D, editors. Bayesian checking for topic models. Proceedings of the 2011 conference on empirical methods in natural language processing; 2011.

Oala L, Murchison AG, Balachandran P, Choudhary S, Fehr J, Leite AW, et al. Machine learning for health: algorithm auditing & quality control. J Med Syst. 2021;45(12):1–8.

Ouhbi S, Idri A, Fernández-Alemán JL, Toval A. Requirements engineering education: a systematic mapping study. Requir Eng. 2015;20(2):119–38.

Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2020;372:n71.

Quinn KM, Monroe BL, Colaresi M, Crespin MH, Radev DR. How to analyze political attention with minimal assumptions and costs. Am J Polit Sci. 2010;54(1):209–28.

Rowley J, Slack F. Conducting a literature review. Management Research News. 2004.

Rozas LW, Klein WC. The value and purpose of the traditional qualitative literature review. J Evid Based Soc Work. 2010;7(5):387–99.

Sabharwal R, Miah SJ. A new theoretical understanding of big data analytics capabilities in organizations: a thematic analysis. J Big Data. 2021;8(1):1–17.

Salazar-Reyna R, Gonzalez-Aleu F, Granda-Gutierrez EM, Diaz-Ramirez J, Garza-Reyes JA, Kumar A. A systematic literature review of data science, data analytics and machine learning applied to healthcare engineering systems. Management Decision. 2020.

Shah P, Kendall F, Khozin S, Goosen R, Hu J, Laramie J, et al. Artificial intelligence and machine learning in clinical development: a translational perspective. NPJ Digit Med. 2019;2(1):1–5.

Sone D, Beheshti I. Clinical application of machine learning models for brain imaging in epilepsy: a review. Front Neurosci. 2021;15:761.

Spasic I, Nenadic G. Clinical text data in machine learning: systematic review. JMIR Med Inform. 2020;8(3):e17984.

Székely N, Vom Brocke J. What can we learn from corporate sustainability reporting? Deriving propositions for research and practice from over 9,500 corporate sustainability reports published between 1999 and 2015 using topic modelling technique. PLoS ONE. 2017;12(4):e0174807.

Verma D, Bach K, Mork PJ, editors. Application of machine learning methods on patient reported outcome measurements for predicting outcomes: a literature review. Informatics; 2021: Multidisciplinary Digital Publishing Institute.

Weng W-H. Machine learning for clinical predictive analytics. Leveraging data science for global health. Cham: Springer; 2020. p. 199–217.


Yin Z, Sulieman LM, Malin BA. A systematic literature review of machine learning in online personal health data. J Am Med Inform Assoc. 2019;26(6):561–76.


Acknowledgements

Not applicable.

Author information

Authors and Affiliations

Newcastle Business School, The University of Newcastle, Newcastle, NSW, Australia

Renu Sabharwal & Shah J. Miah


Contributions

The first author conducted the research, while the second author ensured quality standards and rewrote the entire findings, linking them to underlying theories. Both authors read and approved the final manuscript.

Corresponding author

Correspondence to Renu Sabharwal .

Ethics declarations

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article.

Sabharwal, R., Miah, S.J. An intelligent literature review: adopting inductive approach to define machine learning applications in the clinical domain. J Big Data 9 , 53 (2022). https://doi.org/10.1186/s40537-022-00605-3


Received : 18 November 2021

Accepted : 06 April 2022

Published : 28 April 2022

DOI : https://doi.org/10.1186/s40537-022-00605-3


Keywords

  • Clinical research
  • Systematic literature review


  • Open access
  • Published: 30 April 2024

Exploring post-COVID-19 health effects and features with advanced machine learning techniques

  • Muhammad Nazrul Islam 1 ,
  • Md Shofiqul Islam 1 ,
  • Nahid Hasan Shourav 1 ,
  • Iftiaqur Rahman 1 ,
  • Faiz Al Faisal 2 ,
  • Md Motaharul Islam 3 &
  • Iqbal H. Sarker 4  

Scientific Reports volume  14 , Article number:  9884 ( 2024 ) Cite this article

2 Altmetric

Metrics details

  • Computational biology and bioinformatics
  • Health care
  • Medical research

COVID-19 is an infectious respiratory disease that has had a significant impact, resulting in a range of outcomes including recovery, continued health issues, and the loss of life. Among those who have recovered, many experience negative health effects, particularly influenced by demographic factors such as gender and age, as well as physiological and neurological factors like sleep patterns, emotional states, anxiety, and memory. This research aims to explore various health factors affecting different demographic profiles and to establish significant correlations among physiological and neurological factors in the post-COVID-19 state. To achieve these objectives, we identified the post-COVID-19 health factors and, based on these factors, collected survey data from COVID-recovered patients in Bangladesh. Employing diverse machine learning algorithms, we identified the best prediction model for post-COVID-19 factors. Initial findings from statistical analysis were further validated using the chi-square test to demonstrate significant relationships among these factors. Additionally, Pearson's coefficient was utilized to indicate positive or negative associations among various physiological and neurological factors in the post-COVID-19 state. Finally, we determined the most effective machine learning model and identified key features using analytical methods such as the Gini index, feature coefficients, information gain, and SHAP value assessment. We found that the Decision Tree model excelled in identifying crucial features while predicting the extent of post-COVID-19 impact.

Introduction

It is 2022-2023 and, with the blessing of medical science, after the disastrous era of COVID-19 the world finally seems to be healing from its wounds. But deep-rooted adversities still haunt the lives of those affected by post-COVID trauma 1 , 2 , 3 . Even after a year of recovery, patients still find it challenging to return to everyday life. Many physical and neurological indicators show that vulnerabilities such as depression, anxiety, weakness, and sleeplessness have increased alarmingly. Looking at the same person before and after their fight with COVID-19 makes the post-COVID trauma clear. COVID-19 has physical and neurological effects on our bodies 1 , and these factors are also interrelated; for example, energy is significantly related to the sleeplessness of the patient.

Today's physical and mental problems connect deeply with a patient's previous COVID infection history 4 . These patients tend to experience mental trauma, neurological disorders, and similar conditions 5 . Research has also shown that COVID-19-recovered patients have common memory complaints and suffer from cognitive impairment, seizures, and related issues 5 , 6 . It is thus important to explore whether any health problem in today's era has a connection with the patient's previous COVID history 7 . Much research has addressed this phenomenon with modern approaches like statistical analysis and machine learning (ML) algorithms 2 , 8 , 9 . Hence the urgency for a comprehensive study, with the help of statistics and ML models, to evaluate the interrelation between health complications before and after COVID. Moreover, ML models may explore the interrelations among the COVID factors and how one factor can influence many others; such findings can also be strongly supported by statistical analysis of the elements.

For example, ML has been used to analyze the post-COVID-19 impact on medical staff and doctor productivity 10 , as well as adverse effects and nonmedical use during the pre- and post-COVID-19 outbreak 11 ; to interpret policy effects on air pollution during the COVID-19 lockdown in London with explainable ML 12 ; to analyze the impact of COVID-19 in KSA based on Arabic tweets using deep learning 13 ; to understand the factors associated with mortality in hospitalized COVID-19 patients 14 ; to assess risks in SME supply chains due to COVID-19 disruptions 15 ; to analyze the evolution of Spain's social mood during COVID-19 vaccination based on tweets 16 ; to assess the influence of COVID-19 on human personality 17 and its effects on electricity consumption in distribution networks 18 ; to evaluate COVID-19 characteristics and risk factors using Bayesian ML and Markov chain Monte Carlo techniques 19 ; to analyze factors influencing commercial crime calls using SHAP 20 and treatment effects for COVID-19 patients with severe hypoxemia using causal Bayesian ML 21 ; and to assess COVID-19's psychological consequences using deep learning 22 as well as post-COVID-19 recovery in urban areas using spatial and deep learning 23 .

Similarly, statistical measures like Pearson's coefficient and chi-square values determine how strongly factors correlate, as in the study of the mediating influence of resilience on academic stress, COVID-19 anxiety, and quality of life in nursing students 24 .

Therefore, the primary objective of this research is to reveal various health issues related to post-COVID; secondly, to explore how much the revealed health factors are impacted in post-COVID-19 individuals and how these factors are associated or correlated with each other; and finally, to find the best-performing ML models for predicting the degree of impact of these health factors on post-COVID-19 individuals with different demographic profiles.

Our paper is organized as follows. The opening section provides an introduction and a literature review, with particular emphasis on identifying the most significant features of the post-COVID-19 impact. The subsequent section offers a comprehensive view of our methodology from an algorithmic standpoint. The third section presents the results and relevant discussion. The final section concludes the paper and provides recommendations.

Literature review

COVID-19 has had a major impact on humanity, as seen in the millions of verified cases and fatalities documented globally. Health, the economy, and interpersonal relationships are just a few of the areas the pandemic has significantly influenced. Many studies have been done in reaction to the epidemic to learn more about the virus, how it spreads, and potential cures and vaccinations, and scientists and healthcare experts are working nonstop to lessen the pandemic's consequences and create successful long-term management plans. Shanbehzadeh et al. 1 found physical and mental issues in COVID-19 survivors at follow-up intervals of up to 3 months after COVID-19. The most frequent physical health issues were tiredness, pain, arthralgia, decreased physical capacity, and reductions in physical role functioning, routine care, and daily activities; anxiety, depression, and post-traumatic stress disorder were the three most prevalent mental health issues. Female patients and those admitted to critical care reported higher exhaustion, discomfort, anxiety, and sadness levels, and overall a lower quality of life was noticed up to three months after COVID-19. In the work of Matsumoto et al. 2 , it was found that, among the 763 participants, 37.0% of the COVID-19 survivors (135 individuals) had COVID-19-related aftereffects. First, the Mann-Whitney U test with Bonferroni correction revealed that the SARS-CoV-2-infected group with post-COVID conditions had substantially higher scores on all clinical symptom measures than the non-infected group and those without such conditions (P < .05). The chi-squared test showed a significant difference in the incidence rates of clinically relevant mental symptoms among the groups (P < .001). Ultimately, the multivariate logistic model showed that participants with post-COVID disorders had a 2.44-3.48 times greater likelihood of experiencing more severe clinical symptoms. Additionally, Ahmed et al. 3 showed that only 16 individuals (8.8%) out of 182 had no sleep or mental health issues: 118 individuals (64.8%) reported having trouble sleeping, and 52 participants (28.6%) showed signs of probable PTSD. Somatization (41.8%) had the largest symptomatology percentage, alongside anxiety (28%), anger-hostility (15.9%), phobic anxiety (24.2%), obsessive-compulsive symptoms (19.8%), interpersonal sensitivity (0.5%), depression (11.5%), paranoid ideation (10.4%), and psychoticism (17.6%).

García-Sánchez et al. 25 reported that attention abilities had a widespread influence, both as the only impacted domain (19% of single-domain impairment) and in combination with lowered performance in organizational processes, learning, and long-term memory. These prominent executive and attentional impairments were essentially independent of clinical elements like hospitalization, severity of illness, biomarkers, or emotional assessments. For the first time, Benedetti et al. 4 explored the post-acute COVID-19 syndrome, inflammatory markers during acute COVID, brain regional GM volumes, DTI assessments of WM microstructure, and resting-state functional connectivity. The significant findings are that post-traumatic symptoms and decreasing GM volumes in the ACC and bilateral insular cortex correlate with WM microstructure, and that depressive psychopathology correlates with decreasing GM volumes in the ACC. Moreover, resting-state FC was linked to inflammation and psychopathology, supporting the idea that the structural effect impacts brain function. Tarsitani et al. 6 saw concerningly high rates of PTSD and subthreshold PTSD in hospitalized COVID-19 patients; proven risk factors for PTSD include female sex and pre-existing mental illness. After patients are discharged from the hospital, clinicians treating COVID-19 patients should consider checking for PTSD during follow-up examinations. Besides, Ahmed et al. 5 found that 19.2% of COVID patients in their study had memory difficulties. They also discovered that, among the treatment modalities, steroids and antibiotics were linked to memory impairment, according to individual predictor analyses. According to multiple logistic regression, those who had recovered from COVID-19 within six to twelve months were more likely to suffer memory problems. Although there was no correlation of age, sex, oxygen demand, or hospitalization with memory problems, rural inhabitants had more serious memory complaints than urban residents.

Moreover, Sher 26 found that psychiatric, neurological, and physical disease symptoms, as well as brain inflammation and post-COVID syndrome symptoms, are likely to exacerbate suicidal ideation and behavior in this patient population; even without post-COVID syndrome, COVID-19 survivors may have a higher risk of suicide. Further evidence was provided by Pistarini et al. 27 , who described patients with cognitive abnormalities treated in COVID and post-COVID functional rehabilitation programs: according to the MoCA examination, 75% of COVID patients and 70% of post-COVID patients showed cognitive abnormalities. These findings demonstrate the severity and protracted nature of the neurological and mental effects that can result from COVID-19 infection.

To sum up, no review study has explicitly focused on exploring all possible health issues or factors in post-COVID-19 patients. Moreover, limited research has evaluated the impact of these factors after COVID-19 recovery, and whether any relationships exist among these health factors remains a research gap. Besides, the application of ML models to detect post-COVID-19 issues needs to be explored further.

Here, we list the main contributions of this research:

Explore health complications related to post-COVID-19 by identifying 17 significant Physiological and Neurological health factors.

Examine the independent influence of each factor and their interconnections rigorously.

Select the most important feature, Anxiety, from the outcomes of the four best-performing ML models (with feature ranking and comparative analysis); the Decision Tree algorithm demonstrated the highest accuracy in predicting post-COVID anxiety levels.

Methodology overview

The research methodology is divided into three phases, as shown in Fig. 1 . Firstly, we explored the health factors by reviewing the related literature. Secondly, necessary data were collected from a study group, i.e., post-COVID individuals. Finally, data analysis was performed through statistical and ML-based approaches, extracting the top post-effect features from the best-performing ML models.

figure 1

Methodological overview.

Exploration of the health factors

We reviewed research articles published in 2022 and 2023 to explore the health factors. Our focus is the post-COVID scenario, so we restricted ourselves to research articles in this area. The search was performed in scholarly databases like IEEE Xplore, Google Scholar, ACM Digital Library, and ResearchGate. As an outcome, 17 post-COVID health complications or factors were revealed, such as stress disorders, cognitive impairment, and impulsiveness. The identified factors were categorized into two major groups: physiological factors (e.g., chest pain, sleeplessness, fainting) and neurological factors (e.g., anxiety, depression, confidence).

Data acquisition

Preparing the questionnaire: A questionnaire with a total of 13 questions was prepared by considering all 17 revealed factors, with each factor having one question related to the condition before COVID-19 and another related to the health condition after COVID-19. The rating scale for the target class takes the numeric values 5, 4, 3, 2, and 1 for Strongly Agree, Agree, Neutral, Disagree, and Strongly Disagree, respectively.

Data features: In our data set we used 13 input features: gender, age, education, heart_disease, diabetes, other_disease, smoking, blood_pressure, weight, work_type, married, vaccination_status, and vaccination_dose_status. The respondents comprised 600 males and 400 females. All the respondents were vaccinated, and their ages ranged from 10 to 70 years. The categorical values of the inputs are: education (higher, mid, and low study); heart disease (yes or no); diabetes (yes or no); other disease (yes or no); smoking (yes, never, partial); blood pressure (low, mid, and high); weight (high, mid, and low); working type (private job, self-employed, government job, and unemployed); married (yes or no); vaccination status (yes or no); and number of vaccination doses taken (one; one and two; and one, two, with booster).

Study group : The survey questionnaire was distributed among people in Bangladesh of different age groups, genders, etc. In total, 1000 people with different demographic profiles participated, all of whom had suffered from COVID-19.

Data collection approach: The questionnaire set was primarily distributed among the students and faculty members of the authors' institute, via email or in person, and was also distributed following online distribution methods. Respondents were given two weeks to respond. Moreover, as an offline approach, we hosted temporary venues for volunteer participation and provided gifts in appreciation of participation. Finally, a total of 1000 responses were collected. The whole data collection process was carried out from July 2022 to August 2023.

Data validation

Following data collection, we conducted data validation through the expertise of two distinguished medical professionals from renowned institutions in Bangladesh, who put forth their utmost diligence in labeling the data.

Institutional approval and ethical confirmations

We confirm that all methods were carried out in accordance with the relevant ethical guidelines and regulations by the Research and Development wings of the Military Institute of Science and Technology (MIST), Dhaka-1216.

We confirm that all experimental protocols were approved by the Research and Development wings of the Military Institute of Science and Technology (MIST), Dhaka-1216. For this research and its data collection and analysis, informed consent was obtained from all subjects and/or their legal guardian(s).

Data sample

Data samples are illustrated in Table 1 ; only 10 samples are given in the table, for the target class Anxiety after COVID-19. Other target classes (post-COVID effects) are not shown in the table.

Data analysis

Statistical analysis

In this step, we statistically analyzed every factor for the before-COVID-19 and after-COVID-19 states. For example, the symptom of anxiety is investigated for both conditions (before the COVID-19 infection and after the infection).

ML-based analysis

After the statistical analysis, the data were used to train various traditional ML models, such as decision tree and random forest, and ensemble ML models such as adaptive boosting (AdaBoost), gradient boosting, and extreme gradient boosting (XGBoost).
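
As a minimal sketch of this training step, assuming a numerically encoded feature matrix X and the five-level anxiety target y from the survey (XGBoost would come from the separate xgboost package and is omitted here):

# Sketch: fitting the candidate models compared in the study
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import (RandomForestClassifier, AdaBoostClassifier,
                              GradientBoostingClassifier)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)  # X, y are assumed inputs

models = {
    "decision_tree": DecisionTreeClassifier(random_state=42),
    "random_forest": RandomForestClassifier(random_state=42),
    "adaboost": AdaBoostClassifier(random_state=42),
    "gradient_boosting": GradientBoostingClassifier(random_state=42),
}
for name, model in models.items():
    model.fit(X_train, y_train)  # train each classifier on the survey features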

Evaluation of ML models

The performance of the ML models was measured with parameters like accuracy, precision, recall, and F1 score. A confusion matrix was used to judge accuracy, along with ROC analysis.
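
Continuing from the fitted models above, a sketch of this evaluation might be:

# Sketch: accuracy, precision, recall, F1, confusion matrix and ROC-AUC
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix, roc_auc_score)

for name, model in models.items():
    y_pred = model.predict(X_test)
    print(name,
          accuracy_score(y_test, y_pred),
          precision_score(y_test, y_pred, average="weighted", zero_division=0),
          recall_score(y_test, y_pred, average="weighted"),
          f1_score(y_test, y_pred, average="weighted"))
    print(confusion_matrix(y_test, y_pred))  # rows: true class, columns: predicted
    # ROC analysis for the multi-class target (one-vs-rest)
    print(roc_auc_score(y_test, model.predict_proba(X_test), multi_class="ovr"))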

Study outcomes

To achieve our objectives, we conducted an in-depth analysis of the major health complications associated with COVID-19; a comprehensive overview of our findings is presented in Fig.  1 . Our research identified 17 significant health factors, categorized as physiological and neurological, which played a pivotal role in our study. Using these factors as a foundation, we conducted surveys with individuals who had recovered from COVID to assess their conditions both before and after their illness. We rigorously subjected this survey data to statistical analysis, unveiling how each of the 17 factors independently influences patients and exploring their interconnections; this marks the accomplishment of our second objective. Subsequently, we identified the most effective predictive models for determining the extent of influence exerted by these health factors. Notably, the Decision Tree algorithm exhibited the highest accuracy in predicting anxiety levels, which serves our ultimate objective. In the final stage, we identified the key features in the post-effects of the best-performing machine learning model, employing a variety of methods, including feature importance analysis, the Gini index, information gain, feature importance permutation, and SHAP value analysis, to uncover essential insights into the important features of post-COVID-19 effects. The primary outcomes of our study are:

Our research focused on analyzing major health complications related to COVID-19 by identifying 17 significant health factors categorized as Physiological and Neurological.

We conducted surveys with recovered COVID-19 patients to assess the impact of these factors on their health before and after their illness.

We rigorously analyzed the survey data, examining the independent influence of each factor and their interconnections.

We chose the most important feature, Anxiety, based on its frequency in the survey study. Among the four ML models, the Decision Tree algorithm demonstrated the highest accuracy in predicting anxiety levels.

In our final stage, we identified key features in the post-effects of the best-performing machine learning model through various methods, providing valuable insights into post-COVID-19 effects.

Revealing the health factors due to COVID-19

In the last two and a half years, the COVID-19 pandemic has drastically affected millions worldwide, and the impact hammers on physical and mental health problems in the post-COVID-19 state 1 . This phenomenon raises the necessity of investigating the relationship between post-COVID conditions and mental health 2 . Primarily, investigation shows that coronavirus has a long-term post-COVID-19 effect on sleep and mental illness, which also opens the door to detecting possible relationships between the severity of COVID-19 at onset and sleep and mental illness 3 . Coronavirus affects the brain by bypassing the blood-brain barrier (BBB) in blood or via monocytes, which can reach brain tissue via circumventricular organs 7 . Importantly, research shows a prominent frequency of impaired performance across cognitive domains in post-COVID patients with subjective complaints 25 . At the same time, the discovery of inflammatory biomarkers in COVID-19 survivors has come to broad light through MRI samples and other means 4 . One out of five patients hospitalized for COVID-19 was diagnosed with PTSD or subthreshold PTSD at a 3-month follow-up 6 . Potential contributing factors cause post-COVID-19 patients to suffer from different memory complaints 5 . Moreover, some psychiatric issues like depression prevail in COVID-recovered patients, implying a 25 times greater risk of suicide than in the general population 26 . A summary of data from the last year about the impacts on physical, cognitive, and neurological health disorders in COVID-19 survivors suggests three crucial aspects to manage: nutritional status, neurological disorders, and physical health 28 . The impaired cognitive deficits and emotional distress among COVID-19 patients should therefore be addressed through functional rehabilitation 27 . Side by side, a brief study is to be analyzed on mental health issues, vulnerable populations, and risk factors in the post-COVID-19 pandemic era, recommending a universal approach for mental health care and services 29 . Over the past three years, extensive research has explored physiological and neurological health complications in the aftermath of COVID-19. We reviewed 23 research articles using keywords like mental health, cognitive impairment, and post-COVID trauma, and from these studies identified 17 health factors associated with COVID infection, including fatigue, forgetfulness, and anxiety. These factors were categorized into two groups: 39% are physiological factors, which deal with the functions of a living organism and its parts 30 , and 61% are neurological factors, which influence the mind and are connected to a person's mental and emotional state 30 . Anxiety is a major neurological factor among post-COVID patients, with a frequency rating of 8 as shown in Table 2 , and is the most common mental illness post-COVID 1 . Fatigue is one of the most frequent alterations in post-COVID patients, as shown in Table 2 .

All revealed health factors are listed in Table 2 , along with their references and their frequency of occurrence across those references.

The 17 factors are divided into two categories, as shown in Table 2 :

Physiological factors : Physiological factors deal with the functions of a living organism and its parts 30 . For example, fatigue is one of the most frequent alterations reported by post-COVID patients (Table 2 ). Seven physiological factors were identified among all post-COVID-19 factors in this study, as shown in Table 2 .

Neurological factors : Neurological factors are those that influence or affect the mind and are related to a person's mental and emotional state 30 . For example, anxiety is the most common mental illness in the post-COVID state 1 . Ten neurological factors were identified among all post-COVID-19 factors in this study, as shown in Table 2 .

A statistical overview of the data is given in Fig.  2 to make it more understandable. Summary statistics such as count, min, max, mean, standard deviation, variance, and median are essential for understanding a dataset: the count shows its size, min/max indicate its range, the mean reflects central tendency, the standard deviation measures data spread, the variance quantifies overall variability, and the median is a robust central measure. These statistics form the foundation of a data summary, with quartiles, percentiles, skewness, and kurtosis available for deeper analysis.

Figure 2. Statistical overview of the data.
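As a minimal illustration (not the exact code used in this study), these summary statistics can be computed with pandas; the file name post_covid_survey.csv is a hypothetical placeholder for our survey data:

```python
import pandas as pd

# Hypothetical file name; the actual schema is summarized in Fig. 2
df = pd.read_csv("post_covid_survey.csv")

# count, mean, std, min, quartiles (25%/50%/75%) and max per numeric column
print(df.describe())

# Additional statistics discussed above
print(df.var(numeric_only=True))       # variance
print(df.median(numeric_only=True))    # median (robust central measure)
print(df.skew(numeric_only=True))      # skewness
print(df.kurtosis(numeric_only=True))  # kurtosis
```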

Feature correlation

Feature correlation, shown in Fig.  3 , is a statistical measure that assesses the degree of association among the features (variables) in our dataset. It quantifies how features tend to vary together, providing insight into their dependencies. The advantages of this Pearson correlation analysis include its utility in identifying redundant or highly informative features for best model performance, detecting multicollinearity in regression analysis, simplifying data exploration by revealing hidden patterns and relationships, aiding model interpretability, and facilitating feature engineering by leveraging known feature associations to create new informative variables. The Pearson correlation is a crucial data-science tool: it quantifies the strength and direction of the linear relationship between two continuous variables, with values ranging from -1 to 1, and is widely employed in statistics and data analysis to uncover connections, patterns, and dependencies within complex datasets.

Figure 3. Pearson correlation values between all input features.
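A sketch of how an all-to-all Pearson correlation matrix such as the one in Fig. 3 can be produced, assuming the survey responses are already numerically encoded in the DataFrame df from the previous sketch:

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Pairwise Pearson correlation between all numeric features (values in [-1, 1])
corr = df.corr(method="pearson", numeric_only=True)

# Heatmap comparable to Fig. 3; annot writes each r-value into its cell
plt.figure(figsize=(10, 8))
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1)
plt.title("Pearson correlation for all input features")
plt.tight_layout()
plt.show()
```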

Figure 4. Overview of the target class (anxiety).

Figure 5. t-SNE visualization of features for after-COVID anxiety.

Evaluating significant association

The chi-square test is one method for determining whether an association (relationship) exists among categorical variables; the relationship can be significant or insignificant. The standard significance level is 0.05, and any p-value below 0.05 indicates a significant association among the variables, as shown in Fig.  3 . In this research, the survey dataset contains responses on the level of impact of various physiological and neurological factors, and these factors are treated as categorical variables. The chi-square test was applied to all factors, and the resulting p-values are shown in Fig.  3 . In Table 3 , calculated p-values below 0.05 are marked in grey; the corresponding factors are judged to have significant relationships with one another.
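A minimal sketch of the pairwise chi-square procedure described above, using scipy; the factor column names are hypothetical stand-ins for our survey columns:

```python
from itertools import combinations

import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical column names standing in for the surveyed factors
factors = ["Chest pain", "Unhappiness", "Depression", "Vigilance"]

# Chi-square test of independence for every pair of categorical factors
for a, b in combinations(factors, 2):
    table = pd.crosstab(df[a], df[b])          # contingency table of responses
    chi2, p, dof, _ = chi2_contingency(table)  # statistic, p-value, dof, expected
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{a} vs {b}: chi2={chi2:.3f}, p={p:.4f} ({verdict})")
```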

From Fig.  3 , we can see that all compared factors have associations between them. Some basic feature associations are as follows: (a) chest pain and unhappiness; (b) unhappiness and forgetfulness; (c) depression and vigilance; (d) chest pain and confidence; (e) confidence and vigilance; (f) energy and confidence; (g) sleep and attentiveness; (h) attentiveness and vigilance; (i) sleep and determination; (j) determination and vigilance; and (k) fear of COVID and energy.

Exploring positive and negative correlation

The Pearson correlation coefficient measures the strength of the linear relationship between two variables and is reported as the r-value, which ranges from -1 to 1: +1 represents a perfect positive (direct) relationship, 0 indicates no relationship, and -1 represents a perfect negative (inverse) relationship. In this research, the physiological and neurological factors of the dataset are treated as variables. The Pearson correlation coefficient was calculated for all factors, and the resulting r-values are shown in Fig.  3 . R-values above 0.05 are taken to indicate a positive (direct) relationship between factors, meaning an increase in one factor may be accompanied by an increase in the other. R-values below 0 are taken to indicate an inverse relationship, meaning an increase in one factor may be accompanied by a decrease in the other. For example, a correlation coefficient of 0.85 between two factors would indicate a strong, direct association.
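For a single pair of factors, the r-value and its p-value can be obtained together; the column names here are illustrative only:

```python
from scipy.stats import pearsonr

# Hypothetical column names; r > 0 means a direct relationship, r < 0 inverse
r, p = pearsonr(df["Anxiety_after"], df["Energetic_after"])
print(f"r = {r:.3f} (p = {p:.4f})")
```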

Feature ranking using the OLS regression model

Feature importance analysis using the Ordinary Least Squares (OLS) regression model, shown in Fig.  3 , is a valuable technique in data analysis and predictive modeling. For this analysis, we renamed the features and labeled them from 1 to 13. In the context of feature importance, OLS reveals the impact of each independent variable on the dependent variable: larger coefficient magnitudes indicate stronger feature importance, while coefficients near zero suggest less relevance. This analysis aids feature selection, helping us focus on the most influential variables for building predictive models and for understanding the factors that drive specific outcomes in the data. Based on the outcome shown in Table 3 , the most important feature is feature 13 (with a score of 1.5447) and the least important is feature 1 (with a score of -1.0443).
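A sketch of OLS-based feature ranking with statsmodels; the feature and target column names are placeholders, not the exact names in our dataset:

```python
import statsmodels.api as sm

# Placeholder names: the paper relabels the features 1..13
feature_columns = [f"feature_{i}" for i in range(1, 14)]

X = sm.add_constant(df[feature_columns])    # add an intercept term
ols = sm.OLS(df["Anxiety_after"], X).fit()  # target column name is hypothetical

# Coefficients far from zero suggest stronger feature importance;
# ols.summary() additionally reports standard errors and p-values
print(ols.params.drop("const").sort_values(ascending=False))
```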

Algorithm 1. Training algorithm for anxiety analysis.

Impact on post-COVID-19 health factors: before-to-after

First, the compiled dataset is used for statistical analysis to explore whether COVID-19 has had any impact on the factors. The dataset holds information on both the before and after conditions of each factor. The x-axis shows the response categories describing how much each factor, such as anger or depression, is affected; the y-axis shows the percentage of respondents in each category. In Fig.  4 b, we present a comparative view of anxiety before and after COVID-19: blue represents the degree of impact before being affected by COVID-19, and red represents the status after suffering from the disease.

Before COVID-19, no respondents strongly agreed that they had anxiety, but the percentage jumped to 16.67% strongly agreeing after suffering from the disease. The graph follows the same pattern in the subsequent response categories. Comparing the before and after situations, it can be concluded that after suffering from COVID-19, a large number of people developed this problem anew, while those with previous anxiety issues stayed the same or worsened. In Fig. 4 a, we present a complete view of anxiety levels before and after COVID-19.

Depression

Depression is a factor for which most patients report suffering more after COVID: 23.33% and 36.67% of patients strongly agreed or agreed, respectively, on this matter, up from 16.67% and 20.00% before COVID. While 36.67% disagreed on this matter before COVID, that figure came down to only 10.00% after COVID. Depression has thus increased after COVID-19.

Unhappiness

On the factor of unhappiness, 33.33% and 26.67% of people agreed that they led unhappy lives before and after COVID, respectively. However, we see an almost inverse trend in the neutral responses among the patients. Comparing the before and after situations, it can be seen that unhappiness decreased among the patients after suffering from COVID-19.

Confidence

The degree of confidence before and after the COVID-19 era shows a drastic change in people's mentality. Before COVID-19, 56.67% of people agreed that they were confident, but COVID hit their lifestyles hard, with only 20% agreeing after COVID. The same trend is seen in the disagreement responses. Comparing the before and after situations, it can be concluded that the confidence of most people was shattered after suffering from COVID-19.

Forgetfulness

Regarding forgetfulness, twice as many patients either agreed or strongly agreed that they now forget things more after suffering from COVID. Thus, COVID has severely affected patients' memory.

Patience

Before suffering from COVID, about 60% of people agreed that they were more patient in life, but the percentage abruptly dropped to about half of that after suffering from COVID. However, none strongly disagreed in this regard, either before or after. Comparing the before and after situations, it can be seen that patience decreased by almost half or more among the patients after COVID-19.

Energy

Before COVID-19, most people (56.67%) agreed that they were more energetic, whereas in the post-COVID state the balance shifted toward disagreement (36.67% disagree, 10% strongly disagree). Comparing the before and after situations, it is evident that people became significantly less energetic after suffering from COVID-19.

Chest pain

Before COVID-19, no respondents strongly agreed that they had chest pain, but the percentage jumped to 23.33% strongly agreeing after suffering from COVID. Comparing the before and after situations, it can be concluded that a large number of people developed this problem anew after COVID-19, while those with a previous history of chest pain stayed the same or worsened.

Sound sleep

Before COVID-19, about 36.67% of people agreed that they experienced sound sleep, and the percentage decreased slightly to 33.33% after suffering from COVID. Comparing the before and after situations, sound sleep shows a slightly decreasing tendency after COVID-19.

Anger

Before COVID-19, about 43% of people were neutral about an anger problem, whereas 40% agreed that they had one. Comparing the before and after situations, most people agreed that their anger increased after suffering from COVID-19.

Dizziness

Before COVID-19, most people (50%) disagreed that they had dizziness problems, but the percentages rose in favor of strongly agree (16.67%) and agree (36.67%) in the post-COVID state. Comparing the before and after situations, dizziness is gradually increasing among people after COVID-19.

Impulsiveness

Before COVID-19, a few people (3.33%) strongly agreed that they had been impulsive, but the percentage increased to 20% strongly agreeing after suffering from COVID. Comparing the before and after situations, people show a slightly increasing tendency toward impulsiveness after COVID-19.

Vigilance

Before suffering from COVID, about 60% of people agreed that they were more vigilant, but the percentage fell abruptly to 16.67% agreeing after COVID. At the same time, the degrees of disagreement increased in the post-COVID situation. Comparing the before and after situations, vigilance decreased dramatically among the patients after COVID-19.

Determining correlation among health factors: factor-to-factor

The revealed health factors are analysed to check whether any significant or meaningful relationships exist among them.

Evaluating relationship among health factors

The preprocessed dataset reveals some important information, showing clear relationships among the health factors. The bar chart in Fig.  4 b depicts the inherent relationship between pairs of factors (such as after-anxiety versus before-anxiety). Several factor pairs revealed significant relationships; they are illustrated below.

Anxiety-to-energetic

About 53.33% of people agreed that they had anxiety problems after suffering from COVID-19, higher than the 43.33% who agreed before COVID-19. Again, 16.67% of people strongly agreed after COVID-19, whereas no one strongly agreed before. Besides, 56.67% of people agreed and 13.33% strongly agreed that they were more energetic before COVID-19, whereas only 16.67% agreed and 6.67% strongly agreed after COVID-19. The amount of disagreement is higher after COVID-19, at about 36.67%, meaning that patients became less energetic. Thus, Fig.  4 b shows that anxiety increased among the patients while, at the same time, they became less energetic after suffering from COVID-19.

Depression-to-vigilance

About 36.67% of people agreed that they had depression after suffering from COVID-19, higher than the 20% who agreed before COVID-19. Again, 23.33% strongly agreed after COVID-19, whereas 16.67% strongly agreed before. Besides, 60% of people agreed that they were more vigilant before COVID-19, whereas only 16.67% agreed after COVID-19. The weight of responses shifts toward Neutral, Disagree, and Strongly disagree after COVID-19, meaning that patients became less vigilant.

Thus, the graph outlines that depression has increased among the patients. At the same time, they have become less vigilant after suffering from COVID-19.

Confidence-to-energetic

More than half of the people (56.67%) agreed that they had more confidence before COVID-19, higher than the 20% who agreed after suffering from COVID-19. At the same time, 56.67% of people agreed that they felt more energetic before COVID-19, but only a few (16.67%) agreed after COVID-19.

Thus, the graph shows that the Confidence degree has decreased abruptly among the patients with the sudden decrease in energy degree after suffering from COVID-19.

Chest pain-to-unhappiness

The graph shows that about 23.33% of people strongly agreed that they experienced more, or newly developed, chest pain in the post-COVID state. Besides, about 20% of people strongly agreed about their unhappy state after suffering from COVID.

Thus, the graph indicates that a considerable number of people developed chest pain, which is accompanied by a higher level of unhappiness than before.

Confidence-to-chest pain

More than half of the people (56.67%) agreed that they had more confidence before COVID-19, higher than the 20% who agreed after suffering from COVID-19. Besides, 23.33% of people strongly agreed that they developed more chest pain after COVID-19, whereas no one strongly agreed before COVID-19.

Thus, the graph shows that the degree of confidence decreased among the patients while, at the same time, there is a strong tendency to develop chest pain after suffering from COVID-19.

Sound sleep-to-attentiveness

This graph shows a slight decrease in patients' sleep quality before and after suffering from COVID-19. Similarly, the percentage for attentiveness is also slightly lower when comparing the pre- and post-COVID-19 situations.

Thus, the graph shows that the degrees of sleep and attentiveness decreased slightly in the post-COVID state.

Development of predictive models

In our analysis, we employ a data-driven approach to choose the most relevant features by evaluating their frequency in the existing literature, from which we identify the top two features. The primary focus of our analysis is the feature labeled Anxiety, specifically its prevalence and impact in the post-COVID-19 context. We aim to harness machine learning algorithms to better understand, and potentially predict, the various aspects of anxiety in individuals who have recovered from COVID-19.

Data overview

The data overview for the target class in Fig.  4 provides a clear understanding of the distribution of the target class within the dataset. The count for each target class (after-COVID anxiety) is presented in Fig.  4 a, and the differences in target-class counts before and after COVID-19 are shown in Fig.  4 b. This overview helps data analysts and machine learning practitioners assess whether the dataset is balanced or imbalanced, which is crucial for making informed decisions about model selection, evaluation, and potential data preprocessing techniques to address class imbalance.

3D visualization of data

The purpose of the t-SNE (t-distributed Stochastic Neighbor Embedding) visualization shown in Fig.  5 is to present a reduced-dimensionality view of a complex dataset while preserving meaningful patterns and structures. It is particularly valuable for exploring and visualizing high-dimensional data in a lower-dimensional space, making it easier to identify clusters, similarities, and relationships between data points.
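A sketch of such a 3D t-SNE embedding with scikit-learn, assuming X holds the encoded features and y the target class; the perplexity value is illustrative and must stay below the number of samples:

```python
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Reduce the encoded feature matrix X to three dimensions, as in Fig. 5
emb = TSNE(n_components=3, perplexity=20, random_state=42).fit_transform(X)

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
# Color the points by the target class (after-COVID anxiety level)
ax.scatter(emb[:, 0], emb[:, 1], emb[:, 2], c=y, cmap="viridis", s=20)
ax.set_title("t-SNE embedding of survey features")
plt.show()
```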

Data preprocessing

Data preprocessing follows survey data collection. The raw data is full of missing values and outliers: categorical missing values are replaced with the most frequent value, and numerical missing values are replaced with the mean. Preprocessing is an essential step in preparing data for machine learning; it involves tasks such as handling missing values, outlier treatment, scaling, encoding categorical variables, and feature selection, all of which are necessary to ensure the data is clean, standardized, and suitable for training models. Proper data preprocessing enhances model accuracy and performance 31 , 32 .
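A minimal preprocessing sketch along these lines; the file name and the Likert-scale encoding are assumptions, not the exact mapping used in this study:

```python
import pandas as pd

df = pd.read_csv("post_covid_survey.csv")  # hypothetical file name

# Categorical missing values -> most frequent value (mode)
for col in df.select_dtypes(include="object"):
    df[col] = df[col].fillna(df[col].mode()[0])

# Numerical missing values -> column mean
for col in df.select_dtypes(include="number"):
    df[col] = df[col].fillna(df[col].mean())

# Assumed Likert encoding; the study's exact mapping may differ
likert = {"Strongly disagree": 0, "Disagree": 1, "Neutral": 2,
          "Agree": 3, "Strongly agree": 4}
df = df.replace(likert)
```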

Developing the ML classifiers

The survey dataset captures the demographic profiles of people in Bangladesh, with responses describing how they experienced certain physiological and neurological factors before and after suffering from COVID-19. Taking the demographic parameters and the before-COVID experience of a factor as independent variables, and the after-COVID experience of that factor as the target variable, machine learning algorithms can predict the post-COVID level of the factor. In this research, two types of ML models are used to predict the level of health factors, traditional ML models and ensemble ML models, because of their use in previous research articles 33 . The generalization capacity of an ensemble, which comprises numerous learners, is significantly stronger than that of individual weak learners 34 . Traditional ML models include Random Forest and Decision Tree; ensemble ML models used in the prediction include AdaBoost and Gradient Boosting. Our methodological implementation is presented in Algorithm 1. The outlined process represents a comprehensive workflow for analyzing post-COVID anxiety effects using machine learning and data analysis techniques. It begins with preprocessing of the data, including handling null values, type conversion, and normalization, to ensure the dataset's quality and consistency. The subsequent steps involve feature analysis through methods such as Ordinary Least Squares (OLS), chi-square, and Pearson correlation to identify the most relevant variables. The dataset is then split into training and testing portions, with 70% allocated for training the machine learning models. Four models, Random Forest, Decision Tree, AdaBoost, and Gradient Boost, are used, each with a set of parameters for evaluation. Performance metrics including accuracy, precision, recall, and F1 score are calculated to assess the models' effectiveness, and confusion matrices and ROC curves are generated for a deeper understanding of model performance. Feature importance is analyzed using multiple methods, and the most influential features for post-COVID anxiety effects are recommended based on the SHAP value analysis of the model with the highest accuracy. This workflow represents a systematic and data-driven approach to understanding the impact of post-COVID anxiety on individuals.
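Under the stated 70/30 split, the core of this workflow can be sketched as follows; this is an illustrative outline, not a verbatim reproduction of our implementation, and X and y stand for the encoded features and the after-COVID anxiety target:

```python
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score)
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# 70% of the data for training, the rest held out for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42)

models = {
    "Random Forest": RandomForestClassifier(),
    "Decision Tree": DecisionTreeClassifier(),
    "AdaBoost": AdaBoostClassifier(),
    "Gradient Boost": GradientBoostingClassifier(),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    # Weighted averaging handles the multiclass Likert target
    print(f"{name}: acc={accuracy_score(y_test, pred):.4f} "
          f"prec={precision_score(y_test, pred, average='weighted', zero_division=0):.4f} "
          f"rec={recall_score(y_test, pred, average='weighted'):.4f} "
          f"f1={f1_score(y_test, pred, average='weighted'):.4f}")
```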

During training, the Decision Tree classifier achieved the highest ROC curve area of 0.95, while the AdaBoost classifier had the lowest ROC curve area of 0.68. We also present individual per-class performance with ROC curve area analysis in Fig. 7 .

The development of the four ML classifiers is described in detail below:

AdaBoost

A well-known ensemble learning technique, AdaBoost combines weak learners to produce a strong learner. Each weak learner in AdaBoost is trained on the training data and is given a weight depending on its accuracy. The final model is a weighted combination of the weak learners, with greater weight given to the more accurate ones.

Here, all training examples initially receive identical weights, and the AdaBoost model is built with default hyperparameters: a maximum depth of 1 for the weak learners and 50 estimators. For each succeeding weak learner, the weights of misclassified instances are incrementally increased.

GradientBoost

Gradient Boost is an effective machine learning technique that produces a strong predictive model by combining many weak models. It is a type of ensemble learning method that works by repeatedly adding new models to the ensemble, each correcting the errors of the prior models. Gradient Boost is especially useful for regression and classification problems with large datasets and high-dimensional feature spaces, since it uses gradient descent to optimize the loss function.

Here, the gradient boost model is built with default hyperparameters such as decision trees as the base estimator, a learning rate of 0.1, a maximum depth of 3 for the trees, and 100 estimators.

Decision tree

Decision trees are a popular supervised learning approach for both classification and regression analysis. The method works by dividing the feature space into subsets according to the values of the input features, resulting in a tree-like model that is easy for people to understand. Each leaf node of the tree represents a predicted output value, and each internal node reflects a decision based on a feature value.

Here, the decision tree model is built with default hyperparameters such as the Gini impurity criterion for measuring the quality of splits, no limits on the maximum depth or number of leaf nodes, and no constraints on the minimum number of samples required to split an internal node or form a leaf node.

Random forest

Random Forest is an ensemble learning algorithm that builds multiple decision trees and combines them to produce more accurate predictions. Each decision tree is trained on a randomly selected subset of the training data and a random subset of the input features, ensuring diversity among the trees. The final prediction is made by aggregating the predictions of all the individual trees (majority voting for classification), resulting in a more robust and accurate model that is less prone to overfitting than a single decision tree.

Here, the random forest model is built with default hyperparameters: the Gini impurity criterion for measuring the quality of splits, a number of decision trees equal to 100, and a maximum depth for each tree of None (unlimited).

Overall parameter tuning for each model is presented in Table 4 ; the corresponding configurations are sketched in code below.
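In scikit-learn terms, the default configurations described above correspond roughly to the following instantiations; Table 4 remains authoritative for the exact settings:

```python
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.tree import DecisionTreeClassifier

# AdaBoost: 50 estimators; the default base learner is a depth-1 decision stump
ada = AdaBoostClassifier(n_estimators=50)

# Gradient boosting: learning rate 0.1, tree depth 3, 100 estimators
gb = GradientBoostingClassifier(learning_rate=0.1, max_depth=3, n_estimators=100)

# Decision tree: Gini criterion, no depth/leaf/split constraints
dt = DecisionTreeClassifier(criterion="gini", max_depth=None,
                            min_samples_split=2, min_samples_leaf=1)

# Random forest: 100 trees, Gini criterion, unlimited depth
rf = RandomForestClassifier(n_estimators=100, criterion="gini", max_depth=None)
```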

Result analysis: ML models to predict post-COVID-19 health factors

The data is split, with 70% used for training and 521 samples reserved for testing. All models are trained on the training data using default hyperparameters and then evaluated on the test data. Lastly, the test predictions of the ML models are compared, revealing which model most accurately predicts the after-COVID level of each factor.

In Table 5 , we depict the four predictive models used for testing across various factors, along with their performance measures. These encompass confusion-matrix-derived metrics such as accuracy, precision, recall, and F1 score, together with mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE), and R-squared (R2).

Table 6 provides a detailed overview of the training performance for each target class. In Table 7 , we present a corresponding breakdown of the testing performance for each target class.

Performance analysis

Training performance analysis

To thoroughly evaluate the performance of our machine learning methods, we allocated a significant portion, namely 70%, of the available data for training. Among the algorithms we employed, the Decision Tree model stood out as the top performer, achieving an accuracy of 93.84%. In contrast, the Gradient Boost and AdaBoost algorithms exhibited slightly lower accuracy scores than the Decision Tree. To provide a comprehensive understanding of model performance, we additionally report key metrics such as mean absolute error (MAE), mean squared error (MSE), and the R-squared (R2) score, shedding light on aspects beyond simple accuracy. Furthermore, to gain deeper insight into the models' classification capabilities, we present the confusion matrix results, offering a more granular perspective on training performance. Overall training performance is shown in Table 5 , and per-class performance is shown in Table 6 .

Testing performance analysis

To assess the machine learning methods' generalization, we reserved 30% of the data for evaluation. Based on the results shown in Tables 5 and 7 , the Decision Tree model outperformed the others, achieving an accuracy of 92.70%, while the Gradient Boost and AdaBoost algorithms exhibited slightly lower accuracy scores. For a more comprehensive evaluation, we also report metrics such as mean absolute error (MAE), mean squared error (MSE), and the R-squared (R2) score. Furthermore, we delve deeper into the testing performance by presenting the confusion matrix results, offering more detailed insight into classification performance. Overall testing performance is shown in Table 5 , and per-class performance is shown in Table 7 .

Performance analysis using confusion matrix

A confusion matrix is a vital tool in machine learning, especially for classification tasks. It is a matrix summarizing how well a classification algorithm performs, giving insight into its accuracy in predicting the true data classes: rows represent actual classes, while columns represent the model's predictions. It is crucial for evaluating model performance because key metrics such as accuracy, precision, recall, and F1 score are calculated from it. These metrics offer a deeper understanding of a model's strengths and weaknesses, facilitating model refinement and enhancement, and are especially important in scenarios involving imbalanced classes or specific error types. In Fig.  6 , the testing performance of the four selected models is shown using confusion matrices. Based on these results, we can conclude that the Decision Tree model performs better than all the other models.

Figure 6. Confusion matrix of testing performance for anxiety after COVID-19.
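A sketch of how such a confusion matrix and the per-class metrics derived from it can be produced for the decision tree dt from the earlier sketches:

```python
from sklearn.metrics import ConfusionMatrixDisplay, classification_report

dt.fit(X_train, y_train)  # fit the decision tree from the previous sketch

# Rows: actual classes; columns: predicted classes (cf. Fig. 6)
ConfusionMatrixDisplay.from_estimator(dt, X_test, y_test)

# Per-class precision, recall and F1 derived from the same matrix
print(classification_report(y_test, dt.predict(X_test), zero_division=0))
```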

Computational time analysis

Table 8 displays the computational time required for training and testing the various models, illustrating the cost associated with each. According to the data, the decision tree model requires slightly less time than the other models. Decision trees are simpler than random forests: a decision tree makes its decisions in a single tree, while a random forest combines multiple decision trees. Random forests are slower but more comprehensive, whereas decision trees are fast and efficient.

ROC curve analysis

ROC curves, shown in Figs. 7 and 8 , provide a powerful and intuitive means of assessing binary or multiclass classification performance. A ROC curve offers a visually interpretable representation of a model's ability to discriminate between positive and negative cases, facilitating model comparison and selection. ROC curves are robust to class imbalance and varying class prior probabilities, offering insight even in challenging dataset scenarios. The area under the ROC curve (AUC) condenses overall model performance into a single scalar metric, simplifying model evaluation and ranking. Moreover, ROC curves are applicable to a wide range of classification algorithms, aiding transparency, interpretability, and informed decision-making, especially in fields such as medicine and diagnostics where sensitivity-specificity trade-offs are critical.

Figure 7. Training ROC curves for anxiety after COVID-19.

Figure 8. Testing ROC curves for anxiety after COVID-19.
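A sketch of per-class (one-vs-rest) ROC curves like those in Figs. 7 and 8, assuming the fitted decision tree dt, the test split from the earlier sketches, and a multiclass Likert target:

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import auc, roc_curve
from sklearn.preprocessing import label_binarize

classes = np.unique(y_train)
y_test_bin = label_binarize(y_test, classes=classes)  # one-vs-rest encoding
scores = dt.predict_proba(X_test)                     # per-class probabilities

for i, cls in enumerate(classes):
    fpr, tpr, _ = roc_curve(y_test_bin[:, i], scores[:, i])
    plt.plot(fpr, tpr, label=f"class {cls} (AUC = {auc(fpr, tpr):.2f})")

plt.plot([0, 1], [0, 1], "k--")  # chance diagonal
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```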

During the testing phase, the Decision Tree classifier achieved the highest ROC curve area of 0.95, while the AdaBoost classifier attained the lowest ROC curve area of 0.68 (Fig. 8 ). The same figure also provides an analysis of individual per-class performance using ROC curve area metrics.

Important feature analysis of best models

Examining the critical facets of a machine learning model offers several benefits, including enhanced model comprehension, increased efficiency through feature selection, improved performance, data-driven decision-making, optimized resource allocation, pattern recognition, and adherence to regulatory requirements. Furthermore, this analysis validates domain expertise, identifies potential biases, aids in model explanation, and fosters adaptability to changing circumstances. We analyze the important features of the best-performing models with the Gini index, information gain, and classification permutation. Figures 9 , 10 , 11 and 12 present the exploration of the features important for anxiety as a post-COVID-19 effect.

Important feature exploration using Gini Index for the best model : The Gini Index, often used in decision tree algorithms, serves the purpose of quantifying the impurity or disorder within a set of data points within a specific class. It provides a measure of how frequently a randomly chosen element would be misclassified in terms of its class label if it were randomly assigned a label based on the distribution of class labels in the data subset. In the context of decision trees, the Gini Index is employed as a criterion for selecting the best feature to split data on, aiming to minimize this impurity. A lower Gini Index indicates a purer node with data points predominantly belonging to a single class, making it a valuable tool for guiding the creation of decision tree nodes that effectively partition data into more homogenous subsets, leading to better classification performance.

Based on the Gini analysis shown in Figs.  9 a, 10 a, 11 a and 12 a, the AdaBoost, Gradient Boost, Decision Tree and Random Forest algorithms all give the most priority to features 0, 1 and 12.
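For reference, tree-based models in scikit-learn expose Gini-based importances directly; a sketch using the fitted decision tree and hypothetical feature labels 0-12, matching the feature indices in the text:

```python
import pandas as pd

# Hypothetical labels for the 13 encoded features (0: Gender, 1: Age, ...)
feature_names = [f"feature_{i}" for i in range(13)]

# Mean decrease in Gini impurity attributed to each feature (cf. Fig. 11a)
gini_importance = pd.Series(dt.feature_importances_, index=feature_names)
print(gini_importance.sort_values(ascending=False))
```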

Important feature exploration using information Gain for the best model : The purpose of Information Gain in the context of decision trees and feature selection is to quantify how much knowledge or reduction in uncertainty a particular feature provides when used to split a dataset. It measures the difference in entropy (or impurity) between the original dataset and the subsets created by splitting the data based on that feature. By selecting the feature with the highest Information Gain, decision tree algorithms aim to identify the feature that can separate the data into more homogenous or pure subsets, leading to more effective and accurate classification or regression models. Essentially, Information Gain helps decision trees make informed choices about which features to use as decision criteria, facilitating the creation of a tree structure that best represents the underlying data patterns.

Based on the information gain analysis in Figs.  9 b, 10 b, 11 b and 12 b, the AdaBoost, Gradient Boost, Decision Tree and Random Forest algorithms again give the most priority to features 0, 1 and 12.
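Information gain can be approximated with scikit-learn's mutual information estimator; a sketch under the same assumptions as above (a stand-in for the exact information-gain computation, not a reproduction of it):

```python
import pandas as pd
from sklearn.feature_selection import mutual_info_classif

# Mutual information estimates the entropy reduction (information gain)
# that each feature provides about the target class
mi = mutual_info_classif(X_train, y_train, discrete_features=True,
                         random_state=42)
print(pd.Series(mi, index=feature_names).sort_values(ascending=False))
```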

Important feature exploration using classification permutation for the best model : The purpose of Feature Importance by classification permutation is to assess the relative significance of individual features in a machine-learning classification model. It achieves this by systematically shuffling the values of a single feature while keeping all other features constant and then measuring the resulting drop in the model’s performance metric (typically accuracy or F1 score). Features that, when shuffled, cause a significant decrease in model performance are considered important, as they carry valuable information for making accurate predictions. This method helps practitioners identify which features contribute most to the model’s predictive power, aiding in feature selection, model interpretation, and improving overall model performance by focusing on the most informative attributes.

Based on the permutation feature analysis in Figs.  9 c, 10 c, 11 c and 12 c, the AdaBoost, Gradient Boost, Decision Tree and Random Forest algorithms give the most priority to features 0 and 1.
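A sketch of permutation-based importance with scikit-learn, shuffling one feature at a time and measuring the drop in test accuracy, under the same assumptions as the sketches above:

```python
import pandas as pd
from sklearn.inspection import permutation_importance

# Shuffle each feature n_repeats times and record the mean accuracy drop
result = permutation_importance(dt, X_test, y_test, scoring="accuracy",
                                n_repeats=10, random_state=42)
print(pd.Series(result.importances_mean, index=feature_names)
        .sort_values(ascending=False))
```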

Note that the COVID patient features are labeled as 0: Gender, 1: Age, and 12: Vaccination Status.

Figure 9. AdaBoost important feature analysis for anxiety after COVID-19.

Figure 10. Gradient Boost important feature analysis for anxiety after COVID-19.

Figure 11. Decision Tree important feature analysis for anxiety after COVID-19.

Figure 12. Random Forest important feature analysis for anxiety after COVID-19.

Figure 13. SHAP value analysis for the Decision Tree algorithm.

Important feature analysis of trained models based on SHAP value

SHAP (SHapley Additive exPlanations) value analysis shown in Fig. 13 offers several notable advantages in the realm of model interpretability and feature analysis. One key advantage is its ability to provide a clear and intuitive understanding of how individual features influence the predictions of machine learning models. By assigning importance scores to each feature, SHAP values allow data scientists and stakeholders to pinpoint the most influential factors behind model outcomes, facilitating informed decision-making and actionable insights.

Furthermore, SHAP values ensure consistency in attribution, meaning that the sum of SHAP values for all features equals the difference between the model’s prediction and the expected (average) prediction. This consistency lends credibility to the interpretability of the analysis and ensures that the contributions of each feature align with the overall prediction.

Moreover, SHAP values offer interpretability across a wide range of machine learning models, including complex algorithms like gradient boosting and deep neural networks. This versatility makes SHAP a valuable tool in various domains, from healthcare to finance, where model transparency and trust are paramount.

Finally, SHAP values can be visualized in multiple ways, including summary plots, force plots, and dependence plots, making it accessible for both technical and non-technical stakeholders. These visualizations enhance the communication of model insights and contribute to more effective collaboration between data scientists and domain experts. In summary, SHAP value analysis significantly advances the field of model interpretability by offering transparency, consistency, and versatility in understanding the driving forces behind machine learning predictions.
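A minimal sketch of such a SHAP analysis for the fitted decision tree from the earlier sketches; it assumes the shap package is installed and feature_names as defined above:

```python
import shap

# TreeExplainer computes exact Shapley values efficiently for tree models
explainer = shap.TreeExplainer(dt)
shap_values = explainer.shap_values(X_test)

# Global summary comparable to Fig. 13; features are ranked by mean |SHAP|
shap.summary_plot(shap_values, X_test, feature_names=feature_names)
```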

Based on the SHAP value analysis of the Decision Tree algorithm, feature 12, feature 0, and feature 1 are the most important features. As above, the COVID patient features are labeled as 0: Gender, 1: Age, and 12: Vaccination Status.

Comparative analysis

In the comparative analysis, we compared our method with relevant existing state-of-the-art methods. Based on Table 9 , our method obtains higher accuracy and handles more target classes.

Novelty of our research

Our research brings the following contributions to the field:

Comprehensive health factor analysis : One of the main contributions of our research is the comprehensive analysis of 17 significant health factors associated with COVID-19. These factors encompass both Physiological and Neurological aspects, providing a holistic view of the health complications linked to the disease. This extensive factor analysis is crucial in understanding the multifaceted impact of COVID-19 on individuals’ health.

Longitudinal surveys for pre-and post-illness assessment : We have undertaken a distinctive approach by conducting surveys with individuals who have recovered from COVID-19. These surveys assess their health conditions both before and after the illness, creating a longitudinal perspective. This approach enables us to track the progression of health complications, which is a novel aspect of our research.

Rigorous statistical analysis : Our research stands out for its rigorous statistical analysis of the survey data. By subjecting the data to an in-depth statistical examination, we unveil how each of the 17 health factors independently influences patients. This analytical rigor provides a deeper understanding of the individual and collective impact of these factors.

Effective predictive models : Our study identifies the Decision Tree algorithm as the most effective predictive model for evaluating the influence of health factors, specifically in predicting anxiety levels. This contribution enhances the precision and reliability of post-COVID-19 health assessments.

Innovative feature analysis methods : In the final stage of our research, we employ a variety of innovative methods to identify key features in the post-effects of the best-performing machine learning model. These methods include feature importance analysis, Gini index, information gain, feature importance permutation, and SHAP value analysis. This feature analysis approach adds a novel dimension to the understanding of post-COVID-19 health outcomes.

Implication of the research

The implications of our research extend to following areas that offer valuable insights and potential benefits for healthcare, public health, and research endeavors:

Improved post-COVID-19 patient care : Our research provides a deeper understanding of the health complications associated with COVID-19, allowing healthcare providers to offer more tailored and effective care to individuals recovering from the disease. Identifying the key health factors and their impact can aid in personalized treatment plans.

Early intervention and monitoring : The longitudinal approach in our study allows for the early identification of health complications that may arise post-COVID-19. This early detection can lead to timely interventions and monitoring to prevent or mitigate the severity of these complications.

Resource allocation : Health systems can use our findings to allocate resources more effectively. By understanding the specific health factors that influence patients, healthcare facilities can allocate resources to address the most pressing needs, optimizing patient care.

Public health planning : Public health authorities can benefit from our research in planning and implementing post-COVID-19 health strategies. Understanding the factors contributing to health complications can inform public health policies and interventions to support affected individuals.

Research advancements : Our research contributes to the growing body of knowledge about the long-term effects of COVID-19. It provides a basis for further research and investigations into the intricacies of post-COVID-19 health, fostering a better understanding of this emerging field.

Machine learning applications : The effectiveness of the Decision Tree algorithm in predicting anxiety levels and the innovative feature analysis methods can inspire future machine learning applications in healthcare and predictive modeling.

Conclusion

Our research focused on analyzing major health complications related to COVID-19, identifying 17 significant health factors categorized as Physiological and Neurological. We conducted surveys with recovered COVID-19 patients to assess the impact of these factors on their health before and after their illness. We rigorously analyzed the survey data, examining the independent influence of each factor and their interconnections. From the frequency analysis of the survey study, we chose the most important feature, Anxiety. Among the four ML models, the Decision Tree algorithm demonstrated the highest accuracy in predicting anxiety levels, which was our primary objective. Finally, we identified the key features of the best-performing machine learning model through various methods, providing valuable insights into post-COVID-19 effects.

Post-COVID traumas have both mental and physical effects, significantly impacting patients' lives. Depression roughly doubled, from 20% to 36.67%, while vigilance dropped from 60% to 16.67%, impulsiveness decreased from 33.33% to 20%, and determination fell from 60% to 20%. Confidence levels plummeted from 56.67% to 20%, and energy levels declined from 56.67% to 16.67%. Relationships exist among factors such as chest pain and unhappiness, and sleep and attentiveness, with forgetfulness connected to almost all other factors. Additionally, there are direct or inverse relationships among various factors: depression and forgetfulness show a direct relationship (r = 0.678515), while anxiety and energy display an inverse relationship (r = -0.18056).

This signifies that a COVID survivor suffering more anxiety will most probably feel less energetic. Lastly, we identified the best predictive ML models for the degree of impact on post-COVID-19 health factors. Our developed Decision Tree model showed the highest accuracy (0.9384) in predicting the degree of impact of anxiety in a post-COVID individual. In summary, the predictive machine learning models showed dependable accuracy in predicting the degree of impact of various factors in post-COVID-19 individuals.

Data availability

The datasets generated and analyzed during the current study are publicly available in our GitHub repository link at https://github.com/shafiq-islam-cse/Data---Exploring-Post-COVID-19-Health-Effects-and-Features-with-Advanced-Machine-Learning-Techniques .

Shanbehzadeh, S., Tavahomi, M., Zanjari, N., Ebrahimi-Takamjani, I. & Amiri-Arimi, S. Physical and mental health complications post-Covid-19: Scoping review. J. Psychosom. Res. 147 , 110525 (2021).


Matsumoto, K., Hamatani, S., Shimizu, E., Käll, A. & Andersson, G. Impact of post-Covid conditions on mental health: A cross-sectional study in Japan and Sweden. BMC Psychiatry 22 , 237 (2022).


Ahmed, G. K. et al. Long term impact of Covid-19 infection on sleep and mental health: A cross-sectional study. Psychiatry Res. 305 , 114243 (2021).

Benedetti, F. et al. Brain correlates of depression, post-traumatic distress, and inflammatory biomarkers in Covid-19 survivors: A multimodal magnetic resonance imaging study. Brain Behav. Immunity-Health 18 , 100387 (2021).


Ahmed, M. et al. Post-Covid-19 memory complaints: Prevalence and associated factors. Neurologia (2022).

Tarsitani, L. et al. Post-traumatic stress disorder among Covid-19 survivors at 3-month follow-up after hospital discharge. J. Gen. Intern. Med. 36 , 1702–1707 (2021).

Hu, F. et al. Has Covid-19 changed China's digital trade? Implications for health economics. Front. Public Health 10 , 831549 (2022).


Satu, M. S. et al. Covid-hero: Machine learning based Covid-19 awareness enhancement mobile game for children. In International Conference on Applied Intelligence and Informatics , 321–335 (Springer, 2021).

Li, J. et al. How nursing students' risk perception affected their professional commitment during the Covid-19 pandemic: The mediating effects of negative emotions and moderating effects of psychological capital. Humanit. Soc. Sci. Commun. 10 , 1–9 (2023).

Yousif, M. G., Hashim, K. & Rawaf, S. Post Covid-19 effect on medical staff and doctors’ productivity analysed by machine learning. Baghdad Sci. J. 20 , 1507–1507 (2023).

Shin, H. et al. The adverse effects and nonmedical use of methylphenidate before and after the outbreak of Covid-19: Machine learning analysis. J. Med. Internet Res. 25 , e45146 (2023).

Ma, L., Graham, D. J. & Stettler, M. E. Using explainable machine learning to interpret the effects of policies on air pollution: Covid-19 lockdown in London. Environmental Science & Technology (2023).

Alqarni, A. & Rahman, A. Arabic tweets-based sentiment analysis to investigate the impact of Covid-19 in KSA: A deep learning approach. Big Data and Cognitive Computing 7 , 16 (2023).

Baker, T. B. et al. A machine learning analysis of correlates of mortality among patients hospitalized with Covid-19. Sci. Rep. 13 , 4080 (2023).


Sun, K.-X., Ooi, K.-B., Tan, G. W.-H. & Lee, V.-H. Enhancing supply chain resilience in smes: A deep learning-based approach to managing Covid-19 disruption risks. J. Enterprise Inf. Manage. (2023).

Turón, A., Altuzarra, A., Moreno-Jiménez, J. M. & Navarro, J. Evolution of social mood in Spain throughout the Covid-19 vaccination process: A machine learning approach to tweets analysis. Public Health 215 , 83–90 (2023).


Acharya, A., Aryan, A., Saha, S. & Ghosh, A. Impact of Covid-19 on the human personality: An analysis based on document modeling using machine learning tools. Comput. J. 66 , 963–969 (2023).

Amole, A., Oladipo, S., Ighravwe, D., Makinde, K. & Ajibola, J. Comparative analysis of deep learning techniques based Covid-19 impact assessment on electricity consumption in distribution network. Nigerian J. Technol. Dev. 20 , 23–46 (2023).

Khidir, H. A., Etikan, İ., Kadir, D. H., Mahmood, N. H. & Sabetvand, R. Bayesian machine learning analysis with Markov chain Monte Carlo techniques for assessing characteristics and risk factors of Covid-19 in Erbil City, Iraq 2020–2021. Alex. Eng. J. 78 , 162–174 (2023).

Kim, H. W., McCarty, D. & Jeong, M. Examining commercial crime call determinants in alley commercial districts before and after Covid-19: A machine learning-based shap approach. Appl. Sci. 13 , 11714 (2023).

Blette, B. S. et al. Causal Bayesian machine learning to assess treatment effect heterogeneity by dexamethasone dose for patients with covid-19 and severe hypoxemia. Sci. Rep. 13 , 6570 (2023).

Almeqren, M. A., Almuqren, L., Alhayan, F., Cristea, A. I. & Pennington, D. Using deep learning to analyze the psychological effects of Covid-19. Frontiers in Psychology 14 (2023).

Ma, S., Li, S. & Zhang, J. Spatial and deep learning analyses of urban recovery from the impacts of Covid-19. Sci. Rep. 13 , 2447 (2023).

Hu, F., Ma, Q., Hu, H., Zhou, K. H. & Wei, S. A study of the spatial network structure of ethnic regions in northwest China based on multiple factor flows in the context of Covid-19: Evidence from Ningxia. Heliyon 10 (2024).

García-Sánchez, C. et al. Neuropsychological deficits in patients with cognitive complaints after Covid-19. Brain Behav. 12 , e2508 (2022).

Sher, L. Post-Covid syndrome and suicide risk. QJM: Int. J. Med. 114 , 95–98 (2021).

Pistarini, C. et al. Cognitive and emotional disturbances due to Covid-19: An exploratory study in the rehabilitation setting. Front. Neurol. 500 (2021).

Crispo, A. et al. Strategies to evaluate outcomes in long-Covid-19 and post-Covid survivors. Infect. Agents Cancer 16 , 1–20 (2021).

Vadivel, R. et al. Mental health in the post-Covid-19 era: Challenges and the way forward. Gen. Psychiatry 34 (2021).

Orrù, G. et al. Long-Covid syndrome? A study on the persistence of neurological, psychological and physiological symptoms. In Healthcare 9 , 575 (MDPI, 2021).

Rahman, A. Statistics-based data preprocessing methods and machine learning algorithms for big data analysis. Int. J. Artif. Intell. 17 , 44–65 (2019).


Aggarwal, V., Gupta, V., Singh, P., Sharma, K. & Sharma, N. Detection of spatial outlier by using improved z-score test. In 2019 3rd International Conference on Trends in Electronics and Informatics (ICOEI) , 788–790 (IEEE, 2019).

Imtiaz Khan, N., Mahmud, T. & Nazrul Islam, M. Covid-19 and black fungus: Analysis of the public perceptions through machine learning. Eng. Rep. 4 , e12475 (2022).


Zhang, C. & Ma, Y. Ensemble Machine Learning: Methods and Applications (Springer, 2012).

Gupta, A., Jain, V. & Singh, A. Stacking ensemble-based intelligent machine learning model for predicting post-Covid-19 complications. N. Gener. Comput. 40 , 987–1007 (2022).

Ahamad, M. M. et al. Adverse effects of Covid-19 vaccination: Machine learning and statistical approach to identify and classify incidences of morbidity and postvaccination reactogenicity. In Healthcare 11 , 31 (MDPI, 2022).

Shakhovska, N., Yakovyna, V. & Chopyak, V. A new hybrid ensemble machine-learning model for severity risk assessment and post-Covid prediction system. Math. Biosci. Eng. 19 , 6102–6123 (2022).


Abbaspour, S. et al. Identifying modifiable predictors of Covid-19 vaccine side effects: A machine learning approach. Vaccines 10 , 1747 (2022).


Author information

Authors and affiliations

Department of Computer Science and Engineering, Military Institute of Science and Technology, Mirpur Cantonment, Dhaka, 1216, Bangladesh

Muhammad Nazrul Islam, Md Shofiqul Islam, Nahid Hasan Shourav & Iftiaqur Rahman

Department of Computer Science and Engineering, Green University of Bangladesh, Dhaka, Bangladesh

Faiz Al Faisal

Department of Computer Science and Engineering, United International University, Dhaka, 1212, Bangladesh

Md Motaharul Islam

School of Science, Edith Cowan University, Perth, WA, 6027, Australia

Iqbal H. Sarker


Contributions

The idea of this article was developed by M.N.I. and N.H.S.; Literature review was conducted by I.R.; Data acquisition and pre-processing was carried out by M.N.I., N.H.S. and F.A.F.; M.S.I and I.H.S. analysed the data and results. M.S.I, I.R, M.M.I. and F.A.F. prepared the first draft of the article, while M.N.I, M.M.I. and F.A.F finalize the manuscript to prepare it for publication. All authors read, edited, and approved the final manuscript.

Corresponding author

Correspondence to Muhammad Nazrul Islam .

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article

Islam, M.N., Islam, M.S., Shourav, N.H. et al. Exploring post-COVID-19 health effects and features with advanced machine learning techniques. Sci Rep 14 , 9884 (2024). https://doi.org/10.1038/s41598-024-60504-w

Download citation

Received : 03 December 2023

Accepted : 23 April 2024

Published : 30 April 2024

DOI : https://doi.org/10.1038/s41598-024-60504-w


Keywords:

  • Machine learning
  • Pearson's coefficient





Open Access

Peer-reviewed

Research Article

Enhanced PRIM recognition using PRI sound and deep learning techniques

  • Seyed Majid Hasani Azhdari, 
  • Azar Mahmoodzadeh, 
  • Mohammad Khishe, 
  • Hamed Agahi

Affiliations: Department of Electrical Engineering, Shiraz Branch, Islamic Azad University, Shiraz, Iran; Department of Electrical Engineering, Imam Khomeini Marine Science University, Nowshahr, Iran

* E-mail: [email protected] (SMHA); [email protected] (AM)

  • Published: May 1, 2024
  • https://doi.org/10.1371/journal.pone.0298373


Pulse repetition interval modulation (PRIM) is integral to radar identification in modern electronic support measure (ESM) and electronic intelligence (ELINT) systems. Various distortions, including missing pulses, spurious pulses, unintended jitters, and noise from radar antenna scans, often hinder the accurate recognition of PRIM. This research introduces a novel three-stage approach for PRIM recognition, emphasizing the innovative use of PRI sound. A transfer-learning-aided deep convolutional neural network (DCNN) is initially used for feature extraction. This is followed by an extreme learning machine (ELM) for real-time PRIM classification. Finally, a gray wolf optimizer (GWO) refines the network's robustness. To evaluate the proposed method, we developed a real experimental dataset consisting of the sounds of six common PRI patterns. We utilized eight pre-trained DCNN architectures for evaluation, with VGG16 and ResNet50V2 notably achieving recognition accuracies of 97.53% and 96.92%. Integrating the ELM and GWO further optimized the accuracies to 98.80% and 97.58%, respectively. This research advances radar identification by offering an enhanced method for PRIM recognition, emphasizing the potential of PRI sound to address real-world distortions in ESM and ELINT systems.

Citation: Hasani Azhdari SM, Mahmoodzadeh A, Khishe M, Agahi H (2024) Enhanced PRIM recognition using PRI sound and deep learning techniques. PLoS ONE 19(5): e0298373. https://doi.org/10.1371/journal.pone.0298373

Editor: Jude Hemanth, Karunya Institute of Technology and Sciences, INDIA

Received: October 5, 2023; Accepted: January 24, 2024; Published: May 1, 2024

Copyright: © 2024 Hasani Azhdari et al. This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: Data contain potentially military information and are owned by Imam Khomeini University of Marine Sciences, Nowshahr. Data will be made available upon reasonable request to interested, qualified researchers via [email protected] . This email belongs to Dr. Fallah Mohammadzadeh, head of the Artificial Intelligence Research Center of Imam Khomeini University of Marine Sciences; data can be requested through this address. The phone number for requesting data is 09188429780.

Funding: The author(s) received no specific funding for this work.

Competing interests: The authors have declared that no competing interests exist.

1. Introduction

Automation holds significant importance in contemporary ELINT and ESM systems [ 1 ], driven by the increasing intricacy of electronic warfare (EW) scenarios. Achieving it requires de-interleaving the interleaved pulses emitted by various radars; each radar signal must then be accurately identified and analyzed automatically, without manual intervention.

PRIMs are a significant component of radar signal analysis since they offer crucial insights into the origin of radiation and possible hazards [ 2 – 4 ]. PRIMs play a pivotal role in signal processing and serve as a fundamental point of reference for discerning the origin of radiation inside such systems [ 2 ]. Furthermore, radar warning receivers (RWR) and jammers employ pulse repetition interval (PRI) characteristics [ 5 , 6 ].

PRIM is a significant challenge in radar signal processing inside ELINT and ESM systems. In the field of radar technology, it is commonly observed that there are six primary types of PRIM techniques employed. These techniques include simple, stagger, jitter, dwell and switch (D&S), periodic, and sliding modulations [ 6 ]. Fig 1 depicts different forms of PRIMs, visually representing the variety in modulation types and their respective patterns [ 2 ].


https://doi.org/10.1371/journal.pone.0298373.g001

Table 1 lists several common variations of PRI, providing detailed insights into each type’s unique characteristics and specifications. The primary objective of the ESM and ELINT analyst is to classify radar emitters by analyzing changes in PRI, which is contingent upon the specific PRI variations associated with each category [ 6 ].


https://doi.org/10.1371/journal.pone.0298373.t001

Researchers have achieved notable advancements in PRIM recognition in recent years by developing new algorithms and methodologies [ 7 , 8 ]. Recognizing diverse PRIM signals nevertheless remains difficult owing to their intricate waveforms and the variability of operational conditions. Conventional signal processing techniques often cannot guarantee accurate detection and identification of PRIMs; the primary constraint is the difficulty of developing algorithms capable of handling the diverse range of PRIM variables, including pulse repetition frequency (PRF), pulse width (PW), and modulation methods. The intricate nature and varied characteristics of PRIMs likewise pose considerable obstacles to machine learning (ML) methods designed for PRIM identification. One significant challenge is that ML methods require large amounts of training data, which proves arduous to obtain in numerous real-world scenarios. Another is developing algorithms that remain resilient to alterations in PRIM variables, such as PRF fluctuations or signal-to-noise ratio (SNR) variations. Resolving these issues is essential to advancing precise and resilient PRIM recognition techniques for contemporary radar systems employing a diverse array of PRIMs [ 9 ].

Hence, the objective of this investigation is to contribute to the domain of radar signal processing an innovative methodology for identifying and classifying the various types of PRIMs in radar signals contaminated by noise. This study employs a deep learning-based approach that exploits the capabilities of DCNNs to detect PRIMs effectively [ 10 ].

This study suggests using an ELM in place of the fully connected layer to provide a real-time processor [ 11 – 13 ]. Combining DCNN-based automated feature extraction with ELMs can effectively tackle the problems of manual feature extraction and long training times in the proposed two-phase technique.

The Random Vector Functional Link (RVFL) [ 14 ] establishes the foundation of the ELM, resulting in a highly efficient and adaptable system [ 12 , 13 ]. Research shows that ELM is regularly used in engineering applications [ 15 – 17 ]. However, ELM has known obstacles [ 18 , 19 ], such as the need for many hidden nodes to achieve better generalization and the requirement to choose appropriate activation functions.

ELMs strive to minimize training error while ascertaining the minimum norm of the output weights. Because the input weights and biases employed in ELM are selected randomly, the resulting hidden-layer matrix may be ill-conditioned and may not have full column rank, degrading the accuracy of results [ 17 , 20 , 21 ]. Therefore, this work utilizes a GWO algorithm to improve ELM’s conditioning and ensure optimal solutions are attained. Specifically, the final conventional fully connected layer in the DCNN is replaced with an ELM for real-time training and testing, and a GWO algorithm is presented as a solution to the ill-conditioning and inconsistency faced by the classical ELM. The proposed technique aims to provide a real-time structure with high-accuracy detection.

GWO is a leading optimization technique owing to its excellent performance and flexibility. It was developed by modeling the predatory behavior of grey wolves [ 21 , 22 ]. The concept is straightforward and can be implemented in a few lines of code, making it accessible to many users. GWO offers superior robustness in parameter regulation compared with other evolutionary algorithms, resulting in enhanced computational efficiency [ 21 , 23 ]. This work therefore utilizes GWO as the optimization technique for the ELM and integrates it into the PRIM recognition system.

The methodology employed in the present investigation is grounded in prior scholarly work. It addresses several noteworthy issues, such as the presence of missing and spurious pulses and the vast array of characteristics exhibited by various PRIMs. The study’s experimental results demonstrate the effectiveness and robustness of the methodology, achieving high accuracy across various radar signals.

The primary outcomes of this study are as follows:

  • The present study introduces a novel approach consisting of a three-phase methodology. This methodology utilizes a transfer learning-based DCNN as a feature extractor, an ELM for real-time recognition of the six often occurring forms of PRIM, and a GWO algorithm to improve the network’s resilience and stability.
  • This study introduces using PRI sound to identify its modulation for the first time. The utilized data sets consist of authentic data obtained through designing, constructing, and deploying the necessary system in a region characterized by a significant concentration of radar signals.
  • The efficacy of eight distinct benchmark transfer learning-based DCNNs is initially evaluated on the given dataset.
  • An examination and assessment of the efficacy of integrating the eight transfer learning-based DCNN variants with ELM is then conducted on the same dataset.
  • The two transfer learning-based DCNN-ELM networks that yield the most optimal outcomes on the dataset are subsequently integrated with the GWO algorithm and assessed on the same dataset.
  • The results showed that the VGG16 and ResNet50V2 models obtained the best recognition accuracies, 95.38% (training time 38.92 seconds) and 96.92% (training time 442.75 seconds), respectively. These values increased to 98.46% (training time 60.97 seconds) and 99.06% (training time 276.4 seconds) after evolving these networks with ELM and GWO.

The structure of the paper is as follows: Section 2 provides a comprehensive overview of the existing literature. Section 3 reviews the background knowledge relevant to the study. Section 4 introduces the proposed hybrid model. Section 5 presents the simulations, outcomes, and discussion. Finally, Section 6 briefly outlines the conclusions.

2. Literature review

PRIM is a commonly utilized technique in radar systems, involving the modulation of data onto a radar signal’s PRI [ 24 ]. It has garnered significant interest in contemporary times because it can deliver elevated data rates and ensure secure communication [ 25 ]. Nevertheless, PRIM signals are susceptible to several external factors, such as interference, noise, and jamming, all of which can potentially impact the overall effectiveness and efficiency of PRIM-based systems [ 26 ].

The available methodologies can be broadly classified into four distinct categories: statistical-based approaches, decision tree-based approaches, histogram-based approaches, and learning-based approaches [ 9 ].

Most of the techniques referenced in [ 27 – 30 ] rely on histogram operations. In these methodologies, establishing an appropriate threshold typically emerges as the foremost pivotal concern, and the quantity of pulses must be sufficiently large to generate a well-defined histogram. Practical considerations also matter: in a previous study [ 30 ], the authors simulated 5,000 pulses emitted by three low-PRF radars, yet such scenarios are infrequently encountered in contemporary EW settings, primarily due to radar antenna scanning and low side-lobe levels.

Furthermore, the detrimental consequences of missing and spurious pulses during the recognition process must be considered, as these phenomena significantly impact most histogram-based algorithms. Owing to their inherent simplicity, these techniques are limited to a select range of PRIM schemes and exhibit notable performance degradation in the presence of noise.

Additional approaches can be found in the relevant scholarly works. In [ 31 ], the authors treat every time of arrival (TOA) as an individual observation and each emitter as a distinct target; a Kalman filter tracks individual emitter pulses and facilitates the prediction of forthcoming pulses. Although the method handles moderate levels of spurious and missing pulses, the simulated scenario is limited in complexity, encompassing only three forms of modulation: simple, jitter, and stagger.

Typically, scholars studying PRIM recognition employ feature-based methodologies. As with other classification problems, the initial phase is feature extraction: anywhere from a few to many characteristics are derived from the raw signal, and those possessing the greatest discriminatory capability are retained. Some features can effectively distinguish a single category from the rest, whereas others discern distinct groups of data. In the methods described in [ 32 , 33 ], a collection of features is extracted through autocorrelation, while Reference [ 34 ] employs five distinct features, three of which exclusively differentiate a particular form of PRIM while the remaining two discriminate among the other types.

The analysis of PRI in [ 35 ] uses the decimated Walsh-Hadamard transform (WHT). The approach is threshold-based and considers only three modulation types (simple, jitter, and stagger), categorized into up to four levels. Further examination of identifying various kinds of PRI, and of the detrimental consequences of missing and spurious pulses, may be found in [ 36 ], which employs a hierarchical approach incorporating wavelet-based and intuitive features. The works cited in [ 37 – 40 ] also utilize intuitive characteristics.

Another significant aspect of this framework is the classification methodology. Decision trees have been employed as the classifier in various investigations, including those referenced in [ 35 , 40 ]. One of the primary limitations of decision-tree-based methodologies is the requirement for manually determined thresholds, a process that is not only time-consuming but also highly susceptible to variations in noise levels and changes in PRI parameters [ 9 ].

The authors of [ 41 ] propose a neural network classifier featuring a solitary hidden layer, developed on a dataset of second differences of TOAs. The authors of [ 37 ] present a feed-forward neural network architecture with an input layer of three distinct characteristics and a solitary hidden layer containing eight neurons; this methodology is limited to categorizing and identifying only four different patterns of PRI change.

It should be acknowledged that the aforementioned learning-based strategies necessitate a comprehensive feature design and extraction procedure before the neural network can be utilized. This limitation hinders their ability to adjust rapidly to variations in the pattern of PRI changes. One notable benefit of intelligent techniques is their ability to identify several fundamental patterns of PRI change using different methodologies; the drawbacks, however, are a substantial data-preprocessing workload and an inability to adjust effectively to environments characterized by a significant prevalence of missing and false pulses.

Deep learning (DL) has recently gained prominence as a formidable tool in various classification endeavors [ 42 – 44 ], and researchers have explored its application to radar signal recognition owing to DL’s inherent capability to extract signal characteristics autonomously, leading to significant achievements in this domain. Various disciplines have been extensively explored, including image processing, speech recognition, and object detection [ 45 ].

The authors of [ 3 ] introduced a consolidated approach, deep learning-based multitask learning (DMTL), to recognize five PRIM types of radar pulses. The simulation findings indicate a modulation recognition accuracy of 73.2%, assessed with equal rates of 30% for spurious and missing pulses on a dataset of 10,000 samples representing five PRI modulation types.

The authors in [ 46 ] proposed an attention-based recognition framework, known as the attention-based recurrent neural network (ARNN), for classifying pulse streams into six types of PRIMs. This framework is designed to handle high proportions of missing and spurious pulses. The simulation results demonstrate that this model achieves a PRIM recognition accuracy of 92.18% with attention and 89.56% without attention. This study involves a dataset of 240,000 data samples, with a false pulse rate of 70% and a missing pulse rate of 50%.

Scholars have recently investigated applying DCNNs in PRIM recognition. CNNs have demonstrated considerable efficacy across various applications owing to their inherent capacity to autonomously acquire and discern characteristics from unprocessed data [ 47 – 49 ].

The authors of [ 50 ] suggested a method based on DCNNs for recognizing seven different patterns of PRIM. The simulation results indicate that the total recognition accuracy is 96.1%, with a maximum of 50% lost pulses and 20% spurious pulses. The dataset has 25,000 samples encompassing all PRIM types.

Reference [ 9 ] introduces a unique technique based on DL. This technique utilizes a DCNN to classify seven distinct patterns of PRI changes. The simulation findings indicate that the overall recognition accuracy is approximately 96%, whereas the rates of missing and spurious pulses randomly range from 25% to 30%. The dataset consists of 3,000 samples for each PRI modulation type.

The work in [ 2 ] introduces a novel approach that utilizes the inherent characteristics of the temporal convolutional network (TCN). The simulation findings demonstrate that this method can accurately categorize seven distinct variations of PRI modulation with an accuracy of over 98%, even when the proportion of missing and false pulses reaches 30%. The results are derived from a sample of 40,000 tests, chosen randomly from a pool of seven distinct modulations, each having an equal likelihood of being selected.

In a subsequent study, the authors [ 51 ] presented a DCNN system for PRIM classification. The simulation results demonstrate that this method can accurately classify eight distinct kinds of PRIM, achieving an overall recognition accuracy of 98.5%. This performance is achieved even when there is a 15% ratio of missing pulses and a 15% ratio of spurious pulses. The dataset consists of 16,000 samples for each PRI modulation type.

The findings of various methods utilizing DL for the recognition and classification of PRIMs are summarized in Table 2 .


https://doi.org/10.1371/journal.pone.0298373.t002

One shortcoming of existing techniques is their reliance on hand-crafted features, which may not fully capture the complex and diverse nature of PRIM patterns. Moreover, these methodologies may demonstrate limited resilience when confronted with noise and interference, potentially undermining the system’s overall effectiveness in practical situations. Hence, this research introduces a novel approach that employs a DCNN as a feature extractor, an ELM for real-time identification of PRIM patterns, and GWO to enhance the network’s robustness. The methodology presented in this study has been devised to tackle the inherent limitations of existing techniques. It aims to improve the accuracy and consistency of PRIM detection, particularly in the presence of noise and interference. In addition, expanding datasets to encompass a broader range of PRIM signals with varying SNRs could enhance the progress and evaluation of PRIM recognition techniques.

The DCNN-based algorithms do not require preprocessing steps such as signal preparation or feature extraction, and their efficacy in PRIM recognition remains evident even when confronted with significant instances of missing and spurious pulses. Nevertheless, one significant detrimental impact has been overlooked in prior analyses, namely the presence of substantial outliers resulting from radar antenna scanning [ 2 ].

All the methodologies above employed simulated data throughout their training and evaluation processes. Training and evaluation on real data pose significant challenges and consume much time at every level, owing to missing pulses and unexpected spurious pulses; consequently, users may experience delays of several hours before receiving feedback on the model selected for the intended recognition task. Moreover, all of these methodologies necessitate an extensive dataset across all training and evaluation phases. The proposed method instead incorporates PRI sound from actual radar systems, a novel approach that has not previously been employed. This PRI sound is utilized throughout all training and evaluation phases, marking a significant advancement in the field.

3. Background knowledge

This section presents a comprehensive overview of the fundamental principles and essential concepts underlying PRI sound and the CNN, ELM, and GWO algorithms.

3.1 Pulse repetition interval sound

The historical technique of PRI analysis involves listening to the pulse train through a loudspeaker or headphones, and it remains relevant and valuable in contemporary times. Pulse stretch circuitry is significant because of the low duty cycle exhibited by radar signals. Furthermore, constant-amplitude pulses may be employed, since wildly fluctuating amplitudes could cause confusion [ 6 ].

One straightforward approach is to monitor an audio oscillator concurrently with the radar pulse sequence. The analyst aligns the tonal characteristics of the generator with those of the pulse train by detecting beats, similar to the process of tuning a musical instrument. Novice analysts may mistakenly set the audio oscillator to a harmonic or subharmonic of the PRI; however, this error is infrequent once sufficient experience is gained. The analyst gradually increases the volume until the beat note becomes audible. The beat note frequency is equivalent to the difference between the audio oscillator frequency and the PRF, which is the reciprocal of the PRI. The analyst adjusts the audio oscillator until the beat note frequency reaches zero, resulting in the disappearance of the beat. Under optimal conditions, the margin of error is around ±20 Hz, as this value represents the minimum threshold of human auditory perception. Scanning can introduce additional errors that make perceiving the beat note more challenging [ 6 ].

Contemporary ELINT devices are engineered to generate auditory signals, even when the PRF exceeds the threshold of human auditory perception. The process involves the nonlinear mapping of the authentic PRF to generate a synthetic PRF sound. For instance, frequency ranges up to 1 kHz are faithfully replicated without alteration. According to [ 6 ], it is possible to map PRFs ranging from 1 to 200 kHz onto a narrower range of 1 to 20 kHz.

3.2 Convolutional neural networks

CNNs are deep neural networks that excel in image identification and classification. They are specifically engineered to acquire spatial hierarchies of features autonomously and adaptively by utilizing the backpropagation algorithm. CNNs commonly comprise the following layers [ 42 ]:

  • The input layer is designed to receive the image input.
  • The convolutional layer performs a convolution operation on the input and then passes the resulting output to the subsequent layer. This technique facilitates the network’s ability to concentrate on specific local locations and acquire diverse properties.
  • The activation layer is typically implemented after each convolutional layer. The model incorporates a non-linear activation function, such as a rectified linear unit (ReLU), which enables it to acquire knowledge from the error and adapt accordingly.
  • The pooling layer is positioned after the activation layer and conducts a down-sampling operation across the spatial dimensions. This process reduces computing complexity by decreasing input dimensionality. Max and Average Pooling are popular pooling layers in neural networks.
  • The fully connected layer consists of neurons that establish connections with all activations in the preceding layer, similar to conventional neural networks. At a higher level of abstraction, it can be conceptualized as a classifier.
  • The output layer generates the ultimate output of the neural network.

Training a CNN entails utilizing labeled training data, with the weights and biases within the network adjusted iteratively [ 52 ] by backpropagating errors from the output layer to the input layer [ 52 ]. Optimization algorithms such as gradient descent are employed to optimize the network’s performance. CNNs are instrumental in various applications, including image and video recognition, image analysis, autonomous vehicles, healthcare (e.g., medical image analysis), and natural language processing (when combined with other types of architectures).

Some of the renowned CNN architectures include LeNet-5 [ 53 ], AlexNet [ 54 ], ZFNet [ 55 ], GoogLeNet [ 56 ], VGGNet [ 57 ], and ResNet, each with its unique characteristics and enhancements over its predecessors.

3.2.1 MobileNetV2.

MobileNetV2 represents a notable advancement over its predecessor, MobileNetV1, tailored explicitly for utilization in mobile and edge computing devices. Inverted residuals and linear bottlenecks are employed to enhance the propagation of information and gradients within the network. The architecture is lightweight and efficient, rendering it appropriate for real-time applications on devices with limited resources.

3.2.2 Xception.

Xception can be regarded as an expansion of the Inception architecture. Using depthwise separable convolutions instead of conventional Inception modules enables the network to learn cross-channel and spatial correlations independently. This results in enhanced efficiency and performance [ 58 ].

3.2.3 EfficientNetB0.

EfficientNetB0 is the base model of the EfficientNet family, focusing on balancing accuracy and computational efficiency. A compound scaling method is used to optimize performance across different scales by uniformly scaling the network’s depth, width, and resolution [ 59 ].

3.2.4 EfficientNetV2B2.

EfficientNetV2 represents an enhanced iteration of the existing EfficientNet models, aiming to improve the precision and effectiveness of the optimization process. It employs a combination of compound scaling, model fusion, and progressive learning techniques to enhance performance while minimizing computational resources. The designation "B2" denotes a distinct variant or arrangement of the EfficientNetV2 concept [ 60 ].

3.2.5 VGG16.

The VGG16 model is a DCNN architecture created by K. Simonyan and A. Zisserman of the University of Oxford. Its design is notable for its simplicity: it consists of multiple 3x3 convolutional layers stacked on top of each other, with the depth of the layers increasing further into the network, followed by fully connected layers [ 61 , 62 ].

3.2.6 ResNet50V2.

ResNet50V2 is an enhanced iteration of the initial ResNet50 architecture. Skip (shortcut) connections bypass specific layers within the neural network architecture, which helps address the vanishing gradient problem and facilitates the successful training of deep networks. The "V2" designation signifies specific alterations and enhancements made to the initial ResNet framework [ 63 , 64 ].

3.2.7 MobileNetV3Small.

MobileNetV3Small is another member of the MobileNet family, designed explicitly for resource-constrained environments. It incorporates advancements in architecture search and hardware-aware training, emphasizing efficiency and performance on mobile devices [ 65 ].

3.2.8 DenseNet121.

DenseNet121 belongs to the family of Densely Connected Convolutional Networks. The connectivity pattern of this network architecture establishes a feed-forward linkage between each layer, facilitating optimal transmission of information across all layers. The dense connection network of this approach allows for fewer parameters while achieving a high level of precision [ 66 , 67 ].

3.3 Extreme learning machine

The ELM is a popular learning algorithm for single-hidden-layer neural networks (SLNNs). Its various versions are commonly employed in sequential, batch, and incremental learning because of their rapid and efficient learning speed, suitable generalization capability, fast convergence rate, and straightforward implementation [ 11 ]. In contrast to conventional learning algorithms, the fundamental objective of the ELM is to enhance generalization performance by minimizing both the norm of the output weights and the training error. According to Bartlett’s theory on feed-forward neural networks [ 21 ], networks with smaller weight norms are likely to exhibit improved generalization performance.

The ELM initially assigns random weights and biases to the input layer and then computes the output-layer weights based on these randomly generated values. The algorithm exhibits a higher rate of learning and superior performance in comparison to conventional neural network algorithms [ 17 , 21 ]. Fig 2 shows a typical single-hidden-layer neural network (SLNN); in the diagram, "n" refers to the number of neurons in the input layer, "L" represents the number of neurons in the hidden layer, and "m" stands for the number of neurons in the output layer.


https://doi.org/10.1371/journal.pone.0298373.g002

Given $N$ training samples $(\mathbf{x}_i, \mathbf{t}_i)$ and an SLNN with $L$ hidden neurons and activation function $g(\cdot)$:

$$\sum_{j=1}^{L}\boldsymbol{\beta}_{j}\,g\!\left(\mathbf{w}_{j}\cdot\mathbf{x}_{i}+b_{j}\right)=\mathbf{t}_{i},\quad i=1,\dots,N \;\Longleftrightarrow\; \mathbf{H}\boldsymbol{\beta}=\mathbf{T},\qquad \boldsymbol{\beta}=\mathbf{H}^{+}\mathbf{T}$$

The modified Moore-Penrose inverse of the matrix H is denoted as H + .

The performance of ELM is influenced by the number of hidden-layer neurons and the number of training epochs. To find the most effective number of hidden neurons, an experiment was conducted in which the number of hidden neurons was varied while the number of training epochs was held constant at 30, and performance was evaluated with the Root Mean Square Error (RMSE). The final structure of the suggested model consisted of 1048 input neurons, 128 hidden neurons, and six output neurons, the last determined by the number of classes.
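To make the training rule concrete, below is a minimal NumPy sketch of the ELM just described: random input weights and biases, a hidden-layer output matrix H, and output weights solved as β = H⁺T. The tanh activation, the random seed, and the default of 128 hidden neurons are illustrative assumptions consistent with the text, not the paper's exact configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def elm_train(X, T, n_hidden=128):
    """Train an ELM: random input weights/biases, pseudo-inverse output weights.

    X: (samples, n_features) input matrix (e.g., 1048 features per the text)
    T: (samples, n_classes) one-hot targets (e.g., six PRIM classes)
    """
    W = rng.standard_normal((X.shape[1], n_hidden))  # random input weights
    b = rng.standard_normal(n_hidden)                # random hidden biases
    H = np.tanh(X @ W + b)                           # hidden-layer output matrix H
    beta = np.linalg.pinv(H) @ T                     # beta = H^+ T (Moore-Penrose)
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta                 # class scores; argmax for labels
```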

Nevertheless, the instability of the canonical ELM in real-world engineering problems can be attributed to the random values assigned to input weights and biases. Additionally, it has been suggested that the ELM may necessitate a larger quantity of hidden neurons because of this stochastic determination of the input weights and hidden biases [ 70 , 71 ]. Hence, optimization methods can be utilized to adjust the input weights and biases and stabilize the results. In the subsequent section, GWO is suggested for this purpose.

3.4 Gray Wolf Optimization

GWO draws inspiration from the hierarchical structure and hunting patterns observed in gray wolf communities. The system utilizes mathematical modeling to simulate the various processes of optimizing gray wolf populations, including tracking, surrounding, hunting, and attacking. The hunting procedure of the gray wolf has three distinct stages: social hierarchy stratification, encircling the prey, and attacking the prey [ 21 , 72 ]. Fig 3 shows a diagram of the GWO algorithm.


https://doi.org/10.1371/journal.pone.0298373.g003

3.4.1 Social hierarchy.

Gray wolves are highly gregarious animals occupying the apex of the food chain and adhering to a rigid social dominance structure. The optimal solution is denoted α, while progressively poorer solutions are denoted β (second-best), δ (third-best), and ω (all remaining solutions) [ 21 , 72 ].

3.4.2 Encircling the prey.

$$\mathbf{D}=\left|\mathbf{C}\cdot\mathbf{X}_{p}(t)-\mathbf{X}(t)\right|,\qquad \mathbf{X}(t+1)=\mathbf{X}_{p}(t)-\mathbf{A}\cdot\mathbf{D}$$

$$\mathbf{A}=2a\,\mathbf{r}_{1}-a,\qquad \mathbf{C}=2\,\mathbf{r}_{2},\qquad a=2\left(1-\frac{t}{Max\_iter}\right)$$

Here $\mathbf{X}$ represents the position vector of a gray wolf, $\mathbf{X}_p$ the position vector of the prey, $t$ the current iteration, $\mathbf{A}$ and $\mathbf{C}$ coefficient vectors, $\mathbf{r}_1$ and $\mathbf{r}_2$ random vectors in $[0,1]^n$, $a$ the distance-control parameter, which decreases linearly from 2 to 0 over the iterations, and $Max\_iter$ the maximum number of iterations [ 21 , 72 ].

3.4.3 Attacking the prey.

$$\mathbf{X}_{1}=\mathbf{X}_{\alpha}-\mathbf{A}_{1}\cdot\mathbf{D}_{\alpha},\qquad \mathbf{X}_{2}=\mathbf{X}_{\beta}-\mathbf{A}_{2}\cdot\mathbf{D}_{\beta},\qquad \mathbf{X}_{3}=\mathbf{X}_{\delta}-\mathbf{A}_{3}\cdot\mathbf{D}_{\delta},\qquad \mathbf{X}(t+1)=\frac{\mathbf{X}_{1}+\mathbf{X}_{2}+\mathbf{X}_{3}}{3}$$

In these equations, $\mathbf{X}_\alpha$, $\mathbf{X}_\beta$, and $\mathbf{X}_\delta$ represent the position vectors of the α, β, and δ wolves, respectively. $\mathbf{A}_1$, $\mathbf{A}_2$, and $\mathbf{A}_3$ are computed analogously to $\mathbf{A}$, and $\mathbf{C}_1$, $\mathbf{C}_2$, and $\mathbf{C}_3$ analogously to $\mathbf{C}$. The distances between a candidate wolf and the top three wolves are $\mathbf{D}_\alpha = |\mathbf{C}_1\cdot\mathbf{X}_\alpha - \mathbf{X}|$, $\mathbf{D}_\beta = |\mathbf{C}_2\cdot\mathbf{X}_\beta - \mathbf{X}|$, and $\mathbf{D}_\delta = |\mathbf{C}_3\cdot\mathbf{X}_\delta - \mathbf{X}|$. As depicted in Fig 4, the candidate solution ultimately resides within the random circle delineated by α, β, and δ. Under the guidance of the three most proficient wolves, the remaining candidates randomly adjust their positions in the vicinity of the prey, searching for its position and converging to attack it. A compact sketch of this update appears after Fig 4's link below.


https://doi.org/10.1371/journal.pone.0298373.g004
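For concreteness, the following is a compact NumPy sketch of one GWO iteration implementing the encircling and attacking equations above. The fitness function, the wolf positions, and the assumption of a minimization problem are placeholders, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def gwo_step(wolves, fitness, t, max_iter):
    """One GWO iteration: rank wolves, then move each toward alpha/beta/delta."""
    a = 2 * (1 - t / max_iter)                        # linearly decreasing parameter
    order = np.argsort([fitness(w) for w in wolves])  # minimization assumed
    alpha, beta, delta = (wolves[i] for i in order[:3])
    new_positions = []
    for X in wolves:
        candidates = []
        for leader in (alpha, beta, delta):
            r1, r2 = rng.random(X.shape), rng.random(X.shape)
            A, C = 2 * a * r1 - a, 2 * r2
            D = np.abs(C * leader - X)                # distance to this leader
            candidates.append(leader - A * D)         # X_1, X_2, X_3
        new_positions.append(np.mean(candidates, axis=0))  # average of the three
    return np.array(new_positions)
```

In the proposed method, the position vector of each wolf would encode candidate ELM input weights and biases, with the fitness measuring classification error.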

4. The suggested procedure

The suggested procedure, called DCNN_ELM_GWO, integrates a hybrid DCNN model with ELM and GWO techniques in a novel three-step process for identifying six prevalent PRIM forms. Pure DL approaches are limited here by the substantial time required for training and fine-tuning model parameters. The methodology therefore proceeds in three steps. First, a DCNN is trained as a feature extractor. Second, an ELM is employed for real-time pattern identification; however, because the ELM assigns input weights and biases randomly, the network’s stability and dependability suffer, as performance relies heavily on this initial assignment. Consequently, this study proposes utilizing the GWO algorithm to enhance outcomes and bolster network reliability, all while preserving real-time capability.

Transfer learning has been employed to train the targeted neural networks. This approach uses pre-trained weights from DCNNs trained on the ImageNet dataset, which includes a wide variety of classes. Only the fully connected layers at the network’s end are trained, while the remaining layers retain their pre-trained weights. The replacement fully connected layers are uniform across all networks: a fully connected layer of 1024 neurons with ReLU activation, a fully connected layer of 128 neurons with ReLU activation, and a fully connected layer of six neurons, matching the output classes, with softmax activation. A sketch of this head follows.
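As a hedged illustration of this transfer-learning setup, the Keras sketch below freezes an ImageNet-pretrained VGG16 backbone and attaches the 1024-ReLU / 128-ReLU / 6-softmax head described above. The 224x224x3 input shape, the optimizer, and the loss are our assumptions; the paper specifies only the head layout and the initial learning rate.

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

# Frozen pre-trained backbone; input shape is an assumption for illustration.
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # only the new fully connected head is trained

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(1024, activation="relu"),
    layers.Dense(128, activation="relu"),
    layers.Dense(6, activation="softmax"),  # six PRIM classes
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```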

4.1. Investigation of the empirical dataset

This study has generated a unique dataset of PRI radar signals to evaluate the suggested methodology, marking the first instance of such an endeavor. The study was carried out at Imam Khomeini Marine University in Nowshahr over the period from September to December 2020. To fulfill the intended objective, the necessary system was meticulously devised and deployed within an area characterized by a substantial concentration of radar signals, where it remained operational for eight months. To achieve the desired functions and fulfill the specified criteria, electronic support systems are typically structured into several key components: radio antennas and receivers; hardware, control, and power supply units; processing units, including processors, software, and processor units; and user consoles.

The passive approach receives, detects, processes, and analyzes radar signals within the 2–18 GHz frequency range. Based on the specified objectives and needs, the system comprises two primary components: the external component, which encompasses the antennas and radio receivers, and the internal component, which includes the processor sets and hardware units. Connectivity between the two components is established through cable interfaces. The antenna arrangement is designed by considering each antenna’s radiation pattern and coverage and by determining the appropriate number of antennas needed to form an array covering 360 degrees. The system processes the output signals from the receivers in real time, depending on the particular type of receiver.

Fig 4 illustrates the overall operational process of the designed system, depicting the sequence of operations and interactions within it.

Fig 5 illustrates the overarching block structure representing the several processing stages conducted by the system. During the hardware and software processing stages, the system monitor presents the parameters of the extracted targets within the domain of the processors.


https://doi.org/10.1371/journal.pone.0298373.g005

In the software processing component of the system, the first phase entails classifying, filtering, and splitting the information, after which the segregation (de-interleaving) of pulses is investigated. Upon successful completion of this stage, the supplementary parameters are obtained, and target identification proceeds by measuring numerous properties.

ELINT or ESM systems receive radar signals and subsequently analyze the characteristics of each detected pulse.

This work introduces the utilization of PRI sound for identifying the modulation type. To achieve this objective, the PRI sequence obtained from a width hold signal (WHS) module is compressed so that the amplitude of the sequence remains consistent. The compressed sequence is then fed into the sound card, and the resulting sound is recorded through the speaker output.

The initial audio data exhibits a significant amount of noise. The technique employed in the study referenced as [ 11 ] has been utilized to mitigate unwanted disturbances in the initial audio dataset.

Given a waveform containing both signal and background noise (Sn), as well as a sample audio clip derived from the same or a comparable waveform but consisting solely of background noise (N), the algorithm is outlined as follows [ 73 ] (a code sketch follows the list):

  • Compute the short-time Fourier transform (STFT) of N (spec_N).
  • Compute the mean and standard deviation of spec_N over time for each frequency component.
  • Compute the STFT of Sn (spec_Sn).
  • Use the mean and standard deviation of spec_N to determine a threshold noise level for each frequency component.
  • Build a mask over spec_Sn by comparing its strength with the thresholds derived from spec_N.
  • Smooth the mask over both the frequency and time domains.
  • Apply the mask to spec_Sn to eliminate the noise.
  • Compute the inverse STFT of the masked spectrogram to obtain a de-noised time-domain signal.
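The sketch below implements the listed steps with SciPy's STFT utilities. The FFT size, the 1.5-standard-deviation threshold, and the smoothing window are illustrative assumptions rather than the paper's parameters.

```python
import numpy as np
from scipy.ndimage import uniform_filter
from scipy.signal import stft, istft

def spectral_gate(noisy, noise, fs, n_fft=2048, n_std=1.5):
    """De-noise `noisy` using statistics of the noise-only clip `noise`."""
    _, _, spec_n = stft(noise, fs, nperseg=n_fft)    # STFT of the noise clip
    _, _, spec_sn = stft(noisy, fs, nperseg=n_fft)   # STFT of the noisy signal
    mag_n = np.abs(spec_n)
    # Per-frequency threshold from the mean and std of the noise spectrum
    thresh = mag_n.mean(axis=1) + n_std * mag_n.std(axis=1)
    # Keep bins of the noisy spectrogram that rise above the noise floor
    mask = (np.abs(spec_sn) > thresh[:, None]).astype(float)
    mask = uniform_filter(mask, size=(3, 5))         # smooth over freq/time
    _, clean = istft(spec_sn * mask, fs, nperseg=n_fft)
    return clean
```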

Fig 6 illustrates the initial sound data and the sound data after the application of a de-noising technique for noise removal.


Comparison of sound data: (a) the original sound data with inherent noise; (b) the sound data after the noise removal process.

https://doi.org/10.1371/journal.pone.0298373.g006

The initial acoustic data exhibited temporal variations. After the extraneous elements were eliminated from the original dataset, a thorough analysis was conducted on the noise-free data, organizing it into distinct segments of varying lengths based on the repetition duration of the patterns observed within each audio data class. Each recording was partitioned into four non-overlapping parts allocated to their respective categories, yielding 108 audio data samples across the six distinct classes. Fig 7 presents a block diagram of the dataset preparation, illustrating the sequential steps and components involved in organizing and refining the data.


https://doi.org/10.1371/journal.pone.0298373.g007

Spectrogram images of the audio data have been extracted to facilitate the training of the planned networks, and these images are then employed to train the neural networks (a sketch of this extraction follows Fig 8's link below). Fig 8 presents spectrogram images representing the variability and characteristics of sample data within each designated class, allowing visual interpretation of the dataset’s frequency- and time-domain features.


https://doi.org/10.1371/journal.pone.0298373.g008
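A minimal sketch of this spectrogram-image extraction is shown below, assuming WAV-format segments. The file names, FFT length, and the roughly 224x224-pixel output size are our assumptions for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import wavfile
from scipy.signal import spectrogram

fs, audio = wavfile.read("pri_segment.wav")          # hypothetical audio segment
f, t, Sxx = spectrogram(audio, fs, nperseg=1024)     # time-frequency decomposition
plt.figure(figsize=(2.24, 2.24), dpi=100)            # ~224x224 px image (assumed)
plt.pcolormesh(t, f, 10 * np.log10(Sxx + 1e-12), shading="gouraud")
plt.axis("off")                                      # image only, no axes
plt.savefig("pri_segment_spec.png", bbox_inches="tight", pad_inches=0)
plt.close()
```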

The dataset consists of a total of 108 data points, encompassing six distinct types of modulation, split into three subsets: 70% for training, 15% for validation, and 15% for testing. Data augmentation techniques have been employed to enlarge the training set; this strategy quadrupled the training data. Two augmentation methods were used, introducing noise and jittering in the temporal domain (sketched after Table 3's link below). Table 3 provides comprehensive information about the dataset used in this study, including specifications and characteristics relevant to the research.


https://doi.org/10.1371/journal.pone.0298373.t003
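Below is a hedged sketch of the two augmentation methods named above (additive noise and temporal jitter). The SNR level and the maximum shift are illustrative values, as the paper does not state its magnitudes.

```python
import numpy as np

rng = np.random.default_rng(2)

def add_noise(x, snr_db=20):
    """Additive white Gaussian noise at a chosen SNR (value is illustrative)."""
    p_signal = np.mean(x ** 2)
    p_noise = p_signal / (10 ** (snr_db / 10))
    return x + rng.standard_normal(len(x)) * np.sqrt(p_noise)

def time_jitter(x, fs, max_shift_s=0.02):
    """Random circular shift of up to max_shift_s seconds (value is illustrative)."""
    max_shift = int(max_shift_s * fs)
    return np.roll(x, rng.integers(-max_shift, max_shift + 1))
```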

4.2. Evaluation measurements

$$\text{Accuracy}=\frac{TP+TN}{TP+TN+FP+FN},\qquad \text{Precision}=\frac{TP}{TP+FP},\qquad \text{Sensitivity (Recall)}=\frac{TP}{TP+FN}$$

$$\text{Specificity}=\frac{TN}{TN+FP},\qquad F_{1}=\frac{2\,TP}{2\,TP+FP+FN},\qquad MCC=\frac{TP\cdot TN-FP\cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}$$

TN refers to the count of true-negative instances, TP the number of true-positive cases, FP the number of false-positive cases, and FN the number of false-negative cases.
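For reference, these metrics can be computed from predictions as in the following scikit-learn sketch (our choice of library, not stated in the paper); y_true and y_pred are hypothetical label arrays, and specificity is derived from the multiclass confusion matrix.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, matthews_corrcoef, confusion_matrix)

# y_true and y_pred are hypothetical arrays of true and predicted class labels.
acc = accuracy_score(y_true, y_pred)
prec = precision_score(y_true, y_pred, average="macro")
sens = recall_score(y_true, y_pred, average="macro")   # sensitivity (recall)
f1 = f1_score(y_true, y_pred, average="macro")
mcc = matthews_corrcoef(y_true, y_pred)

# Macro-averaged specificity from the multiclass confusion matrix:
cm = confusion_matrix(y_true, y_pred)
tn = cm.sum() - cm.sum(axis=0) - cm.sum(axis=1) + np.diag(cm)
fp = cm.sum(axis=0) - np.diag(cm)
spec = np.mean(tn / (tn + fp))
```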

The efficacy of the suggested methodology is assessed through three distinct investigations, which are outlined as follows:

  • Firstly, the performance of eight types of deep convolutional neural networks (DCNNs) is assessed and compared on the dataset.
  • Secondly, the performance of integrating the DCNNs with ELM is investigated on the dataset.
  • Thirdly, the DCNN-ELMs that achieve the most favorable outcomes are optimized using the grey wolf optimization (GWO) algorithm, and their performance is evaluated on the dataset.

The network with the highest accuracy and speed is then identified.

The suggested DCNNs were trained using Google Colab’s shared hardware with a T4 graphics card. The DCNN_ELM and DCNN_ELM_GWO networks were trained on Google Colab’s shared CPU hardware owing to the absence of shared GPU memory. The required models were created using the Python programming language with the TensorFlow and Keras libraries. The total number of epochs for all networks is set at 30, and the batch size for all networks is standardized to 16. An initial learning rate of 0.001 is chosen; if the validation accuracy does not improve over five epochs, the learning rate is halved, down to a minimum of 0.00001. Pre-trained weights for transfer learning are loaded using the methodology provided by the TensorFlow library.
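One plausible Keras realization of this schedule is shown below, mapping the halving rule onto the standard ReduceLROnPlateau callback. Monitoring val_accuracy and the train_ds/val_ds datasets (assumed pre-batched at 16) are our assumptions, not details stated in the paper.

```python
from tensorflow.keras.callbacks import ReduceLROnPlateau

# Halve the learning rate if validation accuracy stalls for 5 epochs,
# down to a floor of 1e-5; train_ds / val_ds are hypothetical tf.data sets.
lr_schedule = ReduceLROnPlateau(monitor="val_accuracy", factor=0.5,
                                patience=5, min_lr=1e-5)
history = model.fit(train_ds, validation_data=val_ds,
                    epochs=30, callbacks=[lr_schedule])
```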

Fig 9 shows the training diagram for the DCNN approaches, illustrating the various steps involved in the training process and the implemented methodologies.


https://doi.org/10.1371/journal.pone.0298373.g009

Fig 10 presents the confusion matrix for each implemented neural network, illustrating the classification performance and accuracy of the models.


https://doi.org/10.1371/journal.pone.0298373.g010

Fig 11 illustrates the precision-recall and receiver operating characteristic (ROC) curves for approaches employing DCNNs.


https://doi.org/10.1371/journal.pone.0298373.g011

Table 4 compares the classification results of various DCNN approaches, emphasizing performance and accuracy differences.


https://doi.org/10.1371/journal.pone.0298373.t004

Table 5 outlines the complexity analysis of the DCNN methods, detailing the computational cost and resources required by each approach.


https://doi.org/10.1371/journal.pone.0298373.t005

Fig 12 provides a comparative visualization of the computational outcomes, showcasing the efficiency and effectiveness of the proposed DCNNs in the study.


https://doi.org/10.1371/journal.pone.0298373.g012

Fig 13 illustrates the comparative analysis of the suggested methods’ average measurement criteria, focusing specifically on the average rank, to provide insights into their performance and reliability.


https://doi.org/10.1371/journal.pone.0298373.g013

Fig 14 visualizes the time required to train the proposed DCNNs, providing insights into their computational efficiency.


https://doi.org/10.1371/journal.pone.0298373.g014

From the insights garnered from Tables 4 and 5 and Figs 10 – 14 , ResNet50V2 stands out as the best-performing model, registering the highest scores in all specified metrics and underscoring its aptitude for classification tasks. Conversely, EfficientNetB0 trails significantly in every assessed metric, suggesting it is relatively inefficient at these classification tasks compared with its peers. Both VGG16 and Xception exhibit exceptional, well-balanced performance, making them reliable across many classification scenarios. Interestingly, VGG16 and DenseNet121 feature lower FLOPS, fewer network parameters, and reduced training times, indicating they are more economical in computational demands and enable faster inference. ResNet50V2 and Xception, despite being high achievers, incur higher computational overheads due to increased FLOPS and network parameters, potentially necessitating substantial resources and lengthening inference times. Notably, EfficientNetB0, despite its suboptimal performance, presents complexity metrics comparable to MobileNetV2, underscoring the importance of balancing efficiency and performance. The data highlight a prominent trade-off between performance and complexity: models like ResNet50V2, albeit high-performing, carry higher computational demands that may constrain their applicability in resource-limited environments, whereas models such as VGG16 strike a balance, delivering notable performance with lower computational requisites and adapting to broader applications.

The performance outcomes of the proposed DCNNs were obtained using Google Colab’s shared hardware with a T4 graphics card and are presented in Tables 4 and 5 , as well as Figs 10 – 14 . The performance results of the networks shown in Tables 6 – 8 and Figs 15 – 18 were acquired using Google Colab’s shared CPU resources. For this reason, the values of parameters shared across these tables vary.


https://doi.org/10.1371/journal.pone.0298373.g015


https://doi.org/10.1371/journal.pone.0298373.g016


https://doi.org/10.1371/journal.pone.0298373.g017


https://doi.org/10.1371/journal.pone.0298373.g018


https://doi.org/10.1371/journal.pone.0298373.t006


https://doi.org/10.1371/journal.pone.0298373.t007


https://doi.org/10.1371/journal.pone.0298373.t008

Table 6 presents a comparative analysis of classification results obtained from the two proposed approaches, VGG16-ELM and ResNet50V2-ELM. Results are the mean, standard deviation (STD), and root mean square error (RMSE) percentages, computed over 25 independent runs. The experiments utilized ELM hidden layers configured with 4000 and 1000 nodes.

The mean, RMSE, and STD were calculated from the 25 runs of each experiment and used to evaluate the performance of the proposed DCNN-ELM technique in PRIM recognition; these three measures are widely recognized as the most prevalent statistical evaluation criteria [ 78 – 80 ]. The mean quantifies how close the classifier’s overall performance across multiple runs lies to the ideal value, the RMSE quantifies how tightly the results of the various runs concentrate around the ideal value, and the STD quantifies how far the outcomes of individual trials deviate from the average [ 78 – 80 ].

In the present study, a mean value close to 100.00% indicates that the classifier performed well across the various runs; similarly, low RMSE and STD values indicate that the classifier consistently produced results at or near 100.00%. The statistical findings for all experiments of the proposed DCNN-ELM techniques are presented in Table 6 . Eqs 21 – 23 are utilized for computing μ, RMSE, and STD [ 78 , 80 ].

$$\mu=\frac{1}{N}\sum_{i=1}^{N}X_i \tag{21}$$

$$RMSE=\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(X_i-O\right)^{2}} \tag{22}$$

$$STD=\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(X_i-\mu\right)^{2}} \tag{23}$$

The symbol $\mu$ denotes the population mean, $X_i$ each value in the population, $N$ the total number of values, and $O$ the observed or optimal value, which is 100.00%.

Based on the findings in Table 6 , the mean values of all measurements are close to 100.00%, indicating that the DCNN_ELM algorithms consistently produced high levels of accuracy, precision, recall, F1-score, sensitivity, MCC, and specificity in the majority of the 25 runs. The low RMSE and STD values demonstrate the effectiveness of the DCNN-ELM method in achieving high classification performance across the 25 runs.

Table 6 shows that the VGG16_ELM-4000 model demonstrates exceptional performance, achieving the best mean sensitivity (97.76%), specificity (98.82%), precision (96.93%), F1 score (97.97%), accuracy (97.84%), and MCC (97.62%). These findings indicate a robust equilibrium across all assessed metrics, rendering the model highly dependable for classification tasks. VGG16_ELM-1000 exhibits inferior performance in comparison with the 4000-feature set, as evidenced by lower mean values of sensitivity (91.51%), specificity (96.96%), precision (92.84%), F1 score (91.62%), accuracy (91.69%), and MCC (91.32%); reducing the size of the feature set impacts performance, though the model retains a commendable classification capability. ResNet50V2_ELM_4000 performs somewhat worse than VGG16_ELM-4000 but surpasses the ResNet variant with fewer features, attaining a mean sensitivity of 97.13%, specificity of 98.65%, precision of 97.68%, F1 score of 97.30%, accuracy of 96.92%, and MCC of 96.69%. The ResNet50V2_ELM_1000 model exhibits the lowest mean values of the four models while still demonstrating strong performance in terms of sensitivity (94.88%), specificity (97.87%), precision (95.32%), F1 score (94.73%), accuracy (94.15%), and MCC (93.98%).

The standard deviation numbers reflect the extent of variation in model performance:

The VGG16_ELM-4000 model exhibits minimal variability, as seen in its low standard deviation (STD) values, suggesting that its performance remains stable across multiple runs. VGG16_ELM-1000 and ResNet50V2_ELM_1000 demonstrate larger standard deviation values than their 4000-feature equivalents, suggesting a lower level of performance consistency; this could be attributed to the decreased intricacy of the feature space or the models’ susceptibility to subtle variations in the dataset. Overall, when the number of features is reduced from 4000 to 1000, both the VGG16 and ResNet50V2 models exhibit an increase in performance variability, as indicated by larger STD values.

The VGG16_ELM-4000 consistently achieves low RMSE values, which confirms its dependability and precision in classification tasks. In contrast, VGG16_ELM-1000 exhibits the highest RMSE values, indicating larger inaccuracies in the performance measurements. Both ResNet50V2_ELM_4000 and ResNet50V2_ELM_1000 exhibit moderate RMSE values; the latter has larger values, suggesting a greater average error.

To summarize, the models with the more extensive feature set of 4000 show better average performance with reduced variability and error rates, making them more resilient and dependable. The models with the reduced feature set of 1000 provide satisfactory average performance but with greater variability and error, suggesting they may be more susceptible to the dataset or require more precise adjustments to reach ideal performance. Overall, the VGG16_ELM-4000 model demonstrates exceptional stability and accuracy across all criteria.

Nevertheless, when the available data are divided into three sets, the number of samples available for training the model is considerably diminished, and the results can occasionally be influenced by the random selection of the (train, validation) sets. Cross-validation (CV) addresses this issue by using the test set only for the final evaluation, without needing a separate validation set. Practitioners widely employ the K-fold cross-validation (KCV) technique to select models and estimate classifier errors: KCV involves dividing a dataset into k subsets, some of which are used for model training while the remaining subsets are used for performance evaluation [ 81 , 82 ].

Given the unbalanced and limited dataset used in this study, the 5-fold cross-validation technique is employed to evaluate the final proposed solutions (a sketch of this protocol is shown below). The results are presented in Table 7 and Fig 15 .
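A sketch of this 5-fold protocol follows, using scikit-learn's StratifiedKFold (stratification is our assumption given the class imbalance noted above). X, y, and build_model() are hypothetical stand-ins for the spectrogram features, labels, and model factory.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# X: spectrogram feature array, y: class labels, build_model(): hypothetical
# factory returning a freshly initialized classifier for each fold.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_acc = []
for train_idx, test_idx in skf.split(X, y):
    model = build_model()
    model.fit(X[train_idx], y[train_idx], epochs=30, batch_size=16, verbose=0)
    _, acc = model.evaluate(X[test_idx], y[test_idx], verbose=0)
    fold_acc.append(acc)
print(f"5-fold accuracy: {np.mean(fold_acc):.4f} ± {np.std(fold_acc):.4f}")
```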

Fig 16 illustrates the comparative performance of several approaches, namely VGG16, VGG16_ELM, VGG16_ELM_GWO, ResNet50V2, ResNet50V2_ELM, and ResNet50V2_ELM_GWO, evaluating their effectiveness in achieving the desired outcomes.

Fig 17 presents a comparative analysis of the average rank, a specific measurement criterion, across six distinct models: VGG16, VGG16_ELM, VGG16_ELM_GWO, ResNet50V2, ResNet50V2_ELM, and ResNet50V2_ELM_GWO, providing insights into their respective performances.

Table 7 and Figs 15 – 17 examine and contrast the efficacy of the traditional deep networks VGG16 and ResNet50V2, along with their combined variants utilizing the ELM and GWO methods. A 5-fold CV methodology was employed for evaluation, yielding the following outcomes:

  • The training time for the VGG16 and ResNet50V2 models is considerably longer, averaging 39.0279 and 153.2663 seconds, respectively. Conversely, incorporating ELM layers significantly decreases training duration: VGG16_ELM_1000 and ResNet50v2_ELM_1000, using the smaller feature set of 1000, exhibit mean training times of 0.3538 and 1.3730 seconds, respectively, more than a hundred times faster than their non-ELM counterparts. Increasing the feature set to 4000 in VGG16_ELM_4000 and ResNet50v2_ELM_4000 leads to a slight increase in time, with average durations of 1.3790 and 5.5860 seconds, respectively. The inclusion of GWO optimization significantly lengthens training, with VGG16_ELM_1000_GWO and ResNet50v2_ELM_1000_GWO rising to 18.1578 and 69.3511 seconds, respectively, indicating that the optimization step contributes to the computational cost.
  • The standard VGG16 model achieves an average accuracy of 97.5384%, surpassed by VGG16_ELM_4000 and VGG16_ELM_1000_GWO, which have average accuracies of 97.8461% and 98.8059%, respectively. The findings indicate that utilizing ELM layers and GWO optimization can improve the performance of VGG16. ResNet50V2 and its variations demonstrate marginally lower average accuracies, with Resnet50v2_ELM_1000_GWO obtaining an average of 97.5845%.
  • Precision indicates the proportion of correct positive predictions among all positive predictions made. The VGG16_ELM_1000_GWO model demonstrated exceptional performance with a mean precision of 98.9393%, signifying its dependability in correctly labeling the cases it predicts as positive. The precision of Resnet50v2_ELM_1000_GWO closely follows with a mean of 98.1518%, indicating that GWO optimization improves precision for both architectures.
  • The VGG16_ELM_1000_GWO model exhibits exceptional sensitivity, with an average value of 98.6885%. The Resnet50v2_ELM_1000_GWO model shows a notable sensitivity, with an average of 99.1263%, indicating that the GWO optimization enhances the models’ capacity to identify positive cases in many scenarios.
  • The two GWO-optimized models, VGG16_ELM_1000_GWO and ResNet50V2_ELM_1000_GWO, balance precision and sensitivity well, as seen in their high mean F1 scores of 98.3792% and 99.0249%, respectively.
  • The models VGG16_ELM_1000_GWO and ResNet50V2_ELM_1000_GWO achieve Matthews correlation coefficients (MCC) of 98.0964% and 99.0233%, respectively, indicating strong predictive performance that remains reliable under class imbalance.
  • The ResNet50V2_ELM_1000_GWO model demonstrates the highest mean specificity, 99.0540%, indicating a strong capability to correctly reject non-PRI cases. This is particularly important in applications where false alarms incur significant costs.
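
For reference, all of the criteria reported above can be derived from a binary confusion matrix; the following sketch uses hypothetical counts, not the paper's results:

```python
# Computing the reported measurement criteria from a binary confusion
# matrix; the counts below are hypothetical, not the paper's results.
import math

tp, fp, tn, fn = 480, 5, 490, 6  # hypothetical fold-level counts

accuracy    = (tp + tn) / (tp + tn + fp + fn)
precision   = tp / (tp + fp)        # correct positives / predicted positives
sensitivity = tp / (tp + fn)        # recall / true positive rate
specificity = tn / (tn + fp)        # true negative rate
f1  = 2 * precision * sensitivity / (precision + sensitivity)
mcc = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

for name, value in [("accuracy", accuracy), ("precision", precision),
                    ("sensitivity", sensitivity), ("specificity", specificity),
                    ("F1", f1), ("MCC", mcc)]:
    print(f"{name:11s} {value:.4f}")
```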

Overall, augmenting the VGG16 and ResNet50V2 models with ELM layers and GWO optimization yields noteworthy gains in both efficiency and efficacy. Training time drops substantially while all performance measures improve, including accuracy, precision, sensitivity, F1 score, MCC, and specificity. The GWO-optimized variants strike a commendable balance between training time and classification performance, making them well suited for real-world PRI classification tasks in which accuracy and efficiency are paramount. The uniformity of performance across all folds suggests that the models are robust and generalize well to unseen input, which is crucial for use in practical environments. The reduced training times and high performance achieved by combining ELM and GWO optimization demonstrate the promise of these hybrid methodologies for addressing complicated classification tasks.
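
To make the ELM component concrete: an ELM head keeps its randomly initialized hidden layer fixed and solves only the output weights in closed form by least squares, which is what makes its training so fast. A minimal sketch on generic feature vectors follows; the dimensions and activation are illustrative, not the paper's exact configuration:

```python
# Minimal extreme learning machine (ELM) classifier head, as commonly
# described in the ELM literature; the configuration here is illustrative.
import numpy as np

rng = np.random.default_rng(0)

def elm_fit(X, Y, n_hidden=256):
    """X: (n, d) features; Y: (n, c) one-hot targets."""
    W = rng.standard_normal((X.shape[1], n_hidden))  # random, fixed input weights
    b = rng.standard_normal(n_hidden)                # random, fixed biases
    H = np.tanh(X @ W + b)                           # hidden-layer activations
    beta = np.linalg.pinv(H) @ Y                     # closed-form output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.argmax(np.tanh(X @ W + b) @ beta, axis=1)

# Toy usage: 6 classes, features standing in for DCNN activations.
X = rng.standard_normal((300, 128))
y = rng.integers(0, 6, 300)
Y = np.eye(6)[y]
W, b, beta = elm_fit(X, Y)
print((elm_predict(X, W, b, beta) == y).mean())  # training accuracy on noise
```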

Table 8 presents a complexity analysis of the models VGG16, VGG16_ELM, VGG16_ELM_GWO, ResNet50V2, ResNet50V2_ELM, and ResNet50V2_ELM_GWO, covering the computational cost, required resources, and overall complexity of each model.

Fig 18 compares the training time required by these models, VGG16, VGG16_ELM, VGG16_ELM_GWO, ResNet50V2, ResNet50V2_ELM, and ResNet50V2_ELM_GWO, illustrating their relative computational efficiency.

Table 8 and Fig 18 demonstrate that the ResNet50V2 models are considerably more complex: with over 200 million FLOPs and roughly 100 million parameters, they require the greatest computational resources. In contrast, the VGG16 models are far less complex, with approximately 50 million FLOPs and around 25 million parameters.
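
The counts above refer to the authors' modified networks; for orientation, the parameter counts of the stock Keras backbones can be queried as in the following sketch (illustrative; it assumes a TensorFlow/Keras environment and reports only the convolutional base, so the values will differ from the paper's modified architectures):

```python
# Query backbone parameter counts from stock Keras models (illustrative;
# the paper's modified networks will differ from these off-the-shelf counts).
import tensorflow as tf

for name, ctor in [("VGG16", tf.keras.applications.VGG16),
                   ("ResNet50V2", tf.keras.applications.ResNet50V2)]:
    model = ctor(weights=None, include_top=False, input_shape=(224, 224, 3))
    print(f"{name}: {model.count_params():,} parameters (convolutional base only)")
```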

The VGG16_ELM_1000 model distinguishes itself with a remarkably brief training duration of around 0.354 seconds, making it the most efficient. Conversely, ResNet50V2 requires the longest training time, at 153.266 seconds.

The ResNet50V2_ELM_GWO model delivers the best performance in Table 8, achieving a remarkable accuracy of 98.66%. This reflects a trade-off between computational cost and performance, since higher accuracy demands greater computational resources. Conversely, VGG16_ELM_1000 has the lowest accuracy, 92.73%, making it a more economical but less effective model.

The analysis of Tables 6–8 and Figs 15–18 indicates a significant decrease in training time with the incorporation of ELM layers. For example, VGG16_ELM_1000 and ResNet50V2_ELM_1000, which use a reduced feature set of 1000, train roughly a hundred times faster than their regular versions. Incorporating GWO, however, substantially increases training duration, reflecting its additional computational expense.
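
The extra GWO cost arises from its iterative population update, in which every candidate solution is re-evaluated each generation. A minimal, generic grey wolf optimizer on a toy objective is sketched below; the paper's application of GWO to the ELM stage is not reproduced here:

```python
# Minimal grey wolf optimizer (GWO), following the standard formulation;
# the objective is a toy sphere function, not the paper's ELM setup.
import numpy as np

def gwo(objective, dim=10, n_wolves=20, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    wolves = rng.uniform(-5, 5, (n_wolves, dim))
    for t in range(n_iter):
        fitness = np.array([objective(w) for w in wolves])
        alpha, beta, delta = wolves[np.argsort(fitness)[:3]]  # three best leaders
        a = 2 - 2 * t / n_iter                                # decreases from 2 to 0
        for i in range(n_wolves):
            new_pos = np.zeros(dim)
            for leader in (alpha, beta, delta):
                A = 2 * a * rng.random(dim) - a
                C = 2 * rng.random(dim)
                D = np.abs(C * leader - wolves[i])
                new_pos += leader - A * D                     # pull toward each leader
            wolves[i] = new_pos / 3                           # average of the three pulls
    fitness = np.array([objective(w) for w in wolves])
    return wolves[np.argmin(fitness)], fitness.min()

best, value = gwo(lambda w: np.sum(w ** 2))  # toy objective: sphere function
print(f"best value: {value:.6f}")
```

Each generation evaluates the objective once per wolf, so the total cost scales with n_wolves × n_iter evaluations, which is why adding GWO lengthens training.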

The baseline VGG16 model demonstrates strong performance, with an average accuracy of 97.5384%. This is improved by incorporating ELM layers and GWO optimization: VGG16_ELM_4000 and VGG16_ELM_1000_GWO achieve accuracies of 97.8461% and 98.8059%, respectively. The ResNet50V2 models, although exhibiting somewhat lower average accuracy, also improve with these modifications.

The VGG16_ELM_1000_GWO model exhibits very high precision, a crucial indicator of correct positive predictions, underscoring the model's reliability. The GWO-optimized models also provide strong sensitivity and specificity, essential for correctly recognizing positive cases and rejecting non-PRI instances, respectively.

The F1 scores and Matthews correlation coefficients (MCC) confirm these models' predictive strength and their robustness to class imbalance. The ResNet50V2_ELM_1000_GWO model demonstrates the best average specificity, which is particularly important for applications requiring minimal false alarms.

In terms of model complexity, ResNet50V2 stands out for its substantial demand on computational resources, as indicated by its FLOPs (floating-point operations) and parameter counts. The VGG16 models are less complex, although some configurations trade this economy for lower accuracy.

Incorporating ELM and GWO optimization into the VGG16 and ResNet50V2 models represents notable progress in PRI modulation detection. These models reduce training time and improve performance across the different metrics, striking a balance between training cost and classification effectiveness that is crucial for practical scenarios in which both precision and efficiency are paramount. The consistent performance across all folds indicates the robustness and flexibility of these hybrid techniques, underscoring their potential to tackle intricate classification problems.

5. Discussion

Given the data presented, a clear trade-off exists between model performance and computational complexity across the DCNN models explored. ResNet50V2 stands out with the highest scores on most metrics, emphasizing its proficiency in classification tasks, but at a substantial computational cost: its elevated FLOPs and parameter counts indicate higher resource demands and potentially longer inference times. EfficientNetB0, while having competitive complexity metrics, lags significantly in performance, suggesting the need to balance efficiency against efficacy. Models such as VGG16 and DenseNet121 exemplify lower computational demands and faster training, making them adaptable to a variety of applications, notably when resources are constrained. Moreover, ELM-enhanced models such as VGG16_ELM_4000 show significant improvements in efficiency and performance, albeit with increased training times in some instances, such as ResNet50V2_ELM_4000. GWO-enhanced models, despite their superior classification metrics, are considerably resource-intensive, emphasizing the need for optimization and strategic model selection based on task-specific requirements and constraints. In essence, choosing a suitable model requires carefully weighing performance, computational efficiency, and resource availability against the distinct needs of each classification task.

6. Conclusion

This work is the first to use PRI sound for PRIM recognition. The study presents an innovative three-phase methodology for identifying the six prevalent kinds of PRIM. The first phase trained a DCNN based on transfer learning, which served as a feature extractor. Next, an ELM was substituted for the final fully connected layers to improve the proposed model's time complexity. Finally, GWO was introduced to mitigate the space complexity of the proposed paradigm. This research also introduces a novel experimental dataset of PRI patterns specifically tailored for recognition measurement. The study incorporates eight pre-trained convolutional neural network models, including the VGG and ResNet families, which were tested and evaluated on the PRI sound image dataset. Among the implemented classifiers, the VGG16 and ResNet50V2 models obtained the best recognition accuracies, 97.53% with a training time of 39.02 seconds and 96.92% with a training time of 153.26 seconds, respectively. After extending these networks with ELM and GWO, the figures improved to 98.80% with a training time of 18.15 seconds and 97.58% with a training time of 69.35 seconds, respectively. Across all six measurement criteria, ResNet50V2_ELM_GWO receives the highest rating, and VGG16_ELM_GWO the second-highest.

Several directions can be suggested for future research. Further optimization of the prevailing models, ResNet50V2 and VGG16, warrants exploration, as it promises reductions in training time and computational complexity. A deeper, more comprehensive investigation of current methods for trimming model complexity is likely to bear fruit. Scrutinizing other models, including EfficientNet and the various versions of MobileNet, may uncover more effective and economical alternatives. Experimentation with contemporary deep learning techniques, such as transfer learning and meta-learning, could further raise model performance. Lastly, the adoption of more powerful hardware has the potential to shorten training times significantly.

  • 7. Zhang, D., et al. (2021). Distributed radar PRI sequence classification using K-medoids algorithm and feedforward neural networks. In 2021 IEEE 5th Information Technology, Networking, Electronic and Automation Control Conference (ITNEC). IEEE.
  • 12. Albadr, M. A. A., et al. (2021). Extreme learning machine for automatic language identification utilizing emotion speech data. In 2021 International Conference on Electrical, Communication, and Computer Engineering (ICECCE). IEEE.
  • 20. Zhao, G., et al. (2009). On improving the conditioning of extreme learning machine: A linear case. In 2009 7th International Conference on Information, Communications and Signal Processing (ICICS). IEEE.
  • 31. Liu, J., Meng, H., & Wang, X. (2009). A new pulse deinterleaving algorithm based on multiple hypothesis tracking. In 2009 International Radar Conference "Surveillance for a Safer World" (RADAR 2009). IEEE.
  • 34. Kauppi, J.-P., & Martikainen, K. (2007). An efficient set of features for pulse repetition interval modulation recognition. In 2007 IET International Conference on Radar Systems. IET.
  • 35. Ghani, K. A., et al. (2017). Pulse repetition interval analysis using decimated Walsh-Hadamard transform. In 2017 IEEE Radar Conference (RadarConf). IEEE.
  • 36. Ahmed, U. I., et al. (2018). Robust pulse repetition interval (PRI) classification scheme under complex multi emitter scenario. In 2018 22nd International Microwave and Radar Conference (MIKON). IEEE.
  • 37. Liu, Y., & Zhang, Q. (2017). An improved algorithm for PRI modulation recognition. In 2017 IEEE International Conference on Signal Processing, Communications and Computing (ICSPCC). IEEE.
  • 38. Keshavarzi, M., Pezeshk, A. M., & Farzaneh, F. (2012). A new method for detection of complex pulse repetition interval modulations. In 2012 IEEE 11th International Conference on Signal Processing. IEEE.
  • 39. Song, K.-H., et al. (2010). Pulse repetition interval modulation recognition using symbolization. In 2010 International Conference on Digital Image Computing: Techniques and Applications. IEEE.
  • 40. Hu, G., & Liu, Y. (2010). An efficient method of pulse repetition interval modulation recognition. In 2010 International Conference on Communications and Mobile Computing. IEEE.
  • 42. Heaton, J. (2018). Ian Goodfellow, Yoshua Bengio, and Aaron Courville: Deep learning. The MIT Press, 2016, 800 pp, ISBN: 0262035618 [book review]. Genetic Programming and Evolvable Machines, 19(1–2), 305–307.
  • 51. Hekrdla, M., & Heřmánek, A. (2019). Deep convolutional neural network classifier of pulse repetition interval modulations. In 2019 International Radar Conference (RADAR). IEEE.
  • 52. O'Shea, K., & Nash, R. (2015). An introduction to convolutional neural networks. arXiv preprint arXiv:1511.08458.
  • 55. Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In Computer Vision – ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6–12, 2014, Proceedings, Part I. Springer.
  • 56. Szegedy, C., et al. (2015). Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
  • 57. Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
  • 62. Mpova, L., Shongwe, T. C., & Hasan, A. (2023). The classification and detection of cyanosis images on lightly and darkly pigmented individual human skins applying simple CNN and fine-tuned VGG16 models in TensorFlow's Keras API. In 2023 IEEE International Conference on Computational Intelligence and Virtual Environments for Measurement Systems and Applications (CIVEMSA). IEEE.
  • 63. Raje, N. R., & Jadhav, A. (2022). Automated diagnosis of pneumonia through capsule network in conjunction with ResNet50v2 model. In 2022 International Conference on Emerging Smart Computing and Informatics (ESCI). IEEE.
  • 65. Suthar, O., Katkar, V., & Vaghela, K. (2023). Person recognition using gait energy image, MobileNetV3Small and machine learning. In 2023 IEEE 3rd International Conference on Technology, Engineering, Management for Societal impact using Marketing, Entrepreneurship and Talent (TEMSMET). IEEE.

Machine Learning for Predicting Corporate Violations: How Do CEO Characteristics Matter?

  • Original Paper
  • Published: 30 April 2024


  • Ruijie Sun 1 ,
  • Feng Liu   ORCID: orcid.org/0000-0001-9367-049X 1 ,
  • Yinan Li 1 ,
  • Rongping Wang 1 &
  • Jing Luo 2  

Based on upper echelon theory, we employ machine learning to explore how CEO characteristics influence corporate violations using a large-scale dataset of listed firms in China for the period 2010–2020. Comparing ten machine learning methods, we find that eXtreme Gradient Boosting (XGBoost) outperforms the other models in predicting corporate violations. An interpretable model combining XGBoost and SHapley Additive exPlanations (SHAP) indicates that CEO characteristics play a central role in predicting corporate violations. Tenure has the strongest predictive power and is negatively associated with corporate violations, followed by marketing experience, education, duality (i.e., simultaneously holding the position of chairperson), and research and development experience. In contrast, shareholdings, age, and pay are positively related to corporate violations. We also analyze violation severity and violation type, confirming the role of tenure in predicting more severe and intentional violations. Overall, our findings contribute to preventing corporate violations, improving corporate governance, and maintaining order in the financial market.
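
As a hedged illustration of the XGBoost-plus-SHAP pattern the abstract describes, the following sketch uses synthetic data and hypothetical feature names in place of the authors' dataset of Chinese listed firms:

```python
# Illustrative XGBoost + SHAP pipeline; the data and feature names are
# synthetic stand-ins, not the authors' dataset of Chinese listed firms.
import numpy as np
import pandas as pd
import shap
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
n = 2000
X = pd.DataFrame({
    "tenure": rng.uniform(0, 20, n),       # hypothetical CEO features
    "age": rng.uniform(35, 70, n),
    "duality": rng.integers(0, 2, n),
    "shareholdings": rng.uniform(0, 0.3, n),
})
# Synthetic label loosely echoing the reported direction for tenure
# (longer tenure -> lower violation probability).
y = (rng.random(n) < 1 / (1 + np.exp(0.2 * X["tenure"] - 1))).astype(int)

model = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
model.fit(X, y)

# SHAP values attribute each prediction to the input features; averaging
# their magnitudes yields a global feature-importance ranking.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
print(pd.Series(np.abs(shap_values).mean(axis=0), index=X.columns)
        .sort_values(ascending=False))
```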


Data availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

More detailed information can be found at http://english.sse.com.cn/news/newsrelease/c/4986736.shtml

We use the 2010–2020 period for our investigation of corporate violations because 2009 was an important year for the A-share market due to the introduction of a number of policies (e.g., strengthening market regulations, promoting new stock issuance, and reducing stamp duty), and companies may have taken some time to adapt to these policies.

CEOs with business degrees tend to exhibit more business ethics and are less likely to commit violations (Troy et al., 2011 ). Thus, we separate the possession of an MBA from education in general to investigate the impact of CEO MBA on corporate violations.

Using the logarithm to compress the original data into a smaller range has been widely employed to analyze CEO characteristics because the data are easier to process and the impact of noise and outliers on the results is reduced (e.g., Chidambaran and Prabhala, 2003 ; Gao and Li, 2015 ).

CEO pay in our study does not include equity incentives because equity incentive plans for Chinese listed companies started late, and both the proportion of listed companies implementing equity incentives and the proportion of shares granted under such plans remain low.

According to the database guide, CEO shareholdings do not include unexercised stock options.

Amiram, D., Bozanic, Z., Cox, J. D., Dupont, Q., Karpoff, J. M., & Sloan, R. (2018). Financial reporting fraud and other forms of misconduct: A multidisciplinary review of the literature. Review of Accounting Studies, 23 (2), 732–783.


Ardichvili, A., Jondle, D., Kowske, B., Cornachione, E., Li, J., & Thakadipuram, T. (2012). Ethical cultures in large business organizations in Brazil, Russia, India, and China. Journal of Business Ethics, 105 , 415–428.

Babalola, M. T., Bal, M., Cho, C. H., Garcia–Lorenzo, L., Guedhami, O., Liang, H., ... & van Gils, S. (2022). Bringing excitement to empirical business ethics research: Thoughts on the future of business ethics. Journal of Business Ethics, 180(3), 903–916.

Bao, Y., Ke, B., Li, B., Yu, Y. J., & Zhang, J. (2020). Detecting accounting fraud in publicly traded US firms using a machine learning approach. Journal of Accounting Research, 58 (1), 199–235.

Barker, V. L., & Mueller, G. C. (2002). CEO characteristics and firm R&D spending. Management Science, 48 (6), 782–801.

Baucus, M. S. (1994). Pressure, opportunity and predisposition: A multivariate model of corporate illegality. Journal of Management, 20 (4), 699–721.

Benmelech, E., Kandel, E., & Veronesi, P. (2010). Stock–based compensation and CEO (dis) incentives. The Quarterly Journal of Economics, 125 (4), 1769–1820.

Bertomeu, J., Cheynel, E., Liao, Y., & Milone, M. (2021b). Using machine learning to measure conservatism. Available at SSRN 3924961. http://hdl.handle.net/10125/76928

Bertomeu, J. (2020). Machine learning improves accounting: Discussion, implementation and research opportunities. Review of Accounting Studies, 25 (3), 1135–1155.

Bertomeu, J., Cheynel, E., Floyd, E., & Pan, W. (2021a). Using machine learning to detect misstatements. Review of Accounting Studies, 26 (2), 468–519.

Bertrand, M., & Schoar, A. (2003). Managing with style: The effect of managers on firm policies. The Quarterly Journal of Economics, 118 (4), 1169–1208.

Bhaskar, L. S., Krishnan, G. V., & Yu, W. (2017). Debt covenant violations, firm financial distress, and auditor actions. Contemporary Accounting Research, 34 (1), 186–215.

Bradley, A. P. (1997). The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognition, 30 (7), 1145–1159.

Brown, N. C., Crowley, R. M., & Elliott, W. B. (2020). What are you saying? Using topic to detect financial misreporting. Journal of Accounting Research, 58 (1), 237–291.

Bundy, J., Iqbal, F., & Pfarrer, M. D. (2021). Reputations in flux: How a firm defends its multiple reputations in response to different violations. Strategic Management Journal, 42 (6), 1109–1138.

Caskey, J., & Ozel, N. B. (2017). Earnings expectations and employee safety. Journal of Accounting and Economics, 63 (1), 121–141.

Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic minority over–sampling technique. Journal of Artificial Intelligence Research, 16 , 321–357.

Chen, T., & Guestrin, C. (2016). Xgboost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785–794.

Cheynel, E., & Zhou, F. S. (2023). Auditor tenure and misreporting: Evidence from a dynamic oligopoly game. Management Science, Ahead of Print. https://doi.org/10.1287/mnsc.2023.4944

Cheynel, E., Cianciaruso, D., & Zhou, F. (2023). Fraud Power Laws. Available at SSRN 4292259. https://ssrn.com/abstract=4292259

Chidambaran, N. K., & Prabhala, N. R. (2003). Executive stock option repricing, internal governance mechanisms, and management turnover. Journal of Financial Economics, 69 (1), 153–189.

Choi, D., Shin, H., & Kim, K. (2023). CEO’s childhood experience of natural disaster and CSR activities. Journal of Business Ethics, Ahead of Print. https://doi.org/10.1007/s10551-022-05319-3

Conyon, M. J., & He, L. (2016). Executive compensation and corporate fraud in China. Journal of Business Ethics, 134 , 669–691.

Davidson, R. H. (2022). Who did it matters: Executive equity compensation and financial reporting fraud. Journal of Accounting and Economics, 73 (2–3), 101453.

Davidson, R., Dey, A., & Smith, A. (2015). Executives’ “off–the–job” behavior, corporate culture, and financial reporting risk. Journal of Financial Economics, 117 (1), 5–28.

Dikolli, S. S., Mayew, W. J., & Nanda, D. (2014). CEO tenure and the performance–turnover relation. Review of Accounting Studies, 19 , 281–327.

Ding, K., Lev, B., Peng, X., Sun, T., & Vasarhelyi, M. A. (2020). Machine learning improves accounting estimates: Evidence from insurance payments. Review of Accounting Studies, 25 , 1098–1134.

Dzyabura, D., El Kihal, S., Hauser, J. R., & Ibragimov, M. (2023). Leveraging the power of images in managing product return rates. Marketing Science, 42 (6), 1125–1142.

Fan, J. P., Wong, T. J., & Zhang, T. (2007). Politically connected CEOs, corporate governance, and post–IPO performance of China’s newly partially privatized firms. Journal of Financial Economics, 84 (2), 330–357.

Farag, H., & Mallin, C. (2018). The influence of CEO demographic characteristics on corporate risk-taking: Evidence from Chinese IPOs. The European Journal of Finance, 24 (16), 1528–1551.

Fawcett, T. (2006). An introduction to ROC analysis. Pattern Recognition Letters, 27 (8), 861–874.

Gangloff, K. A., Connelly, B. L., & Shook, C. L. (2016). Of scapegoats and signals: Investor reactions to CEO succession in the aftermath of wrongdoing. Journal of Management, 42 (6), 1614–1634.

Gao, H., & Li, K. (2015). A comparison of CEO pay-performance sensitivity in privately–held and public firms. Journal of Corporate Finance, 35 , 370–388.

Gong, G., Huang, X., Wu, S., Tian, H., & Li, W. (2021). Punishment by securities regulators, corporate social responsibility and the cost of debt. Journal of Business Ethics, 171 , 337–356.

Gong, G., Xu, S., & Gong, X. (2018). On the value of corporate social responsibility disclosure: An empirical investigation of corporate bond issues in China. Journal of Business Ethics, 150 , 227–258.

Greve, H. R., Palmer, D., & Pozner, J. E. (2010). Organizations gone wild: The causes, processes, and consequences of organizational misconduct. The Academy of Management Annals, 4 (1), 53–107.

Hambrick, D. C., & Mason, P. A. (1984). Upper echelons: The organization as a reflection of its top managers. Academy of Management Review, 9 (2), 193–206.

Harrison, A., Summers, J., & Mennecke, B. (2018). The effects of the dark triad on unethical behavior. Journal of Business Ethics, 153 , 53–77.

He, F., Du, H., & Yu, B. (2022). Corporate ESG performance and manager misconduct: Evidence from China. International Review of Financial Analysis, 82 , 102201.

Hennes, K. M., Leone, A. J., & Miller, B. P. (2008). The importance of distinguishing errors from irregularities in restatement research: The case of restatements and CEO/CFO turnover. The Accounting Review, 83 (6), 1487–1519.

Heyden, M. L., Gu, J., Wechtler, H. M., & Ekanayake, U. I. (2023). The face of wrongdoing? An expectancy violations perspective on CEO facial characteristics and media coverage of misconducting firms. The Leadership Quarterly, 34 (3), 101671.

Ho, C., & Redfern, K. A. (2010). Consideration of the role of guanxi in the ethical judgments of Chinese managers. Journal of Business Ethics, 96 , 207–221.

Hwang, D. B., & Blair Staley, A. (2005). An analysis of recent accounting and auditing failures in the United States on US accounting and auditing in China. Managerial Auditing Journal, 20 (3), 227–234.

Hwang, D. B., Golemon, P. L., Chen, Y., Wang, T. S., & Hung, W. S. (2009). Guanxi and business ethics in Confucian society today: An empirical case study in Taiwan. Journal of Business Ethics, 89 , 235–250.

Jia, C., Ding, S., Li, Y., & Wu, Z. (2009). Fraud, enforcement action, and the role of corporate governance: Evidence from China. Journal of Business Ethics, 90 , 561–576.

Jia, Y., van Lent, L., & Zeng, Y. (2014). Masculinity, testosterone, and financial misreporting. Journal of Accounting Research, 52 (5), 1195–1246.

Ke, Z., Liu, D., & Brass, D. J. (2020). Do online friends bring out the best in us? The effect of friend contributions on online review provision. Information Systems Research, 31 (4), 1322–1336.

Khanna, T., & Yafeh, Y. (2007). Business groups in emerging markets: Paragons or parasites? Journal of Economic Literature, 45 (2), 331–372.

Koch-Bayram, I. F., & Wernicke, G. (2018). Drilled to obey? Ex-military CEOs and financial misconduct. Strategic Management Journal, 39 (11), 2943–2964.

Krupa, J., & Minutti-Meza, M. (2022). Regression and machine learning methods to predict discrete outcomes in accounting research. Journal of Financial Reporting, 7 (2), 131–178.

La Porta, R., Lopez-de-Silanes, F., Shleifer, A., & Vishny, R. (2002). Investor protection and corporate valuation. The Journal of Finance, 57 (3), 1147–1170.

Leone, A. J., & Liu, M. (2010). Accounting irregularities and executive turnover in founder-managed firms. The Accounting Review, 85 (1), 287–314.

Li, J., Yu, L., Mei, X., & Feng, X. (2022). Do social media constrain or promote company violations? Accounting and Finance, 62 (1), 31–70.

Li, X., & Li, Y. (2020). Female independent directors and financial irregularities in Chinese listed firms: From the perspective of audit committee chairpersons. Finance Research Letters, 32 , 101320.

Lisic, L. L., Silveri, S. D., Song, Y., & Wang, K. (2015). Accounting fraud, auditing, and the role of government sanctions in China. Journal of Business Research, 68 (6), 1186–1195.

Liu, C. (2018). Are women greener? Corporate gender diversity and environmental violations. Journal of Corporate Finance, 52 , 118–142.

Liu, F., Wang, R., & Fang, M. (2024). Mapping green innovation with machine learning: Evidence from China. Technological Forecasting and Social Change, 200 , 123107.

Loe, T. W., Ferrell, L., & Mansfield, P. (2000). A review of empirical studies assessing ethical decision making in business. Journal of Business Ethics, 25 , 185–204.

López Vargas, K., Runge, J., & Zhang, R. (2022). Algorithmic assortative matching on a digital social medium. Information Systems Research, 33 (4), 1138–1156.

Lundberg, S. M., & Lee, S. I. (2017). A unified approach to interpreting model predictions. In I. Guyon et al. (Eds.), Advances in neural information processing systems (Vol. 30, pp. 4765–4774). Curran Associates, Inc. http://papers.nips.cc/paper/7062-a-unifed-approach-to-interpreting-model-predictions.pdf

Martin, G., Campbell, J. T., & Gomez-Mejia, L. (2016). Family control, socioemotional wealth and earnings management in publicly traded firms. Journal of Business Ethics, 133 , 453–469.

McGuire, S. T., Omer, T. C., & Sharp, N. Y. (2012). The impact of religion on financial reporting irregularities. The Accounting Review, 87 (2), 645–673.

Murdoch, W. J., Singh, C., Kumbier, K., Abbasi-Asl, R., & Yu, B. (2019). Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116 (44), 22071–22080.

Musteen, M., Barker, V. L., III., & Baeten, V. L. (2006). CEO attributes associated with attitude toward change: The direct and moderating effects of CEO tenure. Journal of Business Research, 59 (5), 604–612.

Nietsch, M. (2018). Corporate illegal conduct and directors’ liability: An approach to personal accountability for violations of corporate legal compliance. Journal of Corporate Law Studies, 18 (1), 151–184.

Oh, W. Y., Chang, Y. K., & Cheng, Z. (2016). When CEO career horizon problems matter for corporate social responsibility: The moderating roles of industry–level discretion and blockholder ownership. Journal of Business Ethics, 133 , 279–291.

Perols, J. L., Bowen, R. M., Zimmermann, C., & Samba, B. (2017). Finding needles in a haystack: Using data analytics to improve fraud prediction. The Accounting Review, 92 (2), 221–245.

Persons, O. S. (2006). The effects of fraud and lawsuit revelation on US executive turnover and compensation. Journal of Business Ethics, 64 , 405–419.

Proudfoot, D., Berry, Z., Chang, E. H., & Kay, M. B. (2023). The diversity heuristic: How team demographic composition influences judgments of team creativity. Management Science, Ahead of Print. https://doi.org/10.1287/mnsc.2023.4862

Provis, C. (2020). Business ethics, Confucianism and the different faces of ritual. Journal of Business Ethics, 165 , 191–204.

Rodríguez-Pereira, J., Balcik, B., Rancourt, M. È., & Laporte, G. (2021). A cost-sharing mechanism for multi-country partnerships in disaster preparedness. Production and Operations Management, 30 (12), 4541–4565.

Schrand, C. M., & Zechman, S. L. (2012). Executive overconfidence and the slippery slope to financial misreporting. Journal of Accounting and Economics, 53 (1–2), 311–329.

Scott, A., & Nyaga, G. N. (2019). The effect of firm size, asset ownership, and market prices on regulatory violations. Journal of Operations Management, 65 (7), 685–709.

Shrestha, Y. R., He, V. F., Puranam, P., & von Krogh, G. (2021). Algorithm supported induction for building theory: How can we use prediction models to theorize? Organization Science, 32 (3), 856–880.

Štrumbelj, E., & Kononenko, I. (2014). Explaining prediction models and individual predictions with feature contributions. Knowledge and Information Systems, 41 , 647–665.

Tan, J. (2009). Institutional structure and firm social performance in transitional economies: Evidence of multinational corporations in China. Journal of Business Ethics, 86 , 171–189.

Tang, Y., Li, J., & Liu, Y. (2016). Does founder CEO status affect firm risk taking? Journal of Leadership & Organizational Studies, 23 (3), 322–334.

Troy, C., Smith, K. G., & Domino, M. A. (2011). CEO demographics and accounting fraud: Who is more likely to rationalize illegal acts? Strategic Organization, 9 (4), 259–282.

Van Scotter, J. R., & Roglio, K. D. D. (2020). CEO bright and dark personality: Effects on ethical misconduct. Journal of Business Ethics, 164 , 451–475.

Wang, B. Y., Duan, M., & Liu, G. (2021a). Does the power gap between a chairman and CEO matter? Evidence from corporate debt financing in China. Pacific-Basin Finance Journal, 65 , 101495.

Wang, L., Su, Z. Q., Fung, H. G., Jin, H. M., & Xiao, Z. (2021b). Do CEOs with academic experience add value to firms? Evidence on bank loans from Chinese firms. Pacific-Basin Finance Journal, 67 , 101534.

Warren, D. E., Dunfee, T. W., & Li, N. (2004). Social exchange in China: The double–edged sword of guanxi. Journal of Business Ethics, 55 , 353–370.

Wathne, K. H., & Heide, J. B. (2000). Opportunism in interfirm relationships: Forms, outcomes, and solutions. Journal of Marketing, 64 (4), 36–51.

Wei, J., Ouyang, Z., & Chen, H. A. (2018). CEO characteristics and corporate philanthropic giving in an emerging market: The case of China. Journal of Business Research, 87 , 1–11.

Wei, L. Q., & Ling, Y. (2015). CEO characteristics and corporate entrepreneurship in transition economies: Evidence from China. Journal of Business Research, 68 (6), 1157–1165.

Wu, D. (2023). Text–based measure of supply chain risk exposure. Management Science, Ahead of Print. https://doi.org/10.1287/mnsc.2023.4927 .

Wu, J., Zhang, Z., & Zhou, S. X. (2022). Credit rating prediction through supply chains: A machine learning approach. Production and Operations Management, 31 (4), 1613–1629.

Wu, W., Johan, S. A., & Rui, O. M. (2016). Institutional investors, political connections, and the incidence of regulatory enforcement against corporate fraud. Journal of Business Ethics, 134 , 709–726.

Xu, X., Xiong, F., & An, Z. (2023). Using machine learning to predict corporate fraud: Evidence based on the gone framework. Journal of Business Ethics, 186 (1), 137–158.

You, J., & Du, G. (2012). Are political connections a blessing or a curse? Evidence from CEO turnover in China. Corporate Governance: An International Review, 20 (2), 179–194.

Zahra, S. A., Priem, R. L., & Rasheed, A. A. (2005). The antecedents and consequences of top management fraud. Journal of Management, 31 (6), 803–828.

Zhang, J., Zhu, M., & Liu, F. (2023). Find who is doing social good: Using machine learning to predict corporate social responsibility performance. Operations Management Research, Ahead of Print. https://doi.org/10.1007/s12063-023-00427-3

Zhang, J. (2018). Public governance and corporate fraud: Evidence from the recent anti-corruption campaign in China. Journal of Business Ethics, 148 (2), 375–396.

Zhang, M., & Luo, L. (2023). Can consumer–posted photos serve as a leading indicator of restaurant survival? Evidence from Yelp. Management Science, 69 (1), 25–50.

Zhang, X., Du, Q., & Zhang, Z. (2022). A theory-driven machine learning system for financial disinformation detection. Production and Operations Management, 31 (8), 3160–3179.

Zhang, Y., & Zhang, Z. (2006). Guanxi and organizational dynamics in China: A link between individual and organizational levels. Journal of Business Ethics, 67 , 375–392.


Acknowledgements

We gratefully acknowledge insightful suggestions from the editors and the anonymous reviewers, which substantively improved this article. We also thank Mingjie Fang, Caixia Liu, Simon Shufeng Xiao, Gil Coombe, and Zongli Dai for their comments on earlier versions of this paper. We thank the members of Star-lights Research Team for research assistance.

This work was supported by the Humanities and Social Sciences Foundation of the Ministry of Education of China [Grant No. 21YJC630076].

Author information

Authors and Affiliations

Business School, Shandong University, Weihai, China

Ruijie Sun, Feng Liu, Yinan Li & Rongping Wang

School of Mathematics and Statistics, Shandong University, Weihai, China

Jing Luo

Corresponding author

Correspondence to Feng Liu.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (PDF 743 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Sun, R., Liu, F., Li, Y. et al. Machine Learning for Predicting Corporate Violations: How Do CEO Characteristics Matter? J Bus Ethics (2024). https://doi.org/10.1007/s10551-024-05685-0


Received: 20 May 2023

Accepted: 03 April 2024

Published: 30 April 2024

DOI: https://doi.org/10.1007/s10551-024-05685-0


Keywords

  • Corporate violations
  • CEO characteristics
  • Violation severity
  • Intentional violations
  • Machine learning

