Literature Reviews & Search Strategies

Use Multiple Databases

While not every literature search you undertake will be for a systematic review, the Cochrane Handbook's statement that "a search of MEDLINE alone is not considered adequate" holds true for almost all literature reviews. You need to go beyond one database to get a more comprehensive picture of your topic and to minimize selection bias. 

There are a lot of databases that you could potentially search for academic/scholarly articles to use in your literature review. We recommend focusing on resources that specialize in academic sources (i.e., databases) rather than a general search tool like Google. A lot of scholarly literature is still not discoverable on the open web, and when it is, you'll often hit a paywall and have to head to a subscription database available through the library to read the full article anyway.

All our databases are listed on the A-Z Databases List. These are a few often-recommended examples:

Medicine, life sciences, behavioral health,  nursing, dentistry, allied health, public health, health policy development, pre-clinical health, biomedicine, and related education.  Abstracts (with links to full-text articles) and embedded full-text articles.
Chemical science, biological sciences, medical & health sciences, psychology, law, economics, society, education, management, and policy. Abstracts (with links to full-text articles).

Nursing, allied health, consumer health, health science librarianship, alternative/complementary medicine and nutrition. Abstracts (with links to full-text articles) and embedded full-text articles.
Psychology, social, behavioral and health sciences, psychiatry, management, education, and social work. Abstracts (with links to full-text articles) and embedded full-text articles.
Education Full-text articles and links to them.
Drug and biomedical, some coverage of dentistry, nursing and psychology. Abstracts with links to full-text articles.
Evidence-based health and medical topics.  Full-text articles, reviews, protocols, trials and links to them. 
A bit of everything. But they don’t disclose what’s included/excluded. Connect to get full-text articles through us (free to you).
  • Last Updated: Aug 28, 2024 12:14 PM
  • URL: https://mcphs.libguides.com/litreviews

Literature searches: what databases are available?

Posted on 6th April 2021 by Izabel de Oliveira

Many types of research require a search of the medical literature as part of the process of understanding the current evidence or knowledge base. This can be done using one or more biomedical bibliographic databases. [1]

Bibliographic databases make the information contained in the papers more visible to the scientific community and facilitate locating the desired literature.

This blog describes some of the main bibliographic databases which index medical journals.

PubMed was launched in 1996 and, since June 1997, has provided free and unlimited access for all users through the internet. The PubMed database contains more than 30 million references to biomedical literature from approximately 7,000 journals. The largest percentage of records in PubMed comes from MEDLINE (95%), which contains 25 million records from over 5,600 journals. Other records derive from sources such as in-process citations, ‘Ahead of Print’ citations, NCBI Bookshelf, etc.

The second largest component of PubMed is PubMed Central (PMC). Launched in 2000, PMC is a permanent collection of full-text life sciences and biomedical journal articles. It includes articles deposited by journal publishers as well as author manuscripts and published articles submitted in compliance with the public access policies of the National Institutes of Health (NIH) and other research funding agencies. PMC contains approximately 4.5 million articles.

Some National Library of Medicine (NLM) resources associated with PubMed are the NLM Catalog and MedlinePlus. The NLM Catalog contains bibliographic records for over 1.4 million journals, books, audiovisuals, electronic resources, and other materials. It also includes detailed indexing information for journals in PubMed and other NCBI databases, although not all materials in the NLM Catalog are part of NLM’s collection. MedlinePlus is a consumer health website providing information on various health topics, drugs, dietary supplements, and health tools.

MeSH (Medical Subject Headings) is the NLM controlled vocabulary used for indexing articles in PubMed. It is used by indexers who analyze and maintain the PubMed database to reflect the subject content of journal articles as they are published. Indexers typically select 10–12 MeSH terms to describe every paper.

Embase is considered the second most popular database after MEDLINE. It is estimated to contain more than 32 million records from over 8,200 journals published in more than 95 countries, along with ‘grey literature’ from over 2.4 million conference abstracts.

Embase contains subtopics in health care such as complementary and alternative medicine, prognostic studies, telemedicine, psychiatry, and health technology. Besides that, it is also widely used for research on drug-related topics as it offers better coverage than MEDLINE on pharmaceutics-related literature.

In 2010, Embase began to include all MEDLINE citations. MEDLINE records are delivered to Elsevier daily and are incorporated into Embase after de-duplication with records already indexed by Elsevier to produce ‘MEDLINE-unique’ records. These MEDLINE-unique records are not re-indexed by Elsevier. However, their indexing is mapped to Emtree terms used in Embase to ensure that Emtree terminology can be used to search all Embase records, including those originally derived from MEDLINE.

Since this coverage expansion—at least in theory and without taking into consideration the different indexing practices of the two databases—a search in Embase alone should cover every record in both Embase and MEDLINE, making Embase a possible “one-stop” search engine for medical research [1].
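When results from two databases are combined by hand, overlapping records need to be removed in a similar spirit. The sketch below is only an illustration of that idea and is not Elsevier's actual de-duplication process: the record fields (`title`, `year`, `doi`), the function names, and the sample records are all hypothetical. It treats two records as the same if their DOIs match or, when no DOI is present, if their normalized title and year match.

```python
import re


def normalize_title(title):
    """Lowercase a title and collapse punctuation and spacing for comparison."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def record_key(record):
    """Prefer the DOI as a record's identity; otherwise use title plus year."""
    doi = (record.get("doi") or "").strip().lower()
    if doi:
        return ("doi", doi)
    return ("title", normalize_title(record["title"]), record.get("year"))


def merge_unique(*result_sets):
    """Combine result sets, keeping the first copy of each record."""
    seen = set()
    unique = []
    for results in result_sets:
        for record in results:
            key = record_key(record)
            if key not in seen:
                seen.add(key)
                unique.append(record)
    return unique


# Hypothetical exports from two databases (illustrative data only).
medline = [
    {"title": "Exercise and Blood Pressure: A Trial", "year": 2019, "doi": "10.1000/example.1"},
    {"title": "Salt intake in adults", "year": 2020, "doi": ""},
]
embase = [
    {"title": "Exercise and blood pressure: a trial.", "year": 2019, "doi": "10.1000/EXAMPLE.1"},
    {"title": "Salt Intake in Adults.", "year": 2020, "doi": None},
    {"title": "Conference abstract on telemedicine", "year": 2021, "doi": None},
]

combined = merge_unique(medline, embase)
print(len(combined), "unique records")
```

A known limitation of this simple key is that a record carrying a DOI in one export and none in the other will not be matched, so reference managers and screening tools usually apply fuzzier rules than this.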

Emtree is a hierarchically structured, controlled vocabulary for biomedicine and the related life sciences. It includes a whole range of terms for drugs, diseases, medical devices, and essential life science concepts. Emtree is used to index all of the Embase content. This process includes full-text indexing of journal articles, which is done by experts.

LILACS, the most important index of the technical-scientific literature in Latin America and the Caribbean, was created in 1985 to record scientific and technical production in health. It is maintained and updated by a network of more than 600 institutions of education, government, and health research, and is coordinated by the Latin America and Caribbean Center on Health Sciences Information (BIREME), the Pan American Health Organization (PAHO), and the World Health Organization (WHO).

LILACS contains scientific and technical literature from more than 908 journals in 26 countries in Latin America and the Caribbean, with free access. It holds about 900,000 records of peer-reviewed articles, theses and dissertations, government documents, conference proceedings, and books; more than 480,000 of them include a full-text link in open access.

The LILACS Methodology is a set of standards, manuals, guides, and applications in continuous development, intended for the collection, selection, description, indexing of documents, and generation of databases. This centralised methodology enables the cooperation between Latin American and Caribbean countries to create local and national databases, all feeding into the LILACS database.  Currently, the databases LILACS, BBO, BDENF, MEDCARIB, and national databases of the countries of Latin America are part of the LILACS System.

Health Sciences Descriptors (DeCS) is the multilingual and structured vocabulary created by BIREME to serve as a unique language in indexing articles from scientific journals, books, congress proceedings, technical reports, and other types of materials, and also for searching and retrieving subjects from scientific literature from information sources available on the Virtual Health Library (VHL) such as LILACS, MEDLINE, and others. It was developed from the MeSH with the purpose of permitting the use of common terminology for searching in multiple languages, and providing a consistent and unique environment for the retrieval of information. DeCS vocabulary is dynamic and totals 34,118 descriptors and qualifiers, of which 29,716 come from MeSH, and 4,402 are exclusive.

Cochrane CENTRAL

The Cochrane Central Register of Controlled Trials (CENTRAL) is a database of reports of randomized and quasi-randomized controlled trials. Most records are obtained from the bibliographic databases PubMed and Embase, with additional records from the published and unpublished sources of CINAHL, ClinicalTrials.gov, and the WHO’s International Clinical Trials Registry Platform.

Although CENTRAL first began publication in 1996, records are included irrespective of the date of publication, and the language of publication is also not a restriction to being included in the database.  You won’t find the full text to the article on CENTRAL but there is often a summary of the article, in addition to the standard details of author, source, and year.

Within CENTRAL, there are ‘Specialized Registers’ which are collected and maintained by Cochrane Review Groups (plus a few Cochrane Fields), which include reports of controlled trials relevant to their area of interest. Some Cochrane Centres search the general healthcare literature of their countries or regions in order to contribute records to CENTRAL.

ScienceDirect

ScienceDirect is Elsevier’s most important peer-reviewed academic literature platform. It was launched in 1997 and contains 16 million records from over 2,500 journals, including over 250 Open Access publications, such as Cell Reports and The Lancet Global Health, as well as 39,000 eBooks.

ScienceDirect topics include:

  • health sciences;
  • life sciences;
  • physical sciences;
  • engineering;
  • social sciences; and
  • humanities.

Web of Science

Web of Science (previously Web of Knowledge) is an online scientific citation indexing service created in 1997 by the Institute for Scientific Information (ISI), and currently maintained by Clarivate Analytics.

Web of Science covers several fields of the sciences, social sciences, and arts and humanities. Its main resource is the Web of Science Core Collection which includes over 1 billion cited references dating back to 1900, indexed from 21,100 peer-reviewed journals, including Open Access journals, books and proceedings.

Web of Science also offers regional databases which cover:

  • Latin America (SciELO Citation Index);
  • China (Chinese Science Citation Database);
  • Korea (Korea Citation Index);
  • Russia (Russian Science Citation Index).

Boolean operators

To make the search more precise, we can use boolean operators in databases between our keywords.

We use boolean operators to focus on a topic, particularly when this topic contains multiple search terms, and to connect various pieces of information in order to find exactly what we are looking for.

Boolean operators connect the search words to either narrow or broaden the set of results. The three basic boolean operators are: AND, OR, and NOT.

  • AND narrows a search by telling the database that all keywords used must be found in the article in order for it to appear in our results.
  • OR broadens a search by telling the database that any of the words it connects are acceptable (this is useful when we are searching for synonymous words).
  • NOT narrows the search by telling the database to eliminate all terms that follow it from our search results (this is helpful when we are interested in a specific aspect of a topic or when we want to exclude a type of article).
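The three operators can be sketched in a few lines of Python. This is a minimal illustration, not the syntax of any particular database: the helper names (`any_of`, `build_query`), the search terms, and the record IDs are hypothetical, and real databases differ in how they handle quoting, nesting, and operator precedence. The first part builds a query string with one OR-group per concept; the second part uses sets of record IDs to show how each operator narrows or broadens the results.

```python
def any_of(terms):
    """Join synonyms with OR, quoting multi-word phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(concepts, exclude=None):
    """AND together one OR-group per concept; optionally NOT a final group."""
    query = " AND ".join(any_of(group) for group in concepts)
    if exclude:
        query += " NOT " + any_of(exclude)
    return query


query = build_query(
    concepts=[
        ["hypertension", "high blood pressure"],
        ["exercise", "physical activity"],
    ],
    exclude=["animals"],
)
print(query)

# Toy illustration of the logic: each set holds the IDs of the
# records that match one search term (made-up numbers).
hypertension = {1, 2, 3, 4}
exercise = {3, 4, 5}
animals = {4}

both = hypertension & exercise      # AND: records matching both terms
either = hypertension | exercise    # OR: records matching at least one term
without_animals = both - animals    # NOT: drop records matching the excluded term
```

Grouping synonyms in parentheses before combining concepts keeps the query readable and avoids relying on a database's default order of evaluation.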

References (pdf)

You may also be interested in the following blogs for further reading:

Conducting a systematic literature search

Reviewing the evidence: what method should I use?

Cochrane Crowd for students: what’s in it for you?

Northeastern University Library

Systematic Reviews and Evidence Syntheses : Databases

You will want to search at least three databases for your systematic review. Searching three databases alone does not meet the search standards for systematic reviews: you will also have to search the grey literature and complete additional hand searches. Which databases you should search depends heavily on your systematic review topic, so we recommend that you meet with a librarian.

Commonly Used Health Sciences Databases

Cochrane, which is considered the gold standard for clinical systematic reviews, recommends searching the following three databases, at a minimum: PubMed, Embase, and Cochrane Central Register of Controlled Trials (CENTRAL).

  • ERIC (Education Resources Information Center) Citations to education information, including scholarly articles, professional literature, education dissertations, and books, plus grey literature such as curriculum guides, conference proceedings, government publications, and white papers. Covers 1966 to the present. Sponsored by the U.S. Department of Education.

Looking to Find Systematic Reviews?

There are a number of places to look for systematic reviews, including within the commonly used databases listed on this page. Some other resources to consider are:

  • Systematic Review Repository - International Initiative for Impact Evaluation The systematic review repository from the International Initiative for Impact Evaluation is an essential resource for policymakers and researchers who are looking for synthesized evidence on the effects of social and economic interventions in low- and middle-income countries.
  • Epistemonikos Epistemonikos is a collaborative, multilingual database of health evidence. It is the largest source of systematic reviews relevant for health decision-making, and a large source of other types of scientific evidence. Please note: Epistemonikos is a database focused on systematic reviews. It pulls in systematic reviews from a number of different international sources, along with the studies included in those reviews. While you will find randomized controlled trials and other primary studies in this database, they are only added because of their association with a systematic review. Therefore, searching here for randomized controlled trials or other primary studies would NOT be considered a comprehensive search.
  • Last Updated: Aug 23, 2024 2:00 PM
  • URL: https://subjectguides.lib.neu.edu/systematicreview

Article

Written content on a narrow subject and published in a periodical or website. In some contexts, academics may use article as a shortened form of journal article.

Bibliography

A detailed list of resources cited in an article, book, or other publication. Also called a List of References.

Call Number

A label of letters and/or numbers that tell you where the resource can be found in the library. Call numbers are displayed on print books and physical resources and correspond with a topic or subject area.

Peer Review

Well-regarded review process used by some academic journals. Relevant experts review articles for quality and originality before publication. Articles reviewed using this process are called peer reviewed articles. Less often, these articles are called refereed articles.

Limiter

A search setting that removes search results based on source attributes. Limiters vary by database but often include publication date, material type, and language. Also called a filter or facet.

Dissertation

A paper written to fulfill requirements for a degree containing original research on a narrow topic. Also called a thesis.

Database

A searchable collection of similar items. Library databases include resources for research. Examples include a newspaper database, such as Access World News, or a humanities scholarly journal database, such as JSTOR.

Scholarly Source

A book or article written by academic researchers and published by an academic press or journal. Scholarly sources contain original research and commentary.

  • Scholarly articles are published in journals focused on a field of study. Also called academic articles.
  • Scholarly books are in-depth investigations of a topic. They are often written by a single author or group. Alternatively, in anthologies, chapters are contributed by different authors.

Literature Review

Where to search when doing a literature review.

Aim to be as comprehensive as possible when conducting a literature review. Knowing exactly where to search for information is important.

Work through the steps to find out the best databases to search for information on your research topic.

1. Start with research databases

Scopus and Web of Science are good databases to start with for any research topic and literature review.

  • Scopus Scopus is a large multidisciplinary database covering published material in the humanities and sciences. It also provides citation analysis of authors and subject areas. Searching Scopus tutorial - Includes access to Scival via expanded top menu (Elsevier personal registration required).
  • Web of Science - Core Collection The leading citation index of scholarly literature, chemical reactions and author information. Includes citation databases: Sciences Expanded (1965+), Social Sciences (1965+), Arts & Humanities (1975+), Conference Proceedings (1990+), Emerging Sources Citation (2005+), Current Chemical Reactions (1985+) and Index Chemicus (1993+). Access InCites benchmarking & analytics tools via the menu bar at the top of the screen.

2. Focus your search with specific databases

Select two or three discipline/specialist databases to conduct your search for comprehensive results.

Our subject guides will help you find databases relevant to major subject areas in each discipline and specific materials relevant to your research.

  • Discipline subject guides
  • News sources

3. Find books, theses and more

If you're looking for a specific medium (book, thesis, journal, etc.) for your research, try the following:

Australian content

  • Finding Theses Help finding theses at UOW, Australia and around the world and how to access them
  • Last Updated: May 28, 2024 9:42 AM
  • URL: https://uow.libguides.com/literaturereview

Systematic Reviews and Meta Analysis

Data Sources

Databases you will probably search.

No one database can cover the literature for any topic. For medical topics, a combination of PubMed (or another search of PubMed data) plus Embase, Web of Science, and Google Scholar has been shown to provide adequate recall (Syst Rev. 2017;6(1):245). For topics that reach beyond biomedicine, other databases need to be considered.

  • PubMed PubMed is both the search platform provided by the National Center for Biotechnology Information and the database. PubMed includes MEDLINE (records indexed with MeSH terms) but also material in process, older records from before the inception of MEDLINE, and material from journals not included in MEDLINE. The PubMed database is available on independent platforms including Ovid SP, Web of Science, and several others.
  • Embase Note: Embase requires users to either create an individual account (free) or log in with an institutional email address to enable the export of records. Before you start a session, "log in" at the upper right. You can either create an account or use your Harvard email (recommended). Embase includes material from second-tier European and Asian journals not included in MEDLINE, as well as conference abstracts. The Emtree controlled vocabulary is well developed. Embase records include more Emtree terms than MEDLINE records do MeSH terms. Hence, result sets can often be significantly larger in Embase, especially for drug-related searches.
  • Cochrane Central Register of Controlled Trials Cochrane Central contains trials from both MEDLINE and Embase plus many trials from other, non-indexed sources; limited to randomized and non-randomized controlled trials. MeSH for MEDLINE records, but no other controlled vocabulary. To limit to results in Central, click the "Trials" limit to the left of your results.
  • Web of Science Core Collection (includes the Science Citation Index) Broad coverage of all sciences. Will cover some journals at the edge of the biomedical sciences missed by PubMed and Embase. Some meeting information. No controlled vocabulary. Alternatively, the Elsevier database Scopus can be used. Harvard does not license access to Scopus.
  • Google Scholar Consider as a supplement to the literature databases. It can improve sensitivity because it searches the full text of articles. Screening the first 200-400 records in a search is recommended.
  • ClinicalTrials.gov Registers trials that are recruiting, completed, or terminated. Some records include results. Searching here helps identify unpublished trials. See below for other registries.
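Recall here means the share of the relevant records that a search actually retrieves. As a rough sketch of why combining databases raises recall, the example below computes it for each database alone and for their union. All record IDs are made up for illustration and do not come from the cited study; the `recall` helper is likewise a hypothetical name.

```python
def recall(retrieved, relevant):
    """Fraction of the relevant records that a search retrieved."""
    if not relevant:
        raise ValueError("relevant set must not be empty")
    return len(retrieved & relevant) / len(relevant)


# Hypothetical IDs of the records that belong in the review.
relevant = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}

# Hypothetical IDs retrieved by the same search in each database
# (IDs above 100 stand for irrelevant hits).
found_by = {
    "PubMed": {1, 2, 3, 4, 5, 6, 101},
    "Embase": {2, 3, 4, 5, 6, 7, 8, 102},
    "Web of Science": {1, 5, 8, 9},
    "Google Scholar": {3, 9, 103},
}

single = {name: recall(ids, relevant) for name, ids in found_by.items()}
combined = recall(set().union(*found_by.values()), relevant)

for name, value in single.items():
    print(f"{name}: {value:.0%}")
print(f"All four combined: {combined:.0%}")
```

In this toy data no single database finds more than 70% of the relevant records, while the union finds 90%; the one record still missing is the kind that grey literature and hand searching aim to catch.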

These databases can be an effective complement to your search. They can be essential in their specialized topic areas.

  • BIOSIS Previews Although it is primarily useful for biologists, it contains a lot of meetings and some medical journals. Controlled vocabulary is not suitable for medical searching.
  • CINAHL Nursing and other health related information; excellent source for issues in patient care.  Well developed controlled vocabulary.
  • PsycINFO Cognitive and behavioral therapies are well covered.  Controlled vocabulary.
  • Google Scholar Add as an additional source. Here are some search tips.
  • WHO Global Index Medicus Search all WHO regional indexes, including the South-East Asia and Western Pacific regional databases.
  • Sociological Abstracts The primary index for sociological literature. May be useful for community-related studies or interpersonal issues. Controlled vocabulary.
  • 3ie Impact Evaluation Repository Investigating an economic or social intervention? The 3ie Impact Evaluation Repository is a curated database for evidence of what works in international development in low- and middle-income countries.
  • EconLit Economics. Almost any social intervention and many medical ones get studied by economists.
  • RePEc IDEAS A repository of economics literature. It includes bibliographic metadata from many archives.

Resources for Meetings and Other Grey Literature

Truly unbiased searches look for unpublished literature in a number of places, including meeting abstracts, white papers, and clinical trial registries, and by searching by hand.

  • GreyNet GreyNet is an organization dedicated to promoting and facilitating the use of grey literature. Includes a listing of grey literature resources, GreySource. OpenGrey, a former multidisciplinary database of technical reports, meetings, dissertations, and official publications, is now archived in GreyNet.
  • Grey Literature Report A bi-monthly publication of the New York Academy of Medicine, the GLR includes listings of recently published reports in health science and public health. The archives are tagged with MeSH terms and are searchable.
  • BIOSIS Previews Meetings! BIOSIS Previews includes proceedings of many meetings that may not be electronically available elsewhere.
  • ProQuest Dissertations & Theses Global A central authoritative source for locating doctoral dissertations and master's theses. Provides full text for most indexed dissertations from 1990-present. Includes theses and dissertations from the Harvard T.H. Chan School of Public Health, Harvard Medical School, and Harvard School of Dental Medicine.
  • greylitsearcher A web-based tool for performing systematic and transparent searches of organizational websites

Identifying sources for grey literature and being sure you've done enough is a challenge. The Canadian Agency for Drugs and Technologies in Health (CADTH) feels your pain and has produced a checklist that might help guide your grey literature search. The Grey Matters checklist provides an organized source of health technology assessment sites, regulatory agencies, trial registries, and other databases in a form that can help ensure the completeness of your search.

Clinical Trial Registries

  • ClinicalTrials.gov
  • European Union Clinical Trials Registry
  • ISRCTN registry
  • International Clinical Trials Registry Platform  (ICTRP)

When you search Cochrane Library/Trials, you will see results from both ClinicalTrials.gov and ICTRP.

More information about trial registries and solving the problems associated with searching them is available through this site: Medical and health-related trials registers and research registers, which is maintained by Julie Glanville and Carol Lefebvre and hosted by the York Health Economics Consortium.

  • Last Updated: Sep 4, 2024 4:04 PM
  • URL: https://guides.library.harvard.edu/meta-analysis
         



Integrity of Databases for Literature Searches in Nursing

The quality of literature used as the foundation to any research or scholarly project is critical. The purpose of this study was to analyze the extent to which predatory nursing journals were included in credible databases, MEDLINE, Cumulative Index to Nursing and Allied Health Literature (CINAHL), and Scopus, commonly used by nurse scholars when searching for information. Findings indicated that no predatory nursing journals were currently indexed in MEDLINE or CINAHL, and only one journal was in Scopus. Citations to articles published in predatory nursing journals are not likely found in a search using these curated databases but rather through Google or Google Scholar search engines.

Research, evidence-based practice, quality improvement studies, and other scholarly projects typically begin with a literature review. In research, the review of the literature describes existing knowledge about the topic, reveals gaps and further research questions to be answered, and provides a rationale for engaging in a new study. In evidence-based practice, the literature review provides evidence to answer clinical questions and make informed decisions. Quality improvement studies also begin with a search of the literature to gather available knowledge about a problem and explore interventions used in other settings. The appearance of journals that are published by predatory publishers has introduced the danger that reviews of the literature include inadequate, poorly designed, and low-quality information being used as “evidence”—raising the possibility of risky and harmful practice. Researchers and authors should be confident in the literature they cite; readers should have assurance that the literature review is based on sound, authoritative sources. When predatory journals are cited, that trust is eroded. No matter what type of study or project is being done, the quality of literature is critical for the development of nursing knowledge and for providing up-to-date information, concepts, theories, and approaches to care. 1

An effective literature review requires searching various reliable and credible databases such as MEDLINE (through PubMed or Ovid) and the Cumulative Index to Nursing and Allied Health Literature (CINAHL), among others that are relevant to the topic. The ease of searching using a web browser (now commonly referred to as “googling”) has increased the risk of finding sources published in predatory and low-quality journals that have not met the standards of research and scholarship that can be trusted as credible and reliable evidence.

The purpose of this article is to present an analysis of the extent to which predatory nursing journals are included in MEDLINE, CINAHL, and Scopus databases, used by nurse researchers and other nurses when searching for information, and in the Directory of Open Access Journals. This directory indexes “high-quality, open access, peer-reviewed journals” and should not include any predatory journals. 2

Statement of Significance

What is known or assumed to be true about this topic?

The quality of nursing literature used is vital for the development of research studies, application of evidence in clinical settings, and other scholarly projects. Nurse scholars need to be confident as they search the literature that they are accessing sound information sources and not articles from predatory nursing journals, which do not adhere to quality and ethical publishing standards. Citations of articles in predatory nursing journals may be found when searching Google and Google Scholar, making these citations easy to access but potentially resulting in the integration of poor quality research into the nursing literature. On the other hand, searches through credible databases—MEDLINE, CINAHL, and Scopus—are less likely to yield citations from predatory publications.

What this article adds:

This study helps validate the trustworthiness of these databases for conducting searches in nursing.

PREDATORY JOURNALS

Many studies have documented the problem of predatory journals. These journals do not adhere to quality and ethical publishing standards, often use deceptive language in emails to encourage authors to submit their manuscripts to them, are open access but may not be transparent with the article processing charge, may have quick but questionable peer review, and may publish inaccurate information on their Web sites such as impact factor and indexing. 3 – 6 Predatory publishing is an issue in many fields including nursing. In a recent study, 127 predatory journals were identified in nursing. 7

Citations acknowledge the ideas of others and give credit to the authors of the original work. When articles are cited in a subsequent publication, those citations disseminate the information beyond the original source, and the article in which it is cited might in turn be referenced again, transferring knowledge from one source to yet another. When articles in predatory journals are cited, the same process occurs. Those citations transfer knowledge from the predatory publication beyond that source. Studies have found that authors are citing articles published in predatory journals in nursing as well as other fields. 7 – 10 Nurse scholars need to be confident as they search the literature that they are accessing sound information sources and not articles from predatory journals.

NATIONAL LIBRARY OF MEDICINE INFORMATION RESOURCES

The National Library of Medicine (NLM) supports researchers and clinicians through its multiple health information resources including PubMed, MEDLINE, and PubMed Central (PMC). PubMed serves as the search engine to access the MEDLINE database, PMC, and books, chapters, and other documents that are indexed by the NLM. PubMed is free and publicly available: by using PubMed, researchers can search more than 30 million citations to the biomedical literature. 11 The majority of records in PubMed are from MEDLINE, which has citations from more than 5200 scholarly journals. For inclusion in MEDLINE, journals are assessed for their quality by the Literature Selection Technical Review Committee. 12 Five areas are included in this assessment: scope of the journal (ie, in a biomedical subject); quality of the content (validity, importance of the content, originality, and contribution of the journal to the coverage of the field); editorial standards and practices; production quality (eg, layout and graphics); and audience (content addresses health care professionals).

PMC includes journal citations and full-text articles that are selected by the NLM for digital archiving. To be included in PMC, journals are evaluated for their scope and scientific, editorial, and technical quality. 13 Journals considered for inclusion are evaluated by independent individuals both inside and outside PMC. 14 PMC serves as the repository for articles to meet the compliance requirements of the National Institutes of Health (NIH) and other funding agencies for public access to funded research. About 12% of the articles in PMC are deposited by individual authors to be in compliance with funders and 64% by publishers, scholarly societies, and other groups. 15 Beginning in June 2020, as a pilot program, preprints reporting research funded by the NIH also can be deposited in PMC. 16

CINAHL AND SCOPUS

The journal assessment and indexing processes for CINAHL and Scopus are similar to those used by the NLM. However, as private corporations, EBSCO (CINAHL) and Elsevier (Scopus) are not required to make journal selection processes publicly available or explicit. CINAHL has an advisory board for journal selection. A CINAHL representative provided the following criteria for indexing of journals in CINAHL: high impact factor; usage in reputable subject indexes (eg, the NLM catalog); peer-reviewed journals covered by other databases (eg, Web of Science and Scopus); top-ranked journals by industry studies; and article quality (avoiding low-quality journals) (personal communication, October 19, 2020).

Elsevier's Scopus provides a webpage referring to the journal selection and assessment processes. Journals being considered for indexing in Scopus are evaluated by the Content Selection and Advisory Board and must meet the following criteria: peer-reviewed with a publicly available description of the peer review process; published on a regular basis; has a registered International Standard Serial Number (ISSN); includes references in Roman (Latin) script; has English language titles and abstracts; and has publicly available publication ethics and publication malpractice statements. 17

LITERATURE REVIEW

Studies have shown that in health care fields, researchers, clinicians, faculty, and students regularly search MEDLINE for their research and other scholarly and clinical information. 18 – 21 De Groote et al 18 found that 81% of health science faculty used MEDLINE to locate articles for their research. MEDLINE was used by the majority of faculty in each individual health care field including nursing (75%) and medicine (87.5%) for searching the literature and finding articles. In another study of 15 different resources, medical faculty and residents reported that PubMed was used most frequently for searching the databases of the NLM, primarily MEDLINE. 20 Few studies have focused on the search practices of nurses. In a review of the literature, Alving et al 22 found that hospital nurses primarily searched Google for information on evidence-based nursing. They used Google more than bibliographic databases.

The quality of content that is retrieved when using PubMed as a search engine is important considering its widespread use for accessing scholarly and clinical information in nursing and other fields. Manca et al 23 reported that articles published in predatory journals were being retrieved when conducting searches using PubMed and were a concern for researchers. Based on their studies of predatory journals in neurology 24 and rehabilitation, 25 they concluded that predatory journals “leaked into PubMed” through PMC because of less stringent criteria for inclusion of journals. 23 Citations to articles from predatory journals then could be found using the PubMed search engine. However, in a letter to the editor, Topper et al 26 from the NLM clarified that individual articles published in predatory journals might be deposited in PMC to meet the requirements of research funding and be searchable in PubMed. Topper and colleagues make a clear distinction between journals indexed in MEDLINE or PMC and citations of individual articles that were deposited in PMC to meet funder requirements.

The aim of this study was to determine whether predatory nursing journals were included in databases used by nurse researchers and other nurses when searching for information. These databases were MEDLINE (searched via PubMed), CINAHL (EBSCO), and Scopus (Elsevier); the Directory of Open Access Journals was also checked.

In an earlier study, 127 predatory nursing journals were identified and assessed for characteristics of predatory publications. That dataset was used for the current study. For each predatory nursing journal, information was retrieved from the NLM Catalog, Ulrichsweb, and journal and publisher Web sites. Ulrichsweb 27 provides bibliographic and publisher information on academic and scholarly journals, open access journals, peer-reviewed titles, magazines, newspapers, and other publications. Journal titles of the predatory journals were often similar to nonpredatory journals and could be easily mistaken. To ensure accuracy, the information for each journal was checked for consistency between these sources using the ISSN, exact journal title, and publisher name. The purpose of an ISSN is to identify a publication and distinguish it from other publications with similar names. An ISSN is mandatory for all publications in many countries and having one assigned is considered a journal best practice. 28 For each predatory journal, the following data were collected if available: complete journal title; abbreviated journal title; acronym; ISSN (electronic and/or print); DOI prefix; publisher name and Web site URL; NLM index status; number of predatory journal articles cited in MEDLINE and PMC (when searching using PubMed), in CINAHL, and in Scopus; if the journal was indexed in the Directory of Open Access Journals; status in Ulrichsweb; and Google Scholar profile URL.
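One way to sanity-check ISSNs before cross-matching sources is to verify the built-in check digit. The sketch below is not part of the study's method; it only illustrates the standard ISSN checksum (weights 8 down to 2 on the first seven digits, modulo 11, with X standing for 10). The function name and the sample strings are illustrative assumptions, and the samples are checksum-valid strings rather than journals from the dataset.

```python
import re

# Accepts "1234-5678" or "12345678"; the final character may be a digit or X.
ISSN_PATTERN = re.compile(r"^(\d{4})-?(\d{3})([\dXx])$")


def is_valid_issn(issn: str) -> bool:
    """Return True if the string is a well-formed ISSN with a correct check digit."""
    match = ISSN_PATTERN.match(issn.strip())
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    check_char = match.group(3).upper()
    # Weights 8, 7, ..., 2 are applied to the first seven digits.
    total = sum(int(d) * w for d, w in zip(digits, range(8, 1, -1)))
    remainder = total % 11
    expected = 0 if remainder == 0 else 11 - remainder
    expected_char = "X" if expected == 10 else str(expected)
    return check_char == expected_char


for sample in ["0378-5955", "1050-124X", "0378-5954"]:
    print(sample, is_valid_issn(sample))
```

A failed check usually points to a transcription error, which is worth resolving before an ISSN is used as a search key.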

Counts of articles cited were checked individually by journal title, publisher, and/or ISSN. Once ISSNs (both electronic and print where available) were assembled, a search algorithm was created, which included all retrieved journal ISSNs. MEDLINE was searched via PubMed using a combination of NLM journal title abbreviations and ISSNs. CINAHL, Scopus, and the Directory of Open Access Journals were searched using a combination of ISSN, journal title abbreviation, full title, and publisher. Results were visually inspected for accuracy and alignment with dataset fields.
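The article does not publish its search algorithm, so the following is only a minimal sketch of how a list of ISSNs and title abbreviations might be combined into a single OR-joined search string. It assumes PubMed's journal field tag `[ta]`, which accepts a journal title, title abbreviation, or ISSN; the function name and every identifier in the example are hypothetical placeholders.

```python
def build_journal_query(issns, title_abbreviations, field_tag="ta"):
    """Combine ISSNs and title abbreviations into one OR-joined search string."""
    terms = []
    for value in list(issns) + list(title_abbreviations):
        cleaned = value.strip()
        if cleaned:
            # Quote each value so multi-word abbreviations are searched as a phrase.
            terms.append(f'"{cleaned}"[{field_tag}]')
    return " OR ".join(terms)


# Placeholder identifiers for illustration only; they do not refer to journals in the study.
example_issns = ["0000-0000", "1111-111X"]
example_abbreviations = ["J Hypothet Nurs"]
query = build_journal_query(example_issns, example_abbreviations)
print(query)
```

Other databases use different field codes, so the `field_tag` argument would need to be adapted (or the string rebuilt) for CINAHL or Scopus.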

Data analysis

Data were collected between January and April 2020. Data were entered into an Excel spreadsheet and organized by predatory journal name; abbreviated journal title; acronym; ISSN (electronic, print); DOI prefix; Web site URL; entry in NLM Catalog (yes/no); index status; number of articles cited in PubMed, CINAHL, and Scopus; Directory of Open Access Journals (included/not included); Ulrichsweb status (active/ceased); publisher; and Google Scholar profile URL. Frequencies and medians are reported.

Of the 127 predatory nursing journals in the dataset, only 102 had ISSNs to use for the search. Eighteen of the journals had records in the NLM Catalog, but only 2 of those had ever been indexed in MEDLINE, and neither is currently indexed. These 2 journals had been published earlier by a reputable publisher but then were sold to one of the large predatory publishers. The NLM Catalog record for these journals indicates that citations of articles from them appeared in MEDLINE through 2014 for one of the journals and through 2018 for the other, but following their transition to the new publisher they are no longer included. Consistent with the MEDLINE results, these same 2 journals had been indexed in Scopus as well. Citations of articles from one of these journals were added to Scopus up to 2014, with no articles cited thereafter. Articles from the second journal continue to be added through 2020. One additional journal from the predatory journal dataset is currently in Scopus, but only through 2014. None of the predatory nursing journals were indexed in CINAHL based on full journal title, title abbreviation, ISSN, or publisher. Two journals in the dataset were found in the Directory of Open Access Journals.

When searching PubMed, we found citations of articles from 16 predatory nursing journals. The number of citations ranged from 1 to 372 citations (from one of the journals indexed earlier in MEDLINE but sold to a predatory publisher). The second highest number of citations (n = 168) was of articles from a predatory nursing journal that had been depositing articles in PMC (and thus were retrievable when searching PubMed) but is no longer adding new material to PMC. The other citations were of articles deposited in PMC to meet requirements of NIH and other research funding. The predatory journals in which these articles were published, however, are not indexed in MEDLINE or PMC.

There were no articles from predatory nursing journals cited in CINAHL. Scopus has citations from the 2 predatory nursing journals that are no longer indexed there: 616 that were published in one of the journals and 120 from the other. Articles from a third predatory nursing journal in the study dataset, which is currently indexed in Scopus, totaled 173 (see Table).

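Table: Predatory Nursing Journals, Number of Citations

| Predatory nursing journal (a) | PubMed (b) | Scopus | CINAHL |
|---|---|---|---|
| A | 372 | 616 | 0 |
| B | 168 | 173 | 0 |
| C | 12 | 0 | 0 |
| D | 7 | 0 | 0 |
| E | 5 | 120 | 0 |
| F | 3 | 0 | 0 |
| G | 3 | 0 | 0 |

Abbreviation: CINAHL, Cumulative Index to Nursing and Allied Health Literature.

(a) Predatory nursing journals with 3 or more citations to articles.

(b) Search using PubMed.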

This analysis documented that none of the predatory nursing journals in the study dataset were currently indexed in MEDLINE or CINAHL, and only one journal is still in Scopus. Most of the citations of articles from predatory journals found in a search of these databases are from earlier years before the journals were sold to one of the large predatory publishers. Other citations are to articles deposited in PMC in compliance with research funder requirements.

By using PubMed as a search engine and entry point to the databases of the NLM, researchers can search millions of records included in MEDLINE, or in process for inclusion, and articles from PMC deposited by publishers or authors for compliance with funders. Six million records, and about 5500 journals, can be searched in CINAHL Complete, 29 and Scopus, the largest of the proprietary databases, provides access to 24000 journals and 60 million records. 30 Results from this study show that very few articles published in predatory nursing journals find their way into a search done using PubMed and Scopus and none into CINAHL.

In a prior study, 814 citations of articles in predatory nursing journals were found in articles published in nonpredatory nursing journals. 7 Based on this current study, the conclusion can be made that these citations are not coming from searches in MEDLINE/PubMed, CINAHL, or Scopus and are likely from searches done using Google or Google Scholar as the search engine. The databases examined in this study are curated by organizations with a vested interest in maintaining and improving the quality of the research literature in those databases.

Searching multiple databases using different search engines can be frustrating and time consuming. There is overlap among MEDLINE, CINAHL, and Scopus. However, these are curated databases and, as this study found, are unlikely to return many, if any, predatory citations as part of the search results. Still, it falls on the searcher to eliminate duplicates and redundant citations. Further, certain types of literature, such as theses, dissertations, and fugitive (or “gray”) literature, 31 are unlikely to be found in any of these databases, even though those citations may be important or relevant sources. Given this, it is easy to understand the intuitive appeal of Google Scholar, which provides “one stop shopping”: “From one place, you can search across many disciplines and sources: articles, theses, books, abstracts and court opinions, from academic publishers, professional societies, online repositories, universities and other web sites. Google Scholar helps you find relevant work across the world of scholarly research.” 32 Google and Google Scholar were founded with a mission to become the most comprehensive search engines in the world. While this allows someone to scour the World Wide Web and Internet for some of the most obscure facts available, at the same time, little is done to verify or validate the results that are returned. Thus, it falls on the searcher to be diligent and evaluate the results of a Google or Google Scholar search, which will include citations of articles in predatory journals. This is easily confirmed by the fact that many predatory journal Web sites promote the Google Scholar logo as a sign of indexing or a badge of legitimacy.
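Eliminating duplicates across overlapping databases can be partly automated. The sketch below is one possible approach, not a procedure described in the article: it treats two exported records as duplicates when they share a DOI or, failing that, a normalized title and year. The record fields (`title`, `doi`, `year`, `source`) and the sample records are assumptions made for illustration.

```python
import re


def normalize_title(title):
    """Lowercase a title and collapse punctuation and whitespace."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def record_key(record):
    """Prefer the DOI as the identity of a record; fall back to title plus year."""
    doi = (record.get("doi") or "").strip().lower()
    if doi:
        return ("doi", doi)
    return ("title", normalize_title(record.get("title", "")), record.get("year"))


def deduplicate(records):
    """Keep the first occurrence of each record, preserving input order."""
    seen = set()
    unique = []
    for record in records:
        key = record_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Hypothetical export rows from three databases.
records = [
    {"title": "Predatory Journals in Nursing", "doi": "10.1000/example.1", "year": 2020, "source": "PubMed"},
    {"title": "Predatory journals in nursing.", "doi": "10.1000/EXAMPLE.1", "year": 2020, "source": "Scopus"},
    {"title": "A Second Hypothetical Study", "doi": "", "year": 2019, "source": "CINAHL"},
    {"title": "A second hypothetical study", "doi": None, "year": 2019, "source": "Scopus"},
]
unique_records = deduplicate(records)
print([r["source"] for r in unique_records])
```

A record that carries a DOI in one export and lacks it in another will not be merged by this simple key, so a manual pass over the remaining list is still advisable. Reference managers offer comparable duplicate detection.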

Another vexing issue that was revealed in this study is that of reputable journals that have been bought by predatory publishers. This study found 2 journals in this category. Brown 33 reported on 16 medical specialty journals that were purchased from 2 Canadian commercial publishers by a predatory publisher. In all these cases, it is the same predatory publisher, although some of the purchases were made under a different business imprint, adding further confusion to an already muddied situation. Jeffrey Beall, who coined the term “predatory publisher” and maintained the blog “Scholarly Open Access” for almost a decade, was quoted by Brown 33 : “[The company] is not only buying journals, it is buying metrics and indexing, such as the journals' impact factors and listing in Scopus and PubMed, in order to look legitimate.” One positive finding from this study was that the 2 purchased journals that were identified were quickly de-accessioned by the NLM and are no longer indexed in MEDLINE, although citations from their pre-predatory era remain intact.

Recommendations

All of this presents a confusing picture, but it is possible to make some specific recommendations to aid researchers, clinicians, faculty, and students in their literature searches. First, become familiar with the journals and publications in your field. This is a basic foundation of scholarship. As you read articles, remember where they were published, learn journal titles, and focus on sources as well as the content. As you come across predatory journals in nursing and health care, make note of them and learn their titles too. Remember that many predatory journals adopt names that are intended to be confusing and may differ from a legitimate journal by only one letter, such as “Africa” and “African.”
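Look-alike titles can also be flagged mechanically. The following sketch, which is not drawn from the study, uses the `difflib` module in Python's standard library to compare an unfamiliar title against a personal list of known journals; the titles, the function names, and the 0.9 threshold are illustrative assumptions.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0-to-1 similarity ratio between two titles, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_lookalikes(candidate, known_titles, threshold=0.9):
    """List known titles that are very similar to, but not the same as, the candidate."""
    return [
        title
        for title in known_titles
        if title.lower() != candidate.lower()
        and similarity(candidate, title) >= threshold
    ]


# Hypothetical titles; they differ by one letter, as in the "Africa"/"African" case.
known = ["African Journal of Example Nursing", "Journal of Hypothetical Cardiology"]
print(find_lookalikes("Africa Journal of Example Nursing", known))
```

A match is only a prompt to look more closely; it says nothing about which of the two titles is the legitimate one.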

Second, consider carefully how to approach your search from the outset. If you choose to start with MEDLINE (searched via PubMed), CINAHL, or Scopus, then you can have some assurance that the results will not return citations from predatory journals—although you should still verify every citation that you receive. On the other hand, Google and Google Scholar can be a “quick and easy” way to get started but will require that you carefully review and evaluate the results. If you need to venture to other more specialized databases, such as PsycInfo or ERIC (Education Resources Information Center), it is important to carefully inspect the results that you receive. To reduce the risk of including a predatory journal article in research, nursing scholars should use reputable bibliographic databases, which have clear criteria for journal indexing, for their searches.

Third, when you come across a journal title that is not familiar, take time to research it, visit the journal Web site and evaluate the information at the Web site, and determine whether it is a credible source to include in your results. If something seems irregular, then it is worth your time to do more investigating—either on your own or by enlisting the help of a knowledgeable colleague or librarian. Journals change publishers all the time, and while most of these business transfers are benign and probably will not impact you as an end consumer of the literature, that is not always the case. Likewise, the major publishers in the world today are large, multinational conglomerates that regularly spin off or purchase other companies. While this probably will not impact you on a day-to-day basis, it is important to investigate any irregularities when conducting a search of the literature.

Last, because these issues are complex and multifaceted, it is always wise to consult with a librarian who can assist you in every step of the search process. Their knowledge and expertise in information literacy, data sources, and searching techniques can help to ensure that you find the information you need from sources that are reliable and credible.

Researchers, clinicians, faculty, and students need to be careful not to include citations from predatory sources in their literature searches and articles. Predatory journals publish low-quality studies and citing this work erodes the scholarly literature in nursing. The findings of this study offer some reassurance to those who search the professional nursing literature: if you begin a search in a database such as MEDLINE, CINAHL, or Scopus, then the results will probably not include citations to predatory publications. Google and Google Scholar searches, however, may very well include predatory citations, and in that case, it is the searcher's responsibility to carefully evaluate the output and discard findings from nonlegitimate sources. Enlisting the help of a librarian is always beneficial and highly recommended.

Peggy L. Chinn, PhD, RN, FAAN, Editor, Advances in Nursing Science , is a member of our research team and contributed to the study and preparation of the manuscript.

The authors have disclosed that they have no significant relationships with, or financial interest in, any commercial companies pertaining to this article.


How to do a Literature Search: Choosing a database


Library databases

The Library subscribes to hundreds of databases which index many different journals and sometimes other types of literature such as books, conference papers and patents. When you search a library database you are not normally searching the full text of a publication. They mostly contain publication titles, authors, reference information and an abstract summary. Most of this content is peer-reviewed, in other words checked by another expert in that subject area, so the information you find should be of high quality.

There is a complete list of Library databases, which you can filter by subject:


The best database to search depends on the question you are asking.  There are three things to consider:

1. Does the database cover my topic?

Some databases index the literature from all subject areas, others specialise in one particular subject area.  You will normally choose a database from your own subject area but this depends on what you are working on.  For example, a chemist may want to use an education database or an education student may want to use a chemistry database if they are working on a chemistry education topic.  If you are working in a multidisciplinary area such as environmental science you may want to choose a database that covers all subjects, such as Web of Science or Scopus.

2.  What type of literature do I need to find?

  • Books are most useful for definitions and background information.
  • Journal articles and conference papers are most useful for up to date, in depth research.
  • Statistics may be needed.
  • Depending on your topic, you could also consider searching for patents, standards, reports, theses.

Different databases cover different types of literature.

3. Which journals are covered by the database?

Some databases provide a list of the journal titles that they index.  Most databases allow you to do a search by journal name.  It might be worth checking that the database covers your favourite journal titles.  You should be able to find out how many journal titles the database indexes: the greater the number of titles, the less likely you are to miss something useful.

Finding journal articles


There are different types of journal articles. Some articles contain original data from research projects: these are referred to as primary literature. In a review, the author has selected the most important primary articles and given an overview of the key developments on a topic. These are referred to as secondary literature. Review articles can be very useful when you are investigating a topic that is new to you. Most library databases allow you to filter for different types of journal article, including review articles.

Library databases vs Google Scholar


When searching for a topic, Google Scholar results may appear to be more relevant than those from a library database. This is because it ranks results using an algorithm which includes the frequency with which your keywords appear. Library databases most often rank results in order of date, with the most recent first.

The problem with Google Scholar is that you don't know what it is indexing, so you don't know exactly what you are searching. With Library databases you can find out exactly which journals or other content they index. 

Google Scholar normally returns a large number of results and you cannot guarantee that the key papers will be at the top of the list.  Library databases give you much greater flexibility for searching within the results to identify key information.

Finding books

Although textbooks are unlikely to be of use for an in-depth research project, there are many 'research monographs' - more detailed books -  accessible in print and online through Library subscriptions.  They are an excellent source of information if you are getting started on a new project and need to find background information.  They can provide a useful overview of the research which has been carried out up until the time that the book was written.  Most Library databases do not index books, and even Google is not a very good source for this. Use the Library catalogue to search for books:

Finding statistics

 Although books and journal articles often contain statistics it is best to go directly to the source.  Who would have measured it?  Do you want UK, European or International statistics?  For UK statistics you might try the Office for National Statistics.  For International statistics you could try the OECD Library.  We have a web page listing sources of statistics:

Example Library databases for finding articles

Scopus

  • Last Updated: Nov 22, 2022 9:56 AM
  • URL: https://library.bath.ac.uk/literaturesearch


Doing the literature review: Selecting databases


You have to decide which databases you will use in your literature search. To limit location bias, you have to use more than one database. Make an informed choice! In this module we list some database features you can take into account.

There are different types of literature databases:

  • Discipline specific databases, such as PsycINFO, Philosopher's Index, Sociological Abstracts and Business Source Premier, offer extra search tools, for example a thesaurus. The journals indexed in these databases are selected carefully, based on selection criteria (some databases even select at the article level!). These databases may be available via different vendors/platforms.
  • Multidisciplinary databases, such as Scopus and Web of Science, index academic journals from all disciplines, ranging from astronomy to zoology. These will help you find relevant articles in journals outside your own discipline, but non-relevant results are hard to avoid. Both Scopus and Web of Science are citation databases, which means that they track citations: you can see if an article is cited in other papers in the database and by which authors. JSTOR also offers access to journals from different disciplines, but be aware that this database has an archive function: for most journals you can’t access or search the most recently published volumes.
  • Publishers' databases, such as ScienceDirect (Elsevier), SpringerLink and Wiley Online Library, are limited to publications from a particular publisher. Therefore they are not suitable for a literature search for a review. But of course, you will use these databases to retrieve the full text of articles.

The Checklist Selecting Databases provides an overview of features you have to take into account when choosing the databases to use for your literature review.

What about Google Scholar?

Google Scholar indexes websites with scholarly articles, including websites of academic publishers, university repositories and personal websites of researchers. A major difference between Google Scholar and A&I databases is that Google Scholar doesn’t provide information about the indexed websites or journals. It’s hard to check if a particular journal is indexed cover-to-cover in Google Scholar. Google Scholar gives no definition of ‘scholarly’. Amongst the scholarly results you might get results from predatory publishers and papers written by students.

When you use Google Scholar for a search for your literature review, be aware that it can be hard to perform a structured, repeatable search:

  • the advanced search options and filter options are limited
  • you can't combine two search queries afterwards
  • you can enter up to 256 characters in the search box of Google Scholar - if you want to include synonyms of search terms you often need more characters
  • Google Scholar limits the results of any search query to 1000 papers, irrespective of the number shown at the top of the search results
  • the information shown in the search results of Google Scholar can be very limited
  • the metadata of articles (for example the publication year) can be incorrect, due to parsing errors
  • specific journals or publishers might not be indexed by Google Scholar, due to technical reasons.
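The character limit in the list above is easy to hit once synonyms are added. As a rough sketch (not a feature of Google Scholar or of this guide), the helper below assembles a generic Boolean string from groups of synonyms and checks it against the 256-character figure quoted above. The function names and search terms are illustrative assumptions, and operator syntax differs between search tools.

```python
GOOGLE_SCHOLAR_MAX_CHARS = 256  # limit quoted in the list above


def build_boolean_query(concepts):
    """Join synonyms with OR inside each concept and join concepts with AND."""
    groups = []
    for synonyms in concepts:
        # Quote multi-word terms so they are searched as phrases.
        quoted = [f'"{term}"' if " " in term else term for term in synonyms]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)


def fits_search_box(query, limit=GOOGLE_SCHOLAR_MAX_CHARS):
    """Return True if the query is short enough for the search box."""
    return len(query) <= limit


concepts = [
    ["nurse", "nursing"],
    ["predatory journal", "predatory publisher"],
]
query = build_boolean_query(concepts)
print(query, len(query), fits_search_box(query))
```

If the check fails, the query can be split into several shorter searches, keeping in mind that Google Scholar cannot combine their result sets afterwards.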

Google Scholar is a great tool for locating articles you know the titles of. In the Scholar settings you can add a Library link to the Erasmus University Library (see the module Get the most out of Google Scholar) and then follow the FULL-TEXT @ EUR links to the published version of an article in the EUR Library collection.

TIP: Publish or Perish (also called PoP) is software used for citation analysis, based on Google Scholar citation data. The general citation search in Publish or Perish allows you to perform an Advanced Scholar Search query and analyse its results. The advantage is the presentation of the results: you can sort by author, year, times cited, publication and publisher. The abstract is not shown. It's possible to export the output, for example to Excel.


  • Overview of all EUR databases
  • Guides per discipline Overview of recommended sources per discipline.
  • Electronic resources: terms and conditions of use Terms and conditions you must comply with when using any of the electronic resources offered by the Library.


  • Last Updated: Aug 30, 2024 6:38 PM
  • URL: https://libguides.eur.nl/informationskillslitreview


The top list of academic research databases


Whether you are writing a thesis, dissertation, or research paper, surveying prior literature and research findings is a key task. More likely than not, you will be looking for trusted resources, most likely peer-reviewed research articles.

Academic research databases make it easy to locate the literature you are looking for. We have compiled the top list of trusted academic resources to help you get started with your research:

Scopus is one of the two big commercial, bibliographic databases that cover scholarly literature from almost any discipline. Besides searching for research articles, Scopus also provides academic journal rankings, author profiles, and an h-index calculator.

  • Coverage: 90.6 million core records
  • References: N/A
  • Discipline: Multidisciplinary
  • Access options: Limited free preview, full access by institutional subscription only
  • Provider: Elsevier


Web of Science, also known as Web of Knowledge, is the second big bibliographic database. Usually, academic institutions provide free access to either Web of Science or Scopus on their campus network.

  • Coverage: approx. 100 million items
  • References: 1.4 billion
  • Access options: institutional subscription only
  • Provider: Clarivate (formerly Thomson Reuters)


PubMed is the number one resource for anyone looking for literature in medicine or biological sciences. PubMed stores abstracts and bibliographic details of more than 30 million papers and provides full text links to the publisher sites or links to the free PDF on PubMed Central (PMC).

  • Coverage: approx. 35 million items
  • Discipline: Medicine and Biological Sciences
  • Access options: free
  • Provider: NIH


For education sciences, ERIC is the number one destination. ERIC stands for Education Resources Information Center, and is a database that specifically hosts education-related literature.

  • Coverage: approx. 1.6 million items
  • Discipline: Education
  • Provider: U.S. Department of Education


IEEE Xplore is the leading academic database in the field of engineering and computer science. Besides journal articles, you can also search for conference papers, standards, and books.

  • Coverage: approx. 6 million items
  • Discipline: Engineering
  • Provider: IEEE (Institute of Electrical and Electronics Engineers)


ScienceDirect is the gateway to the millions of academic articles published by Elsevier, 1.4 million of which are open access. Journals and books can be searched via a single interface.

  • Coverage: approx. 19.5 million items


The DOAJ is an open-access academic database that can be accessed and searched for free.

  • Coverage: over 8 million records
  • Provider: DOAJ


JSTOR is another great resource to find research papers. Any article published before 1924 in the United States is available for free and JSTOR also offers scholarships for independent researchers.

  • Coverage: more than 12 million items
  • Provider: ITHAKA


Start using a reference manager like Paperpile to save, organize, and cite your references. Paperpile integrates with PubMed and many popular databases, so you can save references and PDFs directly to your library using the Paperpile buttons.


Expertly curated abstract & citation database

About Scopus

Scopus is the largest abstract and citation database of peer-reviewed literature: scientific journals, books and conference proceedings. Delivering a comprehensive overview of the world's research output in the fields of science, technology, medicine, social sciences, and arts and humanities, Scopus features smart tools to track, analyse and visualise research.

As research becomes increasingly global, interdisciplinary and collaborative, you can make sure that critical research from around the world is not missed when you choose Scopus.

“Speed is very important … I can easily identify what I need to know, read it, digest it and move on to the next one.” James, Research Pathologist, Medical Device R&D, Scopus user
“Scopus is very customer-friendly… You get more information from all different fields. It saves a lot of time.” Chris, Head of R&D, Diagnostic Testing, Scopus user
“Scopus informs every phase of the editorial process. I would not want to do this job without it, and I intend to continue using it throughout my career.” William, Professor of Economics, University of Tennessee

More information (in English)

  • Why I would not want to be without Scopus; an editor’s story
  • NASI Scopus Young Scientists Awards 2019


Literature Review Guide: Search strategies and Databases


Searching online best strategies, Boolean and Concept building


AND, OR and NOT

You can use the search operators AND, OR and NOT to combine search terms. These are the most commonly known and used Boolean operators.

The operators AND and NOT limit the number of results from a search. The operator OR does the opposite; it increases the number of results.

  • Endangered AND birds: searches for sources that contain both words.
  • Endangered OR birds: searches for sources that contain the word 'endangered', the word 'birds', or both. This search will produce more results. (Tip: the operator OR can also be used to include different spellings, translations, or synonyms in the search.)
  • Endangered NOT birds: searches for sources that contain the word 'endangered' and excludes any source that also contains the word 'birds'.

To see how this works, take a look at  The Boolean machine . Move your cursor over the operators AND, OR and NOT to see how they determine your search.

You can also combine more than two search terms. Use brackets to indicate the priority. For example (Money OR inflation) AND banking.
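One way to picture how the operators behave is to treat each search term as the set of documents that contain it. The sketch below is only an illustration: the tiny index and its document numbers are invented, and real databases do far more than this.

```python
# Toy index: each search term maps to the set of document IDs containing it.
# The terms are taken from the examples above; the document IDs are invented.
index = {
    "endangered": {7, 8, 9},
    "birds": {8, 9, 10},
    "money": {1, 2, 5},
    "inflation": {2, 3},
    "banking": {2, 5, 6},
}

def op_and(a: set, b: set) -> set:
    """AND keeps only documents found in both sets (narrows the search)."""
    return a & b

def op_or(a: set, b: set) -> set:
    """OR keeps documents found in either set (broadens the search)."""
    return a | b

def op_not(a: set, b: set) -> set:
    """NOT keeps documents in the first set that are absent from the second."""
    return a - b

endangered_and_birds = op_and(index["endangered"], index["birds"])
endangered_or_birds = op_or(index["endangered"], index["birds"])
endangered_not_birds = op_not(index["endangered"], index["birds"])

# Brackets set the priority: the OR group is evaluated first, then AND.
money_query = op_and(op_or(index["money"], index["inflation"]), index["banking"])

print(endangered_and_birds, endangered_or_birds, endangered_not_birds, money_query)
```

Note how the OR result is the largest set and the AND and NOT results are smaller, which matches the description of which operators limit and which increase the number of results.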

Short videos on searching information:

Boolean search basics:

Why can't I just 'Google it'? From RMIT libraries

One perfect source?


Database search concepts and Boolean:

Example: I want to search for material for an essay titled "Discuss the effect of antioxidants on athletic performance".

I can do an 'Advanced Search'. This means there will be a number of concept search boxes available, with AND between them.

Remember: the more concepts we combine, the fewer and more specific the search results.

Within a concept box we can add different variants of similar words (synonyms) with OR between them to increase the search results.


  • Decide on your research question: The research question should be a topic that is searchable and neither too broad nor too specific. Do sample exploratory searches on library databases before finalizing your topic.
  • Decide how broad or narrow your scope will be: Having picked your topic, decide on the coverage and specificity of your research topic. Will you focus on all topic areas or just one? Can you narrow the topic? What years will you cover?
  • Decide where you will search: The library has a number of specialist databases; see the descriptions under each database.
  • Conduct your search: Look at the tabs on 'Advanced Searching' and on 'Keyword and database search tips'. Remember to source all the relevant keywords for your topic, and remember to keep track of your references; Mendeley or Zotero are useful tools for this.
  • Review the literature: You may need to critically evaluate the sources and references you use, analyse the research methodologies used and any bias or exclusions, and compare with other relevant studies.

Performing a Keyword search in library databases:

Sample research topic: The effect of supplements on athletic performance.

  • Step 1:   Identify the concepts in your research topic :  In this case 'Supplement' is one concept and 'Athletic performance' is another separate concept 
  • Step 2: Identify keywords for these concepts: In this case use the concepts themselves as keywords and also other synonyms (words that mean the same), so for example, (Supplement OR anti-oxidant OR vitamin) AND ("Athletic performance" OR "Sport performance")
  • Step 3: Search college databases using these keywords, with the connector 'AND' between the concepts.
  • Step 4: Look at relevant articles and at the keywords and subjects listed in these, to find more relevant keywords you can use: In this case examples of further relevant keywords might be "Nutrition support" or "Dietary supplement" or "Ergogenic aids" AND "Sports competition" or "Exercise test" or "Sports ability"
  • Step 5: Use 'Limiters' in the database, such as date, language, or subject limiters, to improve and refine your search.
  • Step 6: If necessary, use other search aids, such as truncation (*) or phrase searching (see the 'Search strategies and databases' tab, 'Search aids...' sub-tab, for more information).

Overview Note: If you are using a single search box, then each separate search concept needs to be in brackets, e.g. ("Sport* performance" OR "Exercise test") AND ("nutrition support" OR supplement OR Vitamin). It's OK to use Google Scholar, but you will get a lot more relevant full-text articles using the Discover search tool on the library main page, or by using specific databases such as SPORTDiscus from the library database web-page.
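The bracketing rule for a single search box can be sketched in a few lines of Python. This is only an illustration of the pattern (OR inside each concept, AND between concepts); the function names `quote_if_phrase` and `build_query` are made up for this example and are not part of any database's tools.

```python
def quote_if_phrase(term: str) -> str:
    """Wrap multi-word terms in inverted commas so they are searched as a phrase."""
    term = term.strip()
    if " " in term and not term.startswith('"'):
        return f'"{term}"'
    return term

def build_query(concepts: list[list[str]]) -> str:
    """Join synonyms with OR inside brackets, then join the concepts with AND."""
    groups = []
    for synonyms in concepts:
        joined = " OR ".join(quote_if_phrase(term) for term in synonyms)
        groups.append(f"({joined})")
    return " AND ".join(groups)

# Each inner list is one concept with its synonyms (hypothetical helper input).
concepts = [
    ["Sport* performance", "Exercise test"],
    ["nutrition support", "supplement", "Vitamin"],
]

query = build_query(concepts)
print(query)
```

The printed string can be pasted into a single search box; in an advanced search form, each inner list would instead go into its own concept box.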

Assignment search example: Discuss the effects of exercise on psychological stress. The main search concepts are 'exercise' and 'stress'.

Sources include Books / Academic Articles/ Statistics and websites

Books can be searched on the TUS Midlands Library OPAC catalogue : The search can be limited to  Ebooks or to physical books only using the tick boxes below the advanced search. 

The OPAC catalogue record ( example ) will show the call number of the shelving location of physical books; also books on similar topics can be browsed under the Subject entry on the catalogue record.

E-books are also available through the OPAC search: just click on the link 'Online access click to view' in your OPAC catalogue search results. This will redirect you to the specific e-book record in the Ebook Central database.

Ebook Central is also available directly on the TUS Midlands library website under the tab Collections - E books - Ebook Central.

Academic Articles

One can also check for academic articles on your research topic (as an example if doing an essay on the effect of physical activity on psychological stress, you might use keywords such as, 'exercise and stress' in a database search)

We can use the Discover search on the main library web-page to search for articles.

Note: You will need to log in to use this resource.

Just add your search concepts to the Discover search box (Exercise AND stress) and links to academic journal articles will be available in the search results.

The journal articles retrieved may be available directly through our TUS Midlands library databases or by links to free open access articles.

Refining a search result in Discover:

If a lot of results are returned, the search can be narrowed: there are 'Date' and subject limiters on the left.

One can also click on the 'Advanced search' option under the search box once a search is performed in Discover, and then 'Select a field' on the drop-down tab on the right of advanced search (for example 'Subject terms' or 'Abstract').

One can also change the search keywords, for example using search keywords such as Jogging or running instead of exercise.

Articles can be opened by following the links in Discover to find the full text if available.

Other sources:

Examples of other sources that can also be searched are: Statistics, Information Guides and websites.

See the Library website under the tabs Collections - Theses; the useful links there include Lenus, Ethos (online theses), Rian and Research@Thea.

If doing a Google search be careful to evaluate your source (see the Critical Evaluation tab on most of the Science Libguides )

Another good source for government sources and reports is Google advanced search as the Domain limiter there will allow any search be limited by national government (gov.ie, gov.uk etc) or education (edu) websites.
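The domain limiter on the advanced search form corresponds to Google's `site:` operator, which can also be typed directly into the search box. A minimal sketch, assuming you simply want to assemble such a query and a search link (the helper names are invented for illustration):

```python
from urllib.parse import urlencode

def domain_limited_query(keywords: str, domain: str) -> str:
    """Append Google's site: operator so results come only from the given domain."""
    return f"{keywords} site:{domain}"

def google_search_url(query: str) -> str:
    """Build a Google search link for the query (spaces and colons are URL-encoded)."""
    return "https://www.google.com/search?" + urlencode({"q": query})

query = domain_limited_query("exercise stress", "gov.ie")
url = google_search_url(query)
print(query)
print(url)
```

Swapping `gov.ie` for `gov.uk` or `edu` limits the same search to UK government or education websites.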

At any stage of your search, if you need help or support, you can contact the library staff by email: [email protected]

Advanced Searching on EBSCO from ISU libraries

Some more tips on advanced searching: TUS: Midlands

Advanced Searching skills and Search Concepts: using search concepts in a library database

Example topic: "The effect of supplements on sports performance"

Go to Library - Collections - Databases - open EBSCO Academic Search Complete

At this stage you can add some extra EBSCO databases to allow a cross search by clicking on ' Choose Databases ' at the top of the web-page:

For this topic we add CINAHL, SPORTDiscus and Medline (full descriptions of database content are on the library databases page).

Note : It is useful to do a personal sign in to any database.  A personal sign-on allows us to save relevant searches and also to have any new results of relevant searches automatically emailed by setting up a search alert.

To save a search or to set up an alert after the personal sign-in on EBSCO, just click on 'Search history' and then click on the relevant search to select it.

Example search: Supplements and Sport performance

Advanced search allows a separate search box for each concept, so synonyms (words that have similar meanings) can be added with OR between them in each search concept box. Examples of synonyms are Supplement or Antioxidant or "nutrition support"

In EBSCO one can also increase the results obtained by clicking on the 'apply related words' box.

Inverted commas indicate a search phrase (e.g. "tennis elbow"); this means that only that exact phrase will be retrieved in a search.

Other options to aid in a search:

Proximity Search:

A proximity search means that the words need to be within a certain number of words of each other to be retrieved.

e.g. Nutrition N4 support means these words have to be within 4 words of each other in the documents searched.
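To make the idea concrete, here is a rough Python sketch of a proximity check. It is an illustration only: it assumes "within n words" means the two word positions differ by at most n, in either order, and a real database may count the distance differently. The function name `within_n_words` is made up for this example.

```python
import re

def within_n_words(text: str, first: str, second: str, n: int) -> bool:
    """Return True if `first` and `second` occur within n word positions of each other.

    Assumption: distance is the difference in word positions, in either order.
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    first_positions = [i for i, word in enumerate(words) if word == first.lower()]
    second_positions = [i for i, word in enumerate(words) if word == second.lower()]
    return any(abs(i - j) <= n for i in first_positions for j in second_positions)

close = within_n_words("Nutrition and dietary support for athletes", "nutrition", "support", 4)
far = within_n_words(
    "Support services rarely cover advice about sports nutrition", "nutrition", "support", 4
)
print(close, far)
```

The first sentence would be retrieved by Nutrition N4 support even though the two words are not a phrase; the second would not, because the words are too far apart.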

Truncation Search:

This will search for all possible endings of a word. For example, Supplement* with a star (*) will search for supplement, supplements, supplementation, or any other word with the stem 'supplement'. Similarly, n* will search for every word that starts with the letter n.

So our full example search is as follows:

(Supplement OR antioxidants OR Nutrition N4 Support)

AND (sports OR athletics OR "physical activity" OR exercise)

AND (benefit OR performance OR impact)

N.B. Brackets indicate a separate search box in advanced search.

Result: At the time of searching, this gave 2,654 results.

Strategies to focus these results

In EBSCO under 'Page options' we can change the results per page to 50

These results are ranked by relevance so more relevant articles will be on top in the results page.

Search Fields: If there are a lot of results, it is often useful to rerun the search after changing the 'Select a field' option opposite the search boxes to Abstract, Subject, or even Title. It is also possible to limit your search to a particular journal.

Subject search:  A search can be limited by clicking on the specific subject suggestions (on the left of results in EBSCO)

Note: This strategy will increase search specificity but you may also miss out on relevant results.

Search Filters are on the left of the results screen:

Filter by Date: for example, limit results to 'last 5 years'.

Filter by Subject: It is also possible to 'search within a search' using the listed subject terms (on the left of the results page). This can make the results very focused, as it is a specific subject search within a search already performed.

NB : One can also filter before a search by using the particular database 'Limiters' under the search boxes (Document type, date, sex etc)

Other strategies include: Examination of keywords and subjects occurring in relevant articles and using these in subsequent searches. (An example for our search, might be a keyphrase, such as "dietary supplement").

Looking at the reference list of very relevant articles, and searching for some of those specific articles. If you find a relevant article in a reference list, you can simply enter the article title into Discover, or do a 'Publication search' on the main library web-page to check our holdings of that journal.

Other Resources:

  • Google advanced search with domain limiter for government reports, education reports, etc.
  • Google Scholar with 'Cited by' links
  • Library guides
  • Open Education Resources (OER)
  • Library Theses tab with additional resource links
  • Library contacts for specific queries

Using search aids:

Most databases allow you to use * as a truncation symbol; for example Nurs* will search for all words starting with Nurs, typically Nurse, Nurses, Nursing

Most databases allow the use of ? as a wildcard, for example Wom?n stands for Women or Woman

Most databases allow phrase searching with the use of inverted commas, e.g. "Nutrition support"

Many databases allow proximity searching, but the method differs between databases. For example, in EBSCO, Nutrition N3 Support means that the words need not form a phrase but need to be within 3 words of each other to be retrieved.
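The truncation (*) and wildcard (?) symbols above can be imitated with regular expressions, which may help show what they match. This is a sketch under stated assumptions: here * stands for any run of letters at that point and ? for exactly one character, whereas individual databases may define the symbols slightly differently. The helper names are invented for this example.

```python
import re

def pattern_to_regex(term: str) -> re.Pattern:
    """Translate a search term using * and ? into a whole-word regular expression."""
    parts = []
    for ch in term:
        if ch == "*":
            parts.append(r"\w*")   # truncation: zero or more word characters
        elif ch == "?":
            parts.append(r"\w")    # wildcard: exactly one word character (assumed)
        else:
            parts.append(re.escape(ch))
    return re.compile(r"\b" + "".join(parts) + r"\b", re.IGNORECASE)

def find_matches(term: str, text: str) -> list[str]:
    """List every word in `text` that the search term would match."""
    return pattern_to_regex(term).findall(text)

nurs_hits = find_matches("Nurs*", "Nurse, nurses and nursing staff")
woman_hits = find_matches("Wom?n", "One woman, two women, many womens")
print(nurs_hits)
print(woman_hits)
```

Here Nurs* picks up Nurse, nurses and nursing, while Wom?n picks up woman and women but not womens.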

Review Articles: When reviewing the literature it can be useful to look at review articles; some databases such as PubMed and PsycINFO allow one to use a search limiter for review articles; alternatively the search term 'Review' can be added as an extra search concept

Citation tracking: Also called citation analysis or cited reference searching, this is a way of measuring the relative importance or impact of an author, article, or publication by counting the number of times that author, article, or publication has been cited by other works. Databases such as PubMed, Scopus and Google Scholar have a 'Cited by' link.

Writing a Search Statement : You can use 'Concepts' and Boolean logic to create a Search Statement for your write up, for example a search strategy for the essay topic, ' Supplements are a valuable aid to athletic performance, Discuss? ',  might be as below

(Antioxidants OR supplements OR "nutrition support") AND ("sports performance" OR "athletic performance")

Truncation symbols and proximity operators in use for different databases


Click: Collections tab and then on Databases

Main Science databases include the EBSCO databases (CINAHL, Academic Search, SocINDEX, SPORTDiscus, Medline) and ProQuest databases (Health Research Premium, PsycArticles, PsycInfo), as well as Scopus, Wiley Online, Sage, Taylor & Francis, Oxford Journals Collection, JSTOR, Cambridge Journals Online and ScienceDirect, and free databases such as PubMed and Google Scholar.

OTHER FREE DATABASES

See 'Highwire' and 'BioMed Central' for examples of good open access databases.

For more free resources, click on the 'Collections' tab and then on the ‘ Theses ’ link for free resources such as RIAN, Ethos (theses) and LENUS

N.B. 'Discover' on the main library web-page is not a database but a search engine that searches across different databases; however, the searching principles are the same.

All TUS Athlone Science & Medicine databases:

  • Academic Search Complete (Ebsco)
  • Anatomy TV
  • BMJ Best Practice
  • Cambridge Journals Online
  • ClinicalKey Student Nursing (eBooks)
  • Clinical Skills (Interactive nursing skills based)
  • Clinical Sports Medicine Collection
  • CINAHL with Full Text (Ebsco)
  • The Cochrane Library (Freely available)
  • Dentistry and Oral Sciences Source
  • Health Research Premium
  • Irish Medicines Formulary
  • JOVE Video Journal, Science Education & Textbook
  • JSTOR
  • Kanopy (Film documentaries)
  • Medicines Complete
  • Medline
  • Oxford Journals Collection
  • PILOTS: Published International Literature On Traumatic Stress
  • PsycArticles
  • PsycInfo
  • PubMed (Freely available)
  • Research Professional
  • Sage Premier 2021 Collection
  • Sage Research Methods
  • ScienceDirect
  • Scopus
  • Social Sciences Premium Collection
  • SocINDEX with Full Text
  • SPORTDiscus
  • Wileys Journal Database

KEYWORD SEARCH: strategies and Hints

See: Information Literacy Tutorial at TUS Midwest

• Decide on keywords, which cover all or part of the meaning of your research.

• Initially select 2 or 3 words which together sum up the meaning of the essay (or part) and use these in separate search boxes

• When selecting keywords start simply; use two or three simple keywords or phrases; then you can adjust the search depending on the number of results. 

• In many databases (and Google)  you can put key phrases in inverted commas such as “student nurse” or “communication barriers”; this will ensure that the phrase, rather than the separate words are searched. 

• If you get a lot of results make the search more specific by adding another keyword or by changing the search parameters to TITLE or ABSTRACT.

• After the preliminary search, you should expand your keywords. Eventually, you may have 10 or more different keywords that you can try in combinations of 2 to 3 at a time in the different databases. Remember that different keywords will give optimum results in different databases.

• Look at the keywords or subject terms in relevant articles in your results and consider adding these to your search keyword list.

• In some databases certain SUBJECT searches will be suggested depending on your initial search which can be very helpful. Remember a SUBJECT is what the main topic of an article is, so all the results will be relevant. In EBSCO simply click on the subject thesaurus or Subject Major heading on the left of the main screen

• You may then decide to rerun the search with different keywords or Subject areas

• Also search Discover and  Google scholar for the article as these link to other free resources

• For medical / Biological articles also check PubMed and click on the free full-text limiter – this limiter is on the left of the results screen.

• To look at the full text for any article click on PDF full-text link – you can email articles to your student email from most databases or print or save using the print and save shortcuts. 

• You can save results and also set up email subject alerts by setting up a personal login on the library databases 

• If you find a very relevant article in any database do check the reference list at the end of the article; you can search for a journal title on the main library page to see if the library holds the journal referenced or if the library has a link to it. 

• NOTE: If the journal-title is in abbreviated format simply type the abbreviation into a google search with the word ‘journal’ beside it and the full title will appear

• PubMed has a 'Related citation' link on its searches to help you find similar articles

• Google Scholar has 'Cited by' and 'Cited in' functions, which can also be used to find similar material.

  • What is Plagiarism
  • Cite Them Right Harvard Guide
  • How to use Turnitin Go to Moodle, then click on 'Academic Writing Skills' and then 'Stop Plagiarism' and then the link to the video .'Are you sure you have referenced correctly? Watch this video before you upload your draft assignment to Turnitin'
  • Draft Turnitin Go to Moodle, then click on 'Academic Writing Skills' and then 'Stop Plagiarism' to find the Draft Turnitin link.


Zotero is a reference manager that will allow you to organise your references and to download book and article references into your Zotero library.

Zotero: Getting started

How to download Zotero

  • AIT Civil & Construction Engineering: Referencing

Managing and citing sources using Zotero

How To Insert Zotero Citations Into Microsoft Word

Duplication in Zotero

Zotero (WVU Libraries)

Using Zotero

Zotero Style Repository

  • Zotero: more information including advanced tips (Thanks to the Health Sciences Library, University of North Carolina, who have given permission for this link to be displayed)


Mendeley is reference management software founded in 2007 by PhD students and acquired by Elsevier in 2013.

Please note: The newer Mendeley download is called Mendeley Reference Manager. If you download this on a college laptop, then download the Word plugin through the Microsoft Store from this link. To complete the install for the Word plugin you will need to log in to Office 365; just follow the prompts and use your college sign-on.

How to use Mendeley Reference Manager (2022)

Download Mendeley Reference Manager and Mendeley Cite   Thomas Cooper Library: Tips & Tutorials: This web tutorial will teach viewers how to create a Mendeley account, download the Mendeley Reference Manager for Windows, and download Mendeley Cite to Microsoft Word.

Mendeley Reference Manager Libguide: American University of Beirut   This guide helps you step by step use Mendeley to manage your sources/references easily.

EndNote is another reference management software package, used to manage bibliographies and references when writing essays, reports and articles.


Endnote Web Citations and Bibliography   Mary Immaculate College Libraries

EndNote Web tutorials on YouTube

Log into EndNote Web

Removing duplicates from Endnote online

Endnote online (you tube video library)

Statistics relating to Ireland: 

  • Central Statistics Office  (CSO) website.
  • Open Data  data.gov.ie
  • Irish Education Statistics  on Dept. of Education & Skills website.
  • Central Bank of Ireland
  • National Economic & Social Council  (NESC)
  • Environmental Protection Agency (EPA)  
  • Economic & Social Research Institute

International statistics  

  • Eurostat  (European Commission)
  • OECD iLibrary  (OECD)
  • UN data  (UN)
  • World Development Indicators  (World Bank).
  • Last Updated: Sep 3, 2024 10:26 AM
  • URL: https://ait.libguides.com/literaturereview
  • Open access
  • Published: 06 December 2017

Optimal database combinations for literature searches in systematic reviews: a prospective exploratory study

  • Wichor M. Bramer 1 ,
  • Melissa L. Rethlefsen 2 ,
  • Jos Kleijnen 3 , 4 &
  • Oscar H. Franco 5  

Systematic Reviews volume 6, Article number: 245 (2017)


Within systematic reviews, when searching for relevant references, it is advisable to use multiple databases. However, searching databases is laborious and time-consuming, as the syntax of search strategies is database specific. We aimed to determine the optimal combination of databases needed to conduct efficient searches in systematic reviews and whether the current practice in published reviews is appropriate. While previous studies determined the coverage of databases, we analyzed the actual retrieval from the original searches for systematic reviews.

Since May 2013, the first author prospectively recorded results from systematic review searches that he performed at his institution. PubMed was used to identify systematic reviews published using our search strategy results. For each published systematic review, we extracted the references of the included studies. Using the prospectively recorded results and the studies included in the publications, we calculated recall, precision, and number needed to read for single databases and databases in combination. We assessed the frequency at which databases and combinations would achieve varying levels of recall (i.e., 95%). For a sample of 200 recently published systematic reviews, we calculated how many had used enough databases to ensure 95% recall.
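The three measures named above can be illustrated with a short Python sketch. This is not the authors' code: the database names and reference IDs are invented, and it uses the common definitions (recall = relevant retrieved / all relevant; precision = relevant retrieved / all retrieved; number needed to read = retrieved / relevant retrieved, i.e. 1 / precision).

```python
def retrieval_metrics(retrieved: set[str], relevant: set[str]) -> dict[str, float]:
    """Compute recall, precision and number needed to read (NNR) for one search."""
    hits = retrieved & relevant
    recall = len(hits) / len(relevant)
    precision = len(hits) / len(retrieved)
    # NNR is the reciprocal of precision; infinite if nothing relevant was found.
    nnr = len(retrieved) / len(hits) if hits else float("inf")
    return {"recall": recall, "precision": precision, "nnr": nnr}

# Invented example: r* are included (relevant) references, x* are irrelevant hits.
relevant = {"r1", "r2", "r3", "r4"}
database_a = {"r1", "r2", "x1", "x2", "x3", "x4", "x5", "x6"}
database_b = {"r2", "r3", "r4", "x1", "x7", "x8"}

metrics_a = retrieval_metrics(database_a, relevant)
metrics_b = retrieval_metrics(database_b, relevant)
# Searching both databases: take the union, which also removes duplicates.
metrics_combined = retrieval_metrics(database_a | database_b, relevant)

print(metrics_a)
print(metrics_b)
print(metrics_combined)
```

In this toy case neither database alone finds every relevant reference, while the combination reaches full recall at the cost of more records to screen.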

A total of 58 published systematic reviews were included, totaling 1746 relevant references identified by our database searches, while 84 included references had been retrieved by other search methods. Sixteen percent of the included references (291 articles) were only found in a single database; Embase produced the most unique references (n = 132). The combination of Embase, MEDLINE, Web of Science Core Collection, and Google Scholar performed best, achieving an overall recall of 98.3% and 100% recall in 72% of systematic reviews. We estimate that 60% of published systematic reviews do not retrieve 95% of all available relevant references as many fail to search important databases. Other specialized databases, such as CINAHL or PsycINFO, add unique references to some reviews where the topic of the review is related to the focus of the database.

Conclusions

Optimal searches in systematic reviews should search at least Embase, MEDLINE, Web of Science, and Google Scholar as a minimum requirement to guarantee adequate and efficient coverage.


Investigators and information specialists searching for relevant references for a systematic review (SR) are generally advised to search multiple databases and to use additional methods to be able to adequately identify all literature related to the topic of interest [ 1 , 2 , 3 , 4 , 5 , 6 ]. The Cochrane Handbook, for example, recommends the use of at least MEDLINE and Cochrane Central and, when available, Embase for identifying reports of randomized controlled trials [ 7 ]. There are disadvantages to using multiple databases. It is laborious for searchers to translate a search strategy into multiple interfaces and search syntaxes, as field codes and proximity operators differ between interfaces. Differences in thesaurus terms between databases add another significant burden for translation. Furthermore, it is time-consuming for reviewers who have to screen more, and likely irrelevant, titles and abstracts. Lastly, access to databases is often limited and only available on subscription basis.

Previous studies have investigated the added value of different databases on different topics [ 8 , 9 , 10 , 11 , 12 , 13 , 14 , 15 ]. Some concluded that searching only one database can be sufficient as searching other databases has no effect on the outcome [ 16 , 17 ]. Nevertheless others have concluded that a single database is not sufficient to retrieve all references for systematic reviews [ 18 , 19 ]. Most articles on this topic draw their conclusions based on the coverage of databases [ 14 ]. A recent paper tried to find an acceptable number needed to read for adding an additional database; sadly, however, no true conclusion could be drawn [ 20 ]. However, whether an article is present in a database may not translate to being found by a search in that database. Because of this major limitation, the question of which databases are necessary to retrieve all relevant references for a systematic review remains unanswered. Therefore, we research the probability that single or various combinations of databases retrieve the most relevant references in a systematic review by studying actual retrieval in various databases.

The aim of our research is to determine the combination of databases needed for systematic review searches to provide efficient results (i.e., to minimize the burden for the investigators without reducing the validity of the research by missing relevant references). A secondary aim is to investigate the current practice of databases searched for published reviews. Are included references being missed because the review authors failed to search a certain database?

Development of search strategies

At Erasmus MC, search strategies for systematic reviews are often designed via a librarian-mediated search service. The information specialists of Erasmus MC developed an efficient method that helps them perform searches in many databases in a much shorter time than other methods. This method of literature searching and a pragmatic evaluation thereof are published in separate journal articles [ 21 , 22 ]. In short, the method consists of an efficient way to combine thesaurus terms and title/abstract terms into a single line search strategy. This search is then optimized. Articles that are indexed with a set of identified thesaurus terms, but do not contain the current search terms in title or abstract, are screened to discover potential new terms. New candidate terms are added to the basic search and evaluated. Once optimal recall is achieved, macros are used to translate the search syntaxes between databases, though manual adaptation of the thesaurus terms is still necessary.

Review projects at Erasmus MC cover a wide range of medical topics, from therapeutic effectiveness and diagnostic accuracy to ethics and public health. In general, searches are developed in MEDLINE in Ovid (Ovid MEDLINE® In-Process & Other Non-Indexed Citations, Ovid MEDLINE® Daily and Ovid MEDLINE®, from 1946); Embase.com (searching both Embase and MEDLINE records, with full coverage including Embase Classic); the Cochrane Central Register of Controlled Trials (CENTRAL) via the Wiley Interface; Web of Science Core Collection (hereafter called Web of Science); PubMed restricting to records in the subset “as supplied by publisher” to find references that are not yet indexed in MEDLINE (using the syntax publisher [sb]); and Google Scholar. In general, we use the first 200 references as sorted in the relevance ranking of Google Scholar. When the number of references from other databases was low, we expected the total number of potential relevant references to be low. In this case, the number of hits from Google Scholar was limited to 100. When the overall number of hits was low, we additionally searched Scopus, and when appropriate for the topic, we included CINAHL (EBSCOhost), PsycINFO (Ovid), and SportDiscus (EBSCOhost) in our search.

Beginning in May 2013, the number of records retrieved from each search for each database was recorded at the moment of searching. The complete results from all databases used for each of the systematic reviews were imported into a unique EndNote library upon search completion and saved without deduplication for this research. The researchers that requested the search received a deduplicated EndNote file from which they selected the references relevant for inclusion in their systematic review. All searches in this study were developed and executed by W.M.B.

Determining relevant references of published reviews

We searched PubMed in July 2016 for all reviews published since 2014 where first authors were affiliated to Erasmus MC, Rotterdam, the Netherlands, and matched those with search registrations performed by the medical library of Erasmus MC. This search was used in earlier research [ 21 ]. Published reviews were included if the search strategies and results had been documented at the time of the last update and if, at minimum, the databases Embase, MEDLINE, Cochrane CENTRAL, Web of Science, and Google Scholar had been used in the review. From the published journal article, we extracted the list of final included references. We documented the department of the first author. To categorize the types of patient/population and intervention, we identified broad MeSH terms relating to the most important disease and intervention discussed in the article. We copied from the MeSH tree the top MeSH term directly below the disease category or, in the case of the intervention, directly below the therapeutics MeSH term. We selected the domain from a pre-defined set of broad domains, including therapy, etiology, epidemiology, diagnosis, management, and prognosis. Lastly, we checked whether the reviews described limiting their included references to a particular study design.

To identify whether our searches had found the included references, and if so, from which database(s) that citation was retrieved, each included reference was located in the original corresponding EndNote library using the first author name combined with the publication year as a search term for each specific relevant publication. If this resulted in extraneous results, the search was subsequently limited using a distinct part of the title or a second author name. Based on the record numbers of the search results in EndNote, we determined from which database these references came. If an included reference was not found in the EndNote file, we presumed the authors used an alternative method of identifying the reference (e.g., examining cited references, contacting prominent authors, or searching gray literature), and we did not include it in our analysis.
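The lookup described above was done by hand in EndNote. As a rough illustration only, the same logic can be sketched in Python: match on first author and year, narrow by a title fragment when that is ambiguous, and read off the source databases. The `Record` structure, the `locate` function, and the sample records are all hypothetical and do not reflect the EndNote data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One record in a hypothetical export of a non-deduplicated library."""
    first_author: str
    year: int
    title: str
    database: str


def locate(records, first_author, year, title_fragment=None):
    """Return the set of databases that retrieved a given included reference."""
    hits = [
        r for r in records
        if r.first_author.lower() == first_author.lower() and r.year == year
    ]
    # Narrow only when author + year is ambiguous (matches more than one title).
    if title_fragment and len({r.title for r in hits}) > 1:
        hits = [r for r in hits if title_fragment.lower() in r.title.lower()]
    return {r.database for r in hits}


# Invented sample records, for illustration only.
library = [
    Record("Smith", 2015, "Statin use and stroke risk", "Embase"),
    Record("Smith", 2015, "Statin use and stroke risk", "MEDLINE"),
    Record("Smith", 2015, "Knee surgery outcomes", "Web of Science"),
]

print(locate(library, "Smith", 2015, "statin"))
print(locate(library, "Jones", 2014))
```

An empty result corresponds to the case in which the reference was presumed to have been identified by another method and was left out of the analysis.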

Data analysis

We determined the databases that contributed most to the reviews by the number of unique references retrieved by each database used in the reviews. Unique references were included articles that had been found by only one database search. Those databases that contributed the most unique included references were then considered candidate databases to determine the most optimal combination of databases in the further analyses.
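Counting unique references amounts to tallying, per database, the included references that only that database retrieved. The following is a minimal sketch under the assumption that each included reference has already been mapped to the set of databases that retrieved it; the function name and the sample data are invented for illustration.

```python
from collections import Counter


def unique_counts(found_in):
    """Count included references retrieved by exactly one database.

    found_in maps a reference ID to the set of databases that retrieved it.
    """
    counts = Counter()
    for databases in found_in.values():
        if len(databases) == 1:
            counts[next(iter(databases))] += 1
    return counts


# Invented example data, not taken from the study.
found_in = {
    "ref1": {"Embase", "MEDLINE"},
    "ref2": {"Embase"},
    "ref3": {"Google Scholar"},
    "ref4": {"Embase"},
    "ref5": {"MEDLINE", "Web of Science"},
}

counts = unique_counts(found_in)
print(counts.most_common())
```

Databases at the top of such a ranking would be the candidates carried forward into the combination analysis.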

In Excel, we calculated the performance of each individual database and various combinations. Performance was measured using recall, precision, and number needed to read. See Table  1 for definitions of these measures. These values were calculated both for all reviews combined and per individual review.
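The calculations were done in Excel; the sketch below restates them in Python using the conventional definitions of these measures, which are assumed here to match Table 1: recall is the share of included references that a search retrieved, precision is the share of retrieved results that were included, and number needed to read is the inverse of precision. The function name and the numbers in the example are illustrative.

```python
def performance(included_retrieved, total_included, total_retrieved):
    """Return (recall, precision, number needed to read) for one search.

    Assumes all three counts are positive.
    """
    recall = included_retrieved / total_included
    precision = included_retrieved / total_retrieved
    # Number needed to read is the inverse of precision.
    nnr = total_retrieved / included_retrieved
    return recall, precision, nnr


# Illustrative numbers only: 90 of 100 included references retrieved
# among 4500 results.
recall, precision, nnr = performance(90, 100, 4500)
print(f"recall={recall:.1%} precision={precision:.1%} NNR={nnr:.0f}")
```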

Performance of a search can be expressed in different ways. Depending on the goal of the search, different measures may be optimized. In the case of a clinical question, precision is most important, as a practicing clinician does not have a lot of time to read through many articles in a clinical setting. When searching for a systematic review, recall is the most important aspect, as the researcher does not want to miss any relevant references. As our research is performed on systematic reviews, the main performance measure is recall.

We identified all included references that were uniquely identified by a single database. For the databases that retrieved the most unique included references, we calculated the number of references retrieved (after deduplication) and the number of included references that had been retrieved by all possible combinations of these databases, in total and per review. For all individual reviews, we determined the median recall, the minimum recall, and the percentage of reviews for which each single database or combination retrieved 100% recall.

For each review that we investigated, we determined what the recall was for all possible different database combinations of the most important databases. Based on these, we determined the percentage of reviews where that database combination had achieved 100% recall, more than 95%, more than 90%, and more than 80%. Based on the number of results per database both before and after deduplication as recorded at the time of searching, we calculated the ratio between the total number of results and the number of results for each database and combination.
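The per-review analysis of all database combinations can be sketched as follows. This is an illustration under stated assumptions, not the spreadsheet used in the study: each review is represented as a mapping from included reference to the set of databases that retrieved it, the data are invented, and thresholds are treated as "at least" the stated recall.

```python
from itertools import combinations


def recall_by_combination(reviews, databases):
    """Map every non-empty database combination to its recall per review.

    reviews is a list of dicts mapping a reference ID to the set of
    databases that retrieved that reference.
    """
    results = {}
    for size in range(1, len(databases) + 1):
        for combo in combinations(databases, size):
            chosen = set(combo)
            recalls = []
            for found_in in reviews:
                hit = sum(1 for dbs in found_in.values() if dbs & chosen)
                recalls.append(hit / len(found_in))
            results[combo] = recalls
    return results


def share_at_least(recalls, threshold):
    """Fraction of reviews whose recall is at least the threshold (assumed >=)."""
    return sum(r >= threshold for r in recalls) / len(recalls)


# Invented data for two hypothetical reviews.
databases = ["Embase", "MEDLINE", "Web of Science", "Google Scholar"]
review_a = {
    "a1": {"Embase", "MEDLINE"},
    "a2": {"Embase"},
    "a3": {"Google Scholar"},
    "a4": {"MEDLINE"},
}
review_b = {"b1": {"Embase"}, "b2": {"Embase", "Web of Science"}}

results = recall_by_combination([review_a, review_b], databases)
for combo, recalls in results.items():
    print(" + ".join(combo), recalls, share_at_least(recalls, 0.95))
```

With four databases this yields 15 single databases and combinations, each with a list of per-review recall values from which the median, the minimum, and the share of reviews above a threshold can be derived.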

Improvement of precision was calculated as the ratio between the original precision from the searches in all databases and the precision for each database and combination.

To compare our practice of database usage in systematic reviews against current practice as evidenced in the literature, we analyzed a set of 200 recent systematic reviews from PubMed. On 5 January 2017, we searched PubMed for articles with the phrase “systematic review” in the title. Starting with the most recent articles, we determined the databases searched either from the abstract or from the full text until we had data for 200 reviews. For the individual databases and combinations that were used in those reviews, we multiplied the frequency of occurrence in that set of 200 with the probability that the database or combination would lead to an acceptable recall (which we defined at 95%) that we had measured in our own data.

Our earlier research had resulted in 206 systematic reviews published between 2014 and July 2016, in which the first author was affiliated with Erasmus MC [ 21 ]. In 73 of these, the searches and results had been documented by the first author of this article at the time of the last search. Of those, 15 could not be included in this research, since they had not searched all databases we investigated here. Therefore, for this research, a total of 58 systematic reviews were analyzed. The references to these reviews can be found in Additional file 1 . An overview of the broad topical categories covered in these reviews is given in Table  2 . Many of the reviews were initiated by members of the departments of surgery and epidemiology. The reviews covered a wide variety of disease, none of which was present in more than 12% of the reviews. The interventions were mostly from the chemicals and drugs category, or surgical procedures. Over a third of the reviews were therapeutic, while slightly under a quarter answered an etiological question. Most reviews did not limit to certain study designs, 9% limited to RCTs only, and another 9% limited to other study types.

Together, these reviews included a total of 1830 references. Of these, 84 references (4.6%) had not been retrieved by our database searches and were not included in our analysis, leaving in total 1746 references. In our analyses, we combined the results from MEDLINE in Ovid and PubMed (the subset as supplied by publisher) into one database labeled MEDLINE.

Unique references per database

A total of 292 (17%) references were found by only one database. Table  3 displays the number of unique results retrieved for each single database. Embase retrieved the most unique included references, followed by MEDLINE, Web of Science, and Google Scholar. Cochrane CENTRAL is absent from the table, as for the five reviews limited to randomized trials, it did not add any unique included references. Subject-specific databases such as CINAHL, PsycINFO, and SportDiscus only retrieved additional included references when the topic of the review was directly related to their special content, respectively nursing, psychiatry, and sports medicine.

Overall performance

The four databases that had retrieved the most unique references (Embase, MEDLINE, Web of Science, and Google Scholar) were investigated individually and in all possible combinations (see Table  4 ). Of the individual databases, Embase had the highest overall recall (85.9%). Of the combinations of two databases, Embase and MEDLINE had the best results (92.8%). Embase and MEDLINE combined with either Google Scholar or Web of Science scored similarly well on overall recall (95.9%). However, the combination with Google Scholar had a higher precision and higher median recall, a higher minimum recall, and a higher proportion of reviews that retrieved all included references. Using both Web of Science and Google Scholar in addition to MEDLINE and Embase increased the overall recall to 98.3%. The higher recall from adding extra databases came at a cost in number needed to read (NNR). Searching only Embase produced an NNR of 57 on average, whereas, for the optimal combination of four databases, the NNR was 73.

Probability of appropriate recall

We calculated the recall for individual databases and databases in all possible combinations for all reviews included in the research. Figure  1 shows the percentages of reviews where a certain database combination led to a certain recall. For example, in 48% of all systematic reviews, the combination of Embase and MEDLINE (with or without Cochrane CENTRAL; Cochrane CENTRAL did not add unique relevant references) reaches a recall of at least 95%. In 72% of studied systematic reviews, the combination of Embase, MEDLINE, Web of Science, and Google Scholar retrieved all included references. In the top bar, we present the results of the complete database searches relative to the total number of included references. This shows that many database searches missed relevant references.

Percentage of systematic reviews for which a certain database combination reached a certain recall. The x-axis represents the percentage of reviews for which a specific combination of databases, as shown on the y-axis, reached a certain recall (represented with bar colors). Abbreviations: EM Embase, ML MEDLINE, WoS Web of Science, GS Google Scholar. Asterisk indicates that the recall of all databases has been calculated over all included references. The recall of the database combinations was calculated over all included references retrieved by any database.

Differences between domains of reviews

We analyzed whether the added value of Web of Science and Google Scholar was dependent on the domain of the review. For 55 reviews, we determined the domain. See Fig.  2 for the comparison of the recall of Embase, MEDLINE, and Cochrane CENTRAL per review for all identified domains. For all but one domain, the traditional combination of Embase, MEDLINE, and Cochrane CENTRAL did not retrieve enough included references. For four out of five systematic reviews that limited to randomized controlled trials (RCTs) only, the traditional combination retrieved 100% of all included references. However, for one review of this domain, the recall was 82%. Of the 11 references included in this review, one was found only in Google Scholar and one only in Web of Science.

Percentage of systematic reviews of a certain domain for which the combination Embase, MEDLINE and Cochrane CENTRAL reached a certain recall

Reduction in number of results

We calculated the ratio between the number of results found when searching all databases, including databases not included in our analyses, such as Scopus, PsycINFO, and CINAHL, and the number of results found searching a selection of databases. See Fig.  3 for the legend of the plots in Figs.  4 and 5 . Figure  4 shows the distribution of this value for individual reviews. The database combinations with the highest recall did not reduce the total number of results by large margins. Moreover, in combinations where the number of results was greatly reduced, the recall of included references was lower.

Legend of Figs. 4 and 5

The ratio between number of results per database combination and the total number of results for all databases

The ratio between precision per database combination and the total precision for all databases

Improvement of precision

To determine how searching multiple databases affected precision, we calculated for each combination the ratio between the original precision, observed when all databases were searched, and the precision calculated for different database combinations. Figure  5 shows the improvement of precision for 15 databases and database combinations. Because precision is defined as the number of relevant references divided by the number of total results, we see a strong correlation with the total number of results.

Status of current practice of database selection

From a set of 200 recent SRs identified via PubMed, we analyzed the databases that had been searched. Almost all reviews (97%) reported a search in MEDLINE. Other databases that we identified as essential for good recall were searched much less frequently; Embase was searched in 61% and Web of Science in 35%, and Google Scholar was only used in 10% of all reviews. For all individual databases or combinations of the four important databases from our research (MEDLINE, Embase, Web of Science, and Google Scholar), we multiplied the frequency of occurrence of that combination in the random set, with the probability we found in our research that this combination would lead to an acceptable recall of 95%. The calculation is shown in Table  5 . For example, around a third of the reviews (37%) relied on the combination of MEDLINE and Embase. Based on our findings, this combination achieves acceptable recall about half the time (47%). This implies that 17% of the reviews in the PubMed sample would have achieved an acceptable recall of 95%. The sum of all these values is the total probability of acceptable recall in the random sample. Based on these calculations, we estimate that the probability that this random set of reviews retrieved more than 95% of all possible included references was 40%. Using similar calculations, also shown in Table  5 , we estimated the probability that 100% of relevant references were retrieved is 23%.
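The estimate above is a weighted sum: for each database combination, its frequency in the PubMed sample is multiplied by the probability of acceptable recall measured for that combination, and the products are added. The sketch below reproduces only the single row stated in the text (MEDLINE plus Embase); the function name is an assumption, and the full estimate would need every row of Table 5, which is not reproduced here.

```python
def expected_acceptable_share(rows):
    """Sum frequency * probability over database combinations.

    rows is an iterable of (frequency_in_sample, probability_of_acceptable_recall).
    """
    return sum(freq * prob for freq, prob in rows)


# The one row given in the text: 37% of reviews used MEDLINE + Embase,
# and that combination reached acceptable recall 47% of the time.
medline_embase = (0.37, 0.47)

contribution = expected_acceptable_share([medline_embase])
print(f"{contribution:.0%}")  # prints 17%
```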

Our study shows that, to reach maximum recall, searches in systematic reviews ought to include a combination of databases. To ensure adequate performance in searches (i.e., recall, precision, and number needed to read), we find that literature searches for a systematic review should, at minimum, be performed in the combination of the following four databases: Embase, MEDLINE (including Epub ahead of print), Web of Science Core Collection, and Google Scholar. Using that combination, 93% of the systematic reviews in our study obtained levels of recall that could be considered acceptable (> 95%). Unique results from specialized databases that closely match systematic review topics, such as PsycINFO for reviews in the fields of behavioral sciences and mental health or CINAHL for reviews on the topics of nursing or allied health, indicate that specialized databases should be used additionally when appropriate.

We find that Embase is critical for acceptable recall in a review and should always be searched for medically oriented systematic reviews. However, Embase is only accessible via a paid subscription, which generally makes it challenging for review teams not affiliated with academic medical centers to access. The highest scoring database combination without Embase is a combination of MEDLINE, Web of Science, and Google Scholar, but that reaches satisfactory recall for only 39% of all investigated systematic reviews, while still requiring a paid subscription to Web of Science. Of the five reviews that included only RCTs, four reached 100% recall if MEDLINE, Web of Science, and Google Scholar combined were complemented with Cochrane CENTRAL.

The Cochrane Handbook recommends searching MEDLINE, Cochrane CENTRAL, and Embase for systematic reviews of RCTs. For reviews in our study that included RCTs only, indeed, this recommendation was sufficient for four (80%) of the reviews. The one review where it was insufficient was about alternative medicine, specifically meditation and relaxation therapy, where one of the missed studies was published in the Indian Journal of Positive Psychology. The other study, from the Journal of Advanced Nursing, is indexed in MEDLINE and Embase but was only retrieved because of the addition of KeyWords Plus in Web of Science. We estimate that more than 50% of reviews that include more study types than RCTs would miss more than 5% of included references if only the traditional combination of MEDLINE, Embase, and Cochrane CENTRAL is searched.

We are aware that the Cochrane Handbook [ 7 ] recommends more than only these databases, but further recommendations focus on regional and specialized databases. Though we occasionally used the regional databases LILACS and SciELO in our reviews, they did not provide unique references in our study. Subject-specific databases like PsycINFO only added unique references to a small percentage of systematic reviews when they had been used for the search. The third key database we identified in this research, Web of Science, is only mentioned as a citation index in the Cochrane Handbook, not as a bibliographic database. To our surprise, Cochrane CENTRAL did not identify any unique included studies that had not been retrieved by the other databases, not even for the five reviews focusing entirely on RCTs. If Erasmus MC authors had conducted more reviews that included only RCTs, Cochrane CENTRAL might have added more unique references.

MEDLINE did find unique references that had not been found in Embase, although our searches in Embase included all MEDLINE records. This is likely caused by differences in the thesaurus terms that were added, but further analysis would be required to determine the reasons for not finding the MEDLINE records in Embase. Although Embase covers MEDLINE, it apparently does not index every article from MEDLINE. Thirty-seven references were found in MEDLINE (Ovid) but were not available in Embase.com. These are mostly unique PubMed references, which are not assigned MeSH terms, and are often freely available via PubMed Central.

Google Scholar adds relevant articles not found in the other databases, possibly because it indexes the full text of all articles. It therefore finds articles in which the topic of research is not mentioned in title, abstract, or thesaurus terms, but where the concepts are only discussed in the full text. Searching Google Scholar is challenging as it lacks basic functionality of traditional bibliographic databases, such as truncation (word stemming), proximity operators, the use of parentheses, and a search history. Additionally, search strategies are limited to a maximum of 256 characters, which means that creating a thorough search strategy can be laborious.

Whether Embase and Web of Science can be replaced by Scopus remains uncertain. We have not yet gathered enough data to be able to make a full comparison between Embase and Scopus. In 23 reviews included in this research, Scopus was searched. In 12 reviews (52%), Scopus retrieved 100% of all included references retrieved by Embase or Web of Science. In the other 48%, the recall by Scopus was suboptimal, on one occasion as low as 38%.

Unique references were found in 6% of the reviews in which we searched CINAHL and in 9% of the reviews in which we searched PsycINFO. For CINAHL and PsycINFO, in one case each, unique relevant references were found. In both these reviews, the topic was highly related to the topic of the database. Although we did not use these special topic databases in all of our reviews, given the low number of reviews where these databases added relevant references, and observing the special topics of those reviews, we suggest that these subject databases will only add value if the topic is related to the topic of the database.

Many articles written on this topic have calculated overall recall of several reviews, instead of the effects on all individual reviews. Researchers planning a systematic review generally perform one review, and they need to estimate the probability that they may miss relevant articles in their search. When looking at the overall recall, the combination of Embase and MEDLINE and either Google Scholar or Web of Science could be regarded as sufficient with 96% recall. This number, however, is not an answer to the question of a researcher performing a systematic review, regarding which databases should be searched. A researcher wants to be able to estimate the chances that his or her current project will miss a relevant reference. However, when looking at individual reviews, the probability of missing more than 5% of included references found through database searching is 33% when Google Scholar is used together with Embase and MEDLINE and 30% for the Web of Science, Embase, and MEDLINE combination. What is considered acceptable recall for systematic review searches is open for debate and can differ between individuals and groups. Some reviewers might accept a potential loss of 5% of relevant references; others would want to pursue 100% recall, no matter what cost. Using the results in this research, review teams can decide, based on their idea of acceptable recall and the desired probability, which databases to include in their searches.

Strengths and limitations

We did not investigate whether the loss of certain references had resulted in changes to the conclusion of the reviews. Of course, the loss of a minor non-randomized included study that follows the systematic review’s conclusions would not be as problematic as losing a major included randomized controlled trial with contradictory results. However, the wide range of scope, topic, and criteria between systematic reviews and their related review types make it very hard to answer this question.

We found that two databases previously not recommended as essential for systematic review searching, Web of Science and Google Scholar, were key to improving recall in the reviews we investigated. Because this is a novel finding, we cannot conclude whether it is due to our dataset or to a generalizable principle. It is likely that topical differences in systematic reviews may impact whether databases such as Web of Science and Google Scholar add value to the review. One explanation for our finding may be that if the research question is very specific, the topic of research might not always be mentioned in the title and/or abstract. In that case, Google Scholar might add value by searching the full text of articles. If the research question is more interdisciplinary, a broader science database such as Web of Science is likely to add value. The topics of the reviews studied here may simply have fallen into those categories, though the diversity of the included reviews may point to a more universal applicability.

Although we searched PubMed as supplied by publisher separately from MEDLINE in Ovid, we combined the included references of these databases into one measurement in our analysis. Until 2016, the most complete MEDLINE selection in Ovid still lacked the electronic publications that were already available in PubMed. These could be retrieved by searching PubMed with the subset as supplied by publisher. Since the introduction of the more complete MEDLINE collection Epub Ahead of Print , In-Process & Other Non-Indexed Citations , and Ovid MEDLINE® , the need to separately search PubMed as supplied by publisher has disappeared. According to our data, PubMed’s “as supplied by publisher” subset retrieved 12 unique included references, and it was the most important addition in terms of relevant references to the four major databases. It is therefore important to search MEDLINE including the “Epub Ahead of Print, In-Process, and Other Non-Indexed Citations” references.

These results may not be generalizable to other studies for other reasons. The skills and experience of the searcher are one of the most important aspects in the effectiveness of systematic review search strategies [ 23 , 24 , 25 ]. The searcher in the case of all 58 systematic reviews is an experienced biomedical information specialist. Though we suspect that searchers who are not information specialists or librarians would have a higher possibility of less well-constructed searches and searches with lower recall, even highly trained searchers differ in their approaches to searching. For this study, we searched to achieve as high a recall as possible, though our search strategies, like any other search strategy, still missed some relevant references because relevant terms had not been used in the search. We are not implying that a combined search of the four recommended databases will never result in relevant references being missed, rather that failure to search any one of these four databases will likely lead to relevant references being missed. Our experience in this study shows that additional efforts, such as hand searching, reference checking, and contacting key players, should be made to retrieve extra possible includes.

Based on our calculations made by looking at random systematic reviews in PubMed, we estimate that 60% of these reviews are likely to have missed more than 5% of relevant references only because of the combinations of databases that were used. That is with the generous assumption that the searches in those databases had been designed sensitively enough. Even when taking into account that many searchers consider the use of Scopus as a replacement of Embase, plus taking into account the large overlap of Scopus and Web of Science, this estimate remains similar. Also, while the Scopus and Web of Science assumptions we made might be true for coverage, they are likely very different when looking at recall, as Scopus does not allow the use of the full features of a thesaurus. We see that reviewers rarely use Web of Science and especially Google Scholar in their searches, though they retrieve a great deal of unique references in our reviews. Systematic review searchers should consider using these databases if they are available to them, and if their institution lacks availability, they should ask other institutes to cooperate on their systematic review searches.

The major strength of our paper is that it is the first large-scale study we know of to assess database performance for systematic reviews using prospectively collected data. Prior research on database importance for systematic reviews has looked primarily at whether included references could have theoretically been found in a certain database, but most have been unable to ascertain whether the researchers actually found the articles in those databases [ 10 , 12 , 16 , 17 , 26 ]. Whether a reference is available in a database is important, but whether the article can be found in a precise search with reasonable recall is not only impacted by the database’s coverage. Our experience has shown us that it is also impacted by the ability of the searcher, the accuracy of indexing of the database, and the complexity of terminology in a particular field. Because these studies based on retrospective analysis of database coverage do not account for the searchers’ abilities, the actual findings from the searches performed, and the indexing for particular articles, their conclusions lack immediate translatability into practice. This research goes beyond retrospectively assessed coverage to investigate real search performance in databases. Many of the articles reporting on previous research concluded that one database was able to retrieve most included references. Halladay et al. [ 10 ] and van Enst et al. [ 16 ] concluded that databases other than MEDLINE/PubMed did not change the outcomes of the review, while Rice et al. [ 17 ] found the added value of other databases only for newer, non-indexed references. In addition, Michaleff et al. [ 26 ] found that Cochrane CENTRAL included 95% of all RCTs included in the reviews investigated. Our conclusion that Web of Science and Google Scholar are needed for completeness has not been shared by previous research. Most of the previous studies did not include these two databases in their research.

We recommend that, regardless of their topic, searches for biomedical systematic reviews should combine Embase, MEDLINE (including electronic publications ahead of print), Web of Science (Core Collection), and Google Scholar (the first 200 relevant references) at minimum. Special topics databases such as CINAHL and PsycINFO should be added if the topic of the review directly touches the primary focus of a specialized subject database, like CINAHL for focus on nursing and allied health or PsycINFO for behavioral sciences and mental health. For reviews where RCTs are the desired study design, Cochrane CENTRAL may be similarly useful. Ignoring one or more of the databases that we identified as the four key databases will result in more precise searches with a lower number of results, but the researchers should decide whether that is worth the increased probability of losing relevant references. This study also highlights once more that searching databases alone is, nevertheless, not enough to retrieve all relevant references.
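To make the trade-off concrete, the recall of a database combination can be computed as the share of a review's included references that at least one database in the combination retrieved. The following is a minimal Python sketch of that calculation, not the authors' analysis code; the function name `combination_recall`, the reference IDs, and the per-database sets are hypothetical placeholders chosen only for illustration.

```python
from itertools import combinations


def combination_recall(retrieved, included):
    """Return {tuple_of_database_names: recall} for every non-empty combination.

    retrieved: dict mapping a database name to the set of reference IDs it returned
    included:  set of reference IDs that ended up included in the review
    """
    if not included:
        raise ValueError("the set of included references must not be empty")
    names = sorted(retrieved)
    results = {}
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            # Union of everything the combination retrieved, limited to included references
            found = set().union(*(retrieved[name] for name in combo)) & included
            results[combo] = len(found) / len(included)
    return results


# Hypothetical data: five included references and what each database retrieved
included = {"r1", "r2", "r3", "r4", "r5"}
retrieved = {
    "Embase": {"r1", "r2", "r3", "x9"},  # "x9" was retrieved but not included
    "MEDLINE": {"r1", "r2"},
    "Web of Science": {"r2", "r4"},
    "Google Scholar": {"r5"},
}

recall = combination_recall(retrieved, included)
for combo, value in sorted(recall.items(), key=lambda kv: (-kv[1], len(kv[0]))):
    print(f"{' + '.join(combo)}: {value:.0%}")
```

Sorting by recall first and combination size second surfaces the smallest combination that reaches a given recall, which is the comparison a searcher would need when deciding whether dropping a database is worth the lost references.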

Future research should continue to investigate the recall of actual searches, beyond the coverage of databases, and should consider focusing on optimal database combinations rather than on single databases.

Levay P, Raynor M, Tuvey D. The contributions of MEDLINE, other bibliographic databases and various search techniques to NICE public health guidance. Evid Based Libr Inf Pract. 2015;10:50–68.

Stevinson C, Lawlor DA. Searching multiple databases for systematic reviews: added value or diminishing returns? Complement Ther Med. 2004;12:228–32.

Lawrence DW. What is lost when searching only one literature database for articles relevant to injury prevention and safety promotion? Inj Prev. 2008;14:401–4.

Lemeshow AR, Blum RE, Berlin JA, Stoto MA, Colditz GA. Searching one or two databases was insufficient for meta-analysis of observational studies. J Clin Epidemiol. 2005;58:867–73.

Zheng MH, Zhang X, Ye Q, Chen YP. Searching additional databases except PubMed are necessary for a systematic review. Stroke. 2008;39:e139. author reply e140

Beyer FR, Wright K. Can we prioritise which databases to search? A case study using a systematic review of frozen shoulder management. Health Inf Libr J. 2013;30:49–58.

Higgins JPT, Green S. Cochrane handbook for systematic reviews of interventions: The Cochrane Collaboration, London, United Kingdom. 2011.

Wright K, Golder S, Lewis-Light K. What value is the CINAHL database when searching for systematic reviews of qualitative studies? Syst Rev. 2015;4:104.

Wilkins T, Gillies RA, Davies K. EMBASE versus MEDLINE for family medicine searches: can MEDLINE searches find the forest or a tree? Can Fam Physician. 2005;51:848–9.

Halladay CW, Trikalinos TA, Schmid IT, Schmid CH, Dahabreh IJ. Using data sources beyond PubMed has a modest impact on the results of systematic reviews of therapeutic interventions. J Clin Epidemiol. 2015;68:1076–84.

Ahmadi M, Ershad-Sarabi R, Jamshidiorak R, Bahaodini K. Comparison of bibliographic databases in retrieving information on telemedicine. J Kerman Univ Med Sci. 2014;21:343–54.

Lorenzetti DL, Topfer L-A, Dennett L, Clement F. Value of databases other than MEDLINE for rapid health technology assessments. Int J Technol Assess Health Care. 2014;30:173–8.

Beckles Z, Glover S, Ashe J, Stockton S, Boynton J, Lai R, Alderson P. Searching CINAHL did not add value to clinical questions posed in NICE guidelines. J Clin Epidemiol. 2013;66:1051–7.

Hartling L, Featherstone R, Nuspl M, Shave K, Dryden DM, Vandermeer B. The contribution of databases to the results of systematic reviews: a cross-sectional study. BMC Med Res Methodol. 2016;16:1–13.

Aagaard T, Lund H, Juhl C. Optimizing literature search in systematic reviews—are MEDLINE, EMBASE and CENTRAL enough for identifying effect studies within the area of musculoskeletal disorders? BMC Med Res Methodol. 2016;16:161.

van Enst WA, Scholten RJ, Whiting P, Zwinderman AH, Hooft L. Meta-epidemiologic analysis indicates that MEDLINE searches are sufficient for diagnostic test accuracy systematic reviews. J Clin Epidemiol. 2014;67:1192–9.

Rice DB, Kloda LA, Levis B, Qi B, Kingsland E, Thombs BD. Are MEDLINE searches sufficient for systematic reviews and meta-analyses of the diagnostic accuracy of depression screening tools? A review of meta-analyses. J Psychosom Res. 2016;87:7–13.

Bramer WM, Giustini D, Kramer BM, Anderson PF. The comparative recall of Google Scholar versus PubMed in identical searches for biomedical systematic reviews: a review of searches used in systematic reviews. Syst Rev. 2013;2:115.

Bramer WM, Giustini D, Kramer BMR. Comparing the coverage, recall, and precision of searches for 120 systematic reviews in Embase, MEDLINE, and Google Scholar: a prospective study. Syst Rev. 2016;5:39.

Ross-White A, Godfrey C. Is there an optimum number needed to retrieve to justify inclusion of a database in a systematic review search? Health Inf Libr J. 2017;33:217–24.

Bramer WM, Rethlefsen ML, Mast F, Kleijnen J. A pragmatic evaluation of a new method for librarian-mediated literature searches for systematic reviews. Res Synth Methods. 2017. doi: 10.1002/jrsm.1279.

Bramer WM, de Jonge GB, Rethlefsen ML, Mast F, Kleijnen J. A systematic approach to searching: how to perform high quality literature searches more efficiently. J Med Libr Assoc. 2018.

Rethlefsen ML, Farrell AM, Osterhaus Trzasko LC, Brigham TJ. Librarian co-authors correlated with higher quality reported search strategies in general internal medicine systematic reviews. J Clin Epidemiol. 2015;68:617–26.

McGowan J, Sampson M. Systematic reviews need systematic searchers. J Med Libr Assoc. 2005;93:74–80.

McKibbon KA, Haynes RB, Dilks CJW, Ramsden MF, Ryan NC, Baker L, Flemming T, Fitzgerald D. How good are clinical MEDLINE searches? A comparative study of clinical end-user and librarian searches. Comput Biomed Res. 1990;23:583–93.

Michaleff ZA, Costa LO, Moseley AM, Maher CG, Elkins MR, Herbert RD, Sherrington C. CENTRAL, PEDro, PubMed, and EMBASE are the most comprehensive databases indexing randomized controlled trials of physical therapy interventions. Phys Ther. 2011;91:190–7.

Acknowledgements

Not applicable

Melissa Rethlefsen receives funding in part from the National Center for Advancing Translational Sciences of the National Institutes of Health under Award Number UL1TR001067. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Availability of data and materials

The datasets generated and/or analyzed during the current study are available from the corresponding author on a reasonable request.

Author information

Authors and affiliations.

Medical Library, Erasmus MC, Erasmus University Medical Centre Rotterdam, 3000 CS, Rotterdam, the Netherlands

Wichor M. Bramer

Spencer S. Eccles Health Sciences Library, University of Utah, Salt Lake City, Utah, USA

Melissa L. Rethlefsen

Kleijnen Systematic Reviews Ltd., York, UK

Jos Kleijnen

School for Public Health and Primary Care (CAPHRI), Maastricht University, Maastricht, the Netherlands

Department of Epidemiology, Erasmus MC, Erasmus University Medical Centre Rotterdam, Rotterdam, the Netherlands

Oscar H. Franco

Contributions

WB, JK, and OF designed the study. WB designed the searches used in this study and gathered the data. WB and ML analyzed the data. WB drafted the first manuscript, which was revised critically by the other authors. All authors have approved the final manuscript.

Corresponding author

Correspondence to Wichor M. Bramer.

Ethics declarations

Ethics approval and consent to participate, consent for publication, competing interests.

WB has received travel allowance from Embase for giving a presentation at a conference. The other authors declare no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Additional files

Additional file 1: Reviews included in the research. References to the systematic reviews published by Erasmus MC authors that were included in the research. (DOCX 19 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License ( http://creativecommons.org/licenses/by/4.0/ ), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article.

Bramer, W.M., Rethlefsen, M.L., Kleijnen, J. et al. Optimal database combinations for literature searches in systematic reviews: a prospective exploratory study. Syst Rev 6, 245 (2017). https://doi.org/10.1186/s13643-017-0644-y

Received : 21 August 2017

Accepted : 24 November 2017

Published : 06 December 2017

DOI : https://doi.org/10.1186/s13643-017-0644-y

  • Databases, bibliographic
  • Review literature as topic
  • Sensitivity and specificity
  • Information storage and retrieval

Systematic Reviews

ISSN: 2046-4053

Literature Reviews

Developing a Literature Review

1. Purpose and Scope

To develop a literature review, start by gathering information on existing research, relevant sub-topics, and areas where studies overlap. Note your initial thoughts on the topic (a mind map or list might be helpful) and avoid unfocused reading that collects irrelevant content. A literature review serves to place your research within the context of existing knowledge. It demonstrates your understanding of the field and identifies gaps that your research aims to fill. This helps in justifying the relevance and necessity of your study.

To avoid over-reading, set a target word count for each section and limit reading time. Plan backwards from the deadline and move on to other parts of the investigation. Read major texts and explore up-to-date research. Check reference lists and citation indexes for common standard texts. Be guided by research questions and refocus on your topic when needed. Stop reading if you find similar viewpoints or if you're going off topic.

You can use a "Synthesis Matrix" to keep track of your reading notes. From this work, a concept map emerges that summarizes the literature and the connections within it. Using reference management software such as RefWorks to capture citations, you can build the framework for writing your literature review.
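A synthesis matrix can be kept in any spreadsheet, but it can also be built as a plain CSV file. The sketch below is one possible layout, not a prescribed format: the column names, the helper functions `build_matrix` and `group_by_theme`, and the placeholder entries ("Author A (2020)" and so on) are all assumptions made for illustration.

```python
import csv
import io

# Assumed column layout for the matrix; adjust to suit your own review
COLUMNS = ["Citation", "Concept/Theme", "Main Idea", "Notes",
           "Gaps in the Research", "Quotation", "Page"]


def build_matrix(rows):
    """Serialize reading notes (a list of dicts) as CSV text with a header row."""
    buffer = io.StringIO()
    # restval="" leaves a cell empty when a note has no value for that column
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def group_by_theme(rows):
    """Map each theme to the citations filed under it, to show connections."""
    themes = {}
    for row in rows:
        themes.setdefault(row["Concept/Theme"], []).append(row["Citation"])
    return themes


# Placeholder notes, not real sources
notes = [
    {"Citation": "Author A (2020)", "Concept/Theme": "Search strategy",
     "Main Idea": "Placeholder summary", "Page": "12"},
    {"Citation": "Author B (2021)", "Concept/Theme": "Search strategy",
     "Main Idea": "Placeholder summary"},
    {"Citation": "Author C (2019)", "Concept/Theme": "Source evaluation",
     "Main Idea": "Placeholder summary"},
]

print(build_matrix(notes))
print(group_by_theme(notes))
```

Grouping by theme rather than by source is the point of the matrix: it shows which sources speak to the same idea, which is the starting material for a thematic structure.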

2. Source Selection

Focus on searching for academically authoritative texts such as academic books, journals, research reports, and government publications. These sources are critical for ensuring the credibility and reliability of your review. 

  • Academic Books: Provide comprehensive coverage of a topic.
  • Journal Articles: Offer the most up-to-date research and are essential for a literature review.
  • Research Reports: Detailed accounts of specific research projects.
  • Government Publications: Official documents that provide reliable data and insights.

3. Thematic Analysis

Instead of merely summarizing sources, identify and discuss key themes that emerge from the literature. This involves interpreting and evaluating how different authors have tackled similar issues and how their findings relate to your research.

4. Critical Evaluation

Adopt a critical attitude towards the sources you review. Scrutinize, question, and dissect the material to ensure that your review is not just descriptive but analytical. This helps in highlighting the significance of various sources and their relevance to your research.

Each work's critical assessment should take into account:

  • Provenance: What qualifications does the author have? Are the author's claims backed up by evidence, such as primary historical material, case studies, narratives, statistics, and recent scientific findings?
  • Methodology: Were the strategies used to locate, collect, and evaluate the data suitable for addressing the research question? Was the sample size suitable? Were the findings properly reported and interpreted?
  • Objectivity: Is the author's viewpoint impartial or biased? Does the author consider evidence that contradicts the thesis, or are important facts ignored?
  • Persuasiveness: Which of the author's arguments are the most and least convincing?
  • Value: Are the author's claims and conclusions believable? Does the study ultimately advance our understanding of the issue in any meaningful way?

5. Categorization

Organize your literature review by grouping sources into categories based on themes, relevance to research questions, theoretical paradigms, or chronology. This helps in presenting your findings in a structured manner.

6. Source Validity

Ensure that the sources you include are valid and reliable. Classic texts may retain their authority over time, but for fields that evolve rapidly, prioritize the most recent research. Always check the credibility of the authors and the impact of their work in the field.

7. Synthesis and Findings

Synthesize the information from various sources to draw conclusions about the current state of knowledge. Identify trends, controversies, and gaps in the literature. Relate your findings to your research questions and suggest future directions for research.

Practical Tips

  • Use a variety of sources, including online databases, university libraries, and reference lists from relevant articles. This ensures a comprehensive coverage of the literature.
  • Avoid listing sources without analysis. Use tables, bulk citations, and footnotes to manage references efficiently and make your review more readable.
  • Writing a literature review is an ongoing process. Start writing early and revise as you read more. This iterative process helps in refining your arguments and identifying additional sources as needed.  

Brown University Library (2024) Organizing and Creating Information. Available at: https://libguides.brown.edu/organize/litreview (Accessed: 30 July 2024).

Pacheco-Vega, R. (2016) Synthesizing different bodies of work in your literature review: The Conceptual Synthesis Excel Dump (CSED) technique . Available at: http://www.raulpacheco.org/2016/06/synthesizing-different-bodies-of-work-in-your-literature-review-the-conceptual-synthesis-excel-dump-technique/ (Accessed: 30 July 2024).

Study Advice at the University of Reading (2024) Literature reviews . Available at: https://libguides.reading.ac.uk/literaturereview/developing (Accessed: 31 July 2024).

Further Reading

Frameworks for creating answerable (re)search questions  How to Guide

Literature Searching How to Guide

  • Last Updated: Sep 4, 2024 11:43 AM
  • URL: https://library.lsbu.ac.uk/literaturereviews

Scopus: Comprehensive, multidisciplinary, trusted abstract and citation database

Quickly find relevant and authoritative research, identify experts and gain access to reliable data, metrics and analytical tools. Be confident in advancing research, educational goals, and research direction and priorities — all from one database.

Scopus benefits

Enhance research and scholarship with comprehensive data and analytics

Increase research efficiency.

Having access to comprehensive content and high-quality data is effective only if you can easily find the information you need. The state-of-the-art search tools and filters in Scopus enable you to quickly:

Discover relevant sources

Identify trends in research or emerging topics

Uncover potential research collaborators

Use our Quick Reference Guide to learn about our search features and filters.

Download the Scopus Quick Reference Guide

Identify emerging trends

Scopus has comprehensive scholarly literature, data and analytical tools to keep you up-to-date and ahead of the competition.

94M+ records

29,200+ active serial titles

330,000+ books

Watch this video to get a quick overview of how Scopus helps organizations of any size progress basic and applied research, support educational goals, and inform research strategies.

Download our fact sheet with Scopus content figures and the latest product updates.

Learn more about Scopus content

Accelerate your research

In the ever-changing landscape of academic research, staying at the forefront requires modern tools. Scopus AI is an AI-powered tool that helps you navigate the vast amount of information available in Scopus, allowing you to gain a deeper understanding of your research topic, generate new insights, and enhance your overall research experience.

Scopus AI accelerates the journey from inquiry to discovery, enabling you to push the boundaries of knowledge and drive innovation in your field.

Learn more about Scopus AI

Inform strategic research decisions

Scopus empowers organizations with unparalleled access to critical global research, which can be integrated with existing platforms to increase analysis and insights.

Its advanced suite of analytical tools helps users visualize, compare and export data to evaluate research output and trends, assisting in measuring research performance at the individual or institutional level, and helping inform strategic research decisions.

Learn more about Scopus data

Enhance research visibility

Scopus Author Profiles offer new insights into the reach and influence of research, helping to build a reliable body of work to support career goals. Once a profile is validated, Scopus takes over, automatically populating it and continuously building on an author's credentials.

Scopus is the only database to blend automated and manually curated data to generate current author profiles. This process allows us to deliver over 17m profiles that support accurate author searches in the same way you can search for articles: efficiently and easily.

Learn more about Author Profiles

Show journal, article & author influence

Scopus outperforms other abstract and citation databases by providing a broader range of research metrics covering nearly twice the number of peer-reviewed publications.

Using Scopus metrics, you can demonstrate the influence of your institution's scholarly output. Discover the details behind our metrics, giving you confidence in knowing how the numbers are derived.

Learn more about Scopus metrics

What's new?

It's here! CiteScore 2023 is now available, providing transparent insights into journal citation impact.

Learn more about CiteScore 2023

The academic community plays a crucial role in developing and testing Scopus AI, and we've tapped into user feedback to introduce new features.

Read more about the Scopus AI May 2024 release

We're thrilled to announce some significant changes coming to Scopus, specifically to the Citation Overview feature! These changes are designed to enhance your experience and the quality of data analysis.

Read more about changes to Citation Overview

Why choose Scopus?

  • Industry-leading collection of scholarly abstracts and citations
  • Comprehensive coverage
  • Greater insights
  • Independent review and selection
  • Intuitive search
  • Better tools, better results
  • More metrics
  • Serve your organization's research and education needs
  • Scopus for enterprise

Scopus is available as a subscription only for organizations. Individuals, please contact your library or information resource center.

View Scopus for free

See what Scopus can do for you by visiting Scopus Preview for free.

"When it comes to measuring success, you can’t compare other products to Scopus — no other output metrics offer the same kind of depth and coverage ... faculty, department chairs, college deans, they are always amazed when they discover what’s possible." Read the full customer story opens in new tab/window

Hector R. Perez-Gilbe

Research Librarian for the Health Sciences, University of California, Irvine (USA)

Frequently asked questions

How do I get a complete list of titles indexed in Scopus?

Use our free Scopus Preview to get a complete list of titles indexed in Scopus and access to Scopus metrics.

How do I request changes to an author profile?

To request changes to an author profile, follow the link to learn about our Author Profile Wizard.

How do I request a title correction on Scopus?

Learn more about making title corrections and changes.

How do I submit a journal, book or conference for indexing?

Visit the Scopus Content Policy & Selection page for more information about submitting a journal, book or conference for indexing.

Where can I find information about Scopus APIs?

To learn about Scopus APIs, please visit our Developer Portal.

Learn how Scopus can help your organization achieve its goals.

How to Conduct a Literature Review (Health Sciences and Beyond)

Popular Databases

Finding additional databases.

Below is a list of the most commonly used databases. Select one or more that align with the scope of your research discipline.

  • ERIC (Education Resources Information Center)

We also recommend that you consult your discipline's  research guide  for additional database suggestions. Below are a few research guides you may find useful, or you can browse our full list of research guides .

  • Dentistry by Erica Brody
  • Nursing by Roy Brown
  • Medicine by John Cyrus
  • Pharmacy by Erica Brody
  • Last Updated: Mar 15, 2024 12:22 PM
  • URL: https://guides.library.vcu.edu/health-sciences-lit-review

Organizing and Creating Information

What Is a Literature Review?

  • Review the Literature
  • Write the Literature Review
  • Further Reading
  • Learning Objectives
  • Attribution

This guide is designed to:

  • Identify the sections and purpose of a literature review in academic writing
  • Review practical strategies and organizational methods for preparing a literature review

A literature review is a summary and synthesis of scholarly research on a specific topic. It should answer questions such as:

  • What research has been done on the topic?
  • Who are the key researchers and experts in the field?
  • What are the common theories and methodologies?
  • Are there challenges, controversies, and contradictions?
  • Are there gaps in the research that your approach addresses?

The process of reviewing existing research allows you to fine-tune your research question and contextualize your own work. Preparing a literature review is a cyclical process. You may find that the research question you begin with evolves as you learn more about the topic.

Once you have defined your research question , focus on learning what other scholars have written on the topic.

In order to  do a thorough search of the literature  on the topic, define the basic criteria:

  • Databases and journals: Look at the  subject guide  related to your topic for recommended databases. Review the  tutorial on finding articles  for tips. 
  • Books: Search BruKnow, the Library's catalog. Steps to searching ebooks are covered in the  Finding Ebooks tutorial .
  • What time period should it cover? Is currency important?
  • Do I know of primary and secondary sources that I can use as a way to find other information?
  • What should I be aware of when looking at popular, trade, and scholarly resources ? 

One strategy is to review bibliographies for sources that relate to your interest. For more on this technique, look at the tutorial on finding articles when you have a citation .

Tip: Use a Synthesis Matrix

As you read sources, themes will emerge that will help you to organize the review. You can use a simple Synthesis Matrix to track your notes as you read. From this work, a concept map emerges that provides an overview of the literature and ways in which it connects. Working with Zotero to capture the citations, you build the structure for writing your literature review.

Citation | Concept/Theme | Main Idea | Notes 1 | Notes 2 | Gaps in the Research | Quotation | Page

How do I know when I am done?

A key indicator for knowing when you are done is running into the same articles and materials. With no new information being uncovered, you are likely exhausting your current search and should modify search terms or search different catalogs or databases. It is also possible that you have reached a point when you can start writing the literature review.
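One way to make this stopping signal measurable is to track, for each successive search, what fraction of its results you have not seen before. The sketch below assumes you record an identifier (such as a DOI) for every result; the function name `new_record_rates` and the sample identifiers are hypothetical, and the point at which the rate counts as "low enough" is left to your judgment.

```python
def new_record_rates(search_rounds):
    """For each search round, return the share of its records not seen in earlier rounds."""
    seen = set()
    rates = []
    for records in search_rounds:
        unique = set(records)
        new = unique - seen
        # An empty round contributes nothing new, so report 0.0 rather than dividing by zero
        rates.append(len(new) / len(unique) if unique else 0.0)
        seen |= unique
    return rates


# Hypothetical identifiers returned by three successive searches
rounds = [
    ["a", "b", "c"],  # first search: everything is new
    ["b", "c", "d"],  # second search: only "d" is new
    ["a", "d"],       # third search: nothing new
]

rates = new_record_rates(rounds)
for number, rate in enumerate(rates, start=1):
    print(f"Search {number}: {rate:.0%} new records")
```

A rate that falls toward zero across differently worded searches and different databases suggests the current approach is exhausted; a rate that stays high suggests there is still literature to find.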

Tip: Manage Your Citations

These citation management tools also create citations, footnotes, and bibliographies with just a few clicks:

Zotero Tutorial

Endnote Tutorial

Your literature review should be focused on the topic defined in your research question. It should be written in a logical, structured way and maintain an objective perspective and use a formal voice.

Review the Summary Table you created for themes and connecting ideas. Use the following guidelines to prepare an outline of the main points you want to make. 

  • Synthesize previous research on the topic.
  • Aim to include both summary and synthesis.
  • Include literature that supports your research question as well as that which offers a different perspective.
  • Avoid relying on one author or publication too heavily.
  • Select an organizational structure, such as chronological, methodological, or thematic.

The three elements of a literature review are introduction, body, and conclusion.

Introduction

  • Define the topic of the literature review, including any terminology.
  • Introduce the central theme and organization of the literature review.
  • Summarize the state of research on the topic.
  • Frame the literature review with your research question.

Body

  • Focus on ways to have the body of literature tell its own story. Do not add your own interpretations at this point.
  • Look for patterns and find ways to tie the pieces together.
  • Summarize instead of quote.
  • Weave the points together rather than list summaries of each source.
  • Include the most important sources, not everything you have read.

Conclusion

  • Summarize the review of the literature.
  • Identify areas of further research on the topic.
  • Connect the review with your research.

Further Reading

  • DeCarlo, M. (2018). 4.1 What is a literature review? In Scientific Inquiry in Social Work. Open Social Work Education. https://scientificinquiryinsocialwork.pressbooks.com/chapter/4-1-what-is-a-literature-review/
  • Literature Reviews (n.d.) https://writingcenter.unc.edu/tips-and-tools/literature-reviews/ Accessed Nov. 10, 2021

Content on this page adapted from: 

Frederiksen, L. and Phelps, S. (2017).   Literature Reviews for Education and Nursing Graduate Students.  Licensed CC BY 4.0

  • Last Updated: Jul 17, 2024 3:55 PM
  • URL: https://libguides.brown.edu/organize

How to Write a Literature Review | Guide, Examples, & Templates

Published on January 2, 2023 by Shona McCombes . Revised on September 11, 2023.

What is a literature review? A literature review is a survey of scholarly sources on a specific topic. It provides an overview of current knowledge, allowing you to identify relevant theories, methods, and gaps in the existing research that you can later apply to your paper, thesis, or dissertation topic .

There are five key steps to writing a literature review:

  • Search for relevant literature
  • Evaluate sources
  • Identify themes, debates, and gaps
  • Outline the structure
  • Write your literature review

A good literature review doesn’t just summarize sources—it analyzes, synthesizes , and critically evaluates to give a clear picture of the state of knowledge on the subject.

Table of contents

  • What is the purpose of a literature review
  • Examples of literature reviews
  • Step 1 – Search for relevant literature
  • Step 2 – Evaluate and select sources
  • Step 3 – Identify themes, debates, and gaps
  • Step 4 – Outline your literature review’s structure
  • Step 5 – Write your literature review
  • Free lecture slides
  • Other interesting articles
  • Frequently asked questions

When you write a thesis , dissertation , or research paper , you will likely have to conduct a literature review to situate your research within existing knowledge. The literature review gives you a chance to:

  • Demonstrate your familiarity with the topic and its scholarly context
  • Develop a theoretical framework and methodology for your research
  • Position your work in relation to other researchers and theorists
  • Show how your research addresses a gap or contributes to a debate
  • Evaluate the current state of research and demonstrate your knowledge of the scholarly debates around your topic

Writing literature reviews is a particularly important skill if you want to apply for graduate school or pursue a career in research. We’ve written a step-by-step guide that you can follow below.


Writing literature reviews can be quite challenging! A good starting point could be to look at some examples, depending on what kind of literature review you’d like to write.

  • Example literature review #1: “Why Do People Migrate? A Review of the Theoretical Literature” (Theoretical literature review about the development of economic migration theory from the 1950s to today.)
  • Example literature review #2: “Literature review as a research methodology: An overview and guidelines” (Methodological literature review about interdisciplinary knowledge acquisition and production.)
  • Example literature review #3: “The Use of Technology in English Language Learning: A Literature Review” (Thematic literature review about the effects of technology on language acquisition.)
  • Example literature review #4: “Learners’ Listening Comprehension Difficulties in English Language Learning: A Literature Review” (Chronological literature review about how the concept of listening skills has changed over time.)


Before you begin searching for literature, you need a clearly defined topic.

If you are writing the literature review section of a dissertation or research paper, you will search for literature related to your research problem and questions.

Make a list of keywords

Start by creating a list of keywords related to your research question. Include each of the key concepts or variables you’re interested in, and list any synonyms and related terms. You can add to this list as you discover new keywords in the process of your literature search.

  • Social media, Facebook, Instagram, Twitter, Snapchat, TikTok
  • Body image, self-perception, self-esteem, mental health
  • Generation Z, teenagers, adolescents, youth

Search for relevant sources

Use your keywords to begin searching for sources. Some useful databases to search for journals and articles include:

  • Your university’s library catalogue
  • Google Scholar
  • Project Muse (humanities and social sciences)
  • Medline (life sciences and biomedicine)
  • EconLit (economics)
  • Inspec (physics, engineering and computer science)

You can also use Boolean operators (AND, OR, NOT) to help narrow down your search.
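As a rough illustration of how the keyword list and Boolean operators fit together, the sketch below joins synonyms with OR and then joins the concepts with AND. The helper name `build_query` is hypothetical, and the quoting and parenthesis conventions are an assumption: many databases accept them, but syntax varies, so check the help pages of the database you are using.

```python
# Hypothetical helper (not part of any database's API): builds one
# Boolean search string from groups of synonymous keywords.
def build_query(concept_groups):
    """OR together the synonyms within a concept, then AND the concepts."""
    clauses = []
    for terms in concept_groups:
        # Assumption: multi-word phrases are wrapped in double quotes
        # so that they are searched as exact phrases.
        quoted = [f'"{t}"' if " " in t else t for t in terms]
        clauses.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(clauses)


# Keyword groups taken from the example list above (shortened).
groups = [
    ["social media", "Instagram", "TikTok"],
    ["body image", "self-esteem"],
    ["adolescents", "teenagers"],
]

query = build_query(groups)
print(query)
# ("social media" OR Instagram OR TikTok) AND ("body image" OR self-esteem) AND (adolescents OR teenagers)
```

Adding a synonym widens the search for that concept, while adding a whole new concept group narrows the overall results.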

Make sure to read the abstract to find out whether an article is relevant to your question. When you find a useful book or article, you can check the bibliography to find other relevant sources.

You likely won’t be able to read absolutely everything that has been written on your topic, so it will be necessary to evaluate which sources are most relevant to your research question.

For each publication, ask yourself:

  • What question or problem is the author addressing?
  • What are the key concepts and how are they defined?
  • What are the key theories, models, and methods?
  • Does the research use established frameworks or take an innovative approach?
  • What are the results and conclusions of the study?
  • How does the publication relate to other literature in the field? Does it confirm, add to, or challenge established knowledge?
  • What are the strengths and weaknesses of the research?

Make sure the sources you use are credible, and make sure you read any landmark studies and major theories in your field of research.


Take notes and cite your sources

As you read, you should also begin the writing process. Take notes that you can later incorporate into the text of your literature review.

It is important to keep track of your sources with citations to avoid plagiarism. It can be helpful to make an annotated bibliography, where you compile full citation information and write a paragraph of summary and analysis for each source. This helps you remember what you read and saves time later in the process.
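If you keep your notes digitally, one possible way to structure an annotated bibliography is sketched below. The class `AnnotatedSource`, its fields, and both helper functions are illustrative assumptions rather than a standard format, and the analysis text and theme labels are placeholder notes.

```python
from dataclasses import dataclass, field


@dataclass
class AnnotatedSource:
    """One annotated-bibliography entry (hypothetical structure)."""
    citation: str   # full citation in your chosen style
    summary: str    # what the source says
    analysis: str   # your own evaluation of it
    themes: list = field(default_factory=list)  # your own labels


def format_entry(src):
    """Return the citation followed by an indented annotation paragraph."""
    return f"{src.citation}\n    {src.summary} {src.analysis}"


def sources_by_theme(sources):
    """Index citations by theme label to help when outlining the review."""
    index = {}
    for src in sources:
        for theme in src.themes:
            index.setdefault(theme, []).append(src.citation)
    return index


sources = [
    AnnotatedSource(
        citation="McCombes, S. (2023, September 11). How to Write a "
                 "Literature Review | Guide, Examples, & Templates. Scribbr.",
        summary="Outlines five steps for writing a literature review.",
        analysis="Placeholder note: useful as a process checklist.",
        themes=["writing process"],
    ),
]

print(format_entry(sources[0]))
print(sources_by_theme(sources))
```

Tagging each entry with themes as you read makes it easier to group sources later, when you look for patterns and gaps.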


To begin organizing your literature review’s argument and structure, be sure you understand the connections and relationships between the sources you’ve read. Based on your reading and notes, you can look for:

  • Trends and patterns (in theory, method or results): do certain approaches become more or less popular over time?
  • Themes: what questions or concepts recur across the literature?
  • Debates, conflicts and contradictions: where do sources disagree?
  • Pivotal publications: are there any influential theories or studies that changed the direction of the field?
  • Gaps: what is missing from the literature? Are there weaknesses that need to be addressed?

This step will help you work out the structure of your literature review and (if applicable) show how your own research will contribute to existing knowledge.

For example, in reviewing the literature on social media and body image, you might note that:

  • Most research has focused on young women.
  • There is an increasing interest in the visual aspects of social media.
  • There is still a lack of robust research on highly visual platforms like Instagram and Snapchat; this is a gap that you could address in your own research.

There are various approaches to organizing the body of a literature review. Depending on the length of your literature review, you can combine several of these strategies (for example, your overall structure might be thematic, but each theme is discussed chronologically).

Chronological

The simplest approach is to trace the development of the topic over time. However, if you choose this strategy, be careful to avoid simply listing and summarizing sources in order.

Try to analyze patterns, turning points and key debates that have shaped the direction of the field. Give your interpretation of how and why certain developments occurred.

Thematic

If you have found some recurring central themes, you can organize your literature review into subsections that address different aspects of the topic.

For example, if you are reviewing literature about inequalities in migrant health outcomes, key themes might include healthcare policy, language barriers, cultural attitudes, legal status, and economic access.

Methodological

If you draw your sources from different disciplines or fields that use a variety of research methods, you might want to compare the results and conclusions that emerge from different approaches. For example:

  • Look at what results have emerged in qualitative versus quantitative research
  • Discuss how the topic has been approached by empirical versus theoretical scholarship
  • Divide the literature into sociological, historical, and cultural sources

Theoretical

A literature review is often the foundation for a theoretical framework. You can use it to discuss various theories, models, and definitions of key concepts.

You might argue for the relevance of a specific theoretical approach, or combine various theoretical concepts to create a framework for your research.

Like any other academic text, your literature review should have an introduction, a main body, and a conclusion. What you include in each depends on the objective of your literature review.

The introduction should clearly establish the focus and purpose of the literature review.

Depending on the length of your literature review, you might want to divide the body into subsections. You can use a subheading for each theme, time period, or methodological approach.

As you write, you can follow these tips:

  • Summarize and synthesize: give an overview of the main points of each source and combine them into a coherent whole
  • Analyze and interpret: don’t just paraphrase other researchers — add your own interpretations where possible, discussing the significance of findings in relation to the literature as a whole
  • Critically evaluate: mention the strengths and weaknesses of your sources
  • Write in well-structured paragraphs: use transition words and topic sentences to draw connections, comparisons and contrasts

In the conclusion, you should summarize the key findings you have taken from the literature and emphasize their significance.

When you’ve finished writing and revising your literature review, don’t forget to proofread thoroughly before submitting.

This article has been adapted into lecture slides that you can use to teach your students about writing a literature review.

Scribbr slides are free to use, customize, and distribute for educational purposes.


If you want to know more about the research process, methodology, research bias, or statistics, make sure to check out some of our other articles with explanations and examples.

  • Sampling methods
  • Simple random sampling
  • Stratified sampling
  • Cluster sampling
  • Likert scales
  • Reproducibility

Statistics

  • Null hypothesis
  • Statistical power
  • Probability distribution
  • Effect size
  • Poisson distribution

Research bias

  • Optimism bias
  • Cognitive bias
  • Implicit bias
  • Hawthorne effect
  • Anchoring bias
  • Explicit bias

A literature review is a survey of scholarly sources (such as books, journal articles, and theses) related to a specific topic or research question.

It is often written as part of a thesis, dissertation, or research paper, in order to situate your work in relation to existing knowledge.

There are several reasons to conduct a literature review at the beginning of a research project:

  • To familiarize yourself with the current state of knowledge on your topic
  • To ensure that you’re not just repeating what others have already done
  • To identify gaps in knowledge and unresolved problems that your research can address
  • To develop your theoretical framework and methodology
  • To provide an overview of the key findings and debates on the topic

Writing the literature review shows your reader how your work relates to existing research and what new insights it will contribute.

The literature review usually comes near the beginning of your thesis or dissertation. After the introduction, it grounds your research in a scholarly field and leads directly to your theoretical framework or methodology.

A literature review is a survey of credible sources on a topic, often used in dissertations, theses, and research papers. Literature reviews give an overview of knowledge on a subject, helping you identify relevant theories and methods, as well as gaps in existing research. Literature reviews are set up similarly to other academic texts, with an introduction, a main body, and a conclusion.

An annotated bibliography is a list of source references that has a short description (called an annotation) for each of the sources. It is often assigned as part of the research process for a paper.

Cite this Scribbr article


McCombes, S. (2023, September 11). How to Write a Literature Review | Guide, Examples, & Templates. Scribbr. Retrieved September 3, 2024, from https://www.scribbr.com/dissertation/literature-review/



Literature review on collaborative project delivery for sustainable construction: bibliometric analysis.


  • 1. Introduction
  • 2. Literature Review
  • 2.1. Collaborative Project Delivery
  • 2.2. Design Build (DB)
  • 2.3. Construction Manager at Risk (CMAR)
  • 2.4. Integrated Project Delivery Method (IPD)
  • 2.5. Sustainability
  • 2.6. Sustainable Construction
  • 2.7. Benefits of ECI Comparing Case Studies
  • 2.8. Collaborative Delivery Models
  • 3. Methodology
  • 3.1. Research Methods
  • 3.2. Database Research
  • 4.1. IPD, Design-Build, and CMAR Overview
  • 4.1.1. Yearly Publication Distribution of DB CMAR and IPD
  • 4.1.2. Major Country Analysis
  • 4.1.3. Most Relevant and Influential Journals
  • 4.1.4. Corresponding Author Countries
  • 4.2. Keyword Analysis
  • 4.2.1. High-Frequency Keyword Analysis
  • 4.2.2. Co-occurrence Network Analysis
  • 4.2.3. Analysis of Keywords’ Frequency over Time
  • 5. Discussion
  • 5.1. Findings of Advantages and Disadvantages of IPD, DB, and CMAR for Sustainable Construction
  • 5.1.1. Advantages of IPD
  • 5.1.2. Advantages of Design-Build
  • 5.1.3. Advantages of Construction Manager at Risk
  • 5.1.4. Disadvantages of IPD
  • 5.1.5. Disadvantages of Design-Build
  • 5.1.6. Disadvantages of Construction Manager at Risk
  • 5.2. Most Suitable CPD Technique for Sustainable Construction Based on Literature Review
  • 5.2.1. Limitations
  • 5.2.2. Recommendations for Future Research
  • 6. Future Trend
  • 6.1. Enhancing Innovation through Collaborative Project Delivery
  • 6.2. Open Communication and Block Chain Technology
  • 6.3. Multi-Party Agreement
  • 6.4. Utilizing Artificial Intelligence in Decision Support Systems
  • 7. Conclusions
  • Author Contributions
  • Institutional Review Board Statement
  • Informed Consent Statement
  • Data Availability Statement
  • Conflicts of Interest

  • Giachino, J.; Cecil, M.; Husselbee, B.; Matthews, C. Alternative Project Delivery: Construction Management at Risk, Design-Build and Public-Private Partnerships. In Proceedings of the Utility Management Conference 2016, San Diego, CA, USA, 24–26 February 2016. [ Google Scholar ]
  • Shrestha, P.P.; Maharjan, R.; Batista, J.R. Performance of Design-Build and Construction Manager-at-Risk Methods in Water and Wastewater Projects. Pract. Period. Struct. Des. Constr. 2019 , 24 , 04018029. [ Google Scholar ] [ CrossRef ]
  • Shrestha, P.P.; Batista, J. Lessons Learned in Design-Build and Construction-Manager-at-Risk Water and Wastewater Project. J. Leg. Aff. Dispute Resolut. Eng. Constr. 2020 , 12 , 04520002. [ Google Scholar ] [ CrossRef ]
  • Xia, B.; Chan, A.P.C. Identification of Selection Criteria for Operational Variations of The Design-Build System: A Delphi Study in China. J. Civ. Eng. Manag. 2012 , 18 , 173–183. [ Google Scholar ] [ CrossRef ]
  • Shane, J.S.; Bogus, S.M.; Molenaar, K.R. Municipal Water/Wastewater Project Delivery Performance Comparison. J. Manag. Eng. 2013 , 29 , 251–258. [ Google Scholar ] [ CrossRef ]
  • Sullivan, J.; El Asmar, M.; Chalhoub, J.; Obeid, H. Two Decades of Performance Comparisons for Design-Build, Construction Manager at Risk, and Design-Bid-Build: Quantitative Analysis of the State of Knowledge on Project Cost, Schedule, and Quality. J. Constr. Eng. Manag. 2017 , 143 , 04017009. [ Google Scholar ] [ CrossRef ]
  • Raouf, A.M.; Al-Ghamdi, S. Effectiveness of Project Delivery Systems in Executing Green Buildings. J. Constr. Eng. Manag. 2019 , 145 , 03119005. [ Google Scholar ] [ CrossRef ]
  • Francom, T.; El Asmar, M.; Ariaratnam, S.T. Performance Analysis of Construction Manager at Risk on Pipeline Engineering and Construction Projects. J. Manag. Eng. 2016 , 32 , 04016016. [ Google Scholar ] [ CrossRef ]
  • Gransberg, D.D.; Shane, J.S.; Transportation Research Board. Construction Manager-at-Risk Project Delivery for Highway Programs ; The National Academies Press: Washington, DC, USA, 2010. [ Google Scholar ]
  • Rahman, M.M.; Kumaraswamy, M.M. Potential for Implementing Relational Contracting and Joint Risk Management. J. Manag. Eng. 2004 , 20 , 178–189. [ Google Scholar ] [ CrossRef ]
  • Feghaly, J.; El Asmar, M.; Ariaratnam, S.; Bearup, W. Selecting project delivery methods for water treatment plants. Eng. Constr. Archit. Manag. 2019 , 27 , 936–951. [ Google Scholar ] [ CrossRef ]
  • Park, H.-S.; Lee, D.; Kim, S.; Kim, J.-L. Comparing Project Performance of Design-Build and Design-Bid-Build Methods for Large-sized Public Apartment Housing Projects in Korea. J. Asian Archit. Build. Eng. 2015 , 14 , 323–330. [ Google Scholar ] [ CrossRef ]
  • Shrestha, P.P.; Batista, J.; Maharajan, R. Risks involved in using alternative project delivery (APD) methods in water and wastewater projects. Procedia Eng. 2016 , 145 , 219–223. [ Google Scholar ] [ CrossRef ]
  • Hettiaarachchige, N.; Rathnasinghe, A.; Ranadewa, K.; Thurairajah, N. Thurairajah, Lean Integrated Project Delivery for Construction Procurement: The Case of Sri Lanka. Buildings 2022 , 12 , 524. [ Google Scholar ] [ CrossRef ]
  • Kent, D.C.; Becerik-Gerber, B. Understanding Construction Industry Experience and Attitudes toward Integrated Project Delivery. J. Constr. Eng. Manag. 2010 , 136 , 815–825. [ Google Scholar ] [ CrossRef ]
  • Franz, B.; Leicht, R.; Molenaar, K.; Messner, J. Impact of Team Integration and Group Cohesion on Project Delivery Performance. J. Constr. Eng. Manag. 2017 , 143 , 04016088. [ Google Scholar ] [ CrossRef ]
  • Engebø, A.; Klakegg, O.J.; Lohne, J.; Lædre, O. A collaborative project delivery method for design of a high-performance building. Int. J. Manag. Proj. Bus. 2020 , 13 , 1141–1165. [ Google Scholar ] [ CrossRef ]
  • Ahmed, S.; El-Sayegh, S. Critical Review of the Evolution of Project Delivery Methods in the Construction Industry. Buildings 2020 , 11 , 11. [ Google Scholar ] [ CrossRef ]
  • Bond-Barnard, T.J.; Fletcher, L.; Steyn, H. Linking trust and collaboration in project teams to project management success. Int. J. Manag. Proj. Bus. 2018 , 11 , 432–457. [ Google Scholar ] [ CrossRef ]
  • Rodrigues, M.R.; Lindhard, S.M. Lindhard, Benefits and challenges to applying IPD: Experiences from a Norwegian mega-project. Constr. Innov. 2021 , 23 , 287–305. [ Google Scholar ] [ CrossRef ]
  • Kaminsky, J. The fourth pillar of infrastructure sustainability: Tailoring civil infrastructure to social context. Constr. Manag. Econ. 2015 , 33 , 299–309. [ Google Scholar ] [ CrossRef ]
  • Al Khalil, M.I. Selecting the appropriate project delivery method using AHP. Int. J. Proj. Manag. 2002 , 20 , 469–474. [ Google Scholar ] [ CrossRef ]
  • Ibbs, C.W.; Kwak, Y.H.; Ng, T.; Odabasi, A.M. Project Delivery Systems and Project Change: Quantitative Analysis. J. Constr. Eng. Manag. 2003 , 129 , 382–387. [ Google Scholar ] [ CrossRef ]
  • Jansen, J.; Beck, A. Overcoming the Challenges of Large Diameter Water Project in North Texas via CMAR Delivery Method. In Proceedings of the Pipelines 2020, San Antonio, TX, USA, 9–12 August 2020; Conference Held Virtually. pp. 264–271. [ Google Scholar ] [ CrossRef ]
  • Bingham, E.; Gibson, G.E.; Asmar, M.E. Measuring User Perceptions of Popular Transportation Project Delivery Methods Using Least Significant Difference Intervals and Multiple Range Tests. J. Constr. Eng. Manag. 2018 , 144 , 04018033. [ Google Scholar ] [ CrossRef ]
  • Cho, Y.J. A review of construction delivery systems: Focus on the construction management at risk system in the Korean public construction market. KSCE J. Civ. Eng. 2016 , 20 , 530–537. [ Google Scholar ] [ CrossRef ]
  • Rosayuru, H.D.R.R.; Waidyasekara, K.G.A.S.; Wijewickrama, M.K.C.S. Sustainable BIM based integrated project delivery system for construction industry in Sri Lanka. Int. J. Constr. Manag. 2022 , 22 , 769–783. [ Google Scholar ] [ CrossRef ]
  • Pishdad-Bozorgi, P.; Beliveau, Y.J. Symbiotic Relationships between Integrated Project Delivery (IPD) and Trust. Int. J. Constr. Educ. Res. 2016 , 12 , 179–192. [ Google Scholar ] [ CrossRef ]
  • Sherif, M.; Abotaleb, I.; Alqahtani, F.K. Alqahtani, Application of Integrated Project Delivery (IPD) in the Middle East: Implementation and Challenges. Buildings 2022 , 12 , 467. [ Google Scholar ] [ CrossRef ]
  • Manata, B.; Garcia, A.J.; Mollaoglu, S.; Miller, V.D. The effect of commitment differentiation on integrated project delivery team dynamics: The critical roles of goal alignment, communication behaviors, and decision quality. Int. J. Proj. Manag. 2021 , 39 , 259–269. [ Google Scholar ] [ CrossRef ]
  • Kraatz, J.A.; Sanchez, A.X.; Hampson, K.D. Hampson, Digital Modeling, Integrated Project Delivery and Industry Transformation: An Australian Case Study. Buildings 2014 , 4 , 453–466. [ Google Scholar ] [ CrossRef ]
  • Zhang, L.; He, J.; Zhou, S. Sharing Tacit Knowledge for Integrated Project Team Flexibility: Case Study of Integrated Project Delivery. J. Constr. Eng. Manag. 2013 , 139 , 795–804. [ Google Scholar ] [ CrossRef ]
  • El Asmar, M.; Hanna, A.S.; Loh, W.-Y. Quantifying Performance for the Integrated Project Delivery System as Compared to Established Delivery Systems. J. Constr. Eng. Manag. 2013 , 139 , 04013012. [ Google Scholar ] [ CrossRef ]
  • Ghassemi, R.; Becerik-Gerber, B. Transitioning to integrated project delivery: Potential barriers and lessons learned. Lean Constr. J. 2011 , 32–52. Available online: https://leanconstruction.org/resources/lean-construction-journal/lcj-back-issues/2011-issue/ (accessed on 11 August 2024).
  • Mei, T.; Guo, Z.; Li, P.; Fang, K.; Zhong, S. Influence of Integrated Project Delivery Principles on Project Performance in China: An SEM-Based Approach. Sustainability 2022 , 14 , 4381. [ Google Scholar ] [ CrossRef ]
  • Ilozor, B.D.; Kelly, D.J. Building information modeling and integrated project delivery in the commercial construction industry: A conceptual study. J. Eng. Proj. Prod. Manag. 2012 , 2 , 23–36. [ Google Scholar ] [ CrossRef ]
  • Zabihi, H.; Habib, F.; Mirsaeedie, L. Sustainability in Building and Construction: Revising Definitions and Concepts. Int. J. Emerg. Sci. 2012 , 2 , 570–578. [ Google Scholar ]
  • Young, J.W.S. A Framework for the Ultimate Environmental Index—Putting Atmospheric Change Into Context With Sustainability. Environ. Monit. Assess. 1997 , 46 , 135–149. [ Google Scholar ] [ CrossRef ]
  • Ding, G.K.C. Sustainable construction—The role of environmental assessment tools. J. Environ. Manag. 2008 , 86 , 451–464. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Conte, E. The Era of Sustainability: Promises, Pitfalls and Prospects for Sustainable Buildings and the Built Environment. Sustainability 2018 , 10 , 2092. [ Google Scholar ] [ CrossRef ]
  • Standardized Method of Life Cycle Costing for Construction Procurement. A Supplement to BS ISO 15686-5. Buildings and Constructed Assets. Service Life Planning. Life Cycle Costing ; BSI British Standards: London, UK, 2008. [ CrossRef ]
  • Sustainability|Free Full-Text|A Hybrid Multi-Criteria Decision Support System for Selecting the Most Sustainable Structural Material for a Multistory Building Construction. Available online: https://www.mdpi.com/2071-1050/15/4/3128 (accessed on 2 April 2024).
  • Korkmaz, S.; Riley, D.; Horman, M. Piloting Evaluation Metrics for Sustainable High-Performance Building Project Delivery. J. Constr. Eng. Manag. 2010 , 136 , 877–885. [ Google Scholar ] [ CrossRef ]
  • Ng, M.S.; Graser, K.; Hall, D.M. Digital fabrication, BIM and early contractor involvement in design in construction projects: A comparative case study. Archit. Eng. Des. Manag. 2021 , 19 , 39–55. [ Google Scholar ] [ CrossRef ]
  • Moradi, S.; Kähkönen, K.; Sormunen, P. Analytical and Conceptual Perspectives toward Behavioral Elements of Collaborative Delivery Models in Construction Projects. Buildings 2022 , 12 , 316. [ Google Scholar ] [ CrossRef ]
  • Zupic, I.; Čater, T. Bibliometric Methods in Management and Organization. 2015. Available online: https://journals.sagepub.com/doi/abs/10.1177/1094428114562629 (accessed on 3 April 2024).
  • Rozas, L.W.; Klein, W.C. The Value and Purpose of the Traditional Qualitative Literature Review. J. Evid.-Based Soc. Work 2010 , 7 , 387–399. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Cobo, M.J.; López-Herrera, A.G.; Herrera-Viedma, E.; Herrera, F. Science mapping software tools: Review, analysis, and cooperative study among tools. J. Am. Soc. Inf. Sci. Technol. 2011 , 62 , 1382–1402. [ Google Scholar ] [ CrossRef ]
  • Cancino, C.A.; Merigó, J.M.; Coronado, F.C. A bibliometric analysis of leading universities in innovation research. J. Innov. Knowl. 2017 , 2 , 106–124. [ Google Scholar ] [ CrossRef ]
  • Pedro, L.F.M.G.; Barbosa, C.M.M.d.O.; Santos, C.M.d.N. A critical review of mobile learning integration in formal educational contexts. Int. J. Educ. Technol. High. Educ. 2018 , 15 , 10. [ Google Scholar ] [ CrossRef ]
  • Wen, S.; Tang, H.; Ying, F.; Wu, G. Exploring the Global Research Trends of Supply Chain Management of Construction Projects Based on a Bibliometric Analysis: Current Status and Future Prospects. Buildings 2023 , 13 , 373. [ Google Scholar ] [ CrossRef ]
  • Hosseini, M.R.; Martek, I.; Zavadskas, E.K.; Aibinu, A.A.; Arashpour, M.; Chileshe, N. Critical evaluation of off-site construction research: A Scientometric analysis. Autom. Constr. 2018 , 87 , 235–247. [ Google Scholar ] [ CrossRef ]
  • Toyin, J.O.; Mewomo, M.C. Mewomo, Overview of BIM contributions in the construction phase: Review and bibliometric analysis. J. Inf. Technol. Constr. 2023 , 28 , 500–514. [ Google Scholar ] [ CrossRef ]
  • Kahvandi, Z.; Saghatforoush, E.; Alinezhad, M.; Noghli, F. Integrated Project Delivery (IPD) Research Trends. J. Eng. 2017 , 7 , 99–114. [ Google Scholar ] [ CrossRef ]
  • Hale, D.R.; Shrestha, P.P.; Gibson, G.E.; Migliaccio, G.C. Empirical Comparison of Design/Build and Design/Bid/Build Project Delivery Methods. J. Constr. Eng. Manag. 2009 , 135 , 579–587. [ Google Scholar ] [ CrossRef ]
  • Mollaoglu-Korkmaz, S.; Swarup, L.; Riley, D. Delivering Sustainable, High-Performance Buildings: Influence of Project Delivery Methods on Integration and Project Outcomes. J. Manag. Eng. 2013 , 29 , 71–78. [ Google Scholar ] [ CrossRef ]
  • Ugwu, O.O.; Haupt, T.C. Key performance indicators and assessment methods for infrastructure sustainability—a South African construction industry perspective. Build. Environ. 2007 , 42 , 665–680. [ Google Scholar ] [ CrossRef ]
  • Kines, P.; Andersen, L.P.S.; Spangenberg, S.; Mikkelsen, K.L.; Dyreborg, J.; Zohar, D. Improving construction site safety through leader-based verbal safety communication. J. Safety Res. 2010 , 41 , 399–406. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Ballard, G. The Lean Project Delivery System: An Update. 2008. [ Google Scholar ]
  • Bynum, P.; Issa, R.R.A.; Olbina, S. Building information modeling in support of sustainable design and construction. J. Constr. Eng. Manag. 2013 , 139 , 24–34. [ Google Scholar ] [ CrossRef ]
  • Choudhry, R.M.; Fang, D.; Lingard, H. Measuring Safety Climate of a Construction Company. J. Constr. Eng. Manag. 2009 , 135 , 890–899. [ Google Scholar ] [ CrossRef ]
  • Wardani, M.A.E.; Messner, J.I.; Horman, M.J. Comparing procurement methods for Design-Build projects. J. Constr. Eng. Manag. 2006 , 132 , 230–238. [ Google Scholar ] [ CrossRef ]
  • Liu, J.; Zhao, X.; Yan, P. Risk Paths in International Construction Projects: Case Study from Chinese Contractors. J. Constr. Eng. Manag. 2016 , 142 . [ Google Scholar ] [ CrossRef ]
  • El-Sayegh, S. Evaluating the effectiveness of project delivery methods. J. Constr. Manag. Econ. 2008 , 23 , 457–465. [ Google Scholar ]
  • Fang, C.; Marle, F.; Zio, E.; Bocquet, J.-C. Network theory-based analysis of risk interactions in large engineering projects. Reliability Eng. Syst. Safety 2012 , 106 , 1–10. [ Google Scholar ] [ CrossRef ]
  • Franz, B.; Leicht, R.M. Initiating IPD Concepts on Campus Facilities with a ‘Collaboration Addendum’. In Proceedings of the Construction Research Congress 2012, West Lafayette, IN, USA, 21–23 May 2012; pp. 61–70. [ Google Scholar ] [ CrossRef ]
  • Kim, H.; Kim, K.; Kim, H. Vision-Based Object-Centric Safety Assessment Using Fuzzy Inference: Monitoring Struck-By Accidents with Moving Objects. J. Comput. Civil Eng. 2016 , 30 . [ Google Scholar ] [ CrossRef ]
  • Zhou, Y.; Ding, L.Y.; Chen, L.J. Application of 4D visualization technology for safety management in metro construction. Automation Constr. 2013 , 34 , 25–36. [ Google Scholar ] [ CrossRef ]
  • Wanberg, J.; Harper, C.; Hallowell, M.R.; Rajendran, S. Relationship between Construction Safety and Quality Performance. J. Constr. Eng. Manag. 2013 , 139 . [ Google Scholar ] [ CrossRef ]
  • Shrestha, P.P.; O’Connor, J.T.; Gibson, G.E. Performance comparison of large Design-Build and Design-Bid-Build highway projects. J. Constr. Eng. Manag. 2012 , 138 , 1–13. [ Google Scholar ] [ CrossRef ]
  • Torabi, S.A.; Hassini, E. Multi-site production planning integrating procurement and distribution plans in multi-echelon supply chains: An interactive fuzzy goal programming approach. Int. J. Prod. Res. 2009 , 47 , 5475–5499. [ Google Scholar ] [ CrossRef ]
  • Baradan, S.; Usmen, M. Comparative Injury and Fatality Risk Analysis of Building Trades. J. Constr. Eng. Manag.-ASCE 2006 , 132 . [ Google Scholar ] [ CrossRef ]
  • Levitt, R.E. CEM Research for the Next 50 Years: Maximizing Economic, Environmental, and Societal Value of the Built Environment. J. Constr. Eng. Manag. 2007 , 133 , 619–628. [ Google Scholar ] [ CrossRef ]
  • Araya, F. Modeling the spread of COVID-19 on construction workers: An agent-based approach. Saf. Sci. 2021 , 133 , 105022. [ Google Scholar ] [ CrossRef ]
  • Zheng, X.; Le, Y.; Chan, A.P.; Hu, Y.; Li, Y. Review of the application of social network analysis (SNA) in construction project management research. Int. J. Proj. Manag. 2016 , 34 , 1214–1225. [ Google Scholar ] [ CrossRef ]
  • Elghaish, F.; Abrishami, S. A centralised cost management system: Exploiting EVM and ABC within IPD. Eng. Constr. Archit. Manag. 2021 , 28 , 549–569. [ Google Scholar ] [ CrossRef ]
  • Smith, R.E.; Mossman, A.; Emmitt, S. Lean and integrated project delivery. Lean Constr. J. 2011 , 1–16. [ Google Scholar ]
  • Bröchner, J.; Badenfelt, U. Changes and change management in construction and IT projects. Autom. Constr. 2011 , 20 , 767–775. [ Google Scholar ] [ CrossRef ]
  • Monteiro, A.; Mêda, P.; Martins, J.P. Framework for the coordinated application of two different integrated project delivery platforms. Autom. Constr. 2014 , 38 , 87–99. [ Google Scholar ] [ CrossRef ]
  • Azhar, N.; Kang, Y.; Ahmad, I.U. Factors influencing integrated project delivery in publicly owned construction projects: An information modelling perspective. Procedia Eng. 2014 , 77 , 213–221. [ Google Scholar ] [ CrossRef ]
  • Mihic, M.; Sertic, J.; Zavrski, I. Integrated Project Delivery as Integration between Solution Development and Solution Implementation. Procedia Soc. Behav. Sci. 2014 , 119 , 557–565. [ Google Scholar ] [ CrossRef ]
  • Nawi, M.N.M.; Haron, A.T.; Hamid, Z.A.; Kamar, K.A.M.; Baharuddin, Y. Improving integrated practice through building information modeling-integrated project delivery (BIM-IPD) for Malaysian industrialised building system (IBS) Construction Projects. Malays. Constr. Res. J. 2014 , 15 , 29–38. Available online: https://dsgate.uum.edu.my/jspui/handle/123456789/1651 (accessed on 24 April 2024).
  • Ma, Z.; Zhang, D.; Li, J. A dedicated collaboration platform for Integrated Project Delivery. Autom. Constr. 2018 , 86 , 199–209. [ Google Scholar ] [ CrossRef ]
  • Yadav, S.; Kanade, G. Application of Revit as Building Information Modeling (BIM) for Integrated Project Delivery (IPD) to Building Construction Project—A Review. Int. Res. J. Eng. Technol. 2018 , 5 , 11–14. [ Google Scholar ]
  • Salim, M.S.; Mahjoob, A.M.R. Integrated project delivery (IPD) method with BIM to improve the project performance: A case study in the Republic of Iraq. Asian J. Civ. Eng. 2020 , 21 , 947–957. [ Google Scholar ] [ CrossRef ]
  • Ling, Y.Y.; Lau, B.S.Y. A case study on the management of the development of a large-scale power plant project in East Asia based on design-build arrangement. Int. J. Proj. Manag. 2002 , 20 , 413–423. [ Google Scholar ] [ CrossRef ]
  • Dalui, P.; Elghaish, F.; Brooks, T.; McIlwaine, S. Integrated Project Delivery with BIM: A Methodical Approach Within the UK Consulting Sector. J. Inf. Technol. Constr. 2021 , 26 , 922–935. [ Google Scholar ] [ CrossRef ]
  • Pishdad-Bozorgi, P. Case Studies on the Role of Integrated Project Delivery (IPD) Approach on the Establishment and Promotion of Trust. Int. J. Constr. Educ. Res. 2017 , 13 , 102–124. [ Google Scholar ] [ CrossRef ]
  • Singleton, M.S.; Hamzeh, F.R. Implementing integrated project delivery on department of the navy construction projects. Lean Constr. J. 2011 , 17–31. [ Google Scholar ]
  • Tran, D.Q.; Nguyen, L.D.; Faught, A. Examination of communication processes in design-build project delivery in building construction. Eng. Constr. Archit. Manag. 2017 , 24 , 1319–1336. [ Google Scholar ] [ CrossRef ]
  • Park, J.; Kwak, Y.H. Design-Bid-Build (DBB) vs. Design-Build (DB) in the U.S. public transportation projects: The choice and consequences. Int. J. Proj. Manag. 2017 , 35 , 280–295. [ Google Scholar ] [ CrossRef ]
  • Wiss, R.A.; Roberts, R.T.; Phraner, S.D. Beyond Design-Build-Operate-Maintain: New Partnership Approach Toward Fixed Guideway Transit Projects. Transp. Res. Rec. J. Transp. Res. Board 2000 , 1704 , 13–18. [ Google Scholar ] [ CrossRef ]
  • Xia, B.; Chan, A.P. Key competences of design-build clients in China. J. Facil. Manag. 2010 , 8 , 114–129. [ Google Scholar ] [ CrossRef ]
  • DeBernard, D.M. Beyond Collaboration—The Benefits of Integrated Project Delivery ; AIA Soloso Website: Washington, DC, USA, 2008. [ Google Scholar ]
  • Chen, Q.; Jin, Z.; Xia, B.; Wu, P.; Skitmore, M. Time and Cost Performance of Design–Build Projects. J. Constr. Eng. Manag. 2016 , 142 , 04015074. [ Google Scholar ] [ CrossRef ]
  • Xia, B.; Chan, P. Review of the design-build market in the People’s Republic of China. J. Constr. Procure. 2008 , 14 , 108–117. [ Google Scholar ]
  • Mcwhirt, D.; Ahn, J.; Shane, J.S.; Strong, K.C. Military construction projects: Comparison of project delivery methods. J. Facil. Manag. 2011 , 9 , 157–169. [ Google Scholar ] [ CrossRef ]
  • Minchin, R.E.; Li, X.; Issa, R.R.; Vargas, G.G. Comparison of Cost and Time Performance of Design-Build and Design-Bid-Build Delivery Systems in Florida. J. Constr. Eng. Manag. 2013 , 139 , 04013007. [ Google Scholar ] [ CrossRef ]
  • Adamtey, S.; Onsarigo, L. Effective tools for projects delivered by progressive design-build method. In Proceedings of the CSCE Annual Conference 2019, Laval, QC, Canada, 12–15 June 2019; pp. 1–10. [ Google Scholar ]
  • Adamtey, S.A. A Case Study Performance Analysis of Design-Build and Integrated Project Delivery Methods. Int. J. Constr. Educ. Res. 2021 , 17 , 68–84. [ Google Scholar ] [ CrossRef ]
  • Gad, G.M.; Adamtey, S.A.; Gransberg, D.D. Trends in Quality Management Approaches to Design–Build Transportation Projects. Transp. Res. Rec. J. Transp. Res. Board 2015 , 2504 , 87–92. [ Google Scholar ] [ CrossRef ]
  • Sari, E.M.; Irawan, A.P.; Wibowo, M.A.; Siregar, J.P.; Praja, A.K.A. Project delivery systems: The partnering concept in integrated and non-integrated construction projects. Sustainability 2022 , 15 , 86. [ Google Scholar ] [ CrossRef ]
  • Chakra, H.A.; Ashi, A. Comparative analysis of design/build and design/bid/build project delivery systems in Lebanon. J. Ind. Eng. Int. 2019 , 15 , 147–152. [ Google Scholar ] [ CrossRef ]
  • Perkins, R.A. Sources of Changes in Design–Build Contracts for a Governmental Owner. J. Constr. Eng. Manag. 2009 , 135 , 588–593. [ Google Scholar ] [ CrossRef ]
  • Palaneeswaran, E.; Kumaraswamy, M.M. Contractor Selection for Design/Build Projects. J. Constr. Eng. Manag. 2000 , 126 , 331–339. [ Google Scholar ] [ CrossRef ]
  • Chan, A.P.C. Evaluation of enhanced design and build system a case study of a hospital project. Constr. Manag. Econ. 2000 , 18 , 863–871. [ Google Scholar ] [ CrossRef ]
  • Shrestha, P.P.; Davis, B.; Gad, G.M. Investigation of Legal Issues in Construction-Manager-at-Risk Projects: Case Study of Airport Projects. J. Leg. Aff. Dispute Resolut. Eng. Constr. 2020 , 12 , 04520022. [ Google Scholar ] [ CrossRef ]
  • Marston, S. CMAR Project Delivery Method Generates Team Orientated Project Management with Win/Win Mentality. In Proceedings of the Pipelines 2020, San Antonio, TX, USA, 9–12 August 2020; pp. 167–170. [ Google Scholar ] [ CrossRef ]
  • Francom, T.; El Asmar, M.; Ariaratnam, S.T. Longitudinal Study of Construction Manager at Risk for Pipeline Rehabilitation. J. Pipeline Syst. Eng. Pract. 2017 , 8 , 04017001. [ Google Scholar ] [ CrossRef ]
  • Peña-Mora, F.; Tamaki, T. Effect of Delivery Systems on Collaborative Negotiations for Large-Scale Infrastructure Projects. J. Manag. Eng. 2001 , 17 , 105–121. [ Google Scholar ] [ CrossRef ]
  • Mahdi, I.M.; Alreshaid, K. Decision support system for selecting the proper project delivery method using analytical hierarchy process (AHP). Int. J. Proj. Manag. 2005 , 23 , 564–572. [ Google Scholar ] [ CrossRef ]
  • Randall, T.; Pool, S.; Limke, J.; Bradney, A. CMaR Delivery of Critical Water and Wastewater Pipelines. In Proceedings of the Pipelines 2020, San Antonio, TX, USA, 9–12 August 2020; Conference Held Virtually. pp. 280–289. [ Google Scholar ] [ CrossRef ]
  • Perrenoud, A.; Reyes, M.; Ghosh, S.; Coetzee, M. Collaborative Risk Management of the Approval Process of Building Envelope Materials. In Proceedings of the AEI 2017, Oklahoma City, OK, USA, 11–13 April 2017; pp. 806–816. [ Google Scholar ] [ CrossRef ]
  • Parrott, B.C.; Bomba, M.B. Integrated Project Delivery and Building Information Modeling: A New Breed of Contract. 2010. Available online: https://content.aia.org/sites/default/files/2017-03/Integrated%20project%20delivery%20and%20BIM-%20A%20new%20breed%20of%20contract.pdf (accessed on 18 November 2023).
  • Cheng, R. IPD Case Studies. Report. March 2012. Available online: http://conservancy.umn.edu/handle/11299/201408 (accessed on 1 May 2024).
  • Lee, H.W.; Anderson, S.M.; Kim, Y.-W.; Ballard, G. Advancing Impact of Education, Training, and Professional Experience on Integrated Project Delivery. Pract. Period. Struct. Des. Constr. 2014 , 19 , 8–14. [ Google Scholar ] [ CrossRef ]
  • Hoseingholi, M.; Jalal, M.P. Identification and Analysis of Owner-Induced Problems in Design–Build Project Lifecycle. J. Leg. Aff. Dispute Resolut. Eng. Constr. 2017 , 9 , 04516013. [ Google Scholar ] [ CrossRef ]
  • Öztaş, A.; Ökmen, Ö. Risk analysis in fixed-price design–build construction projects. Build. Environ. 2004 , 39 , 229–237. [ Google Scholar ] [ CrossRef ]
  • Lee, D.-E.; Arditi, D. Total Quality Performance of Design/Build Firms Using Quality Function Deployment. J. Constr. Eng. Manag. 2006 , 132 , 49–57. [ Google Scholar ] [ CrossRef ]
  • Garner, B.; Richardson, K.; Castro-Lacouture, D. Design-Build Project Delivery in Military Construction: Approach to Best Value Procurement. J. Adv. Perform. Inf. Value 2008 , 1 , 35–50. [ Google Scholar ] [ CrossRef ]
  • Graham, P. Evaluation of Design-Build Practice in Colorado Project IR IM(CX) 025-3(113) ; Colorado Department of Transportation: Denver, CO, USA, 2001. [ Google Scholar ]
  • Parami Dewi, A.; Too, E.; Trigunarsyah, B. Implementing design build project delivery system in Indonesian road infrastructure projects. In Innovation and Sustainable Construction in Developing Countries (CIB W107 Conference 2011) ; Uwakweh, B.O., Ed.; Construction Publishing House/International Council for Research and Innovation in Building and C: Hanoi, Vietnam, 2011; pp. 108–117. [ Google Scholar ]
  • Arditi, D.; Lee, D.-E. Assessing the corporate service quality performance of design-build contractors using quality function deployment. Constr. Manag. Econ. 2003 , 21 , 175–185. [ Google Scholar ] [ CrossRef ]
  • Rao, T. Is Design-Build Right for Your Next WWW Project? Presented at WEFTEC 2009, Water Environment Federation, January 2009; pp. 6444–6458. Available online: https://www.accesswater.org/publications/proceedings/-297075/is-design-build-right-for-your-next-www-project- (accessed on 3 April 2024).
  • Touran, A.; Molenaar, K.R.; Gransberg, D.D.; Ghavamifar, K. Decision Support System for Selection of Project Delivery Method in Transit. Transp. Res. Rec. 2009 , 2111 , 148–157. [ Google Scholar ] [ CrossRef ]
  • Culp, G. Alternative Project Delivery Methods for Water and Wastewater Projects: Do They Save Time and Money? Leadersh. Manag. Eng. 2011 , 11 , 231–240. [ Google Scholar ] [ CrossRef ]
  • Ling, F.Y.Y.; Poh, B.H.M. Problems encountered by owners of design–build projects in Singapore. Int. J. Proj. Manag. 2008 , 26 , 164–173. [ Google Scholar ] [ CrossRef ]
  • Pishdad-Bozorgi, P.; de la Garza, J.M. Comparative Analysis of Design-Bid-Build and Design-Build from the Standpoint of Claims. In Proceedings of the Construction Research Congress 2012, West Lafayette, IN, USA, 21–23 May 2012. [ Google Scholar ] [ CrossRef ]
  • Walewski, J.; Gibson, G.E., Jr.; Jasper, J. Project Delivery Methods and Contracting Approaches Available for Implementation by the Texas Department of Transportation. University of Texas at Austin. Center for Transportation Research. 2001. Available online: https://rosap.ntl.bts.gov/view/dot/14863 (accessed on 3 April 2024).
  • Alleman, D.; Antoine, A.; Gransberg, D.D.; Molenaar, K.R. Comparison of Qualifications-Based Selection and Best-Value Procurement for Construction Manager–General Contractor Highway Construction. 2017. Available online: https://journals.sagepub.com/doi/abs/10.3141/2630-08 (accessed on 2 April 2024).
  • Gransberg, N.J.; Gransberg, D.D. Public Project Construction Manager-at-Risk Contracts: Lessons Learned from a Comparison of Commercial and Infrastructure Projects. J. Leg. Aff. Dispute Resolut. Eng. Constr. 2020 , 12 , 04519039. [ Google Scholar ] [ CrossRef ]
  • Anderson, S.D.; Damnjanovic, I. Selection and Evaluation of Alternative Contracting Methods to Accelerate Project Completion ; The National Academies Press: Washington, DC, USA, 2008; Available online: http://elibrary.pcu.edu.ph:9000/digi/NA02/2008/23075.pdf (accessed on 26 April 2024).
  • Shrestha, P.P.; Batista, J.; Maharjan, R. Impediments in Using Design-Build or Construction Management-at-Risk Delivery Methods for Water and Wastewater Projects. In Proceedings of the Construction Research Congress 2016, San Juan, PR, USA, 31 May–2 June 2016; pp. 380–387. [ Google Scholar ] [ CrossRef ]
  • Chateau, L. Environmental acceptability of beneficial use of waste as construction material—State of knowledge, current practices and future developments in Europe and in France. J. Hazard. Mater. 2007 , 139 , 556–562. [ Google Scholar ] [ CrossRef ]
  • Lam, T.I.; Chan, H.W.E.; Chau, C.K.; Poon, C.S. An Overview of the Development of Green Specifications in the Construction Industry. In Proceedings of the International Conference on Urban Sustainability [ICONUS], 1 January 2008; pp. 295–301. Available online: https://research.polyu.edu.hk/en/publications/an-overview-of-the-development-of-green-specifications-in-the-con (accessed on 2 May 2024).
  • Tabish, S.Z.S.; Jha, K.N. Success Traits for a Construction Project. J. Constr. Eng. Manag. 2012 , 138 , 1131–1138. [ Google Scholar ] [ CrossRef ]
  • Niroumand, H.; Zain, M.; Jamil, M. A guideline for assessing of critical parameters on Earth architecture and Earth buildings as a sustainable architecture in various countries. Renew. Sustain. Energy Rev. 2013 , 28 , 130–165. [ Google Scholar ] [ CrossRef ]
  • Rogulj, K.; Jajac, N. Achieving a Construction Barrier–Free Environment: Decision Support to Policy Selection. J. Manag. Eng. 2018 , 34 , 04018020. [ Google Scholar ] [ CrossRef ]
  • Sackey, S.; Kim, B.-S. Environmental and Economic Performance of Asphalt Shingle and Clay Tile Roofing Sheets Using Life Cycle Assessment Approach and TOPSIS. J. Constr. Eng. Manag. 2018 , 144 , 04018104. [ Google Scholar ] [ CrossRef ]
  • Carretero-Ayuso, M.J.; García-Sanz-Calcedo, J.; Rodríguez-Jiménez, C.E. Characterization and Appraisal of Technical Specifications in Brick Façade Projects in Spain. J. Perform. Constr. Facil. 2018 , 32 , 04018012. [ Google Scholar ] [ CrossRef ]
  • Golabchi, A.; Guo, X.; Liu, M.; Han, S.; Lee, S.; AbouRizk, S. An integrated ergonomics framework for evaluation and design of construction operations. Autom. Constr. 2018 , 95 , 72–85. [ Google Scholar ] [ CrossRef ]
  • Jha, K.; Iyer, K. Commitment, coordination, competence and the iron triangle. Int. J. Proj. Manag. 2007 , 25 , 527–540. [ Google Scholar ] [ CrossRef ]
  • Tabassi, A.A.; Ramli, M.; Roufechaei, K.M.; Tabasi, A.A. Team development and performance in construction design teams: An assessment of a hierarchical model with mediating effect of compensation. Constr. Manag. Econ. 2014 , 32 , 932–949. [ Google Scholar ] [ CrossRef ]
  • Chen, Y.; Okudan, G.E.; Riley, D.R. Sustainable performance criteria for construction method selection in concrete buildings. Autom. Constr. 2010 , 19 , 235–244. [ Google Scholar ] [ CrossRef ]
  • Doloi, H.; Sawhney, A.; Iyer, K.; Rentala, S. Analysing factors affecting delays in Indian construction projects. Int. J. Proj. Manag. 2012 , 30 , 479–489. [ Google Scholar ] [ CrossRef ]
  • Kog, Y.C.; Loh, P.K. Critical Success Factors for Different Components of Construction Projects. J. Constr. Eng. Manag. 2012 , 138 , 520–528. [ Google Scholar ] [ CrossRef ]
  • Gunduz, M.; Almuajebh, M. Critical success factors for sustainable construction project management. Sustainability 2020 , 12 , 1990. [ Google Scholar ] [ CrossRef ]
  • Cao, D.; Li, H.; Wang, G.; Luo, X.; Tan, D. Relationship Network Structure and Organizational Competitiveness: Evidence from BIM Implementation Practices in the Construction Industry. J. Manag. Eng. 2018 , 34 , 04018005. [ Google Scholar ] [ CrossRef ]
  • Clevenger, C.M. Development of a Project Management Certification Plan for a DOT. J. Manag. Eng. 2018 , 34 , 06018002. [ Google Scholar ] [ CrossRef ]
  • Bygballe, L.E.; Swärd, A. Collaborative Project Delivery Models and the Role of Routines in Institutionalizing Partnering. Proj. Manag. J. 2019 , 50 , 161–176. [ Google Scholar ] [ CrossRef ]
  • Collins, W.; Parrish, K. The Need for Integrated Project Delivery in the Public Sector. In Proceedings of the Construction Research Congress 2014, Atlanta, GA, USA, 19–21 May 2014; pp. 719–728. [ Google Scholar ] [ CrossRef ]
  • Turk, Ž.; Klinc, R. Potentials of Blockchain Technology for Construction Management. Procedia Eng. 2017 , 196 , 638–645. [ Google Scholar ] [ CrossRef ]
  • Elghaish, F.; Abrishami, S.; Hosseini, M.R. Integrated project delivery with blockchain: An automated financial system. Autom. Constr. 2020 , 114 , 103182. [ Google Scholar ] [ CrossRef ]
  • Fish, A. Integrated Project Delivery: The Obstacles of Implementation. May 2011. Available online: http://hdl.handle.net/2097/8554 (accessed on 3 April 2024).
  • Pan, Y.; Zhang, L. Roles of artificial intelligence in construction engineering and management: A critical review and future trends. Autom. Constr. 2020 , 122 , 103517. [ Google Scholar ] [ CrossRef ]
  • Mellit, A.; Kalogirou, S.A. Artificial intelligence techniques for photovoltaic applications: A review. Prog. Energy Combust. Sci. 2008 , 34 , 574–632. [ Google Scholar ] [ CrossRef ]
  • Smith, C.J.; Wong, A.T.C. Advancements in Artificial Intelligence-Based Decision Support Systems for Improving Construction Project Sustainability: A Systematic Literature Review. Informatics 2022 , 9 , 43. [ Google Scholar ] [ CrossRef ]
  • Villa, F. Semantically driven meta-modelling: Automating model construction in an environmental decision support system for the assessment of ecosystem services flows. In Information Technologies in Environmental Engineering ; Athanasiadis, I.N., Rizzoli, A.E., Mitkas, P.A., Gómez, J.M., Eds.; Springer: Berlin, Heidelberg, 2009; pp. 23–36. [ Google Scholar ]
  • Minhas, M.R.; Potdar, V. Decision Support Systems in Construction: A Bibliometric Analysis. Buildings 2020 , 10 , 108. [ Google Scholar ] [ CrossRef ]


| Paper | Reference | Total Citation (TC) | TC per Year | Normalized TC |
|---|---|---|---|---|
| Kent D.C., 2010, J Constr Eng Manage | (Kent and Becerik-Gerber, 2010) | 300 | 21.43 | 7.67 |
| Ugwu O.O., 2007, Build Environ | (Ugwu and Haupt, 2007) | 269 | 15.82 | 7.69 |
| Kines P., 2010, J Saf Res | (Kines et al., 2010) | 238 | 17.00 | 6.08 |
| Asmar M., 2013, J Constr Eng Manag | (Asmar et al., 2013) | 226 | 20.55 | 5.01 |
| Ballard G., 2008, Lean Constr J | (Ballard, 2008) | 221 | 13.81 | 6.85 |
| Hale D.R., 2009, J Constr Eng Manag | (Hale et al., 2009) | 211 | 14.07 | 6.95 |
| Bynum P., 2013, J Constr Eng Manag | (Bynum et al., 2013) | 185 | 16.82 | 4.11 |
| Ibbs C.W., 2003, J Constr Eng Manag | (Ibbs et al., 2003) | 183 | 8.71 | 8.58 |
| Choudhry R.M., 2009, J Constr Eng Manag | (Choudhry et al., 2009) | 182 | 12.13 | 6.00 |
| Mollaoglu-Korkmaz S., 2013, J Manage Eng | (Mollaoglu-Korkmaz et al., 2013) | 152 | 13.82 | 3.37 |
| El Wardani M.A., 2006, J Constr Eng Manag | (El Wardani et al., 2006) | 144 | 8.00 | 4.65 |
| Ghassemi R., 2011, Lean Constr J | (Ghassemi and Becerik-Gerber, 2011) | 143 | 11.00 | 5.54 |
| Liu J., 2016, J Constr Eng Manag | (Liu et al., 2016) | 140 | 17.50 | 5.12 |
| El-Sayegh S.M., 2015, J Manag Eng | (El-Sayegh and Mansour, 2015) | 135 | 15.00 | 6.59 |
| Fang C., 2012, Reliab Eng Syst Saf | (Fang et al., 2012) | 131 | 10.92 | 4.05 |
| Franz B., 2017, J Constr Eng Manag | (Franz et al., 2017) | 126 | 18.00 | 5.56 |
| Kim H., 2016, J Comput Civ Eng | (Kim et al., 2016) | 125 | 15.63 | 4.57 |
| Ding L.Y., 2013, Autom Constr | (Ding and Zhou, 2013) | 118 | 10.73 | 2.62 |
| Wanberg J., 2013, J Constr Eng Manag | (Wanberg et al., 2013) | 116 | 10.55 | 2.57 |
| Shrestha P.P., 2012, J Constr Eng Manag | (Shrestha et al., 2012) | 112 | 9.33 | 3.47 |
| Torabi S.A., 2009, Int J Prod Res | (Torabi and Hassini, 2009) | 105 | 7.00 | 3.46 |
| Baradan S., 2006, J Constr Eng Manag | (Baradan and Usmen, 2006) | 99 | 5.50 | 3.20 |
| Levitt R.E., 2007, J Constr Eng Manag | (Levitt, 2007) | 97 | 5.71 | 2.77 |
| Sullivan J., 2017, J Constr Eng Manag | (Sullivan et al., 2017) | 93 | 13.29 | 4.11 |
| Araya F., 2021, Saf Sci | (Araya, 2021) | 92 | 30.67 | 9.5 |
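The TC per Year values in the table above are consistent with dividing each paper's total citations by the number of years between its publication and 2024. The sketch below reproduces a few rows under that assumption. The reference year of 2024 is inferred from the tabulated numbers rather than stated in the source, and the function name `tc_per_year` is purely illustrative. The Normalized TC column is not reproduced because its definition is not given here.

```python
# Assumption: 2024 is inferred from the tabulated values, not stated in the source.
REFERENCE_YEAR = 2024


def tc_per_year(total_citations: int, publication_year: int,
                reference_year: int = REFERENCE_YEAR) -> float:
    """Return total citations divided by years since publication, to 2 decimals."""
    years = reference_year - publication_year
    if years <= 0:
        raise ValueError("publication year must be earlier than the reference year")
    return round(total_citations / years, 2)


# (first author, publication year, total citations) copied from the table above
rows = [
    ("Kent D.C.", 2010, 300),
    ("Ugwu O.O.", 2007, 269),
    ("Liu J.", 2016, 140),
    ("Araya F.", 2021, 92),
]

for author, year, tc in rows:
    print(f"{author:<10} {year}  TC={tc:<4} TC/year={tc_per_year(tc, year):.2f}")
```

Under this reading, a recent paper such as Araya (2021) ranks last by total citations but first by citations per year, which is why both columns are reported.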
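| Country | Frequency |
|---|---|
| USA | 584 |
| CHINA | 167 |
| UK | 101 |
| AUSTRALIA | 71 |
| SOUTH KOREA | 56 |
| CANADA | 51 |
| IRAN | 39 |
| MALAYSIA | 39 |
| INDIA | 30 |
| SOUTH AFRICA | 22 |
| SPAIN | 22 |
| FINLAND | 18 |
| FRANCE | 17 |
| DENMARK | 16 |
| EGYPT | 16 |
| SWEDEN | 16 |
| INDONESIA | 15 |
| NETHERLANDS | 14 |
| NEW ZEALAND | 14 |
| BRAZIL | 13 |
| GERMANY | 13 |
| NIGERIA | 13 |
| UNITED ARAB EMIRATES | 13 |
| JORDAN | 12 |
| SAUDI ARABIA | 12 |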
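| Country | TC | Average Article Citations |
|---|---|---|
| USA | 4933 | 23.70 |
| CHINA | 1106 | 18.10 |
| UNITED KINGDOM | 763 | 19.10 |
| HONG KONG | 703 | 37.00 |
| AUSTRALIA | 494 | 21.50 |
| SOUTH KOREA | 312 | 16.00 |
| IRAN | 198 | 52.00 |
| SPAIN | 191 | 15.20 |
| SWEDEN | 188 | 21.20 |
| PAKISTAN | 182 | 20.90 |
| FRANCE | 164 | 182.00 |
| UNITED ARAB EMIRATES | 163 | 32.80 |
| MALAYSIA | 154 | 32.60 |
| INDIA | 145 | 15.40 |
| SINGAPORE | 130 | 13.20 |
| CANADA | 107 | 43.30 |
| ITALY | 92 | 7.60 |
| LEBANON | 92 | 18.40 |
| NETHERLANDS | 91 | 18.40 |
| NORWAY | 74 | 18.20 |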
IPD Advantages

| Advantages | Percentage of Advantages from Ordered List of Publication (%) | Publication List |
|---|---|---|
| Collaborative atmosphere and fairness | 79 | B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V |
| Early involvement of stakeholders | 63 | B, C, D, E, F, G, H, I, J, L, M, N, O, U, V, W |
| Promoting trust | 25 | R, S, U, V, W, X |
| Reduce schedule time | 42 | C, D, E, F, G, H, I, J, S, T |
| Reduce waste | 42 | C, D, E, F, G, H, I, J, S, T |
| Shared cost, risk reward, and responsibilities | 75 | C, D, E, F, G, H, I, J, S, T, U, V, W, X |
| Multi-party agreement and noncompetitive bidding | 54 | C, D, E, F, G, H, I, J, K, N, Q, T, V |
| Integrated decision-making for designs and shared design responsibilities | 38 | C, D, E, H, I, J, L, P, T |
| Open communication and time management | 38 | D, E, F, O, R, S, T, U, V |
| Reduce project duration and liability by fast-tracking design and construction | 25 | F, G, L, O, S, V |
| Shared manpower and changes in SOW, equipment rental, and change orders | 17 | A, F, G, Q |
| Information sharing and technological impact | 38 | A, D, G, K, L, M, P, R, V |
| Fast problem resolution through an integrated approach | 21 | B, C, D, E, S |
| Lowest cost delivery and project cost | 33 | A, C, F, G, L, P, Q, S, T, U |
| Improved efficiency and reduced errors | 29 | B, C, F, L, Q, S, T |
| Combined risk pool estimated maximum price (allowable cost) | 17 | A, L, P, Q |
| Cooperation innovation and coordination | 46 | C, E, F, L, P, Q, R, S, T, U, V |
| Combined labor material cost estimation, budgeting, and profits | 25 | A, D, P, S, T, U, V |
| Strengthened relationship and self-governance | 17 | C, D, F |
| Fewer change orders, schedules, and requests for information | 21 | L, O, Q, T, V |

Ordered list of publication: A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X
DB Advantages

| Advantages | Percentage of Advantages from Ordered List of Publication (%) | Publication List |
|---|---|---|
| Single point of accountability for the design and construction | 39 | C, D, I, J, M, O, Q, R, T |
| Produces time saving schedule | 52 | C, D, H, J, K, L, M, O, R, S, T, V |
| Cost effective projects | 39 | C, K, L, M, N, O, P, Q, S, V |
| Design build functions as a single entity | 8 | D, F |
| Enhances quality and mitigates design errors | 21 | F, J, S, V, W |
| Facilitates teamwork between owner and design builder | 30 | J, N, P, S, U, V, W |
| Insight into constructability of the design build contractor (early involvement of contractor) | 13 | H, I, T |
| Enhances fast tracking | 4 | R |
| Good coordination and decision-making | 27 | C, D, E, M, O, Q |
| Clients’ owner credibility | 13 | A, C, G |
| Dispute reduction (mitigates disputes) | 21 | B, H, I, J, Q |

Ordered list of publication: A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W
CMAR Advantages

| Advantages | Percentage of Advantages from the Ordered List of Publication (%) | Publication List |
|---|---|---|
| Early stakeholder involvement | 31 | H, I, L, M, O |
| Fast-tracking cost savings and delivery within budget | 50 | A, B, C, D, F, I, M, O |
| Reduce project duration by fast-tracking design and construction | 6 | C |
| Clients have control over the design details and early knowledge of costs | 50 | B, C, D, H, I, K, M, P |
| Mitigates against change order | 50 | A, C, E, H, I, K, M, P |
| Provides a GMP by considering the risk of price | 31 | A, B, C, M, O |
| Reduces design cost and redesigning cost | 25 | C, D, E, H |
| Facilitates schedule management | 75 | B, C, D, E, F, G, H, I, J, K, M, N |
| Facilitates cost control and transparency | 69 | C, D, E, F, G, H, I, J, K, M, N |
| Single point of responsibility for construction and joint team orientation for accountability | 44 | A, B, E, F, I, M, N |
| Facilitates collaboration | 25 | E, F, I, J |

Ordered list of publication: A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P
IPD Disadvantages

| Disadvantages | Percentage of Disadvantages from Ordered List of Publication (%) | Publication List |
|---|---|---|
| Impossibility of being sued internally over disputes and mistrust, alongside complexities in compensation and resource distribution | 42 | C, E, F, I, L |
| Skepticism of the added value of IPD and owners’ inability to tap into financial reserves from shared risk funds | 50 | E, F, G, J, K, L |
| Difficulty in deciding scope | 17 | A, H |
| Difficulty in deciding target cost/budgeting | 25 | A, D, H |
| Adversarial team relationships and legality issues | 50 | B, C, D, F, K, L |
| Immature insurance policy for IPD and uneasiness to produce a coordinating document | 25 | A, J, K |
| Fabricated drawings in place of engineering drawings because of too early interactions | 8 | F |
| High initial cost of investment in setting up IPD team and difficulty in replacing a member of IPD team | 16 | J, L |
| Inexperience in initiating/developing an IPD team and knowledge level | 16 | K, L |
| Low adoption of IPD due to cultural, financial, and technological barriers | 33 | E, F, K, L |
| High degree of risks amongst teams coming together for IPD and owners responsible for claims, damages, and expenses (liabilities) | 25 | D, F, L |
| Issues with poor collaboration | 8 | H |
| Non-adaptability to IPD environment | 42 | E, G, J, K, L |

Ordered list of publication: A, B, C, D, E, F, G, H, I, J, K, L
DB Disadvantages

| Disadvantages | Percentage of Disadvantages from Ordered List of Publication (%) | Publication List |
|---|---|---|
| Non-competitive selection of team not dependent on best designs of professionals and general contractors | 35 | B, C, D, E, G, I, J, K, L, M, O, P, Q, R, S |
| Deficient checks, balances, and insurance among the designer, general contractor, and owner | 30 | A, B, C, D, E, F, G, H, I, J, L, M, N, U, V |
| Unfair allocation of risk and high startup cost | 40 | R, C, S |
| Architect/Engineer (A/E) not related to clients/owners with no control over the design requirements. A/E has less control or influence over the final design and project requirements | 60 | C, D, E, F, G, H, I, J, S |
| Owner cannot guarantee the quality of the finished project | 35 | C, D, E, F, G, H, I, J, S |
| Difficulty in defining SOW, and alterations in the designs after the contract and during construction with decrease in time | 35 | C, D, E, F, G, H, I, J, K, M, N |
| Difficulty in providing track record for design and construction | 40 | C, D, E, F, G, H, I, J, K, N |
| Discrepancy in quality control and testing intensive of owner’s viewpoint | 25 | C, D, E, H, I, J, K, N |
| Delay in design changes, inflexibility, and the absence of a detailed design | 35 | D, E, F, O, R, S |
| Owner/client needs external support to develop SOW/preliminary design of the project | 10 | E, F, L, O, S |
| Increased labour costs and tender prices | 5 | A, F, G, Q |
| Guaranteed maximum price is established with incomplete designs and work requirement | 25 | A, D, G, K, L, M, P, R |
| Responsibility of contractor for omission and changes in design | 20 | A, B, C, D, S |

Ordered list of publication: A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S
CMAR Disadvantages

| Disadvantages | Percentage of Disadvantages from Ordered List of Publication (%) | Publication List |
|---|---|---|
| Unclear definition and relationship of roles and responsibilities of CM and design professionals | 78 | A, B, C, D, G, H, I |
| Difficult to enforce GMP, SOW, and construction based on incomplete documents | 67 | A, D, E, G, H, I |
| Not suitable for small projects or hold trade contractors over GMP tradeoffs and prices | 56 | B, C, G, H, I |
| Improper education on CMAR methodology, policies, and regulations | 56 | E, F, G, H, I |
| Knowledge, conflicts, and communication issues between the designer and the CM | 56 | B, E, F, G, H |
| Shift of responsibilities (including money) from owners/clients to CM | 44 | A, B, E, I |
| Additional cost due to design and construction and design defects | 56 | A, C, D, G, H |
| Inability of CMAR to self-perform during preconstruction | 11 | C |
| Disputes/issues concerning construction quality and the completeness of the design | 22 | A, D |
| No information exchange/alignment between the A/E with the CMAR | 11 | A |

Ordered list of publication: A, B, C, D, E, F, G, H, I
Critical Success Factors for Sustainable Construction
Advantages | Percentage of Advantages from Ordered List of Publication (%) | Publication List
Collaborative atmosphere | 47 | A = [ ] C = [ ] G = [ ] H = [ ] K = [ ] N = [ ] O = [ ]
Early stakeholder involvement | 26 | N = [ ] J = [ ] I = [ ]
Reduce design errors | 13 | N = [ ] O = [ ]
Cost savings and delivery within budget/Client representative | 33 | ABCEF A = [ ] B = [ ] C = [ ]
Influence of client | 13 | B = [ ] J = [ ]
Ordered list of publication A = [ ] B = [ ] C = [ ] D = [ ] E = [ ] F = [ ] G = [ ] H = [ ] I = [ ] J = [ ] K = [ ] L = [ ] M = [ ] N = [ ] O = [ ] P = [ ] Q = [ ]

Share and Cite

Babalola, O.G.; Alam Bhuiyan, M.M.; Hammad, A. Literature Review on Collaborative Project Delivery for Sustainable Construction: Bibliometric Analysis. Sustainability 2024, 16, 7707. https://doi.org/10.3390/su16177707


  • Review Article
  • Open access
  • Published: 03 September 2024

Financial fraud detection through the application of machine learning techniques: a literature review

  • Ludivia Hernandez Aros   ORCID: orcid.org/0000-0002-1571-3439 1 ,
  • Luisa Ximena Bustamante Molano   ORCID: orcid.org/0009-0001-2038-8730 2 ,
  • Fernando Gutierrez-Portela   ORCID: orcid.org/0000-0003-3722-3809 2 ,
  • John Johver Moreno Hernandez   ORCID: orcid.org/0000-0002-8742-7781 1 &
  • Mario Samuel Rodríguez Barrero   ORCID: orcid.org/0000-0001-9356-6764 3  

Humanities and Social Sciences Communications volume 11, Article number: 1130 (2024)


  • Business and management

Financial fraud negatively impacts organizational administrative processes, particularly affecting owners and/or investors seeking to maximize their profits. Addressing this issue, this study presents a literature review on financial fraud detection through machine learning techniques. The PRISMA and Kitchenham methods were applied, and 104 articles published between 2012 and 2023 were examined. These articles were selected based on predefined inclusion and exclusion criteria and were obtained from databases such as Scopus, IEEE Xplore, Taylor & Francis, SAGE, and ScienceDirect. These selected articles, along with the contributions of authors, sources, countries, trends, and datasets used in the experiments, were used to detect financial fraud and its existing types. Machine learning models and metrics were used to assess performance. The analysis indicated a trend toward using real datasets. Notably, credit card fraud detection models are the most widely used for detecting credit card loan fraud. The information obtained by different authors was acquired from the stock exchanges of China, Canada, the United States, Taiwan, and Tehran, among other countries. Furthermore, the usage of synthetic data has been low (less than 7% of the employed datasets). Among the leading contributors to the studies, China, India, Saudi Arabia, and Canada remain prominent, whereas Latin American countries have few related publications.


Introduction

Financial fraud represents a highly significant problem, resulting in grave consequences across business sectors and impacting people’s daily lives (Singh et al., 2022 ). Its occurrence leads to reduced confidence in the economy, resulting in destabilization and direct economic repercussions for stakeholders (Reurink, 2018 ). Abdallah et al. ( 2016 ) define fraud as a criminal act aimed at obtaining money unlawfully. There are diverse types of fraud, such as asset misappropriation, expense reimbursement, and financial statement manipulation. Scholars have classified fraud into three categories: banking, corporate, and insurance (Ali et al., 2022 ; Nicholls et al., 2021 ; West and Bhattacharya, 2016 ).

The scale of the problem is evident in the 2022 PricewaterhouseCoopers survey report, which found that 56% of companies globally have fallen victim to some form of fraud. In Latin America, 32% of companies have experienced fraud (PricewaterhouseCoopers, 2022). These alarming statistics align with the findings from Klynveld Peat Marwick Goerdeler (KPMG), indicating that 83% of the surveyed executives reported being targeted by cyber-attacks in the past 12 months. Furthermore, 71% had encountered some type of internal or external fraud (KPMG, 2022). These survey results reveal the higher risks of financial fraud faced by companies in Latin America, the United States, and Canada. In this context, traditional approaches and techniques, as well as manual methods, have lost relevance and effectiveness because they cannot address the complexity and scale of the information involved in detecting financial fraud.

As previously mentioned, despite the interest of organizations in detecting financial fraud using machine learning (ML), current knowledge in this field remains limited. After an initial research phase, specialized literature shows that most researchers have directed their efforts toward the analysis of credit card fraud using a supervised approach (Femila Roseline et al., 2022 ; Madhurya et al., 2022 ; Plakandaras et al., 2022 ; Saragih et al., 2019 ). In the studies of Ali et al. ( 2022 ), Hilal et al. ( 2022 ), and Ramírez-Alpízar et al. ( 2020 ), ML techniques employing the supervised approach were found to be the most widely used method for detecting financial fraud, compared to the unsupervised, deep learning, reinforcement, and semi-supervised approaches, among others. Moreover, scholars such as Whiting et al. ( 2012 ) have compared the performance of data mining models for detecting fraudulent financial statements using data from quarterly and annual financial indexes of public companies from the COMPUSTAT database.

Reurink ( 2018 ) has analyzed financial fraud resulting from false financial reports, scams, and misleading financial sales in the context of the financial market. Just like Wadhwa et al. ( 2020 ), he presented a wide variety of data mining methods, approaches, and techniques used in fraud detection, in addition to research addressing online banking fraud (Zhou et al., 2018 ; Moreira et al., 2022 ; Srokosz et al., 2023 ) and financial statement fraud (S. Chen, 2016 ; Ramírez-Alpízar et al., 2020 ). The abovementioned research works show that the accuracy of ML techniques in developing models for detecting financial fraud has increased (Al-Hashedi and Magalingam, 2021 ).

The effectiveness of financial fraud detection and prevention depends on selecting appropriate ML techniques to identify new threats and minimize false fraud alarms, in response to the negative impact of financial fraud on organizations (Ahmed et al., 2016). The use of ML techniques has made it possible to identify patterns and anomalies in large financial data sets. However, developments in detection tools, inaccurate classification, detection methods, privacy, computer performance, and disproportionate misclassification costs continue to hinder the accurate and timely detection of financial fraud (Dantas et al., 2022; Mongwe and Malan, 2020; Nicholls et al., 2021; West and Bhattacharya, 2016).

Recently, several studies have reviewed financial statement fraud detection methods in data mining and ML (Gupta and Mehta, 2021 ; Shahana et al., 2023 ); however, the present study is different from these past works in the area. These authors established the types of financial fraud and the different data mining techniques and approaches used to detect financial statement fraud. In contrast, our study explains the trends in the use of ML approaches and techniques to detect financial fraud, and it presents the more frequently used datasets in the literature for conducting experiments.

Fraud detection mechanisms using machine learning techniques help detect unusual transactions and prevent cybercrime (Polak et al., 2020). Although each of these approaches uses different methods in its experimentation, a systematic literature review (SLR) shows that each algorithm is evaluated with performance metrics that quantify how accurately it predicts whether a financial transaction is fraudulent. Such metrics include Accuracy, Precision, F1 Score, Recall, and Sensitivity, among others.
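
As a minimal sketch of how the metrics named above are computed, the following Python function derives them from the four confusion-matrix counts of a binary fraud classifier. The function name and the example counts are hypothetical and are not taken from any of the reviewed studies. Note that Recall and Sensitivity are two names for the same quantity, so a single value covers both.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall (sensitivity), and F1 from confusion-matrix counts.

    tp: fraudulent transactions correctly flagged
    fp: legitimate transactions wrongly flagged
    fn: fraudulent transactions missed
    tn: legitimate transactions correctly passed
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0  # also called sensitivity
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts: 1,000 transactions, 50 of which are fraudulent.
metrics = classification_metrics(tp=40, fp=20, fn=10, tn=930)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With heavily imbalanced data such as this example, accuracy stays high even when a fifth of the fraud cases are missed, which is one reason studies tend to report precision, recall, and F1 alongside it.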

The research presented uses a rigorous and well-structured methodology to expand current knowledge on financial fraud detection using machine learning (ML) techniques. Through the use of a systematic literature review that follows adaptations of PRISMA guidelines and Kitchenham’s methodology, the study ensures a carefully planned and transparent review process. The sources of information consulted include research articles published in reputable academic databases such as Scopus, IEEE Xplore, Taylor & Francis, SAGE, and ScienceDirect, ensuring that the review covers the most relevant and quality scientific literature in the field of financial fraud and machine learning. Moreover, the study includes a bibliometric analysis using VOSviewer software, which allows identifying trends and patterns within the literature both quantitatively and visually. Based on the 104 articles reviewed, which cover the period 2012–2023, we manage to describe the types of fraud, the models applied, the ML techniques used, the datasets employed, and the metrics of performance reported. These contribute to filling the existing gaps in the literature by providing a comprehensive and up-to-date synthesis of the evidence on the use of machine learning techniques for financial fraud detection, thus laying the groundwork for future research and practical applications in this field.

Our responses to the initial research questions yield four main contributions that justify this research. Thus, this study contributes to the literature on financial fraud detection by examining the relationship between the current literature on financial fraud detection and ML based on the scholars, articles, countries, journals, and trends in the area. Fraud has been classified as internal and external, with a focus on credit card loan fraud investigations and insurance fraud. The different ML techniques and their models applied to experiments were grouped. The most widely used datasets in financial fraud detection using ML are analyzed according to the 86 articles that contained experiments, highlighting that most of them involve real data. This paper is useful for researchers because it studies and presents the metrics used in supervised and unsupervised learning experiments, providing a clear view of their application in the different models.

Therefore, this study is relevant because it presents in a consolidated and updated manner new contributions derived from experiment results regarding the use of ML, which helps address the problem when financial fraud occurs.

The research work is organized as follows: the section “Methods” comprehensively describes the research method and the questions addressed in the study. Section “Results of the data synthesis” presents the findings encompassing authors, articles, sources, countries, trends, financial fraud types, and datasets with their characteristics to which the detection models using ML techniques were applied, with the results of their metrics. Finally, the section “Discussion and conclusion” highlights the conclusions, including future lines of research in the field.

The study focuses on SLR, which provides a comprehensive view of the great developments in financial fraud detection. Considering the purpose, scientific guidelines were followed in the literature review of the PRISMA and Kitchenham methods, which were adapted by the authors (Ashtiani and Raahemi, 2022 ; Kitchenham and Brereton, 2013 ; Kitchenham and Stuart, 2007 ; Kumbure et al., 2022 ; Moher et al., 2009 ; Roehrs et al., 2017 ; Saputra et al., 2023 ; Wohlin, 2014 ).

The method used in the SLR was developed with carefully planned and executed activities: (a) planning of the review, (b) definition of research questions, (c) description of the search strategy, (d) consultation concerning the search strategy, (e) selection of the inclusion/exclusion criteria and data selection, (f) description of the quality assessment, (g) investigation of the study topics, (h) description of data extraction, and (i) synthesis of the data.

Each of the activities conducted in this study is explained below.

Planning of the review

The research purpose was established in accordance with the indicated research goals and questions. The analysis focused on research articles published between 2012 and 2023, particularly those using ML methods for financial fraud detection. Accordingly, the SLR procedure presented by Kitchenham and Stuart ( 2007 ) and Moher et al. ( 2009 ) was implemented following a series of steps adapted and modified by Ashtiani and Raahemi ( 2022 ) and Kumbure et al. ( 2022 ), as depicted in Fig. 1 . Thus, it was possible to ensure a rigorous and objective analysis of the available literature in our field of interest.

Fig. 1. Description of the general process used to review the literature in the study area. Authors’ own elaboration.

The procedures implemented in this review process are discussed in the following subsections.

Definition of research questions

In SLR, research questions are key and decisive for the success of the study (Kitchenham and Stuart, 2007 ). Therefore, analyzing the existing literature on financial fraud detection through ML techniques and its characteristics, problems, challenges, solutions, and research trends is crucial. Table 1 describes the research questions to provide a structured framework for the study.

Within the proposed systematic review, the questions were fine-tuned, achieving a better classification and thematic analysis. The research questions were categorized into two groups: general questions (GQ) and specific questions (SQ). GQs provide an overview of the current state of the art, that is, a general framework for future research. Meanwhile, SQs focus on specific matters emerging from the application areas of the topic, thereby improving the filtering process of the study.

Description of the search strategy

The search strategy was designed to identify a set of studies addressing the research questions posed. This strategy was implemented in two stages. In the first stage, a manual search was conducted by selecting a set of test documents through a defined database. Following the strategy proposed by Wohlin (2014), a snowball search was conducted. This approach involved starting from a set of initial references (e.g., relevant articles or books addressing the subject matter) and using them to search for new related references relevant to the study.

In the second stage, an automated search was performed using the technique described by Kitchenham and Brereton ( 2013 ), which included preparing a list of the main search terms to be applied in the queries in each database, as indicated in subsection “Search queries”.

Manual search

In the study’s initial stage, nine journal articles were selected from the test set of papers (Ahmed et al., 2016 ; Ali et al., 2022 ; Bakumenko and Elragal, 2022 ; Gupta and Mehta, 2021 ; Hilal et al., 2022 ; Nicholls et al., 2021 ; Nonnenmacher and Marx Gómez, 2021 ; Ramírez-Alpízar et al., 2020 ; West and Bhattacharya, 2016 ). The manual literature search helped identify articles related to financial fraud detection through ML techniques, which were used as an initial set and were part of the final analysis. In the subsequent stage, a backward and forward snowball search was conducted. This approach involved using the initial set to select the relevant articles.

The backward snowball search process comprised reviewing article titles, including those meeting the inclusion and exclusion criteria. In the forward snowball search, the analysis was performed in the Scopus database to identify studies citing one or more of the articles in the initial set. This filtering method helped identify studies meeting the inclusion and exclusion criteria, eliminate duplicates from the previous set, and analyze articles answering the questions posed, which were retained in the final study set.
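
The backward and forward snowball steps described above can be sketched as set operations over a citation map. This is an illustrative sketch under stated assumptions, not the authors' procedure: the `references` dictionary, the paper identifiers, and the year-based `meets_criteria` predicate are all hypothetical, and in practice the forward step was carried out through Scopus rather than a local data structure.

```python
def snowball(seed_ids, references, meets_criteria):
    """Run one round of backward and forward snowballing.

    references maps each paper id to the set of paper ids it cites.
    meets_criteria is a predicate applying the inclusion/exclusion criteria.
    """
    seeds = set(seed_ids)

    # Backward: every paper that appears in a seed paper's reference list.
    backward = set()
    for paper in seeds:
        backward |= references.get(paper, set())

    # Forward: every paper whose reference list contains at least one seed paper.
    forward = {paper for paper, cited in references.items() if cited & seeds}

    # Merge both directions and drop anything already in the initial set.
    candidates = (backward | forward) - seeds
    return {paper for paper in candidates if meets_criteria(paper)}


# Hypothetical citation data for illustration only.
references = {
    "seed1": {"old1", "old2"},
    "seed2": {"old2"},
    "new1": {"seed1"},
    "new2": {"seed1", "seed2"},
    "other": {"old1"},
}
years = {"old1": 2009, "old2": 2015, "new1": 2021, "new2": 2023, "other": 2020}

found = snowball(
    ["seed1", "seed2"],
    references,
    meets_criteria=lambda paper: 2012 <= years.get(paper, 0) <= 2023,
)
print(sorted(found))
```

Using sets makes the duplicate removal mentioned in the text implicit: a paper reached through both directions, or through several seed papers, appears only once in the result.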

Automated search

The research work mainly aimed to obtain a reliable set of relevant studies to minimize bias and increase the validity of the results. To this end, a manual search for articles meeting the inclusion and exclusion criteria was first conducted by assessing the abstracts and other sections of articles. To complement it, we implemented an automated search strategy, with the inclusion and exclusion criteria already defined, using five databases known for their impartiality in the representation of research works: Scopus, IEEE Xplore, Taylor & Francis, SAGE, and ScienceDirect. Thus, 104 related articles meeting the established criteria were identified and formed the final set.

Search queries

Studies from 2012 onward were reviewed with keywords such as “financial fraud” and “machine learning” to identify model-based approaches and associated techniques. Table 2 presents a summary of the queries used in each data source.
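
To illustrate how such keywords are typically combined into a database query, the sketch below builds a generic Boolean search string. It is an assumption-laden example: the helper name `build_query` is hypothetical, the actual queries are those listed in Table 2 and are not reproduced here, and each database applies its own field codes and year filters on top of a string like this.

```python
def build_query(concepts):
    """Build a Boolean query: OR within each synonym group, AND between groups."""
    groups = []
    for synonyms in concepts:
        quoted = " OR ".join(f'"{term}"' for term in synonyms)
        groups.append(f"({quoted})")
    return " AND ".join(groups)


# The two keywords named in the text, one per concept group.
query = build_query([["financial fraud"], ["machine learning"]])
print(query)  # ("financial fraud") AND ("machine learning")
```

Keeping each concept as a list makes it easy to add synonyms later without changing the structure of the query.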

Inclusion and exclusion criteria and study selection

The study established inclusion and exclusion criteria, a key process to select the most relevant articles. Documents published between 2012 and 2023 (until March) were considered, and document types such as conference reviews, book chapters, editorials, and reviews were excluded. Further, the availability of the full text of the article was considered. We decided to exclude articles published before 2012 for the following reasons: (i) they were over 11 years old; (ii) relevant publications prior to 2012 were scarce; and (iii) a sufficient number of articles were available between 2012 and 2023.

For the inclusion and exclusion criteria, appropriate filtering tools were applied to each data source during the search stage. This enabled the automated selection of the most relevant and appropriate studies based on the research goal.
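
A filter of this kind can be sketched as a simple predicate over bibliographic records. The sketch assumes one reading of the criteria above: publication year within 2012 to 2023, full text available, and document type not among conference reviews, book chapters, editorials, and reviews. The `Record` fields and sample records are hypothetical, and the check works at year granularity only, so it does not model the March 2023 cutoff.

```python
from dataclasses import dataclass

# Document types excluded under the assumed reading of the criteria.
EXCLUDED_TYPES = {"conference review", "book chapter", "editorial", "review"}


@dataclass
class Record:
    title: str
    year: int
    doc_type: str
    full_text_available: bool


def is_included(record):
    """Return True when a record passes all three screening checks."""
    in_window = 2012 <= record.year <= 2023
    allowed_type = record.doc_type.lower() not in EXCLUDED_TYPES
    return in_window and allowed_type and record.full_text_available


# Hypothetical records for illustration only.
records = [
    Record("Study A", 2019, "Article", True),
    Record("Study B", 2010, "Article", True),         # published before 2012
    Record("Study C", 2021, "Book chapter", True),    # excluded document type
    Record("Study D", 2022, "Article", False),        # full text unavailable
]
selected = [r.title for r in records if is_included(r)]
print(selected)
```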

Data processing strategies

In the data processing strategy used, databases were selected following strict inclusion and exclusion criteria to ensure the quality and relevance of the information collected (Table 3). The initial search identified the following number of relevant articles in each database: Scopus (28), Taylor & Francis (80), SAGE (71), ScienceDirect (663), and IEEE Xplore (5132). This initial step provides a broad overview of the available literature in the field of financial fraud detection using ML models.

Subsequently, a removal phase was carried out to ensure the integrity of the data collected and avoid redundancy. The following number of articles (given in parentheses) were removed from each database: Scopus (0), Taylor & Francis (63), SAGE (57), ScienceDirect (636), and IEEE Xplore (5114).

The final step consisted of obtaining the consolidated number of articles included after the selection and exclusion of duplicates: Scopus (28), Taylor & Francis (17), SAGE (14), ScienceDirect (27), and IEEE Xplore (18), for a total of 104 articles. This methodological strategy ensured the relevance of the articles that carried out a complete analysis in the field of financial fraud detection using ML models.
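
The three sets of counts reported above can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures given in the text; the variable names are illustrative.

```python
# Counts per database as reported in the text.
identified = {"Scopus": 28, "Taylor & Francis": 80, "SAGE": 71,
              "ScienceDirect": 663, "IEEE Xplore": 5132}
removed = {"Scopus": 0, "Taylor & Francis": 63, "SAGE": 57,
           "ScienceDirect": 636, "IEEE Xplore": 5114}
reported_included = {"Scopus": 28, "Taylor & Francis": 17, "SAGE": 14,
                     "ScienceDirect": 27, "IEEE Xplore": 18}

# Included = identified - removed, computed per database.
included = {db: identified[db] - removed[db] for db in identified}

print(included == reported_included)  # True
print(sum(identified.values()))       # 5974
print(sum(removed.values()))          # 5870
print(sum(included.values()))         # 104
```

The per-database differences match the reported figures, and their sum equals the 104 articles in the final set.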

Quality assessment

Once the inclusion and exclusion criteria were applied, the remaining articles were assessed for quality. The evaluation criteria used included the purpose of the research; contextualization; literature review; and related works, methods, conclusions, and results. To minimize the empirical obstacles associated with full-text filtering, a set of questions proposed by Roehrs et al. ( 2017 ) (see Table 4 ) was used to validate whether the selected articles met the previously established quality criteria.

Research topics

In conducting the literature review to understand the current state of published research on the topic, a data orientation process was addressed, including preprocessing techniques and ML models and their metrics. Accordingly, four research topics were defined based on the research goals. They are presented in Table 5 .

Data extraction

For data extraction, the necessary attributes were first defined and the information pertaining to the study goals was summarized. Next, the relevant information was identified and obtained through a detailed reading of the full text of each article. The information was then stored in a Microsoft Excel spreadsheet. Data were collected on the attributes specified in Table 6 . In Table 6 , the “Study” column corresponds to the identifiers of the research topics in Quality Assessment, and the “Subject” column refers to the category to which the different attributes belong. The names of the attributes and a brief description are presented in the last two columns of the table, including additional columns with relevant information.

Data synthesis

Data synthesis included analyzing and summarizing the information observed in the selected articles to address the research questions. To perform this task, a synthesis was conducted following the guidelines proposed by Moher et al. ( 2009 ) based on qualitative data. Further, a descriptive analysis was performed to obtain answers to the research questions. Consequently, a qualitative approach to data evidence was followed.

Results of the data synthesis

In this section, the 104 finally selected articles have been considered. The data were synthesized to address the five research questions mentioned.

General questions (GQ)

GQ1: Which were the most relevant authors, articles, sources, countries, and trends in the literature review on financial fraud detection based on the application of machine learning (ML) models?

The literature on financial fraud detection applying ML models has been studied by a large number of authors. However, some authors stood out in terms of the number of published papers and number of citations. Specifically, the most significant authors with two publications are Ahmed M. (with 318 citations), Ileberi E. (82 citations), Ali A. (20 citations), Chen S. (84 citations), and Domashova J and Kripak E. (each with 6 citations). Other relevant authors with one publication and who have been cited several times are Abdallah A. (with 333 citations), Abbasimehr H. (18 citations), Abd Razak S. (13 citations), Achakzai M. A. K. (5 citations), and Abosaq H. (2 citations). The aforementioned authors have contributed significantly to the development of research in financial fraud detection using ML models (Fig. 2 ).

Fig. 2. Analysis of the connections between authors based on co-authorship of publications. Produced with VOSviewer.

Collectively, the researchers have contributed a solid knowledge base and have laid the foundation for future research in financial fraud detection using ML models. Although other researchers contributed to the field, such as Khan, S. and Mishra, B., both with 7 citations, among others, some have been more prominent in terms of the number of papers published. Their collective works have enriched the field and have promoted a greater understanding of the challenges and opportunities in this area.

As depicted in Fig. 3 , clusters 2 (green) and 4 (yellow) present the most relevant research articles on financial fraud detection using ML models. Cluster 2, comprising 9 articles with 357 citations and 32 links, is highlighted because of the significant impact of the articles by Sahin, Huang, and Kim. These articles have the highest number of citations and are deemed to be useful starting points for those intending to dive into this research field. Cluster 4, constituting 6 articles with 158 citations and 27 links, includes the works of Dutta and Kim, who have also been cited considerably.

Fig. 3. Connections between articles based on their bibliographic references. Produced with VOSviewer.

Articles in clusters 1 (red) and 3 (dark blue) could be valuable sources of information; however, they were observed to have a lower number of citations and links than those in clusters 2 and 4, such as that of Nian K. (62 citations and 4 links) and Olszewski (92 citations and 4 links). However, some articles in these clusters have had a substantial number of citations.

In Cluster 10 (pink), the article by Reurink A. is prominent, with 38 citations. This is followed by the article by Ashtiani M.N. with 10 citations. In Cluster 11 (light green), the article by Hájek P. has 129 citations. In Cluster 12 (grayish blue), the articles by Blaszczynski J. and Elshaar S. have the greatest number of citations, indicating their influence in the field of financial fraud detection.

In Cluster 13 (light brown), the article by Pourhabibi T. has the greatest number of citations at 102, suggesting that it has been relevant in the research on financial fraud detection. Finally, in Cluster 14 (purple), the article by Seera M. has 63 citations and 2 links. The article by Ileberi E. has 11 citations and 1 link. Both articles have a small number of citations, indicating a lower influence on the topic.

In conclusion, clusters 2, 4, and 11 are the most relevant in this literature review. The articles by Sahin, Huang, Kim, Dutta, and Pumsirirat are the most influential ones in the research on financial fraud detection through the application of ML models.

The information presented in Fig. 4 is the result of a clustering analysis of the articles resulting from the literature review on financial fraud detection by implementing ML models. In total, 48 items were identified and grouped into 12 clusters. The links between the items were 100, with a total link strength of 123.

Fig. 4. Relationship between different scientific journals based on bibliographic links. Produced with VOSviewer.

The following is a description of each cluster with its respective number of items, links, and total link strength (the number of times a link appears between two items and its strength):

Cluster 1 (6 articles—red): This cluster includes journals such as Computers and Security , Journal of Network and Computer Applications , and Journal of Advances in Information Technology . The total number of links is 27, and the total link strength is 32.

Cluster 2 (6 articles—dark green): This cluster includes articles from Technological Forecasting and Social Change , Journal of Open Innovation: Technology, Market, and Complexity , and Global Business Review . The total number of links is 18, and the total link strength is 19.

Cluster 3 (5 articles—dark blue): This cluster includes articles from the International Journal of Advanced Computer Science and Applications , Decision Support Systems , and Sustainability . The total number of links is 19, and the total link strength is 20.

Cluster 4 (4 articles—dark yellow): This cluster includes articles from Expert Systems with Applications and Applied Artificial Intelligence . The total number of links is 26, and the total link strength is 45.

Cluster 5 (4 articles—purple): This cluster includes articles from Future Generation Computer Systems and the International Journal of Accounting Information Systems . The total number of links is 15, and the total link strength is 16.

Cluster 6 (4 articles—dark blue): This cluster includes articles from IEEE Access and Applied Intelligence . The total number of links is 18, and the total link strength is 26.

Cluster 7 (4 articles—orange): This cluster includes articles from Knowledge-Based Systems and Mathematics . The total number of links is 23, and the total link strength is 29.

Cluster 8 (4 articles—brown): This cluster includes articles from the Journal of King Saud University—Computer and Information Sciences and the Journal of Finance and Data Science . The total number of links is 13, and the total link strength is 13.

Cluster 9 (4 articles—light purple): This cluster includes articles from the International Journal of Digital Accounting Research and Information Processing and Management . The total number of links is 2, and the total link strength is 2.

The clusters represent groups of related articles published in different academic journals. Each cluster has a specific number of articles, links, and total link strength. These findings provide an overview of the distribution and connectedness of articles in the literature on financial fraud detection using ML models. Further, clustering helps identify patterns and common thematic areas in the research, which may be useful for future researchers seeking to explore this field.
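
To make the two network measures concrete, the sketch below computes, for each item, its number of links and its total link strength from a weighted edge list. It is a simplified illustration of the idea rather than VOSviewer's implementation: the journal names and strengths are hypothetical, and each pair of items is assumed to appear only once in the list.

```python
from collections import defaultdict


def link_stats(edges):
    """Return {item: (links, total_link_strength)} from (item_a, item_b, strength) edges.

    links: how many other items this item is connected to.
    total_link_strength: the sum of the strengths of those connections.
    """
    links = defaultdict(int)
    strength = defaultdict(int)
    for item_a, item_b, weight in edges:
        for item in (item_a, item_b):
            links[item] += 1
            strength[item] += weight
    return {item: (links[item], strength[item]) for item in links}


# Hypothetical bibliographic links between three journals.
edges = [
    ("Journal A", "Journal B", 3),
    ("Journal A", "Journal C", 1),
    ("Journal B", "Journal C", 2),
]
stats = link_stats(edges)
for journal, (n_links, total) in sorted(stats.items()):
    print(journal, n_links, total)
```

An item can therefore have few links but a high total link strength when its connections are repeated often, which is why the cluster descriptions report both numbers.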

Clusters 1, 4, and 7 have the most links and the highest total link strength. These clusters encompass articles from Computers and Security , Expert Systems with Applications , and Knowledge-Based Systems , which are important sources for the SLR on financial fraud detection through the implementation of ML models.

The analysis also covers research output by country or territory: a list of 50 countries/territories is presented with the number of documents related to the research conducted in each of them. China leads with the highest paper count at 18, followed by India at 13 and Saudi Arabia and Canada at 9 each. Canada, Malaysia, Pakistan, South Africa, the United Kingdom, France, Germany, and Russia have similar research outputs with 4–9 papers. Sweden and Romania have 1 or 2 research papers, indicating limited scientific research output.

The presence of countries that appear less often in the academic literature, such as Armenia, Costa Rica, and Slovenia, suggests ongoing research in places less common in the academic world. Beyond the leading countries, the number of papers per country gradually decreases.

The production of papers is geographically distributed across countries from different continents and regions. However, more research exists on the subject from countries with developed and transition economies, which allows for a greater capacity to conduct research and produce papers.

Figure 5 , sourced from Scopus’s “Analyze search results” option, depicts countries with their respective number of published papers on the topic of financial fraud detection through ML models.

Figure 5. Number of scientific publications in the study area, classified by country. Produced with VOSviewer.

The above shows the diversity of countries involved in the research, where China leads the number of studies with 18 papers, followed by India with 13 and Saudi Arabia and Canada with 9 papers each. The other countries show little production, with fewer than 7 publications each, which indicates an emerging topic of interest for the survival of companies that must prevent and detect different financial frauds using ML techniques.

The most relevant keywords in the review of literature on financial fraud detection implementing ML models include the following:

In Cluster 1, the most relevant keywords are “decision trees” (13 repetitions), “support vector machine (SVM)” (11 repetitions), “machine-learning” (10 repetitions), and “credit card fraud detection” (9 repetitions). The focus is on machine learning (ML), particularly supervised learning algorithms such as decision trees and support vector machines, applied to credit card fraud detection.

In Cluster 2, the most relevant keywords are “crime” (46 repetitions), “fraud detection” (43 repetitions), and “learning systems” (13 repetitions). These terms reflect a broader focus on financial fraud detection, where the aspects of crime in general, fraud detection, and learning systems used for this purpose have been addressed.

In Cluster 3, the most relevant keywords are “Finance” (19 repetitions), “Data Mining” (18 repetitions), and “Financial Fraud” (12 repetitions). These keywords indicate a focus on the financial industry, where data mining is used to reveal patterns and trends related to financial fraud.

In Cluster 4, the most relevant keywords are “Machine Learning” (45 repetitions), “Anomaly Detection” (16 repetitions), and “Deep Learning” (11 repetitions). They reflect an emphasis on the use of traditional ML and deep learning techniques for anomaly detection and financial fraud detection.

In general, the different clusters indicate the most relevant keywords in the SLR on financial fraud detection through ML models. Each cluster presents a specific set of keywords reflecting the most relevant trends and approaches in this field of research (Fig. 6 ).

Figure 6. Relationships between keywords based on their co-occurrence in the literature reviewed. Produced with VOSviewer.

GQ2: What types of financial fraud have been identified in ML studies?

Financial fraud is generated by weaknesses in companies’ control mechanisms, which are analyzed based on the variables that allow them to materialize. These include opportunity, motivation, self-fulfillment, capacity, and pressure. Some of these are comprehensively analyzed by Donald Cressey through the fraud theory approach. The lack of modern controls has led organizations to use ML in response to this major problem. According to the findings of the Global Economic Crime and Fraud Survey 2022–2023, which gathered insights from 1,028 respondents across 36 countries worldwide, instances of fraud within these companies have caused a financial loss of approximately 10 million dollars (PricewaterhouseCoopers, 2022 ).

Referring to the concept of fraud, as outlined in international studies (Estupiñán Gaitán, 2015 ; Márquez Arcila, 2019 ; Montes Salazar, 2019 ) and the guidelines of the American Institute of Certified Public Accountants, it is an illegal, intentional act in which there is a victim (someone who loses a financial resource) and a victimizer (someone who obtains a financial resource from the victim). Thus, the proposed classification includes corporate fraud and/or fraud in organizations, considering that the purpose is to misappropriate the capital resources of an entity or individual: cash, bank accounts, loans, bonds, stocks, real estate, and precious metals, among others.

In this SLR study, we have considered fraud classifications by authors of 86 articles, which encompass experiments. We have excluded the 18 SLR articles from our analysis. The types presented in Table 7 follow the holistic view of the authors of the research for a better understanding of the subject of financial fraud, considering whether it is internal or external fraud.

Table 7 highlights the diverse types of frauds, and the research works on them. According to the classification, external frauds correspond to those performed by stakeholders outside the company. This study’s findings show that 54% of the analyzed articles investigate external fraud, among which the most important studies are on credit card loan fraud, followed by insurance fraud, using supervised and unsupervised ML techniques for their detection.

In research works (Kumar et al., 2022 ) analyzing credit card fraud, attention is drawn to the importance of prevention through the behavioral analysis of customers who acquire a bank loan and identifying applicants for bad loans through ML models. The datasets used in these fraud studies have covered transactions performed by credit card holders (Alarfaj et al., 2022 ; Baker et al., 2022 ; Hamza et al., 2023 ; Madhurya et al., 2022 ; Ounacer et al., 2018 ; Sahin et al., 2013 ), while other research works have covered master credit card money transactions in different countries (Wu et al., 2023 ) and fraudulent transactions gathered from 2014 to 2016 by the international auditing firm Mazars (Smith and Valverde, 2021 ).

The second major type of external fraud is insurance fraud, which is classified as fraud in health insurance programs involving practices such as document forgery, fraudulent billing, and false medical prescriptions (Sathya and Balakumar, 2022 ; Van Capelleveen et al., 2016 ) and automobile insurance fraud involving fraudulent actions between policyholders and repair shops, who mutually rely on each other to obtain benefits (Aslam et al., 2022 ; Nian et al., 2016 ; Subudhi and Panigrahi, 2020 ); as a result of the issues they face, insurance companies have developed robust models using ML.

As regards internal fraud, caused by an individual within the company, 46% of studies have analyzed this type, with financial statement fraud, money laundering fraud, and tax fraud standing out. The studies show that the investigations are based on information reported by the US Securities and Exchange Commission (SEC) and the stock exchanges of China, Canada, Tehran, and Taiwan, among others. To a considerable extent, the information taken is from the real sector, and very few studies have obtained synthetic information based on the application of different learning models.

The following is a summary of the financial information obtained by the researchers to apply AI models and techniques:

Stock market financial reports : Fraud in the Canadian securities industry (Lokanan and Sharma, 2022 ), companies listed on the Chinese stock exchanges (Achakzai and Juan, 2022 ; Y. Chen and Wu, 2022 ; Xiuguo and Shengyong, 2022 ), companies with shares according to the SEC (Hajek and Henriques, 2017 ; Papík and Papíková, 2022 ), companies listed on the Tehran Stock Exchange (Kootanaee et al. 2021 ), companies in the Taiwan Economic Journal Data Bank (TEJ) stock market (S. Chen, 2016 ; S. Chen et al., 2014 ), analysis of SEC accounting and auditing publications (Whiting et al., 2012 )

Wrong financial reporting to manipulate stock prices (Chullamonthon and Tangamchit, 2023 ; Khan et al., 2022 ; Zhao and Bai, 2022 )

Financial data of 2318 companies with the highest number of financial frauds (mechanical equipment, medical biology, media, and chemical industries; Shou et al., 2023 ), fraudulent financial restatements (Dutta et al., 2017 )

Data from 950 companies in the Middle East and North Africa region (Ali et al., 2023 ), analyzing outliers in sampling risk and inefficiency of general ledger financial auditing (Bakumenko and Elragal, 2022 ), fraudulent intent errors by top management of public companies (Y. J. Kim et al., 2016 ), reporting of general ledger journal entries from an enterprise resource planning system (Zupan et al., 2020 )

Synthetic financial dataset for fraud detection (Alwadain et al., 2023 ).

Studies have analyzed situations involving fraudulent financial statements. In these cases, instances of fraud have already occurred, leading to the creation of financial reports that contain statements with outliers that can be deemed fraudulent intent or errors in financial figures. This raises a reasonable doubt about whether an intent exists with regard to the reporting of unrealistic figures. Notably, once there are parties responsible for the financial information presented to stakeholders, such as organization owners, managers, administrators, accountants, or auditors, it is unlikely for it to be unintentional (an error). In this context, transparency and explainability are essential so as to ensure fairness in decisions, thus avoiding bias and discrimination based on prejudiced data (Rakowski et al., 2021 ).

Because of its significance, the information reported in financial statements is vital for investigations. Studies have indicated substantial amounts of data extracted from the financial reports of regulatory bodies such as stock exchanges and auditing firms. These entities use the data to establish the existence of fraud and its types through predictive models that use ML techniques. Thus, they require financial data such as dates, the third party affected, user, debit or credit amount, and type of document, among other aspects involving an accounting record. This information aids in identifying the possible impact in terms of lower profits and the perpetrator and/or perpetrators to gather sufficient evidence and file criminal proceedings for the financial damage caused.

Moreover, investigations concerning money laundering fraud and/or money laundering, the second most investigated internal fraud type, encompass the reports of natural and legal persons exposed by the Financial Action Task Force in countries such as the Kingdom of Saudi Arabia (Alsuwailem et al., 2022 ), transactions from April to September 2018 from Taiwan’s “T” bank and the account watch list of the National Police Agency of the Ministry of Interior (Ti et al., 2022 ), money laundering frauds in Middle East banks (Lokanan, 2022 ), transactions of financial institutions in Mexico from January 2020 (Rocha-Salazar et al., 2021 ), and synthetic data of simulated banking transactions (Usman et al., 2023 ).

Concerns regarding the entry of proceeds from money laundering into an organization have been articulated in relation to the financial damage it causes to the country. At the macroeconomic level, these activities negatively affect financial stability, distorting the prices of goods and services. Moreover, such activities disrupt markets, making it difficult to make efficient financial decisions. At the microeconomic level, legitimate businesses face unfair competition with companies using illegal money, which may lead to higher unemployment levels. Furthermore, money laundering has a social impact because it affects the security and welfare of society.

Thus, some research works (Alsuwailem et al., 2022) have indicated the need to implement ML models for promoting anti-money laundering measures. For instance, in Saudi Arabia, money from illicit drug trafficking, corruption, counterfeiting, and product piracy has entered the country. The measures to be taken are categorized according to the three stages of money laundering: placement, layering (also known as concealment), and integration. These include new legal regulations against money laundering, staff training, customer identification and validation, reporting of suspicious activities, and documentation and storage of relevant data (Bolgorian et al., 2023).

Regarding the 7.5% incidence of internal fraud, specifically categorized as tax fraud resulting from tax evasion, the studies have analyzed tax returns on income and/or profits of legal persons and/or individuals from the Serbian tax administration during 2016–2017 (Savić et al., 2022 ). Studies have encompassed periodic value-added tax (VAT) returns, together with the anonymous list of clients for the tax year 2014 obtained from the Belgian tax administration (Vanhoeyveld et al., 2020 ) and income tax and VAT taxpayers registered and provided by the State Revenue Committee of the Republic of Armenia in 2018 (Baghdasaryan et al., 2022 ). These studies hold great relevance for tax administrations using different strategies to minimize the impact of fraud resulting from tax evasion. Tax evasion reduces the government’s ability to collect revenue, directly affecting government finances and causing budget deficits, thereby increasing public debt.

GQ3: Which ML models were implemented to detect financial fraud in the datasets?

Given that ML is a key tool to extract meaningful information and make informed decisions, this study analyzes the most widely used ML techniques in the field of financial fraud detection. It takes as reference 86 experimental articles, excluding 18 SLR articles. In these articles, the most commonly used trends and approaches in the implementation of ML techniques in financial fraud detection were identified.

For the analysis, the pattern of frequency of use of ML models was observed. Several of them have been prominent because of their popularity and implementation in detecting financial fraud (Fig. 7 ). Some of the most widely used models include long-short term memory (LSTM) with 7 mentions, autoencoder with 10 mentions, XGBoost with 13 mentions, k -nearest neighbors (KNN) with 14 mentions, artificial neural network (ANN) with 17 mentions, NB with 19 mentions, SVM with 29 mentions, DT with 29 mentions, LR with 32 mentions, and RF with 34 mentions.

Figure 7. The most common machine learning models in financial fraud detection. Authors’ own elaboration.

The LSTM model is a recurrent neural network used for sequence processing, especially for tasks concerning natural language processing (Chullamonthon and Tangamchit, 2023 ; Esenogho et al., 2022 ; Femila Roseline et al., 2022 ). Moreover, autoencoders are models used for data compression and decompression. These models are useful in dimensionality reduction applications (Misra et al., 2020 ; Srokosz et al., 2023 ). XGBoost is a library combining multiple weak DT models, offering a scalable and efficient solution in classification and regression tasks (Dalal et al., 2022 ; Udeze et al., 2022 ).

KNN and ANN are widely used models in various ML applications. KNN is based on neighbor closeness, and ANN is inspired by human brain functioning. NB is a probabilistic algorithm commonly used in text classification and data mining (Ashtiani and Raahemi, 2022 ; Lei et al., 2022 ; Shahana et al., 2023 ).

SVM, DT, LR, and RF, the most commonly mentioned models, are used in a wide range of classification and regression applications. These models are prominent because of their effectiveness and applicability to different scenarios, such as credit card loan fraud (external fraud) and financial statement fraud (internal fraud).

The most frequently used ML techniques are supervised learning (56.73%), unsupervised learning (18.27%), a combination of supervised and unsupervised learning (15.38%), a combination of supervised and deep learning (2.88%), and mathematical approach, supervised, and semi-supervised learning (0.96%). Figure 8 presents the ML techniques in the literature reviewed and indicates the number of times each type of technique is applied. Some articles applied several ML methods, in which the algorithms are mainly classified according to the learning method. In this case, there are four main types: supervised, semi-supervised, unsupervised, and deep learning.

Figure 8. The different experimental approaches used in the study. Authors’ own elaboration.

Supervised learning is the most widely used technique, with 56.73% of mentions in financial fraud studies. In this approach, labeled training data are used, where the expected outputs are known, and a model is built that can make accurate predictions on new unlabeled data. Common examples of supervised learning techniques include LR, SVM, DT, RF, KNN, NB, and ANN.

Moreover, unsupervised learning constitutes 18.27% of the mentions. The technique focuses on discovering patterns in the data without labeled training data. Examples include DBSCAN, autoencoder, and isolation forest (IF).

The combination of supervised, unsupervised, and semi-supervised learning is used with a frequency of 1.92%. This technique and/or approach combines elements of supervised and unsupervised learning, using both labeled and unlabeled data to train the models. It is also used when labeled data are scarce or expensive to obtain; thus, the aim is to take advantage of unlabeled information to improve model performance.

Finally, supervised and deep learning represents 2.88% of the mentions. It is based on deep neural networks with multiple neurons and hidden layers to learn complex data representations. It has achieved remarkable developments in areas such as image processing, voice recognition, and machine translation.

Specific questions (SQ)

SQ1: What datasets were used by implementing ML models for financial fraud detection?

First, the data structure and fraud types may vary with the collection of datasets. The performance of fraud detection models may be affected by variations in the number of instances and attributes selected. Therefore, investigating the datasets and their characteristics is relevant, as data differ in terms of data type (number, text) and the data source from which they were obtained (synthetic and/or real), as can be observed in Fig. 9 .

Figure 9. Datasets used in the research on financial fraud detection. Authors’ own elaboration.

Credit card fraud detection

The dataset was created by the Machine Learning group at Université Libre de Bruxelles. It encompasses anonymized credit card transactions labeled as fraudulent or genuine. The transactions were performed in September 2013 over two days by European cardholders. With only 492 frauds out of 284,807 transactions, the dataset is highly unbalanced: the positive class (frauds) represents only 0.172% of all transactions (Machine Learning Group, 2018).

The characteristics of the set encompass numerical variables resulting from a principal component analysis (PCA) transformation. For confidentiality, the original features of the data have not been disclosed. Features V1, V2, …, V28 are the principal components obtained through PCA. The only features that have not been transformed with PCA are “Time,” which denotes the seconds elapsed between each transaction and the first transaction in the dataset, and “Amount,” which denotes the transaction amount. The “Class” feature is the response variable, taking the value 1 in case of fraud and 0 (no fraud) otherwise.
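The practical consequence of this class imbalance can be illustrated with a few lines of Python. The sketch below uses only the two counts quoted above; the "always genuine" baseline is a hypothetical classifier introduced for illustration, not a model from the reviewed studies.

```python
# Counts reported for the credit card fraud detection dataset.
n_transactions = 284_807
n_frauds = 492

# Share of the positive (fraud) class, in percent.
fraud_share = 100 * n_frauds / n_transactions

# Accuracy (in percent) of a hypothetical baseline that labels
# every transaction as genuine and therefore detects no fraud at all.
baseline_accuracy = 100 * (n_transactions - n_frauds) / n_transactions

print(f"fraud share: {fraud_share:.3f}%")
print(f"always-genuine accuracy: {baseline_accuracy:.3f}%")
```

Because a model that never flags fraud already scores above 99.8% accuracy on such data, accuracy alone says little about detection quality here, which is one reason studies also report recall, precision, and AUC.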

This dataset has been used by 15 authors in their papers, who have applied different financial fraud detection techniques (Alarfaj et al., 2022 ; Baker et al., 2022 ; Fanai and Abbasimehr, 2023 ; Fang et al., 2019 ; Femila Roseline et al., 2022 ; Hwang and Kim, 2020 ; Ileberi et al., 2021 , 2022 ; Khan et al., 2022 ; Misra et al., 2020 ; Ounacer et al., 2022 ).

Statlog (German credit data)

The dataset was proposed by Professor Hofmann to the UC Irvine ML repository on November 16, 1994, for facilitating credit rating (Hofmann, 1994). It mainly aims to determine whether a person presents a favorable or unfavorable credit risk (binary rating). The set is multivariate, which implies that it contains many attributes used in credit rating. These attributes include information on existing current account status, credit duration, credit history, and credit purpose and amount, among others. In total, the dataset has 20 attributes describing several characteristics of individuals and contains 1000 instances; it has been widely used in research related to credit rating (Esenogho et al., 2022; Fanai and Abbasimehr, 2023; Lee et al., 2018; Pumsirirat and Yan, 2018; Seera et al., 2021).

Statlog (Australian credit approval)

The dataset belongs to the UC Irvine ML repository and was created by Ross Quinlan in 1997. It focuses on credit card applications within the financial field (Quinlan, 1997). It has a total of 690 instances and 14 attributes, of which 6 are numeric (integer/real) and 8 are categorical; consequently, its data characteristics are multivariate; that is, it contains multiple variables and/or attributes. Several studies have used this dataset (Lee et al., 2018; Pumsirirat and Yan, 2018; Seera et al., 2021; Singh et al., 2022).

China Stock Market and Accounting Research

The China Stock Market and Accounting Research (CSMAR) Database contains financial reports and records of violations. It provides information on China’s stock markets and the financial statements of listed companies; the data were collected between 1998 and 2016 from publicly funded companies (CSMAR, 2022). It includes fraudulent and non-fraudulent companies; the frauds covered include overstated profits and/or earnings, fictitious assets, false records, and other irregularities in financial reporting.

The set comprises 35,574 samples, including 337 annual fraud samples of companies in the Chinese stock market. This is selected as a data source to illustrate the financial statement information of listed companies in three studies (Achakzai and Juan, 2022 ; Y. Chen and Wu, 2022 ; Shou et al., 2023 ).

Synthetic financial datasets for fraud detection

It was generated by the PaySim mobile money simulator using aggregated data from a private dataset deriving from one month of financial records from a mobile money service in an African country (López-Rojas, 2017 ). The original records were provided by a multinational company offering mobile financial services in more than 14 countries worldwide. The dataset has been used in numerous studies (Alwadain et al., 2023 ; Hwang and Kim, 2020 ; Moreira et al., 2022 ).

The synthetic dataset provided is a scaled-down version, representing a quarter of the original dataset. It was made available on Kaggle. It comprises 6,362,620 samples, with 8213 fraudulent transaction samples and 6,354,407 non-fraudulent transactions. It includes several attributes related to mobile money transactions: transaction type (cash-in, cash-out, debit, payment, and transfer); transaction amount in local currency; customer information (customer conducting the transaction and transaction recipient); balances before and after the transaction; and fraudulent behavior indicators (isFraud and isFlaggedFraud). These attributes indicate a binary classification.

Default of credit card clients

It was created by I-Cheng Yeh and introduced on January 25, 2016, and is available in the UC Irvine ML repository (Yeh, 2016 ). The dataset, which is used for classification tasks, focuses on the case of defaulted payments of credit card customers in Taiwan in the business area. Moreover, it is a multivariate dataset with 30,000 instances and 24 attributes. They include attributes such as the amount of credit granted, payment history, and statement records spanning April through September 2005. This data source is selected in studies such as those by Esenogho et al. ( 2022 ), Pumsirirat and Yan ( 2018 ), and Seera et al. ( 2021 ).

Synthetic data from a financial payment system

Edgar Lopez Rojas created the dataset in 2017. The synthetic data were generated in the BankSim payment simulator. It is based on a sample of transactional data provided by a bank in Spain (López-Rojas, 2017 ). It includes the following characteristics: step, customer ID, age, gender, zip code, merchant ID, zip code of merchant, category of purchase, amount of purchase, and fraud status. It comprises 594,643 transactions, of which ~1.2% (7200) were labeled as fraud and the rest (587,443) were labeled as genuine, and it was processed as a binary classification problem. The dataset has been used in several investigations (Esenogho et al., 2022 ; Pumsirirat and Yan, 2018 ; Seera et al., 2021 ).

COMPUSTAT

This dataset is a financial and economic information and research database (Compustat, 2022). It contains characteristics related to various aspects of companies, such as asset quality, revenues earned, administrative and sales expenses, and sales growth, among others. COMPUSTAT collects and stores detailed information on listed companies in the United States and Canada. The set includes information on 61 characteristics and consists of 228 companies, of which half showed fraud in their information while the other half did not present fraud (binary classification), and it is used in studies (Dutta et al., 2017; Whiting et al., 2012).

Insurance Company Benchmark (COIL 2000)

This dataset is used in the CoIL 2000 challenge, available at the UC Irvine Machine Learning Repository, created by Peter Van Der Putten. It consists of 9822 instances and 86 attributes containing information about customers of an insurance company and includes data on product use and sociodemographic data (Putten, 2000 ). It is characterized as multivariate and is used to perform regression/classification tasks by studies using the dataset (Huang et al., 2018 ; Sathya and Balakumar, 2022 ).

Bitcoin network transactional metadata

This dataset contains Bitcoin transaction metadata from 2011 to 2013. It was created by Omer Shafiq (Kaggle handle: OmerShafiq) and introduced to the Kaggle online community in 2019. The set comprises 11 attributes and 30,000 instances related to Bitcoin transactions, bitcoin flows, connections between transactions, average ratings, and malicious transactions (Omershafiq, 2019 ). It is efficient for investigating and analyzing anomalies and fraud detection in Bitcoin transactions (Ashfaq et al., 2022 ).

SQ2: What were the metrics used to assess the performance of ML models to detect financial fraud?

Based on previous studies (Nicholls et al., 2021; Shahana et al., 2023), evaluating the performance of ML models with suitable metrics is the last step in determining whether the results align with the problem at hand. The metrics quantify a model’s ability to perform a specific task, such as classification, regression, or clustering, and they allow the performance of different models to be compared.

Many evaluation metrics have been used in previous studies, such as precision, sensitivity, recall, accuracy, and area under the curve. These metrics can be calculated using the confusion matrix. Figure 10 compares the target and true values with the predicted ones based on the study by Torrano et al. ( 2018 ).

Figure 10. Confusion matrix generated during the evaluation of the financial fraud detection models. Authors’ own elaboration.

According to previous studies (Shahana et al., 2023; Zhao and Bai, 2022), true positive (TP) denotes a predicted positive value (fraud) that matches the true value; true negative (TN) denotes a correctly predicted negative outcome (no fraud); false positive (FP) denotes a predicted positive whose true value is negative (no fraud); and false negative (FN) denotes a predicted negative whose true value is positive (fraud). FP and FN represent the misclassification cost, also known as classification model prediction error.
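The four cells of the confusion matrix can be counted directly from paired true and predicted labels. The following is a minimal Python sketch, assuming labels are encoded as 1 (fraud) and 0 (no fraud); the function name and the toy labels are hypothetical and serve only as an illustration.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels, with 1 = fraud and 0 = no fraud."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            tp += 1  # fraud predicted, fraud observed
        elif pred == 0 and truth == 0:
            tn += 1  # no fraud predicted, no fraud observed
        elif pred == 1 and truth == 0:
            fp += 1  # fraud predicted, but the transaction was genuine
        else:
            fn += 1  # no fraud predicted, but the transaction was fraudulent
    return tp, tn, fp, fn


# Hypothetical toy labels for eight transactions.
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(tp, tn, fp, fn)
```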

The metrics used to evaluate the effectiveness of supervised ML techniques are as follows. The accuracy metric is the most commonly used (Ramírez-Alpízar et al., 2020). It is defined as the proportion of correct predictions over the total number of records analyzed. Further, it evaluates the performance of a binary classification model that distinguishes between true and false. Equation (1) calculates the accuracy metric.

The sensitivity metric, also known as recall (TP rate or TPR), is the ratio of correctly identified fraudulent samples to the total number of fraudulent samples. Equation (2) calculates the sensitivity metric.

The specificity metric (TN rate or TNR) is the percentage of non-fraudulent samples properly designated as non-fraudulent. It is represented in Eq. ( 3 ).

Precision is the ratio of correctly classified fraudulent predictions to the total number of fraudulent predictions. Equation (4) calculates the precision metric.

F1-score is a metric that combines precision and recall using a harmonic mean (Bakumenko and Elragal, 2022). It is presented in Eq. (5).
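Accuracy, recall, specificity, precision, and F1-score all follow from the four confusion-matrix counts. The Python sketch below assumes the conventional definitions of these metrics and non-zero denominators; the function name and the counts are hypothetical and are not taken from the reviewed studies.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute common binary classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)  # correct predictions over all records
    recall = tp / (tp + fn)                     # sensitivity, or TPR
    specificity = tn / (tn + fp)                # TNR
    precision = tp / (tp + fp)                  # correct fraud alerts over all fraud alerts
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall
    return {
        "accuracy": accuracy,
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts: 85 frauds and 915 genuine transactions.
metrics = classification_metrics(tp=80, tn=900, fp=15, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```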

Type I error (FP rate or FPR) is the number of legitimate samples mistakenly labeled as fraudulent as a percentage of all legitimate samples. The metric is defined in Eq. (6).

Type II error (FN rate or FNR) is the proportion of fraudulent samples incorrectly designated as non-fraudulent. It is defined in Eq. (7). Type I and II errors together make up the overall error rate.
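The two error types can be computed from the same confusion-matrix counts. This Python sketch assumes the conventional definitions FPR = FP / (FP + TN) and FNR = FN / (FN + TP); the function name and the counts are hypothetical.

```python
def error_rates(tp, tn, fp, fn):
    """Return (Type I error, Type II error) as rates between 0 and 1."""
    fpr = fp / (fp + tn)  # genuine samples wrongly flagged as fraud
    fnr = fn / (fn + tp)  # fraudulent samples missed by the model
    return fpr, fnr


# Hypothetical counts: 85 frauds and 915 genuine transactions.
tp, tn, fp, fn = 80, 900, 15, 5
fpr, fnr = error_rates(tp, tn, fp, fn)

# Overall error rate: all misclassified samples over all samples.
overall_error = (fp + fn) / (tp + tn + fp + fn)
print(f"FPR={fpr:.4f}  FNR={fnr:.4f}  overall error={overall_error:.4f}")
```

Under these definitions, FPR is the complement of specificity and FNR is the complement of recall, so reporting one of each pair is sufficient.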

The area under the curve (AUC), or area under the receiver operating characteristic curve, is the area under the plot of TPR versus FPR (Y. Chen and Wu, 2022). AUC values range from 0 to 1; the better an ML model separates the classes, the higher its AUC value. It is a metric that represents the model’s performance when differentiating between two classes.
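One way to make the AUC concrete is through its rank interpretation: it equals the probability that a randomly chosen fraudulent sample receives a higher score than a randomly chosen genuine one, with ties counted as one half. The Python sketch below uses this pairwise formulation rather than integrating the curve; the function name and the toy scores are hypothetical.

```python
def auc_score(y_true, scores):
    """AUC as the fraction of (fraud, genuine) pairs ranked correctly; ties count 0.5."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy example: two genuine and two fraudulent transactions.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = auc_score(y_true, scores)
print(auc)
```

The pairwise loop is quadratic in the number of samples, so it suits small illustrations; large datasets would normally use a sort-based implementation.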

Following the guidelines in previous studies (Amrutha et al., 2023 ; García-Ordás et al., 2023 ; Palacio, 2019 ), some metrics used to evaluate the effectiveness of unsupervised ML techniques will be defined.

The silhouette coefficient identifies the most appropriate number of clusters; a higher coefficient means better quality with this number of clusters. Equation ( 8 ) calculates the metric.

where x denotes the average of the distances of observation j with respect to the rest of the observations of the cluster to which j belongs. Furthermore, y denotes the minimum distance to a different cluster. The silhouette score takes values between −1 and 1. Based on the study by Viera et al. ( 2023 ), 1 (correct) represents the assignment of observation j to a good cluster, zero (0) indicates that observation j is between two distinct groups, and −1 (incorrect) indicates that the assignment of j to the cluster is a bad clustering.
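The silhouette value of a single observation can be sketched in Python as follows. The sketch assumes the standard form \(s=(y-x)/\max(x,y)\), with y taken as the smallest mean distance from the observation to the members of any other cluster, Euclidean distances, and at least two clusters; the function name and the toy points are hypothetical.

```python
import math


def silhouette(j, points, labels):
    """Silhouette value of observation j, assuming the standard definition."""
    own = labels[j]
    # x: mean distance from j to the other members of its own cluster.
    same = [math.dist(points[j], p) for i, p in enumerate(points)
            if labels[i] == own and i != j]
    if not same:
        return 0.0  # common convention for a single-member cluster
    x = sum(same) / len(same)
    # y: smallest mean distance from j to the members of any other cluster.
    y = min(
        sum(math.dist(points[j], p) for i, p in enumerate(points) if labels[i] == c)
        / labels.count(c)
        for c in set(labels) if c != own
    )
    return (y - x) / max(x, y)


# Hypothetical toy data: two well-separated clusters of two points each.
points = [(0, 0), (0, 1), (5, 0), (5, 1)]
labels = [0, 0, 1, 1]
s0 = silhouette(0, points, labels)
print(round(s0, 3))
```

Averaging this value over all observations gives the overall silhouette coefficient for a given number of clusters.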

The Rand index is a similarity measure between two clusterings that considers all pairs of samples and counts those assigned to the same cluster, or to different clusters, in both the predicted and the true clusterings. Equation (9) calculates the index.
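The pair-counting idea behind the Rand index can be sketched in Python. The sketch assumes the usual (unadjusted) form, in which the number of agreeing pairs is divided by the total number of pairs; the function name and the toy labels are hypothetical.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Fraction of sample pairs on which the two clusterings agree."""
    agreements = 0
    pairs = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        # A pair agrees if both clusterings group it together or both split it.
        if same_true == same_pred:
            agreements += 1
        pairs += 1
    return agreements / pairs


# Hypothetical toy labels: the prediction splits the second true cluster.
labels_true = [0, 0, 1, 1]
labels_pred = [0, 0, 1, 2]
ri = rand_index(labels_true, labels_pred)
print(ri)
```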

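The Davies–Bouldin metric is a score used to evaluate clustering algorithms. It is defined as the mean, over all clusters, of each cluster’s similarity to its most similar cluster, where lower values indicate better separation. It is represented in Eq. (10).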

where k denotes the number of groups; \({c}_{i}\) and \({c}_{j}\) represent the centroids of clusters i and j, respectively, with \(d\left({c}_{i},{c}_{j}\right)\) as the distance between them, while \({\alpha }_{i}\) and \({\alpha }_{j}\) correspond to the average distance of all elements in clusters i and j to their respective centroids \({c}_{i}\) and \({c}_{j}\) (Viera et al., 2023).
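A minimal Python sketch of the Davies–Bouldin score follows. It assumes the standard form, in which each cluster i contributes the largest ratio \(({\alpha }_{i}+{\alpha }_{j})/d({c}_{i},{c}_{j})\) over all other clusters j and the contributions are averaged over the k clusters, with Euclidean distances; the function name and the toy points are hypothetical.

```python
import math


def davies_bouldin(points, labels):
    """Davies-Bouldin score under the standard definition (lower is better)."""
    clusters = sorted(set(labels))
    centroids, spreads = {}, {}
    for c in clusters:
        members = [p for p, lab in zip(points, labels) if lab == c]
        # Centroid: coordinate-wise mean of the cluster members.
        centroid = tuple(sum(coord) / len(members) for coord in zip(*members))
        centroids[c] = centroid
        # Spread (alpha): mean distance of the members to their centroid.
        spreads[c] = sum(math.dist(p, centroid) for p in members) / len(members)
    worst = []
    for i in clusters:
        # Similarity of cluster i to its most similar other cluster.
        worst.append(max(
            (spreads[i] + spreads[j]) / math.dist(centroids[i], centroids[j])
            for j in clusters if j != i
        ))
    return sum(worst) / len(clusters)


# Hypothetical toy data: two well-separated clusters of two points each.
points = [(0, 0), (0, 1), (5, 0), (5, 1)]
labels = [0, 0, 1, 1]
db = davies_bouldin(points, labels)
print(round(db, 3))
```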

The Fowlkes–Mallows index is defined as the geometric mean between precision and recall, represented in Eq. ( 11 ).
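For clustering, precision and recall are usually computed over pairs of samples rather than over individual samples. The Python sketch below assumes that pair-counting convention: a true positive is a pair placed in the same cluster by both the true and the predicted clustering. The function name and the toy labels are hypothetical.

```python
import math
from itertools import combinations


def fowlkes_mallows(labels_true, labels_pred):
    """Geometric mean of pairwise precision and recall."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true and same_pred:
            tp += 1  # pair grouped together in both clusterings
        elif same_pred:
            fp += 1  # grouped together only in the prediction
        elif same_true:
            fn += 1  # grouped together only in the true clustering
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return math.sqrt(precision * recall)


# Hypothetical toy labels: the prediction splits the second true cluster.
labels_true = [0, 0, 1, 1]
labels_pred = [0, 0, 1, 2]
fm = fowlkes_mallows(labels_true, labels_pred)
print(round(fm, 4))
```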

The cophenetic correlation coefficient measures how faithfully a dendrogram (tree diagram) produced by hierarchical clustering preserves the pairwise distances between the original observations. Equation (12) indicates the metric.

where \(x(i,j)=|{x}_{i}-{x}_{j}|\) represents the Euclidean distance between the ith and jth points of \(x\), \(t(i,j)\) is the height of the node at which the two points, \({t}_{i}\) and \({t}_{j}\), of the dendrogram meet, and \(\bar{x}\) and \(\bar{t}\) are the mean values of \(x(i,j)\) and \(t(i,j)\), respectively.
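With these quantities, the coefficient amounts to the Pearson correlation between the pairwise distances \(x(i,j)\) and the dendrogram heights \(t(i,j)\). The Python sketch below assumes that standard form and takes both sets of values as already-computed lists in the same pair order; the function name is hypothetical, and the toy heights were worked out by hand for a single-linkage dendrogram of three one-dimensional points.

```python
import math


def cophenetic_correlation(distances, heights):
    """Pearson correlation between pairwise distances and dendrogram merge heights."""
    n = len(distances)
    mean_x = sum(distances) / n
    mean_t = sum(heights) / n
    cov = sum((x - mean_x) * (t - mean_t) for x, t in zip(distances, heights))
    var_x = sum((x - mean_x) ** 2 for x in distances)
    var_t = sum((t - mean_t) ** 2 for t in heights)
    return cov / math.sqrt(var_x * var_t)


# Hypothetical toy data: points at 0, 1, and 5 on a line.
# Pair order: (0, 1), (0, 2), (1, 2).
distances = [1.0, 5.0, 4.0]
# Single linkage merges points 0 and 1 at height 1, then joins point 2 at height 4.
heights = [1.0, 4.0, 4.0]
c = cophenetic_correlation(distances, heights)
print(round(c, 4))
```

Values close to 1 indicate that the dendrogram preserves the original pairwise distances well.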

Discussion and conclusion

Research on the detection of financial fraud by applying ML techniques is a significant topic. On the one hand, fraud directly affects the business world and, on the other hand, detecting it early involves great challenges; this has led to designing tools using AI, such as ML techniques. This study is an SLR using adaptations of the PRISMA and Kitchenham methods to critically analyze and synthesize the study results. Research articles published in Scopus, IEEE Xplore, Taylor & Francis, SAGE, and ScienceDirect were explored. The results were presented in two parts. The first one included a bibliometric study with the open-source software VOSviewer, followed by a discussion of the SLR results.

The bibliometric analysis presented the results on the authors, articles, sources, countries, and most important trends in the literature on financial fraud detection by applying ML, as well as an analysis of fraud types, ML models, and datasets. The 104 articles, dating from 2012 to 2023, describe several types of fraudulent activities, covering external fraud (e.g., credit cards, insurance) and internal fraud (e.g., financial statements, money laundering), and a brief report on fraud in general is provided. Further, it was possible to extract the supervised and unsupervised ML techniques and the 10 most used models, with RF leading among the supervised techniques and the autoencoder among the unsupervised ones.

During the literature review on the detection of financial fraud using machine learning models, it became evident that several authors have made significant contributions. However, some stand out more in terms of the number of publications and citations. Some of the most notable ones, such as Ahmed M. with 318 citations, Ileberi E. with 82, and Chen S. with 84, have made important advances in the field. Others, such as Abdallah A., with only one publication but 333 citations, have also made a considerable impact. Although researchers such as Khan S. and Mishra B. have fewer citations, the combined work of all these authors has established a robust knowledge base, providing a deeper understanding of the challenges and opportunities present in financial fraud detection through machine learning techniques.

Consistent with the analysis of the article clusters, clusters 2, 4 and 11 emerge as the most influential in this field, with topics of interdisciplinary interest (artificial intelligence/machine learning, accounting, finance) among academics and auditing firms. The SLR shows that authors in these domains often cooperate when it comes to publication; in turn, the studies by Huang et al. ( 2018 ), J. Kim et al. ( 2019 ), Sahin et al. ( 2013 ), and Dutta et al. ( 2017 ) are highly cited articles.

Similarly, the leading countries in the research area include China, which has the largest number of published articles, followed by India and Saudi Arabia. The production of articles on the subject was found to be geographically distributed among countries with developing and transition economies, which indicates a greater capacity for the production of papers and research. By comparison, Ashtiani and Raahemi ( 2022 ) report the United States as leading with the largest number of papers (18) in the area, followed by China (8) and Greece (7), whereas Al-Hashedi and Magalingam ( 2021 ) posit that India is the top producer of articles with 24, followed by China (14) and the United States (9).

The journals that have accepted the publication of these studies are specifically in the accounting and computer science domains. Much of the literature on financial fraud detection through ML models appears in Computers and Security, Expert Systems with Applications, and Knowledge-Based Systems, as supported by Al-Hashedi and Magalingam ( 2021 ) and Ali et al. ( 2022 ). The keywords highlighted in the studies include crime, fraud detection, and ML. These words indicate a central focus on the financial industry, where learning and/or data mining systems help discover patterns or anomalies in financial data, in addition to attractive trends and approaches in the research field.

The literature has indicated articles investigating fraud types, particularly credit card loan fraud and insurance fraud, which are of great interest to the scientific community (Al-Hashedi and Magalingam, 2021 ; Ali et al., 2022 ; West and Bhattacharya, 2016 ). This study has classified the different types of fraud into internal and external, and sub-classifications have been derived. In both types, ML techniques have been used to detect financial fraud: supervised (59 articles), unsupervised (19 articles), supervised and unsupervised combined (16 articles), and deep learning (3 articles), among others. Most of the studies analyzed have developed binary classification models, that is, fraud or non-fraud. Supervised learning techniques require labeled data, and the most frequently used models are LR, RF, and SVM, among others. In the experiments, metrics such as accuracy, precision, sensitivity, and F1-score prevail. In unsupervised learning, the data do not have labels and the focus is on discovering new patterns with algorithms such as DBSCAN, autoencoder, and IF, among others. The evaluation with internal metrics was not reported in detail. Few studies using semi-supervised learning and deep learning techniques were found, because these techniques are novel.
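For the binary fraud/non-fraud setting described above, the four prevailing metrics can be derived from the confusion-matrix counts, as in the following sketch. The function name `binary_metrics`, the convention that 1 marks fraud, and the toy labels are assumptions made for illustration. They do not reproduce any experiment from the reviewed articles.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, sensitivity (recall) and F1 for a binary classifier."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")

    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)

    accuracy = (tp + tn) / len(y_true)
    # Guard against empty denominators, e.g. when no fraud is predicted at all.
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * sensitivity / (precision + sensitivity)
        if precision + sensitivity
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "f1": f1,
    }


# Hypothetical toy data: 1 marks a fraudulent transaction, 0 a legitimate one.
actual = [1, 0, 0, 1, 0, 0, 0, 1]
predicted = [1, 0, 0, 0, 0, 1, 0, 1]
print(binary_metrics(actual, predicted))
```

Because fraud datasets are usually highly imbalanced, accuracy alone can look favorable even when many fraudulent cases are missed, which is one reason precision, sensitivity, and F1-score are reported alongside it.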

Further, the keyword trends show that the research works address ML, learning algorithms, deep learning, SVM, fraudulent transactions, and anomaly detection, but it is evident that there is little research on unsupervised learning and deep learning. The scarce use of these techniques may be because of the complexity of the models and the high consumption of computational resources. In the analysis of the 86 experimental articles, few were found that used unsupervised techniques. Also, a large part of the datasets used is labeled, which calls for further experimentation with models and unlabeled real-world datasets (Ounacer et al., 2018 ; Pumsirirat and Yan, 2018 ; Rubio et al., 2020 ; Van Capelleveen et al., 2016 ; Vanini et al., 2023 ). Meanwhile, labeled data are costly because an expert is required for their construction. Thus, more attention has been given to data origin, preprocessing, and feature extraction before training an ML model to increase detection accuracy. Accordingly, it should be emphasized that deep learning models require more thorough design and tuning than earlier models. They are quite sensitive to the architecture and the choice of hyperparameters. Further, the quality and quantity of data required are relatively high, so this should be considered in the design stage.

The studies show that the datasets for the experiments were taken from the stock exchanges of China, Canada, the United States, Taiwan, and Tehran, among others. The researchers used ML models to detect financial fraud in credit card loans, highlighting the use of the “Credit Card Fraud Detection” dataset, mentioned 15 times. Also, the performance of ML models can be affected by the selected dataset, in particular by its number of attributes and instances. From the analysis, it was observed that most of the articles use real datasets obtained from existing databases, historical records, or other collection methods, and few studies (four articles) use synthetic datasets, which are generated by modeling or simulation techniques and try to mimic a real dataset.

Still, the integration of real and synthetic datasets enables a comprehensive approach to the problem by providing a basis and complementary information for conclusions and comparisons with other studies on the performance of ML models. Specifically, the datasets used in recent studies and/or articles, spanning from 2012 to 2023, raise a concern: some contain obsolete data dating back to approximately 1994, which, because of their age, do not provide effective and accurate results in the current context, since new fraud modalities are created day after day with characteristics and behavior patterns that have evolved significantly over time.

The literature review and bibliometric analyses on financial fraud detection using machine learning and its various techniques, covering 2012 to 2023, show a remarkable evolution in this field. Authors including Ahmed M., Ileberi E., and Chen S. have made important contributions with a high number of citations. There has been fundamental interdisciplinary collaboration between areas such as artificial intelligence, accounting, finance, and information security, highlighting widely cited studies such as Huang et al. ( 2018 ), J. Kim et al. ( 2019 ), Sahin et al. ( 2013 ), and Dutta et al. ( 2017 ). Countries such as China, India, and Saudi Arabia lead in publications, which reflects the global effort of emerging economies. Supervised learning techniques such as Random Forest, and unsupervised ones such as the autoencoder, are the most widely used. Furthermore, the effort and enthusiasm for the use of deep learning, despite its complexity and high computational resource requirements, are evident.

Research mainly uses real datasets such as those from the Chinese, Canadian, US, Taiwanese, and Tehran stock exchanges, with the “Credit Card Fraud Detection” dataset being the most important one. The journals that publish these studies belong both to the accounting area and to computer science, with extensive literature in Computers and Security, Expert Systems with Applications, and Knowledge-Based Systems. While it is true that the accuracy of fraud detection depends on the quality of the data and preprocessing with various algorithms, the need for robust and updated approaches to face new fraud modalities is particularly highlighted.

Limitations and scope for future research

The study had limitations that affected the scope and interpretation of the results. Although a systematic review was performed, the lack of quantitative support in the data collected is acknowledged. Of the 104 articles identified in the SLR, 18 correspond to systematic reviews, which limits the availability of studies with specific details or experiments. This affected the depth of the analysis and the comprehensiveness of the results obtained.

The literature review reveals a predominant emphasis on the banking sector, especially in relation to credit card fraud and insurance fraud. The narrow focus leads to a lack of diversity in the types of fraud studied, excluding internal fraud types such as embezzlement, racketeering, smurfing, defalcation, collusion, signature forgery, and manipulation of accounting documents, among others. The underrepresentation of these other fraud types compromises the generalization of the findings and the applicability of ML models to contexts beyond the banking sector.

The datasets analyzed show a significant deficiency in the representation of fraud types. It can be observed that most of these datasets originated from the main stock exchanges and, additionally, the information used to carry out the experiments is old. This scenario indicates the inclusion of non-contemporary fraud types in the analysis. The limited availability of information on the performance metrics of the unsupervised learning models made it difficult to count the evaluation metrics used to predict financial fraud.

The field of financial fraud detection using ML models offers promising prospects for future research. An area of potential improvement is experimentation with advanced techniques, such as reinforcement learning or deep neural network architectures, to improve the accuracy and efficiency of models, including unsupervised learning. This approach could enable the development of more sophisticated systems capable of identifying complex fraud patterns and dynamically adjusting to the changing strategies of criminals, who are constantly innovating new fraud methods.

Moreover, it is suggested that the applicability of fraud detection systems in contexts other than banking be analyzed by adopting the anomaly approach, which would make it possible to move forward in the detection of fraud in real-time and minimize risks in organizations. It is also proposed that a dataset be created, containing real context information, which is freely accessible and includes new fraud methods to provide the scientific community with an updated dataset.

Data availability

The datasets generated and/or analyzed in this study are available in the Harvard Dataverse repository https://doi.org/10.7910/DVN/CM8NVY .

References

Abdallah A, Maarof MA, Zainal A (2016) Fraud detection system: a survey. J Netw Comput Appl 68:90–113. https://doi.org/10.1016/j.jnca.2016.04.007

Achakzai MAK, Juan P (2022) Using machine learning meta-classifiers to detect financial frauds. Financ Res Lett 48:102915. https://doi.org/10.1016/j.frl.2022.102915

Ahmed M, Mahmood AN, Islam MdR (2016) A survey of anomaly detection techniques in financial domain. Future Gener Comput Syst 55:278–288. https://doi.org/10.1016/j.future.2015.01.001

Al Ali A, Khedr AM, El-Bannany M, Kanakkayil S (2023) A powerful predicting model for financial statement fraud based on optimized XGBoost ensemble learning technique. Appl Sci 13(4):2272. https://doi.org/10.3390/app13042272

Alarfaj FK, Malik I, Khan HU, Almusallam N, Ramzan M, Ahmed M (2022) Credit card fraud detection using state-of-the-art machine learning and deep learning algorithms. IEEE Access 10:39700–39715. https://doi.org/10.1109/ACCESS.2022.3166891

Al-Hashedi KG, Magalingam P (2021) Financial fraud detection applying data mining techniques: a comprehensive review from 2009 to 2019. Comput Sci Rev 40:100402. https://doi.org/10.1016/j.cosrev.2021.100402

Ali A, Abd Razak S, Othman SH, Eisa TAE, Al-Dhaqm A, Nasser Tusneem ME, Elshafie H, Saif A (2022) Financial fraud detection based on machine learning: a systematic literature review. Appl Sci (Switz). https://doi.org/10.3390/app12199637

Alsuwailem AAS, Salem E, Saudagar AKJ (2022) Performance of different machine learning algorithms in detecting financial fraud. Comput Econ. https://doi.org/10.1007/s10614-022-10314-x

Alwadain A, Ali RF, Muneer A (2023) Estimating financial fraud through transaction-level features and machine learning. Mathematics 11(5):1184. https://doi.org/10.3390/math11051184

Amrutha E, Arivazhagan S, Jebarani WSL (2023) Deep clustering network for steganographer detection using latent features extracted from a novel convolutional autoencoder. Neural Process Lett 55(3):2953–2964. https://doi.org/10.1007/s11063-022-10992-6

Arévalo F, Barucca P, Téllez-León I-E, Rodríguez W, Gage G, Morales R (2022) Identifying clusters of anomalous payments in the Salvadorian payment system. Lat Am J Cent Bank 3(1):100050. https://doi.org/10.1016/j.latcb.2022.100050

Ashfaq T, Khalid R, Yahaya A, Aslam S, Alsafari S, Hameed I (2022) A machine learning and blockchain based efficient fraud detection mechanism. Sensors 22(19):7162. https://doi.org/10.3390/s22197162

Ashtiani MN, Raahemi B (2022) Intelligent fraud detection in financial statements using machine learning and data mining: a systematic literature review. IEEE Access 10:72504–72525. https://doi.org/10.1109/ACCESS.2021.3096799

Aslam F, Hunjra A, Ftiti Z, Louhichi W, Shams T (2022) Insurance fraud detection: evidence from artificial intelligence and machine learning. Res Int Bus Financ. https://doi.org/10.1016/j.ribaf.2022.101744

Baghdasaryan V, Davtyan H, Sarikyan A, Navasardyan Z (2022) Improving tax audit efficiency using machine learning: the role of taxpayer’s network data in fraud detection. Appl Artif Intell 36(1). https://doi.org/10.1080/08839514.2021.2012002

Baker MR, Mahmood ZN, Shaker EH (2022) Ensemble learning with supervised machine learning models to predict credit card fraud transactions. Rev Intell Artif. https://doi.org/10.18280/ria.360401

Bakumenko A, Elragal A (2022) Detecting anomalies in financial data using machine learning algorithms. Systems. https://doi.org/10.3390/systems10050130

Bekirev AS, Klimov VV, Kuzin MV, Shchukin BA (2015) Payment card fraud detection using neural network committee and clustering. Opt Mem Neural Netw 24(3):193–200. https://doi.org/10.3103/S1060992X15030030

Benchaji I, Douzi S, Ouahidi BEl (2021) Credit card fraud detection model based on LSTM recurrent neural networks. J Adv Inf Technol 12(2):113–118. https://doi.org/10.12720/jait.12.2.113-118

Błaszczyński J, de Almeida Filho AT, Matuszyk A, Szeląg M, Słowiński R (2021) Auto loan fraud detection using dominance-based rough set approach versus machine learning methods. Expert Syst Appl 163:113740. https://doi.org/10.1016/j.eswa.2020.113740

Bolgorian M, Mayeli A, Ronizi NG (2023) CEO compensation and money laundering risk. J Econ Criminol 1:100007. https://doi.org/10.1016/j.jeconc.2023.100007

Chen S (2016) Detection of fraudulent financial statements using the hybrid data mining approach. SpringerPlus 5(1):89. https://doi.org/10.1186/s40064-016-1707-6

Chen S, Goo Y-JJ, Shen Z-D (2014) A hybrid approach of stepwise regression, logistic regression, support vector machine, and decision tree for forecasting fraudulent financial statements. Sci World J 2014:1–9. https://doi.org/10.1155/2014/968712

Chen Y, Wu Z (2022) Financial fraud detection of listed companies in China: a machine learning approach. Sustainability 15(1):105. https://doi.org/10.3390/su15010105

Chullamonthon P, Tangamchit P (2023) Ensemble of supervised and unsupervised deep neural networks for stock price manipulation detection. Expert Syst Appl 220:119698. https://doi.org/10.1016/j.eswa.2023.119698

Compustat (2022) Compustat. S&P Global Market Intelligence. https://www.marketplace.spglobal.com/en/datasets

CSMAR (2022) China Stock Market & Accounting Research (CSMAR). Wharton University of Pennsylvania. https://wrds-www.wharton.upenn.edu/pages/about/data-vendors/china-stock-market-accounting-research-csmar/

Dalal S, Seth B, Radulescu M, Secara C, Tolea C (2022) Predicting fraud in financial payment services through optimized hyper-parameter-tuned XGBoost model. Mathematics 10(24):4679. https://doi.org/10.3390/math10244679

Dantas RM, Firdaus R, Jaleel F, Neves Mata P, Mata MN, Li G (2022) Systemic acquired critique of credit card deception exposure through machine learning. J Open Innov: Technol Mark Complex 8(4):192. https://doi.org/10.3390/joitmc8040192

Domashova J, Kripak E (2021) Identification of non-typical international transactions on bank cards of individuals using machine learning methods. Procedia Comput Sci 190:178–183. https://doi.org/10.1016/j.procs.2021.06.023

Domashova J, Kripak E (2022) Development of a generalized algorithm for identifying atypical bank transactions using machine learning methods. Procedia Comput Sci 213:101–109. https://doi.org/10.1016/j.procs.2022.11.044

Dutta I, Dutta S, Raahemi B (2017) Detecting financial restatements using data mining techniques. Expert Syst Appl 90:374–393. https://doi.org/10.1016/j.eswa.2017.08.030

Elshaar S, Sadaoui S (2020) Semi-supervised classification of fraud data in commercial auctions. Appl Artif Intell 34(1):47–63. https://doi.org/10.1080/08839514.2019.1691341

Esenogho E, Mienye ID, Swart TG, Aruleba K, Obaido G (2022) A neural network ensemble with feature engineering for improved credit card fraud detection. IEEE Access 10:16400–16407. https://doi.org/10.1109/ACCESS.2022.3148298

Eshghi A, Kargari M (2019) Introducing a new method for the fusion of fraud evidence in banking transactions with regards to uncertainty. Expert Syst Appl 121:382–392. https://doi.org/10.1016/j.eswa.2018.11.039

Estupiñán Gaitán R (2015) Control interno y fraudes: análisis de informe COSO I, II y III con base en los ciclos transaccionales, Tercera edición (Niebel BW (ed)). Ecoe Ediciones

Fanai H, Abbasimehr H (2023) A novel combined approach based on deep autoencoder and deep classifiers for credit card fraud detection. Expert Syst Appl 217:119562. https://doi.org/10.1016/j.eswa.2023.119562

Fang Y, Zhang Y, Huang C (2019) Credit card fraud detection based on machine learning. Comput Mater Contin 61(1):185–195. https://doi.org/10.32604/cmc.2019.06144

Femila Roseline J, Naidu G, Samuthira Pandi V, Alamelu alias Rajasree S, Mageswari N (2022) Autonomous credit card fraud detection using machine learning approach. Comput Electr Eng 102:108132. https://doi.org/10.1016/j.compeleceng.2022.108132

García-Ordás MT, Alaiz-Moretón H, Casteleiro-Roca J-L, Jove E, Benítez-Andrades JA, García-Rodríguez I, Quintián H, Calvo-Rolle JL (2023) Clustering techniques selection for a hybrid regression model: a case study based on a solar thermal system. Cybern Syst 54(3):286–305. https://doi.org/10.1080/01969722.2022.2030006

Gupta S, Mehta SK (2021) Data mining-based financial statement fraud detection: systematic literature review and meta-analysis to estimate data sample mapping of fraudulent companies against non-fraudulent companies. Global Bus Rev https://doi.org/10.1177/0972150920984857

Hajek P, Henriques R (2017) Mining corporate annual reports for intelligent detection of financial statement fraud—a comparative study of machine learning methods. Knowl-Based Syst 128:139–152. https://doi.org/10.1016/j.knosys.2017.05.001

Hamza C, Lylia A, Nadine C, Nicolas C (2023) Semi-supervised method to detect fraudulent transactions and identify fraud types while minimizing mounting costs. Int J Adv Comput Sci Appl 14(2). https://doi.org/10.14569/IJACSA.2023.0140298

Hilal W, Gadsden SA, Yawney J (2022) Financial fraud: a review of anomaly detection techniques and recent advances. Expert Syst Appl 193:116429. https://doi.org/10.1016/j.eswa.2021.116429

Hofmann H (1994) Statlog (German credit data). UCI Machine Learning Repository. https://doi.org/10.24432/C5NC77

Huang D, Mu D, Yang L, Cai X (2018) CoDetect: financial fraud detection with anomaly feature detection. IEEE Access 6:19161–19174. https://doi.org/10.1109/ACCESS.2018.2816564

Hwang J, Kim K (2020) An efficient domain-adaptation method using GAN for fraud detection. Int J Adv Comput Sci Appl 11(11). https://doi.org/10.14569/IJACSA.2020.0111113

Ileberi E, Sun Y, Wang Z (2021) Performance evaluation of machine learning methods for credit card fraud detection using SMOTE and AdaBoost. IEEE Access 9:165286–165294. https://doi.org/10.1109/ACCESS.2021.3134330

Ileberi E, Sun Y, Wang Z (2022) A machine learning based credit card fraud detection using the GA algorithm for feature selection. J Big Data 9(1):24. https://doi.org/10.1186/s40537-022-00573-8

Khan S, Alourani A, Mishra B, Ali A, Kamal M (2022) Developing a credit card fraud detection model using machine learning approaches. Int J Adv Comput Sci Appl 13(3). https://doi.org/10.14569/IJACSA.2022.0130350

Kim J, Kim H-J, Kim H (2019) Fraud detection for job placement using hierarchical clusters-based deep neural networks. Appl Intell 49(8):2842–2861. https://doi.org/10.1007/s10489-019-01419-2

Kim YJ, Baik B, Cho S (2016) Detecting financial misstatements with fraud intention using multi-class cost-sensitive learning. Expert Syst Appl 62:32–43. https://doi.org/10.1016/j.eswa.2016.06.016

Kitchenham B, Brereton P (2013) A systematic review of systematic review process research in software engineering. Inf Softw Technol 55(12):2049–2075. https://doi.org/10.1016/j.infsof.2013.07.010

Kitchenham B, Stuart C (2007) Guidelines for performing systematic literature reviews in software engineering. https://www.researchgate.net/publication/302924724_Guidelines_for_performing_Systematic_Literature_Reviews_in_Software_Engineering

Kootanaee AJ, Aghajan AAP, Shirvani MH (2021) A hybrid model based on machine learning and genetic algorithm for detecting fraud in financial statements. J Optim Ind Eng 14(2):183–201. https://doi.org/10.22094/JOIE.2020.1877455.1685

KPMG (2022) Una triple amenaza en las Américas. KPMG. https://kpmg.com/co/es/home/insights/2022/01/kpmg-fraud-outlook-survey.html

Kumar S, Ahmed R, Bharany S, Shuaib M, Ahmad T, Tag Eldin E, Rehman AU, Shafiq M (2022) Exploitation of machine learning algorithms for detecting financial crimes based on customers’ behavior. Sustainability 14(21):13875. https://doi.org/10.3390/su142113875

Kumbure MM, Lohrmann C, Luukka P, Porras J (2022) Machine learning techniques and data for stock market forecasting: a literature review. Expert Syst Appl 197:116659. https://doi.org/10.1016/j.eswa.2022.116659

Lee H, Choi E, Kim I, Choi D, Go W, Lee K, Yim H, Lee T (2018) Feature selection practice for unsupervised learning of credit card fraud detection. J Theor Appl Inf Technol 96(2):408–417

Lei X, Mohamad UH, Sarlan A, Shutaywi M, Daradkeh YI, Mohammed HO (2022) Development of an intelligent information system for financial analysis depend on supervised machine learning algorithms. Inf Process Manag 59(5):103036. https://doi.org/10.1016/j.ipm.2022.103036

Lokanan M, Tran V, Vuong NH (2019) Detecting anomalies in financial statements using machine learning algorithm. Asian J Account Res 4(2):181–201. https://doi.org/10.1108/AJAR-09-2018-0032

Lokanan ME, Sharma K (2022) Fraud prediction using machine learning: the case of investment advisors in Canada. Mach Learn Appl 8:100269. https://doi.org/10.1016/j.mlwa.2022.100269

Lokanan ME (2022) Predicting money laundering using machine learning and artificial neural networks algorithms in banks. J Appl Secur Res 1–25. https://doi.org/10.1080/19361610.2022.2114744

López-Rojas E (2017) Synthetic financial datasets for fraud detection. Kaggle. https://www.kaggle.com/datasets/ealaxi/paysim1

Machine Learning Group (2018) Credit card fraud detection. Kaggle. https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud

Madhurya MJ, Gururaj HL, Soundarya BC, Vidyashree KP, Rajendra AB (2022) Exploratory analysis of credit card fraud detection using machine learning techniques. Glob Transit Proc 3(1):31–37. https://doi.org/10.1016/j.gltp.2022.04.006

Malik EF, Khaw KW, Belaton B, Wong WP, Chew X (2022) Credit card fraud detection using a new hybrid machine learning architecture. Mathematics 10(9):1480. https://doi.org/10.3390/math10091480

Márquez Arcila RH (2019) Auditoría forense. Ecoe Ediciones

Misra S, Thakur S, Ghosh M, Saha SK (2020) An autoencoder based model for detecting fraudulent credit card transaction. Procedia Comput Sci 167:254–262. https://doi.org/10.1016/j.procs.2020.03.219

Moher D, Liberati A, Tetzlaff J, Altman DG (2009) Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. PLoS Med 6(7):e1000097. https://doi.org/10.1371/journal.pmed.1000097

Mongwe W, Malan K (2020) A survey of automated financial statement fraud detection with relevance to the South African context. S Afr Comput J 32(1). https://doi.org/10.18489/sacj.v32i1.777

Montes Salazar CA (2019) Riesgos de fraude en una auditoría de estados financieros (1.a ed.). Alfaomega. ISBN: 9789587782639. https://www.alfaomegacloud.com/reader/riesgos-de-fraude-en-una-auditoria-de-estados-financieros?location=3

Moreira MÂL, Junior C, de SR, Silva DF, de L, de Castro Junior MAP, Costa IP, de A, Gomes CFS, dos Santos M (2022) Exploratory analysis and implementation of machine learning techniques for predictive assessment of fraud in banking systems. Procedia Comput Sci 214:117–124. https://doi.org/10.1016/j.procs.2022.11.156

Narsimha B, Raghavendran CV, Rajyalakshmi P, Reddy GK, Bhargavi M, Naresh P (2022) Cyber defense in the age of artificial intelligence and machine learning for financial fraud detection application. Int J Electr Electron Res 10(2):87–92. https://doi.org/10.37391/ijeer.100206

Nian K, Zhang H, Tayal A, Coleman T, Li Y (2016) Auto insurance fraud detection using unsupervised spectral ranking for anomaly. J Financ Data Sci 2(1):58–75. https://doi.org/10.1016/j.jfds.2016.03.001

Nicholls J, Kuppa A, Le-Khac N-A (2021) Financial cybercrime: a comprehensive survey of deep learning approaches to tackle the evolving financial crime landscape. IEEE Access 9:163965–163986. https://doi.org/10.1109/ACCESS.2021.3134076

Nonnenmacher J, Marx Gómez J (2021) Unsupervised anomaly detection for internal auditing: literature review and research agenda. Int J Digit Account Res 1–22. https://doi.org/10.4192/1577-8517-v21_1

Olszewski D (2014) Fraud detection using self-organizing map visualizing the user profiles. Knowl Based Syst 70:324–334. https://doi.org/10.1016/j.knosys.2014.07.008

Omershafiq (2019) Bitcoin network transactional metadata. Kaggle. https://www.kaggle.com/datasets/omershafiq/bitcoin-network-transactional-metadata

Ounacer S, Ait El Bour H, Oubrahim Y, Ghoumari MY, Azzouazi M (2018) Using isolation forest in anomaly detection: the case of credit card transactions. Period Eng Nat Sci 6(2):394. https://doi.org/10.21533/pen.v6i2.533

Palacio SM (2019) Abnormal pattern prediction: detecting fraudulent insurance property claims with semi-supervised machine-learning. Data Sci J 18(1):35. https://doi.org/10.5334/dsj-2019-035

Papík M, Papíková L (2022) Detecting accounting fraud in companies reporting under US GAAP through data mining. Int J Account Inf Syst 45:100559. https://doi.org/10.1016/j.accinf.2022.100559

Plakandaras V, Gogas P, Papadimitriou T, Tsamardinos I (2022) Credit card fraud detection with automated machine learning systems. Appl Artif Intell 36(1). https://doi.org/10.1080/08839514.2022.2086354

Polak P, Nelischer C, Guo H, Robertson DC (2020) “Intelligent” finance and treasury management: what we can expect. AI Soc 35(3):715–726. https://doi.org/10.1007/s00146-019-00919-6

PricewaterhouseCoopers (2022) Encuesta Global de Crimen y Fraude Económico de PwC Colombia 2022 – 2023. https://www.pwc.com/co/es/publicaciones/encuesta-crimen-fraude-economico.html

Pumsirirat A, Yan L (2018) Credit card fraud detection using deep learning based on auto-encoder and restricted Boltzmann machine. Int J Adv Comput Sci Appl 9(1). https://doi.org/10.14569/IJACSA.2018.090103

Putten P (2000) Insurance Company Benchmark (COIL 2000). UCI Machine Learning Repository. https://doi.org/10.24432/C5630S

Quinlan R (1997) Statlog (Australian credit approval). UCI Machine Learning Repository. https://doi.org/10.24432/C59012

Rakowski R, Polak P, Kowalikova P (2021) Ethical aspects of the impact of AI: the status of humans in the era of artificial intelligence. Society 58(3):196–203. https://doi.org/10.1007/s12115-021-00586-8

Ramírez-Alpízar A, Jenkins M, Martínez A, Quesada-López C (2020a) Use of data mining and machine learning techniques for fraud detection in financial statements: a systematic mapping study. Rev Ibér Sist Tecnol Inf Lousada No. E28:97–109

Reurink A (2018) Financial fraud: a literature review. J Econ Surv 32(5):1292–1325. https://doi.org/10.1111/joes.12294

Rocha-Salazar J-J, Segovia-Vargas M-J, Camacho-Miñano M-M (2021) Money laundering and terrorism financing detection using neural networks and an abnormality indicator. Expert Syst Appl 169:114470. https://doi.org/10.1016/j.eswa.2020.114470

Roehrs A, da Costa CA, Righi R, da R, de Oliveira KSF (2017) Personal health records: a systematic literature review. J Med Internet Res 19(1):e13. https://doi.org/10.2196/jmir.5876

Rubio J, Barucca P, Gage G, Arroyo J, Morales-Resendiz R (2020) Classifying payment patterns with artificial neural networks: an autoencoder approach. Lat Am J Cent Bank 1(1–4):100013. https://doi.org/10.1016/j.latcb.2020.100013

Sahin Y, Bulkan S, Duman E (2013) A cost-sensitive decision tree approach for fraud detection. Expert Syst Appl 40(15):5916–5923. https://doi.org/10.1016/j.eswa.2013.05.021

Saputra M, Santosa PI, Permanasari AE (2023) Consumer behaviour and acceptance in fintech adoption: a systematic literature review. Acta Inform Pragensia 12(2):468–489. https://doi.org/10.18267/j.aip.222

Saragih MG, Chin J, Setyawasih R, Nguyen PT, Shankar K (2019) Machine learning methods for analysis fraud credit card transaction. Int J Eng Adv Technol 8(6S):870–874. https://doi.org/10.35940/ijeat.F1164.0886S19

Sathya M, Balakumar B (2022) Insurance fraud detection using novel machine learning technique. Int J Intell Syst Appl Eng 10(3):374–381

Savić M, Atanasijević J, Jakovetić D, Krejić N (2022) Tax evasion risk management using a hybrid unsupervised outlier detection method. Expert Syst Appl 193:116409. https://doi.org/10.1016/j.eswa.2021.116409

Seera M, Lim CP, Kumar A, Dhamotharan L, Tan KH (2021) An intelligent payment card fraud detection system. Ann Oper Res. https://doi.org/10.1007/s10479-021-04149-2

Shahana T, Lavanya V, Bhat AR (2023) State of the art in financial statement fraud detection: a systematic review. Technol Forecast Soc Change 192:122527. https://doi.org/10.1016/j.techfore.2023.122527

Shou M, Bao X, Yu J (2023) An optimal weighted machine learning model for detecting financial fraud. Appl Econ Lett 30(4):410–415. https://doi.org/10.1080/13504851.2021.1989367

Singh A, Jain A, Biable SE (2022) Financial fraud detection approach based on firefly optimization algorithm and support vector machine. Appl Comput Intell Soft Comput 2022:1–10. https://doi.org/10.1155/2022/1468015

Smith Q-J, Valverde R (2021) A perceptron based neural network data analytics architecture for the detection of fraud in credit card transactions in financial legacy systems. WSEAS Trans Syst Control 16:358–374. https://doi.org/10.37394/23203.2021.16.31

Sofy MA, Khafagy MH, Badry RM (2023) An intelligent Arabic model for recruitment fraud detection using machine learning. J Adv Informat Technol. https://doi.org/10.12720/jait.14.1.102-111

Srokosz M, Bobyk A, Ksiezopolski B, Wydra M (2023) Machine-learning-based scoring system for antifraud CSIRTs in banking environment. Electronics 12(1):251. https://doi.org/10.3390/electronics12010251

Subudhi S, Panigrahi S (2020) Use of optimized fuzzy C-means clustering and supervised classifiers for automobile insurance fraud detection. J King Saud Univ Comput Inf Sci 32(5):568–575. https://doi.org/10.1016/j.jksuci.2017.09.010

Ti Y-W, Hsin Y-Y, Dai T-S, Huang M-C, Liu L-C (2022) Feature generation and contribution comparison for electronic fraud detection. Sci Rep 12(1):18042. https://doi.org/10.1038/s41598-022-22130-2

Tingfei H, Guangquan C, Kuihua H (2020) Using variational auto encoding in credit card fraud detection. IEEE Access 8:149841–149853. https://doi.org/10.1109/ACCESS.2020.3015600

Torrano C, Recuero P, Ramirez F, Hernández S, Torres J (2018) Machine learning aplicado a la ciberseguridad: técnicas y ejemplos en detección de amenazas. Zeroxword Computing

Udeze CL, Eteng IE, Ibor AE (2022) Application of machine learning and resampling techniques to credit card fraud detection. J Niger Soc Phys Sci 769. https://doi.org/10.46481/jnsps.2022.769

Usman A, Naveed N, Munawar S (2023) Intelligent anti-money laundering fraud control using graph-based machine learning model for the financial domain. J Cases Inf Technol 25(1):1–20. https://doi.org/10.4018/JCIT.316665

Van Capelleveen G, Poel M, Mueller RM, Thornton D, Van Hillegersberg J (2016) Outlier detection in healthcare fraud: a case study in the Medicaid dental domain. Int J Account Inf Syst 21:18–31. https://doi.org/10.1016/j.accinf.2016.04.001

Vanhoeyveld J, Martens D, Peeters B (2020) Value-added tax fraud detection with scalable anomaly detection techniques. Appl Soft Comput 86:105895. https://doi.org/10.1016/j.asoc.2019.105895

Vanini P, Rossi S, Zvizdic E, Domenig T (2023) Online payment fraud: from anomaly detection to risk management. Financ Innov 9(1):66. https://doi.org/10.1186/s40854-023-00470-w

Vanneschi L, Horn DM, Castelli M, Popovič A (2018) An artificial intelligence system for predicting customer default in e-commerce. Expert Syst Appl 104:1–21. https://doi.org/10.1016/j.eswa.2018.03.025

Viera J, Aguilar J, Rodríguez-Moreno M, Quintero-Gull C (2023) Analysis of the behavior pattern of energy consumption through online clustering techniques. Energies 16(4):1649. https://doi.org/10.3390/en16041649

Wadhwa VK, Saini AK, Kumar SS (2020) Financial fraud prediction models: a review of research evidence. Int J Sci Technol Res 9(1):677–680

West J, Bhattacharya M (2016) Intelligent financial fraud detection: a comprehensive review. Comput Secur 57:47–66. https://doi.org/10.1016/j.cose.2015.09.005

Whiting DG, Hansen JV, McDonald JB, Albrecht C, Albrecht WS (2012) Machine learning methods for detecting patterns of management fraud. Comput Intell 28(4):505–527. https://doi.org/10.1111/j.1467-8640.2012.00425.x

Wohlin C (2014) Guidelines for snowballing in systematic literature studies and a replication in software engineering. In: Proceedings of the 18th international conference on evaluation and assessment in software engineering. pp. 1–10

Wu B, Lv X, Alghamdi A, Abosaq H, Alrizq M (2023) Advancement of management information system for discovering fraud in master card based intelligent supervised machine learning and deep learning during SARS-CoV2. Inf Process Manag 60(2):103231. https://doi.org/10.1016/j.ipm.2022.103231

Xiong T, Ma Z, Li Z, Dai J (2022) The analysis of influence mechanism for internet financial fraud identification and user behavior based on machine learning approaches. Int J Syst Assur Eng Manag 13(S3):996–1007. https://doi.org/10.1007/s13198-021-01181-0

Xiuguo W, Shengyong D (2022) An analysis on financial statement fraud detection for Chinese listed companies using deep learning. IEEE Access 10:22516–22532. https://doi.org/10.1109/ACCESS.2022.3153478

Yeh I-C (2016) Default of credit card clients. UCI Machine Learning Repository. https://doi.org/10.24432/C55S3H

Zhang Z, Zhou X, Zhang X, Wang L, Wang P (2018) A model based on convolutional neural network for online transaction fraud detection. Secur Commun Netw 2018:1–9. https://doi.org/10.1155/2018/5680264

Zhao Z, Bai T (2022) Financial fraud detection and prediction in listed companies using SMOTE and machine learning algorithms. Entropy 24(8):1157. https://doi.org/10.3390/e24081157

Zhou H, Chai H, Qiu M (2018) Fraud detection within bankcard enrollment on mobile device based payment using machine learning. Front Inf Technol Electron Eng 19(12):1537–1545. https://doi.org/10.1631/FITEE.1800580

Zupan M, Budimir V, Letinic S (2020) Journal entry anomaly detection model. Intell Syst Account Financ Manag 27(4):197–209. https://doi.org/10.1002/isaf.1485

Acknowledgements

We would like to express our gratitude to the Universidad Cooperativa de Colombia, Ibagué campus, Espinal. This research work was supported by Universidad Cooperativa de Colombia and derived from research project INV3456 entitled “Detection of anomalies in financial data in social economy organizations through machine learning techniques” associated with the PLANAUDI, AQUA and SINERGIA UCC group, from the Research Center of the Public Accounting and Systems Engineering program of the UCC Ibagué campus.

Author information

Authors and affiliations.

School of Public Accounting, Universidad Cooperativa de Colombia, 730001, Ibagué-Espinal campus, Ibagué, Colombia

Ludivia Hernandez Aros & John Johver Moreno Hernandez

School of Systems Engineering, Universidad Cooperativa de Colombia, 730001, Ibagué-Espinal campus, Ibagué, Colombia

Luisa Ximena Bustamante Molano & Fernando Gutierrez-Portela

School of Business Administration, Universidad Cooperativa de Colombia, 730001, Ibagué-Espinal campus, Ibagué, Colombia

Mario Samuel Rodríguez Barrero

Contributions

All authors contributed to the creation and design of the study.

Corresponding author

Correspondence to Ludivia Hernandez Aros.

Ethics declarations

Competing interests.

The authors declare no competing interests.

Ethical approval and consent to participate

The authors declare that they have no human participants, human data, or human tissue.

Consent to publish

The authors have no data from any individual person in any form.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

Reprints and permissions

About this article

Cite this article.

Hernandez Aros, L., Bustamante Molano, L.X., Gutierrez-Portela, F. et al. Financial fraud detection through the application of machine learning techniques: a literature review. Humanit Soc Sci Commun 11, 1130 (2024). https://doi.org/10.1057/s41599-024-03606-0

Received: 15 November 2023

Accepted: 13 August 2024

Published: 03 September 2024

DOI: https://doi.org/10.1057/s41599-024-03606-0

COMMENTS

  1. Literature Search: Databases and Gray Literature

    Gray Literature. Gray literature is the term for information that falls outside the mainstream of published journal and monograph literature and is not controlled by commercial publishers. It includes hard-to-find studies, reports, or dissertations; conference abstracts or papers; and governmental or private sector research.

  2. Choosing Databases

    Use Multiple Databases. While not every literature search you undertake will be for a systematic review, the Cochrane Handbook's statement that "a search of MEDLINE alone is not considered adequate" holds true for almost all literature reviews. You need to go beyond one database to get a more comprehensive picture of your topic and to minimize ...

  3. Literature searches: what databases are available?

    PubMed. PubMed was launched in 1996 and, since June 1997, provides free and unlimited access for all users through the internet. PubMed database contains more than 30 million references of biomedical literature from approximately 7,000 journals. The largest percentage of records in PubMed comes from MEDLINE (95%), which contains 25 million ...

  4. Optimal database combinations for literature searches in systematic

    To ensure adequate performance in searches (i.e., recall, precision, and number needed to read), we find that literature searches for a systematic review should, at minimum, be performed in the combination of the following four databases: Embase, MEDLINE (including Epub ahead of print), Web of Science Core Collection, and Google Scholar.

  5. A practical guide to data analysis in general literature reviews

    A general literature review starts with formulating a research question, defining the population, and conducting a systematic search in scientific databases, steps that are well-described elsewhere. 1,2,3 Once students feel confident that they have thoroughly combed through relevant databases and found the most relevant research on the topic ...

  6. Systematic Reviews and Evidence Syntheses : Databases

    Three databases alone do not complete the search standards for systematic review requirements, as you will also have additional searches of the grey literature and hand searches to complete. Which databases you search is highly dependent on your systematic review topic, so it is recommended you meet with a librarian.

  7. Systematic reviews: Structure, form and content

    Systematic reviews: Structure, form and content. This article aims to provide an overview of the structure, form and content of systematic reviews. It focuses in particular on the literature searching component, and covers systematic database searching techniques, searching for grey literature and the importance of librarian involvement in the ...

  8. Where to search when doing a literature review

    Aim to be as comprehensive as possible when conducting a literature review. Knowing exactly where to search for information is important. Work through the steps to find out the best databases to search for information on your research topic. 1. Start with research databases.

  9. Databases and Sources

    No one database can cover the literature for any topic. For medical topics, a combination of PubMed (or other search of PubMed data) plus Embase, Web of Science, and Google Scholar has been shown to provide adequate recall (Syst Rev. 2017;6(1):245). For topics that reach beyond biomedicine, other databases need to be considered. PubMed.

  10. Integrity of Databases for Literature Searches in Nursing

    The quality of literature used as the foundation to any research or scholarly project is critical. The purpose of this study was to analyze the extent to which predatory nursing journals were included in credible databases, MEDLINE, Cumulative Index to Nursing and Allied Health Literature (CINAHL), and Scopus, commonly used by nurse scholars when searching for information.

  11. Literature searches in systematic reviews and meta-analyses: A review

    Second, although database searches tend to be the most used study identification method, a variety of other complementary strategies exist that can be incorporated in conjunction with a database search to help to ensure comprehensive coverage of a research literature. We report and define the most frequently used complementary strategies in Table 1. ...

  12. How to do a Literature Search: Choosing a database

    Some databases index the literature from all subject areas, others specialise in one particular subject area. ... Some articles contain original data from research projects: these are referred to as primary literature. In a review, the author has selected the most important primary articles and given an overview of the key developments on a topic.

  13. LibGuides: Doing the literature review: Selecting databases

    There are different types of literature databases: Abstracting & indexing databases (A&I) provide metadata and abstracts. The metadata includes the title, author(s), date of publication, journal title, volume and issue, page numbers, keywords, DOI, etc. Discipline specific databases, such as PsycINFO, Philosopher's Index, Sociological Abstracts ...

  14. The best academic research databases [Update 2024]

    1. Scopus. Scopus is one of the two big commercial, bibliographic databases that cover scholarly literature from almost any discipline. Besides searching for research articles, Scopus also provides academic journal rankings, author profiles, and an h-index calculator. 2.

  15. Scopus

    About Scopus. Scopus is the largest abstract and citation database of peer-reviewed literature: scientific journals, books and conference proceedings. Delivering a comprehensive overview of the world's research output in the fields of science, technology, medicine, social sciences, and arts and humanities, Scopus features smart tools to track ...

  16. Literature Review Guide: Search strategies and Databases

    Performing a Keyword search in library databases: Sample research topic: The effect of supplements on athletic performance. Step 1: Identify the concepts in your research topic: In this case 'Supplement' is one concept and 'Athletic performance' is another separate concept Step 2: Identify Keywords for these concepts: In this case use the concepts themselves as keywords and also other synonyms ...

  17. Optimal database combinations for literature searches in systematic

    Investigators and information specialists searching for relevant references for a systematic review (SR) are generally advised to search multiple databases and to use additional methods to be able to adequately identify all literature related to the topic of interest [1,2,3,4,5,6]. The Cochrane Handbook, for example, recommends the use of at least MEDLINE and Cochrane Central and, when ...

  18. Literature review as a research methodology: An overview and guidelines

    As mentioned previously, there are a number of existing guidelines for literature reviews. Depending on the methodology needed to achieve the purpose of the review, all types can be helpful and appropriate to reach a specific goal (for examples, please see Table 1). These approaches can be qualitative, quantitative, or have a mixed design depending on the phase of the review.

  19. LSBU Library: Literature Reviews: Developing a Literature Review

    Developing a Literature Review. 1. Purpose and Scope. To help you develop a literature review, gather information on existing research, sub-topics, relevant research, and overlaps. ... Use a variety of sources, including online databases, university libraries, and reference lists from relevant articles. This ensures comprehensive coverage of ...

  20. Scopus

    Scopus outperforms other abstract and citation databases by providing a broader range of research metrics covering nearly twice the number of peer-reviewed publications. Using Scopus metrics, you can demonstrate the influence of your institution's scholarly output. Discover the details behind our metrics, giving you confidence in knowing how ...

  21. Databases

    CINAHL Comprehensive literature database for nursing and allied health disciplines. Cochrane Library The Cochrane Library provides access to systematic reviews and clinical trials. Users can browse by topic or review group.

  22. Literature Reviews

    Structure. The three elements of a literature review are introduction, body, and conclusion. Introduction. Define the topic of the literature review, including any terminology. Introduce the central theme and organization of the literature review. Summarize the state of research on the topic. Frame the literature review with your research question.

  23. How to Write a Literature Review

    Examples of literature reviews. Step 1 - Search for relevant literature. Step 2 - Evaluate and select sources. Step 3 - Identify themes, debates, and gaps. Step 4 - Outline your literature review's structure. Step 5 - Write your literature review.

  24. Technology-based interventions for children with reading difficulties

    Technology-based interventions have been used to improve reading skills for students with reading difficulties. Thus, many literature reviews and meta-analyses have investigated the effectiveness of this type of intervention; however, constant changes in the technology field make it important to review the most recent studies and how these studies were implemented to improve reading skills for ...

  25. Sustainability

    This paper aims to conduct a bibliometric analysis and traditional literature review concerning collaborative project delivery (CPD) methods, with an emphasis on design-build (DB), construction management at risk (CMAR), and integrated project delivery (IPD) methods. This article seeks to identify the most influential publications, reveal the advantages and disadvantages of CPD, and determine ...

  26. Financial fraud detection through the application of machine learning

    Review Article; Open access; Published: 03 September 2024 Financial fraud detection through the application of machine learning techniques: a literature review. Ludivia Hernandez Aros ORCID: orcid ...