• UNC Libraries
  • HSL Academic Process
  • Systematic Reviews
  • Step 7: Extract Data from Included Studies

Systematic Reviews: Step 7: Extract Data from Included Studies

Created by health science librarians.

HSL Logo

  • Step 1: Complete Pre-Review Tasks
  • Step 2: Develop a Protocol
  • Step 3: Conduct Literature Searches
  • Step 4: Manage Citations
  • Step 5: Screen Citations
  • Step 6: Assess Quality of Included Studies

About Step 7: Extract Data from Included Studies

About data extraction, select a data extraction tool, what should i extract, helpful tip- data extraction.

  • Data extraction FAQs
  • Step 8: Write the Review

  Check our FAQ's

   Email us

  Chat with us (during business hours)

   Call (919) 962-0800

   Make an appointment with a librarian

  Request a systematic or scoping review consultation

In Step 7, you will skim the full text of included articles to collect information about the studies in a table format (extract data), to summarize the studies and make them easier to compare. You will: 

  • Make sure you have collected the full text of any included articles.
  • Choose the pieces of information you want to collect from each study.
  • Choose a method for collecting the data.
  • Create the data extraction table.
  • Test the data collection table (optional). 
  • Collect (extract) the data. 
  • Review the data collected for any errors. 

For accuracy, two or more people should extract data from each study. This process can be done by hand or by using a computer program. 

Click an item below to see how it applies to Step 7: Extract Data from Included Studies.

Reporting your review with PRISMA

If you reach the data extraction step and choose to exclude articles for any reason, update the number of included and excluded studies in your PRISMA flow diagram.

Managing your review with Covidence

Covidence allows you to assemble a custom data extraction template, have two reviewers conduct extraction, then send their extractions for consensus.

How a librarian can help with Step 7

A librarian can advise you on data extraction for your systematic review, including: 

  • What the data extraction stage of the review entails
  • Finding examples in the literature of similar reviews and their completed data tables
  • How to choose what data to extract from your included articles 
  • How to create a randomized sample of citations for a pilot test
  • Best practices for reporting your included studies and their important data in your review

In this step of the systematic review, you will develop your evidence tables, which give detailed information for each study (perhaps using a PICO framework as a guide), and summary tables, which give a high-level overview of the findings of your review. You can create evidence and summary tables to describe study characteristics, results, or both. These tables will help you determine which studies, if any, are eligible for quantitative synthesis.

Data extraction requires a lot of planning.  We will review some of the tools you can use for data extraction, the types of information you will want to extract, and the options available in the systematic review software used here at UNC, Covidence .

How many people should extract data?

The Cochrane Handbook and other studies strongly suggest at least two reviewers and extractors to reduce the number of errors.

  • Chapter 5: Collecting Data (Cochrane Handbook)
  • A Practical Guide to Data Extraction for Intervention Systematic Reviews (Covidence)

Click on a type of data extraction tool below to see some more information about using that type of tool and what UNC has to offer.

Systematic Review Software (Covidence)

Most systematic review software tools have data extraction functionality that can save you time and effort.  Here at UNC, we use a systematic review software called Covidence. You can see a more complete list of options in the Systematic Review Toolbox .

Covidence allows you to create and publish a data extraction template with text fields, single-choice items, section headings and section subheadings; perform dual and single reviewer data extraction ; review extractions for consensus ; and export data extraction and quality assessment to a CSV with each item in a column and each study in a row.

  • Covidence@UNC Guide
  • Covidence for Data Extraction (Covidence)
  • A Practical Guide to Data Extraction for Intervention Systematic Reviews(Covidence)

Spreadsheet or Database Software (Excel, Google Sheets)

You can also use spreadsheet or database software to create custom extraction forms. Spreadsheet software (such as Microsoft Excel) has functions such as drop-down menus and range checks can speed up the process and help prevent data entry errors. Relational databases (such as Microsoft Access) can help you extract information from different categories like citation details, demographics, participant selection, intervention, outcomes, etc.

  • Microsoft Products (UNC Information Technology Services)

Cochrane RevMan

RevMan offers collection forms for descriptive information on population, interventions, and outcomes, and quality assessments, as well as for data for analysis and forest plots. The form elements may not be changed, and data must be entered manually. RevMan is a free software download.

  • Cochrane RevMan 5.0 Download
  • RevMan for Non-Cochrane Reviews (Cochrane Training)

Survey or Form Software (Qualtrics, Poll Everywhere)

Survey or form tools can help you create custom forms with many different question types, such as multiple choice, drop downs, ranking, and more. Content from these tools can often be exported to spreadsheet or database software as well. Here at UNC we have access to the survey/form software Qualtrics & Poll Everywhere.

  • Qualtrics (UNC Information Technology Services)
  • Poll Everywhere (UNC Information Technology Services)

Electronic Documents or Paper & Pencil (Word, Google Docs)

In the past, people often used paper and pencil to record the data they extracted from articles. Handwritten extraction is less popular now due to widespread electronic tools. You can record extracted data in electronic tables or forms created in Microsoft Word or other word processing programs, but this process may take longer than many of our previously listed methods. If chosen, the electronic document or paper-and-pencil extraction methods should only be used for small reviews, as larger sets of articles may become unwieldy. These methods may also be more prone to errors in data entry than some of the more automated methods.

There are benefits and limitations to each method of data extraction.  You will want to consider:

  • The cost of the software / tool
  • Shareability / versioning
  • Existing versus custom data extraction forms
  • The data entry process
  • Interrater reliability

For example, in Covidence you may spend more time building your data extraction form, but save time later in the extraction process as Covidence can automatically highlight discrepancies for review and resolution between different extractors. Excel may require less time investment to create an extraction form, but it may take longer for you to match and compare data between extractors. More in-depth comparison of the benefits and limitations of each extraction tool can be found in the table below.

Sample information to include in an extraction table

It may help to consult other similar systematic reviews to identify what data to collect or to think about your question in a framework such as PICO .

Helpful data for an intervention question may include:

  • Information about the article (author(s), year of publication, title, DOI)
  • Information about the study (study type, participant recruitment / selection / allocation, level of evidence, study quality)
  • Patient demographics (age, sex, ethnicity, diseases / conditions, other characteristics related to the intervention / outcome)
  • Intervention (quantity, dosage, route of administration, format, duration, time frame, setting)
  • Outcomes (quantitative and / or qualitative)

If you plan to synthesize data, you will want to collect additional information such as sample sizes, effect sizes, dependent variables, reliability measures, pre-test data, post-test data, follow-up data, and statistical tests used.

Extraction templates and approaches should be determined by the needs of the specific review.   For example, if you are extracting qualitative data, you will want to extract data such as theoretical framework, data collection method, or role of the researcher and their potential bias.

  • Supplementary Guidance for Inclusion of Qualitative Research in Cochrane Systematic Reviews of Interventions (Cochrane Collaboration Qualitative Methods Group)
  • Look for an existing extraction form or tool to help guide you.  Use existing systematic reviews on your topic to identify what information to collect if you are not sure what to do.
  • Train the review team on the extraction categories and what type of data would be expected.  A manual or guide may help your team establish standards.
  • Pilot the extraction / coding form to ensure data extractors are recording similar data. Revise the extraction form if needed.
  • Discuss any discrepancies in coding throughout the process.
  • Document any changes to the process or the form.  Keep track of the decisions the team makes and the reasoning behind them.
  • << Previous: Step 6: Assess Quality of Included Studies
  • Next: Step 8: Write the Review >>
  • Last Updated: Apr 24, 2024 2:00 PM
  • URL: https://guides.lib.unc.edu/systematic-reviews

Search & Find

  • E-Research by Discipline
  • More Search & Find

Places & Spaces

  • Places to Study
  • Book a Study Room
  • Printers, Scanners, & Computers
  • More Places & Spaces
  • Borrowing & Circulation
  • Request a Title for Purchase
  • Schedule Instruction Session
  • More Services

Support & Guides

  • Course Reserves
  • Research Guides
  • Citing & Writing
  • More Support & Guides
  • Mission Statement
  • Diversity Statement
  • Staff Directory
  • Job Opportunities
  • Give to the Libraries
  • News & Exhibits
  • Reckoning Initiative
  • More About Us

UNC University Libraries Logo

  • Search This Site
  • Privacy Policy
  • Accessibility
  • Give Us Your Feedback
  • 208 Raleigh Street CB #3916
  • Chapel Hill, NC 27515-8890
  • 919-962-1053

Log in using your username and password

  • Search More Search for this keyword Advanced search
  • Latest content
  • Current issue
  • BMJ Journals More You are viewing from: Google Indexer

You are here

  • Volume 26, Issue 3
  • Summarising good practice guidelines for data extraction for systematic reviews and meta-analysis
  • Article Text
  • Article info
  • Citation Tools
  • Rapid Responses
  • Article metrics

Download PDF

  • http://orcid.org/0000-0001-6589-5456 Kathryn S Taylor 1 ,
  • http://orcid.org/0000-0002-7791-8552 Kamal R Mahtani 1 ,
  • http://orcid.org/0000-0003-1139-655X Jeffrey K Aronson 2
  • 1 Nuffield Department of Primary Care Health Sciences , University of Oxford , Oxford , UK
  • 2 Centre for Evidence Based Medicine , University of Oxford , Oxford , UK
  • Correspondence to Dr Kathryn S Taylor, Nuffield Department of Primary Care Health Sciences, University of Oxford, Oxford OX2 6GG, UK; kathryn.taylor{at}phc.ox.ac.uk

https://doi.org/10.1136/bmjebm-2020-111651

Statistics from Altmetric.com

Request permissions.

If you wish to reuse any or all of this article please use the link below which will take you to the Copyright Clearance Center’s RightsLink service. You will be able to get a quick price and instant permission to reuse the content in many different ways.

Data extraction is the process of a systematic review that occurs between identifying eligible studies and analysing the data, whether it can be a qualitative synthesis or a quantitative synthesis involving the pooling of data in a meta-analysis. The aims of data extraction are to obtain information about the included studies in terms of the characteristics of each study and its population and, for quantitative synthesis, to collect the necessary data to carry out meta-analysis. In systematic reviews, information about the included studies will also be required to conduct risk of bias assessments, but these data are not the focus of this article.

Following good practice when extracting data will help make the process efficient and reduce the risk of errors and bias. Failure to follow good practice risks basing the analysis on poor quality data, and therefore providing poor quality inputs, which will result in poor quality outputs, with unreliable conclusions and invalid study findings. In computer science, this is known as ‘garbage in, garbage out’ or ‘rubbish in, rubbish out’. Furthermore, providing insufficient information about the included studies for readers to be able to assess the generalisability of the findings from a systematic review will undermine the value of the pooled analysis. Such failures will cause your systematic review and meta-analysis to be less useful than it ought to be.

Some guidelines for data extraction are formal, including those described in the Cochrane Handbook for Systematic Reviews of Interventions, 1 the Cochrane Handbook for Diagnostic Test Accuracy Reviews, 2 3 the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) reporting guidelines for systematic reviews and their protocols 4–7 and other sources, 8 9 , formal guidelines are complemented with informal advice in the form of examples and videos on how to avoid possible pitfalls and guidance on how to carry out data extraction more efficiently. 10–12

Guidelines for data extraction involve recommendations for:

Duplication.

Anticipation.

Organisation.

Documentation.

Duplication

Ideally, at least two reviewers should extract data independently, 1 2 9–12 particularly outcome data, 1 as data extraction by only one person can generate errors. 1 13 Data will be extracted from the same sources into identical data extraction forms. If time or resources prevent independent dual extraction, one reviewer should extract the full data and another should independently check the extracted data for both accuracy and completeness. 8 In rapid or restricted reviews, an acceptable level of verification of the data extraction by the first reviewer may be achieved by a second reviewer extracting a random sample of data. 14 Then before comparing the extracted data and seeking a consensus, the extent to which coded (categorical) data extracted by two different reviewers is consistent and may be measured using kappa statistics, 1 2 12 15 or Fleis kappa statistics, when more than two people have extracted the data. 16 Formal comparisons are not routine in Cochrane Reviews, and the Cochrane Handbook recommends that if agreement is to be formally assessed, it should focus only on key outcomes or risk of bias assessments. 1

Anticipation

Disagreement between reviewers when extracting data . Some differences in extracted data are simply due to human error, and such conflicts can be easily resolved. Conflicts and questions about clinical issues, about which data to extract, or whether the relevant data have been reported can be addressed by involving both clinicians and methodologists in data extraction. 3 12 The protocol should set out the strategy for resolving disagreements between reviewers, using consensus and, if necessary, arbitration by another reviewer. If arbitration fails, the study authors should be contacted for clarification. If that is unsuccessful, the disagreement should be documented and reported. 1 6 7

Outcome data being reported in different ways, which are not necessarily suitable for meta-analysis . Many resources are available for helping with data extraction, involving various methods and equations to transform reported data or make estimates. 1 2 10 The protocol may acknowledge this by stating that any estimates made and their justification will be documented and reported.

Including estimates and alternative data . It is also important to anticipate the roles that extracted data will play in the analysis. Studies should be highlighted when multiple sets of outcome data are reported or when estimates have been made in extracting outcome data. 9 Clearly identifying these studies during the data extraction phase will ensure that the studies can be quickly identified later, during the data analysis phase.

Risk of double counting patients . Some studies involve multiple reports, but the study should be the unit of interest. 1 Tracking down multiple reports and ensuring that patients are not double-counted may require good detective skills.

Risk of human error, inconsistency and subjectivity when extracting data . The protocol should state whether data extraction was independent and carried out in duplicate, if a standardised data extraction form was used, and whether it was piloted. The protocol should also state any special instruction, for example, only extracting prespecified eligibility criteria. 1 2 6–9 11 12

Ambiguous or incomplete data . Authors should be contacted to seek clarification about data and make enquiries about the availability of unreported data. 1 2 9 The process of confirming and obtaining data from authors should be prespecified 6 7 including the number of attempts that will be made to make contact, who will be contacted (eg, the first author), and what form the data request will take. Asking for data that are likely to be readily available will reduce the risk of authors offering data with preconditions.

Extracting the right amount of data . Time and resources are wasted extracting data that will not be analysed, such as the language of the publication and the journal name when other extracted data (first author, title and year) adequately identify the publication. The aim of the systematic review will determine which study characteristics are extracted. 16 For example, if the prevalence of a disease is important and is known to vary across cities, the country and city should be extracted. Any assumptions and simplifications should be listed in the protocol. 6 7 The protocol should allow some flexibility for alternative analyses by not over-aggregating data, for example, collecting data on smoking status in categories ‘smoker/ex-smoker/never smoked’ instead of ‘smoker/non-smoker’. 11

Organisation

Guidelines recommend that the process of extracting data should be well organised. This involves having a clear plan, which should feature in the protocol, stating who will extract the data, the actual data that will be extracted, details about the use, development, piloting of a standardised data extraction form 1 6–9 and having good data management procedures, 10 including backing up files frequently. 11 Standardised data extraction forms can provide consistency in a systematic review, while at the same time reducing biases and improving validity and reliability. It may be possible to reuse a form from another review. 12 It is recommended that the data extraction form is piloted and that reviewers receive training in advance 1 2 12 and instructions should be given with extraction forms (eg, about codes and definitions used in the form) to reduce subjectivity and to ensure consistency. 1 2 12 It is recommended that instructions be integrated into the extraction form, so that they are seen each time data are extracted, rather than having instructions in a separate instruction document, which may be ignored or forgotten. 2 Data extraction forms may be paper based or electronic or involve sophisticated data systems. Each approach will have advantages and disadvantages. 1 11 17 For example, using a paper-based form does not require internet access or software skills, but using an electronic extraction form facilitates data analysis. Data systems, while costly, can provide online data storage and automated comparisons between data that have been independently extracted.

Documentation

Data extraction procedures and preanalysis calculations should be well documented 9 10 and based on ‘good bookkeeping’. 5 10 Having good documentation supports accurate reporting, transparency and the ability to scrutinise and replicate the analysis. Reporting guidelines for systematic reviews are provided by PRISMA, 4 5 and these correspond to the set of PRISMA guidelines for protocols of systematic reviews. 6 7 In cases where data are derived from multiple reports, documenting the source of each data item will facilitate the process of resolving disagreements with other reviewers, by enabling the source of conflict to be quickly identified. 10

Data extraction is both time consuming and error-prone, and automation of data extraction is still in its infancy. 1 18 Following both formal and informal guidelines for good practice in data extraction ( table 1 ) will make the process efficient and reduce the risk of errors and bias when extracting data. This will contribute towards ensuring that systematic reviews and meta-analyses are carried out to a high standard.

  • View inline

Summarising guidelines for extracting data for systematic reviews meta-analysis

  • Higgins JPT ,
  • Leeflang MM ,
  • Davenport CF ,
  • Takwoini Y ,
  • Davenport CF
  • Liberati A ,
  • Tetzlaff J , et al
  • Altman DG ,
  • Shamseer L ,
  • Clarke M , et al
  • Centre for Reviews and Dissemination, University of York
  • Collaboration for Environmental Evidence
  • Dalhousie University
  • Buscemi N ,
  • Hartling L ,
  • Vandermeer B , et al
  • Plüddemann A ,
  • Aronson JK ,
  • Onakpoya I , et al
  • Vedula SS ,
  • Hadar N , et al
  • Jonnalagadda SR ,

Twitter @dataextips

Contributors KST and KRM conceived the idea of the series of which this is one part. KST wrote the first draft of the manuscript. All authors revised the manuscript and agreed the final version.

Funding This research was supported by the National Institute for Health Research Applied Research Collaboration Oxford and Thames Valley at Oxford Health NHS Foundation Trust.

Disclaimer The views expressed in this publication are those of the authors and not necessarily those of the NIHR or the Department of Health and Social Care.

Competing interests KRM and JKA were associate editors of BMJ Evidence Medicine at the time of submission.

Provenance and peer review Commissioned; internally peer reviewed.

Read the full text or download the PDF:

Jump to navigation

Home

Cochrane Training

Chapter 5: collecting data.

Tianjing Li, Julian PT Higgins, Jonathan J Deeks

Key Points:

  • Systematic reviews have studies, rather than reports, as the unit of interest, and so multiple reports of the same study need to be identified and linked together before or after data extraction.
  • Because of the increasing availability of data sources (e.g. trials registers, regulatory documents, clinical study reports), review authors should decide on which sources may contain the most useful information for the review, and have a plan to resolve discrepancies if information is inconsistent across sources.
  • Review authors are encouraged to develop outlines of tables and figures that will appear in the review to facilitate the design of data collection forms. The key to successful data collection is to construct easy-to-use forms and collect sufficient and unambiguous data that faithfully represent the source in a structured and organized manner.
  • Effort should be made to identify data needed for meta-analyses, which often need to be calculated or converted from data reported in diverse formats.
  • Data should be collected and archived in a form that allows future access and data sharing.

Cite this chapter as: Li T, Higgins JPT, Deeks JJ (editors). Chapter 5: Collecting data. In: Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, Welch VA (editors). Cochrane Handbook for Systematic Reviews of Interventions version 6.4 (updated August 2023). Cochrane, 2023. Available from www.training.cochrane.org/handbook .

5.1 Introduction

Systematic reviews aim to identify all studies that are relevant to their research questions and to synthesize data about the design, risk of bias, and results of those studies. Consequently, the findings of a systematic review depend critically on decisions relating to which data from these studies are presented and analysed. Data collected for systematic reviews should be accurate, complete, and accessible for future updates of the review and for data sharing. Methods used for these decisions must be transparent; they should be chosen to minimize biases and human error. Here we describe approaches that should be used in systematic reviews for collecting data, including extraction of data directly from journal articles and other reports of studies.

5.2 Sources of data

Studies are reported in a range of sources which are detailed later. As discussed in Section 5.2.1 , it is important to link together multiple reports of the same study. The relative strengths and weaknesses of each type of source are discussed in Section 5.2.2 . For guidance on searching for and selecting reports of studies, refer to Chapter 4 .

Journal articles are the source of the majority of data included in systematic reviews. Note that a study can be reported in multiple journal articles, each focusing on some aspect of the study (e.g. design, main results, and other results).

Conference abstracts are commonly available. However, the information presented in conference abstracts is highly variable in reliability, accuracy, and level of detail (Li et al 2017).

Errata and letters can be important sources of information about studies, including critical weaknesses and retractions, and review authors should examine these if they are identified (see MECIR Box 5.2.a ).

Trials registers (e.g. ClinicalTrials.gov) catalogue trials that have been planned or started, and have become an important data source for identifying trials, for comparing published outcomes and results with those planned, and for obtaining efficacy and safety data that are not available elsewhere (Ross et al 2009, Jones et al 2015, Baudard et al 2017).

Clinical study reports (CSRs) contain unabridged and comprehensive descriptions of the clinical problem, design, conduct and results of clinical trials, following a structure and content guidance prescribed by the International Conference on Harmonisation (ICH 1995). To obtain marketing approval of drugs and biologics for a specific indication, pharmaceutical companies submit CSRs and other required materials to regulatory authorities. Because CSRs also incorporate tables and figures, with appendices containing the protocol, statistical analysis plan, sample case report forms, and patient data listings (including narratives of all serious adverse events), they can be thousands of pages in length. CSRs often contain more data about trial methods and results than any other single data source (Mayo-Wilson et al 2018). CSRs are often difficult to access, and are usually not publicly available. Review authors could request CSRs from the European Medicines Agency (Davis and Miller 2017). The US Food and Drug and Administration had historically avoided releasing CSRs but launched a pilot programme in 2018 whereby selected portions of CSRs for new drug applications were posted on the agency’s website. Many CSRs are obtained through unsealed litigation documents, repositories (e.g. clinicalstudydatarequest.com ), and other open data and data-sharing channels (e.g. The Yale University Open Data Access Project) (Doshi et al 2013, Wieland et al 2014, Mayo-Wilson et al 2018)).

Regulatory reviews such as those available from the US Food and Drug Administration or European Medicines Agency provide useful information about trials of drugs, biologics, and medical devices submitted by manufacturers for marketing approval (Turner 2013). These documents are summaries of CSRs and related documents, prepared by agency staff as part of the process of approving the products for marketing, after reanalysing the original trial data. Regulatory reviews often are available only for the first approved use of an intervention and not for later applications (although review authors may request those documents, which are usually brief). Using regulatory reviews from the US Food and Drug Administration as an example, drug approval packages are available on the agency’s website for drugs approved since 1997 (Turner 2013); for drugs approved before 1997, information must be requested through a freedom of information request. The drug approval packages contain various documents: approval letter(s), medical review(s), chemistry review(s), clinical pharmacology review(s), and statistical reviews(s).

Individual participant data (IPD) are usually sought directly from the researchers responsible for the study, or may be identified from open data repositories (e.g. www.clinicalstudydatarequest.com ). These data typically include variables that represent the characteristics of each participant, intervention (or exposure) group, prognostic factors, and measurements of outcomes (Stewart et al 2015). Access to IPD has the advantage of allowing review authors to reanalyse the data flexibly, in accordance with the preferred analysis methods outlined in the protocol, and can reduce the variation in analysis methods across studies included in the review. IPD reviews are addressed in detail in Chapter 26 .

MECIR Box 5.2.a Relevant expectations for conduct of intervention reviews

5.2.1 Studies (not reports) as the unit of interest

In a systematic review, studies rather than reports of studies are the principal unit of interest. Since a study may have been reported in several sources, a comprehensive search for studies for the review may identify many reports from a potentially relevant study (Mayo-Wilson et al 2017a, Mayo-Wilson et al 2018). Conversely, a report may describe more than one study.

Multiple reports of the same study should be linked together (see MECIR Box 5.2.b ). Some authors prefer to link reports before they collect data, and collect data from across the reports onto a single form. Other authors prefer to collect data from each report and then link together the collected data across reports. Either strategy may be appropriate, depending on the nature of the reports at hand. It may not be clear that two reports relate to the same study until data collection has commenced. Although sometimes there is a single report for each study, it should never be assumed that this is the case.

MECIR Box 5.2.b Relevant expectations for conduct of intervention reviews

It can be difficult to link multiple reports from the same study, and review authors may need to do some ‘detective work’. Multiple sources about the same trial may not reference each other, do not share common authors (Gøtzsche 1989, Tramèr et al 1997), or report discrepant information about the study design, characteristics, outcomes, and results (von Elm et al 2004, Mayo-Wilson et al 2017a).

Some of the most useful criteria for linking reports are:

  • trial registration numbers;
  • authors’ names;
  • sponsor for the study and sponsor identifiers (e.g. grant or contract numbers);
  • location and setting (particularly if institutions, such as hospitals, are named);
  • specific details of the interventions (e.g. dose, frequency);
  • numbers of participants and baseline data; and
  • date and duration of the study (which also can clarify whether different sample sizes are due to different periods of recruitment), length of follow-up, or subgroups selected to address secondary goals.

Review authors should use as many trial characteristics as possible to link multiple reports. When uncertainties remain after considering these and other factors, it may be necessary to correspond with the study authors or sponsors for confirmation.

5.2.2 Determining which sources might be most useful

A comprehensive search to identify all eligible studies from all possible sources is resource-intensive but necessary for a high-quality systematic review (see Chapter 4 ). Because some data sources are more useful than others (Mayo-Wilson et al 2018), review authors should consider which data sources may be available and which may contain the most useful information for the review. These considerations should be described in the protocol. Table 5.2.a summarizes the strengths and limitations of different data sources (Mayo-Wilson et al 2018). Gaining access to CSRs and IPD often takes a long time. Review authors should begin searching repositories and contact trial investigators and sponsors as early as possible to negotiate data usage agreements (Mayo-Wilson et al 2015, Mayo-Wilson et al 2018).

Table 5.2.a Strengths and limitations of different data sources for systematic reviews

5.2.3 Correspondence with investigators

Review authors often find that they are unable to obtain all the information they seek from available reports about the details of the study design, the full range of outcomes measured and the numerical results. In such circumstances, authors are strongly encouraged to contact the original investigators (see MECIR Box 5.2.c ). Contact details of study authors, when not available from the study reports, often can be obtained from more recent publications, from university or institutional staff listings, from membership directories of professional societies, or by a general search of the web. If the contact author named in the study report cannot be contacted or does not respond, it is worthwhile attempting to contact other authors.

Review authors should consider the nature of the information they require and make their request accordingly. For descriptive information about the conduct of the trial, it may be most appropriate to ask open-ended questions (e.g. how was the allocation process conducted, or how were missing data handled?). If specific numerical data are required, it may be more helpful to request them specifically, possibly providing a short data collection form (either uncompleted or partially completed). If IPD are required, they should be specifically requested (see also Chapter 26 ). In some cases, study investigators may find it more convenient to provide IPD rather than conduct additional analyses to obtain the specific statistics requested.

MECIR Box 5.2.c Relevant expectations for conduct of intervention reviews

5.3 What data to collect

5.3.1 what are data.

For the purposes of this chapter, we define ‘data’ to be any information about (or derived from) a study, including details of methods, participants, setting, context, interventions, outcomes, results, publications, and investigators. Review authors should plan in advance what data will be required for their systematic review, and develop a strategy for obtaining them (see MECIR Box 5.3.a ). The involvement of consumers and other stakeholders can be helpful in ensuring that the categories of data collected are sufficiently aligned with the needs of review users ( Chapter 1, Section 1.3 ). The data to be sought should be described in the protocol, with consideration wherever possible of the issues raised in the rest of this chapter.

The data collected for a review should adequately describe the included studies, support the construction of tables and figures, facilitate the risk of bias assessment, and enable syntheses and meta-analyses. Review authors should familiarize themselves with reporting guidelines for systematic reviews (see online Chapter III and the PRISMA statement; (Liberati et al 2009) to ensure that relevant elements and sections are incorporated. The following sections review the types of information that should be sought, and these are summarized in Table 5.3.a (Li et al 2015).

MECIR Box 5.3.a Relevant expectations for conduct of intervention reviews

Table 5.3.a Checklist of items to consider in data collection

*Full description required for assessments of risk of bias (see Chapter 8 , Chapter 23 and Chapter 25 ).

5.3.2 Study methods and potential sources of bias

Different research methods can influence study outcomes by introducing different biases into results. Important study design characteristics should be collected to allow the selection of appropriate methods for assessment and analysis, and to enable description of the design of each included study in a table of ‘Characteristics of included studies’, including whether the study is randomized, whether the study has a cluster or crossover design, and the duration of the study. If the review includes non-randomized studies, appropriate features of the studies should be described (see Chapter 24 ).

Detailed information should be collected to facilitate assessment of the risk of bias in each included study. Risk-of-bias assessment should be conducted using the tool most appropriate for the design of each study, and the information required to complete the assessment will depend on the tool. Randomized studies should be assessed using the tool described in Chapter 8 . The tool covers bias arising from the randomization process, due to deviations from intended interventions, due to missing outcome data, in measurement of the outcome, and in selection of the reported result. For each item in the tool, a description of what happened in the study is required, which may include verbatim quotes from study reports. Information for assessment of bias due to missing outcome data and selection of the reported result may be most conveniently collected alongside information on outcomes and results. Chapter 7 (Section 7.3.1) discusses some issues in the collection of information for assessments of risk of bias. For non-randomized studies, the most appropriate tool is described in Chapter 25 . A separate tool also covers bias due to missing results in meta-analysis (see Chapter 13 ).

A particularly important piece of information is the funding source of the study and potential conflicts of interest of the study authors.

Some review authors will wish to collect additional information on study characteristics that bear on the quality of the study’s conduct but that may not lead directly to risk of bias, such as whether ethical approval was obtained and whether a sample size calculation was performed a priori.

5.3.3 Participants and setting

Details of participants are collected to enable an understanding of the comparability of, and differences between, the participants within and between included studies, and to allow assessment of how directly or completely the participants in the included studies reflect the original review question.

Typically, aspects that should be collected are those that could (or are believed to) affect presence or magnitude of an intervention effect and those that could help review users assess applicability to populations beyond the review. For example, if the review authors suspect important differences in intervention effect between different socio-economic groups, this information should be collected. If intervention effects are thought constant over such groups, and if such information would not be useful to help apply results, it should not be collected. Participant characteristics that are often useful for assessing applicability include age and sex. Summary information about these should always be collected unless they are not obvious from the context. These characteristics are likely to be presented in different formats (e.g. ages as means or medians, with standard deviations or ranges; sex as percentages or counts for the whole study or for each intervention group separately). Review authors should seek consistent quantities where possible, and decide whether it is more relevant to summarize characteristics for the study as a whole or by intervention group. It may not be possible to select the most consistent statistics until data collection is complete across all or most included studies. Other characteristics that are sometimes important include ethnicity, socio-demographic details (e.g. education level) and the presence of comorbid conditions. Clinical characteristics relevant to the review question (e.g. glucose level for reviews on diabetes) also are important for understanding the severity or stage of the disease.

Diagnostic criteria that were used to define the condition of interest can be a particularly important source of diversity across studies and should be collected. For example, in a review of drug therapy for congestive heart failure, it is important to know how the definition and severity of heart failure was determined in each study (e.g. systolic or diastolic dysfunction, severe systolic dysfunction with ejection fractions below 20%). Similarly, in a review of antihypertensive therapy, it is important to describe baseline levels of blood pressure of participants.

If the settings of studies may influence intervention effects or applicability, then information on these should be collected. Typical settings of healthcare intervention studies include acute care hospitals, emergency facilities, general practice, and extended care facilities such as nursing homes, offices, schools, and communities. Sometimes studies are conducted in different geographical regions with important differences that could affect delivery of an intervention and its outcomes, such as cultural characteristics, economic context, or rural versus city settings. Timing of the study may be associated with important technology differences or trends over time. If such information is important for the interpretation of the review, it should be collected.

Important characteristics of the participants in each included study should be summarized for the reader in the table of ‘Characteristics of included studies’.

5.3.4 Interventions

Details of all experimental and comparator interventions of relevance to the review should be collected. Again, details are required for aspects that could affect the presence or magnitude of an effect or that could help review users assess applicability to their own circumstances. Where feasible, information should be sought (and presented in the review) that is sufficient for replication of the interventions under study. This includes any co-interventions administered as part of the study, and applies similarly to comparators such as ‘usual care’. Review authors may need to request missing information from study authors.

The Template for Intervention Description and Replication (TIDieR) provides a comprehensive framework for full description of interventions and has been proposed for use in systematic reviews as well as reports of primary studies (Hoffmann et al 2014). The checklist includes descriptions of:

  • the rationale for the intervention and how it is expected to work;
  • any documentation that instructs the recipient on the intervention;
  • what the providers do to deliver the intervention (procedures and processes);
  • who provides the intervention (including their skill level), how (e.g. face to face, web-based) and in what setting (e.g. home, school, or hospital);
  • the timing and intensity;
  • whether any variation is permitted or expected, and whether modifications were actually made; and
  • any strategies used to ensure or assess fidelity or adherence to the intervention, and the extent to which the intervention was delivered as planned.

For clinical trials of pharmacological interventions, key information to collect will often include routes of delivery (e.g. oral or intravenous delivery), doses (e.g. amount or intensity of each treatment, frequency of delivery), timing (e.g. within 24 hours of diagnosis), and length of treatment. For other interventions, such as those that evaluate psychotherapy, behavioural and educational approaches, or healthcare delivery strategies, the amount of information required to characterize the intervention will typically be greater, including information about multiple elements of the intervention, who delivered it, and the format and timing of delivery. Chapter 17 provides further information on how to manage intervention complexity, and how the intervention Complexity Assessment Tool (iCAT) can facilitate data collection (Lewin et al 2017).

Important characteristics of the interventions in each included study should be summarized for the reader in the table of ‘Characteristics of included studies’. Additional tables or diagrams such as logic models ( Chapter 2, Section 2.5.1 ) can assist descriptions of multi-component interventions so that review users can better assess review applicability to their context.

5.3.4.1 Integrity of interventions

The degree to which specified procedures or components of the intervention are implemented as planned can have important consequences for the findings from a study. We describe this as intervention integrity ; related terms include adherence, compliance and fidelity (Carroll et al 2007). The verification of intervention integrity may be particularly important in reviews of non-pharmacological trials such as behavioural interventions and complex interventions, which are often implemented in conditions that present numerous obstacles to idealized delivery.

It is generally expected that reports of randomized trials provide detailed accounts of intervention implementation (Zwarenstein et al 2008, Moher et al 2010). In assessing whether interventions were implemented as planned, review authors should bear in mind that some interventions are standardized (with no deviations permitted in the intervention protocol), whereas others explicitly allow a degree of tailoring (Zwarenstein et al 2008). In addition, the growing field of implementation science has led to an increased awareness of the impact of setting and context on delivery of interventions (Damschroder et al 2009). (See Chapter 17, Section 17.1.2.1 for further information and discussion about how an intervention may be tailored to local conditions in order to preserve its integrity.)

Information about integrity can help determine whether unpromising results are due to a poorly conceptualized intervention or to an incomplete delivery of the prescribed components. It can also reveal important information about the feasibility of implementing a given intervention in real life settings. If it is difficult to achieve full implementation in practice, the intervention will have low feasibility (Dusenbury et al 2003).

Whether a lack of intervention integrity leads to a risk of bias in the estimate of its effect depends on whether review authors and users are interested in the effect of assignment to intervention or the effect of adhering to intervention, as discussed in more detail in Chapter 8, Section 8.2.2 . Assessment of deviations from intended interventions is important for assessing risk of bias in the latter, but not the former (see Chapter 8, Section 8.4 ), but both may be of interest to decision makers in different ways.

An example of a Cochrane Review evaluating intervention integrity is provided by a review of smoking cessation in pregnancy (Chamberlain et al 2017). The authors found that process evaluation of the intervention occurred in only some trials and that the implementation was less than ideal in others, including some of the largest trials. The review highlighted how the transfer of an intervention from one setting to another may reduce its effectiveness when elements are changed, or aspects of the materials are culturally inappropriate.

5.3.4.2 Process evaluations

Process evaluations seek to evaluate the process (and mechanisms) between the intervention’s intended implementation and the actual effect on the outcome (Moore et al 2015). Process evaluation studies are characterized by a flexible approach to data collection and the use of numerous methods to generate a range of different types of data, encompassing both quantitative and qualitative methods. Guidance for including process evaluations in systematic reviews is provided in Chapter 21 . When it is considered important, review authors should aim to collect information on whether the trial accounted for, or measured, key process factors and whether the trials that thoroughly addressed integrity showed a greater impact. Process evaluations can be a useful source of factors that potentially influence the effectiveness of an intervention.

5.3.5 Outcome s

An outcome is an event or a measurement value observed or recorded for a particular person or intervention unit in a study during or following an intervention, and that is used to assess the efficacy and safety of the studied intervention (Meinert 2012). Review authors should indicate in advance whether they plan to collect information about all outcomes measured in a study or only those outcomes of (pre-specified) interest in the review. Research has shown that trials addressing the same condition and intervention seldom agree on which outcomes are the most important, and consequently report on numerous different outcomes (Dwan et al 2014, Ismail et al 2014, Denniston et al 2015, Saldanha et al 2017a). The selection of outcomes across systematic reviews of the same condition is also inconsistent (Page et al 2014, Saldanha et al 2014, Saldanha et al 2016, Liu et al 2017). Outcomes used in trials and in systematic reviews of the same condition have limited overlap (Saldanha et al 2017a, Saldanha et al 2017b).

We recommend that only the outcomes defined in the protocol be described in detail. However, a complete list of the names of all outcomes measured may allow a more detailed assessment of the risk of bias due to missing outcome data (see Chapter 13 ).

Review authors should collect all five elements of an outcome (Zarin et al 2011, Saldanha et al 2014):

1. outcome domain or title (e.g. anxiety);

2. measurement tool or instrument (including definition of clinical outcomes or endpoints); for a scale, name of the scale (e.g. the Hamilton Anxiety Rating Scale), upper and lower limits, and whether a high or low score is favourable, definitions of any thresholds if appropriate;

3. specific metric used to characterize each participant’s results (e.g. post-intervention anxiety, or change in anxiety from baseline to a post-intervention time point, or post-intervention presence of anxiety (yes/no));

4. method of aggregation (e.g. mean and standard deviation of anxiety scores in each group, or proportion of people with anxiety);

5. timing of outcome measurements (e.g. assessments at end of eight-week intervention period, events occurring during eight-week intervention period).

Further considerations for economics outcomes are discussed in Chapter 20 , and for patient-reported outcomes in Chapter 18 .

5.3.5.1 Adverse effects

Collection of information about the harmful effects of an intervention can pose particular difficulties, discussed in detail in Chapter 19 . These outcomes may be described using multiple terms, including ‘adverse event’, ‘adverse effect’, ‘adverse drug reaction’, ‘side effect’ and ‘complication’. Many of these terminologies are used interchangeably in the literature, although some are technically different. Harms might additionally be interpreted to include undesirable changes in other outcomes measured during a study, such as a decrease in quality of life where an improvement may have been anticipated.

In clinical trials, adverse events can be collected either systematically or non-systematically. Systematic collection refers to collecting adverse events in the same manner for each participant using defined methods such as a questionnaire or a laboratory test. For systematically collected outcomes representing harm, data can be collected by review authors in the same way as efficacy outcomes (see Section 5.3.5 ).

Non-systematic collection refers to collection of information on adverse events using methods such as open-ended questions (e.g. ‘Have you noticed any symptoms since your last visit?’), or reported by participants spontaneously. In either case, adverse events may be selectively reported based on their severity, and whether the participant suspected that the effect may have been caused by the intervention, which could lead to bias in the available data. Unfortunately, most adverse events are collected non-systematically rather than systematically, creating a challenge for review authors. The following pieces of information are useful and worth collecting (Nicole Fusco, personal communication):

  • any coding system or standard medical terminology used (e.g. COSTART, MedDRA), including version number;
  • name of the adverse events (e.g. dizziness);
  • reported intensity of the adverse event (e.g. mild, moderate, severe);
  • whether the trial investigators categorized the adverse event as ‘serious’;
  • whether the trial investigators identified the adverse event as being related to the intervention;
  • time point (most commonly measured as a count over the duration of the study);
  • any reported methods for how adverse events were selected for inclusion in the publication (e.g. ‘We reported all adverse events that occurred in at least 5% of participants’); and
  • associated results.

Different collection methods lead to very different accounting of adverse events (Safer 2002, Bent et al 2006, Ioannidis et al 2006, Carvajal et al 2011, Allen et al 2013). Non-systematic collection methods tend to underestimate how frequently an adverse event occurs. It is particularly problematic when the adverse event of interest to the review is collected systematically in some studies but non-systematically in other studies. Different collection methods introduce an important source of heterogeneity. In addition, when non-systematic adverse events are reported based on quantitative selection criteria (e.g. only adverse events that occurred in at least 5% of participants were included in the publication), use of reported data alone may bias the results of meta-analyses. Review authors should be cautious of (or refrain from) synthesizing adverse events that are collected differently.

Regardless of the collection methods, precise definitions of adverse effect outcomes and their intensity should be recorded, since they may vary between studies. For example, in a review of aspirin and gastrointestinal haemorrhage, some trials simply reported gastrointestinal bleeds, while others reported specific categories of bleeding, such as haematemesis, melaena, and proctorrhagia (Derry and Loke 2000). The definition and reporting of severity of the haemorrhages (e.g. major, severe, requiring hospital admission) also varied considerably among the trials (Zanchetti and Hansson 1999). Moreover, a particular adverse effect may be described or measured in different ways among the studies. For example, the terms ‘tiredness’, ‘fatigue’ or ‘lethargy’ may all be used in reporting of adverse effects. Study authors also may use different thresholds for ‘abnormal’ results (e.g. hypokalaemia diagnosed at a serum potassium concentration of 3.0 mmol/L or 3.5 mmol/L).

No mention of adverse events in trial reports does not necessarily mean that no adverse events occurred. It is usually safest to assume that they were not reported. Quality of life measures are sometimes used as a measure of the participants’ experience during the study, but these are usually general measures that do not look specifically at particular adverse effects of the intervention. While quality of life measures are important and can be used to gauge overall participant well-being, they should not be regarded as substitutes for a detailed evaluation of safety and tolerability.

5.3.6 Results

Results data arise from the measurement or ascertainment of outcomes for individual participants in an intervention study. Results data may be available for each individual in a study (i.e. individual participant data; see Chapter 26 ), or summarized at arm level, or summarized at study level into an intervention effect by comparing two intervention arms. Results data should be collected only for the intervention groups and outcomes specified to be of interest in the protocol (see MECIR Box 5.3.b ). Results for other outcomes should not be collected unless the protocol is modified to add them. Any modification should be reported in the review. However, review authors should be alert to the possibility of important, unexpected findings, particularly serious adverse effects.

MECIR Box 5.3.b Relevant expectations for conduct of intervention reviews

Reports of studies often include several results for the same outcome. For example, different measurement scales might be used, results may be presented separately for different subgroups, and outcomes may have been measured at different follow-up time points. Variation in the results can be very large, depending on which data are selected (Gøtzsche et al 2007, Mayo-Wilson et al 2017a). Review protocols should be as specific as possible about which outcome domains, measurement tools, time points, and summary statistics (e.g. final values versus change from baseline) are to be collected (Mayo-Wilson et al 2017b). A framework should be pre-specified in the protocol to facilitate making choices between multiple eligible measures or results. For example, a hierarchy of preferred measures might be created, or plans articulated to select the result with the median effect size, or to average across all eligible results for a particular outcome domain (see also Chapter 9, Section 9.3.3 ). Any additional decisions or changes to this framework made once the data are collected should be reported in the review as changes to the protocol.

Section 5.6 describes the numbers that will be required to perform meta-analysis, if appropriate. The unit of analysis (e.g. participant, cluster, body part, treatment period) should be recorded for each result when it is not obvious (see Chapter 6, Section 6.2 ). The type of outcome data determines the nature of the numbers that will be sought for each outcome. For example, for a dichotomous (‘yes’ or ‘no’) outcome, the number of participants and the number who experienced the outcome will be sought for each group. It is important to collect the sample size relevant to each result, although this is not always obvious. A flow diagram as recommended in the CONSORT Statement (Moher et al 2001) can help to determine the flow of participants through a study. If one is not available in a published report, review authors can consider drawing one (available from www.consort-statement.org ).

The numbers required for meta-analysis are not always available. Often, other statistics can be collected and converted into the required format. For example, for a continuous outcome, it is usually most convenient to seek the number of participants, the mean and the standard deviation for each intervention group. These are often not available directly, especially the standard deviation. Alternative statistics enable calculation or estimation of the missing standard deviation (such as a standard error, a confidence interval, a test statistic (e.g. from a t-test or F-test) or a P value). These should be extracted if they provide potentially useful information (see MECIR Box 5.3.c ). Details of recalculation are provided in Section 5.6 . Further considerations for dealing with missing data are discussed in Chapter 10, Section 10.12 .

MECIR Box 5.3.c Relevant expectations for conduct of intervention reviews

5.3.7 Other information to collect

We recommend that review authors collect the key conclusions of the included study as reported by its authors. It is not necessary to report these conclusions in the review, but they should be used to verify the results of analyses undertaken by the review authors, particularly in relation to the direction of effect. Further comments by the study authors, for example any explanations they provide for unexpected findings, may be noted. References to other studies that are cited in the study report may be useful, although review authors should be aware of the possibility of citation bias (see Chapter 7, Section 7.2.3.2 ). Documentation of any correspondence with the study authors is important for review transparency.

5.4 Data collection tools

5.4.1 rationale for data collection forms.

Data collection for systematic reviews should be performed using structured data collection forms (see MECIR Box 5.4.a ). These can be paper forms, electronic forms (e.g. Google Form), or commercially or custom-built data systems (e.g. Covidence, EPPI-Reviewer, Systematic Review Data Repository (SRDR)) that allow online form building, data entry by several users, data sharing, and efficient data management (Li et al 2015). All different means of data collection require data collection forms.

MECIR Box 5.4.a Relevant expectations for conduct of intervention reviews

The data collection form is a bridge between what is reported by the original investigators (e.g. in journal articles, abstracts, personal correspondence) and what is ultimately reported by the review authors. The data collection form serves several important functions (Meade and Richardson 1997). First, the form is linked directly to the review question and criteria for assessing eligibility of studies, and provides a clear summary of these that can be used to identify and structure the data to be extracted from study reports. Second, the data collection form is the historical record of the provenance of the data used in the review, as well as the multitude of decisions (and changes to decisions) that occur throughout the review process. Third, the form is the source of data for inclusion in an analysis.

Given the important functions of data collection forms, ample time and thought should be invested in their design. Because each review is different, data collection forms will vary across reviews. However, there are many similarities in the types of information that are important. Thus, forms can be adapted from one review to the next. Although we use the term ‘data collection form’ in the singular, in practice it may be a series of forms used for different purposes: for example, a separate form could be used to assess the eligibility of studies for inclusion in the review to assist in the quick identification of studies to be excluded from or included in the review.

5.4.2 Considerations in selecting data collection tools

The choice of data collection tool is largely dependent on review authors’ preferences, the size of the review, and resources available to the author team. Potential advantages and considerations of selecting one data collection tool over another are outlined in Table 5.4.a (Li et al 2015). A significant advantage that data systems have is in data management ( Chapter 1, Section 1.6 ) and re-use. They make review updates more efficient, and also facilitate methodological research across reviews. Numerous ‘meta-epidemiological’ studies have been carried out using Cochrane Review data, resulting in methodological advances which would not have been possible if thousands of studies had not all been described using the same data structures in the same system.

Some data collection tools facilitate automatic imports of extracted data into RevMan (Cochrane’s authoring tool), such as CSV (Excel) and Covidence. Details available here https://documentation.cochrane.org/revman-kb/populate-study-data-260702462.html

Table 5.4.a Considerations in selecting data collection tools

5.4.3 Design of a data collection form

Regardless of whether data are collected using a paper or electronic form, or a data system, the key to successful data collection is to construct easy-to-use forms and collect sufficient and unambiguous data that faithfully represent the source in a structured and organized manner (Li et al 2015). In most cases, a document format should be developed for the form before building an electronic form or a data system. This can be distributed to others, including programmers and data analysts, and as a guide for creating an electronic form and any guidance or codebook to be used by data extractors. Review authors also should consider compatibility of any electronic form or data system with analytical software, as well as mechanisms for recording, assessing and correcting data entry errors.

Data described in multiple reports (or even within a single report) of a study may not be consistent. Review authors will need to describe how they work with multiple reports in the protocol, for example, by pre-specifying which report will be used when sources contain conflicting data that cannot be resolved by contacting the investigators. Likewise, when there is only one report identified for a study, review authors should specify the section within the report (e.g. abstract, methods, results, tables, and figures) for use in case of inconsistent information.

If review authors wish to automatically import their extracted data into RevMan, it is advised that their data collection forms match the data extraction templates available via the RevMan Knowledge Base. Details available here https://documentation.cochrane.org/revman-kb/data-extraction-templates-260702375.html.

A good data collection form should minimize the need to go back to the source documents. When designing a data collection form, review authors should involve all members of the team, that is, content area experts, authors with experience in systematic review methods and data collection form design, statisticians, and persons who will perform data extraction. Here are suggested steps and some tips for designing a data collection form, based on the informal collation of experiences from numerous review authors (Li et al 2015).

Step 1. Develop outlines of tables and figures expected to appear in the systematic review, considering the comparisons to be made between different interventions within the review, and the various outcomes to be measured. This step will help review authors decide the right amount of data to collect (not too much or too little). Collecting too much information can lead to forms that are longer than original study reports, and can be very wasteful of time. Collection of too little information, or omission of key data, can lead to the need to return to study reports later in the review process.

Step 2. Assemble and group data elements to facilitate form development. Review authors should consult Table 5.3.a , in which the data elements are grouped to facilitate form development and data collection. Note that it may be more efficient to group data elements in the order in which they are usually found in study reports (e.g. starting with reference information, followed by eligibility criteria, intervention description, statistical methods, baseline characteristics and results).

Step 3. Identify the optimal way of framing the data items. Much has been written about how to frame data items for developing robust data collection forms in primary research studies. We summarize a few key points and highlight issues that are pertinent to systematic reviews.

  • Ask closed-ended questions (i.e. questions that define a list of permissible responses) as much as possible. Closed-ended questions do not require post hoc coding and provide better control over data quality than open-ended questions. When setting up a closed-ended question, one must anticipate and structure possible responses and include an ‘other, specify’ category because the anticipated list may not be exhaustive. Avoid asking data extractors to summarize data into uncoded text, no matter how short it is.
  • Avoid asking a question in a way that the response may be left blank. Include ‘not applicable’, ‘not reported’ and ‘cannot tell’ options as needed. The ‘cannot tell’ option tags uncertain items that may promote review authors to contact study authors for clarification, especially on data items critical to reach conclusions.
  • Remember that the form will focus on what is reported in the article rather what has been done in the study. The study report may not fully reflect how the study was actually conducted. For example, a question ‘Did the article report that the participants were masked to the intervention?’ is more appropriate than ‘Were participants masked to the intervention?’
  • Where a judgement is required, record the raw data (i.e. quote directly from the source document) used to make the judgement. It is also important to record the source of information collected, including where it was found in a report or whether information was obtained from unpublished sources or personal communications. As much as possible, questions should be asked in a way that minimizes subjective interpretation and judgement to facilitate data comparison and adjudication.
  • Incorporate flexibility to allow for variation in how data are reported. It is strongly recommended that outcome data be collected in the format in which they were reported and transformed in a subsequent step if required. Review authors also should consider the software they will use for analysis and for publishing the review (e.g. RevMan).

Step 4. Develop and pilot-test data collection forms, ensuring that they provide data in the right format and structure for subsequent analysis. In addition to data items described in Step 2, data collection forms should record the title of the review as well as the person who is completing the form and the date of completion. Forms occasionally need revision; forms should therefore include the version number and version date to reduce the chances of using an outdated form by mistake. Because a study may be associated with multiple reports, it is important to record the study ID as well as the report ID. Definitions and instructions helpful for answering a question should appear next to the question to improve quality and consistency across data extractors (Stock 1994). Provide space for notes, regardless of whether paper or electronic forms are used.

All data collection forms and data systems should be thoroughly pilot-tested before launch (see MECIR Box 5.4.a ). Testing should involve several people extracting data from at least a few articles. The initial testing focuses on the clarity and completeness of questions. Users of the form may provide feedback that certain coding instructions are confusing or incomplete (e.g. a list of options may not cover all situations). The testing may identify data that are missing from the form, or likely to be superfluous. After initial testing, accuracy of the extracted data should be checked against the source document or verified data to identify problematic areas. It is wise to draft entries for the table of ‘Characteristics of included studies’ and complete a risk of bias assessment ( Chapter 8 ) using these pilot reports to ensure all necessary information is collected. A consensus between review authors may be required before the form is modified to avoid any misunderstandings or later disagreements. It may be necessary to repeat the pilot testing on a new set of reports if major changes are needed after the first pilot test.

Problems with the data collection form may surface after pilot testing has been completed, and the form may need to be revised after data extraction has started. When changes are made to the form or coding instructions, it may be necessary to return to reports that have already undergone data extraction. In some situations, it may be necessary to clarify only coding instructions without modifying the actual data collection form.

5.5 Extracting data from reports

5.5.1 introduction.

In most systematic reviews, the primary source of information about each study is published reports of studies, usually in the form of journal articles. Despite recent developments in machine learning models to automate data extraction in systematic reviews (see Section 5.5.9 ), data extraction is still largely a manual process. Electronic searches for text can provide a useful aid to locating information within a report. Examples include using search facilities in PDF viewers, internet browsers and word processing software. However, text searching should not be considered a replacement for reading the report, since information may be presented using variable terminology and presented in multiple formats.

5.5.2 Who should extract data?

Data extractors should have at least a basic understanding of the topic, and have knowledge of study design, data analysis and statistics. They should pay attention to detail while following instructions on the forms. Because errors that occur at the data extraction stage are rarely detected by peer reviewers, editors, or users of systematic reviews, it is recommended that more than one person extract data from every report to minimize errors and reduce introduction of potential biases by review authors (see MECIR Box 5.5.a ). As a minimum, information that involves subjective interpretation and information that is critical to the interpretation of results (e.g. outcome data) should be extracted independently by at least two people (see MECIR Box 5.5.a ). In common with implementation of the selection process ( Chapter 4, Section 4.6 ), it is preferable that data extractors are from complementary disciplines, for example a methodologist and a topic area specialist. It is important that everyone involved in data extraction has practice using the form and, if the form was designed by someone else, receives appropriate training.

Evidence in support of duplicate data extraction comes from several indirect sources. One study observed that independent data extraction by two authors resulted in fewer errors than data extraction by a single author followed by verification by a second (Buscemi et al 2006). A high prevalence of data extraction errors (errors in 20 out of 34 reviews) has been observed (Jones et al 2005). A further study of data extraction to compute standardized mean differences found that a minimum of seven out of 27 reviews had substantial errors (Gøtzsche et al 2007).

MECIR Box 5.5.a Relevant expectations for conduct of intervention reviews

5.5.3 Training data extractors

Training of data extractors is intended to familiarize them with the review topic and methods, the data collection form or data system, and issues that may arise during data extraction. Results of the pilot testing of the form should prompt discussion among review authors and extractors of ambiguous questions or responses to establish consistency. Training should take place at the onset of the data extraction process and periodically over the course of the project (Li et al 2015). For example, when data related to a single item on the form are present in multiple locations within a report (e.g. abstract, main body of text, tables, and figures) or in several sources (e.g. publications, ClinicalTrials.gov, or CSRs), the development and documentation of instructions to follow an agreed algorithm are critical and should be reinforced during the training sessions.

Some have proposed that some information in a report, such as its authors, be blinded to the review author prior to data extraction and assessment of risk of bias (Jadad et al 1996). However, blinding of review authors to aspects of study reports generally is not recommended for Cochrane Reviews as there is little evidence that it alters the decisions made (Berlin 1997).

5.5.4 Extracting data from multiple reports of the same study

Studies frequently are reported in more than one publication or in more than one source (Tramèr et al 1997, von Elm et al 2004). A single source rarely provides complete information about a study; on the other hand, multiple sources may contain conflicting information about the same study (Mayo-Wilson et al 2017a, Mayo-Wilson et al 2017b, Mayo-Wilson et al 2018). Because the unit of interest in a systematic review is the study and not the report, information from multiple reports often needs to be collated and reconciled. It is not appropriate to discard any report of an included study without careful examination, since it may contain valuable information not included in the primary report. Review authors will need to decide between two strategies:

  • Extract data from each report separately, then combine information across multiple data collection forms.
  • Extract data from all reports directly into a single data collection form.

The choice of which strategy to use will depend on the nature of the reports and may vary across studies and across reports. For example, when a full journal article and multiple conference abstracts are available, it is likely that the majority of information will be obtained from the journal article; completing a new data collection form for each conference abstract may be a waste of time. Conversely, when there are two or more detailed journal articles, perhaps relating to different periods of follow-up, then it is likely to be easier to perform data extraction separately for these articles and collate information from the data collection forms afterwards. When data from all reports are extracted into a single data collection form, review authors should identify the ‘main’ data source for each study when sources include conflicting data and these differences cannot be resolved by contacting authors (Mayo-Wilson et al 2018). Flow diagrams such as those modified from the PRISMA statement can be particularly helpful when collating and documenting information from multiple reports (Mayo-Wilson et al 2018).

5.5.5 Reliability and reaching consensus

When more than one author extracts data from the same reports, there is potential for disagreement. After data have been extracted independently by two or more extractors, responses must be compared to assure agreement or to identify discrepancies. An explicit procedure or decision rule should be specified in the protocol for identifying and resolving disagreements. Most often, the source of the disagreement is an error by one of the extractors and is easily resolved. Thus, discussion among the authors is a sensible first step. More rarely, a disagreement may require arbitration by another person. Any disagreement that cannot be resolved should be addressed by contacting the study authors; if this is unsuccessful, the disagreement should be reported in the review.

The presence and resolution of disagreements should be carefully recorded. Maintaining a copy of the data ‘as extracted’ (in addition to the consensus data) allows assessment of reliability of coding. Examples of ways in which this can be achieved include the following:

  • Use one author’s (paper) data collection form and record changes after consensus in a different ink colour.
  • Enter consensus data onto an electronic form.
  • Record original data extracted and consensus data in separate forms (some online tools do this automatically).

Agreement of coded items before reaching consensus can be quantified, for example using kappa statistics (Orwin 1994), although this is not routinely done in Cochrane Reviews. If agreement is assessed, this should be done only for the most important data (e.g. key risk of bias assessments, or availability of key outcomes).

Throughout the review process informal consideration should be given to the reliability of data extraction. For example, if after reaching consensus on the first few studies, the authors note a frequent disagreement for specific data, then coding instructions may need modification. Furthermore, an author’s coding strategy may change over time, as the coding rules are forgotten, indicating a need for retraining and, possibly, some recoding.

5.5.6 Extracting data from clinical study reports

Clinical study reports (CSRs) obtained for a systematic review are likely to be in PDF format. Although CSRs can be thousands of pages in length and very time-consuming to review, they typically follow the content and format required by the International Conference on Harmonisation (ICH 1995). Information in CSRs is usually presented in a structured and logical way. For example, numerical data pertaining to important demographic, efficacy, and safety variables are placed within the main text in tables and figures. Because of the clarity and completeness of information provided in CSRs, data extraction from CSRs may be clearer and conducted more confidently than from journal articles or other short reports.

To extract data from CSRs efficiently, review authors should familiarize themselves with the structure of the CSRs. In practice, review authors may want to browse or create ‘bookmarks’ within a PDF document that record section headers and subheaders and search key words related to the data extraction (e.g. randomization). In addition, it may be useful to utilize optical character recognition software to convert tables of data in the PDF to an analysable format when additional analyses are required, saving time and minimizing transcription errors.

CSRs may contain many outcomes and present many results for a single outcome (due to different analyses) (Mayo-Wilson et al 2017b). We recommend review authors extract results only for outcomes of interest to the review (Section 5.3.6 ). With regard to different methods of analysis, review authors should have a plan and pre-specify preferred metrics in their protocol for extracting results pertaining to different populations (e.g. ‘all randomized’, ‘all participants taking at least one dose of medication’), methods for handling missing data (e.g. ‘complete case analysis’, ‘multiple imputation’), and adjustment (e.g. unadjusted, adjusted for baseline covariates). It may be important to record the range of analysis options available, even if not all are extracted in detail. In some cases it may be preferable to use metrics that are comparable across multiple included studies, which may not be clear until data collection for all studies is complete.

CSRs are particularly useful for identifying outcomes assessed but not presented to the public. For efficacy outcomes and systematically collected adverse events, review authors can compare what is described in the CSRs with what is reported in published reports to assess the risk of bias due to missing outcome data ( Chapter 8, Section 8.5 ) and in selection of reported result ( Chapter 8, Section 8.7 ). Note that non-systematically collected adverse events are not amenable to such comparisons because these adverse events may not be known ahead of time and thus not pre-specified in the protocol.

5.5.7 Extracting data from regulatory reviews

Data most relevant to systematic reviews can be found in the medical and statistical review sections of a regulatory review. Both of these are substantially longer than journal articles (Turner 2013). A list of all trials on a drug usually can be found in the medical review. Because trials are referenced by a combination of numbers and letters, it may be difficult for the review authors to link the trial with other reports of the same trial (Section 5.2.1 ).

Many of the documents downloaded from the US Food and Drug Administration’s website for older drugs are scanned copies and are not searchable because of redaction of confidential information (Turner 2013). Optical character recognition software can convert most of the text. Reviews for newer drugs have been redacted electronically; documents remain searchable as a result.

Compared to CSRs, regulatory reviews contain less information about trial design, execution, and results. They provide limited information for assessing the risk of bias. In terms of extracting outcomes and results, review authors should follow the guidance provided for CSRs (Section 5.5.6 ).

5.5.8 Extracting data from figures with software

Sometimes numerical data needed for systematic reviews are only presented in figures. Review authors may request the data from the study investigators, or alternatively, extract the data from the figures either manually (e.g. with a ruler) or by using software. Numerous tools are available, many of which are free. Those available at the time of writing include tools called Plot Digitizer, WebPlotDigitizer, Engauge, Dexter, ycasd, GetData Graph Digitizer. The software works by taking an image of a figure and then digitizing the data points off the figure using the axes and scales set by the users. The numbers exported can be used for systematic reviews, although additional calculations may be needed to obtain the summary statistics, such as calculation of means and standard deviations from individual-level data points (or conversion of time-to-event data presented on Kaplan-Meier plots to hazard ratios; see Chapter 6, Section 6.8.2 ).

It has been demonstrated that software is more convenient and accurate than visual estimation or use of a ruler (Gross et al 2014, Jelicic Kadic et al 2016). Review authors should consider using software for extracting numerical data from figures when the data are not available elsewhere.

5.5.9 Automating data extraction in systematic reviews

Because data extraction is time-consuming and error-prone, automating or semi-automating this step may make the extraction process more efficient and accurate. The state of science relevant to automating data extraction is summarized here (Jonnalagadda et al 2015).

  • At least 26 studies have tested various natural language processing and machine learning approaches for facilitating data extraction for systematic reviews.

· Each tool focuses on only a limited number of data elements (ranges from one to seven). Most of the existing tools focus on the PICO information (e.g. number of participants, their age, sex, country, recruiting centres, intervention groups, outcomes, and time points). A few are able to extract study design and results (e.g. objectives, study duration, participant flow), and two extract risk of bias information (Marshall et al 2016, Millard et al 2016). To date, well over half of the data elements needed for systematic reviews have not been explored for automated extraction.

  • Most tools highlight the sentence(s) that may contain the data elements as opposed to directly recording these data elements into a data collection form or a data system.
  • There is no gold standard or common dataset to evaluate the performance of these tools, limiting our ability to interpret the significance of the reported accuracy measures.

At the time of writing, we cannot recommend a specific tool for automating data extraction for routine systematic review production. There is a need for review authors to work with experts in informatics to refine these tools and evaluate them rigorously. Such investigations should address how the tool will fit into existing workflows. For example, the automated or semi-automated data extraction approaches may first act as checks for manual data extraction before they can replace it.

5.5.10 Suspicions of scientific misconduct

Systematic review authors can uncover suspected misconduct in the published literature. Misconduct includes fabrication or falsification of data or results, plagiarism, and research that does not adhere to ethical norms. Review authors need to be aware of scientific misconduct because the inclusion of fraudulent material could undermine the reliability of a review’s findings. Plagiarism of results data in the form of duplicated publication (either by the same or by different authors) may, if undetected, lead to study participants being double counted in a synthesis.

It is preferable to identify potential problems before, rather than after, publication of the systematic review, so that readers are not misled. However, empirical evidence indicates that the extent to which systematic review authors explore misconduct varies widely (Elia et al 2016). Text-matching software and systems such as CrossCheck may be helpful for detecting plagiarism, but they can detect only matching text, so data tables or figures need to be inspected by hand or using other systems (e.g. to detect image manipulation). Lists of data such as in a meta-analysis can be a useful means of detecting duplicated studies. Furthermore, examination of baseline data can lead to suspicions of misconduct for an individual randomized trial (Carlisle et al 2015). For example, Al-Marzouki and colleagues concluded that a trial report was fabricated or falsified on the basis of highly unlikely baseline differences between two randomized groups (Al-Marzouki et al 2005).

Cochrane Review authors are advised to consult with Cochrane editors if cases of suspected misconduct are identified. Searching for comments, letters or retractions may uncover additional information. Sensitivity analyses can be used to determine whether the studies arousing suspicion are influential in the conclusions of the review. Guidance for editors for addressing suspected misconduct will be available from Cochrane’s Editorial Publishing and Policy Resource (see community.cochrane.org ). Further information is available from the Committee on Publication Ethics (COPE; publicationethics.org ), including a series of flowcharts on how to proceed if various types of misconduct are suspected. Cases should be followed up, typically including an approach to the editors of the journals in which suspect reports were published. It may be useful to write first to the primary investigators to request clarification of apparent inconsistencies or unusual observations.

Because investigations may take time, and institutions may not always be responsive (Wager 2011), articles suspected of being fraudulent should be classified as ‘awaiting assessment’. If a misconduct investigation indicates that the publication is unreliable, or if a publication is retracted, it should not be included in the systematic review, and the reason should be noted in the ‘excluded studies’ section.

5.5.11 Key points in planning and reporting data extraction

In summary, the methods section of both the protocol and the review should detail:

  • the data categories that are to be extracted;
  • how extracted data from each report will be verified (e.g. extraction by two review authors, independently);
  • whether data extraction is undertaken by content area experts, methodologists, or both;
  • pilot testing, training and existence of coding instructions for the data collection form;
  • how data are extracted from multiple reports from the same study; and
  • how disagreements are handled when more than one author extracts data from each report.

5.6 Extracting study results and converting to the desired format

In most cases, it is desirable to collect summary data separately for each intervention group of interest and to enter these into software in which effect estimates can be calculated, such as RevMan. Sometimes the required data may be obtained only indirectly, and the relevant results may not be obvious. Chapter 6 provides many useful tips and techniques to deal with common situations. When summary data cannot be obtained from each intervention group, or where it is important to use results of adjusted analyses (for example to account for correlations in crossover or cluster-randomized trials) effect estimates may be available directly.

5.7 Managing and sharing data

When data have been collected for each individual study, it is helpful to organize them into a comprehensive electronic format, such as a database or spreadsheet, before entering data into a meta-analysis or other synthesis. When data are collated electronically, all or a subset of them can easily be exported for cleaning, consistency checks and analysis.

Tabulation of collected information about studies can facilitate classification of studies into appropriate comparisons and subgroups. It also allows identification of comparable outcome measures and statistics across studies. It will often be necessary to perform calculations to obtain the required statistics for presentation or synthesis. It is important through this process to retain clear information on the provenance of the data, with a clear distinction between data from a source document and data obtained through calculations. Statistical conversions, for example from standard errors to standard deviations, ideally should be undertaken with a computer rather than using a hand calculator to maintain a permanent record of the original and calculated numbers as well as the actual calculations used.

Ideally, data only need to be extracted once and should be stored in a secure and stable location for future updates of the review, regardless of whether the original review authors or a different group of authors update the review (Ip et al 2012). Standardizing and sharing data collection tools as well as data management systems among review authors working in similar topic areas can streamline systematic review production. Review authors have the opportunity to work with trialists, journal editors, funders, regulators, and other stakeholders to make study data (e.g. CSRs, IPD, and any other form of study data) publicly available, increasing the transparency of research. When legal and ethical to do so, we encourage review authors to share the data used in their systematic reviews to reduce waste and to allow verification and reanalysis because data will not have to be extracted again for future use (Mayo-Wilson et al 2018).

5.8 Chapter information

Editors: Tianjing Li, Julian PT Higgins, Jonathan J Deeks

Acknowledgements: This chapter builds on earlier versions of the Handbook . For details of previous authors and editors of the Handbook , see Preface. Andrew Herxheimer, Nicki Jackson, Yoon Loke, Deirdre Price and Helen Thomas contributed text. Stephanie Taylor and Sonja Hood contributed suggestions for designing data collection forms. We are grateful to Judith Anzures, Mike Clarke, Miranda Cumpston and Peter Gøtzsche for helpful comments.

Funding: JPTH is a member of the National Institute for Health Research (NIHR) Biomedical Research Centre at University Hospitals Bristol NHS Foundation Trust and the University of Bristol. JJD received support from the NIHR Birmingham Biomedical Research Centre at the University Hospitals Birmingham NHS Foundation Trust and the University of Birmingham. JPTH received funding from National Institute for Health Research Senior Investigator award NF-SI-0617-10145. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health.

5.9 References

Al-Marzouki S, Evans S, Marshall T, Roberts I. Are these data real? Statistical methods for the detection of data fabrication in clinical trials. BMJ 2005; 331 : 267-270.

Allen EN, Mushi AK, Massawe IS, Vestergaard LS, Lemnge M, Staedke SG, Mehta U, Barnes KI, Chandler CI. How experiences become data: the process of eliciting adverse event, medical history and concomitant medication reports in antimalarial and antiretroviral interaction trials. BMC Medical Research Methodology 2013; 13 : 140.

Baudard M, Yavchitz A, Ravaud P, Perrodeau E, Boutron I. Impact of searching clinical trial registries in systematic reviews of pharmaceutical treatments: methodological systematic review and reanalysis of meta-analyses. BMJ 2017; 356 : j448.

Bent S, Padula A, Avins AL. Better ways to question patients about adverse medical events: a randomized, controlled trial. Annals of Internal Medicine 2006; 144 : 257-261.

Berlin JA. Does blinding of readers affect the results of meta-analyses? University of Pennsylvania Meta-analysis Blinding Study Group. Lancet 1997; 350 : 185-186.

Buscemi N, Hartling L, Vandermeer B, Tjosvold L, Klassen TP. Single data extraction generated more errors than double data extraction in systematic reviews. Journal of Clinical Epidemiology 2006; 59 : 697-703.

Carlisle JB, Dexter F, Pandit JJ, Shafer SL, Yentis SM. Calculating the probability of random sampling for continuous variables in submitted or published randomised controlled trials. Anaesthesia 2015; 70 : 848-858.

Carroll C, Patterson M, Wood S, Booth A, Rick J, Balain S. A conceptual framework for implementation fidelity. Implementation Science 2007; 2 : 40.

Carvajal A, Ortega PG, Sainz M, Velasco V, Salado I, Arias LHM, Eiros JM, Rubio AP, Castrodeza J. Adverse events associated with pandemic influenza vaccines: Comparison of the results of a follow-up study with those coming from spontaneous reporting. Vaccine 2011; 29 : 519-522.

Chamberlain C, O'Mara-Eves A, Porter J, Coleman T, Perlen SM, Thomas J, McKenzie JE. Psychosocial interventions for supporting women to stop smoking in pregnancy. Cochrane Database of Systematic Reviews 2017; 2 : CD001055.

Damschroder LJ, Aron DC, Keith RE, Kirsh SR, Alexander JA, Lowery JC. Fostering implementation of health services research findings into practice: a consolidated framework for advancing implementation science. Implementation Science 2009; 4 : 50.

Davis AL, Miller JD. The European Medicines Agency and publication of clinical study reports: a challenge for the US FDA. JAMA 2017; 317 : 905-906.

Denniston AK, Holland GN, Kidess A, Nussenblatt RB, Okada AA, Rosenbaum JT, Dick AD. Heterogeneity of primary outcome measures used in clinical trials of treatments for intermediate, posterior, and panuveitis. Orphanet Journal of Rare Diseases 2015; 10 : 97.

Derry S, Loke YK. Risk of gastrointestinal haemorrhage with long term use of aspirin: meta-analysis. BMJ 2000; 321 : 1183-1187.

Doshi P, Dickersin K, Healy D, Vedula SS, Jefferson T. Restoring invisible and abandoned trials: a call for people to publish the findings. BMJ 2013; 346 : f2865.

Dusenbury L, Brannigan R, Falco M, Hansen WB. A review of research on fidelity of implementation: implications for drug abuse prevention in school settings. Health Education Research 2003; 18 : 237-256.

Dwan K, Altman DG, Clarke M, Gamble C, Higgins JPT, Sterne JAC, Williamson PR, Kirkham JJ. Evidence for the selective reporting of analyses and discrepancies in clinical trials: a systematic review of cohort studies of clinical trials. PLoS Medicine 2014; 11 : e1001666.

Elia N, von Elm E, Chatagner A, Popping DM, Tramèr MR. How do authors of systematic reviews deal with research malpractice and misconduct in original studies? A cross-sectional analysis of systematic reviews and survey of their authors. BMJ Open 2016; 6 : e010442.

Gøtzsche PC. Multiple publication of reports of drug trials. European Journal of Clinical Pharmacology 1989; 36 : 429-432.

Gøtzsche PC, Hróbjartsson A, Maric K, Tendal B. Data extraction errors in meta-analyses that use standardized mean differences. JAMA 2007; 298 : 430-437.

Gross A, Schirm S, Scholz M. Ycasd - a tool for capturing and scaling data from graphical representations. BMC Bioinformatics 2014; 15 : 219.

Hoffmann TC, Glasziou PP, Boutron I, Milne R, Perera R, Moher D, Altman DG, Barbour V, Macdonald H, Johnston M, Lamb SE, Dixon-Woods M, McCulloch P, Wyatt JC, Chan AW, Michie S. Better reporting of interventions: template for intervention description and replication (TIDieR) checklist and guide. BMJ 2014; 348 : g1687.

ICH. ICH Harmonised tripartite guideline: Struture and content of clinical study reports E31995. ICH1995. www.ich.org/fileadmin/Public_Web_Site/ICH_Products/Guidelines/Efficacy/E3/E3_Guideline.pdf .

Ioannidis JPA, Mulrow CD, Goodman SN. Adverse events: The more you search, the more you find. Annals of Internal Medicine 2006; 144 : 298-300.

Ip S, Hadar N, Keefe S, Parkin C, Iovin R, Balk EM, Lau J. A web-based archive of systematic review data. Systematic Reviews 2012; 1 : 15.

Ismail R, Azuara-Blanco A, Ramsay CR. Variation of clinical outcomes used in glaucoma randomised controlled trials: a systematic review. British Journal of Ophthalmology 2014; 98 : 464-468.

Jadad AR, Moore RA, Carroll D, Jenkinson C, Reynolds DJM, Gavaghan DJ, McQuay H. Assessing the quality of reports of randomized clinical trials: Is blinding necessary? Controlled Clinical Trials 1996; 17 : 1-12.

Jelicic Kadic A, Vucic K, Dosenovic S, Sapunar D, Puljak L. Extracting data from figures with software was faster, with higher interrater reliability than manual extraction. Journal of Clinical Epidemiology 2016; 74 : 119-123.

Jones AP, Remmington T, Williamson PR, Ashby D, Smyth RL. High prevalence but low impact of data extraction and reporting errors were found in Cochrane systematic reviews. Journal of Clinical Epidemiology 2005; 58 : 741-742.

Jones CW, Keil LG, Holland WC, Caughey MC, Platts-Mills TF. Comparison of registered and published outcomes in randomized controlled trials: a systematic review. BMC Medicine 2015; 13 : 282.

Jonnalagadda SR, Goyal P, Huffman MD. Automating data extraction in systematic reviews: a systematic review. Systematic Reviews 2015; 4 : 78.

Lewin S, Hendry M, Chandler J, Oxman AD, Michie S, Shepperd S, Reeves BC, Tugwell P, Hannes K, Rehfuess EA, Welch V, McKenzie JE, Burford B, Petkovic J, Anderson LM, Harris J, Noyes J. Assessing the complexity of interventions within systematic reviews: development, content and use of a new tool (iCAT_SR). BMC Medical Research Methodology 2017; 17 : 76.

Li G, Abbade LPF, Nwosu I, Jin Y, Leenus A, Maaz M, Wang M, Bhatt M, Zielinski L, Sanger N, Bantoto B, Luo C, Shams I, Shahid H, Chang Y, Sun G, Mbuagbaw L, Samaan Z, Levine MAH, Adachi JD, Thabane L. A scoping review of comparisons between abstracts and full reports in primary biomedical research. BMC Medical Research Methodology 2017; 17 : 181.

Li TJ, Vedula SS, Hadar N, Parkin C, Lau J, Dickersin K. Innovations in data collection, management, and archiving for systematic reviews. Annals of Internal Medicine 2015; 162 : 287-294.

Liberati A, Altman DG, Tetzlaff J, Mulrow C, Gøtzsche PC, Ioannidis JPA, Clarke M, Devereaux PJ, Kleijnen J, Moher D. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: explanation and elaboration. PLoS Medicine 2009; 6 : e1000100.

Liu ZM, Saldanha IJ, Margolis D, Dumville JC, Cullum NA. Outcomes in Cochrane systematic reviews related to wound care: an investigation into prespecification. Wound Repair and Regeneration 2017; 25 : 292-308.

Marshall IJ, Kuiper J, Wallace BC. RobotReviewer: evaluation of a system for automatically assessing bias in clinical trials. Journal of the American Medical Informatics Association 2016; 23 : 193-201.

Mayo-Wilson E, Doshi P, Dickersin K. Are manufacturers sharing data as promised? BMJ 2015; 351 : h4169.

Mayo-Wilson E, Li TJ, Fusco N, Bertizzolo L, Canner JK, Cowley T, Doshi P, Ehmsen J, Gresham G, Guo N, Haythomthwaite JA, Heyward J, Hong H, Pham D, Payne JL, Rosman L, Stuart EA, Suarez-Cuervo C, Tolbert E, Twose C, Vedula S, Dickersin K. Cherry-picking by trialists and meta-analysts can drive conclusions about intervention efficacy. Journal of Clinical Epidemiology 2017a; 91 : 95-110.

Mayo-Wilson E, Fusco N, Li TJ, Hong H, Canner JK, Dickersin K, MUDS Investigators. Multiple outcomes and analyses in clinical trials create challenges for interpretation and research synthesis. Journal of Clinical Epidemiology 2017b; 86 : 39-50.

Mayo-Wilson E, Li T, Fusco N, Dickersin K. Practical guidance for using multiple data sources in systematic reviews and meta-analyses (with examples from the MUDS study). Research Synthesis Methods 2018; 9 : 2-12.

Meade MO, Richardson WS. Selecting and appraising studies for a systematic review. Annals of Internal Medicine 1997; 127 : 531-537.

Meinert CL. Clinical trials dictionary: Terminology and usage recommendations . Hoboken (NJ): Wiley; 2012.

Millard LAC, Flach PA, Higgins JPT. Machine learning to assist risk-of-bias assessments in systematic reviews. International Journal of Epidemiology 2016; 45 : 266-277.

Moher D, Schulz KF, Altman DG. The CONSORT Statement: revised recommendations for improving the quality of reports of parallel-group randomised trials. Lancet 2001; 357 : 1191-1194.

Moher D, Hopewell S, Schulz KF, Montori V, Gøtzsche PC, Devereaux PJ, Elbourne D, Egger M, Altman DG. CONSORT 2010 explanation and elaboration: updated guidelines for reporting parallel group randomised trials. BMJ 2010; 340 : c869.

Moore GF, Audrey S, Barker M, Bond L, Bonell C, Hardeman W, Moore L, O'Cathain A, Tinati T, Wight D, Baird J. Process evaluation of complex interventions: Medical Research Council guidance. BMJ 2015; 350 : h1258.

Orwin RG. Evaluating coding decisions. In: Cooper H, Hedges LV, editors. The Handbook of Research Synthesis . New York (NY): Russell Sage Foundation; 1994. p. 139-162.

Page MJ, McKenzie JE, Kirkham J, Dwan K, Kramer S, Green S, Forbes A. Bias due to selective inclusion and reporting of outcomes and analyses in systematic reviews of randomised trials of healthcare interventions. Cochrane Database of Systematic Reviews 2014; 10 : MR000035.

Ross JS, Mulvey GK, Hines EM, Nissen SE, Krumholz HM. Trial publication after registration in ClinicalTrials.Gov: a cross-sectional analysis. PLoS Medicine 2009; 6 .

Safer DJ. Design and reporting modifications in industry-sponsored comparative psychopharmacology trials. Journal of Nervous and Mental Disease 2002; 190 : 583-592.

Saldanha IJ, Dickersin K, Wang X, Li TJ. Outcomes in Cochrane systematic reviews addressing four common eye conditions: an evaluation of completeness and comparability. PloS One 2014; 9 : e109400.

Saldanha IJ, Li T, Yang C, Ugarte-Gil C, Rutherford GW, Dickersin K. Social network analysis identified central outcomes for core outcome sets using systematic reviews of HIV/AIDS. Journal of Clinical Epidemiology 2016; 70 : 164-175.

Saldanha IJ, Lindsley K, Do DV, Chuck RS, Meyerle C, Jones LS, Coleman AL, Jampel HD, Dickersin K, Virgili G. Comparison of clinical trial and systematic review outcomes for the 4 most prevalent eye diseases. JAMA Ophthalmology 2017a; 135 : 933-940.

Saldanha IJ, Li TJ, Yang C, Owczarzak J, Williamson PR, Dickersin K. Clinical trials and systematic reviews addressing similar interventions for the same condition do not consider similar outcomes to be important: a case study in HIV/AIDS. Journal of Clinical Epidemiology 2017b; 84 : 85-94.

Stewart LA, Clarke M, Rovers M, Riley RD, Simmonds M, Stewart G, Tierney JF, PRISMA-IPD Development Group. Preferred reporting items for a systematic review and meta-analysis of individual participant data: the PRISMA-IPD statement. JAMA 2015; 313 : 1657-1665.

Stock WA. Systematic coding for research synthesis. In: Cooper H, Hedges LV, editors. The Handbook of Research Synthesis . New York (NY): Russell Sage Foundation; 1994. p. 125-138.

Tramèr MR, Reynolds DJ, Moore RA, McQuay HJ. Impact of covert duplicate publication on meta-analysis: a case study. BMJ 1997; 315 : 635-640.

Turner EH. How to access and process FDA drug approval packages for use in research. BMJ 2013; 347 .

von Elm E, Poglia G, Walder B, Tramèr MR. Different patterns of duplicate publication: an analysis of articles used in systematic reviews. JAMA 2004; 291 : 974-980.

Wager E. Coping with scientific misconduct. BMJ 2011; 343 : d6586.

Wieland LS, Rutkow L, Vedula SS, Kaufmann CN, Rosman LM, Twose C, Mahendraratnam N, Dickersin K. Who has used internal company documents for biomedical and public health research and where did they find them? PloS One 2014; 9 .

Zanchetti A, Hansson L. Risk of major gastrointestinal bleeding with aspirin (Authors' reply). Lancet 1999; 353 : 149-150.

Zarin DA, Tse T, Williams RJ, Califf RM, Ide NC. The ClinicalTrials.gov results database: update and key issues. New England Journal of Medicine 2011; 364 : 852-860.

Zwarenstein M, Treweek S, Gagnier JJ, Altman DG, Tunis S, Haynes B, Oxman AD, Moher D. Improving the reporting of pragmatic trials: an extension of the CONSORT statement. BMJ 2008; 337 : a2390.

For permission to re-use material from the Handbook (either academic or commercial), please see here for full details.

A Guide to Evidence Synthesis: 10. Data Extraction

  • Meet Our Team
  • Our Published Reviews and Protocols
  • What is Evidence Synthesis?
  • Types of Evidence Synthesis
  • Evidence Synthesis Across Disciplines
  • Finding and Appraising Existing Systematic Reviews
  • 0. Develop a Protocol
  • 1. Draft your Research Question
  • 2. Select Databases
  • 3. Select Grey Literature Sources
  • 4. Write a Search Strategy
  • 5. Register a Protocol
  • 6. Translate Search Strategies
  • 7. Citation Management
  • 8. Article Screening
  • 9. Risk of Bias Assessment
  • 10. Data Extraction
  • 11. Synthesize, Map, or Describe the Results
  • Evidence Synthesis Institute for Librarians
  • Open Access Evidence Synthesis Resources

Data Extraction

Whether you plan to perform a meta-analysis or not, you will need to establish a regimented approach to extracting data. Researchers often use a form or table to capture the data they will then summarize or analyze. The amount and types of data you collect, as well as the number of collaborators who will be extracting it, will dictate which extraction tools are best for your project. Programs like Excel or Google Spreadsheets may be the best option for smaller or more straightforward projects, while systematic review software platforms can provide more robust support for larger or more complicated data.

It is recommended that you pilot your data extraction tool (especially if you will code your data) to determine if fields should be added or clarified, or if the review team needs guidance in collecting and coding data.

Data Extraction Tools

Excel is the most basic tool for the management of the screening and data extraction stages of the systematic review process. Customized workbooks and spreadsheets can be designed for the review process.

Covidence is a software platform for managing independent title/abstract screening, full text screening, data extraction and risk of bias assessment in a systematic review project. Read more about how Covidence can help you customize extraction tables and export your extracted data.  

RevMan  is free software used to manage Cochrane reviews. For an overview on RevMan, including how it may be used to extract and analyze data, watch the RevMan Web Quickstart Guide or check out the RevMan Knowledge Base .

SRDR  (Systematic Review Data Repository) is a Web-based tool for the extraction and management of data for systematic review or meta-analysis. It is also an open and searchable archive of systematic reviews and their data. Access the help page  for more information.

DistillerSR

DistillerSR is a systematic review management software program, similar to Covidence. It guides reviewers in creating project-specific forms, extracting, and analyzing data. 

JBI Sumari (the Joanna Briggs Institute System for the United Management, Assessment and Review of Information) is a systematic review software platform geared toward fields such as health, social sciences, and humanities. Among the other steps of a review project, it facilitates data extraction and data synthesis. View their short introductions to data extraction and analysis for more information.

The Systematic Review Toolbox (under construction)

The SR Toolbox  is a community-driven, searchable, web-based catalogue of tools that support the systematic review process across multiple domains. Use the advanced search option to restrict to tools specific to data extraction. 

Additional Information

These resources offer additional information and examples of data extraction forms:​

  • Brown, S. A., Upchurch, S. L., & Acton, G. J. (2003). A framework for developing a coding scheme for meta-analysis.  Western Journal of Nursing Research ,  25 (2), 205–222. https://doi.org/10.1177/0193945902250038
  • Elamin, M. B., Flynn, D. N., Bassler, D., Briel, M., Alonso-Coello, P., Karanicolas, P. J., … Montori, V. M. (2009). Choice of data extraction tools for systematic reviews depends on resources and review complexity.  Journal of Clinical Epidemiology ,  62 (5), 506–510. https://doi.org/10.1016/j.jclinepi.2008.10.016
  • Li T, Higgins JPT, Deeks JJ (editors). Chapter 5: Collecting data . In: Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, Welch VA (editors). Cochrane Handbook for Systematic Reviews of Interventions version 6.4 (updated August 2023). Cochrane, 2023. Available from www.training.cochrane.org/handbook.
  • Research guide from the George Washington University Himmelfarb Health Sciences Library: https://guides.himmelfarb.gwu.edu/c.php?g=27797&p=170447
  • << Previous: 9. Risk of Bias Assessment
  • Next: 11. Synthesize, Map, or Describe the Results >>
  • Last Updated: Apr 24, 2024 2:56 PM
  • URL: https://guides.library.cornell.edu/evidence-synthesis

Banner

  • JABSOM Library

Systematic Review Toolbox

Data extraction.

  • Guidelines & Rubrics
  • Databases & Indexes
  • Reference Management
  • Quality Assessment
  • Data Analysis
  • Manuscript Development
  • Software Comparison
  • Systematic Searching This link opens in a new window
  • Authorship Determination This link opens in a new window
  • Critical Appraisal Tools This link opens in a new window

Requesting Research Consultation

The Health Sciences Library provides consultation services for University of Hawaiʻi-affiliated students, staff, and faculty. The John A. Burns School of Medicine Health Sciences Library does not have staffing to conduct or assist researchers unaffiliated with the University of Hawaiʻi. Please utilize the publicly available guides and support pages that address research databases and tools.

Before Requesting Assistance

Before requesting systematic review assistance from the librarians, please review the relevant guides and the various pages of the Systematic Review Toolbox . Most inquiries received have been answered there previously. Support for research software issues is limited to help with basic installation and setup. Please contact the software developer directly if further assistance is needed.

Data extraction is the process of extracting the relevant pieces of information from the studies you have assessed for eligibility in your review and organizing the information in a way that will help you synthesize the studies and draw conclusions.

Extracting data from reviewed studies should be done in accordance to pre-established guidelines, such as the ones from PRISMA . From each included study, the following data may need to be extracted, depending on the review's purpose: title, author, year, journal, research question and specific aims, conceptual framework, hypothesis, research methods or study type, and concluding points. Special attention should be paid to the methodology, in order to organize studies by study type category in the review results section. If a meta-analysis is also being completed, extract raw and refined data from each result in the study.

Established frameworks for extracting data have been created. Common templates are offered by Cochrane  and supplementary resources have been collected by the George Washington University Libraries . Other forms are built into systematic review manuscript development software (e.g., Covidence, RevMan), although many scholars prefer to simply use Excel to collect data.

  • Data Collection Form A template developed by the Cochrane Collaboration for data extraction of both RCTs and non-RCTs in a systematic review
  • Data Extraction Template A comprehensive template for systematic reviews developed by the Cochrane Haematological Malignancies Group
  • A Framework for Developing a Coding Scheme for Meta-Analysis
  • << Previous: Reference Management
  • Next: Quality Assessment >>
  • Last Updated: Sep 20, 2023 9:14 AM
  • URL: https://hslib.jabsom.hawaii.edu/systematicreview

Health Sciences Library, John A. Burns School of Medicine, University of Hawai‘i at Mānoa, 651 Ilalo Street, MEB 101, Honolulu, HI 96813 - Phone: 808-692-0810, Fax: 808-692-1244

Copyright © 2004-2024. All rights reserved. Library Staff Page - Other UH Libraries

icon

Systematic Reviews and Meta-Analyses: Data Extraction

  • Get Started
  • Exploratory Search
  • Where to Search
  • How to Search
  • Grey Literature
  • What about errata and retractions?
  • Eligibility Screening
  • Critical Appraisal

Data Extraction

  • Synthesis & Discussion
  • Assess Certainty
  • Share & Archive

Systematic methods for extracting data from all relevant studies is an important step leading synthesis . This step often occurs simultaneously with the Critical Appraisal  phase.

  • Presenting Results

Data extraction, sometimes referred to as data collection  or data abstraction , refers to the process of extracting and organizing the information from each included (relevant) study.

The synthesis approach(es) (e.g.,  meta-analysis, framework synthesis ) that you intend to use will inform data extraction . 

Process Details

Just like all other stages of a systematic review,  2 data extractors  should extract data from in  each included reference . The exact procedure may vary according to your resource capacity. For example, you may have a team of 10 extractors in 5 pairs of 2 extracting data from chunks of the included material, if managing a large corpus.

Note:  experience in the field does not necessarily increase the accuracy of this process . See Horton et al., (2010) 'Systematic review data extraction: cross-sectional study showed that experience did not increase accuracy' , and Jones et al., (2005) 'High prevalence but low impact of data extraction and reporting errors were found in Cochrane systematic reviews' for more on this topic' .

Note: Effect Size Measurements

Defining ahead of time which measurement of effect(s) will be relevant and useful  is important, especially if you hope to pursue a meta-analysis . Though it is unlikely that all  of your studies will produce the same measurement of effect (e.g., odds ratio, relative risk ratio), many of these measurements can be transformed or converted to the measurement you need for your meta-analysis.

If converting effect sizes , be sure to provide enough detail about this process in your manuscript such that another team could replicate. It is best to collect the original outputs from articles  before  converting effect sizes. There are tools available for converting effect sizes such as the Campbell Collaboration's tool for calculating or converting effect sizes and the effect size converter from MIT .

Data Extraction Templates

Data extraction is often performed using a  single form to extract data from all included (relevant) studies in a uniform manner . Because the data extraction stage is driven by the scope and goals of a systematic review, there is not a gold standard or one-size-fits all approach to developing a data extraction form.

However, there are templates and guidance available to help in the creation of your forms .

Because it is standard to include the data extraction form in the supplemental material of a systematic review and/or meta-analysis, you may also consider the forms developed and/or used during similar, already published and/or in-progress reviews  

As is the case with the critical appraisal , the type of data you are able to extract will also depend on the study design . Therefore, it is likely that the exact data you extract from each individual article will vary somewhat. 

Data Extraction Form Templates

Cochrane  |  One form for randomized controlled trials (RCTs) only; one form for RCTs and non-RCTs

Joanna Briggs Institute (JBI) |  Several forms located in each relevant chapter:

  • Qualitative data (appendix 2.3)
  • Text and opinion data (appendix 4.3)
  • Prevalence studies (prevalence data; appendix 5.2)
  • Mixed method (convergent integrated approach; appendix 8.1)
  • Diagnostic test accuracy (appendix 9.3)
  • Measurement properties (appendix 12.1) with table of results template (appendix 12.2)

Present Data Extracted

Data extracted from each reference is presented as a summary table or summary of findings table  and described in the narrative .

Summary Tables

A summary table,  like the examples seen below, provides readers with quick glance summary of study details that are important to the systematic review and/or meta-analysis. Similarly to the other stages of a review, what you collect and report will depend on the scope   of the review and the type of synthesis you plan to conduct.

It may be appropriate to include more than one summary table . For example, one table may present basic information about the study such as author names, year of publication, year(s) the study was conducted, study design, funding agency, etc.; Another table may present details more specific to the qualitative synthesis; A third table may present information specifically relevant to the meta-analysis, with effect sizes, confidence intervals, etc. Additionally, it is best practice to have one summary table  for  each outcome.

Methodological Guidance

  • Health Sciences
  • Animal, Food Sciences
  • Social Sciences
  • Environmental Sciences

Cochrane Handbook  -  Part 2: Core Methods

Chapter 5 : Collecting data

  • 5.2 Sources of data
  • 5.3 What data to collect
  • 5.4 Data collection tools
  • 5.5 Extracting data from reports
  • 5.6 Extracting study results and converting to the desired format
  • 5.7 Managing and sharing data

Chapter 6 : Choosing effect measures and computing estimates of effect

  • 6.1 Types of data and effect measures
  • 6.2 Study designs and identifying the unit of analysis
  • 6.3 Extracting estimates of effect directly
  • 6.4 Dichotomous outcome data
  • 6.5 Continuous outcome data
  • 6.6 Ordinal outcome data and measurement scales
  • 6.7 Count and rate data
  • 6.8 Time-to-event data 
  • 6.9 Conditional outcomes only available for subsets of participants 

SYREAF Protocols 

Step 4: data extraction.

Conducting systematic reviews of intervention questions II: Relevance screening, data extraction , assessing risk of bias, presenting the results and interpreting the findings.  Sargeant JM, O’Connor AM. Zoonoses Public Health. 2014 Jun;61 Suppl 1:39-51. doi: 10.1111/zph.12124. PMID: 24905995

Study designs and systematic reviews of interventions: building evidence across study designs.  Sargeant JM, Kelton DF, O’Connor AM. Zoonoses Public Health. 2014 Jun;61 Suppl 1:10-7. doi: 10.1111/zph.12127. PMID: 24905992

Randomized controlled trials and challenge trials: Design and criterion for validity.  Sargeant JM, Kelton DF, O’Connor AM,Zoon. Public Health. 2014. 61 (S1); 18 – 27. PMID: 24905993

Campbell -  MECCIR

C43. Using data collection forms  ( protocol & review / final manuscript )

C44. Describing studies ( review / final manuscript )

C45. Extracting study characteristics and outcome data in duplicate  ( protocol & review / final manuscript )

C46. Making maximal use of data  ( protocol & review / final manuscript )

C47. Examining errata  ( review / final manuscript )

C49. Choosing intervention groups in multi-arm studies  ( protocol & review / final manuscript )

C50. Checking accuracy of numeric data in the review ( review / final manuscript )

CEE  -  Guidelines and Standards for Evidence synthesis in Environmental Management

Section 6. data coding and data extraction.

CEE Standards for conduct and reporting

6.3   Assessing agreement between data coders/extractors

6.4   Data coding

6.5 Data extraction

Reporting in Protocol and Final Manuscript

  • Final Manuscript

In the Protocol |  PRISMA-P

Data collection process (item 11c).

...forms should be developed a priori and included in the published or otherwise available review protocol as an appendix or as online supplementary materials

Include strategies for reducing error:

"...level of reviewer experience has not been shown to affect extraction error rates. As such, additional strategies planned to reduce errors, such as training of reviewers and piloting of extraction forms should be described."

Include how to handle  missing information:

"...in the absence of complete descriptions of treatments, outcomes, effect estimates, or other important information, reviewers may consider asking authors for this information. Whether reviewers plan to contact authors of included studies and how this will be done (such as a maximum of three email attempts) to obtain missing information should be documented in the protocol."

Data Items (Item 12)

List and define all variables for which data will be sought (such as PICO items, funding sources) and any pre-planned data assumptions and simplifications

Include any assumptions by extractors:

"...describe assumptions they intend to make if they encounter missing or unclear information and explain how they plan to deal with such data or lack thereof"

Outcomes and Prioritization (Item 13)

List and define all outcomes for which data will be sought, including prioritisation of main and additional outcomes, with rationale

In the Final Manuscript |  PRISMA

Data collection process (item 9; report in  methods ), essential items.

  • Report how many reviewers collected data from each report, whether multiple reviewers worked independently or not (for example, data collected by one reviewer and checked by another), and any processes used to resolve disagreements between data collectors.
  • Report any processes used to obtain or confirm relevant data from study investigators (such as how they were contacted, what data were sought, and success in obtaining the necessary information).
  • If any automation tools were used to collect data, report how the tool was used (such as machine learning models to extract sentences from articles relevant to the PICO characteristics), how the tool was trained, and what internal or external validation was done to understand the risk of incorrect extractions .
  • If articles required translation into another language to enable data collection, report how these articles were translated (for example, by asking a native speaker or by using software programs).
  • If any software was used to extract data from figures, specify the software used.
  • If any decision rules were used to select data from multiple reports corresponding to a study, and any steps were taken to resolve inconsistencies across reports, report the rules and steps used.

Data Items (Item 10; report in  methods )

  • List and define the outcome domains and time frame of measurement for which data were sought  (Item 10a)
  • Specify whether all results that were compatible with each outcome domain in each study were sought , and, if not, what process was used to select results within eligible domains  (Item 10a)
  • If any changes were made to the inclusion or definition of the outcome domains or to the importance given to them in the review, specify the changes, along with a rationale  (Item 10a)
  • If any changes were made to the processes used to select results within eligible outcome domains, specify the changes, along with a rationale  (Item 10a)
  • List and define all other variables for which data were sought . It may be sufficient to report a brief summary of information collected if the data collection and dictionary forms are made available (for example, as additional files or deposited in a publicly available repository)  (Item 10b)
  • Describe any assumptions made about any missing or unclear information from the studies. For example, in a study that includes “children and adolescents,” for which the investigators did not specify the age range, authors might assume that the oldest participants would be 18 years, based on what was observed in similar studies included in the review, and should report that assumption  (Item 10b)
  • If a tool was used to inform which data items to collect (such as the Tool for Addressing Conflicts of Interest in Trials (TACIT) or a tool for recording intervention details), cite the tool used  (Item 10b)

Additional Items

Consider specifying which outcome domains were considered the most important for interpreting the review’s conclusions (such as “critical” versus “important” outcomes) and provide rationale for the labelling (such as “a recent core outcome set identified the outcomes labelled ‘critical’ as being the most important to patients”)  (Item 10a)

Effect Measures (Item 12; report in  methods )

  • Specify for each outcome or type of outcome (such as binary, continuous) the effect measure(s) (such as risk ratio, mean difference) used in the synthesis or presentation of results.
  • State any thresholds or ranges used to interpret the size of effect (such as minimally important difference; ranges for no/trivial, small, moderate, and large effects) and the rationale for these thresholds.
  • If synthesised results were re-expressed to a different effect measure , report the methods used to re-express results (such as meta-analysing risk ratios and computing an absolute risk reduction based on an assumed comparator risk)

Study Characteristics (Item 17; report in  results )

  • Cite each included study
  • Present the key characteristics of each study in a table or figure (considering a format that will facilitate comparison of characteristics across the studies)

If the review examines the effects of interventions, consider presenting an additional table that summarises the intervention details for each study

Results of Individual Studies (Item 19; report in  results )

  • For all outcomes , irrespective of whether statistical synthesis was undertaken, present for each study summary statistics for each group (where appropriate). For dichotomous outcomes, report the number of participants with and without the events for each group; or the number with the event and the total for each group (such as 12/45). For continuous outcomes, report the mean, standard deviation, and sample size of each group.
  • For all outcomes , irrespective of whether statistical synthesis was undertaken, present for each study an effect estimate and its precision (such as standard error or 95% confidence/credible interval). For example, for time-to-event outcomes, present a hazard ratio and its confidence interval.
  • If study-level data are presented visually or reported in the text (or both), also present a tabular display of the results .
  • If results were obtained from multiple sources (such as journal article, study register entry, clinical study report, correspondence with authors), report the source of the data . This need not be overly burdensome. For example, a statement indicating that, unless otherwise specified, all data came from the primary reference for each included study would suffice. Alternatively, this could be achieved by, for example, presenting the origin of each data point in footnotes, in a column of the data table, or as a hyperlink to relevant text highlighted in reports (such as using SRDR Data Abstraction Assistant139).
  • If applicable, indicate which results were not reported directly and had to be computed or estimated from other information (see item #13b)
  • << Previous: Critical Appraisal
  • Next: Synthesis & Discussion >>
  • Last Updated: Apr 12, 2024 12:41 PM
  • URL: https://guides.lib.vt.edu/SRMA

Systematic Reviews: Data Extraction/Coding/Study characteristics/Results

  • Types of literature review, methods, & resources
  • Protocol and registration
  • Search strategy
  • Medical Literature Databases to search
  • Study selection and appraisal
  • Data Extraction/Coding/Study characteristics/Results
  • Reporting the quality/risk of bias
  • Manage citations using RefWorks This link opens in a new window
  • GW Box file storage for PDF's This link opens in a new window

Data Extraction: PRISMA Item 10

The next step is for the researchers to read the full text of each article identified for inclusion in the review and  extract the pertinent data using a standardized data extraction/coding form.  The data extraction form should be as long or as short as necessary and can be coded for computer analysis if desired.

If you are writing a narrative review to summarise information reported in a small number of studies then you probably don't need to go to the trouble of coding the data variables for computer analysis but instead summarize the information from the data extraction forms for the included studies.

If you are conducting an analytical review with a meta-analysis to compare data outcomes from several clinical trials you may wish to computerize the data collection and analysis processes.   Reviewers can use fillable forms to collect and code data reported in the studies included in the review, the data can then be uploaded to analytical computer software such as Excel or SPSS for statistical analysis.  GW School of Medicine, School of Public Health, and School of Nursing faculty, staff, and students can use the various  statistical analytical software in the Himmelfarb Library , and watch online training videos from LinkedIn Learning  at the Talent@GW website to learn about how to perform statistical analysis with Excel and SPSS.

Software to help you create coded data extraction forms from templates include: Covidence ,  DistillerSR (needs subscription), EPPI Reviewer (subscription, free trial), or AHRQ's  SRDR tool  (free) which is web-based and has a training environment, tutorials, and example templates of systematic review data extraction forms .  If you prefer to design your own coded data extraction form from scratch  Elamin et al (2009)  offer advice on how to decide what electronic tools to use to extract data for analytical reviews. The process of designing a coded data extraction form and codebook are described in  Brown, Upchurch & Acton (2003)  and  Brown et al (2013) .  You should assign a unique identifying number to each variable field so they can be programmed into fillable form fields in whatever software you decide to use for data extraction/collection. You can use AHRQ's Systematic Review Data Repository  SRDR tool , or online survey forms such as Qualtrics, RedCAP , or Survey Monkey, or design and create your own coded fillable forms using Adobe Acrobat Pro or Microsoft Access.   You might like to include on the data extraction form a field for grading the quality of the study, see the Screening for quality page for examples of some of the quality scales you might choose to apply.

Three examples of a data extraction form are below:  

  • Data Extraction Form Example (suitable for small-scale literature review of a few dozen studies) This example was used to gather data for a poster reporting a literature review of studies of interventions to increase Emergency Department throughput. The poster can be downloaded from http://hsrc.himmelfarb.gwu.edu/libfacpres/62/
  • Data Extraction Form for the Cochrane Review Group (uncoded & used to extract fine-detail/many variables) This is one example of a form, illustrating the thoroughness of the Cochrane research methodology. You could devise a simpler one page data extraction form for a more simple literature review.
  • Coded data extraction form (fillable form fields that can be computerized for data analysis) See Table 1 of Brown, Upchurch & Acton (2013)

Study characteristics: PRISMA Item 18

The data extraction forms can be used to produce a summary table of study characteristics that were considered important for inclusion. 

In the final report in the results section the characteristics of the studies that were included in the review should be reported for PRISMA Item 18 as:

  • Summary PICOS (Patient/Population, Intervention, Comparison if any, Outcomes, Study Design Type) and other pertinent characteristics of the reviewed studies should be reported both in the text in the Results section and in the form of a table. Here is an example of a  table that summarizes the characteristics of studies  in a review, note this table could be improved by adding a column for the quality score you assigned to each study, or you could add a column with a value representing the time period in which the study was carried out if this might be useful for the reader to know. The summary table could either be an appendix or in the text itself if the table is small enough e.g. similar to Table 1 of Shah et al (2007) .

A bibliography of the included studies should always be created, particularly if you are intending to publish your review. Read the advice for authors page on the journal website, or ask the journal editor to advise you on what citation format the journal requires you to use. Himmelfarb Library recommends using  RefWorks  to manage your references.

Results: PRISMA Item 20

In the final report the results from individual studies should be reported for PRISMA Item 20 as follows:

For all outcomes considered (benefits or harms) from each included study write in the results section:

  • (a) simple summary data for each intervention group
  • (b) effect estimates and confidence intervals

In a review where you are reporting a binary outcome e.g. intervention vs placebo or control, and you are able to combine/pool results from several experimental studies done using the same methods on like populations in like settings, then in the results section you should report the relative strength of treatment effects from each study in your review and the combined effect outcome from your meta-analysis.  For a meta-analysis of Randomized trials you should represent the meta-analysis visually on a “forest plot”  (see fig. 2).  Here is another example of a meta-analysis  forest plot , and on page 2 a description of how to interpret it.  

If your review included heterogenous study types (ie some combination of experimental trials and observational studies) you won't be able to do a meta-analysis, then instead your analysis could follow the  Synthesis Without Meta-analysis (SWiM) guideline , and consider presenting your results in an alternative visually arresting graphic using a template in Excel or SPSS or from a web-based applications for  infographics .   GW faculty, staff, and students, may watch online training videos from  LinkedIn Learning  at the Talent@GW website  to learn how to work with charts and graphs, and design infographics.

  • << Previous: Study selection and appraisal
  • Next: Reporting the quality/risk of bias >>

Creative Commons License

  • Last Updated: Apr 22, 2024 9:18 AM
  • URL: https://guides.himmelfarb.gwu.edu/systematic_review

GW logo

  • Himmelfarb Intranet
  • Privacy Notice
  • Terms of Use
  • GW is committed to digital accessibility. If you experience a barrier that affects your ability to access content on this page, let us know via the Accessibility Feedback Form .
  • Himmelfarb Health Sciences Library
  • 2300 Eye St., NW, Washington, DC 20037
  • Phone: (202) 994-2850
  • [email protected]
  • https://himmelfarb.gwu.edu
  • Mayo Clinic Libraries
  • Systematic Reviews
  • Data Extraction

Systematic Reviews: Data Extraction

  • Knowledge Synthesis Comparison
  • Knowledge Synthesis Decision Tree
  • Standards & Reporting Results
  • Materials in the Mayo Clinic Libraries
  • Training Resources
  • Review Teams
  • Develop & Refine Your Research Question
  • Develop a Timeline
  • Project Management
  • Communication
  • PRISMA-P Checklist
  • Eligibility Criteria
  • Register your Protocol
  • Other Resources
  • Other Screening Tools
  • Grey Literature Searching
  • Citation Searching
  • Data Extraction Tools
  • Minimize Bias
  • Critical Appraisal by Study Design
  • Synthesis & Meta-Analysis
  • Publishing your Systematic Review

Data Collection

literature review data extraction

"data slide" by bionicteaching is licensed under CC BY-NC 2.0

This stage of the systematic review process involves transcribing information from each study using a structured piloted format designed to consistently and objectively capture the relevant details.  Two reviewers working independently are preferred for accuracy.  Data must be managed appropriately in a transparent way and available for future updates of the systematic review and for data sharing. A sampling of data collection tools are listed here .

Data Extraction Elements :

  • Consider your research question components and objectives
  • Consider study inclusion / exclusion criteria
  • Full citation 
  • Intervention
  • Study Design and methodology
  • Participant characteristics
  • Outcome measures
  • Study quality factors

Consult Cochrane Interactive Learning Module 4: Selecting Studies and Collecting Data for further information.  *Please note you will need to register for a Cochrane account while initially on the Mayo network. You'll receive an email message containing a link to create a password and activate your account.*

References & Recommended Reading

1.      Li T, Higgins JPT, Deeks JJ. Collecting data. In: Higgins J, Thomas J, Chandler J, et al, eds. Cochrane Handbook for Systematic Reviews of Interventions. version 6.2 ed. Cochrane; 2021:chap 5. https://training.cochrane.org/handbook/current/chapter-05

2.      Page MJ, Moher D, Bossuyt PM, et al. PRISMA 2020 explanation and elaboration: updated guidance and exemplars for reporting systematic reviews. BMJ (Clinical research ed) . 2021;372:n160. doi:https://dx.doi.org/10.1136/bmj.n160

         See Item 9 – Data Collection Process and Item 10 – Data Items

3.      Buchter RB, Weise A, Pieper D. Development, testing and use of data extraction forms in systematic reviews: a review of methodological guidance. BMC medical research methodology . 2020;20(1):259. doi:https://dx.doi.org/10.1186/s12874-020-01143-3

4.      Mathes T, Klasen P, Pieper D. Frequency of data extraction errors and methods to increase data extraction quality: a methodological review. BMC medical research methodology . 2017;17(1):152. doi:https://dx.doi.org/10.1186/s12874-017-0431-4

5.      Hartling L. Creating efficiencies in the extraction of data from randomized trials: a prospective evaluation of a machine learning and text mining tool . BMC Med Res Methodol. 2021 Aug 16;21(1):169. doi: 10.1186/s12874-021-01354-2. PMID: 34399684; PMCID: PMC8369614.

  • << Previous: Citation Searching
  • Next: Data Extraction Tools >>
  • Last Updated: Apr 25, 2024 8:44 AM
  • URL: https://libraryguides.mayo.edu/systematicreviewprocess

Log in using your username and password

  • Search More Search for this keyword Advanced search
  • Latest content
  • Current issue
  • Write for Us
  • BMJ Journals More You are viewing from: Google Indexer

You are here

  • Volume 24, Issue 2
  • Five tips for developing useful literature summary tables for writing review articles
  • Article Text
  • Article info
  • Citation Tools
  • Rapid Responses
  • Article metrics

Download PDF

  • http://orcid.org/0000-0003-0157-5319 Ahtisham Younas 1 , 2 ,
  • http://orcid.org/0000-0002-7839-8130 Parveen Ali 3 , 4
  • 1 Memorial University of Newfoundland , St John's , Newfoundland , Canada
  • 2 Swat College of Nursing , Pakistan
  • 3 School of Nursing and Midwifery , University of Sheffield , Sheffield , South Yorkshire , UK
  • 4 Sheffield University Interpersonal Violence Research Group , Sheffield University , Sheffield , UK
  • Correspondence to Ahtisham Younas, Memorial University of Newfoundland, St John's, NL A1C 5C4, Canada; ay6133{at}mun.ca

https://doi.org/10.1136/ebnurs-2021-103417

Statistics from Altmetric.com

Request permissions.

If you wish to reuse any or all of this article please use the link below which will take you to the Copyright Clearance Center’s RightsLink service. You will be able to get a quick price and instant permission to reuse the content in many different ways.

Introduction

Literature reviews offer a critical synthesis of empirical and theoretical literature to assess the strength of evidence, develop guidelines for practice and policymaking, and identify areas for future research. 1 It is often essential and usually the first task in any research endeavour, particularly in masters or doctoral level education. For effective data extraction and rigorous synthesis in reviews, the use of literature summary tables is of utmost importance. A literature summary table provides a synopsis of an included article. It succinctly presents its purpose, methods, findings and other relevant information pertinent to the review. The aim of developing these literature summary tables is to provide the reader with the information at one glance. Since there are multiple types of reviews (eg, systematic, integrative, scoping, critical and mixed methods) with distinct purposes and techniques, 2 there could be various approaches for developing literature summary tables making it a complex task specialty for the novice researchers or reviewers. Here, we offer five tips for authors of the review articles, relevant to all types of reviews, for creating useful and relevant literature summary tables. We also provide examples from our published reviews to illustrate how useful literature summary tables can be developed and what sort of information should be provided.

Tip 1: provide detailed information about frameworks and methods

  • Download figure
  • Open in new tab
  • Download powerpoint

Tabular literature summaries from a scoping review. Source: Rasheed et al . 3

The provision of information about conceptual and theoretical frameworks and methods is useful for several reasons. First, in quantitative (reviews synthesising the results of quantitative studies) and mixed reviews (reviews synthesising the results of both qualitative and quantitative studies to address a mixed review question), it allows the readers to assess the congruence of the core findings and methods with the adapted framework and tested assumptions. In qualitative reviews (reviews synthesising results of qualitative studies), this information is beneficial for readers to recognise the underlying philosophical and paradigmatic stance of the authors of the included articles. For example, imagine the authors of an article, included in a review, used phenomenological inquiry for their research. In that case, the review authors and the readers of the review need to know what kind of (transcendental or hermeneutic) philosophical stance guided the inquiry. Review authors should, therefore, include the philosophical stance in their literature summary for the particular article. Second, information about frameworks and methods enables review authors and readers to judge the quality of the research, which allows for discerning the strengths and limitations of the article. For example, if authors of an included article intended to develop a new scale and test its psychometric properties. To achieve this aim, they used a convenience sample of 150 participants and performed exploratory (EFA) and confirmatory factor analysis (CFA) on the same sample. Such an approach would indicate a flawed methodology because EFA and CFA should not be conducted on the same sample. The review authors must include this information in their summary table. Omitting this information from a summary could lead to the inclusion of a flawed article in the review, thereby jeopardising the review’s rigour.

Tip 2: include strengths and limitations for each article

Critical appraisal of individual articles included in a review is crucial for increasing the rigour of the review. Despite using various templates for critical appraisal, authors often do not provide detailed information about each reviewed article’s strengths and limitations. Merely noting the quality score based on standardised critical appraisal templates is not adequate because the readers should be able to identify the reasons for assigning a weak or moderate rating. Many recent critical appraisal checklists (eg, Mixed Methods Appraisal Tool) discourage review authors from assigning a quality score and recommend noting the main strengths and limitations of included studies. It is also vital that methodological and conceptual limitations and strengths of the articles included in the review are provided because not all review articles include empirical research papers. Rather some review synthesises the theoretical aspects of articles. Providing information about conceptual limitations is also important for readers to judge the quality of foundations of the research. For example, if you included a mixed-methods study in the review, reporting the methodological and conceptual limitations about ‘integration’ is critical for evaluating the study’s strength. Suppose the authors only collected qualitative and quantitative data and did not state the intent and timing of integration. In that case, the strength of the study is weak. Integration only occurred at the levels of data collection. However, integration may not have occurred at the analysis, interpretation and reporting levels.

Tip 3: write conceptual contribution of each reviewed article

While reading and evaluating review papers, we have observed that many review authors only provide core results of the article included in a review and do not explain the conceptual contribution offered by the included article. We refer to conceptual contribution as a description of how the article’s key results contribute towards the development of potential codes, themes or subthemes, or emerging patterns that are reported as the review findings. For example, the authors of a review article noted that one of the research articles included in their review demonstrated the usefulness of case studies and reflective logs as strategies for fostering compassion in nursing students. The conceptual contribution of this research article could be that experiential learning is one way to teach compassion to nursing students, as supported by case studies and reflective logs. This conceptual contribution of the article should be mentioned in the literature summary table. Delineating each reviewed article’s conceptual contribution is particularly beneficial in qualitative reviews, mixed-methods reviews, and critical reviews that often focus on developing models and describing or explaining various phenomena. Figure 2 offers an example of a literature summary table. 4

Tabular literature summaries from a critical review. Source: Younas and Maddigan. 4

Tip 4: compose potential themes from each article during summary writing

While developing literature summary tables, many authors use themes or subthemes reported in the given articles as the key results of their own review. Such an approach prevents the review authors from understanding the article’s conceptual contribution, developing rigorous synthesis and drawing reasonable interpretations of results from an individual article. Ultimately, it affects the generation of novel review findings. For example, one of the articles about women’s healthcare-seeking behaviours in developing countries reported a theme ‘social-cultural determinants of health as precursors of delays’. Instead of using this theme as one of the review findings, the reviewers should read and interpret beyond the given description in an article, compare and contrast themes, findings from one article with findings and themes from another article to find similarities and differences and to understand and explain bigger picture for their readers. Therefore, while developing literature summary tables, think twice before using the predeveloped themes. Including your themes in the summary tables (see figure 1 ) demonstrates to the readers that a robust method of data extraction and synthesis has been followed.

Tip 5: create your personalised template for literature summaries

Often templates are available for data extraction and development of literature summary tables. The available templates may be in the form of a table, chart or a structured framework that extracts some essential information about every article. The commonly used information may include authors, purpose, methods, key results and quality scores. While extracting all relevant information is important, such templates should be tailored to meet the needs of the individuals’ review. For example, for a review about the effectiveness of healthcare interventions, a literature summary table must include information about the intervention, its type, content timing, duration, setting, effectiveness, negative consequences, and receivers and implementers’ experiences of its usage. Similarly, literature summary tables for articles included in a meta-synthesis must include information about the participants’ characteristics, research context and conceptual contribution of each reviewed article so as to help the reader make an informed decision about the usefulness or lack of usefulness of the individual article in the review and the whole review.

In conclusion, narrative or systematic reviews are almost always conducted as a part of any educational project (thesis or dissertation) or academic or clinical research. Literature reviews are the foundation of research on a given topic. Robust and high-quality reviews play an instrumental role in guiding research, practice and policymaking. However, the quality of reviews is also contingent on rigorous data extraction and synthesis, which require developing literature summaries. We have outlined five tips that could enhance the quality of the data extraction and synthesis process by developing useful literature summaries.

  • Aromataris E ,
  • Rasheed SP ,

Twitter @Ahtisham04, @parveenazamali

Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

Competing interests None declared.

Patient consent for publication Not required.

Provenance and peer review Not commissioned; externally peer reviewed.

Read the full text or download the PDF:

  • - Google Chrome

Intended for healthcare professionals

  • Access provided by Google Indexer
  • My email alerts
  • BMA member login
  • Username * Password * Forgot your log in details? Need to activate BMA Member Log In Log in via OpenAthens Log in via your institution

Home

Search form

  • Advanced search
  • Search responses
  • Search blogs
  • Validity of data...

Validity of data extraction in evidence synthesis practice of adverse events: reproducibility study

  • Related content
  • Peer review
  • Chang Xu , professor 1 2 3 ,
  • Tianqi Yu , masters candidate 4 ,
  • Luis Furuya-Kanamori , senior research fellow 5 ,
  • Lifeng Lin , assistant professor 6 ,
  • Liliane Zorzela , clinical assistant professor 7 ,
  • Xiaoqin Zhou , methodologist 9 ,
  • Hanming Dai , doctoral candidate 9 ,
  • Yoon Loke , professor 10 ,
  • Sunita Vohra , professor 7 11
  • 1 Key Laboratory of Population Health Across-life Cycle, Ministry of Education of the People’s Republic of China, Anhui Medical University, Anhui, China
  • 2 Anhui Provincial Key Laboratory of Population Health and Aristogenics, Anhui Medical University, Anhui, China
  • 3 School of Public Health, Anhui Medical University, Anhui, China
  • 4 Chinese Evidence-based Medicine Centre, West China Hospital, Sichuan University, Chengdu, China
  • 5 UQ Centre for Clinical Research, Faculty of Medicine, University of Queensland, Brisbane, QLD, Australia
  • 6 Department of Statistics, Florida State University, Tallahassee, FL, USA
  • 7 Department of Pediatrics, Faculty of Medicine and Dentistry, University of Alberta, Edmonton, Alberta, AB, Canada
  • 8 Department of Clinical Research Management, West China Hospital, Sichuan University, Chengdu, China
  • 9 Mental Health Centre, West China Hospital of Sichuan University, Chengdu, China
  • 10 Norwich Medical School, University of East Anglia, Norwich, UK
  • 11 Departments of Psychiatry, Faculty of Medicine and Dentistry, University of Alberta, Edmonton, Alberta, AB, Canada
  • Correspondence to: S Vohra svohra{at}ualberta.ca
  • Accepted 10 April 2022

Objectives To investigate the validity of data extraction in systematic reviews of adverse events, the effect of data extraction errors on the results, and to develop a classification framework for data extraction errors to support further methodological research.

Design Reproducibility study.

Data sources PubMed was searched for eligible systematic reviews published between 1 January 2015 and 1 January 2020. Metadata from the randomised controlled trials were extracted from the systematic reviews by four authors. The original data sources (eg, full text and ClinicalTrials.gov) were then referred to by the same authors to reproduce the data used in these meta-analyses.

Eligibility criteria for selecting studies Systematic reviews were included when based on randomised controlled trials for healthcare interventions that reported safety as the exclusive outcome, with at least one pair meta-analysis that included five or more randomised controlled trials and with a 2×2 table of data for event counts and sample sizes in intervention and control arms available for each trial in the meta-analysis.

Main outcome measures The primary outcome was data extraction errors summarised at three levels: study level, meta-analysis level, and systematic review level. The potential effect of such errors on the results was further investigated.

Results 201 systematic reviews and 829 pairwise meta-analyses involving 10 386 randomised controlled trials were included. Data extraction could not be reproduced in 1762 (17.0%) of 10 386 trials. In 554 (66.8%) of 829 meta-analyses, at least one randomised controlled trial had data extraction errors; 171 (85.1%) of 201 systematic reviews had at least one meta-analysis with data extraction errors. The most common types of data extraction errors were numerical errors (49.2%, 867/1762) and ambiguous errors (29.9%, 526/1762), mainly caused by ambiguous definitions of the outcomes. These categories were followed by three others: zero assumption errors, misidentification, and mismatching errors. The impact of these errors were analysed on 288 meta-analyses. Data extraction errors led to 10 (3.5%) of 288 meta-analyses changing the direction of the effect and 19 (6.6%) of 288 meta-analyses changing the significance of the P value. Meta-analyses that had two or more different types of errors were more susceptible to these changes than those with only one type of error (for moderate changes, 11 (28.2%) of 39 v 26 (10.4%) 249, P=0.002; for large changes, 5 (12.8%) of 39 v 8 (3.2%) of 249, P=0.01).

Conclusion Systematic reviews of adverse events potentially have serious issues in terms of the reproducibility of the data extraction, and these errors can mislead the conclusions. Implementation guidelines are urgently required to help authors of future systematic reviews improve the validity of data extraction.

Introduction

In an online survey of 1576 researchers by Nature , the collected opinions emphasised the need for better reproducibility in research: “More than 70% of researchers have tried and failed to reproduce another scientist’s experiments, and more than half have failed to reproduce their own experiments.” 1

Systematic reviews and meta-analyses have become the most important tools for assessing healthcare interventions. This research involves explicit and standardised procedures to identify, appraise, and synthesise all available evidence within a specific topic. 2 During the process of systematic reviews, each step matters, and any errors could affect the reliability of the final results. Among these steps, data extraction is arguably one of the most important and is prone to errors because raw data are transferred from original studies into the systematic review that serves as the basis for evidence synthesis.

To ensure the quality of data extraction, authoritative guidelines, such as the Cochrane Handbook, highlight the importance of independent extraction by two review authors. 2 Despite this quality assurance mechanism, data extraction error in systematic reviews occurs frequently in the literature. 3 Jones et al 4 reproduced 34 Cochrane reviews published in 2003 (issue 4) and found that 20 (59%) had data extraction errors. Gøtzsche et al 5 examined 27 meta-analyses of continuous outcomes and reported that 17 (63%) of these meta-analyses had an error for at least one of the two randomly selected trials. In their subsequent study, based on 10 systematic reviews of continuous outcomes, seven (70%) were identified as erroneous data. 6

Empirical evidence suggests that the effect of data extraction error seems to be minor. 3 5 However, this conclusion is based on systematic reviews of continuous outcomes, which do not apply to binary outcomes of adverse events. Harms, especially serious harms, tend to be rare, and such data in nature are more susceptible to random or systematic errors than are common outcomes. 7 8 For example, consider a 1:1 designed trial with a sample size of 100, and the event counts of death are two intervention group and one in the control group. If the review authors incorrectly extracted the number of events in the intervention group as one, the relative risk would drop from two to one, leading to a completely different conclusion. Owing to this feature, in systematic reviews of adverse events, the validity of data extraction can considerably affect the results and even predominate the final conclusion. The erroneous conclusion would further influence the clinical practice guidelines and mislead healthcare practice.

We used a large-scale reproducibility investigation on the reproducibility of data extraction for systematic reviews of adverse events. We propose an empirical classification of the data extraction errors to help methodologists and systematic review authors better understand the sources of data extraction errors. The impact of such errors on the results is also examined based on the reproducibility dataset.

Protocol and data source

This article is an extension of our previous work describing methods to deal with double-zero-event studies. 9 A protocol was drafted on 11 April 2021 by a group of core authors (CX, TY, LL, LFK), which was then revised after expert feedback (SV, LZ, RQ, and JZ; see supplementary file). We also record the detailed implementation of this study (supplementary table 1).

A subset of the data from the previous study was used in this study. Briefly, we searched PubMed for systematic reviews of adverse events indexed from 1 January 2015 to 1 January 2020. The limit on the search date was arbitrary but allowed us to capture the practice of the most recent systematic reviews. We did not search in other databases because we did not aim to include all systematic reviews; instead, a representative sample was sufficient for the aim of the current study. The search strategy was developed by an information specialist (supplementary box 1), and the literature search was conducted on 28 July 2020, and has been recorded elsewhere. 9

Inclusion criteria and screening

We included systematic reviews of randomised controlled trials for healthcare interventions, with adverse events as the exclusive outcome. The term adverse event was defined as “any untoward medical occurrence in a patient or subject in clinical practice,” 10 which could be a side effect, adverse effect, adverse reaction, harm, or complication associated with any healthcare intervention. 11 We did not consider systematic reviews based on other types of studies because randomised controlled trials are more likely to be registered with available related summarised data for safety outcomes; this source provided another valid way to assess the reproducibility of data extraction. Additionally, we limited systematic reviews to those with at least one pairwise meta-analysis with five or more studies; the requirement of the number of studies was designed for an ongoing series of studies on synthesis methods to ensure sufficient statistical power. 12 To facilitate the reproducing of the data used in meta-analyses, we considered only systematic reviews that provided a 2×2 table of data of event counts and sample sizes in intervention and control arms of each included study in forest plots or tables. Meta-analyses of proportions and network meta-analyses were not considered. Safety outcomes with continuous type were also not considered because continuous outcomes have been investigated by others. 4 5 6 Systematic reviews in languages other than English and Chinese were excluded.

Two review authors screened the literature independently (XQ and CX). Titles and abstracts were screened first, and then the full texts of the relevant publications were read. For screening of titles and abstracts, only records excluded by both reviewer authors were excluded. Any disagreements were solved by discussion between the two authors.

Data collection

Metadata from the randomised controlled trials were collected from eligible systematic reviews. The following items were extracted: name of the first author, outcome of interest, number of participants and number of events in each group, and detailed information of intervention (eg, type of intervention, dosage, and duration) and control groups. Four experienced authors (CX, TQ, XQ, and HM) extracted the data by dividing the eligible systematic reviews into four equal portions by the initial of the first author, and each extractor led one portion. We had a pilot training for the above items to be extracted through the first systematic review before the formal data extraction. Finally, data were initially checked by the same extractors for their own portion and double checked by the other two authors separately (CX and TQ) to confirm that no errors were present from the data extraction (supplementary table 1).

Additionally, based on the reporting of each systematic review, we collected the following information according to the good practice guideline of data extraction 13 : how the data were extracted (eg, two extractors independently), whether a protocol was available, whether a clear data extraction plan was made in the protocol, whether any solution for anticipant problems in data extraction was outlined in the protocol, whether a standard data extraction form was used, whether the data extraction form was piloted, whether the data extractors were trained, the expertise of the data extractors, and whether they documented any details of data extraction. CX also collected the methods (eg, inverse variance, fixed effect model), effect estimators used for meta-analysis, and the effect with a confidence interval of the meta-analysis. TQ checked the extraction and any disagreements were solved by discussion between these two authors (with detailed records).

Reproducibility

After we extracted the data from the included systematic reviews, the four authors who extracted the data were required to reproduce the data used in meta-analyses from the original sources, which included the original publications of the randomised controlled trials and their supplementary files, ClinicalTrials.gov, and websites of the pharmaceutical companies. When the trial data used in a meta-analysis were not the same as had been reported from one of its original sources, we classified it as a “data extraction error.” If the authors of the systematic review reported that they had contacted the authors of the original paper and successfully obtained related data, we did not consider the discrepancy a data extraction error, even if the data were not the same as any of the original sources. 14 We recorded the details of the location (that is, event count (r) or total sample size (n), intervention (1) or control group (2), which are marked as r1/n1/r2/n2) and the reasons why the data could not be reproduced. Any enquires or issues that would affect our assessment were resolved by group discussion of the four extractors. Again, reproducibility was initially checked by the data extractors for their own portions of the workload. After data extraction and reproduction, the lead author (CX) and TQ separately conducted two further rounds of double checking (supplementary table 1). 15

Our primary outcome of this study was the proportions of the data extraction errors at the study level, the meta-analysis level, and the systematic review level. The secondary outcomes were the proportion of studies with data extraction error within each meta-analysis and the proportion of meta-analyses with data extraction error within each systematic review.

Statistical analysis

We summarised the frequency of data extraction errors at the study level, the meta-analysis level, and the systematic review level to estimate the aforementioned proportions. For the study level, the frequency was the total number of randomised controlled trials with data extraction errors. For the meta-analysis level, the frequency was the number of meta-analyses with at least one study with data extraction errors. For the systematic review level, the frequency was the number of systematic reviews with at least one meta-analysis with data extraction errors.

Considering that clustering effects might be present (owing to the diverse expertise and experience of the four people who extracted data), a generalised linear mixed model was further used to estimate the extractor adjusted proportion. 16 The potential associations among duplicated data extraction, development of a protocol in advance, and data extraction errors based on systematic review level were examined using multivariable logistic regression. Other recommendations listed in good practice guidelines were not examined because most systematic reviews did not report the information.

Because data extraction errors could have different mechanisms (eg, calculation errors and unclear definition of the outcome), we empirically classified these errors into different types on the basis of consensus after summarising the error information (supplementary fig 1). Then, the percentages of the different types of errors among the total number of errors were summarised based on the study level. We conducted a post-hoc comparison of the difference of the proportions of the total and the subtype errors by two types of interventions: drug interventions; and non-drug interventions (eg, surgery and device). We did this because the safety outcomes are greatly different for these two types of interventions based on our word cloud analysis (supplementary fig 2).

To investigate the potential effect of data extraction errors on the results, we used the same methods and effect estimators that the authors reported based on the corrected dataset. We repeated these meta-analyses and compared the new results to the original results. Some meta-analyses contained errors related to unclear definitions of the outcomes (that is, the ambiguous error defined in table 1 ). The true number of events is therefore impossible for readers to determine, as is the ability to investigate the effect on the results based on the full empirical dataset. Therefore, we used a subset with meta-analyses free of this type of ambiguous errors. We prespecified a 20% change of the magnitude or more of the effects as moderate impact and a 50% change or more as large impact. We also summarised the proportion of change on the direction of the effects and on the significance of the P value.

Descriptions of the different types of errors during the data extraction

  • View inline

Missing data would occur when the original data sources were not available for a few randomised controlled trials in which we were unable to verify data accuracy. For our sensitivity analysis, which investigated the robustness of the results, we removed these studies. We used Stata 15/SE for the data analysis. The estimation of the proportions was based on the meglm command under the Poisson function with the log link 28 ; we set α=0.05 as the significance level. We performed the re-evaluation of the meta-analyses by the admetan command in Stata and verified by metafor command in R 3.5.1 software, and Excel 2013 was used for visualisation.

Patient and public involvement

As this was a technical paper to assess related methodology for data extraction errors of evidence synthesis practice and the impacts of these errors on the analysis, no patients or public members were involved, nor was funding available for the same reason.

Overall, we screened 18 636 records, and initially identified 456 systematic reviews of adverse events. 9 After a further screening of the full texts, 102 were excluded for having non-randomised studies of intervention and 153 were excluded for not having a pairwise meta-analysis, having fewer than five studies in all meta-analyses, or not reporting 2×2 table data used in meta-analyses (supplementary table 3). As such, 201 systematic reviews were included in the current study ( fig 1 ).

Fig 1

Flowchart for selection of articles. RCT=randomised controlled trial

  • Download figure
  • Open in new tab
  • Download powerpoint

Among the 201 systematic reviews, 156 referred to drug interventions and the other 45 were non-drug interventions (60% were surgical or device interventions). From the 201 systematic reviews, we identified 829 pairwise meta-analyses with at least five studies involving 10 386 randomised controlled trials. The data extraction error by the four data extractors ranged from 0.5% to 5.4% based on the double-checking process, which suggested that this study had high quality data extraction (supplementary table 1).

Among the 201 systematic reviews, based on the reporting information, 167 (83.1%) stated that they had two data extractors, 31 (15.4%) did not report such information, two (1%) cannot be judged owing to insufficient information, and only one (0.5%) reported that the data were extracted by one person. Fifty four (26.9%) systematic reviews reported a protocol that was developed in advance, whereas most (147, 73.1%) did not report whether they had a protocol. For those with protocols, 32 (59.3%) of 54 had a clear plan for data extraction and 22 (40.7%) outlined a potential solution for anticipant problems for data extraction. Sixty six (32.8%) systematic reviews used a standard data extraction form, while most (135, 67.2%) did not report this information. For the systematic reviews that used a standard extraction form, six (8.8%) piloted this process. No systematic reviews reported the information of whether the data extractor was trained or the expertise of the data extractor. Only seven (3.5%) of 201 systematic reviews documented the details of the data extraction process.

Reproducibility of the data extraction

For the reproducibility of the data used in these meta-analyses, at the study level, we could not reproduce 1762 (17.0%) of 10 386 studies with an extractor addressed proportion of 15.8%. At the meta-analysis level, 554 (66.8%) of 829 meta-analyses had at least one randomised controlled trial with data extraction errors, with an extractor addressed proportion of 65.5% ( fig 2 ). For meta-analyses with data extraction errors in at least one study, the proportion of studies with data extraction errors within a meta-analysis ranged from 1.9% to 100%, with a median value of 20.6% (interquartile range 12.5-40.0; fig 2 ).

Fig 2

Data extraction errors at the meta-analysis level. Bar plot is based on studies with data extraction errors (n=554). Error rate within a meta-analysis is calculated by the number of studies with data extraction errors against the total number of studies within a meta-analysis

At the systematic review level, 171 (85.1%) of 201 systematic reviews had at least one meta-analysis with data extraction errors, with an extractor addressed proportion of 85.1% ( fig 3 ). For systematic reviews with data extraction errors in at least one meta-analysis, the proportion of meta-analyses with data extraction errors within a systematic review ranged from 16.7% to 100.0%, with a median value of 100.0% (interquartile range 66.7-100; fig 3 ).

Fig 3

Data extraction errors at the systematic review level. Bar plot is based on studies with data extraction errors (n=171). Error rate within a systematic review is calculated by the number of meta-analyses with data extraction errors against the total number of meta-analyses within a systematic review

Based on the multivariable logistic regression, those systematic reviews that reported duplicated data extraction or were checked by another author (odds ratio 0.9, 95% confidence interval 0.3 to 2.5, P=0.83) and developed a protocol in advance (0.7, 0.3 to 1.6, P=0.38) did not show a difference in the odds of errors, but there might be a weak association of errors.

Empirical classification of errors

Based on the mechanism of the data extraction errors, we empirically classified these errors into five types: numerical error, ambiguous error, zero assumption error, mismatching error, and misidentification error. Table 1 provides the definitions of these five types of data extraction errors, with detailed examples. 17 18 19 20 21 22 23 24 25 26 27

Numerical error was the most prevalent data extraction error, which accounted for 867 (49.2%) of 1762 errors recorded in the studies ( fig 4 ). The second most prevalent data extraction error was the ambiguous error, accounting for 526 (29.9%) errors. Notably, zero assumption errors accounted for as much as 221 (12.5%) errors. Misidentification accounted for 115 (6.5%) errors and mismatching errors accounted for 33 (1.9%) errors.

Fig 4

Proportion of 1762 studies classified by five types of data extraction error

Subgroup analysis by the intervention type suggested that meta-analyses with drug interventions were more likely to have data extraction errors than those involving non-drug interventions: total error (19.9% v 8.9%; P<0.001), ambiguous error (6.1% v 2.4%; P<0.001), numerical error (9.4% v 5.4%; P<0.001), zero assumption error (2.6% v 0.9%; P<0.001), and misidentification errors error (1.5% v 0.1%; P<0.001; supplementary fig 3). Although mismatching errors showed the same pattern, the data were not significantly different (0.4% v 0.2%; P=0.09).

Impact of data extraction errors on the results

After removing meta-analyses with ambiguous errors and without errors, 288 meta-analyses could be used to investigate the impact of data extraction errors on the results (supplementary table 4). Among them, 39 had two or more types of errors (mixed), and 249 had only one type of error (single). For the 249 meta-analyses, 200 had numerical errors, 25 had zero assumption errors, 16 had misidentification errors, and eight had mismatching errors. Because of the limited sample size of each subtype, we only summarised the total impact and the effect grouped by the number of types (that is, single type of error or mixed type of errors).

In total, in terms of the magnitude of the effect, when using corrected data for the 288 meta-analyses, 151 (52.4%) had decreased effects, whereas 137 (47.6%) had increased effects; 37 (12.8%) meta-analyses had moderate changes (with ≥20% changes), and 13 (4.5%) had large changes (with ≥50% changes) in the effect estimates ( fig 5 ). For those 37 studies with moderate changes, the effects in 26 (70.2%) increased, whereas those in 11 (29.7%) decreased when using corrected data. For those 13 studies with large changes, nine (69.2%) showed increased effects, whereas four (30.8%) showed decreased effects. Ten (3.5%) of the 288 meta-analyses had changes in the direction of the effect, and 19 (6.6%) of the 288 meta-analyses changed the significance of the P value. For those studies that had changes in the direction, two (20.0%) of 10 changed from beneficial to harmful effects, and eight (80.0%) of the 10 changed from harmful to beneficial effects. For studies that changed in significance, 10 (52.6%) of 19 changed from non-significance to significance, and nine (47.4%) of 19 changed from significance to non-significance. Some examples are presented in table 2 . Studies with two or more types of errors had higher proportions of moderate (28.2% v 10.4%, P=0.002) and large changes (12.8% v 3.2%, P=0.01; fig 5 ) than did with only a single error.

Fig 5

Impact of data extraction errors on results

Examples of changes in the effects and significance when using corrected data

Sensitivity analysis

For 318 (3.1%) of 10 386 studies in the total dataset, we could not obtain full texts or had no access to the original data source to verify data accuracy. After treating them as missing values and removing them from the analyses, no changes were obvious in the proportions of data extraction errors: 16.2% for the study level, 65.7% for the meta-analysis level, and 85.1% for the systematic review level (addressed by extractor clustering effects).

Principal findings

We investigated the reproducibility of the data extraction of 829 pairwise meta-analyses within 201 systematic reviews of safety outcomes by repeating the data extraction from all the included studies. Our results suggested that as much as 85% of the systematic reviews had data extraction errors in at least one meta-analysis. From the point of meta-analysis level, as many as 67% of the meta-analyses had at least one study with data extraction error. Our findings support the seriousness of the findings from the survey conducted by Nature regarding reproducibility of basic science research (70%). 1 At the systematic review level, the problem is even more serious.

Our subgroup analysis showed that data for the safety outcomes of drug interventions had a higher proportion of extraction error (19.9%) than did data for non-drug interventions (8.9%). One important reason could be that safety outcomes of different types of interventions vary considerably (supplementary fig 1). For non-drug interventions, most interventions were surgical or a device, where safety outcomes might be easier to define. For example, a common safety outcome in surgical intervention is bleeding during surgery, whereas a common outcome of drug interventions is liver toxicity, which might be more complex to define and measure. Additionally, the reporting of adverse events in surgical interventions heavily relies on the surgical staff, whereas for adverse events of a drug, patients might also participate in the reporting process. Selective reporting could exist for adverse events of surgical interventions without patients’ participation, 29 and mild but complex adverse events (eg, muscular pain) might be neglected and further make reported adverse events appear more straightforward.

We classified data extraction errors into five types based on the mechanism. Based on this classification, we further found that numerical errors, ambiguous errors, and zero assumption errors accounted for 91% of the total errors. The classification and related findings are important because these data provide a theoretical basis for researchers to develop implementation guidelines and help systematic review authors to reduce the frequency of errors during the data extraction process. Another important reason for data extraction errors might be the poor reporting of adverse events in randomised controlled trials, which have varying terminology, poorly defined categories, and diverse data sources. 30 31 32 If trials did not clearly define an adverse outcome and report it transparently, then systematic review authors would face difficulties during data extraction and the process would be prone to errors, especially with regard to the ambiguous types. We believe that with proper implementation guidance and more explicit trial reporting guidelines for adverse events, these errors can substantially be reduced.

The classification also provides a theoretical basis for methodologists to investigate the potential impact of different types of data extraction errors on the results. The impact of different types of errors on the results could vary. For example, the zero assumption error is expected to push the final effect towards the null when related studies have balanced sample sizes in two arms. 33 The mismatching error has a similar effect because the error pushes the effect towards the opposite direction. By contrast, the direction of the effect is difficult to predict in the other three types of errors. In our empirical data, because of the small number of meta-analyses in each category, we were unable to investigate the impact of each single type of error on the results. One of the most important reasons is that many meta-analyses have ambiguous errors. Nevertheless, we were able to compare the effect of multiple error types against a single error type for meta-analyses. Our results suggested that meta-analyses with multiple types of data extraction errors were prone to be affected. Because different methods can vary on this assumption (eg, two-stage methods with odds ratios assume that double-zero studies are non-informative 9 ), the use of different synthesis methods and effect estimates might have different impacts. 34 35 The impact of data extraction errors on the results is expected to be thoroughly investigated by simulation research.

Strengths and limitations

This large empirical study investigates the reproducibility of the data extraction of systematic reviews of adverse events and its impact on the results of related meta-analyses. The findings of our study pose a serious warning to the community that much progress is needed to achieve high quality, evidence-based practice. We are confident that the results of our findings are reliable because the data have been through five rounds of cross-checking within our tightly structured collaborative team. Additionally, this study is the first time that data extraction errors were defined based on their mechanism, which we think will benefit future methodological research in this area.

However, some limitations are still present. Firstly, owing to the large amount of work, data collection was divided into four portions, and each portion was conducted by a separate author. Although all authors undertook pilot training in advance, their judgments might still differ. Nevertheless, our analysis used the generalised linear mixed model, which accounted for the potential clustering effect by different extractors, of which the findings suggested no obvious impact on the results. Secondly, our study covered only systematic reviews published in the five year period from 2015 to 2020; therefore, the validity of the data extraction in earlier studies is unclear. Whether this issue has deteriorated or improved over time could not be assessed. Thirdly, a small proportion of studies could not have reproducibility checked, and these studies were treated as if no data extraction errors existed, which could lead to a slight underestimation of data extraction error overall. 36

Furthermore, we only focused on systematic reviews of randomised controlled trials and did not consider observational studies. Because the sample sizes of randomised controlled trials tend to be small, the impact might be exacerbated. Finally, poor reporting has been commonly investigated in literature 37 38 ; owing to the limited information of the data extraction process reported by review authors, we could not fully investigate the association between good practice recommendations and the likelihood of data extraction. For the same reason, the association among duplicated data extraction, development of a protocol in advance, and data extraction errors should be interpreted with caution. Further studies based on randomised controlled design might be helpful. However, we believe these limitations have little impact on our main results and conclusions.

Conclusions

Systematic reviews of adverse events face serious issues in terms of the reproducibility of their data extraction. Prevalence of data extraction errors is high among these systematic reviews and these errors could lead to the changing of the conclusions and further mislead the healthcare practice. A series of expanded reproducibility studies on other types of meta-analyses might be useful for further evidence-based practice. Additionally, implementation guidelines on data extraction for systematic reviews are urgently required to help future review authors improve the validity of their findings.

What is already known on this topic

In evidence synthesis practice, data extraction is an important step and prone to errors, because raw data are transferred from the original studies into the meta-analysis

Data extraction errors in systematic reviews occur frequently in the literature, although these errors generally have a minor effect on the results

However, this conclusion is based on systematic reviews of continuous outcomes, and might not apply to binary outcomes of adverse events

What this study adds

In a large-scale reproducibility investigation of 201 systematic reviews of adverse events with 829 pairwise meta-analyses, data extraction errors frequently occurred for binary outcomes of adverse events

These errors could be grouped into five categories based on the mechanism: numerical error, ambiguous error, zero assumption error, mismatching error, and misidentification error

The errors can lead to changes in the conclusions of the findings, and meta-analyses that had two or more types of errors were more susceptible to these changes

Ethics statements

Ethical approval.

Not required.

Data availability statement

A subset of the data can be found at https://osf.io/czyqa /. The dataset could be obtained from the first author ( [email protected] ) or the corresponding author ( [email protected] ) on request.

Acknowledgments

We thank Riaz Qureshi from Johns Hopkins University and Zhang Jiaxin from Guizhou Provincial People's Hospital for their comments and edits on our protocol. We also thank Lu Cuncun from Lanzhou University for developing the search strategy for the whole project.

Contributors: CX and SV conceived and designed the study; CX collected the data, analysed the data, and drafted the manuscript; ZXQ and CX screened the literature; YTQ, CX, DHM, ZXQ extracted and reproduced the data; YTQ and CX contributed to the data checking; CX, SV, LFK, LL, LZ, and YL provided methodological comments, and revised the manuscript. All authors approved the final version to be published. CX and SV are the study guarantors. The corresponding author attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted.

Funding: LFK is funded by an Australian National Health and Medical Research Council Fellowship (APP1158469). LL is funded by the US National Institutes of Health/National Library of Medicine grant R01 LM012982 and the National Institutes of Health/National Institute of Mental Health grant R03 MH128727. The funding body had no role in any process of the study (that is, study design, analysis, interpretation of data, writing of the report, and decision to submit the article for publication).

Competing interests: All authors have completed the ICMJE uniform disclosure form at www.icmje.org/coi_disclosure.pdf (available on request from the corresponding author) and declare: support from the Australian National Health and Medical Research Council Fellowship, US National Institutes of Health, National Library of Medicine, and National Institute of Mental Health for the submitted work; no financial relationships with any organisations that might have an interest in the submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work.

The lead author affirms that the manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as originally planned (and, if relevant, registered) have been explained.

Dissemination to participants and related patient and public communities: We plan to present our findings at national and international scientific meetings and to use social media outlets to disseminate findings.

Provenance and peer review: Not commissioned; externally peer reviewed.

This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/ .

  • Higgins JPT ,
  • Chandler J ,
  • Cumpston M ,
  • Remmington T ,
  • Williamson PR ,
  • Gøtzsche PC ,
  • Hróbjartsson A ,
  • Higgins JP ,
  • Austin PC ,
  • Steyerberg EW
  • Zorzela L ,
  • Ioannidis JP ,
  • PRISMAHarms Group
  • Jackson D ,
  • Taylor KS ,
  • Mahtani KR ,
  • Berstock J ,
  • Büchter RB ,
  • Piccart-Gebhart M ,
  • Baselga J ,
  • Blackwell KL ,
  • Burstein HJ ,
  • Storniolo AM ,
  • Mubarak N ,
  • Shehata S ,
  • Mahabadi AA ,
  • Flaherty KT ,
  • Infante JR ,
  • Motzer RJ ,
  • Tannir NM ,
  • McDermott DF ,
  • CheckMate 214 Investigators
  • Chesney J ,
  • Pavlick AC ,
  • Postow MA ,
  • Furuya-Kanamori L ,
  • Altman DG ,
  • Boutron I ,
  • Hodkinson A ,
  • Kirkham JJ ,
  • Tudur-Smith C ,
  • Kahale LA ,
  • Khamis AM ,
  • Di Santo P ,
  • Clifford C ,

literature review data extraction

U.S. flag

An official website of the United States government

The .gov means it’s official. Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

The site is secure. The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

  • Publications
  • Account settings

Preview improvements coming to the PMC website in October 2024. Learn More or Try it out now .

  • Advanced Search
  • Journal List

Logo of sysrev

Automating data extraction in systematic reviews: a systematic review

Siddhartha r. jonnalagadda.

1 Division of Health and Biomedical Informatics, Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, 750 North Lake Shore Drive, 11th Floor, Chicago, IL 60611 USA

Pawan Goyal

2 Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur, 721302 West Bengal India

Mark D. Huffman

3 Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, USA

Automation of the parts of systematic review process, specifically the data extraction step, may be an important strategy to reduce the time necessary to complete a systematic review. However, the state of the science of automatically extracting data elements from full texts has not been well described. This paper performs a systematic review of published and unpublished methods to automate data extraction for systematic reviews.

We systematically searched PubMed, IEEEXplore, and ACM Digital Library to identify potentially relevant articles. We included reports that met the following criteria: 1) methods or results section described what entities were or need to be extracted, and 2) at least one entity was automatically extracted with evaluation results that were presented for that entity. We also reviewed the citations from included reports.

Out of a total of 1190 unique citations that met our search criteria, we found 26 published reports describing automatic extraction of at least one of more than 52 potential data elements used in systematic reviews. For 25 (48 %) of the data elements used in systematic reviews, there were attempts from various researchers to extract information automatically from the publication text. Out of these, 14 (27 %) data elements were completely extracted, but the highest number of data elements extracted automatically by a single study was 7. Most of the data elements were extracted with F-scores (a mean of sensitivity and positive predictive value) of over 70 %.

Conclusions

We found no unified information extraction framework tailored to the systematic review process, and published reports focused on a limited (1–7) number of data elements. Biomedical natural language processing techniques have not been fully utilized to fully or even partially automate the data extraction step of systematic reviews.

Systematic reviews identify, assess, synthesize, and interpret published and unpublished evidence, which improves decision-making for clinicians, patients, policymakers, and other stakeholders [ 1 ]. Systematic reviews also identify research gaps to develop new research ideas. The steps to conduct a systematic review [ 1 – 3 ] are:

  • Define the review question and develop criteria for including studies
  • Search for studies addressing the review question
  • Select studies that meet criteria for inclusion in the review
  • Extract data from included studies
  • Assess the risk of bias in the included studies, by appraising them critically
  • Where appropriate, analyze the included data by undertaking meta-analyses
  • Address reporting biases

Despite their widely acknowledged usefulness [ 4 ], the process of systematic review, specifically the data extraction step (step 4), can be time-consuming. In fact, it typically takes 2.5–6.5 years for a primary study publication to be included and published in a new systematic review [ 5 ]. Further, within 2 years of the publication of systematic reviews, 23 % are out of date because they have not incorporated new evidence that might change the systematic review’s primary results [ 6 ].

Natural language processing (NLP), including text mining, involves information extraction, which is the discovery by computer of new, previously unfound information by automatically extracting information from different written resources [ 7 ]. Information extraction primarily constitutes concept extraction, also known as named entity recognition, and relation extraction, also known as association extraction. NLP handles written text at level of documents, words, grammar, meaning, and context. NLP techniques have been used to automate extraction of genomic and clinical information from biomedical literature. Similarly, automation of the data extraction step of the systematic review process through NLP may be one strategy to reduce the time necessary to complete and update a systematic review. The data extraction step is one of the most time-consuming steps of a systematic review. Automating or even semi-automating this step could substantially decrease the time taken to complete systematic reviews and thus decrease the time lag for research evidence to be translated into clinical practice. Despite these potential gains from NLP, the state of the science of automating data extraction has not been well described.

To date, there is limited knowledge and methods on how to automate the data extraction phase of the systematic reviews, despite being one of the most time-consuming steps. To address this gap in knowledge, we sought to perform a systematic review of methods to automate the data extraction component of the systematic review process.

Our methodology was based on the Standards for Systematic Reviews set by the Institute of Medicine [ 8 ]. We conducted our study procedures as detailed below with input from the Cochrane Heart Group US Satellite.

Eligibility criteria

We included a report that met the following criteria: 1) the methods or results section describes what entities were or needed to be extracted, and 2) at least one entity was automatically extracted with evaluation results that were presented for that entity.

We excluded a report that met any of the following criteria: 1) the methods were not applied to the data extraction step of a systematic review; 2) the report was an editorial, commentary, or other non-original research report; or 3) there was no evaluation component.

Information sources and searches

For collecting the initial set of articles for our review, we developed search strategies with the help of the Cochrane Heart Group US Satellite, which includes systematic reviewers and a medical librarian. We refined these strategies using relevant citations from related papers. We searched three datasets: PubMed, IEEExplore, and ACM digital library, and our searches were limited between January 1, 2000 and January 6, 2015 (see Appendix 1 ). We restricted our search to these dates because biomedical information extraction algorithms prior to 2000 are unlikely to be accurate enough to be used for systematic reviews.

We retrieved articles that dealt with the extraction of various data elements, defined as categories of data that pertained to any information about or deriving from a study, including details of methods, participants, setting, context, interventions, outcomes, results, publications, and investigators [ 1 ] from included study reports. After we retrieved the initial set of reports from the search results, we then evaluated reports included in the references of these reports. We also sought expert opinion for additional relevant citations.

Study selection

We first de-duplicated the retrieve citations. For calibration and refinement of the inclusion and exclusion criteria, 100 citations were randomly selected and independently reviewed by a two authors (SRJ and PG). Disagreements were resolved by consensus with a third author (MH). In a second round, another set of 100 randomly selected abstracts was independently reviewed by two study authors (SRJ and PG), whereby we achieved a strong level of agreement (kappa = 0.97). Given the high level of agreement, the remaining studies were reviewed only by one author (PG). In this phase, we identified reports as “not relevant” or “potentially relevant”.

Two authors (PG and SRJ) independently reviewed the full text of all citations ( N  = 74) that were identified as “potentially relevant”. We classified included reports into various categories based on the particular data element that they attempted to extract from the original, scientific articles. Example of these data elements might be overall evidence, specific interventions, among others (Table  1 ). We resolved disagreements between the two reviewers through consensus with a third author (MDH).

Data elements, category, sources and existing automation work

Data collection process

Two authors (PG and SRJ) independently reviewed the included articles to extract data, such as the particular entity automatically extracted by the study, algorithm or technique used, and evaluation results into a data abstraction spreadsheet. We resolved disagreements through consensus with a third author (MDH).

We reviewed the Cochrane Handbook for Systematic Reviews [ 1 ], the CONsolidated Standards Of Reporting Trials (CONSORT) [ 9 ] statement, the Standards for Reporting of Diagnostic Accuracy (STARD) initiative [ 10 ], and PICO [ 11 ], PECODR [ 12 ], and PIBOSO [ 13 ] frameworks to obtain the data elements to be considered. PICO stands for Population, Intervention, Comparison, Outcomes; PECODR stands for Patient-Population-Problem, Exposure-Intervention, Comparison, Outcome, Duration and Results; and PIBOSO stands for Population, Intervention, Background, Outcome, Study Design, Other.

Data synthesis and analysis

Because of the large variation in study methods and measurements, a meta-analysis of methodological features and contextual factors associated with the frequency of data extraction methods was not possible. We therefore present a narrative synthesis of our findings. We did not thoroughly assess risk of bias, including reporting bias, for these reports because the study designs did not match domains evaluated in commonly used instruments such as the Cochrane Risk of Bias tool [ 1 ] or QUADAS-2 instrument used for systematic reviews of randomized trials and diagnostic test accuracy studies, respectively [ 14 ].

Of 1190 unique citations retrieved, we selected 75 reports for full-text screening, and we included 26 articles that met our inclusion criteria (Fig.  1 ). Agreement on abstract and full-text screening was 0.97 and 1.00.

An external file that holds a picture, illustration, etc.
Object name is 13643_2015_66_Fig1_HTML.jpg

Process of screening the articles to be included for this systematic review

Study characteristics

Table  1 provides a list of items to be considered in the data extraction process based on the Cochrane Handbook (Appendix 2 ) [ 1 ], CONSORT statement [ 9 ], STARD initiative [ 10 ], and PICO [ 11 ], PECODR [ 12 ], and PIBOSO [ 13 ] frameworks. We provide the major group for each field and report which standard focused on that field. Finally, we report whether there was a published method to extract that field. Table  1 also identifies the data elements relevant to systematic review process categorized by their domain and the standard from which the element was adopted and was associated with existing automation methods, where present.

Results of individual studies

Table  2 summarizes the existing information extraction studies. For each study, the table provides the citation to the study (study: column 1), data elements that the study focused on (extracted elements: column 2), dataset used by the study (dataset: column 3), algorithm and methods used for extraction (method: column 4), whether the study extracted only the sentence containing the data elements, full concept or neither of these (sentence/concept/neither: column 5), whether the extraction was done from full-text or abstracts (full text/abstract: column 6) and the main accuracy results reported by the system (results: column 7). The studies are arranged by increasing complexity by ordering studies that classified sentences before those that extracted the concepts and ordering studies that extracted data from abstracts before those that extracted data from full-text reports.

A summary of included extraction methods and their evaluation

The accuracy of most ( N  = 18, 69 %) studies was measured using a standard text mining metric known as F-score, which is the harmonic mean of precision (positive predictive value) and recall (sensitivity). Some studies ( N  = 5, 19 %) reported only the precision of their method, while some reported the accuracy values ( N  = 2, 8 %). One study (4 %) reported P5 precision, which indicates the fraction of positive predictions among the top 5 results returned by the system.

Studies that did not implement a data extraction system

Dawes et al. [ 12 ] identified 20 evidence-based medicine journal synopses with 759 extracts in the corresponding PubMed abstracts. Annotators agreed with the identification of an element 85 and 87 % for the evidence-based medicine synopses and PubMed abstracts, respectively. After consensus among the annotators, agreement rose to 97 and 98 %, respectively. The authors proposed various lexical patterns and developed rules to discover each PECODR element from the PubMed abstracts and the corresponding evidence-based medicine journal synopses that might make it possible to partially or fully automate the data extraction process.

Studies that identified sentences but did not extract data elements from abstracts only

Kim et al. [ 13 ] used conditional random fields (CRF) [ 15 ] for the task of classifying sentences in one of the PICO categories. The features were based on lexical, syntactic, structural, and sequential information in the data. The authors found that unigrams, section headings, and sequential information from preceding sentences were useful features for the classification task. They used 1000 medical abstracts from PIBOSO corpus and achieved micro-averaged F-scores of 91 and 67 % over datasets of structured and unstructured abstracts, respectively.

Boudin et al. [ 16 ] utilized a combination of multiple supervised classification techniques for detecting PICO elements in the medical abstracts. They utilized features such as MeSH semantic types, word overlap with title, number of punctuation marks on random forests (RF), naive Bayes (NB), support vector machines (SVM), and multi-layer perceptron (MLP) classifiers. Using 26,000 abstracts from PubMed, the authors took the first sentence in the structured abstracts and assigned a label automatically to build a large training data. They obtained an F-score of 86 % for identifying participants (P), 67 % for interventions (I) and controls (C), and 56 % for outcomes (O).

Huang et al. [ 17 ] used a naive Bayes classifier for the PICO classification task. The training data were generated automatically from the structured abstracts. For instance, all sentences in the section of the structured abstract that started with the term “PATIENT” were used to identify participants (P). In this way, the authors could generate a dataset of 23,472 sentences. Using 23,472 sentences from the structured abstracts, they obtained an F-score of 91 % for identifying participants (P), 75 % for interventions (I), and 88 % for outcomes (O).

Verbeke et al. [ 18 ] used a statistical relational learning-based approach (kLog) that utilized relational features for classifying sentences. The authors also used the PIBOSO corpus for evaluation and achieved micro-averaged F-score of 84 % on structured abstracts and 67 % on unstructured abstracts, which was a better performance than Kim et al. [ 13 ].

Huang et al. [ 19 ] used 19,854 structured extracts and trained two classifiers: one by taking the first sentences of each section (termed CF by the authors) and the other by taking all the sentences in each section (termed CA by the authors). The authors used the naive Bayes classifier and achieved F-scores of 74, 66, and 73 % for identifying participants (P), interventions (I), and outcomes (O), respectively, by the CF classifier. The CA classifier gave F-scores of 73, 73, and 74 % for identifying participants (P), interventions (I), and outcomes (O), respectively.

Hassanzadeh et al. [ 20 ] used the PIBOSO corpus for the identification of sentences with PIBOSO elements. Using conditional random fields (CRF) with discriminative set of features, they achieved micro-averaged F-score of 91 %.

Robinson [ 21 ] used four machine learning models, 1) support vector machines, 2) naive Bayes, 3) naive Bayes multinomial, and 4) logistic regression to identify medical abstracts that contained patient-oriented evidence or not. These data included morbidity, mortality, symptom severity, and health-related quality of life. On a dataset of 1356 PubMed abstracts, the authors achieved the highest accuracy using a support vector machines learning model and achieved an F-measure of 86 %.

Chung [ 22 ] utilized a full sentence parser to identify the descriptions of the assignment of treatment arms in clinical trials. The authors used predicate-argument structure along with other linguistic features with a maximum entropy classifier. They utilized 203 abstracts from randomized trials for training and 124 abstracts for testing and achieved an F-score of 76 %.

Hara and Matsumoto [ 23 ] dealt with the problem of extracting “patient population” and “compared treatments” from medical abstracts. Given a sentence from the abstract, the authors first performed base noun-phrase chunking and then categorized the base noun-phrase into one of the five classes: “disease”, “treatment”, “patient”, “study”, and “others” using support vector machine and conditional random field models. After categorization, the authors used regular expression to extract the target words for patient population and comparison. The authors used 200 abstracts including terms such as “neoplasms” and “clinical trial, phase III” and obtained 91 % accuracy for the task of noun phrase classification. For sentence classification, the authors obtained a precision of 80 % for patient population and 82 % for comparisons.

Studies that identified only sentences but did not extract data elements from full-text reports

Zhao et al. [ 24 ] used two classification tasks to extract study data including patient details, including one at the sentence level and another at the keyword level. The authors first used a five-class scheme including 1) patient, 2) result, 3) intervention, 4) study design, and 5) research goal and tried to classify sentences into one of these five classes. They further used six classes for keywords such as sex (e.g., male, female), age (e.g., 54-year-old), race (e.g., Chinese), condition (e.g., asthma), intervention, and study design (e.g., randomized trial). They utilized conditional random fields for the classification task. Using 19,893 medical abstracts and full-text articles from 17 journal websites, they achieved F-scores of 75 % for identifying patients, 61 % for intervention, 91 % for results, 79 % for study design, and 76 % for research goal.

Hsu et al. [ 25 ] attempted to classify whether a sentence contains the “hypothesis”, “statistical method”, “outcomes”, or “generalizability” of the study and then extracted the values. Using 42 full-text papers, the authors obtained F-scores of 86 % for identifying hypothesis, 84 % for statistical method, 90 % for outcomes, and 59 % for generalizability.

Song et al. [ 26 ] used machine learning-based classifiers such as maximum entropy classifier (MaxEnt), support vector machines (SVM), multi-layer perceptron (MLP), naive Bayes (NB), and radial basis function network (RBFN) to classify the sentences into categories such as analysis (statistical facts found by clinical experiment), general (generally accepted scientific facts, process, and methodology), recommendation (recommendations about interventions), and rule (guidelines). They utilized the principle of information gain (IG) as well as genetic algorithm (GA) for feature selection. They used 346 sentences from the clinical guideline document and obtained an F-score of 98 % for classifying sentences.

Marshall et al. [ 27 ] used soft-margin support vector machines in a joint model for risk of bias assessment along with supporting sentences for random sequence generation, allocation concealment, blinding of participants and personnel, and blinding of outcome assessment, among others. They utilized presence of unigrams in the supporting sentences as features in their model. Working with full text of 2200 clinical trials, the joint model achieved F-scores of 56, 48, 35, and 38 % for identifying sentences corresponding to random sequence generation, allocation concealment, blinding of participants and personnel, and blinding of outcome assessment, respectively.

Studies that identified data elements only from abstracts but not from full texts

Demner-Fushman and Lin [ 28 ] used a rule-based approach to identify sentences containing PICO. Using 275 manually annotated abstracts, the authors achieved an accuracy of 80 % for population extraction and 86 % for problem extraction. They also utilized a supervised classifier for outcome extraction and achieved accuracy from 64 to 95 % across various experiments.

Kelly and Yang [ 29 ] used regular expressions and gazetteer to extract the number of participants, participant age, gender, ethnicity, and study characteristics. The authors utilized 386 abstracts from PubMed obtained with the query “soy and cancer” and achieved F-scores of 96 % for identifying the number of participants, 100 % for age of participants, 100 % for gender of participants, 95 % for ethnicity of participants, 91 % for duration of study, and 87 % for health status of participants.

Hansen et al. [ 30 ] used support vector machines [ 31 ] to extract number of trial participants from abstracts of the randomized control trials. The authors utilized features such as part-of-speech tag of the previous and next words and whether the sentence is grammatically complete (contained a verb). Using 233 abstracts from PubMed, they achieved an F-score of 86 % for identifying participants.

Xu et al. [ 32 ] utilized text classifications augmented with hidden Markov models [ 33 ] to identify sentences about subject demographics. These sentences were then parsed to extract information regarding participant descriptors (e.g., men, healthy, elderly), number of trial participants, disease/symptom name, and disease/symptom descriptors. After testing over 250 RCT abstracts, the authors obtained an accuracy of 83 % for participant descriptors: 83 %, 93 % for number of trial participants, 51 % for diseases/symptoms, and 92 % for descriptors of diseases/symptoms.

Summerscales et al. [ 34 ] used a conditional random field-based approach to identify various named entities such as treatments (drug names or complex phrases) and outcomes. The authors extracted 100 abstracts of randomized trials from the BMJ and achieved F-scores of 49 % for identifying treatment, 82 % for groups, and 54 % for outcomes.

Summerscales et al. [ 35 ] also proposed a method for automatic summarization of results from the clinical trials. The authors first identified the sentences that contained at least one integer (group size, outcome numbers, etc.). They then used the conditional random field classifier to find the entity mentions corresponding to treatment groups or outcomes. The treatment groups, outcomes, etc. were then treated as various “events.” To identify all the relevant information for these events, the authors utilized templates with slots. The slots were then filled using a maximum entropy classifier. They utilized 263 abstracts from the BMJ and achieved F-scores of 76 % for identifying groups, 42 % for outcomes, 80 % for group sizes, and 71 % for outcome numbers.

Studies that identified data elements from full-text reports

Kiritchenko et al. [ 36 ] developed ExaCT, a tool that assists users with locating and extracting key trial characteristics such as eligibility criteria, sample size, drug dosage, and primary outcomes from full-text journal articles. The authors utilized a text classifier in the first stage to recover the relevant sentences. In the next stage, they utilized extraction rules to find the correct solutions. The authors evaluated their system using 50 full-text articles describing randomized trials with 1050 test instances and achieved a P5 precision of 88 % for identifying the classifier. Precision and recall of their extraction rules was found to be 93 and 91 %, respectively.

Restificar et al. [ 37 ] utilized latent Dirichlet allocation [ 38 ] to infer the latent topics in the sample documents and then used logistic regression to compute the probability that a given candidate criterion belongs to a particular topic. Using 44,203 full-text reports of randomized trials, the authors achieved accuracies of 75 and 70 % for inclusion and exclusion criteria, respectively.

Lin et al. [ 39 ] used linear-chain conditional random field for extracting various metadata elements such as number of patients, age group of the patients, geographical area, intervention, and time duration of the study. Using 93 full-text articles, the authors achieved a threefold cross validation precision of 43 % for identifying number of patients, 63 % for age group, 44 % for geographical area, 40 % for intervention, and 83 % for time period.

De Bruijn et al. [ 40 ] used support vector machine classifier to first identify sentences describing information elements such as eligibility criteria, sample size, etc. The authors then used manually crafted weak extraction rules to extract various information elements. Testing this two-stage architecture on 88 randomized trial reports, they obtained a precision of 69 % for identifying eligibility criteria, 62 % for sample size, 94 % for treatment duration, 67 % for intervention, 100 % for primary outcome estimates, and 67 % for secondary outcomes.

Zhu et al. [ 41 ] also used manually crafted rules to extract various subject demographics such as disease, age, gender, and ethnicity. The authors tested their method on 50 articles and for disease extraction obtained an F-score of 64 and 85 % for exactly matched and partially matched cases, respectively.

Risk of bias across studies

In general, many studies have a high risk of selection bias because the gold standards used in the respective studies were not randomly selected. The risk of performance bias is also likely to be high because the investigators were not blinded. For the systems that used rule-based approaches, it was unclear whether the gold standard was used to train the rules or if there were a separate training set. The risk of attrition bias is unclear based on the study design of these non-randomized studies evaluating the performance of NLP methods. Lastly, the risk of reporting bias is unclear because of the lack of protocols in the development, implementation, and evaluation of NLP methods.

Summary of evidence

Extracting the data elements.

  • Participants — Sixteen studies explored the extraction of the number of participants [ 12 , 13 , 16 – 20 , 23 , 24 , 28 – 30 , 32 , 39 ], their age [ 24 , 29 , 39 , 41 ], sex [ 24 , 39 ], ethnicity [ 41 ], country [ 24 , 39 ], comorbidities [ 21 ], spectrum of presenting symptoms, current treatments, and recruiting centers [ 21 , 24 , 28 , 29 , 32 , 41 ], and date of study [ 39 ]. Among them, only six studies [ 28 – 30 , 32 , 39 , 41 ] extracted data elements as opposed to highlighting the sentence containing the data element. Unfortunately, each of these studies used a different corpus of reports, which makes direct comparisons impossible. For example, Kelly and Yang [ 29 ] achieved high F-scores of 100 % for age of participants, 91 % for duration of study, 95 % for ethnicity of participants, 100 % for gender of subjects, 87 % for health status of participants, and 96 % for number of participants on a dataset of 386 abstracts.
  • Intervention — Thirteen studies explored the extraction of interventions [ 12 , 13 , 16 – 20 , 22 , 24 , 28 , 34 , 39 , 40 ], intervention groups [ 34 , 35 ], and intervention details (for replication if feasible) [ 36 ]. Of these, only six studies [ 28 , 34 – 36 , 39 , 40 ] extracted intervention elements. Unfortunately again, each of these studies used a different corpus. For example, Kiritchenko et al. [ 36 ] achieved an F-score of 75–86 % for intervention data elements on a dataset of 50 full-text journal articles.
  • Outcomes and comparisons — Fourteen studies also explored the extraction of outcomes and time points of collection and reporting [ 12 , 13 , 16 – 20 , 24 , 25 , 28 , 34 – 36 , 40 ] and extraction of comparisons [ 12 , 16 , 22 , 23 ]. Of these, only six studies [ 28 , 34 – 36 , 40 ] extracted the actual data elements. For example, De Bruijn et al. [ 40 ] obtained an F-score of 100 % for extracting primary outcome and 67 % for secondary outcome from 88 full-text articles. Summerscales [ 35 ] utilized 263 abstracts from the BMJ and achieved an F-score of 42 % for extracting outcomes.
  • Results — Two studies [ 36 , 40 ] extracted sample size data element from full text on two different data sets. De Bruijn et al. [ 40 ] obtained an accuracy of 67 %, and Kiritchenko et al. [ 36 ] achieved an F-score of 88 %.
  • Interpretation — Three studies explored extraction of overall evidence [ 26 , 42 ] and external validity of trial findings [ 25 ]. However, all these studies only highlighted sentences containing the data elements relevant to interpretation.
  • Objectives — Two studies [ 24 , 25 ] explored the extraction of research questions and hypotheses. However, both these studies only highlighted sentences containing the data elements relevant to interpretation.
  • Methods — Twelve studies explored the extraction of the study design [ 13 , 18 , 20 , 24 ], study duration [ 12 , 29 , 40 ], randomization method [ 25 ], participant flow [ 36 , 37 , 40 ], and risk of bias assessment [ 27 ]. Of these, only four studies [ 29 , 36 , 37 , 40 ] extracted the corresponding data elements from text using different sets of corpora. For example, Restificar et al. [ 37 ] utilized 44,203 full-text clinical trial articles and achieved accuracies of 75 and 70 % for inclusion and exclusion criteria, respectively.
  • Miscellaneous — One study [ 26 ] explored extraction of key conclusion sentence and achieved a high F-score of 98 %.

Related reviews and studies

Previous reviews on the automation of systematic review processes describe technologies for automating the overall process or other steps. Tsafnat et al. [ 43 ] surveyed the informatics systems that automate some of the tasks of systematic review and report systems for each stage of systematic review. Here, we focus on data extraction. None of the existing reviews [ 43 – 47 ] focus on the data extraction step. For example, Tsafnat et al. [ 43 ] presented a review of techniques to automate various aspects of systematic reviews, and while data extraction has been described as a task in their review, they only highlighted three studies as an acknowledgement of the ongoing work. In comparison, we identified 26 studies and critically examined their contribution in relation to all the data elements that need to be extracted to fully support the data extraction step.

Thomas et al. [ 44 ] described the application of text mining technologies such as automatic term recognition, document clustering, classification, and summarization to support the identification of relevant studies in systematic reviews. The authors also pointed out the potential of these technologies to assist at various stages of the systematic review. Slaughter et al. [ 45 ] discussed necessary next steps towards developing “living systematic reviews” rather than a static publication, where the systematic reviews can be continuously updated with the latest knowledge available. The authors mentioned the need for development of new tools for reporting on and searching for structured data from clinical trials.

Tsafnat et al. [ 46 ] described four main tasks in systematic review: identifying the relevant studies, evaluating risk of bias in selected trials, synthesis of the evidence, and publishing the systematic reviews by generating human-readable text from trial reports. They mentioned text extraction algorithms for evaluating risk of bias and evidence synthesis but remain limited to one particular method for extraction of PICO elements.

Most natural language processing research has focused on reducing the workload for the screening step of systematic reviews (Step 3). Wallace et al. [ 48 , 49 ] and Miwa et al. [ 50 ] proposed an active learning framework to reduce the workload in citation screening for inclusion in the systematic reviews. Jonnalagadda et al. [ 51 ] designed a distributional semantics-based relevance feedback model to semi-automatically screen citations. Cohen et al. [ 52 ] proposed a module for grouping studies that are closely related and an automated system to rank publications according to the likelihood for meeting the inclusion criteria of a systematic review. Choong et al. [ 53 ] proposed an automated method for automatic citation snowballing to recursively pursue relevant literature for helping in evidence retrieval for systematic reviews. Cohen et al. [ 54 ] constructed a voting perceptron-based automated citation classification system to classify each article as to whether it contains high-quality, drug-specific evidence. Adeva et al. [ 55 ] also proposed a classification system for screening articles for systematic review. Shemilt et al. [ 56 ] also discussed the use of text mining to reduce screening workload in systematic reviews.

Research implications

No standard gold standards or dataset.

Among the 26 studies included in this systematic review, only three of them use a common corpus, namely 1000 medical abstracts from the PIBOSO corpus. Unfortunately, even that corpus facilitates only classification of sentences into whether they contain one of the data elements corresponding to the PIBOSO categories. No two other studies shared the same gold standard or dataset for evaluation. This limitation made it impossible for us to compare and assess the relative significance of the reported accuracy measures.

Separate systems for each data element

Few data elements, which are also relatively straightforward to extract automatically, such as the total number of participants (14 overall and 5 for extracting the actual data elements), have a relatively higher number of studies aiming towards extracting the same data element. This is not the case with other data elements. There are 27 out of 52 potential data elements that have not been explored for automated extraction, even if for highlighting the sentences containing them; seven more data elements were explored just by one study. There are 38 out of 52 potential data elements (>70 %) that have not been explored for automated extraction of the actual data elements; three more data elements were explored just by one study. The highest number of data elements extracted by a single study is only seven (14 %). This finding means that not only are more studies needed to explore the remaining 70 % data elements, but that there is an urgent need for a unified framework or system to extract all necessary data elements. The current state of informatics research for data extraction is exploratory, and multiple studies need to be conducted using the same gold standard and on the extraction of the same data elements for effective comparison.

Limitations

Our study has limitations. First, there is a possibility that data extraction algorithms were not published in journals or that our search might have missed them. We sought to minimize this limitation by searching in multiple bibliographic databases, including PubMed, IEEExplore, and ACM Digital Library. However, investigators may have also failed to publish algorithms that had lower F-scores than were previously reported, which we would not have captured. Second, we did not publish a protocol a priori, and our initial findings may have influenced our methods. However, we performed key steps, including screening, full-text review, and data extraction in duplicate to minimize potential bias in our systematic review.

Future work

“On demand” access to summarized evidence and best practices has been considered a sound strategy to satisfy clinicians’ information needs and enhance decision-making [ 57 – 65 ]. A systematic review of 26 studies concluded that information-retrieval technology produces positive impact on physicians in terms of decision enhancement, learning, recall, reassurance, and confirmation [ 62 ]. Slaughter et al. [ 45 ] discussed necessary next steps towards developing “living systematic reviews” rather than a static publication, where the systematic reviews can be continuously updated with the latest knowledge available. The authors mention the need for development of new tools for reporting on and searching for structured data from published literature. Automated information extraction framework that extract data elements have the potential to assist the systematic reviewers and to eventually automate the screening and data extraction steps.

Medical science is currently witnessing a rapid pace at which medical knowledge is being created—75 clinical trials a day [ 66 ]. Evidence-based medicine [ 67 ] requires clinicians to keep up with published scientific studies and use them at the point of care. However, it has been shown that it is practically impossible to do that even within a narrow specialty [ 68 ]. A critical barrier is that finding relevant information, which may be located in several documents, takes an amount of time and cognitive effort that is incompatible with the busy clinical workflow [ 69 , 70 ]. Rapid systematic reviews using automation technologies will enable clinicians with up-to-date and systematic summaries of the latest evidence.

Our systematic review describes previously reported methods to identify sentences containing some of the data elements for systematic reviews and only a few studies that have reported methods to extract these data elements. However, most of the data elements that would need to be considered for systematic reviews have been insufficiently explored to date, which identifies a major scope for future work. We hope that these automated extraction approaches might first act as checks for manual data extraction currently performed in duplicate; then serve to validate manual data extraction done by a single reviewer; then become the primary source for data element extraction that would be validated by a human; and eventually completely automate data extraction to enable living systematic reviews.

Abbreviations

Search strategies.

Below, we provide the search strategies used in PubMed, ACM Digital Library, and IEEExplore. The search was conducted on January 6, 2015.

(“identification” [Title] OR “extraction” [Title] OR “extracting” [Title] OR “detection” [Title] OR “identifying” [Title] OR “summarization” [Title] OR “learning approach” [Title] OR “automatically” [Title] OR “summarization” [Title] OR “identify sections” [Title] OR “learning algorithms” [Title] OR “Interpreting” [Title] OR “Inferring” [Title] OR “Finding” [Title] OR “classification” [Title]) AND (“medical evidence”[Title] OR “PICO”[Title] OR “PECODR” [Title] OR “intervention arms” [Title] OR “experimental methods” [Title] OR “study design parameters” [Title] OR “Patient oriented Evidence” [Title] OR “eligibility criteria” [Title] OR “clinical trial characteristics” [Title] OR “evidence based medicine” [Title] OR “clinically important elements” [Title] OR “evidence based practice” [Title] “results from clinical trials” [Title] OR “statistical analyses” [Title] OR “research results” [Title] OR “clinical evidence” [Title] OR “Meta Analysis” [Title] OR “Clinical Research” [Title] OR “medical abstracts” [Title] OR “clinical trial literature” [Title] OR ”clinical trial characteristics” [Title] OR “clinical trial protocols” [Title] OR “clinical practice guidelines” [Title]).

We performed this search only in the metadata.

(“identification” OR “extraction” OR “extracting” OR “detection” OR “Identifying” OR “summarization” OR “learning approach” OR “automatically” OR “summarization” OR “identify sections” OR “learning algorithms” OR “Interpreting” OR “Inferring” OR “Finding” OR “classification”) AND (“medical evidence” OR “PICO” OR “intervention arms” OR “experimental methods” OR “eligibility criteria” OR “clinical trial characteristics” OR “evidence based medicine” OR “clinically important elements” OR “results from clinical trials” OR “statistical analyses” OR “clinical evidence” OR “Meta Analysis” OR “clinical research” OR “medical abstracts” OR “clinical trial literature” OR “clinical trial protocols”).

ACM digital library

((Title: “identification” or Title: “extraction” or Title: “extracting” or Title: “detection” or Title: “Identifying” or Title: “summarization” or Title: “learning approach” or Title: “automatically” or Title: “summarization “or Title: “identify sections” or Title: “learning algorithms” or Title: “scientific artefacts” or Title: “Interpreting” or Title: “Inferring” or Title: “Finding” or Title: “classification” or “statistical techniques”) and (Title: “medical evidence” or Abstract: “medical evidence” or Title: “PICO” or Abstract: “PICO” or Title: “intervention arms” or Title: “experimental methods” or Title: “study design parameters” or Title: “Patient oriented Evidence” or Abstract: “Patient oriented Evidence” or Title: “eligibility criteria” or Abstract: “eligibility criteria” or Title: “clinical trial characteristics” or Abstract: “clinical trial characteristics” or Title: “evidence based medicine” or Abstract: “evidence based medicine” or Title: “clinically important elements” or Title: “evidence based practice” or Title: “treatments” or Title: “groups” or Title: “outcomes” or Title: “results from clinical trials” or Title: “statistical analyses” or Abstract: “statistical analyses” or Title: “research results” or Title: “clinical evidence” or Abstract: “clinical evidence” or Title: “Meta Analysis” or Abstract:“Meta Analysis” or Title:“Clinical Research” or Title: “medical abstracts” or Title: “clinical trial literature” or Title: “Clinical Practice” or Title: “clinical trial protocols” or Abstract: “clinical trial protocols” or Title: “clinical questions” or Title: “clinical trial design”)).

Checklist of items to consider in data collection or data extraction from Cochrane Handbook [ 1 ]

Items without parentheses should normally be collected in all reviews; items in square brackets may be relevant to some reviews and not to others

a Full description required for standard items in the ‘Risk of bias’ tool

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

SRJ and PG had full access to all the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis. Study concept and design were done by SRJ. SRJ, PG, and MDH did the acquisition, analysis, or interpretation of data. SRJ and PG drafted the manuscript. SRJ, PG, and MDH did the critical revision of the manuscript for important intellectual content. SRJ obtained funding. PG and SRJ provided administrative, technical, or material support. SRJ did the study supervision. All authors read and approved the final manuscript.

Funding/Support

This project was partly supported by the National Library of Medicine (grant 5R00LM011389). The Cochrane Heart Group US Satellite at Northwestern University is supported by an intramural grant from the Northwestern University Feinberg School of Medicine.

Role of the sponsors

The funding source had no role in the design and conduct of the study; collection, management, analysis, or interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.

The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Library of Medicine.

Additional contributions

Mark Berendsen (Research Librarian, Galter Health Sciences Library, Northwestern University Feinberg School of Medicine) provided insights on the design of this study, including the search strategies, and Dr. Kalpana Raja (Division of Health and Biomedical Informatics, Department of Preventive Medicine, Northwestern University Feinberg School of Medicine) reviewed the manuscript. None of them received compensation for their contributions.

Contributor Information

Siddhartha R. Jonnalagadda, Email: ude.nretsewhtron@dis .

Pawan Goyal, Email: ni.tenre.pgktii.esc@gnawap .

Mark D. Huffman, Email: ude.nretsewhtron@namffuh-m .

Automating data extraction in systematic reviews: a systematic review

Affiliations.

  • 1 Division of Health and Biomedical Informatics, Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, 750 North Lake Shore Drive, 11th Floor, Chicago, IL, 60611, USA. [email protected].
  • 2 Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur, 721302, West Bengal, India. [email protected].
  • 3 Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, USA. [email protected].
  • PMID: 26073888
  • PMCID: PMC4514954
  • DOI: 10.1186/s13643-015-0066-7

Background: Automation of the parts of systematic review process, specifically the data extraction step, may be an important strategy to reduce the time necessary to complete a systematic review. However, the state of the science of automatically extracting data elements from full texts has not been well described. This paper performs a systematic review of published and unpublished methods to automate data extraction for systematic reviews.

Methods: We systematically searched PubMed, IEEEXplore, and ACM Digital Library to identify potentially relevant articles. We included reports that met the following criteria: 1) methods or results section described what entities were or need to be extracted, and 2) at least one entity was automatically extracted with evaluation results that were presented for that entity. We also reviewed the citations from included reports.

Results: Out of a total of 1190 unique citations that met our search criteria, we found 26 published reports describing automatic extraction of at least one of more than 52 potential data elements used in systematic reviews. For 25 (48 %) of the data elements used in systematic reviews, there were attempts from various researchers to extract information automatically from the publication text. Out of these, 14 (27 %) data elements were completely extracted, but the highest number of data elements extracted automatically by a single study was 7. Most of the data elements were extracted with F-scores (a mean of sensitivity and positive predictive value) of over 70 %.

Conclusions: We found no unified information extraction framework tailored to the systematic review process, and published reports focused on a limited (1-7) number of data elements. Biomedical natural language processing techniques have not been fully utilized to fully or even partially automate the data extraction step of systematic reviews.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Systematic Review
  • Data Mining / methods*
  • Information Storage and Retrieval
  • Publishing*
  • Research Report
  • Review Literature as Topic*

Grants and funding

  • R00 LM011389/LM/NLM NIH HHS/United States
  • 5R00LM011389/LM/NLM NIH HHS/United States

Click through the PLOS taxonomy to find articles in your field.

For more information about PLOS Subject Areas, click here .

Loading metrics

Open Access

Study Protocol

The role of sociodemographic factors on the acceptability of digital mental health care: A scoping review protocol

Roles Conceptualization, Investigation, Project administration, Visualization, Writing – original draft

Affiliations School of Rehabilitation, University of Montréal, Montréal, Québec, Canada, Youth Mental Health and Technology Lab, University of Montréal Hospital Research Centre, Montréal, Québec, Canada

ORCID logo

Roles Conceptualization, Project administration, Resources, Supervision, Writing – review & editing

* E-mail: [email protected]

Affiliations School of Rehabilitation, University of Montréal, Montréal, Québec, Canada, Youth Mental Health and Technology Lab, University of Montréal Hospital Research Centre, Montréal, Québec, Canada, Douglas Research Centre, Montréal, Québec, Canada

  • Nagi Abouzeid, 
  • Shalini Lal

PLOS

  • Published: April 26, 2024
  • https://doi.org/10.1371/journal.pone.0301886
  • Peer Review
  • Reader Comments

Table 1

Introduction

Many individuals experiencing mental health complications face barriers when attempting to access services. To bridge this care gap, digital mental health innovations (DMHI) have proven to be valuable additions to in-person care by enhancing access to care. An important aspect to consider when evaluating the utility of DMHI is perceived acceptability. However, it is unclear whether diverse sociodemographic groups differ in their degree of perceived acceptability of DMHI.

This scoping review aims to synthesize evidence on the role of sociodemographic factors (e.g., age, gender) in the perceived acceptability of DMHI among individuals seeking mental health care.

Guided by the JBI Manual of Evidence Synthesis, chapter on Scoping Review, a search strategy developed according to the PCC framework will be implemented in MEDLINE and then adapted to four electronic databases (i.e., CINAHL, MEDLINE, PsycINFO, and EMBASE). The study selection strategy will be piloted by two reviewers on subsets of 30 articles until agreement among reviewers reaches 90%, after which one reviewer will complete the remaining screening of titles and abstracts. The full-text screening, data extraction strategy, and charting tool will be completed by one reviewer and then validated by a second member of the team. Main findings will be presented using tables and figures.

Expected contributions

This scoping review will examine the extent to which sociodemographic factors have been considered in the digital mental health literature. Also, the proposed review may help determine whether certain populations have been associated with a lower level of acceptability within the context of digital mental health care. This investigation aims to favor equitable access to DMHI among diverse populations.

Citation: Abouzeid N, Lal S (2024) The role of sociodemographic factors on the acceptability of digital mental health care: A scoping review protocol. PLoS ONE 19(4): e0301886. https://doi.org/10.1371/journal.pone.0301886

Editor: Maher Abdelraheim Titi, King Saud University Medical City, SAUDI ARABIA

Received: September 22, 2023; Accepted: March 22, 2024; Published: April 26, 2024

Copyright: © 2024 Abouzeid, Lal. This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: No datasets were generated or analysed during the current study. All relevant data from this study will be made available upon study completion.

Funding: The authors received no specific funding for this work.

Competing interests: SL is an Associate Professor at the University of Montreal and leads a research program in the field of digital mental health. In the past 5 years SL has received research funding from the Canadian Institutes of Health Research, Canada Research Chairs program, Hoffman-La Roche, Foundation of Stars to advance this work. All of these are unrelated to this specific study. NA is a graduate student at the University of Montreal and is conducting this work towards partial fulfillment of requirements for a Masters of Science degree under the supervision of SL. This does not alter our adherence to PLOS ONE policies on sharing data and materials

Many living with mental health complications face barriers when attempting to access mental health services (e.g., limited availability of services, costs) [ 1 ]. To bridge the existing treatment gap, digital mental health innovations (DMHI) have proven to be valuable additions to standard treatment due to their ability to enhance access to care for those living with precarious mental health conditions (e.g., anxiety, depression) [ 2 – 7 ]. DMHI are digital health technologies used to assess, prevent, or treat mental health difficulties [ 8 ]. An important aspect to consider when evaluating the utility of DMHI is perceived acceptability. Acceptability broadly refers to the way individuals perceive, and experience a given digital innovation; it is also a significant predictor of an intervention’s effectiveness [ 9 ].

Over the past decade, reviews of the literature, including systematic reviews, have been conducted to synthesize the available evidence on acceptability in the context of digital health care [ 10 – 16 ]. However, these reviews focused on specific populations (e.g., adults with depression and college students) [ 10 – 12 , 14 , 16 ], included interventions beyond those relating to mental health (e.g., sexual health) [ 14 ], and did not report on associations between sociodemographic characteristics and perceived acceptability [ 10 – 15 ].

As demonstrated, there have been limited efforts to systematically investigate the role of sociodemographic characteristics on the perceived acceptability of DMHI among diverse populations. Specifically, it is unclear whether there are any discernible trends of lower or higher levels of perceived acceptability among various sociodemographic groups within the context of digital mental health care. Such a synthesis is important, considering the impact of perceived acceptability in sustaining user engagement [ 9 ]. Research has shown that a greater level of engagement with DMHI is significantly associated with improved mental health outcomes [ 17 ]. Therefore, to enhance user engagement and maximize therapeutic outcomes, it is important to gain a deeper understanding of acceptability and its associated factors within the context of digital mental health care.

The proposed scoping review (registered in OSF: osf.io/dvr53) intends to synthesize research that reports on associations between sociodemographic characteristics and the perceived acceptability of DMHI. In the context of this review, the term “association” extends beyond its statistical sense and refers to the presence of qualitative patterns of association between specific sociodemographic groups, and their perceived acceptability of DMHI. This review will help to determine whether a more systematic investigation on the topic, such as a meta-analysis, is needed. Moreover, we may find that certain sociodemographic characteristics have been inadequately accounted for in the development and delivery of DMHI. This undertaking is aligned with a broader objective aiming to ensure underserved populations have access to DMHI that they perceive as acceptable.

Numerous factors informed the decision to opt for a scoping review as opposed to another type of review. First, a scoping review can be used to investigate broad topics of inquiry without requiring a quality assessment of the studies that are discussed [ 18 ]. In contrast, a systematic review will generally seek to answer a more specific research question and will require an assessment of the methodological rigor of the studies included [ 18 ]. Considering these differences, a scoping review is a more suitable option due to the broad scope of the proposed investigation. Moreover, it is difficult to evaluate the feasibility of conducting a full systematic review without first conducting a preliminary mapping of the literature that would help justify such an undertaking.

Objective and review questions

The primary objective of the proposed review is to synthesize evidence on the role of sociodemographic characteristics on the degree of perceived acceptability of DMHI. The research questions listed below have been informed by the primary objective of the proposed scoping review as well as the population, concept, and context (PCC) framework proposed by the Joanna Briggs Institute (JBI) manual of Evidence Synthesis [ 19 ]:

  • What are the different sociodemographic factors that have been considered in the evaluation of perceived acceptability of DMHI?
  • Which sociodemographic factors have been positively, negatively, or not associated with the perceived acceptability of DMHI?
  • Which DMHI have been positively, negatively, or not associated with perceived acceptability among specific sociodemographic groups? (Please see the Study Selection and Extraction section below for more details on how the various DMHI will be categorized)

Eligibility criteria

The PCC framework proposed by the Joanna Briggs Institute (JBI) manual of Evidence Synthesis [ 19 ] will inform the inclusion and exclusion criteria ( Table 1 ) of the proposed scoping review.

thumbnail

  • PPT PowerPoint slide
  • PNG larger image
  • TIFF original image

https://doi.org/10.1371/journal.pone.0301886.t001

Population.

The proposed scoping review will include studies conducted with human participants. To be considered for inclusion, participants must receive mental health care through the use of DMHI. The proposed scoping review will consider six different sociodemographic dimensions: age (e.g., youth, adult, and elderly), race (e.g., Arabic, Asian, and White), gender identity (e.g., male, female, and genderfluid), sexual orientation (e.g., heterosexual, homosexual, and bisexual), highest level of education completed (e.g., primary school, high school, and bachelor’s degree), and health status (e.g., depression, schizophrenia).

Moreover, for studies to be considered eligible, they must either report on the relative acceptability among participants (e.g., women demonstrating a higher level of acceptability compared to men) or the absolute acceptability (e.g., women rating the innovation with a 4 out of 5 in terms of perceived acceptability). Regarding absolute acceptability, we will consider studies as eligible if the perceived acceptability of subgroups within the larger sample is reported or if the study sample is homogeneous for at least one of the sociodemographic factors of interest.

In this scoping review, the concept of interest is acceptability, a multidimensional notion with various definitions [ 20 ]. Acceptability is broadly defined as the way individuals perceive and experience a given digital health innovation [ 9 ]. More specifically, two important factors contributing to the perceived acceptability of a given technology are perceived usefulness and ease of use [ 21 ]. To be considered for inclusion in the proposed review, the studies must focus on examining the perceived acceptability (as defined by the author(s) of the studies being reviewed), utility, or ease of use of DMHI. Moreover, to be considered for inclusion in the proposed review, acceptability must be evaluated retrospectively. Therefore, studies that evaluate the prospective acceptability of DMHI will not be included in the proposed review. For instance, studies that assess the acceptability of a digital technology among potential users, who have not yet engaged with the innovation, will be excluded from the proposed review. In addition, to be considered for inclusion, acceptability must be evaluated through quantitative methods (e.g., questionnaires and scales), these may or may not have been validated. Studies that examine the perceived acceptability of users entirely through qualitative methods (e.g., focus group discussions, open-ended questions) will not be considered for inclusion in the proposed review.

Moreover, the concept of acceptability is person-centered. Therefore, any studies that define or assess acceptability without considering the perspective of the user will be excluded from the proposed review. For instance, any study that examines the theoretical utility of a technology from the perspective of clinicians as opposed to the perceived utility of a technology from the perspective of users will be excluded from the proposed review. The purpose of these eligibility criteria is to establish clear boundaries for the broad concept of acceptability and to ensure a consistent conceptualization across the included studies. Furthermore, the definition of acceptability, as proposed in each of the included studies, will be reported to ensure that readers have a clear understanding of the concept presented.

In this scoping review, the focus will be on DMHI delivered in both clinical settings (e.g., hospitals) and non-clinical settings (e.g., community). DMHI refer to the use of technology in the assessment, prevention, and treatment of mental health difficulties [ 8 ]. To be considered for inclusion, the mental health intervention must be delivered digitally, such as through a computer, smartphone, or wearable device. There will be no studies excluded based on the specific digital care medium used to deliver the mental health intervention. Some examples of DMHI include, but are not limited to, internet-based cognitive behavioral therapy, chatbots for mental health support, smartphone applications that support ecological momentary assessment, web-based psychotherapeutic platforms, psychosocial assessments through video conferencing solutions, and exposure therapy assisted by virtual reality [ 22 – 24 ].

Interventions delivered in a blended format (i.e., involving in-person and digital components) will only be considered for inclusion if the acceptability of the digital component is assessed independently. For example, studies will not be considered for inclusion if they report a global acceptability score for an intervention comprised of three in-person therapy sessions and two online therapy sessions. However, studies that report on the acceptability of the digital component (i.e., online sessions) independently will be considered for inclusion.

To delimit the broader concept of mental health, interventions that primarily aim to enhance the psychological (e.g., cognitive biases), social (e.g., isolation), and emotional (e.g., sadness) well-being of an individual will be considered for inclusion. Moreover, interventions that target sleep difficulties within mental health populations will be considered for inclusion.

The proposed review will not include DMHI that address both mental health and physical health (e.g., exercise, nutrition), sexual health (e.g., HIV prevention), cognition (e.g., attention, memory), or pain management where the primary objective is not focused on optimizing mental health. For instance, a digital intervention that aims to reduce rumination while promoting physical activity to reduce the risk of cardiovascular diseases would not be included in the proposed review. If this review were to include interventions with an emphasis on health targets beyond those related to mental health, it would be difficult to determine whether the role of sociodemographic factors is associated with mental health care, physical health care, or both. Moreover, digital interventions that aim to improve medication adherence (e.g., SSRI, SNRI) to enhance mental health outcomes will be considered for inclusion. However, digital interventions that aim to improve medication adherence to enhance physical health outcomes (e.g., HIV, diabetes) would not be considered for inclusion.

Any study that does not satisfy the aforementioned eligibility criteria will be excluded from the proposed scoping review.

Types of sources of evidence.

Moreover, there are other notable eligibility criteria unrelated to the PCC framework. First, only materials available in French and English will be included in the proposed scoping review. Second, to ensure the recency and relevance of the synthesized materials, only materials disseminated between January 2013 and June 2023 will be included in the proposed review; this will help ensure that the findings reported are relevant to the present societal context. Any secondary sources (e.g., meta-analysis and systematic reviews), study protocols, commentary or opinion letters, or non peer-reviewed materials will be excluded from the proposed review. Studies with a purely qualitative design will also be excluded from the proposed review. Any study that does not satisfy the aforementioned eligibility criteria will be excluded from the proposed review.

Methodology

The proposed scoping review will be guided by the recommended methodology put forth in the JBI Manual of Evidence Synthesis [ 19 ] and the PRISMA-P checklist ( S2 File ) ( http://www.prisma-statement.org/Extensions/Protocols.aspx ).

Sources of evidence

The information sources that will be used to identify materials pertinent to the proposed review have been selected to ensure the review is as extensive and comprehensive as reasonably possible on the topic of inquiry. Four electronic databases will be examined using distinct search strategies (i.e., PsycInfo through Ovid, MEDLINE through Ovid, CINAHL through EBSCO, and Embase through Ovid). Moreover, the reference list of secondary sources identified as relevant to the topic of inquiry will be examined.

Search strategy

Initially, A1 developed the concept plan for the search strategy in consultation with A2. A search strategy was then developed by A1 and later refined in collaboration with A2 and an experienced librarian at the University of Montreal. As a first step in the development of the search strategy, a preliminary search of MEDLINE and CINAHL was conducted. These databases were selected due to their extensive assortment of literature on health research. This was then followed by the selection of keywords identified in the abstracts and titles of retrieved papers along with indexed terms used to categorize the publications. The keywords and index terms were considered to inform the development of a complete search strategy for MEDLINE ( S1 File ).

The search strategy incorporates three topics of interest, which are acceptability, digital technology, and mental health. For acceptability, one indexed keyword heading (i.e., Patient acceptance of healthcare) (line 1) was searched along with three sets of search terms (line 2, 3 and 4) relating to acceptability (e.g., client acceptance, perceived utility). The indexed keyword and the first three sets of listed terms were then linked with the Boolean operator “OR” (line 5).

For the topic of digital technology, 10 indexed keyword headings (e.g., Internet-based interventions, mobile applications) were searched (lines 6,7,8, 9,10,11, and 12) along with one set of search terms incorporating many concepts related to digital technology (e.g., digital, bots) (line 13). This set of search terms was complemented by a list of keywords proposed by Lal et al. [ 23 ] in a systematic review pertaining to the priority afforded to technology in government-based mental health strategy documents. The indexed keyword headings relating to digital technology along with the related set of search terms were then linked with the Boolean operator “OR” (line 14).

For the topic of mental health, four indexed keyword headings were searched (e.g., mental health, mental disorders) (lines 15,16,17, and 18) along with five sets of search terms relating to mental health (e.g., low mood, mania, bereavement) (Line 19,20,21,22, and 23). These initial sets of search terms were bonified by the mental health disorders listed in the ICD-11 [ 25 ] and DSM-V [ 26 ] along with their associated symptoms. The indexed keyword headings relating to mental health along with the sets of related search terms were then linked with the Boolean operator “OR” (line 24). Lines 5, 14, and 24 were then combined using the Boolean operator “AND” to identify relevant citations that will be considered for inclusion in the proposed scoping review. The search was then further limited by year of publication (i.e., 2013 to current) and language (i.e., French and English). With the guidance of an experienced librarian, A1 will adapt and implement the MEDLINE search strategy to three other electronic databases (i.e., PsycINFO, CINAHL, and Embase).

Study selection and extraction

The inclusion and exclusion criteria outlined above have been established to guide the selection of studies that are concordant with the primary objective of the proposed scoping review. The citations obtained from the literature search will be imported into Covidence, a software used to assist in evidence synthesis. Covidence has the ability to extract citations, remove duplicates, and allows for a multiphase review of citations by up to two reviewers [ 27 ].

First, as proposed in the JBI Manual of Evidence Synthesis [ 19 ], the study selection strategy for titles and abstracts will be piloted on a random subset of citations to ensure that reviewers have an adequate understanding of the eligibility criteria and selection process. The screening process will be guided by the eligibility criteria of the proposed review. To begin, a subset of 30 citations will be selected from the pool of obtained studies. These articles will then be screened independently by two reviewers. Once completed, the reviewers will discuss any discrepancies in the perceived eligibility of studies to determine if the eligibility criteria require any further clarifications. This process will be repeated until the eligibility agreement between reviewers reaches 90%. Once achieved, the complete screening of citations by A1 will begin. The citations that do not contain sufficient information in the titles and abstracts to assess eligibility or those that are deemed eligible will then undergo full-text screening.

Once the screening of titles and abstracts is completed, a three-step process will be used to determine which studies will be included in the proposed scoping review. First, the full-text screening strategy will be piloted on 15 citations selected from the pool of obtained studies. These articles will then be screened independently by two reviewers. Once completed, the reviewers will discuss any discrepancies in the perceived eligibility of studies to determine if the eligibility criteria require any further clarifications. This process will be repeated until the eligibility agreement between reviewers reaches 90%. Once achieved, the complete screening of full-text citations by A1 will begin. A second reviewer will then validate all the studies deemed eligible by A1. If discrepancies arise in the process of eligibility assessment, A1 will meet with the second reviewer to reach a final consensus. If a consensus cannot be reached, a third reviewer will be consulted.

Once eligible articles have been selected, the process of data extraction and charting will begin. To facilitate the process of extracting data and ensure its validity, a legend will be developed. This legend will clearly identify and describe the information that should be included in each category of the charting tool. The categories and their corresponding data will be organized in an Excel sheet. As a first step, we will consider data categories listed in the JBI Manual of Evidence Synthesis [ 19 ]. These include basic descriptive information found within citations (i.e., title, Digital Object Identifier (DOI), author(s), and year of publication), as well as the country of origin, sample size, study design, study objective(s), and key outcomes. Key outcomes will be identified based on their degree of responsiveness to the research questions put forth by the proposed scoping review.

Furthermore, these categories will be further complemented by data relating to the PCC framework and research questions of the proposed review. Considering the population component of the PCC framework, we will extract data relating to the study sample, including their health status (e.g., diagnosed with a major depressive disorder) and reported sociodemographic characteristics (e.g., age, gender, race, sexual orientation). In relation to the concept component of the PCC framework, the conceptualization of acceptability (e.g., perceived ease of use) put forth in the articles along with the method(s) used to evaluate acceptability (e.g., self-report questionnaire) will be extracted. Lastly, considering the context component of the PCC framework, we will extract information on the type of DMHI evaluated (e.g., ICBT), the purpose of the innovation (e.g., assessment), its mental health target (e.g., anxiety), and the technology used (e.g., smartphone).

The data charting tool that will be used is subject to an Iterative process. As data are extracted, additional categories may emerge that warrant inclusion in the charting tool. As recommended in the JBI Manual of Evidence Synthesis [ 19 ], the data charting tool will be piloted by extracting data from 15 studies. This process will help ensure that the data charting tool is comprehensive and easy to use. Two independent reviewers will be involved in the pilot data extraction process. Any discrepancies that may arise during the data extraction or categorization process will be discussed by two reviewers to reach a final consensus. If consensus cannot be reached, a third reviewer will be consulted to examine and resolve any identified discrepancies. Once completed and sufficient concordance among extracted data has been achieved, A1 will begin the data extraction process for the complete list of eligible citations. The extracted data will then be validated by a second reviewer.

Data synthesis and presentation

First, a table will be used to represent the various DMHI that have had their acceptability evaluated through a consideration of sociodemographic characteristics. This table will include data on the various types of DMHI (e.g., ICBT), their mental health targets (e.g., anxiety), their purpose (e.g., assessment), and the technology used (e.g., smartphone). Secondly, a table will be presented to illustrate the different sociodemographic characteristics that have been considered in the evaluation of acceptability of DMHI. The extracted data will be summarized and presented using descriptive statistics (e.g., frequency and percentages).

Finally, key findings on the role of sociodemographic characteristics on the perceived acceptability of DMHI will be presented in tables, including information on whether the association is positive (i.e., high perceived acceptability), neutral (i.e., no association) or negative (i.e., low perceived acceptability). These findings will be categorized based on the technology examined (e.g., smartphone app), sociodemographic factor(s) of interest (e.g., age), and whether they are relative (e.g., men demonstrated a higher level of perceived acceptability than women for this technology) or absolute (e.g., men demonstrate a high level of perceived acceptability for this technology).

For absolute acceptability, the cut-off scores reported in the literature for the various measures used to measure acceptability across studies will be used to indicate whether the reported scores represent high or low acceptability. If such scores are not available, the acceptability scores that have been evaluated using the same measures across studies will be contrasted. For studies that utilized ad-hoc measures to examine the perceived acceptability of users, the author’s interpretation of these scores will be reported in the proposed review. The approach used to interpret the various scores will be finalized depending on how acceptability is being measured across the included studies.

Furthermore, as the data extraction strategy is piloted, additional findings may be reported based on their relevance to the inquiries put forth in the proposed scoping review. Moreover, throughout the screening and extraction processes, notable patterns may emerge across studies, which may not have been specified in this protocol. These may include methodological limitations (e.g., many studies did not report on the socio-demographics of study dropouts), the overrepresentation of specific digital mental health innovations (e.g., virtual reality), or their mental health targets (e.g., depressive symptoms).

Moreover, depending on the volume of findings uncovered (i.e., over 100 studies included), it may be necessary to use additional forms of visual representation, such as graphs and Venn diagrams. These may be used to provide readers with a complementary depiction of findings to facilitate greater comprehension.

Limitations

To the authors’ knowledge, the proposed scoping review will be the first of its kind to synthesize and report on the associations between various sociodemographic characteristics and the perceived acceptability of DMHI. However, such an undertaking is not without limitations.

First, this review will not evaluate the methodological rigor of the included studies. Such an undertaking would help determine the quality of the evidence on the role of sociodemographic characteristics on the acceptability of DMHI. These findings may then potentially be used to demonstrate the necessity for culturally adapted DMHI and guide their future development. To alleviate the impact of this limitation, the study design (e.g., randomized controlled trial, case study) of the included citations will be reported.

Second, considering the diversity of terminology employed in the field of digital mental health care [ 28 ], it is possible that the proposed search strategy does not capture all studies that warrant inclusion in the proposed scoping review. To address this limitation, A1 and A2 collaborated with an experienced librarian to develop a search strategy that will be adapted and implemented in four electronic databases. The search strategy was also supplemented with search terms put forth in a systematic review by Lal et al. [ 23 ]. However, given the rapid evolution of technological advancements, eligible studies may still go undetected and hence not be included in the proposed scoping review. Moreover, the decision to omit related terms such as “satisfaction” and “preference” from the search strategy, while helping with feasibility and limiting the scope of this review to the specific concept of acceptability, also stands as a potential limitation of the proposed review.

Third, after conducting pilot testing of the screening strategy, only one reviewer will be responsible for assessing the eligibility of abstracts. However, mitigation strategies have been implemented to minimize the impact of this limitation. For instance, at the stage of full-text screening, a second reviewer will validate all the studies deemed eligible by A1.

Finally, based on the volume of results obtained through a preliminary screening of the literature, non peer-reviewed materials (e.g., Theses) will not be considered for inclusion in the proposed review.

Despite its limitations, the proposed scoping review will aim to map the available scientific evidence pointing to associations between sociodemographic characteristics and perceived acceptability of DMHI. These findings will allow us to determine the extent to which sociodemographic characteristics are being considered in the digital mental health literature along with their reported associations with perceived acceptability. Specifically, the proposed review will enable us to determine whether certain sociodemographic groups have frequently been linked with lower perceived acceptability of DMHI. This investigation represents a first step to ensure that diverse populations can access and engage with DMHI that they perceive as acceptable.

Moreover, the proposed scoping review will provide insights for future research on ways to assess and report on the role of sociodemographic characteristics when evaluating the perceived acceptability of DMHI. This synthesis of knowledge can also serve as a justification and provide guidance for conducting a systematic review and meta-analysis on the topic of inquiry, within which the methodological rigor of included studies will be examined. The findings from this type of review may serve as evidence to guide the development and delivery of culturally tailored DMHI.

Supporting information

S1 file. detailed search strategy for medline..

https://doi.org/10.1371/journal.pone.0301886.s001

S2 File. PRISMA-P checklist.

https://doi.org/10.1371/journal.pone.0301886.s002

Acknowledgments

The authors would like to acknowledge and thank Sarah Cherrier, a librarian at the University of Montreal, for her assistance and support in the development and implementation of the search strategy presented in this protocol.

  • View Article
  • PubMed/NCBI
  • Google Scholar
  • 25. World Health Organization (WHO). International Statistical Classification of Diseases and Related Health Problems, 11 th Edition. WHO; 2019. Available from: https://icd.who.int .
  • 26. American Psychiatric Association (APA). Diagnostic and statistical manual of mental Disorders, 5th Edition. Arlington, VA: APA Publishing; 2013.

SYSTEMATIC REVIEW article

Evidence quality assessment of acupuncture intervention for stroke hemiplegia: an overview of systematic reviews and meta-analyses.

Maoxia Fan&#x;

  • 1 First Clinical Medical College, Shandong University of Traditional Chinese Medicine, Jinan, China
  • 2 Dongying People’s Hospital (Dongying Hospital of Shandong Provincial Hospital Group), Dongying, China
  • 3 Department of Geriatric Medicine, Affiliated Hospital of Shandong University of Traditional Chinese Medicine, Jinan, China

Objective: Summarize the conclusions of the systematic review/meta-analysis of the clinical efficacy of acupuncture for stroke hemiplegia, and evaluate its methodological quality and the quality of evidence.

Methods: Two researchers searched and extracted 8 databases for systematic reviews (SRs)/meta-analyses (MAs), and independently assessed the methodological quality, risk of bias, reporting quality, and quality of evidence of SRs/MAs included in randomized controlled trials (RCTs). Tools used included the Assessment of Multiple Systematic Reviews 2 (AMSTAR-2), the Risk of Bias in Systematic (ROBIS) scale, the list of Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA), and the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) system. The search time is from database building to July 2023.

Results: A total of 11 SRs/MAs were included, including 2 English literature and 9 Chinese literature, with all study sites in China. AMSTAR-2 evaluation results showed that the methodological quality of 11 articles was rated as very low quality; Based on the ROBIS evaluation results, the SRs/MAs was assessed as a high risk of bias; According to the results of the PRISMA checklist evaluation, most of the SRs/MAs reports are relatively complete; according to GRADE system, 42 outcomes were extracted from the included SRs/MAs for evaluation, of which 1 was rated as high-quality evidence, 14 as moderate-quality evidence, 14 as low-quality evidence, and 13 as very low-quality evidence.

Conclusion: The available evidence indicates that acupuncture has certain clinical efficacy in the treatment of stroke hemiplegia. However, there are still some limitations to this study, such as the lower quality of SRs/MAs methodologies and evidence included, and more high-quality studies are needed to verify them.

1 Introduction

Stroke is a common cerebrovascular disease, also often known as “stroke” ( 1 ), belongs to a common clinical disease. It is characterized by high morbidity, high mortality rate, high disability rate and many complications ( 2 , 3 ), and the prevalence rate is increasing year by year ( 4 ). With the improvement of medical technology level, the mortality of stroke patients decreased significantly, but still about 80% of patients in the disease recovery will leave a variety of different degrees of neurological and limb dysfunction ( 5 ), including hemiplegia is one of the common sequelae of stroke ( 6 ), not only affect the quality of life, also bring heavy burden to the family and society. Therefore, taking timely and effective rehabilitation treatment to promote the recovery of limb movement function of stroke patients and improve the self-care ability has become an important problem to be solved in clinical practice.

Acupuncture therapy for stroke and its sequelae has received much attention due to its long history and unique treatment experience ( 7 ). Related studies have found that acupuncture can improve the clinical performance of patients with by promoting cerebral blood circulation, affecting astrocytes, blood rheology, and neuronal plasticity ( 8 , 9 ). In recent years, a large number of clinical randomized controlled trials showed that acupuncture is effective in the treatment of stroke hemiplegia ( 10 , 11 ). At present, several systematic reviews (SRs)/meta-analyses (MAs) have comprehensively analyzed the relevant clinical data, which provides a theoretical basis for the research of acupuncture for stroke hemiplegia. However, the quality of the evidence obtained by the secondary institute is influenced by the quality of the original data and the subjective factors of the researchers, and whether the method evaluation is standardized, objective and fair, and can fully and effectively evaluate the clinical efficacy of acupuncture in stroke hemiplegia. To provide higher evidence support, this study will re-evaluate the existing SRs/MAs of acupuncture for the treatment of stroke hemiplegia at home and abroad. Using tools including the Assessment of Multiple Systematic Reviews 2 (AMSTAR-2) ( 12 , 13 ), the Risk of Bias in Systematic (ROBIS) ( 14 ) scale, the list of Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA) ( 15 , 16 ), and the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) ( 17 – 19 ) system for methodology, risk of bias, reporting quality and evidence quality rating. The existing research results are systematically summarized in order to provide a more accurate evidence-based basis for subsequent clinical decisions ( 20 ).

2 Materials and methods

The methodology of this overview follows the Cochrane Handbook, and the report of this overview is in line with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 checklist. This overview has been registered with the PROSPERO website (Registration number: PROSPERO CRD42024495984).

2.1 Inclusion and exclusion criteria

Contributions to acupuncture treatment of post-stroke paralysis based on existing RCTs. The criteria for inclusion of SRs/MAs in this overview are as follows: (1) Based on the SRs/MAs of randomized controlled trials (RCTs), the published language was in Chinese or English. (2) Patients were diagnosed with paralysis after stroke, regardless of the type of stroke (hemorrhagic stroke or ischemic stroke), the diseased brain area, the affected limb, gender, age, ethnicity, and nationality. (3) The intervention measures in the treatment group were acupuncture or acupuncture combined with other treatments, while the control group received non-acupuncture therapy. (4) The main outcome measures are: ① fugl-meyer as-sessment of low limb (FMA-L); ② Barthel Index (BI); ③ Clinic Neurological Function Deficit Scale (NDS); ④ The Modified Ashworth Spasticity Rating Scale score (MAS); ⑤ Clinical effectiveness; ⑥ Brunnstorm stage score; ⑦ The Clinical Spasticity Index score (CSI); ⑧ The Berg Balance Scale score; ⑨ functional comprehensive assessment score (FCA).

Exclusion criteria: Repeated published literature, animal studies, dissertations, conference articles, systematic review protocols and literature where raw data cannot be extracted.

2.2 Search strategy

A computer search was conducted across China National Knowledge Network (CNKI), Wanfang database, VIP database, Chinese biomedical literature database (CBM) and PubMed, EMbase, Cochrane Library and other English databases, the search time from the establishment of the database to December 31, 2023. In addition, we manually supplemented the references and grey literature of the included studies. The retrieval method adopts the combination of subject words and free words. Key words include acupuncture, electric acupuncture, needling, head needle, body needle, flying needle, fire needle, acupuncture with warmed needle, acusector, prod, Post-stroke hemiplegia, stroke hemiplegia, cerebral infarction hemiplegia, spastic hemiplegia, flaccid hemiplegia, systematic review, meta analysis. Taking CNKI as an example, the specific search strategy is as follows: SU = (“acupuncture” + “electric acupuncture” + “needling” + “head needle” + “acupuncture with warmed needle” + “acusector” + “prod”) and SU = (“Post-stroke hemiplegia” + “stroke hemiplegia” + “cerebral infarction hemiplegia” + “spastic hemiplegia” + “flaccid hemiplegia”) and SU = (“systematic review” + “meta analysis”); Search strategy for the PubMed database see Table 1 .

www.frontiersin.org

Table 1 . Search strategy for the PubMed database.

2.3 Literature screening and data extraction

Two researchers (MX-F and RM-L) independently screened and extracted the literatures, and cross-checked them. Any disagreement during this process were subject to discussion and negotiation or the decision of a third expert (WL-G). The extracted literature information included: Authors, publication year, nationality, sample size, intervention measures, quality assessment tools and main conclusions.

2.4 Quality assessment

Two researchers (MX-F and RM-L) independently assessed the methodological and evidence quality of the included SRs/MAs, Any discrepancies were resolved by consensus or adjudication by a third author (WL-G).

2.4.1 Methodological quality assessment

The methodological quality of the included SRs/MAs was evaluated using the Assessment System for Evaluating Methodological Quality 2 (AMSTAR-2). The AMSTAR-2 scale contains 16 items that can be answered with a “yes,” “partially yes” or “no.” Item 1: Whether the research question and inclusion criteria include the elements of PICO; 2: Whether the study method is determined before its implementation, Whether the report is consistent with the plan; 3: Whether to explain the reasons for the study type; 4: Whether the search literature is comprehensive; 5: Whether the literature is independently screened by two people; 6: Whether the data is extracted by two people; 7: Whether to provide the list of documents and reasons for exclusion; 8: Whether to describe the basic characteristics of the included study in detail; 9: Whether the risk of included study bias is reasonably assessed; 10: Whether to report the source of research funding; 11: whether appropriate statistical methods are used to combine and analyze the results during the meta-analysis; 12: whether to consider the risk of bias in the meta-analysis or other evidence integration; 13: Whether the inclusion risk of study bias was considered when interpreting the results; 14: Whether the heterogeneous results are interpreted or discussed; 15: whether the publication bias was investigated during the quantitative synthesis, Whether to discuss its impact on the results; 16: Whether to report conflicts of interest and sources of funding.

According to the evaluation criteria, it can be rated “high,” “moderate,” “low” and “very low,” and 7 out of 16 items in the tool (2, 4, 7, 9, 11, 13 and 15) are critical areas.

2.4.2 Risk of bias assessment

The Risk of Bias in Systematic Review (ROBIS), which aims at the bias risk of system evaluation, is not only used to evaluate the bias risk in the process of making and interpreting the results of multiple SRs/MAs such as intervention, diagnosis, etiology, and prognosis, etc., but also used to evaluate the correlation between the SRs/MAs problems and the practical problems to be solved by users. ROBIS scale was used in this overview to evaluate the risk of bias in the inclusion of SRs/MAs and the evaluation was carried out in three stages. ROBIS is useful for assessing the extent of bias in four domains: (1) Eligibility criteria for each study; (2) The identification and selection of studies; (3) Data collection and study appraisal; and (4) Overall synthesis and major findings. Within each domain, specific questions were used to determine the risk of bias, which was rated as “low,” “high,” or “unclear.”

2.4.3 Report quality assessment

SR/MA is an important evidence to guide clinical practice. The clarity of its report affects the realization of its clinical value. Standard reports can reduce the bias between actual research results and published results, and increase the transparency of articles. The PRISMA Report Guide is designed to help authors improve the quality of their reports, obtain key information, and improve readability and credibility.

The quality of each SR/MA report for the included SRs/MAs was assessed by the PRISMA 2020 checklist, and each of the 27 items included in PRISMA 2020 was scored as “yes,” “partially yes” or “no.”

2.4.4 Evidence quality assessment

In order to be useful to decision-makers, clinicians and patients, the SRs/MAs must provide not only the effect estimates of each result, but also the information needed to judge the correctness of these effect estimates. The Grading of Recommendations Assessment, Development, and Evaluation (GRADE) ( 19 ) provides a structure for SRs/MAs and clinical practice guidelines to ensure that it addresses all key issues of outcome evidence quality evaluation related to a specific issue in a consistent and systematic manner.

The quality of evidence for each SR/MA outcome was evaluated by the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) system. Since the initial quality of evidence for RCTs is high, the quality of evidence for the outcomes of the study was evaluated based on downgrading factors such as limitations, inconsistencies, indirectness, imprecision, and publication bias of the study. According to the level of downgrade, they are rated as “high,” “moderate,” “low” and “very low.”

3.1 Literature search and screening results

A total of 82 related literatures were retrieved, after 32 duplicate literatures were deleted, 28 unrelated literatures were deleted by reading titles and abstracts, then 5 literatures with inconsistent intervention measures were removed, and finally 6 literatures of systematic review protocols and literatures with missing data were deleted, 11 literatures ( 21 – 31 ) were finally included after layer-by-layer screening. The specific screening process is shown in Figure 1 .

www.frontiersin.org

Figure 1 . Flow diagram of literature screening process.

3.2 The basic characteristics of the included literature

Of the 11 included SRs/MAs, 2 of which were published in English and 9 in Chinese, all studies were conducted in China. The number of RCTs in the included SRs/MAs ranged from 5 to 27, with a sample size of between 750 and 1963. For quality evaluation, 6 ( 21 , 22 , 24 , 28 , 30 , 31 ) SRs/MAs used Cochrane risk of bias tool, and 5 ( 23 , 25 – 27 , 29 ) SRs/MAs used the Jadad scale. The specific information is shown in Table 2 .

www.frontiersin.org

Table 2 . Basic information of the included SRs/MAs.

3.3 Results on SRs/MAs quality assessment

3.3.1 results of the methodological quality.

The included SRs/MAs was assessed for its methodological quality by the AMSTAR-2. The results showed that the quality of the 9 included SRs/MAs were very low, since none of the SRs/MAs included in the quality assessment met the requirement of key Item 7 (none of the SRs/MAs provided an exclusion list). A summary of the results is shown in Table 3 .

www.frontiersin.org

Table 3 . Result of the AMSTAR-2 assessments.

3.3.2 Results of the risk of bias assessment

For ROBIS, Phase 1 assesses the relevance of the research topic, and all SRs/MAs were rated as low risk of bias. In the Phase 2: Domain 1 assessed the study eligibility criteria, and all SRs/MAs were rated as low risk of bias; Domain 2 assessed the identification and selection studies, and 3 SRs/MAs were rated as low risk of bias; Domain 3 assessed the collection and study appraisal, 3 SRs/MAs were rated as high risk of bias; and Domain 4 assessed the synthesis and findings, and 8 SRs/MAs were rated as high risk of bias. Phase 3 considered the overall risk of bias in the reviews, and all SRs/MAs were rated as high risk of bias. The ROBIS scale evaluation results are shown in Table 4 .

www.frontiersin.org

Table 4 . Results of the ROBIS assessments.

3.3.3 Reporting quality of the included SRs/MAs

Reporting quality of the included SRs/MAs was evaluated as per the PRISMA checklist.6 of these items did not report at 100%. Mainly concentrated in the reporting rate of item 5 (protocol and registration) was 18%, the reporting rates of item 8 (search) and item 15 (study bias) were 82%, the item 27 (funding) reporting rate was 73%. The results of the PRISMA checklist assessment are shown in Table 5 .

www.frontiersin.org

Table 5 . Results of the PRISMA checklist.

3.3.4 Results of the quality of the evidence

A meta-analysis of the 42 outcomes in the study was performed, and the GRADE system was used to assess the quality of each outcome.1 outcome measure was high evidence quality, 14 moderate, 14 low and 13 very low evidence quality. Longitudinal analysis was performed for each degradation factor while horizontal evaluation was performed for the included systematic review/meta-analysis outcome indicators. We found that inconsistency ( n  = 29) was the main factor in reducing the quality of evidence, followed by publication bias ( n  = 28) and limitation ( n  = 23). Imprecision ( n  = 10) also has an impact on the evidence quality of the article. The results of evidence quality are shown in Table 6 .

www.frontiersin.org

Table 6 . Results of evidence quality.

3.3.5 Summary of results

The outcome measures extracted from the included studies are listed in Table 6 .

6 SRs/MAs ( 21 – 25 , 28 ) reported the BI. 1 SR/MA ( 21 ) conducted subgroup analyses, yielding the following results: (1) The therapeutic efficacy of acupuncture as a standalone treatment was superior to that of other conventional rehabilitation therapies; (2) The therapeutic efficacy of acupuncture as a standalone treatment was superior to non-acupuncture treatments; (3) The combined application of acupuncture and other conventional rehabilitation treatments demonstrated superior therapeutic efficacy compared to other conventional rehabilitation treatments; (4) Acupuncture as a standalone treatment exhibited superior therapeutic efficacy compared to conventional western medical treatments. Furthermore, 5 SRs/MAs ( 22 – 25 , 28 ) reported that the combined application of acupuncture and non-acupuncture treatments showed superior therapeutic efficacy compared to non-acupuncture treatments.

3.3.5.2 Clinical effectiveness

6 SRs/MAs ( 21 – 23 , 25 – 27 ) reported the Clinical effectiveness. 1 SR/MA ( 21 ) conducted subgroup analyses, revealing the following results: (1) The combined application of acupuncture and other conventional rehabilitation treatments demonstrated superior therapeutic efficacy compared to other conventional rehabilitation treatments; (2) Acupuncture as a standalone treatment exhibited superior therapeutic efficacy compared to conventional western medical treatments. Furthermore, 5 SRs/MAs ( 22 , 23 , 25 – 27 ) reported that the combined application of acupuncture and non-acupuncture treatments showed superior therapeutic efficacy compared to non-acupuncture treatments.

3.3.5.3 NDs

3 SRs/MAs ( 21 , 22 , 31 ) reported the NDs. 1 SR/MA ( 21 ) conducted subgroup analyses, yielding the following results: (1) The combined application of acupuncture and other conventional rehabilitation treatments demonstrated superior therapeutic efficacy compared to other conventional rehabilitation treatments; (2) Acupuncture as a standalone treatment exhibited superior therapeutic efficacy compared to conventional western medical treatments. Additionally, 1 SR/MA ( 22 ) reported that the combined application of acupuncture and non-acupuncture treatments showed superior therapeutic efficacy compared to non-acupuncture treatments. Furthermore, 1 SR/MA ( 31 ) reported that acupuncture as a standalone treatment exhibited superior therapeutic efficacy compared to conventional western medical treatments.

3.3.5.4 MAS

4 SRs/MAs ( 21 – 24 ) reported the MAS. 1 SR/MA ( 21 ) reported that acupuncture as a standalone treatment demonstrated superior efficacy compared to other conventional rehabilitation therapies. 3 SRs/MAs ( 22 – 24 ) reported that the combined approach of acupuncture with non-acupuncture treatments exhibited superior efficacy over non-acupuncture treatments alone.

3.3.5.5 The Berg Balance Scale score

1 SR/MA ( 21 ) reported the Berg Balance Scale score. The results indicated that the efficacy of acupuncture as a standalone treatment surpassed that of other conventional rehabilitation therapies.

3.3.5.6 FMA-L

7 SRs/MAs ( 21 – 24 , 28 – 30 ) reported the FMA-L. 1 SR/MA ( 21 ) conducted subgroup analyses, yielding the following results: (1) The therapeutic efficacy of acupuncture as a standalone treatment was superior to that of other conventional rehabilitation therapies; (2) The combined application of acupuncture and other conventional rehabilitation treatments demonstrated superior therapeutic efficacy compared to using other conventional rehabilitation treatments alone; (3) Acupuncture as a standalone treatment exhibited superior therapeutic efficacy compared to conventional western medical treatments. 5 SRs/MAs ( 22 – 24 , 28 , 29 ) reported that the combined approach of acupuncture with non-acupuncture treatments exhibited superior efficacy compared to non-acupuncture treatments alone. Additionally, 1 SR/MA ( 30 ) reported that acupuncture as a standalone treatment demonstrated superior efficacy compared to non-acupuncture treatments.

3.3.5.7 Brunnstorm Stage score

3 SRs/MAs ( 22 , 29 , 31 ) reported the Brunnstorm Stage score. 2 SRs/MAs ( 22 , 29 ) reported that the combined approach of acupuncture with non-acupuncture treatments exhibited superior efficacy compared to non-acupuncture treatments alone. Additionally, 1 SR/MA ( 31 ) reported that acupuncture as a standalone treatment demonstrated superior efficacy compared to conventional western medical treatments.

3.3.5.8 CSI

1 SR/MA ( 22 ) reported the CSI. The results indicate that the combined approach of acupuncture with non-acupuncture treatments shows superior efficacy compared to non-acupuncture treatments alone.

3.3.5.9 FCA

1 SR/MA ( 29 ) reported the FCA. The results indicate that the combined approach of acupuncture with non-acupuncture treatments shows superior efficacy compared to non-acupuncture treatments alone.

4 Discussion

High-quality SRs/MAs is an important source for evidence-based medicine to obtain the best evidence, and is recognized as the cornerstone for evaluating clinical efficacy and formulating clinical guidelines and specifications ( 20 ). Overview of SRs/MAs is a comprehensive research method to comprehensively collect the treatment and diagnosis of the same disease or health problems, which can provide more concentrated users with high-quality evidence and provide better guidance for future clinical work ( 32 , 33 ). In this study, the included 11 SRs/MAs of acupuncture for stroke hemiplegia were reevaluated, and the included articles were evaluated by multiple quality evaluation methods to provide high-quality evidence-based evidence and decision basis for the clinical efficacy of acupuncture for hemiplegia, and provide further reference and evidence support for future clinical applications. At present, acupuncture has been widely used for the treatment of hemiparesis caused by stroke. When stroke patients are in a hemiparetic state, acupuncture can promote the recovery of muscle strength and muscle tone, improve limb motor function, and effectively prevent various complications ( 34 , 35 ). However, the current research on acupuncture therapy for stroke is extensive, with studies primarily focusing on spasticity and the recovery phase in investigating the mechanisms underlying acupuncture’s improvement of limb motor function.

In this study, the methodology of quality was evaluated by the included articles using AMSTAR-2, and all the 7 key items had some defects: (1) Only 2 of the 11 included research protocols were formulated in advance, which may affect the rigor of the systematic review formulation; (2) 9 articles do not manually search the grey literature or provide a complete search form, and cannot check whether the included literature search is comprehensive and accurate, and whether the data extraction is accurate and not reproducible; (3) None of the articles provides a list of excluded articles, which may reduce the credibility of the SRs/MAs; (4) 3 articles did not report any potential conflicts of interest or funding, which would bias the systematic evaluation. The absence of the above key items is the main factor leading to the lower results of the methodological quality evaluation.

The SRs/MAs bias risk results included in the ROBIS scale evaluation showed that incomplete literature search and data synthesis and incomplete presentation of results were the main factors leading to high bias. This study evaluated the quality of the report using the PRISMA checklist, which showed that lack of protocol registration, incomplete search strategy, and uncertainty about funding sources affected the rigor of the systematic overview as the highest evidence for diagnosis and treatment.

Quality grade evaluation of the included articles using the GRADE rating scale found that all the outcome measures were mostly of low quality (1) Some studies also have great risks in terms of heterogeneity and inaccuracy, mainly because the overlap of confidence intervals is too small or the sample size does not meet the optimal information sample size, or the confidence interval is too wide, which may be unreasonably related to the literature inclusion criteria and retrieval method; (2) More than half of the outcome indicators were downgraded, mainly because of the heterogeneity of the included SRs/MAs, and only a few conducted sensitivity and subgroup analyses; (3) In the evaluation of limitations, 23 outcome indicators were downgraded, because the randomization method and allocation of the included studies were not clear, only a small part of them described whether they was blind, and there was a high possibility of implementation and measurement bias; (4) Publication bias is also widespread.

To the best of our knowledge, this overview is the first article to use the SRs/MAs for clinical evaluation of acupuncture for stroke hemiplegia, which can provide comprehensive evidence reference for clinical practice. The assessment processes using tools such as AMSTAR-2, ROBIS, PRISMA, and grading have highlighted the limitations of SRs/MAs and RCTs, which can guide future high-quality clinical research. However, we must also acknowledge the limitations of this overview. Due to language restrictions, this study only included systematic reviews published in Chinese and English, and did not include Korean and Japanese databases that have similar backgrounds in traditional Chinese medicine research. Additionally, the search process actually overlooked manual searching, leading to some degree of selection bias. Furthermore, the literature screening and quality assessment were conducted by two researchers and were somewhat subjective. The number of included SRs/MAs was small, and the overall quality was not high.

4.1 Implications for future research

To reduce various biases such as selection bias, implementation bias, and measurement bias, further original studies should be conducted using large-sample, multicenter, long-term clinical randomized controlled trials based on evidence-based medicine standards. Attention should be given to properly and rationally implementing randomization, concealing allocation, and blinding. Additionally, to improve the quality of evidence, authors should register their study protocols before conducting SRs/MAs to ensure the rigor of their procedures. During the literature search and screening process, the excluded literature information and complete search strategies for all databases should be provided to ensure replicability. When quantitatively calculating effect sizes, individual study results should be systematically excluded one by one to ensure the stability of the results. Furthermore, a comprehensive assessment of publication bias will also improve the accuracy of the meta-analysis results. In order to develop a more effective treatment prescription and evaluation system, it is advisable for future studies to report detailed information regarding acupuncture treatment. This includes the number of needles used in each treatment session, specific acupuncture techniques employed, needle depth, needle reactions, treatment process, qualifications of the acupuncturist, years of experience of the assessors and clinicians involved, and the provision of a standardized and explicit treatment plan. Reporting these details will contribute to a better understanding of the acupuncture intervention and facilitate the replication and comparison of studies, ultimately leading to improved treatment outcomes and enhanced evaluation of acupuncture efficacy.

5 Conclusion

In conclusion, acupuncture treatment of stroke hemiplegia has certain efficacy, which can effectively improve the clinical manifestations of patients and reduce the disability rate. However, there are widespread problems of low methodological quality and low quality of evidence for SRs/MAs, which limits the reliability of the results and requires more high-quality original studies to provide evidence to support them.

Data availability statement

The original contributions presented in the study are included in the article/supplementary materials, further inquiries can be directed to the corresponding author.

Author contributions

MF: Writing – original draft, Writing – review & editing. BZ: Writing – original draft, Writing – review & editing, Methodology. CC: Data curation, Methodology, Writing – original draft. RL: Methodology, Validation, Writing – original draft. WG: Writing – original draft, Writing – review & editing.

The author(s) declare financial support was received for the research, authorship, and/or publication of this article. This work was supported by National Natural Science Foundation of China (No. 82204942); Natural Science Foundation of Shandong Province (No. ZR2022QH123); China Postdoctoral Science Foundation (No. 2022M721998).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Abbreviations

SRs, Systematic reviews; MAs, Meta-analyses; RCTs, Randomized controlled trials; AMSTAR-2, Assessment System for Evaluating Methodological Quality 2; ROBIS, Risk of Bias in Systematic reviews; PRISMA, Preferred Reporting Items for Systematic Reviews and Meta-Analyses; GRADE, Grading of Recommendations Assessment Development and Evaluation; CNKI, Chinese National Knowledge Infrastructure; CBM, Chinese Biological Medicine Database; FMA-L, Fugl-meyer as-sessment of low limb; BI, Barthel Index; NDS, Clinic Neurological Function Deficit Scale; MAS, The Modified Ashworth Spasticity Rating Scale score; CSI, The Clinical Spasticity Index score; FCA, Functional comprehensive assessment score

1. Jiang, LS, Wang, Y, Chen, SF, Cui, LL, and Wang, ZY. Clinical observation on treatment of apoplectic hemiplegia by Tongdu Tiaoshen acupuncture combined with Buyang Huanwu decoction. J Chin Med . (2023) 41:206–8.

Google Scholar

2. Wang, ZW, Ma, ZY, Xue, SF, Li, W, Zuo, HJ, Wu, QN, et al. Consensus of experts on co-morbidity management of coronary heart disease and ischemic stroke at grass-roots level 2022. Chin J Cardiov Dis . (2022) 20:772–93.

3. Wang, YJ, Li, ZX, Gu, HQ, Zhai, Y, Jiang, Y, Zhou, Q, et al. Chinese stroke report 2020 (Chinese version) (3). Chin J Stroke . (2022) 17:675–82.

4. Li, JY, Zhang, JJ, An, Y, He, WW, Dong, XL, Zhang, GM, et al. Quality evaluation study of TCM guidelines and consensus for stroke disease. Chin J Tradit Chin Med Inform . (2022) 29:30–6.

5. Zhou, W, Hu, LD, Kong, FS, Li, PP, Zhuang, SJ, Cheng, YS, et al. Clinical observation on Qigui Tongluo formula combined with acupuncture in improving hemiplegia after stroke and its effect on endoplasmic reticulum stress-autophagy. J Chin Med . (2023) 41:89–93.

6. He, XP, and Hu, LL. Clinical study on the treatment of acupuncture combined with exercise imagination therapy for stroke hemiplegia. New Tradit Chin Med . (2021) 53:191–3.

7. Ouyang, YJ. Analysis of the effect of TCM acupuncture combined with rehabilitation training in treating stroke hemiplegia. China Pract Med . (2022) 17:166–8.

8. He, Y, and Jin, RJ. Progress in the treatment of hemiplegia limb spasm in stroke. Rehabil Theory Pract China . (2006) 10:863–6.

9. Hu, CH, Pan, M, and Zhou, M. Research progress of acupuncture treatment for stroke spastic hemiplegia. New Chin Med . (2021) 53:176–9.

10. Zhang, JH, Wang, D, and Liu, M. Overview of systematic reviews and meta-analyses of acupuncture for stroke. Neuroepidemiology . (2014) 42:50–8. doi: 10.1159/000355435

PubMed Abstract | Crossref Full Text | Google Scholar

11. Xie, M, and Wu, HD. Research progress and thinking on acupuncture treatment of hemiplegia after stroke. Sichuan Tradit Chin Med . (2020) 38:216–8.

12. Tao, H, Yang, LT, Ping, A, Quan, LL, Yang, X, Zhang, YG, et al. Interpretation of AMSTAR-2, a quality assessment tool for systematic evaluation of randomized or nonrandomized control studies. Chin J Evid Based Med . (2018) 18:101–8.

13. Shea, BJ, Reeves, BC, Wells, G, Thuku, M, Hamel, C, Moran, J, et al. AMSTAR 2: a critical appraisal tool for systematic reviews that include randomised or non-randomised studies of healthcare interventions, or both. BMJ . (2017) 21:j4008. doi: 10.1136/bmj.j4008

Crossref Full Text | Google Scholar

14. Wu, QF, Ding, HF, Deng, W, Yang, N, Wang, Q, Yao, L, et al. ROBIS: a new tool for assessing the risk of systematic reviews bias. Chin J Evid Based Med . (2015) 15:1454–7.

15. Page, MJ, Moher, D, Bossuyt, PM, Boutron, I, Hoffmann, TC, Mulrow, CD, et al. PRISMA 2020 explanation and elaboration: updated guidance and exemplars for reporting systematic reviews. BMJ . (2021) 29:n160. doi: 10.1136/bmj.n160

16. Gao, Y, Liu, M, Yang, KL, Ge, L, Li, L, Li, J, et al. System reviews report specification: comparative analysis and example interpretation of PRISMA 2020 and PRISMA 2009. Chin J Evid Based Med . (2021) 21:606–16.

17. Atkins, D, Best, D, Briss, PA, Eccles, M, Falck-Ytter, Y, Flottorp, S, et al. Grading quality of evidence and strength of recommendations. BMJ . (2004) 328:1490. doi: 10.1136/bmj.328.7454.1490

18. Technical Specifications Development Group for Real World Study in Traditional Chinese Medicine, China Association of Chinese Medicine. J Tradit Chin Med . (2022) 63:293–300.

19. Peng, XX, and Liu, YL. Evaluation of evidence quality in diagnostic accuracy studies: inconsistency, imprecision, publication bias, and others. Chin J Evid Based Pediatr . (2021) 16:472–4.

20. Zhao, C, Tian, GH, Zhang, XY, Wang, YP, Zhang, ZX, Shang, HC, et al. Connotation and thinking of evidence-based medicine developing to evidence-based science. Chin J Evid Based Med . (2019) 19:510–4.

21. Tu, Y, Peng, W, Wang, J, Hao, QH, Wang, Y, Li, H, et al. Acupuncture therapy on patients with flaccid hemiplegia after stroke: a systematic review and meta-analysis. Evid Based Complement Alternat Med . (2022) 2022:1–17. doi: 10.1155/2022/2736703

22. Li, HP, Zhai, YB, Xing, J, and Wang, JL. Meta-analysis of acupuncture treatment for spastic hemiplegia of upper limbs after stroke. World J Tradit Chin Med . (2022) 17:196–207+214.

23. Zhu, JM, Zhuang, R, He, J, Ding, YQ, Jiang, LL, et al. Meta-analysis of acupuncture combined with rehabilitation training for upper limb spasticity after stroke. J Integr Tradit Chin Western Med Cardiovasc Dis . (2021) 19:1892–8.

24. Fan, W, Kuang, X, Hu, J, Chen, X, Yi, W, Lu, L, et al. Acupuncture therapy for poststroke spastic hemiplegia: a systematic review and meta-analysis of randomized controlled trials. Complement Ther Clin Pract . (2020) 40:101176. doi: 10.1016/j.ctcp.2020.101176

25. Gou, Y, Wu, JG, Mo, XN, Xun, LX, et al. Meta-analysis of randomized controlled trial of Buyang Huanwu decoction combined with acupuncture versus Buyang Huanwu decoction alone in treating stroke hemiplegia. Asia-Pac Tradit Med . (2019) 15:153–6.

26. Wang, YY, Dai, LW, Qin, YP, Li, F, Yu, J, et al. Meta-analysis of therapeutic effect of acupuncture on stroke hemiplegia. Clin Res Tradit Chin Med . (2017) 9:8–11.

27. Wang, YQ, Yang, HS, Feng, HH, Li, YQ, Peng, XY, Xiao, CX, et al. Meta-analysis of randomized controlled trials of acupuncture for stroke hemiplegia. Chin J Basic Med . (2016) 22:1670–1672+1712.

28. Jin, D, and Wang, J. Meta-analysis of the clinical efficacy evaluation of acupuncture therapy for hemiplegia in stroke. New Tradit Chin Med . (2016) 48:284–6.

29. Song, Y, Li, Q, Wu, ZJ, and Wang, ZY. Meta-analysis of the effect of acupuncture therapy combined with rehabilitation training on the recovery of lower limb motor function in stroke patients with hemiplegia. J Nanjing Inst Phys Educ (Nat Sci Ed) . (2016) 15:49–57.

30. Lin, MQ, and Liu, WL. Systematic review of acupuncture treatment for hemiplegic gait in stroke patients. J Rehabil . (2015) 25:54–62.

31. Li, N, Feng, B, Zou, J, Liu, Y, et al. Meta-analysis of acupuncture for hemiplegia of stroke. J Chengdu Univ Tradit Chin Med . (2002) 2:37–39+64.

32. Wang, LJ, Zeng, Q, Xie, YJ, Chen, D, Yao, DY, Chen, X, et al. Normative evaluation of re-evaluation reports on acupuncture systems at home and abroad based on PRIO-harms scale. Chin Acupunct . (2020) 40:793–8.

33. Yang, KH, Liu, YL, Yuan, JQ, and Jiang, HL. Re-evaluation of system review in development and perfection. Chin J Evid-Based Pediatr . (2011) 6:54–7.

34. Lee, SH, and Lim, SM. Acupuncture for insomnia after stroke: a systematic review and meta-analysis. BMC Complement Altern Med . (2016) 16:228. doi: 10.1186/s12906-016-1220-z

35. Liu, S, Zhang, CS, Cai, Y, Guo, XF, Zhang, L, Xue, CL, et al. Acupuncture for post-stroke shoulder-hand syndrome: a systematic review and meta-analysis. Front Neurol . (2019) 26:433.

36. Du, YH, Li, J, Sun, DW, Liu, WH, Li, GP, Lin, X, et al. Study on the spectrum of modern acupuncture and moxibustion in China. Chin Acupunct . (2007) 5:373–8.

Keywords: acupuncture, stroke hemiplegia, systematic reviews, meta-analyses, overview

Citation: Fan M, Zhang B, Chen C, Li R and Gao W (2024) Evidence quality assessment of acupuncture intervention for stroke hemiplegia: an overview of systematic reviews and meta-analyses. Front. Neurol . 15:1375880. doi: 10.3389/fneur.2024.1375880

Received: 26 January 2024; Accepted: 08 April 2024; Published: 30 April 2024.

Reviewed by:

Copyright © 2024 Fan, Zhang, Chen, Li and Gao. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) . The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Wulin Gao, [email protected]

† These authors share first authorship

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.

  • Open access
  • Published: 29 April 2024

Untangling the mess of CGRP levels as a migraine biomarker: an in-depth literature review and analysis of our experimental experience

  • Gabriel Gárate 1 ,
  • Julio Pascual 1 ,
  • Marta Pascual-Mato 1 ,
  • Jorge Madera 1 ,
  • María Muñoz-San Martín 1 &
  • Vicente González-Quintanilla 1  

The Journal of Headache and Pain volume  25 , Article number:  69 ( 2024 ) Cite this article

9 Altmetric

Metrics details

Calcitonin gene-related peptide (CGRP) is the most promising candidate to become the first migraine biomarker. However, literature shows clashing results and suggests a methodological source for such discrepancies. We aimed to investigate some of these methodological factors to evaluate the actual role of CGRP as biomarker.

Previous to the experimental part, we performed a literature review of articles measuring CGRP in migraine patients. Using our 399 bio-bank sera samples, we performed a series of experiments to test the validity of different ELISA kits employed, time of sample processing, long-term storage, sampling in rest or after moderate exercise. Analysis of in-house data was performed to analyse average levels of the peptide and the effect of sex and age.

Literature review shows the high variability in terms of study design, determination methods, results and conclusions obtained by studies including CGRP determinations in migraine patients. CGRP measurements depends on the method and specific kit employed, also on the isoform detected, showing completely different ranges of concentrations. Alpha-CGRP and beta-CGRP had median with IQR levels of 37.5 (28.2–54.4) and 4.6 (2.4–6.4)pg/mL, respectively. CGRP content is preserved in serum within the 24 first hours when samples are stored at 4°C after clotting and immediate centrifugation. Storages at -80°C of more than 6 months result in a decrease in CGRP levels. Moderate exercise prior to blood extraction does not modulate the concentration of the peptide. Age positively correlates with beta-CGRP content and men have higher alpha-CGRP levels than women.

Conclusions

We present valuable information for CGRP measurements in serum. ELISA kit suitability should be tested prior to the experiments. Alpha and beta-CGRP levels should be analysed separately as they can show different behaviours even within the same condition. Samples can be processed in a 24-h window if they have been kept in 4°C and should not be stored for more than 6 months at -80°C before assayed. Patients do not need to rest before the blood extraction unless they have performed a high-endurance exercise. For comparative studies, sex and age should be accounted for as these parameters can impact CGRP concentrations.

Graphical Abstract

literature review data extraction

Peer Review reports

Introduction

Migraine and its subtypes are diagnosed based on clinical criteria [ 1 ]. Thus, multiple phenotypes sharing the same diagnosis are treated the same way with clashing outcomes. However, as many real-world data studies have shown [ 2 ], the different phenotypes have been proved ineffective to create profiles prone to respond to the different treatment options. Historical therapies for migraine, which is worth to mention that none of these were initially developed to treat this condition, apart from the triptans, and are not specific for it [ 3 ], have not met the challenge of effectively aborting and/or preventing the symptoms, in some cases with limited efficacy, tolerability and patient adherence [ 4 ].

Since the 1990s decade our understanding of migraine has expanded markedly and new therapeutic agents have been brought to the market in an effort to alleviate the personal and economic burden that migraineurs suffer. These are the calcitonin gene-related peptide (CGRP)-targeted therapies which have revolutionized the management of migraine [ 5 ], including monoclonal antibodies against the CGRP ligand or its receptor [ 6 ], and small molecules antagonists to the CGRP receptor, the gepants [ 7 ]. Nonetheless, there is still a portion of patients who do not respond to the treatments, highlighting the importance that a biomarker would have in migraine, allowing to create objective diagnostic criteria besides the clinical ones, which may be subject to errors [ 8 ], and to monitor objectively the response to treatments.

CGRP is a multifunctional neuropeptide which was first discovered in 1982, described as the result of the alternative splicing of the calcitonin gene (CALCA in humans) transcript, hence its name [ 9 ]. Later on, this first form of CGRP will be named alpha-CGRP, as opposed to the beta-CGRP, encoded in a different gene (CALCB in humans), with a different regulation and expression pattern to the alpha-CGRP [ 10 ]. These two peptides differ in 3 out of the 37 amino acids of their sequence but share a common structure and are part of the CGRP peptide family, also comprised by calcitonin, adrenomedullin 1 and adrenomedullin 2 [ 11 ]. Although their distribution in the human body tends to overlap [ 12 ], alpha-CGRP has been described to be the predominant form in the central and peripheral nervous system while beta-CGRP is more relatively abundant in the enteric nervous system [ 13 ].

The relevance of the peptide goes beyond its use as a therapeutical target, having been proposed as a biomarker in migraine. Several studies have reported elevation of the peptide in ictal and/or interictal phases in medication-free periods of migraine patients [ 14 , 15 , 16 , 17 , 18 , 19 , 20 , 21 , 22 , 23 , 24 , 25 , 26 , 27 , 28 , 29 , 30 , 31 , 32 , 33 , 34 , 35 , 36 , 37 , 38 ], the reduction of the CGRP levels after abortive and prophylactic treatment [ 26 , 28 , 38 , 39 , 40 , 41 , 42 , 43 , 44 ] and the induction of migraine-like headaches when infused in humans [ 45 ]. Despite these results, there are other works contradicting the findings [ 35 , 46 , 47 , 48 , 49 , 50 , 51 , 52 , 53 , 54 ] and which emphasize the way until its eventual validation and clinical use is still far way to become a reality. The source of such discrepancies, although still unknown, is most probably multifactorial. There is a methodological component [ 55 , 56 ] and the influence of other individual parameters such as comorbidities [ 36 ], concomitant treatments [ 57 ] or menstrual cycle [ 58 , 59 ], which have not been taken into account or which have not been sufficiently described to be considered properly.

In this work we have analysed in detail the existing literature about CGRP measurements in migraine patients, discussing their methodological differences and their effect on the reported concentrations of the peptide. In addition, we have conducted a series of experiments aimed to elucidate the potential effects on serum content of total CGRP, alpha-CGRP, and beta-CGRP, of a number of variables, including different enzyme-linked immunosorbent assay (ELISA) kits, sample processing time, long-term storage or immediate practise of exercise before sampling. Finally, we have analysed our in-house database of CGRP measurements to investigate the effect that sex and age might have on the molecules.

Review of previously published works including CGRP measurements in migraine patients

A systematic search was conducted in the databases PubMed, Scopus and Science Direct until February 2024 using the following terms: (a) CGRP; (b) migraine; and one of the following terms: (c) levels; (d) concentration; (e) measurements. We included original articles with CGRP measurements in humans with migraine. We only included and analysed works written in English language.

Methodological experiments

Kit analysis.

We tested 4 different ELISA kit references with serum samples, 2 based on competitive ELISA (Biorbyt, UK, ref: orb438605; BMA Biomedicals, Switzerland, ref: S-1198), specifically designed for the detection of total-CGRP, and 2 based on ELISA sandwich (Abbexa, UK, ref: abx257902; CUSABIO, China, ref: CSB-E08210h), designed for the detection of alpha and beta-CGRP, respectively. All 4 of these products were assayed multiple times (at least 4 for each kit reference) to analyse the optimal dilutions of the samples, their reproducibility and their reported concentrations. All the procedures were carried out strictly following the manufacturer’s instructions of use of their products, they were performed by the same researcher, using the same equipment, and in the same facilities. Regarding the last step of the ELISA processes, in which manufacturers give a window of time, specifying that the user must determine the optimum, we incubated the substrate for 15 min for alpha-CGRP and for 20 min for beta-CGRP. All the samples were measured in duplicate, obtained from morning blood extractions, 9–12 am, from patients in a fast of at least 12h. These samples were let to clot for 10–15 min, centrifuged at 3500 rpm for 10 min and then immediately stored at -80°C until assayed. A standard curve was generated for every single batch, and they were calculated using a 4-parameter logistic (4-PL) regression with r 2  > 0.999.

Influence of sample processing time

We recruited 6 individuals without history of migraine and subjective absence of headache at the day of the sampling (50% male; age range: 24–65 years). These individuals had a blood extraction in the early morning, between 9 am and 9:30 am, performed in rest at our laboratory facilities. The blood was then let to clot for 10 min at room temperature, then centrifuged at 3500 rpm for 10 min to obtain serum. Serum was divided into 4 tubes. First one was immediately stored at -80°C, the other three were kept in the refrigerator at 4°C for, 2, 4 and 24 h respectively before frozen. None of the samples were added peptidase inhibitor. These samples were measured by triplicate.

Effect of exercise

Additionally, after the first blood extraction, these same 6 subjects were asked to perform a 20 min run at moderate pace before a second blood extraction. Blood obtained was then processed following the same procedure as the resting samples but in this case all the serum was immediately stored at -80°C. These samples were measured by triplicate.

Long-term storage

We assayed 11 consecutive samples from previous works (36.4% male; age range: 26–65) that had been stored at -80°C for more than 6 months and which had been assayed altogether before being stored for a month and before reaching this time point.

Analysis of our CGRP database

Samples coming from our bio-bank were grouped together, reaching 399 individuals (29.3% male; age range: 18–96 years), and then analysed to see the average levels of the peptide and possible effects of sex and age in the circulating concentrations of the molecules.

Statistical analysis

Data are displayed as average with standard deviation (SD) unless stated differently. Comparisons between samples immediately processed and stored at -80°C obtained in resting subjects and right after exercising, and samples analysed before and after they had been stored for 6 months were made using the Wilcoxon matched-pairs signed rank test. Comparisons between samples from same individuals that were frozen at different timepoints have been made using Friedman test followed by Dunn’s test. Correlation relationships of the meta-analysis were evaluated by Spearman correlation test and summarized by Spearman’s rho coefficient and related p-values. Comparisons between sub-groups in the meta-analysis were performed using the Mann Whitney U test.

Article review

Applying the criteria specified in the method section we included 52 articles from the initial search that have been sorted by sample source and detection methodology and are displayed in Table  1 .

Out of these 52 articles, the main source of sample were blood extractions, with 44 (84.6%) works performing them. Twenty-eight (53.8%) used plasma samples, 6 (11.5%) from the jugular vein and the remaining 22 (42.3%) from the cubital vein. Serum was employed in 16 (30.8%) of the studies. Continue by order of use, saliva was the third sample source with 7 (13.5%), followed by cerebrospinal fluid (CSF) with 3 (5.8%) and by tear fluid with 2 (3.8%), and last, gingival crevicular fluid (GCF) with 1 (1.9%). According to the determination method, 21 (40.4%) of the studies measured CGRP by radioimmunoassay (RIA), 2 (3.8%) of them together with Bradford protein assay (Bradford), and 29 (55.8%) by ELISA, 1 (1.9%) performed along with bicinchoninic acid protein assay (BCA), and 2 (3.8%) used undefined enzyme immune assay (EIA).

Seventeen (32.7%) studies did not include healthy controls while the remaining 35 (67.3%) did. Sampling of the migraine patients were performed only in the ictal phase for 5 (9.6%) studies, only in the interictal phase for 20 (38.5%), in both phases for 22 (42.3%) works, and in 5 (9.6%) of them the phase was not specified.

Data was presented in different ways including mean ± standard deviation, ± standard error of mean (SEM), ± 2*SEM; median with range, interquartile range (IQR), 95% confidence intervals (CI), and in multiple units, pmol/L, fmol/mL, pmol/mg of total protein, pg/mL, pg/µg of total protein.

Therefore, these methodologies, sampling differences and variable data displays did not allow for a meta-analysis, and the absolute CGRP range among all the studies could be inferred, showing a wide range of concentrations (2.45–219,700 pg/mL) [ 28 , 54 ].

Experimental results

The kit from Biorbyt showed an elevated content of CGRP (range: 150-980pg/mL) compared to what has been reported in the bibliography [ 24 , 25 , 26 , 27 , 29 , 38 , 49 , 54 ] when undiluted serum samples were used. Moreover, the reproducibility of the kit was not satisfactory as the assayed samples did not meet the intra and inter-assay coefficient of variance criteria set by the manufacturer (> 10% and > 12%, respectively). This kit also showed a total lack of linearity for the dilutions of 1:2, 1:4, 1:8, 1:16 and 1:32 with each dilution showing higher CGRP concentrations than the one before (data not shown).

For BMA Biomedical kit we were unable to obtain a single measurement within the detection range. Since we decided to strictly follow the manufacturer’s instructions, we could not modify the standard curve points. All the readout absorbance measurements from the tested samples exceed the absorbance range obtained from the readout of the standard curve, and because this is a competitive ELISA technique, no dilution could be tested and neither we could assayed the reproducibility of the test.

Alpha-CGRP specific kit, from Abbexa, showed similar CGRP concentrations (range: 25-105pg/mL) to what has been described previously in most studies using serum from our group [ 38 , 69 , 70 ] and others [ 25 , 26 , 27 , 49 ]. Most of the samples fall within mid-range of the standard curve but the kit showed a good linearity of the measurements when samples were diluted 1:2, 1:3, 1:4 and 1:8 (data not shown). Across the different plates results fulfilled the reproducibility criteria by having an intra and inter-assay coefficient of variance below the maximum set by the manufacturer (< 8% and < 10%, respectively).

The last kit, from CUSABIO, showed similar beta-CGRP concentrations than reported in the literature (range: 1.6–10.5pg/mL) [ 31 , 35 , 36 , 38 , 70 , 71 ]. Because the samples fall within the lower part of the standard curve dilution of 1:2, 1:3 and 1:4 resulted in a lack of signal and the impossibility to determine the concentration of the peptide in all the samples but those with the higher beta-CGRP content. In this latter group the linearity found was between the ranges supplied by the manufacturer. Across the different plates results fulfilled the reproducibility criteria by having an intra and inter-assay coefficient of variance below the maximum set by the manufacturer (< 8% and < 10%, respectively).

Because the 2 kits based on competitive ELISA did not meet the quality requirements and did not adjust to the reported units in the literature the following experiments were carried out using the kits from Abbexa and CUSABIO which have been used by our group in previous studies [ 38 , 69 , 70 , 71 ].

We did not find changes in alpha nor beta-CGRP across samples which remained for 2 h (alpha: 29.9 ± 18.6pg/mL; beta: 4.9 ± 1.7pg/mL), 4 h (alpha: 30.4 ± 18.2pg/mL; beta: 4.7 ± 1.5pg/mL) and 24 h (alpha: 30.2 ± 19.6pg/mL; beta: 4.4 ± 1.8pg/mL) at 4°C compared to those which got deep frozen right away (alpha: 29.2 ± 20.6pg/mL; beta: 4.6 ± 1.6pg/mL; p  = 0.99; p  = 0.84; p  = 0.99; respectively) (Fig.  1 ).

figure 1

Sample processing: evolution of individual A alpha-CGRP and B beta-CGRP values for each subject throughout the time samples remained stored at 4°C before froze at -80°C

No differences were found in none of the molecules when comparing serum samples obtained in rest an immediately stored at -80°C and those obtained after exercise and with the same processing protocol (alpha: 31.1 ± 19.0pg/mL; beta: 4.8 ± 1.7pg/mL; p  = 0.44) (Fig.  2 ).

figure 2

Effect of exercise: evolution of individual A alpha-CGRP and B beta-CGRP values for each subject when sampling was performed in rest of after 20 minutes of moderate exercise

The first significant differences between samples which were measured before they remained stored at -80°C for a month (alpha: 42.3 ± 15.1pg/mL; beta: 4.9 ± 2.0pg/mL) and assayed after this date appeared from the sixth month of storage for both alpha-CGRP and beta-CGRP (alpha: 28.6 ± 11.3pg/mL, p  < 0.01; beta: 3.0 ± 1.3pg/mL, p  < 0.01) (Fig.  3 ).

figure 3

Effect of storage: changes of individual A alpha-CGRP and B beta-CGRP values when samples were immediately analysed or analysed when they surpassed 6 months storage. Data is shown as average ± SD. Comparisons were made using Wilcoxon matched-pairs signed rank test. ** p  < 0.01

Analysis of our database

Alpha and beta-CGRP did follow a normal distribution and averaged (median with IQR) 37.5 (28.2–54.4)pg/mL and 4.6 (2.4–6.4)pg/mL, respectively. Spearman correlation between alpha-CGRP and age was non-significant ( p  = 0.300; r  = -0.05), while it was significant for beta-CGRP and age ( p  < 0.0001; r  = 0.24). When these correlations were analysed with females and males alone it kept being non-significant for alpha-CGRP (male: p  = 0.151, r  = -0.14; female: p  = 0.514, r  = -0.04) and significant for beta-CGRP (male: p  = 0.028, r  = 0.21; female: p  < 0.0001, r  = 0.26). Alpha and beta-CGRP levels did not correlate significantly ( p  = 0.056; r  = 0.11). When sorted by sex, groups had no significant differences in their age distribution (male: 55.6 ± 17.7 years; female: 54.1 ± 16.9 years; p  = 0.222), and showed significant differences in their alpha-CGRP content (median [IQR]; males: 54.4 [38.1–77.6] pg/mL; females: 45.2 [32.5–65.3] pg/mL; p  < 0.01) and unaltered beta-CGRP levels (median [IQR]; males: 4.0 [2.3–6.2] pg/mL; females: 3.9 [2.1–6.1] pg/mL; p  = 0.728) (Fig.  4 ).

figure 4

In-house data analysis: A distribution of alpha-CGRP levels vs. age, green line represents a linear regression and red dotted line represents the CI; B distribution of beta-CGRP levels vs. age, green line represents a linear regression and red dotted line represents the CI; C distribution of beta-CGRP vs. alpha-CGRP levels, green line represents a linear regression and red dotted line represents the CI; D comparison of alpha-CGRP concentrations in subjects sorted by sex; E comparison of alpha-CGRP concentrations in subjects sorted by sex. Data is shown as average ± SD. Comparisons were made using Mann–Whitney U test, ns: non-significant; ** p  < 0.01

Our literature analysis (Table  1 ) shows that studies based on CGRP determinations are highly variable in terms of measuring method and study design, including sample source, sample processing, inclusion/exclusion criteria for patients and controls and aim of the study [ 14 , 15 , 19 , 31 , 39 , 42 , 60 , 66 , 68 ]. Data analysis and presentation of laboratory determinations is also changeful, which hinder the comparison of the data. Despite all the difficulties, it results obvious that the overall outcomes and the conclusions drawn from them are inconsistent across works. Some authors have hypothesized that methodological differences might be the reason for such discrepancies [ 55 , 56 ], and, although this is likely to be the case, there is not to date a consensus of how CGRP determinations should be carried out.

If we analyse the methods used to measure CGRP in migraine patients we can see there have been mainly based on two different techniques, RIA and ELISA. RIA was the first, and until the late 2000s, the only one employed. RIA is based on the competitive incubation for specific antibody sites to form antigen–antibody complexes of radio-labelled and native unlabelled antigen. At equilibrium, the complexes formed are separated from the unbound antigen with a resulting ratio between these two. The bound/free antigen ratio is dependent on the amount of native antigen present in the sample as the radio-labelled is always added at a stable known concentration [ 72 ]. Therefore, this technique relies on the antiserum used, which has to provide an appropriate specificity in order to detect the antigen but no other analogues, and a proper affinity to do so in the range of interest.

The use of different antisera across all the CGRP-measuring studies based on RIA is a main source of variability among articles (Table  1 ). Works employing the same protocol, antiserum, and sample source usually have similar peptide concentrations [ 14 , 39 , 47 ], with some exceptions [ 48 ], while the use of different brands containing different antiseras and protocols show differing concentration ranges even when performed with same sample source [ 15 , 39 , 63 , 64 ], and even if they were done by the same specialist technician with the same samples [ 48 ]. Another problem is that even though studies with the exact same quantification method obtain similar concentration ranges they arrive to clashing conclusions, such as the presence of differences in CGRP concentrations between interictal migraine patients and healthy controls [ 17 , 65 ].

ELISA technique first appears to be used to determine CGRP concentration in migraine patients in 2007 [ 21 ]. ELISA is an immunological assay based on the interaction between the antigen and a primary antibody against the antigen of interest. These will interact, forming a complex that is later confirmed through the enzyme-linked antibody catalysis of an added substrate, which can be quantitatively measured using readouts from either a luminometer or a spectrophotometer. ELISA techniques are broadly classified into direct, indirect, sandwich, and competitive ELISA. For CGRP determinations only competitive and sandwich ELISA have been employed. Competitive ELISA involves a competition between the sample antigen and the plate-coated antigen for the primary antibody, followed by the binding of enzyme-linked secondary antibodies (Fig.  5 ). Sandwich ELISA technique includes a sample antigen introduced to the antibody-precoated plate, followed by sequential binding of detection and enzyme-linked secondary antibodies to the recognition sites on the antigen (Fig.  6 ) [ 73 ]. In both cases, and similarly to what has been pointed out for RIA, the techniques rely on the specificity and sensitivity of the antibodies included in the kit. This is the reason why ELISA-based studies are also subjected to the exact same issues associated with RIA-based works. As it has been described, investigations using the same brand also reports similar peptide concentration ranges [ 25 , 26 , 30 , 44 , 49 , 67 ], even though this is not always the case [ 32 ], but, most importantly, those using different kits clash in the range of concentrations [ 23 , 61 , 62 ] on top of the conclusions drawn [ 33 , 61 ]. For this point we need to explain that kits from USCN Life Sciences and Cloud Clone Corp., and from Peninsula Laboratories and BMA Biomedicals have been considered as only two brands since these companies have merged or have been acquired by the other at some point in their history. Moreover, and this last point serves as an example, there is a lack of information by part of the researchers regarding the kits used, because sometimes the brand cited offers more than one kit or two different brands over the history have been in charge of its production, and with the given information it cannot be inferred which one it was [ 27 , 34 ]. This could be the reason why across studies using kits from the same brand they obtained different concentrations. Also, this lack often comes from the manufacturers, which most of the times do not report essential information to the user such as the specific epitope recognised by the antibodies or their cross-reactivity for analogues of CGRP. This has caused some controversies such as works employing kits specifically designed, according to the manufacturer, for the detection of beta-CGRP reporting results as total-CGRP [ 35 , 36 , 59 ] without proving in their papers whether the technique recognises alpha, beta, or total-CGRP.

figure 5

Schematic representation of a competitive ELISA protocol

figure 6

Schematic representation of a sandwich ELISA protocol

CGRP has been analysed in a broad number of samples sources including plasma and serum from the peripheral circulation and jugular vein, CSF, saliva, tear fluid and GCF. Due to the enormous variability of concentrations found within the sources (Table  1 ) and the fact that results are not homogenous even when the same technique and sample source were used, we thought the comparison between sample sources did not make sense.

Nonetheless, and because our group has focused on the determinations in serum with ELISA, we have done a specific analysis of the studies matching these two criteria. There seems to be a consensus range achieved by most of the studies, independently of the brand employed, and which approximately goes from 15 to 150 pg/mL for total and alpha-CGRP, because the data from the literature exhibits that most of the measured CGRP is the alpha isoform, and from 2 to 10 pg/mL for beta-CGRP.

Because there are examples of different works employing the same method, specific technique, sample source and similar inclusion/exclusion criteria whose results are contradictory [ 14 , 33 , 47 , 59 ], we cannot conclude that all the problematic with CGRP measurements is related to the quantifying method and/or the sample chose by the authors. There has to be other factors playing a role in the discrepancies, such as fluctuations with the circadian [ 74 ] or with the menstrual cycles [ 58 , 59 , 75 ], effect of resting/exercise [ 76 , 77 ], fast degradation of the peptide due to its short half-life [ 78 ], long-term storage stability [ 55 ], migraine and other comorbidities [ 69 , 71 , 79 , 80 , 81 ], and the effects of pharmacological treatments [ 26 , 28 , 38 , 39 , 40 , 41 , 42 , 43 , 44 ]. From our review we could not analyse these parameters, because they were not displayed with enough accuracy in most articles.

Experimental studies

Here, in an effort to provide more detailed information about the suitability of serum from peripheral blood for CGRP determinations, we carried out a series of experiments in order to shed light on some of the main questions regarding the lack of consistency with CGRP quantifications beyond the data already discussed from our review.

We have found that the specific ELISA kit employed has a crucial effect on the CGRP measurements, showing completely different concentration ranges depending on the reference.

Besides the differences in the range we have obtained some alarming results. One of the kits assayed, from Biorbyt, did not meet the reproducibility criteria, which automatically should make this kit unsuitable for any kind of research. On top of that it did not conserve the linearity when diluting the samples which adds more doubts to its reliability. The one from BMA Biomedicals, even though a kit from this brand has been used for a published work when the company had the name Peninsula Laboratories [ 24 ], showed for 4 different times results below the detection limit (20pg/mL), contradicting the data of the cited article. Once again, these data call for a more exhaustive description of the methodology, not only by the researchers but also the companies.

The other two kits assayed fulfilled all the quality requirements and presented a range of measurements which fit the range observed in studies using the same sample source. Because the kit from CUSABIO is specific for beta-CGRP we have considered that the objective range for this kind of determination is different to the range for the Abbexa kit, which detects alpha-CGRP. This comes with no surprise because notwithstanding we have not displayed it, in our previous works the exact internal validations were performed and our researches already shown that these kits were reliable and were in accordance with the results published in the past by other groups [ 25 , 26 , 27 , 31 , 35 , 36 , 49 ].

Overall, the analysis of the kits performed here acts as a probe that the determination methodology needs to be carefully assayed and critically analysed as this is the ultimate guarantee of the validity of the data. Because we have already done so with the 4 kit references listed in this study, we would like to encourage researchers to share their internal validation data with other kits they might have been using, as well as to invite the companies to share more details about their products, as we believe it has been a huge limitation in the field and this would produce a significant advance, saving a lot of time and money to the research.

Throughout the literature many different studies have acknowledged the reported short half-life of CGRP [ 38 , 52 , 55 , 56 , 82 ] as a main limitation for their works. Still, many fail to describe precisely enough their methodology for sample processing so readers can infer how this limitation took place. This problem has been pointed out before and the latest works have included a more accurate description of the sample processing [ 35 , 36 , 38 , 55 ]. To avoid this rapid degradation of the peptide Messlinger et al. [ 55 ] proposed buffering the sample with peptidase inhibitor, but they concluded that immediate freezing was the most effective way to preserve CGRP content.

We did not add peptidase inhibitors as we were using serum as sample and the addition of a peptidase inhibitor needed to be done right after centrifugation but we opt to freeze the samples immediately. Our results show that the degradation of the peptide did not happen, at least in the first 24 h, when samples were stored at 4°C. This complies with the instructions of most of the ELISA kits our group has assayed and which provide a window of time for sample storage depending on the temperature, specifying that samples can be stored at 4°C for up to 24 h before being analysed. These data appear to be contradicting the results of Kraenzlin et al. [ 78 ] regarding the half-life of CGRP. One could argue that the content of serum and plasma is different and the differences found in these studies could be accounted for the binding of CGRP to cellular compartments or to fibrinogen, effectively modifying its degradation. However, the cited article, performed in 1985, is not exempt from limitations and should be reconsidered when analysing the stability of the peptide, at least in isolated biological fluids. First, this pharmacokinetic (PK) study fails to achieve some critical points that are currently required for this kind of works [ 83 ]. CGRP concentration should achieve a steady state in order to be able to extrapolate the half-life as it at this point when the phenomena of absorption, distribution, metabolism and excretion have reached and equilibrium and therefore stopping the infusion will give the information about the actual elimination half-life. Moreover, results from human in-vivo PK studies are not necessarily equivalent to those obtained from in-vitro or animal models in-vivo [ 84 ]. Our findings show that serum freezing does not need to be immediate as long as it is kept in the fridge after instant centrifugation following the clotting. This discovery has the potential to ease the methodology of sample processing for CGRP determinations. Although this is a disruptive finding, data should come with no surprise as other neuropeptides with similar and even shorter half-life than CGRP, such as vasoactive intestinal peptide (VIP) [ 85 ], amylin [ 86 ], and pituitary adenylate cyclase activating peptide-38 (PACAP-38) [ 87 ] are being measured without controversy over the sample processing time [ 33 , 51 , 88 ].

Another point recurrently mentioned in the literature is the long-term stability of the molecules when frozen. Available data indicates that storages of 8 months [ 55 ] significantly decrease the concentration of CGRP. Our results show that storages over 6 months have a decreasing effect on the serum levels of both isoforms of the peptide. With all the evidences collected future research should specify the time samples remained stored prior to being assayed as this could be a main limitation of the study and to date this data is not usually displayed. This opens up the question about whether controls should be matched not only by age and sex but also by the time their samples remained stored until measured, meaning that both groups, patients and controls, should be enrolled simultaneously to ensure the comparability of their CGRP measurements. This point has already been discussed in studies employing CGRP measurements with controversy results where cases and controls were recruited in two different time frames [ 89 ].

The first potential association between physical exercise and CGRP was described by Wyon et al. [ 90 ] with an animal model showing that rats had higher concentrations of CGRP in urine, CSF and serum after 1-h of running. Subsequent studies with more animal models have confirmed this relationship [ 91 , 92 ]. To date the evidence derived from studies with humans is scarce, with only two works [ 76 , 77 ]. The first one [ 76 ] showed that CGRP increased its concentration in samples collected by microdialysis in 8 individuals who had been subjected to eccentric exercise. In the second, completion of a half marathon produced an immediate CGRP increase dependent on the running intensity in 48 individuals [ 77 ].

The relevance for these discoveries in clinical practise is limited because subjects do not usually perform that kind of exercises right before a blood sampling. This is why we analysed the effect of exercise in a way that would reflect more accurately what might be happening at the actual sampling. The results showed that this kind of practise does not have an effect on alpha nor beta-CGRP levels and consequently the patients do not need to be on a strict rest prior to the blood extraction. However, data need to be managed carefully because the exact amount of exercise that has an effect has not been described yet and because the window between the no effect of a 20 min run and a half marathon is wide.

All the results obtained from the experimental analysis would need to be further explored with a bigger number of participants and to be tested in other samples sources that are being considered for CGRP determinations. Nonetheless, it is important to highlight that when considering the future use of CGRP as a biomarker it is necessary to select a sample source that is easy to obtain, which does not have irregular fluctuations associated with unknown factors and which offers reproducible and robust results. Jugular blood, tear fluid, CSF, GCF are not easy to obtain and saliva sampling has to follow very strict protocols to be reliable [ 93 ], so our opinion is that future research should perhaps be focusing in plasma and serum from peripheral blood.

In-house meta-analysis

Our results show that, with a huge number of participants, the levels obtained with Abbexa and CUSABIO kits fit the consensus range seen in the literature review for alpha and beta-CGRP, respectively, and contribute to set a more standardized range of concentration for the peptides.

The effects of sex and age on the circulating levels of CGRP is a point which has not been explored deeply enough. While some studies affirm that CGRP can correlate with age [ 38 ], others do not find such correlation [ 35 , 70 ]. For the data obtained from our in-house samples we found that beta-CGRP correlated positively with age, contradicting previous results obtained with the same kit in plasma and saliva [ 35 ]. Besides, the sub-group comprised by males had different alpha-CGRP content that the female, a finding which had not been described. Taking all these data together, the results call for a stricter control of the group design, which would need to be carefully matched in terms of sex and age, to avoid the effect that these two parameters could have on the comparisons.

Also, as the discrimination between alpha and beta-CGRP in research papers has recently begun [ 38 , 69 , 70 , 71 ], we have shown that these two peptides do not correlate their circulating levels and therefore the results obtained from measuring one or the other are not interchangeable and could lead to opposite conclusions because these molecules can have different behaviours even within the same disorder [ 38 ].

Strengths and limitations

Our work has several strengths. Our literature review summarizes in an easy to understand way all the mess regarding CGRP measurements, showing all the differences not only in terms of results, but also in their aims, design, measuring methodologies and conclusions, allowing for a critical analysis and which will serve as a basis for future comparisons.

Due recent literature has begun to differentiate between alpha and beta-CGRP, we have performed all our experiments to continue doing so, in an effort to expand the knowledge about the different traits of the two molecules.

All the enrolled individuals of the analysis of exercising and duration of the sample processing had their blood extraction performed at the same day and time, limiting the variability that the circadian cycle might have on the levels of the peptide, and all of them were carried out at our laboratory facilities, ensuring an immediate processing and freeze of the serum. Samples for the long-term storage analysis were also obtained at the same time of the day and all of them were collected within a week and assayed for the first and the second time altogether, limiting the effects of different storage time until the determinations and intra-assay variations.

Nonetheless, it has also some limitations that need to be listed. Although we had a bigger list of ELISA kits that have been employed by other researchers, we could not test them all and we decided to probe only 4, including both competitive and sandwich ELISA targeting total, alpha and beta-CGRP. The validity of other kits apart from the ones included in this study would need to be evaluated separately. Also, the results derived from our methodological experiments should be tested in other samples sources as we only included serum because this is, in our opinion, the best sample source for CGRP determinations. For the analysis of our data base, we acknowledge that we did not account for some of the comorbidities of the patients when analysing the effects of sex and age, but because these samples were from our bio-bank their clinical information was limited to the original aim why they were obtained and therefore did not allow such kind of correction.

We have reviewed the different results obtained throughout the years measuring CGRP making an effort to highlight their differences in terms of aim, inclusion/exclusion criteria, methodology, data display and conclusions. We have also analysed the way these differences might have affect the CGRP levels reported and we have come to the conclusion that is not only the sample or the method (RIA or ELISA) but even the brand employed which ultimately determine the concentration range.

Finally, we have illustrated some new features of CGRP determinations in serum which are very valuable for the planning of future studies. Concentrations of alpha-CGRP and beta-CGRP seems to be about (median with IQR) 37–5 (28.2–54.4) pg/mL and 4.6 (2.4–6.4) pg/mL, respectively, according to our in-house analysis, which agrees with what can be seen from the literature review. The facts that serum kept refrigerated conserves the CGRP content up to 24 h and that moderate exercise does not exert a modulation effect on the concentrations will ease the design of sample extraction and processing protocols. Also, we point out that storage time should be controlled as a new way to ensure the validity of results, probably by the simultaneous enrolling of all the subjects included in the study and/or by assaying their samples within similar time-ranges from the extraction. Ultimately, we have shown that alpha and beta-CGRP should be analysed separately as the isoforms do not correlate their concentrations and it has been illustrated in the literature that these can have different behaviours within the same disorder.

Overall, this work has brought new methodological data to progress in our way to evaluate the actual role of CGRP as a migraine biomarker at the same time it has evaluated the previous advances with a critical point of view, trying to produce a constructive criticism that will help to progress in this challenging topic.

Availability of data and materials

No datasets were generated or analysed during the current study.

Abbreviations

Bicinchoninic acid protein assay

Chronic daily headache

Calcitonin gene-related peptide

Confidence interval

Chronic migraine

Combined contraception

Cerebrospinal fluid

Cubital vein

Enzyme immune assay

Enzyme-linked immunosorbent assay

Episodic migraine

Gingival crevicular fluid

Healthy controls

International classification of headache disorders 3rd edition

Interquartile range

Jugular vein

Migraine with aura

Medication overuse

Migraine without aura

Pituitary adenylate cyclase activating peptide-38

Pharmacokinetics

Post menopause

Radioimmuno assay

Regular menstrual cycle

Standard deviation

Standard error of mean

Vasoactive intestinal peptide

Without aura

Olesen J (2018) Headache Classification Committee of the International Headache Society (IHS) The International Classification of Headache Disorders, 3rd edition. Cephalalgia 38:1–211

Hong J Bin, Lange KS, Overeem LH, et al (2023) A Scoping Review and Meta-Analysis of Anti-CGRP Monoclonal Antibodies: Predicting Response. Pharmaceuticals 16:934. https://doi.org/10.3390/ph16070934

Zobdeh F, ben Kraiem A, Attwood MM, et al (2021) Pharmacological treatment of migraine: Drug classes, mechanisms of action, clinical trials and new treatments. Br J Pharmacol 178:4588–4607

Article   CAS   PubMed   Google Scholar  

Hepp Z, Dodick DW, Varon SF et al (2015) Adherence to oral migraine-preventive medications among patients with chronic migraine. Cephalalgia 35:478–488. https://doi.org/10.1177/0333102414547138

Article   PubMed   Google Scholar  

Edvinsson L, Haanes KA, Warfvinge K, DiN K (2018) CGRP as the target of new migraine therapies - Successful translation from bench to clinic. Nat Rev Neurol 14:338–350. https://doi.org/10.1038/s41582-018-0003-1

Gago-Veiga J-M, MGEAPGPMMÁG-G, Castañeda S, (2022) Treatment of migraine with monoclonal antibodies. Expert Opin Biol Ther 22:707–716. https://doi.org/10.1080/14712598.2022.2072207

Negro A, Martelletti P (2019) Gepants for the treatment of migraine. Expert Opin Investig Drugs 28:555–567. https://doi.org/10.1080/13543784.2019.1618830

Angus-Leppan H (2013) Migraine: mimics, borderlands and chameleons. Pract Neurol 13:308–318. https://doi.org/10.1136/practneurol-2012-000502

Amara SG, Jonas V, Rosenfeld MG, et al (1982) Alternative RNA processing in calcitonin gene expression generates mRN As encoding different polypeptide products. Nature 298:240–244. https://doi.org/10.1038/298240a0

Brain SD, Macintyre lAIN, Williams TJ, et al (1986) A second form of human calcitonin gene-related peptide which is a potent vasodilator. Eur J Phamacol 124:349–352. https://doi.org/10.1016/0014-2999(86)90238-4

Russo AF, Hay DL (2023) CGRP physiology, pharmacology, and therapeutic targets: migraine and beyond. Physiol Rev 103:1565–1644

Schütz B, Mauer D, Salmon AM et al (2004) Analysis of the cellular expression pattern of β-CGRP in α-CGRP-deficient mice. J Comp Neurol 476:32–43. https://doi.org/10.1002/cne.20211

Mulderry PK, Ghatei MA, Spokes RA et al (1988) Differential expression of α-CGRP and β-CGRP by primary sensory neurons and enteric autonomic neurons of the rat. Neuroscience 25:195–205. https://doi.org/10.1016/0306-4522(88)90018-8

Goadsby PJ, Edvinsson L, Ekman R (1990) Vasoactive peptide release in the extracerebral circulation of humans during migraine headache. Ann Neurol 28:183–187. https://doi.org/10.1002/ana.410280213

Gallai V, Sarchielli P, Floridi A, et al (1995) Vasoactive peptide levels in the plasma of young migraine patients with and without aura assessed both interictally and ictally. Cephalalgia 15:384–390. https://doi.org/10.1046/j.1468-2982.1995.150538

Sarchielli P, Alberti A, Codini M et al (2000) Nitric oxide metabolites, prostaglandins and trigeminal vasoactive peptides in internal jugular vein blood during spontaneous migraine attacks. Cephalalgia 20:907–918

Ashina M, Bendtsen L, Jensen R, et al (2000) Evidence for increased plasma levels of calcitonin gene-related peptide in migraine outside of attacks. Pain 83:133–138. https://doi.org/10.1016/s0304-3959(00)00232-3

Sarchielli P, Alberti A, Floridi A, Gallai V (2001) Levels of nerve growth factor in cerebrospinal fluid of chronic daily headache patients. Neurology 57:132–134. https://doi.org/10.1212/WNL.57.1.132

Gallai V, Alberti A, Gallai B, et al (2003) Glutamate and nitric oxide pathway in chronic daily headache: evidence from cerebrospinal fluid. Cephalalgia 23:166–174. https://doi.org/10.1046/j.1468-2982.2003.00552.x

Bellamy JL, Cady RK, Durham PL (2006) Salivary levels of CGRP and VIP in rhinosinusitis and migraine patients. Headache 46:24–33. https://doi.org/10.1111/j.1526-4610.2006.00294.x

Fusayasu E, Kowa H, Takeshima T et al (2007) Increased plasma substance P and CGRP levels, and high ACE activity in migraineurs during headache-free periods. Pain 128:209–214. https://doi.org/10.1016/j.pain.2006.09.017

Sarchielli P, Pini LA, Coppola F et al (2007) Endocannabinoids in chronic migraine: CSF findings suggest a system failure. Neuropsychopharmacology 32:1384–1390. https://doi.org/10.1038/sj.npp.1301246

Jang M-U, Park J-W, Kho H-S et al (2011) Plasma and saliva levels of nerve growth factor and neuropeptides in chronic migraine patients. Oral Dis 17:187–193. https://doi.org/10.1111/j.1601-0825.2010.01717.x

Rodríguez-Osorio X, Sobrino T, Brea D et al (2012) Endothelial progenitor cells: A new key for endothelial dysfunction in migraine. Neurology 79:474–479. https://doi.org/10.1212/WNL.0b013e31826170ce

Cernuda-Morollón E, Larrosa D, Ramón C et al (2013) Interictal increase of CGRP levels in peripheral blood as a biomarker for chronic migraine. Neurology 81:1191–1196. https://doi.org/10.1212/WNL.0b013e3182a6cb72

Cernuda-Morollõn E, Martínez-Camblor P, Ramõn C et al (2014) CGRP and VIP levels as predictors of efficacy of onabotulinumtoxin type A in chronic migraine. Headache 54:987–995. https://doi.org/10.1111/head.12372

Fekrazad R, Sardarian A, Azma K et al (2018) Interictal levels of calcitonin gene related peptide in gingival crevicular fluid of chronic migraine patients. Neurol Sci 39:1217–1223. https://doi.org/10.1007/s10072-018-3340-3

Domínguez C, Vieites-Prado A, Pérez-Mato M et al (2018) CGRP and PTX3 as Predictors of Efficacy of Onabotulinumtoxin Type A in Chronic Migraine: An Observational Study. Headache 58:78–87. https://doi.org/10.1111/head.13211

Leira Y, Ameijeira P, Domínguez C et al (2019) Periodontal inflammation is related to increased serum calcitonin gene-related peptide levels in patients with chronic migraine. J Periodontol 90:1088–1095. https://doi.org/10.1002/JPER.19-0051

Han D (2019) Association of serum levels of calcitonin gene-related peptide and cytokines during migraine attacks. Ann Indian Acad Neurol 22:277–281. https://doi.org/10.4103/aian.AIAN_371_18

Article   PubMed   PubMed Central   Google Scholar  

Kamm K, Straube A, Ruscheweyh R (2019) Calcitonin gene-related peptide levels in tear fluid are elevated in migraine patients compared to healthy controls. Cephalalgia 39:1535–1543. https://doi.org/10.1177/0333102419856640

Pérez-Pereda S, Toriello-Suárez M, Ocejo-Vinyals G et al (2020) Serum CGRP, VIP, and PACAP usefulness in migraine: a case–control study in chronic migraine patients in real clinical practice. Mol Biol Rep 47:7125–7138. https://doi.org/10.1007/s11033-020-05781-0

Irimia P, Martínez-Valbuena I, Mínguez-Olaondo A et al (2021) Interictal amylin levels in chronic migraine patients: A case-control study. Cephalalgia 41:604–612. https://doi.org/10.1177/0333102420977106

Vural S, Albayrak L (2022) Can calcitonin gene-related peptide (CGRP) and pentraxin-3 (PTX-3) be useful in diagnosing acute migraine attack? J Recept Signal Transduction 42:562–566. https://doi.org/10.1080/10799893.2022.2097264

Article   CAS   Google Scholar  

Alpuente A, Gallardo VJ, Asskour L et al (2022) Salivary CGRP can monitor the different migraine phases: CGRP (in)dependent attacks. Cephalalgia 42:186–196

Alpuente A, Gallardo VJ, Asskour L et al (2022) Salivary CGRP and erenumab treatment response: towards precision medicine in migraine. Ann Neurol 92:846–859. https://doi.org/10.1002/ana.26472

Liu J, Wang G, Dan Y, Liu X (2022) CGRP and PACAP-38 play an important role in diagnosing pediatric migraine. J Headache Pain 23:1–13. https://doi.org/10.1186/s10194-022-01435-7

Gárate G, González-Quintanilla V, González A et al (2023) Serum Alpha and Beta-CGRP Levels in chronic migraine patients before and after monoclonal antibodies against CGRP or its receptor. Ann Neurol 94:285–294. https://doi.org/10.1002/ana.26658

Goadsby PJ, Edvinsson L (1993) The Trigreminovascular System and Migraine: Studies Characterizing Cerebrovascular and Neuropeptide Changes Seen in Humans and Cats. Annals of Neurology 33:48–56. https://doi.org/10.1002/ana.410330109

Juhasz G, Zsombok T, Jakab B, et al (2005) Sumatriptan causes parallel decrease in plasma calcitonin gene-related peptide (CGRP) concentration and migraine headache during nitroglycerin induced migraine attack. Cephalalgia 25:179–183. https://doi.org/10.1111/j.1468-2982.2005.00836.x

Sarchielli P, Pini LA, Zanchin G et al (2006) Clinical-biochemical correlates of migraine attacks in rizatriptan responders and non-responders. Cephalalgia 26:257–265. https://doi.org/10.1111/j.1468-2982.2005.01016.x

Cady RK, Vause CV, Ho TW et al (2009) Elevated saliva calcitonin gene-related peptide levels during acute migraine predict therapeutic response to rizatriptan. Headache 49:1258–1266. https://doi.org/10.1111/j.1526-4610.2009.01523.x

Cady R, Turner I, Dexter K et al (2014) An exploratory study of salivary calcitonin gene-related peptide levels relative to acute interventions and preventative treatment with onabotulinumtoxinA in chronic migraine. Headache 54:269–277. https://doi.org/10.1111/head.12250

Cernuda-Morollón E, Ramón C, Martínez-Camblor P et al (2015) OnabotulinumtoxinA decreases interictal CGRP plasma levels in patients with chronic migraine. Pain 156:820–824. https://doi.org/10.1097/j.pain.0000000000000119

Lassen LH, Haderslev PA, Jacobsen VB et al (2002) CGRP may play a causative role in migraine. Cephalalgia 22:54–61. https://doi.org/10.1046/j.1468-2982.2002.00310.x

Nicolodi M, Bianco E Del (1990) Sensory neuropeptides (substance P, calcitonin gene-related peptide) and vasoactive intestinal polypeptide in human saliva: their pattern in migraine and cluster headache. Cephalalgia 10:39–50. https://doi.org/10.1046/j.1468-2982.1990.1001039.x

Friberg L, Olesen J, Skyhøi Olsen T et al (1994) Absence of vasoactive peptide release from brain to cerebral circulation during onset of migraine with aura. Cephalalgia 14:47–54

Tvedskov JF, Lipka K, Ashina M et al (2005) No increase of calcitonin gene-related peptide in jugular blood during migraine. Ann Neurol 58:561–568. https://doi.org/10.1002/ana.20605

Lee MJ, Lee SY, Cho S, et al (2018) Feasibility of serum CGRP measurement as a biomarker of chronic migraine: a critical reappraisal. Journal of Headache and Pain 19:. https://doi.org/10.1186/s10194-018-0883-x

Latif R, Rafique N, Al AL et al (2021) Diagnostic accuracy of serum calcitonin gene-related peptide and apolipoprotein e in migraine: A preliminary study. Int J Gen Med 14:851–856. https://doi.org/10.2147/IJGM.S303350

Hanci F, Kilinc YB, Kilinc E et al (2021) Plasma levels of vasoactive neuropeptides in pediatric patients with migraine during attack and attack-free periods. Cephalalgia 41:166–175. https://doi.org/10.1177/0333102420957588

de Vries Lentsch S, Garrelds IM, Danser AHJ, et al (2022) Serum CGRP in migraine patients using erenumab as preventive treatment. J Headache Pain 23. https://doi.org/10.1186/s10194-022-01483-z

Goldstein ED, Gopal N, Badi MK et al (2023) CGRP, Migraine, and Brain MRI in CADASIL: A Pilot Study. Neurologist 28:231–236. https://doi.org/10.1097/NRL.0000000000000478

Neyal A, Ekmekyapar Fırat Y, Çekmen MB et al (2023) Calcitonin gene-related peptide and adrenomedullin levels during ictal and interictal periods in patients with migraine. Cureus. https://doi.org/10.7759/cureus.37843

Messlinger K, Vogler B, Kuhn A et al (2021) CGRP measurements in human plasma – a methodological study. Cephalalgia 41:1359–1373. https://doi.org/10.1177/03331024211024161

Kamm K (2022) CGRP and Migraine: What Have We Learned From Measuring CGRP in Migraine Patients So Far? Front Neurol 13:. https://doi.org/10.3389/fneur.2022.930383

Greco R, De Icco R, Demartini C, et al (2020) Plasma levels of CGRP and expression of specific microRNAs in blood cells of episodic and chronic migraine subjects: towards the identification of a panel of peripheral biomarkers of migraine? J Headache Pain 21. https://doi.org/10.1186/s10194-020-01189-0

Raffaelli B, Overeem LH, Mecklenburg J et al (2021) Plasma calcitonin gene-related peptide (CGRP) in migraine and endometriosis during the menstrual cycle. Ann Clin Transl Neurol 8:1251–1259. https://doi.org/10.1002/acn3.51360

Article   CAS   PubMed   PubMed Central   Google Scholar  

Raffaelli B, Storch E, Overeem LH et al (2023) Sex hormones and calcitonin gene-related peptide in women with migraine: a cross-sectional, matched cohort study. Neurology 100:E1825–E1835. https://doi.org/10.1212/WNL.0000000000207114

Frank F, Kaltseis K, Messlinger K, Broessner G (2022) Short Report of Longitudinal CGRP-Measurements in Migraineurs During a Hypoxic Challenge. Front Neurol 13. https://doi.org/10.3389/fneur.2022.925748

Raffaelli B, Terhart M, Fitzek MP et al (2023) Change of CGRP plasma concentrations in migraine after discontinuation of CGRP-(Receptor) monoclonal antibodies. Pharmaceutics 15:1–9. https://doi.org/10.3390/pharmaceutics15010293

Etefagh HH, Shahmiri SS, Melali H et al (2022) Bariatric surgery in migraine patients: CGRP level and weight loss. Obes Surg 32:3635–3640. https://doi.org/10.1007/s11695-022-06218-2

Juhasz G, Zsombok T, Modos EA et al (2003) NO-induced migraine attack: Strong increase in plasma calcitonin gene-related peptide (CGRP) concentration and negative correlation with platelet serotonin release. Pain 106:461–470. https://doi.org/10.1016/j.pain.2003.09.008

Pellesi L, Al-Karagholi MAM, De Icco R, et al (2022) Plasma Levels of CGRP During a 2-h Infusion of VIP in Healthy Volunteers and Patients With Migraine: An Exploratory Study. Front Neurol 13. https://doi.org/10.3389/fneur.2022.871176

Guo S, Vollesen ALH, Hansen YBL et al (2017) Part II: Biochemical changes after pituitary adenylate cyclase-activating polypeptide-38 infusion in migraine patients. Cephalalgia 37:136–147. https://doi.org/10.1177/0333102416639517

Abbas A, Moustafa R, Shalash A et al (2022) Serum CGRP changes following ultrasound-guided bilateral greater-occipital-nerve block. Neurol Int 14:199–206. https://doi.org/10.3390/neurolint14010016

Riesco N, Cernuda-Morollón E, Martínez-Camblor P et al (2017) Relationship between serum levels of VIP, but not of CGRP, and cranial autonomic parasympathetic symptoms: A study in chronic migraine patients. Cephalalgia 37:823–827. https://doi.org/10.1177/0333102416653232

Babapour M, Khorvash F, Rouhani MH, et al (2022) Effect of soy isoflavones supplementation on migraine characteristics , mental status and calcitonin gene - related peptide ( CGRP ) levels in women with migraine : results of randomised controlled trial. Nutr J 1–11. https://doi.org/10.1186/s12937-022-00802-z

Gárate G, Toriello M, González-Quintanilla V, et al (2023) Serum alpha-CGRP levels are increased in COVID-19 patients with headache indicating an activation of the trigeminal system. BMC Neurol 23. https://doi.org/10.1186/s12883-023-03156-z

Gárate G, Pascual M, Rivero M, et al (2023) Serum Calcitonin Gene-Related Peptide α and β Levels are Increased in COVID-19 Inpatients. Arch Med Res 54. https://doi.org/10.1016/j.arcmed.2022.12.002

Gárate G, Pascual M, Olmos JM, et al (2022) Increase in Serum Calcitonin Gene-Related Peptide β (CGRPβ) Levels in COVID-19 Patients with Diarrhea: An Underlying Mechanism? Dig Dis Sci 67. https://doi.org/10.1007/s10620-022-07473-0

Alhabbab RY (2018) Radioimmunoassay (RIA). Basic Serological Testing. Springer International Publishing, Cham, pp 77–81

Chapter   Google Scholar  

Sadat TM, Ahmed M (2022) Enzyme-Linked Immunosorbent Assay (ELISA). In: Christian SL (ed) Cancer Cell Biology: Methods and Protocols. Springer, US, New York, NY, pp 115–134

Google Scholar  

Wimalawansa SJ (1991) Circadian variation of plasma calcitonin gene-related peptide in man. J Neuroendocrinol 3:319–322. https://doi.org/10.1111/j.1365-2826.1991.tb00281.x

de Vries LS, Rubio-Beltrán E, MaassenVanDenBrink A (2021) Changing levels of sex hormones and calcitonin gene-related peptide (CGRP) during a woman’s life: Implications for the efficacy and safety of novel antimigraine medications. Maturitas 145:73–77. https://doi.org/10.1016/j.maturitas.2020.12.012

Jonhagen S, Ackermann P, Saartok T, Renstrom PA (2006) Calcitonin gene related peptide and neuropeptide Y in skeletal muscle after eccentric exercise: A microdialysis study. Br J Sports Med 40:264–267. https://doi.org/10.1136/bjsm.2005.022731

Tarperi C, Sanchis-Gomar F, Montagnana M et al (2020) Effects of endurance exercise on serum concentration of calcitonin gene-related peptide (CGRP): A potential link between exercise intensity and headache. Clin Chem Lab Med 58:1707–1712. https://doi.org/10.1515/cclm-2019-1337

Kraenzlin ME, Ch’ng JLC, Mulderry PK, et al (1985) Infusion of a novel peptide, calcitonin gene-related peptide (CGRP) in man. Pharmacokinetics and effects on gastric acid secretion and on gastrointestinal hormones. Regul Pept 10:189–197. https://doi.org/10.1016/0167-0115(85)90013-8

Reich A, Orda A, Wiśnicka B, Szepietowski JC (2007) Plasma concentration of selected neuropeptides in patients suffering from psoriasis. Exp Dermatol 16:421–428. https://doi.org/10.1111/j.1600-0625.2007.00544.x

Smillie SJ, Brain SD (2011) Calcitonin gene-related peptide (CGRP) and its role in hypertension. Neuropeptides 45:93–104

Li FJ, Zou YY, Cui Y et al (2013) Calcitonin gene-related peptide is a promising marker in ulcerative colitis. Dig Dis Sci 58:686–693. https://doi.org/10.1007/s10620-012-2406-y

Edvinsson L, Ekman R, Goadsby PJ (2010) Measurement of vasoactive neuropeptides in biological materials: Problems and pitfalls from 30 years of experience and novel future approaches. Cephalalgia 30:761–766

Krause A, Lott D, Dingemanse J (2021) Estimation of attainment of steady-state conditions for compounds with a long half-life. J Clin Pharmacol 61:82–89. https://doi.org/10.1002/jcph.1701

Cho HY, Choi GW, Lee YB (2019) Interpretation of non-clinical data for prediction of human pharmacokinetic parameters: In vitro-in vivo extrapolation and allometric scaling. Pharmaceutics 11. https://doi.org/10.3390/pharmaceutics11040168

Domschke S, Domschke W, Bloom2 SR, et al (1978) Vasoactive intestinal peptide in man: pharmacokinetics, metabolic and circulatory effects. Gut 19:1049–1053. https://doi.org/10.1136/gut.19.11.1049

Mathiesen DS, Lund A, Holst JJ et al (2022) Amylin and calcitonin – physiology and pharmacology. Eur J Endocrinol 186:R93–R111

Birk S, Sitarz JT, Petersen KA et al (2007) The effect of intravenous PACAP38 on cerebral hemodynamics in healthy volunteers. Regul Pept 140:185–191. https://doi.org/10.1016/j.regpep.2006.12.010

Al-Keilani MS, Almomani BA, Al-Sawalha NA et al (2022) Significance of serum VIP and PACAP in multiple sclerosis: an exploratory case–control study. Neurol Sci 43:2621–2630. https://doi.org/10.1007/s10072-021-05682-5

Ochoa-Callejero L, García-Sanmartín J, Villoslada-Blanco P, et al (2021) circulating levels of calcitonin gene-related peptide are lower in COVID-19 patients. J Endocr Soc 5. https://doi.org/10.1210/jendso/bvaa199

Wyon Y, Hammar M, Theodorsson E, Lundeberg T (1998) Effects of physical activity and acupuncture on calcitonin gene-related peptide immunoreactivity in different parts of the rat brain and in cerebrospinal fluid, serum and urine. Acta Physiol Scand 162:517–522. https://doi.org/10.1046/j.1365-201X.1998.0317e.x

Parnow A, Gharakhanlou R, Gorginkaraji Z, et al (2012) Effects of endurance and resistance training on calcitonin gene-related peptide and acetylcholine receptor at slow and fast twitch skeletal muscles and sciatic nerve in male wistar rats. Int J Pept 2012. https://doi.org/10.1155/2012/962651

Kooshki R, Abbasnejad M, Shamsizadeh A, et al (2020) Physical exercise enhances vulnerability to migraine headache associated with CGRP up-expression in trigeminal nucleus caudalis of stressed rats. Neurol Res 42:952–958. https://doi.org/10.1080/01616412.2020.1794243

Jasim H, Carlsson A, Hedenberg-Magnusson B, et al (2018) Saliva as a medium to detect and measure biomarkers related to pain. Sci Rep 8. https://doi.org/10.1038/s41598-018-21131-4

Download references

Acknowledgements

Not applicable.

This study has been founded by Instituto de Salud Carlos III (ISCII) through the project PI20/01358, co-funded by Fondos Europeos de Desarrollo Regional (FEDER), “Una manera de hacer Europa”, and through the project PMP22/00183, co-founded by the Recovery and Resilience Plan by The European Union NextGenerationUE.

Author information

Authors and affiliations.

Instituto de Investigación Marqués de Valdecilla (IDIVAL), Hospital Universitario Marqués de Valdecilla & Universidad de Cantabria, Santander, Spain

Gabriel Gárate, Julio Pascual, Marta Pascual-Mato, Jorge Madera, María Muñoz-San Martín & Vicente González-Quintanilla

You can also search for this author in PubMed   Google Scholar

Contributions

GG, VGQ, JP designed the study, collected and analysed the data and wrote the manuscript. VGQ, JM, MPM and JP recruited participants for the study. All authors reviewed, contributed, and edited the final draft. All authors approved the final version.

Corresponding author

Correspondence to Gabriel Gárate .

Ethics declarations

Ethics approval and consensus to participate.

The study was approved by the Ethics Committee of Cantabria and its approval has been published in the record 28/2020 of December 11, 2020. All participants gave written informed consent for their inclusion in the study.

Consent for publication

Competing interests.

The authors declare no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Cite this article.

Gárate, G., Pascual, J., Pascual-Mato, M. et al. Untangling the mess of CGRP levels as a migraine biomarker: an in-depth literature review and analysis of our experimental experience. J Headache Pain 25 , 69 (2024). https://doi.org/10.1186/s10194-024-01769-4

Download citation

Received : 18 March 2024

Accepted : 09 April 2024

Published : 29 April 2024

DOI : https://doi.org/10.1186/s10194-024-01769-4

Share this article

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

The Journal of Headache and Pain

ISSN: 1129-2377

literature review data extraction

  • Research article
  • Open access
  • Published: 19 October 2020

Development, testing and use of data extraction forms in systematic reviews: a review of methodological guidance

  • Roland Brian Büchter   ORCID: orcid.org/0000-0002-2437-4790 1 ,
  • Alina Weise 1 &
  • Dawid Pieper 1  

BMC Medical Research Methodology volume  20 , Article number:  259 ( 2020 ) Cite this article

34k Accesses

33 Citations

27 Altmetric

Metrics details

Data extraction forms link systematic reviews with primary research and provide the foundation for appraising, analysing, summarising and interpreting a body of evidence. This makes their development, pilot testing and use a crucial part of the systematic reviews process. Several studies have shown that data extraction errors are frequent in systematic reviews, especially regarding outcome data.

We reviewed guidance on the development and pilot testing of data extraction forms and the data extraction process. We reviewed four types of sources: 1) methodological handbooks of systematic review organisations (SRO); 2) textbooks on conducting systematic reviews; 3) method documents from health technology assessment (HTA) agencies and 4) journal articles. HTA documents were retrieved in February 2019 and database searches conducted in December 2019. One author extracted the recommendations and a second author checked them for accuracy. Results are presented descriptively.

Our analysis includes recommendations from 25 documents: 4 SRO handbooks, 11 textbooks, 5 HTA method documents and 5 journal articles. Across these sources the most common recommendations on form development are to use customized or adapted standardised extraction forms (14/25); provide detailed instructions on their use (10/25); ensure clear and consistent coding and response options (9/25); plan in advance which data are needed (9/25); obtain additional data if required (8/25); and link multiple reports of the same study (8/25). The most frequent recommendations on piloting extractions forms are that forms should be piloted on a sample of studies (18/25); and that data extractors should be trained in the use of the forms (7/25). The most frequent recommendations on data extraction are that extraction should be conducted by at least two people (17/25); that independent parallel extraction should be used (11/25); and that procedures to resolve disagreements between data extractors should be in place (14/25).

Conclusions

Overall, our results suggest a lack of comprehensiveness of recommendations. This may be particularly problematic for less experienced reviewers. Limitations of our method are the scoping nature of the review and that we did not analyse internal documents of health technology agencies.

Peer Review reports

Evidence-based medicine has been defined as the integration of the best-available evidence and individual clinical expertise [ 1 ]. Its practice rests on three fundamental principles: 1) that knowledge of the evidence should ideally come from systematic reviews, 2) that the trustworthiness of the evidence should be taken into account and 3) that the evidence does not speak for itself and appropriate decision making requires trade-offs and consideration of context [ 2 ]. While the first principle directly speaks to the importance of systematic reviews, the second and third have important implications for their conduct. The second principle implies that systematic reviews should be based on rigorous, bias-reducing methods. The third principle implies that decision makers require sufficient information on the primary evidence to make sense of a review’s findings and apply them to their context.

Broadly speaking, a systematic review consists of five steps: 1) formulating a clear question, 2) searching for studies able to answer this question, 3) assessing and extracting data from the studies, 4) synthesizing the data and 5) interpreting the findings [ 3 ]. At a minimum, steps two to five rely on appropriate and thorough data collection methods. In order to collate data from primary studies, standardised data collection forms are used [ 4 ]. These link systematic reviews with primary research and provide the foundation for appraising, analysing, summarising and interpreting a body of evidence. This makes their development, pilot testing and application a crucial part of the systematic reviews process.

Studies on the prevalence and impact of data extraction errors have recently been summarised by Mathes and colleagues [ 5 ]. They identified four studies that looked at the frequency of data extraction errors in systematic reviews. The error rate for outcome data ranged from 8 to 63%. The impact of the errors on summary results and review conclusions varied. In one of the studies the effect size from the meta-analytic point estimates changed by more than 0.1 in 70% of cases (measured as standardised differences in means) [ 6 ]. Considering that most interventions have small to moderate effects, this can have a large impact on conclusions and decisions. Little research has been conducted on extraction errors relating to non-outcome data.

The importance of a rigorous data extraction process is not restricted to outcome data. As previously mentioned, users of systematic reviews need sufficient information on non-outcome data to make sense of the underlying primary studies and assess their applicability. Despite this, many systematic reviews do not sufficiently report this information. In one study almost 90% of systematic reviews of interventions did not provide the information required for treatments to be replicated in practice – compared to 35% of clinical trials [ 7 ]. While there are several possible reasons for this – including the quality of reporting – insufficient data collection forms or procedures may to contribute to the problem.

Against this background, we sought to review the guidance that is available to systematic reviewers for the development and pilot testing of data extraction forms and the data extraction process, these being central elements in systematic reviews.

This project was conducted as part of a dissertation, for which an exposé is available in German. We did not publish a protocol for this descriptive analysis, however. As there are no specific reporting guidelines for this type of methodological review, we reported our methods in accordance with the PRISMA statement as applicable [ 8 ].

Systematic reviews are conducted in a variety of different contexts – most notably as part of dissertations or academic research projects, as standalone projects, by health technology assessment (HTA) agencies and by systematic review organisations (SROs). Thus, we looked at a broad group of sources to identify recommendations:

Methodological handbooks from major SROs

Textbooks aimed at students and researchers endeavouring to conduct a systematic review

Method documents from HTA agencies

Published journal articles making recommendations on how to conduct a systematic review or how to develop data extraction forms

While the sources that we searched mainly focus on medicine and health, we did not exclude other health-related areas such as the social sciences or psychology.

Data sources

Regarding the methodological handbooks from SROs, we considered the following to be the most relevant to our analysis:

The Centre for Reviews and Dissemination’s guidance for undertaking reviews in health care (CRD guidance)

The Cochrane Handbook of Systematic Reviews of Interventions (Cochrane Handbook)

The Institute of Medicine’s Finding What Works in Health Care: Standards for Systematic Reviews (IoM Standards)

The Joanna Briggs Institute’s Reviewer Manual (JBI Manual)

The list of textbooks was based on a recently published article that reviewed systematic review definitions used in textbooks and other sources [ 9 ]. The authors did not carry out a systematic search for textbooks, but included textbooks from a broad range of disciplines including medicine, nursing, education, health library specialties and the social sciences published between 1998 and 2017. These textbooks included information on data extraction in systematic reviews, but none of them focussed on this topic exclusively.

Regarding the HTA agencies, we compiled a list of all member organisations of the European Network for Health Technology Assessment (EUnetHTA), the International Network of Agencies for Health Technology Assessment (INAHTA), Health Technology Assessment international (HTAi) and the Health Technology Assessment Network of the Americas (Red de Evaluación de Tecnologías en Salud de las Américas – RedETSA). The reference month for the compilation of this list was January 2019, the list is included in additional file  1 . We searched these websites for potentially relevant documents and downloaded these. We then reviewed the full texts of all documents for eligibility and included those that fulfilled our inclusion criteria. The website searches and the full text screening of the documents were conducted by two authors independently (RBB and AW). Disagreements were resolved by discussion. We also planned to include the newly founded Asia-Pacific HTA network (HTAsiaLink), but the webpage had not yet been launched during our research period.

To identify relevant journal articles, we first searched the Scientific Resource Center’s Methods Library (SRCML). This is a bibliography of publications relevant to evidence synthesis methods which was maintained until the third quarter of 2017 and has been archived as a RefWorks library. Because the SRCML is no longer updated, we conducted a supplementary search of Medline from the 1st of October 2017 to the 12th of December 2019. Finally, we searched the Cochrane Methodology Register (CMR), a reference database of publications relevant to the conduct of systematic reviews that was curated by the Cochrane Methods Group. The CMR was discontinued on the 31st of May 2012 and has been archived. Due to the limited search and export functions of these archived SRCML and CMR, we used pragmatic search methods for these sources. The search terms that were used for the databases searches are included in additional file  2 . The titles and abstracts from the database searches and the full texts of potentially relevant articles were screened for eligibility by two authors independently (RBB and AW). Disagreements were resolved by discussion or, if this was unsuccessful, arbitration with DP.

Inclusion criteria

To be eligible for inclusion in our review, documents had to fulfil the following criteria:

Published method document (e.g. handbook, guidance, standard operating procedure, manual), academic textbook or journal article

Include recommendations on the development or piloting of data extraction forms or the data extraction process in systematic reviews

Available in English or German

We excluded empirical research on different data extraction methods as well as papers on technical aspects, because these have been reviewed elsewhere [ 10 , 11 , 12 ]. This includes, for example, publications on the merits and downsides of different types of software (word processors, spreadsheets, database or specialised software) or the use of pencil and paper versus electronic extraction forms. We also excluded conference abstracts and other documents not published in full.

For journal articles we specified the inclusion and exclusion criteria more narrowly as this group includes a much broader variety of sources (for example we excluded “primers”, i.e. articles that provide an introduction to reading or appraising a systematic review for practitioners). The full list of inclusion and exclusion criteria for journal articles is published in additional file 2 .

Items of interest

We looked at a variety of items relevant to three categories of interest:

the development of data extraction forms,

the piloting of data extraction forms and

the data extraction process.

To our knowledge, no comprehensive list of potentially relevant items exists. We therefore developed a list of potentially relevant items based on iterative reading of the most influential method handbooks from SROs (see above) and our personal experience. The full list of items included in our extraction form is reported in additional file  3 together with a proposed rationale for each item.

We did not examine recommendations regarding the specific information that should be extracted from studies, because this depends on a review’s question. For example, reviewers might choose to include information on surrogate outcomes in order to aid interpretation of effects or they might choose not to, because they often poorly correlate with clinical endpoints and the researchers are interested in patient-relevant outcomes [ 13 , 14 ]. Furthermore, the specific information that is extracted for a review depends on the area of interest with special requirements for complex intervention or adverse effects reviews, for example [ 15 ]. For the same reason, we did not examine recommendations regarding specific methodological or statistical aspects. For instance, when a generic inverse variance meta-analysis is conducted, standard errors are of interest, whereas in other cases standard deviations may be preferably extracted.

  • Data extraction

One author developed the first draft of the data extraction form to gather information on the items of interest. This was reviewed by DP and complemented and revised after discussion. We collected bibliographic data, direct quotations on recommendations from the source text and page numbers.

Each item was coded using a coding scheme of five possible attributes:

recommendation for the use of this method

recommendation against the use of this method

optional use of this method

a general statement on this method without a recommendation

method not mentioned

For some items descriptive information was of additional interest. This included specific recommendations on the sample of studies that should be used to pilot the data extraction form or the experience or expertise of the reviewers that should be involved. Descriptive information was copied and pasted into the form. The form also included an open field for comments in case any additional items of interest were identified.

One author (RBB) extracted the information of interest from the included documents using the final version of the extraction form. A second author double-checked the information for each of the extracted items (AW). Discrepancies were resolved by discussion or by arbitration with DP.

During extraction, one major change was required to the form. Initially, we considered quantifying agreement only during the piloting phase of an extraction form, but later realised that some sources recommended this for the extraction phase of a review. We thus added items on quantifying agreement to this category.

Data analysis

We separately analysed and reported the four groups of documents (handbooks from SROs, documents from HTA agencies, textbooks and journal articles) and the three categories of interest (development, piloting and extraction). We summarised the results of our findings descriptively. We also aggregated the results across sources for each item using frequencies. Additional information is presented descriptively in the text.

In our primary analysis we only included documents that made recommendations for interventional reviews or generic recommendations. We did this because almost all included documents focussed on these types of reviews and, more importantly, to avoid inclusion of multiple recommendations from one institution. This was particularly relevant for the Joanna Briggs Institute’s Reviewer Manual which at the time of our analysis had 10 separate chapters on a variety of different systematic review types. The decision to restrict the primary analysis to documents focussing on interventional reviews and generic documents was made post hoc. Results for other types of reviews (e.g. scoping reviews, umbrella reviews, economic reviews) are presented as a secondary analysis.

We identified and searched 158 webpages of HTA agencies via the member lists of EUnetHTA, INAHTA, HTAi and RedETSA (see additional file 1 ). This resulted in 155 potentially relevant method documents from 67 agencies. After full text screening, 6 documents remained that fulfilled our inclusion criteria. The database searches resulted in 2982 records. After title and abstract screening, 15 potentially relevant full texts remained. Of these 5 fulfilled our inclusion criteria. A PRISMA flow chart depicting the screening process for the database searches is provided in additional file 2 and for the HTA method documents in additional file 1 .

In total, we collected data from 14 chapters in 4 handbooks of SROs [ 16 , 17 , 18 , 19 ], 11 textbooks [ 3 , 20 , 21 , 22 , 23 , 24 , 25 , 26 , 27 , 28 , 29 ], 6 method documents from HTA agencies [ 30 , 31 , 32 , 33 , 34 , 35 ] and 5 journal articles [ 36 , 37 , 38 , 39 , 40 ]. Additional file  4 lists all documents that fulfilled our inclusion criteria. In our primary analysis we describe recommendations from a total of 25 sources: 4 chapters from 4 SRO handbooks, 11 textbooks, 5 method documents from HTA agencies and 5 journal articles. Our secondary analysis on recommendations for non-interventional systematic reviews is included in Additional file  5 and the detailed results for the primary analysis in Additional file  6 .

Synthesis of the primary analysis

In sum, we analysed recommendations from 25 sources in our primary analysis. The most frequent recommendations on the development of extraction forms are to use customised or adapted standardised extraction forms (14/25); provide detailed instructions on their use (10/25); ensure clear and consistent coding and response options (9/25); plan in advance which data are needed (9/25); obtain additional data if required (8/25); and link multiple reports of the same study (8/25).

The most frequent recommendations on piloting extractions forms are that forms should be piloted on a sample of studies (18/25); and that data extractors should be trained in the use of the forms (7/25).

The most frequent recommendations on data extraction are that data extraction should be conducted by at least two people (17/25); that independent parallel extraction should be used (11/25); and that procedures to resolve disagreements between data extractors should be in place (14/25).

To provide a more comprehensible overview and illustrate areas where guidance is sparse, we have aggregated the results for definite recommendations (excluding optional recommendations or general statements) in Tables 1 , 2 and 3 . To avoid any misconceptions, we emphasise that by aggregating these results we by no means suggest that all items are of equal importance. Some are in fact mutually exclusive or interconnected.

The following sections provide details for each groups of documents sorted by the three categories of interest.

Handbooks of systematic review organisations

Category: development of extraction forms.

Three handbooks recommend that reviewers should plan in advance which data to extract [ 16 , 17 , 18 ]. Furthermore, three recommended that reviewers develop a customized data extraction form or adapt an existing form to meet the specific review needs [ 17 , 18 , 19 ]. In contrast, the JBI recommends use of their own standardised data extraction form, but allows reviewers to use others, if this is justified and the forms are described [ 16 ]. All four handbooks recommend that reviewers link multiple reports of the same study to avoid multiple inclusions of the same data [ 16 , 17 , 18 , 19 ]. Three handbooks make statements on strategies for obtaining unpublished data [ 16 , 17 , 18 ]. The Cochrane Handbook recommends contacting authors to obtain additional data, while the CRD guidance makes a general statement in light of the chances of success and resources available. The JBI manual makes this optional but requires the systematic reviewers to report whether authors of included studies are contacted in the review protocol.

Two handbooks recommend that the data collection form includes consistent and clear coding instructions and response options and that data extractors are provided with detailed instructions on how to complete the form [ 17 , 18 ]. The Cochrane Handbook also recommends that the entire review team should be involved in the development of the data extraction form and that this should include authors with expertise in the content area, review methods, statisticians and data extractors. The Cochrane Handbook also recommends that reviewers check compatibility of electronic forms or data systems with analytical software and ensure methods are in place to record, assess and correct data entry errors.

Category: piloting of extraction forms

Three handbooks recommended that authors pilot test their data extraction form [ 17 , 18 , 19 ]. The Cochrane Handbook recommends that “several people” are involved and “at least a few articles” used. The CRD guidance states that “a sample of included studies” should be used for piloting. The Cochrane Handbook also recommends that data extractors are trained; that piloting may need to be repeated if major changes to the extraction form are made during the review process; and that reports that have already been extracted should be re-checked in this case. None of the handbooks makes an explicit recommendation on who should be involved in piloting the data extraction form or their expertise. Furthermore, none of the handbooks makes a recommendation on quantifying agreement during the piloting process or using a quantified reliability threshold that should be reached before beginning the extraction process.

Category: data extraction

All handbooks recommend that data should be extracted by at least two reviewers (dual data extraction) [ 16 , 17 , 18 , 19 ]. Three handbooks recommend that data are extracted by two reviewers independently (parallel extraction) [ 16 , 18 , 19 ], one also considers it acceptable that one reviewer extracts the data and a second reviewer checks it for accuracy and completeness (double-checking) [ 17 ]. Furthermore, two of the handbooks make an optional recommendation that independent parallel extraction could be done only for critical data such as risk of bias and outcome data, while non-critical data is extracted by a single reviewer and double-checked by a second reviewer [ 18 , 19 ]. The Cochrane Handbook also recommends that data extractors have a basic understanding of the review topic and knowledge of study design, data analysis and statistics [ 18 ].

All handbooks recommend that reviewers should have procedures in place to resolve disagreements arising from dual data extraction [ 16 , 17 , 18 , 19 ]. In all cases discussion between extractors or arbitration with a third person are suggested. The Cochrane Handbook recommends hierarchical use of these strategies, while the other sources do not specify this [ 18 ]. Of note, the IoM Standards highlights the need for a fair procedure that ensures both reviewers judgements are considered in case of a power or experience asymmetry [ 19 ]. The Cochrane Handbook also recommends that disagreements that remain unresolved after discussion, arbitration or contact with study authors should be reported in the systematic review [ 18 ].

Two handbooks recommend to informally consider the reliability of coding throughout the review process [ 17 , 18 ]. These handbooks also mention the possibility of quantifying agreement of the extracted data. The Cochrane Handbook considers this optional and recommends it only for critical outcomes such as risk of bias assessments or key outcome data, if done [ 18 ]. The CRD guidance mentions this possibility without making a recommendation [ 17 ]. Two handbooks recommend that reviewers document disagreements and how they were resolved [ 17 , 18 ] and two recommend reporting who was involved in data extraction [ 18 , 19 ]. The IoM Standards specify this in that the number of individual data extractors and their qualifications should be reported in the methods section of the review [ 19 ].

Textbooks on conducting systematic reviews

Regarding the development of data extraction forms, the most frequent recommendation in the analysed textbooks is that reviewers should develop a customized extraction form or adapt an existing one to suit the needs of their review (6/11) [ 20 , 21 , 23 , 24 , 26 , 29 ]. Two textbooks consider the choice between customized and generic or pre-existing extraction forms optional [ 3 , 25 ].

Many of the textbooks also make statements on unpublished data (7/11). Most of them recommend that reviewers develop a strategy for obtaining unpublished data (4/11) [ 24 , 25 , 26 , 29 ]. One textbook makes an optional recommendation on obtaining unpublished data and mentions the alternative of conducting sensitivity analysis to account for missing data [ 3 ]. Two textbooks make general statements regarding missing data without a compulsory or optional recommendation [ 22 , 23 ].

Four textbooks recommend that reviewers ensure consistent and easy coding rules and response options in their data collection form [ 3 , 22 , 25 , 29 ]; three to provide detailed instruction on how to complete the data collection form [ 22 , 24 , 25 ]; and three to link multiple reports of the same study [ 3 , 24 , 26 ]. One textbook discusses the impact of including multiple study reports but makes no specific recommendation [ 23 ].

Two textbooks recommend reviewers to plan in advance which data they will need to extract for their review [ 24 , 28 ]. One textbook makes an optional recommendation, depending on the number of included studies [ 22 ]. For reviews with a small number of studies it considers an iterative process appropriate; for large data sets it recommends a thoroughly developed and overinclusive extraction form to avoid the need to go back to study reports later in the review process.

One textbook recommends that clinical experts or methodologists are consulted in developing the extraction form to ensure important study aspects are included [ 26 ]. None includes statements on the recording and handling of extraction errors.

For this category, the most frequently made recommendation in the analysed textbooks is that reviewers should pilot test their data extraction form (8/11) [ 3 , 20 , 22 , 23 , 24 , 25 , 26 , 29 ]. One textbook makes a general statement on piloting, but no specific recommendation [ 27 ].

Three textbooks recommend that data extractors are trained [ 22 , 24 , 25 ]. One textbook states that extraction should not begin before satisfactory agreement is achieved but does not define how this should be assessed [ 22 ]. No recommendations were identified for any of the other items regarding piloting of extraction form in the analysed textbooks.

Six textbooks recommend data extraction by at least two reviewers [ 22 , 23 , 24 , 25 , 26 , 29 ]. Four of these recommend parallel extraction [ 23 , 24 , 25 , 26 ], while two do not specify the exact procedure [ 22 , 29 ]. One textbook explains the different types of dual extraction modes but makes no recommendation on their use [ 27 ].

One textbook recommends that reviewer agreement for extracted data is quantified using a reliability measure [ 25 ], while two mention this possibility without making a clear recommendation [ 22 , 26 ]. Two of these mention Cohen’s kappa as possible measures for quantifying agreement [ 22 , 26 ], one also mentions raw agreement [ 22 ].

Five textbooks recommend that reviewers develop explicit procedures for resolving disagreements, either by discussion or consultation of a third person [ 22 , 24 , 25 , 26 , 29 ]. Two textbooks suggest a hierarchical approach using discussion and, if this is unsuccessful, arbitration with a third person [ 25 , 29 ]. One textbook also suggests the possibility of including the entire review team in discussions [ 24 ]. One textbook emphasizes that educated discussions should be preferred over voting procedures [ 26 ]. One textbook also recommends that reviewers document disagreements and how they were resolved [ 26 ].

One textbook makes recommendations on the expertise of the data extractors [ 24 ]. It suggests that data extraction is conducted by statisticians, data managers and methods experts with the possible involvement of content experts, when required.

Documents from HTA agencies

In two documents from HTA agencies it is recommended that a customised extraction form is developed [ 31 , 35 ]. One of these roughly outlines the contents of extraction forms that can be used as a starting point [ 31 ]. Three documents recommend that detailed instructions on using the extraction form should be provided [ 30 , 31 , 34 ]. Two documents recommend that reviewers develop a strategy for obtaining unpublished data [ 30 , 31 ].

The following recommendations are only included in one method document each: planning in advance which data will be required for the synthesis [ 30 ]; ensuring consistent coding and response options in the data collection form [ 31 ] and linking multiple reports of the same study to avoid including data from the same study more than once [ 31 ].

For this category the only recommendation we found in HTA documents is that data collection forms should be piloted before use (3/5) [ 30 , 31 , 33 ]. None of the documents specifies how this may be done, for example regarding the number or types of studies involved. One of the documents makes a vague suggestion that all reviewers ought to be involved in pilot testing.

In most documents it is recommended that data extraction should be conducted by two reviewers (4/5) [ 30 , 31 , 34 , 35 ]. Two make an optional recommendation for either parallel extraction or a double-checking procedure [ 30 , 31 ], one recommends parallel extraction [ 34 ] and one reports use of double-checking [ 35 ]. Three method documents recommend that reviewers resolving disagreements by discussion [ 30 , 31 , 35 ]. One method document recommends that reviewers report who was involved in data extraction [ 34 ].

Journal articles

We identified 5 journal articles that fulfilled our inclusion criteria. This included a journal article specifying the methods used by the Cochrane Back and Neck Group [ 36 ], an article describing the data extraction and synthesis methods used in JBI systematic reviews [ 38 ], a paper on guidelines for systematic review in the environmental research field [ 39 ] and two in-depth papers on data extraction and coding methods within systematic reviews [ 37 , 40 ]. One of these used the Systematic Reviews Data Suppository (SRDS) as an example, but the recommendations made were not exclusive to this system [ 37 ].

Three journal articles recommended that authors should plan in advance which data they require for the review [ 37 , 39 , 40 ]. A recommendation for developing a customized extraction form (or adapting one) for the specific purpose of the review was also made in three journal articles [ 36 , 37 , 40 ]. Two articles recommended that consistent and clear coding and response options should be ensured and detailed instruction provided to data extractors [ 37 , 40 ]. Furthermore, two articles recommended that mechanisms should be in place for recording, assessing and correcting data entry errors [ 36 , 37 ]. Both referred to plausibility or logic checks of the data and/or statistics.

One article recommends that reviewers try to obtain further data from the included studies, where required [ 39 ], while one makes an optional recommendation [ 36 ] and another a general statement without a specific recommendation [ 37 ]. One of the articles also makes recommendations on the expertise of the reviewers that should be involved in the development of the extraction form. It recommends that all members of the team are involved including data extractors, content area experts, statisticians and reviewers with formal training in form design such as epidemiologists [ 37 ].

Four articles recommend that reviewers should pilot test their extraction form [ 36 , 37 , 38 , 40 ]. Three articles recommend training of data extractors [ 37 , 38 , 40 ]. One recommends that reviewers informally assess the reliability of coding during the piloting process [ 37 ]. One article mentions the possibility of quantifying agreement during the piloting process, without making a specific recommendation or specifying any thresholds [ 40 ].

Three articles recommend that data are extracted by two reviewers, in each case using independent parallel extraction [ 36 , 37 , 38 ]. Citing the IoM standards, one article also mentions the possibility of a using independent parallel extraction for critical data and a double-checking procedure for non-critical data [ 37 ]. One article recommends that the principle reviewer runs regular logic checks to validate the extracted data [ 37 ]. One article also mentions the possibility that the reliability of extraction may need to be reviewed throughout the extraction process in case of extended coding periods [ 40 ].

Two articles mention the need to have a procedure in place for resolving disagreements, either with a hierarchical procedure using discussion and arbitration with a third person [ 36 ] or by discussion and review of the source document [ 37 ]. One article recommends that disagreements and consensus results are documented for future reference [ 37 ]. Finally, one article mentions advantages of having data extractors with complementary expertise such as a content expert and method experts, but does not make a clear recommendations on this [ 37 ].

We reviewed current recommendations on data extraction methods in systematic reviews across a different range of sources. Our results suggest that current recommendations are fragmented. Very few documents made comprehensive recommendations. This may be detrimental to the quality of systematic reviews and makes it difficult to aspiring reviewers to prepare high quality data extraction forms and ensure reliable and valid extraction procedures. While our review cannot show that improved recommendations will truly have an impact on the quality of systematic reviews, it seems reasonable to assume that clear and comprehensive recommendations are a prerequisite to high quality data extraction, especially for less experienced reviewers.

There were some notable exceptions to our findings. Among the most comprehensive documents were the Cochrane Handbook for Systematic Reviews, the textbook by Foster and colleagues and the journal article by Li and colleagues [ 18 , 24 , 37 ]. We believe that these are among the most helpful resources for systematic reviewers from the pool of documents that we analysed – not only because they provide in-depth information, but also for being among the most current sources.

We were particularly surprised by the lack of information provided by HTA agencies. Only very few HTA agencies had documents with relevant recommendations at all. Since many HTA agencies publish detailed documents on many other methodological aspects such as search screening methods, risk of bias assessments or evidence grading methods, it would seem reasonable to provide more information on data extraction methods.

We believe there would be many practical benefits of developing clearer recommendations for the development and testing of extraction forms and the data extraction process. One reason is that data extraction is one of the most resource intensive parts of a systematic review – especially, when the review includes a significant number of studies and/or outcomes. Having a good extraction form can also save time at later stages of the review. For example, a poorly developed extraction form may lead to extensive revisions during the review process and may require reviewers to go back to the original sources or repeat extraction on some included studies. Furthermore, some methodological standards such as independent parallel extraction could be modified to save resources. This is not reflected in most of the sources included in our review. Lastly, it would be helpful to specify recommendations further to accommodate for systematic reviews of different sizes, both in terms of the number of included studies and the review team. While the general quality standards should remain the same, a mega-review with several tens or even hundreds of studies, a large, heterogeneous or international review team and several data extractors may differ in some requirements from a small review with few studies and a small, local team [ 12 , 37 ]. For example, training and piloting may need more time to achieve agreement. We therefore encourage developers of guidelines documents for systematic reviews to provide more comprehensive recommendations on developing and piloting data extraction forms and the data extraction process. Our review can be used as a starting point. Formal development of structured guidance or a set of minimum standards on data extraction methods in systematic reviews may also be useful. Moher and colleagues have developed a framework to support the development of guidance to improve reporting, which includes literature reviews and a Delphi study and provides a helpful starting point [ 41 ]. Lastly, authors of reporting guidelines for systematic reviews of various types can use our results to consider elements worth including.

To some extent the results reflect the empirical evidence from comparative methods research. For example, among the most frequent recommendations were that data extraction should be conducted by two reviewers to reduce risk of errors, which is supported by some evidence [ 11 ]. This is also true for the recommendation that additional data should be retrieved if necessary, which reflects selective outcome reporting [ 42 ]. At the same time, we found few recommendations on reviewer expertise, for which empirical studies have produced inconsistent results [ 11 ]. Arguably, some items in our analysis have theoretical rather than empirical foundations. For instance, we would consider the inclusion of content experts in the development of the extraction forms to be important to enhance clinical relevance and applicability. Even this is a somewhat contested issue, however. Gøtzsche and Ioannidis, for instance, have questioned the value of involving content experts in systematic reviews [ 43 ]. In their analysis, they highlight the lack of evidence on the effects of involving them and in addition to the possible benefits raise potential downsides of expert involvement – notably that experts often have conflicts of interest and strong prior opinions that may introduce bias. While we do not argue against involvement of content experts since conflicts of interest can be managed, the controversy shows that this in fact may be an issue worth exploring empirically [ 44 ]. Thus, in addition to providing more in-depth recommendations for systematic reviewers, empirical evaluations of extraction methods should be encouraged. Such method studies should be based on a systematic review of the current evidence and overcome some of the limitations from previous investigations including the use of convenience samples and small sets of reviewers [ 11 ].

As a final note, some parts of systematic reviews can now be assisted by automation methods. Examples include enhanced study selection using learning algorithms (e.g. implemented in Rayyan) and assisted risk of bias assessments using RobotReviewer [ 45 , 46 ]. However, not all of the software solutions are free and some are still in their early development or have not been validated yet. Furthermore, some of them are restricted to specific review types [ 47 ]. To the best of our knowledge comprehensive tools to assist with data extraction, including for example extraction of outcome data, are no yet available [ 48 ]. For example, a recent systematic review conducted with currently available automation tools used traditional spreadsheet-based data extraction forms and piloting methods [ 49 ]. The authors identified two issues regarding data extraction that could be assisted by automation methods: contacting authors of included studies for additional information using metadata and better integration of software tools to automatically exchange data between different software. Thus, much work is still to be done in this area. Furthermore, when automation tools for data extraction become available, they will need to be readily available, usability tested, accepted by systematic reviewers and validated before widespread use (validation is especially important for technically complex or critical tasks) [ 50 ]. It is also likely that they will complement current data extraction methods rather than replace them as it is currently the case for automated risk of bias assessments of randomised trials [ 46 ]. For these reasons we believe that traditional data extraction methods will still be required and used in the future.

Limitations

There are some limitations to our methods. Firstly, our review is not exhaustive. The list of handbooks from SROs was compiled based on previous research and discussions between the authors, but no formal search was conducted to identify other potentially relevant organisations [ 51 , 52 ]. The list of textbooks was also based on a previous study not intended to cover the literature in full. It does, however, include textbooks from a range of disciplines including medicine, nursing, education and the social sciences, which arguably increases the generalisability of the findings. The search strategy for our database search was pragmatic for reasons stated in the methods and may have missed some relevant articles. Furthermore, the databases searched focus on the field of medicine and health, so other areas may be underrepresented.

Secondly, searching the websites of HTA agencies proved difficult in some instances, as some websites have quite intricate site structures. Furthermore, we did not contact the HTA agencies to retrieve unpublished documents. It is likely that at least some HTA agencies have internal documents that provide more specific recommendations. Our focus was the usefulness of the HTA method documents as a guidance to systematic reviewers outside of HTA institutions, however. For this purpose, we believe that the assumption is appropriate that most reviewers are likely to depend on the information directly accessible to them.

Thirdly, it was difficult to classify some of the recommendations using our coding scheme. For example, recommendations in the new Cochrane Handbook are based on Cochrane’s Methodological Expectations for Cochrane Intervention Reviews Standards (MECIR) which make a subtle differentiation between mandatory and highly desirable recommendations. In this case we considered both these types of recommendations as positive in our classification scheme. To use a more difficult example, one HTA method document did not make a statement on the number of reviewers involved in data extraction but stated that a third investigator may check a random sample of extracted data for additional quality assurance. This would imply that data extraction is conducted by two reviewers independently, but since this method was not stated, it was classified as “method not mentioned”. While some judgements were required, we have described notable cases in the results section and do not believe that different decisions in these cases would affect our overall results or conclusions.

Lastly, we note that some of the included sources referenced more comprehensive guidance such as the Cochrane Handbook. We have not formally extracted information on cross-referencing between documents, however.

Many current methodological guidance documents for systematic reviewers lack comprehensiveness and clarity regarding the development and piloting of data extraction forms and the data extraction process. In the future, developers of learning resources should consider providing more information and guidance on this important part of the systematic review process. Our review and list of items may be a helpful starting point. HTA agencies may consider describing in more detail their published methods on data extraction procedures to increase transparency.

Availability of data and materials

The datasets used and analysed for the current study are available from the corresponding author on reasonable request.

Abbreviations

Cochrane Methodology Register

Centre for Reviews and Dissemination

European Network for Health Technology Assessment

Health Technology Assessment

Health Technology Assessment international

The collaborative research network of Health Technology Assessment agencies in the Asia-Pacific region

International Network of Agencies for Health Technology Assessment

Institute of Medicine

Joanna Briggs Institute

Preferred Reporting Items for Systematic Reviews and Meta-Analyses

Red de Evaluación de Tecnologías en Salud de las Américas (Health Technology Assessment Network of the Americas)

Scientific Resource Center’s Methods Library

Systematic Review Organisations

Sackett DL, Rosenberg WMC, Gray JAM, Haynes RB, Richardson WS. Evidence based medicine: what it is and what it isn’t. BMJ. 1996;312:71–2.

Article   CAS   PubMed   PubMed Central   Google Scholar  

Guyatt G, Rennie D, Meade MO, Cook DJ, editors. Users’ guides to the medical literature: a manual for evidence-based clinical practice. 3rd ed. New York: McGraw-Hill Education Ltd; 2015.

Google Scholar  

Khan KS, Kunz R, Kleijnen J, Antes G. Five steps to conducting a systematic review. J R Soc Med. 2003;96:118–21.

Article   PubMed   PubMed Central   Google Scholar  

Montori VM, Swiontkowski MF, Cook DJ. Methodologic issues in systematic reviews and meta-analyses. Clin Orthop Relat Res. 2003;413:43–54.

Article   Google Scholar  

Mathes T, Klaßen P, Pieper D. Frequency of data extraction errors and methods to increase data extraction quality: a methodological review. BMC Med Res Methodol. 2017;17:152.

Gøtzsche PC, Hróbjartsson A, Maric K, Tendal B. Data extraction errors in meta-analyses that use standardized mean differences. JAMA. 2007;298:430–7.

PubMed   Google Scholar  

Glasziou P, Meats E, Heneghan C, Shepperd S. What is missing from descriptions of treatment in trials and reviews? BMJ. 2008;336:1472–4.

Moher D, Liberati A, Tetzlaff J, Altman DG. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. BMJ. 2009;339:b2535.

Krnic Martinic M, Pieper D, Glatt A, Puljak L. Definition of a systematic review used in overviews of systematic reviews, meta-epidemiological studies and textbooks. BMC Med Res Methodol. 2019;19:203.

Van der Mierden S, Tsaioun K, Bleich A, Leenaars CHC. Software tools for literature screening in systematic reviews in biomedical research. ALTEX. 2019;36:508–17.

Robson RC, Pham B, Hwee J, Thomas SM, Rios P, Page MJ, et al. Few studies exist examining methods for selecting studies, abstracting data, and appraising quality in a systematic review. J Clin Epidemiol. 2019;106:121–35.

Article   PubMed   Google Scholar  

Elamin MB, Flynn DN, Bassler D, Briel M, Alonso-Coello P, Karanicolas PJ, et al. Choice of data extraction tools for systematic reviews depends on resources and review complexity. J Clin Epidemiol. 2009;62:506–10.

Ciani O, Buyse M, Garside R, Pavey T, Stein K, Sterne JAC, et al. Comparison of treatment effect sizes associated with surrogate and final patient relevant outcomes in randomised controlled trials: meta-epidemiological study. BMJ. 2013;346:f457.

Haslam A, Hey SP, Gill J, Prasad V. A systematic review of trial-level meta-analyses measuring the strength of association between surrogate end-points and overall survival in oncology. Eur J Cancer. 2019;106:196–211.

Pfadenhauer LM, Gerhardus A, Mozygemba K, Lysdahl KB, Booth A, Hofmann B, et al. Making sense of complexity in context and implementation: the Context and Implementation of Complex Interventions (CICI) framework. Implement Sci. 2017;12:21.

Aromataris E, Munn Z, editors. Joanna Briggs Institute reviewer's manual: The Joanna Briggs Institute; 2017. https://reviewersmanual.joannabriggs.org/ . Accessed 04 June 2020.

Centre for Reviews and Dissemination. CRD’s guidance for undertaking reviews in health care. York: York Publishing Services Ltd; 2009.

Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, et al. Cochrane Handbook for Systematic Reviews of Interventions version 6.0: Cochrane; 2019. www.training.cochrane.org/handbook . Accessed 04 June 2020.

Institute of Medicine. Finding what works in health care: standards for systematic reviews. Washington, DC: The National Academies Press; 2011.

Bettany-Saltikov J. How to do a systematic literature review in nursing: a step-by-step guide. Berkshire: McGraw-Hill Education; 2012.

Booth A, Papaioannou D, Sutton A. Systematic approaches to a successful literature review. London: Sage Publications Ltd; 2012.

Cooper HM. Synthesizing research: a guide for literature reviews. Thousand Oaks: Sage Publications Inc; 1998.

Egger M, Smith GD, Altman DG. Systematic reviews in health care: meta-analysis in context. 2nd ed. London: BMJ Publishing Group; 2001.

Book   Google Scholar  

Foster MJ, Jewell ST. Assembling the pieces of a systematic review: a guide for librarians. Lanham: Rowman & Littlefield; 2017.

Holly C, Salmond S, Saimbert M. Comprehensive systematic review for advanced nursing practice. New York: Springer Publishing Company; 2012.

Mulrow C, Cook D. Systematic reviews: synthesis of best evidence for health care decisions. Philadelphia: ACP Press; 1998.

Petticrew M, Roberts H. Systematic Reviews in the Social Sciences: A Practical Guide. Malden: Blackwell Publishing; 2008.

Pope C, Mays N, Popay J. Synthesizing Qualitative and Quantitative Health Evidence. Maidenhead: McGraw Hill; 2007.

Sharma R, Gordon M, Dharamsi S, Gibbs T. Systematic reviews in medical education: A practical approach: AMEE Guide 94. Dundee: Association for Medical Education in Europe; 2015.

Fröschl B, Bornschein B, Brunner-Ziegler S, Conrads-Frank A, Eisenmann A, Gartlehner G, et al. Methodenhandbuch für health technology assessment: Gesundheit Österreich GmbH; 2012. https://jasmin.goeg.at/121/ . Accessed 19 Feb 2019.

Gartlehner G. (Internes) Manual Abläufe und Methoden: Ludwig Boltzmann Institut für Health Technology Assessment (LBI-HTA); 2007. http://eprints.aihta.at/713/ . Accessed 19 Feb 2019.

Health Information and Quality Authority (HIQA). Guidelines for the retrieval and interpretation of economic evaluations of health technologies in Ireland: HIQA; 2014. https://www.hiqa.ie/reports-and-publications/health-technology-assessments/guidelines-interpretation-economic . Accessed 19 Feb 2019.

Institute for Clinical and Economic Review (ICER). A guide to ICER’s methods for health technology assessment: ICER; 2018. https://icer-review.org/methodology/icers-methods/icer-hta-guide_082018/ . Accessed 19 Feb 2019.

International Network of Agencies for Health Technology Assessment (INAHTA). A checklist for health technology assessment reports: INAHTA; 2007. http://www.inahta.org/hta-tools-resources/briefs/ . Accessed 19 Feb 2019.

Malaysian Health Technology Assessment Section (MaHTAS). Manual on health technology assessment. 2015. https://www.moh.gov.my/moh/resources/HTA_MANUAL_MAHTAS.pdf?mid=636 .

Furlan AD, Malmivaara A, Chou R, Maher CG, Deyo RA, Schoene M, et al. 2015 Updated Method Guideline for Systematic Reviews in the Cochrane Back and Neck Group. Spine. 2015;40:1660–73.

Li T, Vedula SS, Hadar N, Parkin C, Lau J, Dickersin K. Innovations in data collection, management, and archiving for systematic reviews. Ann Intern Med. 2015;162:287–94.

Munn Z, Tufanaru C, Aromataris E. JBI’s systematic reviews: data extraction and synthesis. Am J Nurs. 2014;114:49–54.

Pullin AS, Stewart GB. Guidelines for systematic review in conservation and environmental management. Conserv Biol. 2006;20:1647–56.

Stock WA, Goméz Benito J, Balluerka LN. Research synthesis. Coding and conjectures. Eval Health Prof. 1996;19:104–17.

Article   CAS   PubMed   Google Scholar  

Moher D, Schulz KF, Simera I, Altman DG. Guidance for developers of health research reporting guidelines. PLoS Med. 2010;7:e1000217.

Kirkham JJ, Dwan KM, Altman DG, Gamble C, Dodd S, Smyth R, et al. The impact of outcome reporting bias in randomised controlled trials on a cohort of systematic reviews. BMJ. 2010;340:c365.

Gøtzsche PC, Ioannidis JPA. Content area experts as authors: helpful or harmful for systematic reviews and meta-analyses? BMJ. 2012;345:e7031.

Agoritsas T, Neumann I, Mendoza C, Guyatt GH. Guideline conflict of interest management and methodology heavily impacts on the strength of recommendations: comparison between two iterations of the American College of Chest Physicians Antithrombotic Guidelines. J Clin Epidemiol. 2017;81:141–3.

Ouzzani M, Hammady H, Fedorowicz Z, Elmagarmid A. Rayyan—a web and mobile app for systematic reviews. Syst Rev. 2016;5:210.

Marshall IJ, Kuiper J, Wallace BC. RobotReviewer: evaluation of a system for automatically assessing bias in clinical trials. J Am Med Informatics Assoc. 2016;23:193–201.

Beller E, Clark J, Tsafnat G, Adams C, Diehl H, Lund H, et al. Making progress with the automation of systematic reviews: principles of the International Collaboration for the Automation of Systematic Reviews (ICASR). Syst Rev. 2018;7:77.

O’Connor AM, Glasziou P, Taylor M, Thomas J, Spijker R, Wolfe MS. A focus on cross-purpose tools, automated recognition of study design in multiple disciplines, and evaluation of automation tools: a summary of significant discussions at the fourth meeting of the International Collaboration for Automation of Systematic R. Syst Rev. 2020;9:100.

Clark J, Glasziou P, Del Mar C, Bannach-Brown A, Stehlik P, Scott AM. A full systematic review was completed in 2 weeks using automation tools: a case study. J Clin Epidemiol. 2020;121:81–90.

O’Connor AM, Tsafnat G, Thomas J, Glasziou P, Gilbert SB, Hutton B. A question of trust: can we build an evidence base to gain trust in systematic review automation technologies? Syst Rev. 2019;8:143.

Cooper C, Booth A, Britten N, Garside R. A comparison of results of empirical studies of supplementary search techniques and recommendations in review methodology handbooks: a methodological review. Syst Rev. 2017;6:234.

Cooper C, Booth A, Varley-Campbell J, Britten N, Garside R. Defining the process to literature searching in systematic reviews: a literature review of guidance and supporting studies. BMC Med Res Methodol. 2018;18:85.

Download references

Acknowledgments

We thank information specialist Simone Hass for peer reviewing the search strategy and conducting searches.

No funding was received. Open Access funding enabled and organized by Projekt DEAL.

Author information

Authors and affiliations.

Institute for Research in Operative Medicine (IFOM), Faculty of Health - School of Medicine, Witten/Herdecke University, Ostmerheimer Str. 200, 51109, Cologne, Germany

Roland Brian Büchter, Alina Weise & Dawid Pieper

You can also search for this author in PubMed   Google Scholar

Contributions

Study design: RBB, DP. Data extraction: RBB, AW. Data analysis and interpretation: RBB, DP, AW. Writing the first draft of the manuscript: RBB. Revisions of the manuscript for important intellectual content: RBB, DP, AW. Final approval of the manuscript: RBB, DP, AW. Agree to be accountable for all aspects of the work: RBB, DP, AW. Guarantor: RBB.

Corresponding author

Correspondence to Roland Brian Büchter .

Ethics declarations

Ethics approval and consent to participate.

Not applicable.

Consent for publication

Competing interests.

The authors declare that they have no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Additional file 1..

List of HTA websites searched.

Additional file 2.

Information on database searches

Additional file 3.

List of items and rationale

Additional file 4.

List of included documents

Additional file 5.

Recommendations for non-interventional reviews

Additional file 6.

Primary analysis

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Cite this article.

Büchter, R.B., Weise, A. & Pieper, D. Development, testing and use of data extraction forms in systematic reviews: a review of methodological guidance. BMC Med Res Methodol 20 , 259 (2020). https://doi.org/10.1186/s12874-020-01143-3

Download citation

Received : 11 June 2020

Accepted : 07 October 2020

Published : 19 October 2020

DOI : https://doi.org/10.1186/s12874-020-01143-3

Share this article

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

  • Systematic review methods
  • Evidence synthesis

BMC Medical Research Methodology

ISSN: 1471-2288

literature review data extraction

  • Open access
  • Published: 15 June 2015

Automating data extraction in systematic reviews: a systematic review

  • Siddhartha R. Jonnalagadda 1 ,
  • Pawan Goyal 2 &
  • Mark D. Huffman 3  

Systematic Reviews volume  4 , Article number:  78 ( 2015 ) Cite this article

41k Accesses

125 Citations

37 Altmetric

Metrics details

Automation of the parts of systematic review process, specifically the data extraction step, may be an important strategy to reduce the time necessary to complete a systematic review. However, the state of the science of automatically extracting data elements from full texts has not been well described. This paper performs a systematic review of published and unpublished methods to automate data extraction for systematic reviews.

We systematically searched PubMed, IEEEXplore, and ACM Digital Library to identify potentially relevant articles. We included reports that met the following criteria: 1) methods or results section described what entities were or need to be extracted, and 2) at least one entity was automatically extracted with evaluation results that were presented for that entity. We also reviewed the citations from included reports.

Out of a total of 1190 unique citations that met our search criteria, we found 26 published reports describing automatic extraction of at least one of more than 52 potential data elements used in systematic reviews. For 25 (48 %) of the data elements used in systematic reviews, there were attempts from various researchers to extract information automatically from the publication text. Out of these, 14 (27 %) data elements were completely extracted, but the highest number of data elements extracted automatically by a single study was 7. Most of the data elements were extracted with F-scores (a mean of sensitivity and positive predictive value) of over 70 %.

Conclusions

We found no unified information extraction framework tailored to the systematic review process, and published reports focused on a limited (1–7) number of data elements. Biomedical natural language processing techniques have not been fully utilized to fully or even partially automate the data extraction step of systematic reviews.

Peer Review reports

Systematic reviews identify, assess, synthesize, and interpret published and unpublished evidence, which improves decision-making for clinicians, patients, policymakers, and other stakeholders [ 1 ]. Systematic reviews also identify research gaps to develop new research ideas. The steps to conduct a systematic review [ 1 – 3 ] are:

Define the review question and develop criteria for including studies

Search for studies addressing the review question

Select studies that meet criteria for inclusion in the review

Extract data from included studies

Assess the risk of bias in the included studies, by appraising them critically

Where appropriate, analyze the included data by undertaking meta-analyses

Address reporting biases

Despite their widely acknowledged usefulness [ 4 ], the process of systematic review, specifically the data extraction step (step 4), can be time-consuming. In fact, it typically takes 2.5–6.5 years for a primary study publication to be included and published in a new systematic review [ 5 ]. Further, within 2 years of the publication of systematic reviews, 23 % are out of date because they have not incorporated new evidence that might change the systematic review’s primary results [ 6 ].

Natural language processing (NLP), including text mining, involves information extraction, which is the discovery by computer of new, previously unfound information by automatically extracting information from different written resources [ 7 ]. Information extraction primarily constitutes concept extraction, also known as named entity recognition, and relation extraction, also known as association extraction. NLP handles written text at level of documents, words, grammar, meaning, and context. NLP techniques have been used to automate extraction of genomic and clinical information from biomedical literature. Similarly, automation of the data extraction step of the systematic review process through NLP may be one strategy to reduce the time necessary to complete and update a systematic review. The data extraction step is one of the most time-consuming steps of a systematic review. Automating or even semi-automating this step could substantially decrease the time taken to complete systematic reviews and thus decrease the time lag for research evidence to be translated into clinical practice. Despite these potential gains from NLP, the state of the science of automating data extraction has not been well described.

To date, there is limited knowledge and methods on how to automate the data extraction phase of the systematic reviews, despite being one of the most time-consuming steps. To address this gap in knowledge, we sought to perform a systematic review of methods to automate the data extraction component of the systematic review process.

Our methodology was based on the Standards for Systematic Reviews set by the Institute of Medicine [ 8 ]. We conducted our study procedures as detailed below with input from the Cochrane Heart Group US Satellite.

Eligibility criteria

We included a report that met the following criteria: 1) the methods or results section describes what entities were or needed to be extracted, and 2) at least one entity was automatically extracted with evaluation results that were presented for that entity.

We excluded a report that met any of the following criteria: 1) the methods were not applied to the data extraction step of a systematic review; 2) the report was an editorial, commentary, or other non-original research report; or 3) there was no evaluation component.

Information sources and searches

For collecting the initial set of articles for our review, we developed search strategies with the help of the Cochrane Heart Group US Satellite, which includes systematic reviewers and a medical librarian. We refined these strategies using relevant citations from related papers. We searched three datasets: PubMed, IEEExplore, and ACM digital library, and our searches were limited between January 1, 2000 and January 6, 2015 (see Appendix 1 ). We restricted our search to these dates because biomedical information extraction algorithms prior to 2000 are unlikely to be accurate enough to be used for systematic reviews.

We retrieved articles that dealt with the extraction of various data elements, defined as categories of data that pertained to any information about or deriving from a study, including details of methods, participants, setting, context, interventions, outcomes, results, publications, and investigators [ 1 ] from included study reports. After we retrieved the initial set of reports from the search results, we then evaluated reports included in the references of these reports. We also sought expert opinion for additional relevant citations.

Study selection

We first de-duplicated the retrieve citations. For calibration and refinement of the inclusion and exclusion criteria, 100 citations were randomly selected and independently reviewed by a two authors (SRJ and PG). Disagreements were resolved by consensus with a third author (MH). In a second round, another set of 100 randomly selected abstracts was independently reviewed by two study authors (SRJ and PG), whereby we achieved a strong level of agreement (kappa = 0.97). Given the high level of agreement, the remaining studies were reviewed only by one author (PG). In this phase, we identified reports as “not relevant” or “potentially relevant”.

Two authors (PG and SRJ) independently reviewed the full text of all citations ( N  = 74) that were identified as “potentially relevant”. We classified included reports into various categories based on the particular data element that they attempted to extract from the original, scientific articles. Example of these data elements might be overall evidence, specific interventions, among others (Table  1 ). We resolved disagreements between the two reviewers through consensus with a third author (MDH).

Data collection process

Two authors (PG and SRJ) independently reviewed the included articles to extract data, such as the particular entity automatically extracted by the study, algorithm or technique used, and evaluation results into a data abstraction spreadsheet. We resolved disagreements through consensus with a third author (MDH).

We reviewed the Cochrane Handbook for Systematic Reviews [ 1 ], the CONsolidated Standards Of Reporting Trials (CONSORT) [ 9 ] statement, the Standards for Reporting of Diagnostic Accuracy (STARD) initiative [ 10 ], and PICO [ 11 ], PECODR [ 12 ], and PIBOSO [ 13 ] frameworks to obtain the data elements to be considered. PICO stands for Population, Intervention, Comparison, Outcomes; PECODR stands for Patient-Population-Problem, Exposure-Intervention, Comparison, Outcome, Duration and Results; and PIBOSO stands for Population, Intervention, Background, Outcome, Study Design, Other.

Data synthesis and analysis

Because of the large variation in study methods and measurements, a meta-analysis of methodological features and contextual factors associated with the frequency of data extraction methods was not possible. We therefore present a narrative synthesis of our findings. We did not thoroughly assess risk of bias, including reporting bias, for these reports because the study designs did not match domains evaluated in commonly used instruments such as the Cochrane Risk of Bias tool [ 1 ] or QUADAS-2 instrument used for systematic reviews of randomized trials and diagnostic test accuracy studies, respectively [ 14 ].

Of 1190 unique citations retrieved, we selected 75 reports for full-text screening, and we included 26 articles that met our inclusion criteria (Fig.  1 ). Agreement on abstract and full-text screening was 0.97 and 1.00.

Process of screening the articles to be included for this systematic review

Study characteristics

Table  1 provides a list of items to be considered in the data extraction process based on the Cochrane Handbook (Appendix 2 ) [ 1 ], CONSORT statement [ 9 ], STARD initiative [ 10 ], and PICO [ 11 ], PECODR [ 12 ], and PIBOSO [ 13 ] frameworks. We provide the major group for each field and report which standard focused on that field. Finally, we report whether there was a published method to extract that field. Table  1 also identifies the data elements relevant to systematic review process categorized by their domain and the standard from which the element was adopted and was associated with existing automation methods, where present.

Results of individual studies

Table  2 summarizes the existing information extraction studies. For each study, the table provides the citation to the study (study: column 1), data elements that the study focused on (extracted elements: column 2), dataset used by the study (dataset: column 3), algorithm and methods used for extraction (method: column 4), whether the study extracted only the sentence containing the data elements, full concept or neither of these (sentence/concept/neither: column 5), whether the extraction was done from full-text or abstracts (full text/abstract: column 6) and the main accuracy results reported by the system (results: column 7). The studies are arranged by increasing complexity by ordering studies that classified sentences before those that extracted the concepts and ordering studies that extracted data from abstracts before those that extracted data from full-text reports.

The accuracy of most ( N  = 18, 69 %) studies was measured using a standard text mining metric known as F-score, which is the harmonic mean of precision (positive predictive value) and recall (sensitivity). Some studies ( N  = 5, 19 %) reported only the precision of their method, while some reported the accuracy values ( N  = 2, 8 %). One study (4 %) reported P5 precision, which indicates the fraction of positive predictions among the top 5 results returned by the system.

Studies that did not implement a data extraction system

Dawes et al. [ 12 ] identified 20 evidence-based medicine journal synopses with 759 extracts in the corresponding PubMed abstracts. Annotators agreed with the identification of an element 85 and 87 % for the evidence-based medicine synopses and PubMed abstracts, respectively. After consensus among the annotators, agreement rose to 97 and 98 %, respectively. The authors proposed various lexical patterns and developed rules to discover each PECODR element from the PubMed abstracts and the corresponding evidence-based medicine journal synopses that might make it possible to partially or fully automate the data extraction process.

Studies that identified sentences but did not extract data elements from abstracts only

Kim et al. [ 13 ] used conditional random fields (CRF) [ 15 ] for the task of classifying sentences in one of the PICO categories. The features were based on lexical, syntactic, structural, and sequential information in the data. The authors found that unigrams, section headings, and sequential information from preceding sentences were useful features for the classification task. They used 1000 medical abstracts from PIBOSO corpus and achieved micro-averaged F-scores of 91 and 67 % over datasets of structured and unstructured abstracts, respectively.

Boudin et al. [ 16 ] utilized a combination of multiple supervised classification techniques for detecting PICO elements in the medical abstracts. They utilized features such as MeSH semantic types, word overlap with title, number of punctuation marks on random forests (RF), naive Bayes (NB), support vector machines (SVM), and multi-layer perceptron (MLP) classifiers. Using 26,000 abstracts from PubMed, the authors took the first sentence in the structured abstracts and assigned a label automatically to build a large training data. They obtained an F-score of 86 % for identifying participants (P), 67 % for interventions (I) and controls (C), and 56 % for outcomes (O).

Huang et al. [ 17 ] used a naive Bayes classifier for the PICO classification task. The training data were generated automatically from the structured abstracts. For instance, all sentences in the section of the structured abstract that started with the term “PATIENT” were used to identify participants (P). In this way, the authors could generate a dataset of 23,472 sentences. Using 23,472 sentences from the structured abstracts, they obtained an F-score of 91 % for identifying participants (P), 75 % for interventions (I), and 88 % for outcomes (O).

Verbeke et al. [ 18 ] used a statistical relational learning-based approach (kLog) that utilized relational features for classifying sentences. The authors also used the PIBOSO corpus for evaluation and achieved micro-averaged F-score of 84 % on structured abstracts and 67 % on unstructured abstracts, which was a better performance than Kim et al. [ 13 ].

Huang et al. [ 19 ] used 19,854 structured extracts and trained two classifiers: one by taking the first sentences of each section (termed CF by the authors) and the other by taking all the sentences in each section (termed CA by the authors). The authors used the naive Bayes classifier and achieved F-scores of 74, 66, and 73 % for identifying participants (P), interventions (I), and outcomes (O), respectively, by the CF classifier. The CA classifier gave F-scores of 73, 73, and 74 % for identifying participants (P), interventions (I), and outcomes (O), respectively.

Hassanzadeh et al. [ 20 ] used the PIBOSO corpus for the identification of sentences with PIBOSO elements. Using conditional random fields (CRF) with discriminative set of features, they achieved micro-averaged F-score of 91 %.

Robinson [ 21 ] used four machine learning models, 1) support vector machines, 2) naive Bayes, 3) naive Bayes multinomial, and 4) logistic regression to identify medical abstracts that contained patient-oriented evidence or not. These data included morbidity, mortality, symptom severity, and health-related quality of life. On a dataset of 1356 PubMed abstracts, the authors achieved the highest accuracy using a support vector machines learning model and achieved an F-measure of 86 %.

Chung [ 22 ] utilized a full sentence parser to identify the descriptions of the assignment of treatment arms in clinical trials. The authors used predicate-argument structure along with other linguistic features with a maximum entropy classifier. They utilized 203 abstracts from randomized trials for training and 124 abstracts for testing and achieved an F-score of 76 %.

Hara and Matsumoto [ 23 ] dealt with the problem of extracting “patient population” and “compared treatments” from medical abstracts. Given a sentence from the abstract, the authors first performed base noun-phrase chunking and then categorized the base noun-phrase into one of the five classes: “disease”, “treatment”, “patient”, “study”, and “others” using support vector machine and conditional random field models. After categorization, the authors used regular expression to extract the target words for patient population and comparison. The authors used 200 abstracts including terms such as “neoplasms” and “clinical trial, phase III” and obtained 91 % accuracy for the task of noun phrase classification. For sentence classification, the authors obtained a precision of 80 % for patient population and 82 % for comparisons.

Studies that identified only sentences but did not extract data elements from full-text reports

Zhao et al. [ 24 ] used two classification tasks to extract study data including patient details, including one at the sentence level and another at the keyword level. The authors first used a five-class scheme including 1) patient, 2) result, 3) intervention, 4) study design, and 5) research goal and tried to classify sentences into one of these five classes. They further used six classes for keywords such as sex (e.g., male, female), age (e.g., 54-year-old), race (e.g., Chinese), condition (e.g., asthma), intervention, and study design (e.g., randomized trial). They utilized conditional random fields for the classification task. Using 19,893 medical abstracts and full-text articles from 17 journal websites, they achieved F-scores of 75 % for identifying patients, 61 % for intervention, 91 % for results, 79 % for study design, and 76 % for research goal.

Hsu et al. [ 25 ] attempted to classify whether a sentence contains the “hypothesis”, “statistical method”, “outcomes”, or “generalizability” of the study and then extracted the values. Using 42 full-text papers, the authors obtained F-scores of 86 % for identifying hypothesis, 84 % for statistical method, 90 % for outcomes, and 59 % for generalizability.

Song et al. [ 26 ] used machine learning-based classifiers such as maximum entropy classifier (MaxEnt), support vector machines (SVM), multi-layer perceptron (MLP), naive Bayes (NB), and radial basis function network (RBFN) to classify the sentences into categories such as analysis (statistical facts found by clinical experiment), general (generally accepted scientific facts, process, and methodology), recommendation (recommendations about interventions), and rule (guidelines). They utilized the principle of information gain (IG) as well as genetic algorithm (GA) for feature selection. They used 346 sentences from the clinical guideline document and obtained an F-score of 98 % for classifying sentences.

Marshall et al. [ 27 ] used soft-margin support vector machines in a joint model for risk of bias assessment along with supporting sentences for random sequence generation, allocation concealment, blinding of participants and personnel, and blinding of outcome assessment, among others. They utilized presence of unigrams in the supporting sentences as features in their model. Working with full text of 2200 clinical trials, the joint model achieved F-scores of 56, 48, 35, and 38 % for identifying sentences corresponding to random sequence generation, allocation concealment, blinding of participants and personnel, and blinding of outcome assessment, respectively.

Studies that identified data elements only from abstracts but not from full texts

Demner-Fushman and Lin [ 28 ] used a rule-based approach to identify sentences containing PICO. Using 275 manually annotated abstracts, the authors achieved an accuracy of 80 % for population extraction and 86 % for problem extraction. They also utilized a supervised classifier for outcome extraction and achieved accuracy from 64 to 95 % across various experiments.

Kelly and Yang [ 29 ] used regular expressions and gazetteer to extract the number of participants, participant age, gender, ethnicity, and study characteristics. The authors utilized 386 abstracts from PubMed obtained with the query “soy and cancer” and achieved F-scores of 96 % for identifying the number of participants, 100 % for age of participants, 100 % for gender of participants, 95 % for ethnicity of participants, 91 % for duration of study, and 87 % for health status of participants.

Hansen et al. [ 30 ] used support vector machines [ 31 ] to extract number of trial participants from abstracts of the randomized control trials. The authors utilized features such as part-of-speech tag of the previous and next words and whether the sentence is grammatically complete (contained a verb). Using 233 abstracts from PubMed, they achieved an F-score of 86 % for identifying participants.

Xu et al. [ 32 ] utilized text classifications augmented with hidden Markov models [ 33 ] to identify sentences about subject demographics. These sentences were then parsed to extract information regarding participant descriptors (e.g., men, healthy, elderly), number of trial participants, disease/symptom name, and disease/symptom descriptors. After testing over 250 RCT abstracts, the authors obtained an accuracy of 83 % for participant descriptors: 83 %, 93 % for number of trial participants, 51 % for diseases/symptoms, and 92 % for descriptors of diseases/symptoms.

Summerscales et al. [ 34 ] used a conditional random field-based approach to identify various named entities such as treatments (drug names or complex phrases) and outcomes. The authors extracted 100 abstracts of randomized trials from the BMJ and achieved F-scores of 49 % for identifying treatment, 82 % for groups, and 54 % for outcomes.

Summerscales et al. [ 35 ] also proposed a method for automatic summarization of results from the clinical trials. The authors first identified the sentences that contained at least one integer (group size, outcome numbers, etc.). They then used the conditional random field classifier to find the entity mentions corresponding to treatment groups or outcomes. The treatment groups, outcomes, etc. were then treated as various “events.” To identify all the relevant information for these events, the authors utilized templates with slots. The slots were then filled using a maximum entropy classifier. They utilized 263 abstracts from the BMJ and achieved F-scores of 76 % for identifying groups, 42 % for outcomes, 80 % for group sizes, and 71 % for outcome numbers.

Studies that identified data elements from full-text reports

Kiritchenko et al. [ 36 ] developed ExaCT, a tool that assists users with locating and extracting key trial characteristics such as eligibility criteria, sample size, drug dosage, and primary outcomes from full-text journal articles. The authors utilized a text classifier in the first stage to recover the relevant sentences. In the next stage, they utilized extraction rules to find the correct solutions. The authors evaluated their system using 50 full-text articles describing randomized trials with 1050 test instances and achieved a P5 precision of 88 % for identifying the classifier. Precision and recall of their extraction rules was found to be 93 and 91 %, respectively.

Restificar et al. [ 37 ] utilized latent Dirichlet allocation [ 38 ] to infer the latent topics in the sample documents and then used logistic regression to compute the probability that a given candidate criterion belongs to a particular topic. Using 44,203 full-text reports of randomized trials, the authors achieved accuracies of 75 and 70 % for inclusion and exclusion criteria, respectively.

Lin et al. [ 39 ] used linear-chain conditional random field for extracting various metadata elements such as number of patients, age group of the patients, geographical area, intervention, and time duration of the study. Using 93 full-text articles, the authors achieved a threefold cross validation precision of 43 % for identifying number of patients, 63 % for age group, 44 % for geographical area, 40 % for intervention, and 83 % for time period.

De Bruijn et al. [ 40 ] used support vector machine classifier to first identify sentences describing information elements such as eligibility criteria, sample size, etc. The authors then used manually crafted weak extraction rules to extract various information elements. Testing this two-stage architecture on 88 randomized trial reports, they obtained a precision of 69 % for identifying eligibility criteria, 62 % for sample size, 94 % for treatment duration, 67 % for intervention, 100 % for primary outcome estimates, and 67 % for secondary outcomes.

Zhu et al. [ 41 ] also used manually crafted rules to extract various subject demographics such as disease, age, gender, and ethnicity. The authors tested their method on 50 articles and for disease extraction obtained an F-score of 64 and 85 % for exactly matched and partially matched cases, respectively.

Risk of bias across studies

In general, many studies have a high risk of selection bias because the gold standards used in the respective studies were not randomly selected. The risk of performance bias is also likely to be high because the investigators were not blinded. For the systems that used rule-based approaches, it was unclear whether the gold standard was used to train the rules or if there were a separate training set. The risk of attrition bias is unclear based on the study design of these non-randomized studies evaluating the performance of NLP methods. Lastly, the risk of reporting bias is unclear because of the lack of protocols in the development, implementation, and evaluation of NLP methods.

Summary of evidence

Extracting the data elements.

Participants — Sixteen studies explored the extraction of the number of participants [ 12 , 13 , 16 – 20 , 23 , 24 , 28 – 30 , 32 , 39 ], their age [ 24 , 29 , 39 , 41 ], sex [ 24 , 39 ], ethnicity [ 41 ], country [ 24 , 39 ], comorbidities [ 21 ], spectrum of presenting symptoms, current treatments, and recruiting centers [ 21 , 24 , 28 , 29 , 32 , 41 ], and date of study [ 39 ]. Among them, only six studies [ 28 – 30 , 32 , 39 , 41 ] extracted data elements as opposed to highlighting the sentence containing the data element. Unfortunately, each of these studies used a different corpus of reports, which makes direct comparisons impossible. For example, Kelly and Yang [ 29 ] achieved high F-scores of 100 % for age of participants, 91 % for duration of study, 95 % for ethnicity of participants, 100 % for gender of subjects, 87 % for health status of participants, and 96 % for number of participants on a dataset of 386 abstracts.

Intervention — Thirteen studies explored the extraction of interventions [ 12 , 13 , 16 – 20 , 22 , 24 , 28 , 34 , 39 , 40 ], intervention groups [ 34 , 35 ], and intervention details (for replication if feasible) [ 36 ]. Of these, only six studies [ 28 , 34 – 36 , 39 , 40 ] extracted intervention elements. Unfortunately again, each of these studies used a different corpus. For example, Kiritchenko et al. [ 36 ] achieved an F-score of 75–86 % for intervention data elements on a dataset of 50 full-text journal articles.

Outcomes and comparisons — Fourteen studies also explored the extraction of outcomes and time points of collection and reporting [ 12 , 13 , 16 – 20 , 24 , 25 , 28 , 34 – 36 , 40 ] and extraction of comparisons [ 12 , 16 , 22 , 23 ]. Of these, only six studies [ 28 , 34 – 36 , 40 ] extracted the actual data elements. For example, De Bruijn et al. [ 40 ] obtained an F-score of 100 % for extracting primary outcome and 67 % for secondary outcome from 88 full-text articles. Summerscales [ 35 ] utilized 263 abstracts from the BMJ and achieved an F-score of 42 % for extracting outcomes.

Results — Two studies [ 36 , 40 ] extracted sample size data element from full text on two different data sets. De Bruijn et al. [ 40 ] obtained an accuracy of 67 %, and Kiritchenko et al. [ 36 ] achieved an F-score of 88 %.

Interpretation — Three studies explored extraction of overall evidence [ 26 , 42 ] and external validity of trial findings [ 25 ]. However, all these studies only highlighted sentences containing the data elements relevant to interpretation.

Objectives — Two studies [ 24 , 25 ] explored the extraction of research questions and hypotheses. However, both these studies only highlighted sentences containing the data elements relevant to interpretation.

Methods — Twelve studies explored the extraction of the study design [ 13 , 18 , 20 , 24 ], study duration [ 12 , 29 , 40 ], randomization method [ 25 ], participant flow [ 36 , 37 , 40 ], and risk of bias assessment [ 27 ]. Of these, only four studies [ 29 , 36 , 37 , 40 ] extracted the corresponding data elements from text using different sets of corpora. For example, Restificar et al. [ 37 ] utilized 44,203 full-text clinical trial articles and achieved accuracies of 75 and 70 % for inclusion and exclusion criteria, respectively.

Miscellaneous — One study [ 26 ] explored extraction of key conclusion sentence and achieved a high F-score of 98 %.

Related reviews and studies

Previous reviews on the automation of systematic review processes describe technologies for automating the overall process or other steps. Tsafnat et al. [ 43 ] surveyed the informatics systems that automate some of the tasks of systematic review and report systems for each stage of systematic review. Here, we focus on data extraction. None of the existing reviews [ 43 – 47 ] focus on the data extraction step. For example, Tsafnat et al. [ 43 ] presented a review of techniques to automate various aspects of systematic reviews, and while data extraction has been described as a task in their review, they only highlighted three studies as an acknowledgement of the ongoing work. In comparison, we identified 26 studies and critically examined their contribution in relation to all the data elements that need to be extracted to fully support the data extraction step.

Thomas et al. [ 44 ] described the application of text mining technologies such as automatic term recognition, document clustering, classification, and summarization to support the identification of relevant studies in systematic reviews. The authors also pointed out the potential of these technologies to assist at various stages of the systematic review. Slaughter et al. [ 45 ] discussed necessary next steps towards developing “living systematic reviews” rather than a static publication, where the systematic reviews can be continuously updated with the latest knowledge available. The authors mentioned the need for development of new tools for reporting on and searching for structured data from clinical trials.

Tsafnat et al. [ 46 ] described four main tasks in systematic review: identifying the relevant studies, evaluating risk of bias in selected trials, synthesis of the evidence, and publishing the systematic reviews by generating human-readable text from trial reports. They mentioned text extraction algorithms for evaluating risk of bias and evidence synthesis but remain limited to one particular method for extraction of PICO elements.

Most natural language processing research has focused on reducing the workload for the screening step of systematic reviews (Step 3). Wallace et al. [ 48 , 49 ] and Miwa et al. [ 50 ] proposed an active learning framework to reduce the workload in citation screening for inclusion in the systematic reviews. Jonnalagadda et al. [ 51 ] designed a distributional semantics-based relevance feedback model to semi-automatically screen citations. Cohen et al. [ 52 ] proposed a module for grouping studies that are closely related and an automated system to rank publications according to the likelihood for meeting the inclusion criteria of a systematic review. Choong et al. [ 53 ] proposed an automated method for automatic citation snowballing to recursively pursue relevant literature for helping in evidence retrieval for systematic reviews. Cohen et al. [ 54 ] constructed a voting perceptron-based automated citation classification system to classify each article as to whether it contains high-quality, drug-specific evidence. Adeva et al. [ 55 ] also proposed a classification system for screening articles for systematic review. Shemilt et al. [ 56 ] also discussed the use of text mining to reduce screening workload in systematic reviews.

Research implications

No standard gold standards or dataset.

Among the 26 studies included in this systematic review, only three of them use a common corpus, namely 1000 medical abstracts from the PIBOSO corpus. Unfortunately, even that corpus facilitates only classification of sentences into whether they contain one of the data elements corresponding to the PIBOSO categories. No two other studies shared the same gold standard or dataset for evaluation. This limitation made it impossible for us to compare and assess the relative significance of the reported accuracy measures.

Separate systems for each data element

Few data elements, which are also relatively straightforward to extract automatically, such as the total number of participants (14 overall and 5 for extracting the actual data elements), have a relatively higher number of studies aiming towards extracting the same data element. This is not the case with other data elements. There are 27 out of 52 potential data elements that have not been explored for automated extraction, even if for highlighting the sentences containing them; seven more data elements were explored just by one study. There are 38 out of 52 potential data elements (>70 %) that have not been explored for automated extraction of the actual data elements; three more data elements were explored just by one study. The highest number of data elements extracted by a single study is only seven (14 %). This finding means that not only are more studies needed to explore the remaining 70 % data elements, but that there is an urgent need for a unified framework or system to extract all necessary data elements. The current state of informatics research for data extraction is exploratory, and multiple studies need to be conducted using the same gold standard and on the extraction of the same data elements for effective comparison.

Limitations

Our study has limitations. First, there is a possibility that data extraction algorithms were not published in journals or that our search might have missed them. We sought to minimize this limitation by searching in multiple bibliographic databases, including PubMed, IEEExplore, and ACM Digital Library. However, investigators may have also failed to publish algorithms that had lower F-scores than were previously reported, which we would not have captured. Second, we did not publish a protocol a priori, and our initial findings may have influenced our methods. However, we performed key steps, including screening, full-text review, and data extraction in duplicate to minimize potential bias in our systematic review.

Future work

“On demand” access to summarized evidence and best practices has been considered a sound strategy to satisfy clinicians’ information needs and enhance decision-making [ 57 – 65 ]. A systematic review of 26 studies concluded that information-retrieval technology produces positive impact on physicians in terms of decision enhancement, learning, recall, reassurance, and confirmation [ 62 ]. Slaughter et al. [ 45 ] discussed necessary next steps towards developing “living systematic reviews” rather than a static publication, where the systematic reviews can be continuously updated with the latest knowledge available. The authors mention the need for development of new tools for reporting on and searching for structured data from published literature. Automated information extraction framework that extract data elements have the potential to assist the systematic reviewers and to eventually automate the screening and data extraction steps.

Medical science is currently witnessing a rapid pace at which medical knowledge is being created—75 clinical trials a day [ 66 ]. Evidence-based medicine [ 67 ] requires clinicians to keep up with published scientific studies and use them at the point of care. However, it has been shown that it is practically impossible to do that even within a narrow specialty [ 68 ]. A critical barrier is that finding relevant information, which may be located in several documents, takes an amount of time and cognitive effort that is incompatible with the busy clinical workflow [ 69 , 70 ]. Rapid systematic reviews using automation technologies will enable clinicians with up-to-date and systematic summaries of the latest evidence.

Our systematic review describes previously reported methods to identify sentences containing some of the data elements for systematic reviews and only a few studies that have reported methods to extract these data elements. However, most of the data elements that would need to be considered for systematic reviews have been insufficiently explored to date, which identifies a major scope for future work. We hope that these automated extraction approaches might first act as checks for manual data extraction currently performed in duplicate; then serve to validate manual data extraction done by a single reviewer; then become the primary source for data element extraction that would be validated by a human; and eventually completely automate data extraction to enable living systematic reviews.

Abbreviations

natural language processing

CONsolidated Standards Of Reporting Trials

Standards for Reporting of Diagnostic Accuracy

Population, Intervention, Comparison, Outcomes

Patient-Population-Problem, Exposure-Intervention, Comparison, Outcome, Duration and Results

Population, Intervention, Background, Outcome, Study Design, Other

conditional random fields

naive Bayes

randomized control trial

British Medical Journal

Higgins J, Green S. Cochrane handbook for systematic reviews of interventions version 5.1. 0 [updated March 2011]. The Cochrane Collaboration. 2011. Available at [ http://community.cochrane.org/handbook ]

Khan KS, Ter Riet G, Glanville J, Sowden AJ, Kleijnen J. Undertaking systematic reviews of research on effectiveness: CRD’s guidance for carrying out or commissioning reviews, NHS Centre for Reviews and Dissemination. 2001.

Google Scholar  

Woolf SH. Manual for conducting systematic reviews, Agency for Health Care Policy and Research. 1996.

Field MJ, Lohr KN. Clinical practice guidelines: directions for a new program, Clinical Practice Guidelines. 1990.

Elliott J, Turner T, Clavisi O, Thomas J, Higgins J, Mavergames C, et al. Living systematic reviews: an emerging opportunity to narrow the evidence-practice gap. PLoS Med. 2014;11:e1001603.

Article   PubMed   PubMed Central   Google Scholar  

Shojania KG, Sampson M, Ansari MT, Ji J, Doucette S, Moher D. How quickly do systematic reviews go out of date? A survival analysis. Ann Intern Med. 2007;147(4):224–33.

Article   PubMed   Google Scholar  

Hearst MA. Untangling text data mining. Proceedings of the 37th annual meeting of the Association for Computational Linguistics. College Park, Maryland: Association for Computational Linguistics; 1999. p. 3–10.

Morton S, Levit L, Berg A, Eden J. Finding what works in health care: standards for systematic reviews. Washington D.C.: National Academies Press; 2011. Available at [ http://www.nap.edu/catalog/13059/finding-what-works-in-health-care-standards-for-systematic-reviews ]

Begg C, Cho M, Eastwood S, Horton R, Moher D, Olkin I, et al. Improving the quality of reporting of randomized controlled trials: the CONSORT statement. JAMA. 1996;276(8):637–9.

Article   CAS   PubMed   Google Scholar  

Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig LM, et al. Towards complete and accurate reporting of studies of diagnostic accuracy: the STARD initiative. Clin Chem Lab Med. 2003;41(1):68–73. doi: 10.1515/CCLM.2003.012 .

Richardson WS, Wilson MC, Nishikawa J, Hayward RS. The well-built clinical question: a key to evidence-based decisions. ACP J Club. 1995;123(3):A12–3.

CAS   PubMed   Google Scholar  

Dawes M, Pluye P, Shea L, Grad R, Greenberg A, Nie J-Y. The identification of clinically important elements within medical journal abstracts: Patient–Population–Problem, Exposure–Intervention, Comparison, Outcome, Duration and Results (PECODR). Inform Prim Care. 2007;15(1):9–16.

PubMed   Google Scholar  

Kim S, Martinez D, Cavedon L, Yencken L. Automatic classification of sentences to support evidence based medicine. BMC Bioinform. 2011;12 Suppl 2:S5.

Article   Google Scholar  

Whiting P, Rutjes AWS, Reitsma JB, Bossuyt PMM, Kleijnen J. The development of QUADAS: a tool for the quality assessment of studies of diagnostic accuracy included in systematic reviews. BMC Med Res Methodol. 2003;3(1):25.

Lafferty J, McCallum A, Pereira F. Conditional random fields: probabilistic models for segmenting and labeling sequence data, Proceedings of the Eighteenth International Conference on Machine Learning. 2001. p. 282–9. %L 3140.

Boudin F, Nie JY, Bartlett JC, Grad R, Pluye P, Dawes M. Combining classifiers for robust PICO element detection. BMC Med Inform Decis Mak. 2010;10:29. doi: 10.1186/1472-6947-10-29 .

Huang K-C, Liu C-H, Yang S-S, Liao C-C, Xiao F, Wong J-M, et al, editors. Classification of PICO elements by text features systematically extracted from PubMed abstracts. Granular Computing (GrC), 2011 IEEE International Conference on; 2011: IEEE.

Verbeke M, Van Asch V, Morante R, Frasconi P, Daelemans W, De Raedt L, editors. A statistical relational learning approach to identifying evidence based medicine categories. Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning; 2012: Association for Computational Linguistics.

Huang K-C, Chiang IJ, Xiao F, Liao C-C, Liu CC-H, Wong J-M. PICO element detection in medical text without metadata: are first sentences enough? J Biomed Inform. 2013;46(5):940–6.

Hassanzadeh H, Groza T, Hunter J. Identifying scientific artefacts in biomedical literature: the evidence based medicine use case. J Biomed Inform. 2014;49:159–70.

Robinson DA. Finding patient-oriented evidence in PubMed abstracts. Athens: University of Georgia; 2012.

Chung GY-C. Towards identifying intervention arms in randomized controlled trials: extracting coordinating constructions. J Biomed Inform. 2009;42(5):790–800.

Hara K, Matsumoto Y. Extracting clinical trial design information from MEDLINE abstracts. N Gener Comput. 2007;25(3):263–75.

Zhao J, Bysani P, Kan MY. Exploiting classification correlations for the extraction of evidence-based practice information. AMIA Annu Symp Proc. 2012;2012:1070–8.

PubMed   PubMed Central   Google Scholar  

Hsu W, Speier W, Taira R. Automated extraction of reported statistical analyses: towards a logical representation of clinical trial literature. AMIA Annu Symp Proc. 2012;2012:350–9.

Song MH, Lee YH, Kang UG. Comparison of machine learning algorithms for classification of the sentences in three clinical practice guidelines. Healthcare Informatics Res. 2013;19(1):16–24.

Marshall IJ, Kuiper J, Wallace BC, editors. Automating risk of bias assessment for clinical trials. Proceedings of the 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics; 2014: ACM.

Demner-Fushman D, Lin J. Answering clinical questions with knowledge-based and statistical techniques. Comput Linguist. 2007;33(1):63–103.

Kelly C, Yang H. A system for extracting study design parameters from nutritional genomics abstracts. J Integr Bioinform. 2013;10(2):222. doi: 10.2390/biecoll-jib-2013-222 .

Hansen MJ, Rasmussen NO, Chung G. A method of extracting the number of trial participants from abstracts describing randomized controlled trials. J Telemed Telecare. 2008;14(7):354–8. doi: 10.1258/jtt.2008.007007 .

Joachims T. Text categorization with support vector machines: learning with many relevant features, Machine Learning: ECML-98, Tenth European Conference on Machine Learning. 1998. p. 137–42.

Xu R, Garten Y, Supekar KS, Das AK, Altman RB, Garber AM. Extracting subject demographic information from abstracts of randomized clinical trial reports. 2007.

Eddy SR. Hidden Markov models. Curr Opin Struct Biol. 1996;6(3):361–5.

Summerscales RL, Argamon S, Hupert J, Schwartz A. Identifying treatments, groups, and outcomes in medical abstracts. The Sixth Midwest Computational Linguistics Colloquium (MCLC 2009). 2009.

Summerscales R, Argamon S, Bai S, Huperff J, Schwartzff A. Automatic summarization of results from clinical trials, the 2011 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). 2011. p. 372–7.

Kiritchenko S, de Bruijn B, Carini S, Martin J, Sim I. ExaCT: automatic extraction of clinical trial characteristics from journal publications. BMC Med Inform Decis Mak. 2010;10:56.

Restificar A, Ananiadou S. Inferring appropriate eligibility criteria in clinical trial protocols without labeled data, Proceedings of the ACM sixth international workshop on Data and text mining in biomedical informatics. 2012. ACM.

Blei DM, Ng AY, Jordan MI. Latent Dirichlet allocation. J Mach Learn Res. 2003;3(4–5):993–1022.

Lin S, Ng J-P, Pradhan S, Shah J, Pietrobon R, Kan M-Y, editors. Extracting formulaic and free text clinical research articles metadata using conditional random fields. Proceedings of the NAACL HLT 2010 Second Louhi Workshop on Text and Data Mining of Health Documents; 2010: Association for Computational Linguistics.

De Bruijn B, Carini S, Kiritchenko S, Martin J, Sim I, editors. Automated information extraction of key trial design elements from clinical trial publications. AMIA Annual Symposium Proceedings; 2008: American Medical Informatics Association.

Zhu H, Ni Y, Cai P, Qiu Z, Cao F. Automatic extracting of patient-related attributes: disease, age, gender and race. Stud Health Technol Inform. 2011;180:589–93.

Davis-Desmond P, Mollá D, editors. Detection of evidence in clinical research papers. Proceedings of the Fifth Australasian Workshop on Health Informatics and Knowledge Management-Volume 129; 2012: Australian Computer Society, Inc.

Tsafnat G, Glasziou P, Choong M, Dunn A, Galgani F, Coiera E. Systematic review automation technologies. Syst Rev. 2014;3(1):74.

Thomas J, McNaught J, Ananiadou S. Applications of text mining within systematic reviews. Res Synthesis Methods. 2011;2(1):1–14.

Slaughter L, Berntsen CF, Brandt L, Mavergames C. Enabling living systematic reviews and clinical guidelines through semantic technologies. D-Lib Magazine. 2015;21(1/2). Available at [ http://www.dlib.org/dlib/january15/slaughter/01slaughter.html ]

Tsafnat G, Dunn A, Glasziou P, Coiera E. The automation of systematic reviews. BMJ. 2013;346:f139.

O’Mara-Eves A, Thomas J, McNaught J, Miwa M, Ananiadou S. Using text mining for study identification in systematic reviews: a systematic review of current approaches. Syst Rev. 2015;4(1):5.

Wallace BC, Trikalinos TA, Lau J, Brodley C, Schmid CH. Semi-automated screening of biomedical citations for systematic reviews. BMC Bioinformatics. 2010;11(1):55.

Wallace BC, Small K, Brodley CE, Trikalinos TA, editors. Active learning for biomedical citation screening. Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining; 2010: ACM.

Miwa M, Thomas J, O’Mara-Eves A, Ananiadou S. Reducing systematic review workload through certainty-based screening. J Biomed Inform. 2014;51:242–53.

Jonnalagadda S, Petitti D. A new iterative method to reduce workload in systematic review process. Int J Comput Biol Drug Des. 2013;6(1–2):5–17. doi: 10.1504/IJCBDD.2013.052198 .

Cohen A, Adams C, Davis J, Yu C, Yu P, Meng W, et al. Evidence-based medicine, the essential role of systematic reviews, and the need for automated text mining tools. Proceedings of the 1st ACM International Health Informatics Symposium. 2010:376–80.

Choong MK, Galgani F, Dunn AG, Tsafnat G. Automatic evidence retrieval for systematic reviews. J Med Inter Res. 2014;16(10):e223.

Cohen AM, Hersh WR, Peterson K, Yen P-Y. Reducing workload in systematic review preparation using automated citation classification. J Am Med Inform Assoc. 2006;13(2):206–19.

Article   CAS   PubMed   PubMed Central   Google Scholar  

García Adeva JJ, Pikatza Atxa JM, Ubeda Carrillo M, Ansuategi ZE. Automatic text classification to support systematic reviews in medicine. Expert Syst Appl. 2014;41(4):1498–508.

Shemilt I, Simon A, Hollands GJ, Marteau TM, Ogilvie D, O’Mara‐Eves A, et al. Pinpointing needles in giant haystacks: use of text mining to reduce impractical screening workload in extremely large scoping reviews. Res Synthesis Methods. 2014;5(1):31–49.

Cullen RJ. In search of evidence: family practitioners’ use of the Internet for clinical information. J Med Libr Assoc. 2002;90(4):370–9.

Hersh WR, Hickam DH. How well do physicians use electronic information retrieval systems? A framework for investigation and systematic review. JAMA. 1998;280(15):1347–52.

Lucas BP, Evans AT, Reilly BM, Khodakov YV, Perumal K, Rohr LG, et al. The impact of evidence on physicians’ inpatient treatment decisions. J Gen Intern Med. 2004;19(5 Pt 1):402–9. doi: 10.1111/j.1525-1497.2004.30306.x .

Magrabi F, Coiera EW, Westbrook JI, Gosling AS, Vickland V. General practitioners’ use of online evidence during consultations. Int J Med Inform. 2005;74(1):1–12. doi: 10.1016/j.ijmedinf.2004.10.003 .

McColl A, Smith H, White P, Field J. General practitioner’s perceptions of the route to evidence based medicine: a questionnaire survey. BMJ. 1998;316(7128):361–5.

Pluye P, Grad RM, Dunikowski LG, Stephenson R. Impact of clinical information-retrieval technology on physicians: a literature review of quantitative, qualitative and mixed methods studies. Int J Med Inform. 2005;74(9):745–68. doi: 10.1016/j.ijmedinf.2005.05.004 .

Rothschild JM, Lee TH, Bae T, Bates DW. Clinician use of a palmtop drug reference guide. J Am Med Inform Assoc. 2002;9(3):223–9.

Rousseau N, McColl E, Newton J, Grimshaw J, Eccles M. Practice based, longitudinal, qualitative interview study of computerised evidence based guidelines in primary care. BMJ. 2003;326(7384):314.

Westbrook JI, Coiera EW, Gosling AS. Do online information retrieval systems help experienced clinicians answer clinical questions? J Am Med Inform Assoc. 2005;12(3):315–21. doi: 10.1197/jamia.M1717 .

Bastian H, Glasziou P, Chalmers I. Seventy-five trials and eleven systematic reviews a day: how will we ever keep up? PLoS Med. 2010;7(9):e1000326. doi: 10.1371/journal.pmed.1000326 .

Lau J. Evidence-based medicine and meta-analysis: getting more out of the literature. In: Greenes RA, editor. Clinical decision support: the road ahead. 2007. p. 249.

Fraser AG, Dunstan FD. On the impossibility of being expert. BMJ (Clinical Res). 2010;341:c6815.

Ely JW, Osheroff JA, Chambliss ML, Ebell MH, Rosenbaum ME. Answering physicians’ clinical questions: obstacles and potential solutions. J Am Med Inform Assoc. 2005;12(2):217–24. doi: 10.1197/jamia.M1608 .

Ely JW, Osheroff JA, Maviglia SM, Rosenbaum ME. Patient-care questions that physicians are unable to answer. J Am Med Inform Assoc. 2007;14(4):407–14. doi: 10.1197/jamia.M2398 .

Download references

Author information

Authors and affiliations.

Division of Health and Biomedical Informatics, Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, 750 North Lake Shore Drive, 11th Floor, Chicago, IL, 60611, USA

Siddhartha R. Jonnalagadda

Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur, 721302, West Bengal, India

Pawan Goyal

Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, USA

Mark D. Huffman

You can also search for this author in PubMed   Google Scholar

Corresponding author

Correspondence to Siddhartha R. Jonnalagadda .

Additional information

Competing interests.

The authors declare that they have no competing interests.

Authors’ contributions

SRJ and PG had full access to all the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis. Study concept and design were done by SRJ. SRJ, PG, and MDH did the acquisition, analysis, or interpretation of data. SRJ and PG drafted the manuscript. SRJ, PG, and MDH did the critical revision of the manuscript for important intellectual content. SRJ obtained funding. PG and SRJ provided administrative, technical, or material support. SRJ did the study supervision. All authors read and approved the final manuscript.

Funding/Support

This project was partly supported by the National Library of Medicine (grant 5R00LM011389). The Cochrane Heart Group US Satellite at Northwestern University is supported by an intramural grant from the Northwestern University Feinberg School of Medicine.

Role of the sponsors

The funding source had no role in the design and conduct of the study; collection, management, analysis, or interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.

The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Library of Medicine.

Additional contributions

Mark Berendsen (Research Librarian, Galter Health Sciences Library, Northwestern University Feinberg School of Medicine) provided insights on the design of this study, including the search strategies, and Dr. Kalpana Raja (Division of Health and Biomedical Informatics, Department of Preventive Medicine, Northwestern University Feinberg School of Medicine) reviewed the manuscript. None of them received compensation for their contributions.

Search strategies

Below, we provide the search strategies used in PubMed, ACM Digital Library, and IEEExplore. The search was conducted on January 6, 2015.

(“identification” [Title] OR “extraction” [Title] OR “extracting” [Title] OR “detection” [Title] OR “identifying” [Title] OR “summarization” [Title] OR “learning approach” [Title] OR “automatically” [Title] OR “summarization” [Title] OR “identify sections” [Title] OR “learning algorithms” [Title] OR “Interpreting” [Title] OR “Inferring” [Title] OR “Finding” [Title] OR “classification” [Title]) AND (“medical evidence”[Title] OR “PICO”[Title] OR “PECODR” [Title] OR “intervention arms” [Title] OR “experimental methods” [Title] OR “study design parameters” [Title] OR “Patient oriented Evidence” [Title] OR “eligibility criteria” [Title] OR “clinical trial characteristics” [Title] OR “evidence based medicine” [Title] OR “clinically important elements” [Title] OR “evidence based practice” [Title] “results from clinical trials” [Title] OR “statistical analyses” [Title] OR “research results” [Title] OR “clinical evidence” [Title] OR “Meta Analysis” [Title] OR “Clinical Research” [Title] OR “medical abstracts” [Title] OR “clinical trial literature” [Title] OR ”clinical trial characteristics” [Title] OR “clinical trial protocols” [Title] OR “clinical practice guidelines” [Title]).

We performed this search only in the metadata.

(“identification” OR “extraction” OR “extracting” OR “detection” OR “Identifying” OR “summarization” OR “learning approach” OR “automatically” OR “summarization” OR “identify sections” OR “learning algorithms” OR “Interpreting” OR “Inferring” OR “Finding” OR “classification”) AND (“medical evidence” OR “PICO” OR “intervention arms” OR “experimental methods” OR “eligibility criteria” OR “clinical trial characteristics” OR “evidence based medicine” OR “clinically important elements” OR “results from clinical trials” OR “statistical analyses” OR “clinical evidence” OR “Meta Analysis” OR “clinical research” OR “medical abstracts” OR “clinical trial literature” OR “clinical trial protocols”).

ACM digital library

((Title: “identification” or Title: “extraction” or Title: “extracting” or Title: “detection” or Title: “Identifying” or Title: “summarization” or Title: “learning approach” or Title: “automatically” or Title: “summarization “or Title: “identify sections” or Title: “learning algorithms” or Title: “scientific artefacts” or Title: “Interpreting” or Title: “Inferring” or Title: “Finding” or Title: “classification” or “statistical techniques”) and (Title: “medical evidence” or Abstract: “medical evidence” or Title: “PICO” or Abstract: “PICO” or Title: “intervention arms” or Title: “experimental methods” or Title: “study design parameters” or Title: “Patient oriented Evidence” or Abstract: “Patient oriented Evidence” or Title: “eligibility criteria” or Abstract: “eligibility criteria” or Title: “clinical trial characteristics” or Abstract: “clinical trial characteristics” or Title: “evidence based medicine” or Abstract: “evidence based medicine” or Title: “clinically important elements” or Title: “evidence based practice” or Title: “treatments” or Title: “groups” or Title: “outcomes” or Title: “results from clinical trials” or Title: “statistical analyses” or Abstract: “statistical analyses” or Title: “research results” or Title: “clinical evidence” or Abstract: “clinical evidence” or Title: “Meta Analysis” or Abstract:“Meta Analysis” or Title:“Clinical Research” or Title: “medical abstracts” or Title: “clinical trial literature” or Title: “Clinical Practice” or Title: “clinical trial protocols” or Abstract: “clinical trial protocols” or Title: “clinical questions” or Title: “clinical trial design”)).

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/4.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated.

Reprints and permissions

About this article

Cite this article.

Jonnalagadda, S.R., Goyal, P. & Huffman, M.D. Automating data extraction in systematic reviews: a systematic review. Syst Rev 4 , 78 (2015). https://doi.org/10.1186/s13643-015-0066-7

Download citation

Received : 20 March 2015

Accepted : 21 May 2015

Published : 15 June 2015

DOI : https://doi.org/10.1186/s13643-015-0066-7

Share this article

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

  • Support Vector Machine
  • Data Element
  • Conditional Random Field
  • PubMed Abstract
  • Systematic Review Process

Systematic Reviews

ISSN: 2046-4053

  • Submission enquiries: Access here and click Contact Us
  • General enquiries: [email protected]

literature review data extraction

Log in using your username and password

  • Search More Search for this keyword Advanced search
  • Latest content
  • Current issue
  • BMJ Journals More You are viewing from: Google Indexer

You are here

  • Volume 33, Issue 5
  • Equitable and accessible informed healthcare consent process for people with intellectual disability: a systematic literature review
  • Article Text
  • Article info
  • Citation Tools
  • Rapid Responses
  • Article metrics

Download PDF

  • http://orcid.org/0000-0002-8498-7329 Manjekah Dunn 1 , 2 ,
  • Iva Strnadová 3 , 4 , 5 ,
  • Jackie Leach Scully 4 ,
  • Jennifer Hansen 3 ,
  • Julie Loblinzk 3 , 5 ,
  • Skie Sarfaraz 5 ,
  • Chloe Molnar 1 ,
  • Elizabeth Emma Palmer 1 , 2
  • 1 Faculty of Medicine & Health , University of New South Wales , Sydney , New South Wales , Australia
  • 2 The Sydney Children's Hospitals Network , Sydney , New South Wales , Australia
  • 3 School of Education , University of New South Wales , Sydney , New South Wales , Australia
  • 4 Disability Innovation Institute , University of New South Wales , Sydney , New South Wales , Australia
  • 5 Self Advocacy Sydney , Sydney , New South Wales , Australia
  • Correspondence to Dr Manjekah Dunn, Paediatrics & Child Health, University of New South Wales Medicine & Health, Sydney, New South Wales, Australia; manjekah.dunn{at}unsw.edu.au

Objective To identify factors acting as barriers or enablers to the process of healthcare consent for people with intellectual disability and to understand how to make this process equitable and accessible.

Data sources Databases: Embase, MEDLINE, PsychINFO, PubMed, SCOPUS, Web of Science and CINAHL. Additional articles were obtained from an ancestral search and hand-searching three journals.

Eligibility criteria Peer-reviewed original research about the consent process for healthcare interventions, published after 1990, involving adult participants with intellectual disability.

Synthesis of results Inductive thematic analysis was used to identify factors affecting informed consent. The findings were reviewed by co-researchers with intellectual disability to ensure they reflected lived experiences, and an easy read summary was created.

Results Twenty-three studies were included (1999 to 2020), with a mix of qualitative (n=14), quantitative (n=6) and mixed-methods (n=3) studies. Participant numbers ranged from 9 to 604 people (median 21) and included people with intellectual disability, health professionals, carers and support people, and others working with people with intellectual disability. Six themes were identified: (1) health professionals’ attitudes and lack of education, (2) inadequate accessible health information, (3) involvement of support people, (4) systemic constraints, (5) person-centred informed consent and (6) effective communication between health professionals and patients. Themes were barriers (themes 1, 2 and 4), enablers (themes 5 and 6) or both (theme 3).

Conclusions Multiple reasons contribute to poor consent practices for people with intellectual disability in current health systems. Recommendations include addressing health professionals’ attitudes and lack of education in informed consent with clinician training, the co-production of accessible information resources and further inclusive research into informed consent for people with intellectual disability.

PROSPERO registration CRD42021290548.

  • Decision making
  • Healthcare quality improvement
  • Patient-centred care
  • Quality improvement
  • Standards of care

Data availability statement

Data are available upon reasonable request. Additional data and materials such as data collection forms, data extraction and analysis templates and QualSyst assessment data can be obtained by contacting the corresponding author.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/ .

https://doi.org/10.1136/bmjqs-2023-016113

Statistics from Altmetric.com

Request permissions.

If you wish to reuse any or all of this article please use the link below which will take you to the Copyright Clearance Center’s RightsLink service. You will be able to get a quick price and instant permission to reuse the content in many different ways.

What is already known on this topic

People with intellectual disability are frequently excluded from decision-making processes and not provided equal opportunity for informed consent, despite protections outlined in the United Nations Convention on the Rights of Persons with Disabilities.

People with intellectual disability have the capacity and desire to make informed medical decisions, which can improve their well-being, health satisfaction and health outcomes.

What this review study adds

Health professionals lack adequate training in valid informed consent and making reasonable adjustments for people with intellectual disability, and continue to perpetuate assumptions of incapacity.

Health information provided to people with intellectual disability is often inaccessible and insufficient for them to make informed decisions about healthcare.

The role of support people, systemic constraints, a person-centred approach and ineffective healthcare communication also affect informed consent.

How this review might affect research, practice or policy

Health professionals need additional training on how to provide a valid informed consent process for people with intellectual disability, specifically in using accessible health information, making reasonable adjustments (e.g., longer/multiple appointments, options of a support person attending or not, using plain English), involving the individual in discussions, and communicating effectively with them.

Inclusive research is needed to hear the voices and opinions of people with intellectual disability about healthcare decision-making and about informed consent practices in specific healthcare settings.

Introduction

Approximately 1% of the world’s population have intellectual disability. 1 Intellectual disability is medically defined as a group of neurodevelopmental conditions beginning in childhood, with below average cognitive functioning and adaptive behaviour, including limitations in conceptual, social and practical skills. 2 People with intellectual disability prefer an alternative strength-based definition, reflected in the comment by Robert Strike OAM (Order of Australia Medal): ‘We can learn if the way of teaching matches how the person learns’, 3 reinforcing the importance of providing information tailored to the needs of a person with intellectual disability. A diagnosis of intellectual disability is associated with significant disparities in health outcomes. 4–7 Person-centred decision-making and better communication have been shown to improve patient satisfaction, 8 9 the physician–patient relationship 10 and overall health outcomes 11 for the wider population. Ensuring people with intellectual disability experience informed decision-making and accessible healthcare can help address the ongoing health disparities and facilitate equal access to healthcare.

Bodily autonomy is an individual’s power and agency to make decisions about their own body. 12 Informed consent for healthcare enables a person to practice bodily autonomy and is protected, for example, by the National Safety and Quality Health Service Standards (Australia), 13 Mental Capacity Act (UK) 14 and the Joint Commission Standards (USA). 15 In this article, we define informed consent according to three requirements: (1) the person is provided with information they understand, (2) the decision is free of coercion and (3) the person must have capacity. 16 For informed consent to be valid, this process must be suited to the individual’s needs so that they can understand and communicate effectively. Capacity is the ability to give informed consent for a medical intervention, 17 18 and the Mental Capacity Act outlines that ‘a person must be assumed to have capacity unless it is established that he lacks capacity’ and that incapacity can only be established if ‘all practicable steps’ to support capacity have been attempted without success. 14 These assumptions of capacity are also decision-specific, meaning an individual’s ability to consent can change depending on the situation, the choice itself and other factors. 17

Systemic issues with healthcare delivery systems have resulted in access barriers for people with intellectual disability, 19 despite the disability discrimination legislation in many countries who are signatories to the United Nations (UN) Convention on the Rights of Persons with Disabilities. 20 Patients with intellectual disability are not provided the reasonable adjustments that would enable them to give informed consent for medical procedures or interventions, 21 22 despite evidence that many people with intellectual disability have both the capacity and the desire to make their own healthcare decisions. 21 23

To support people with intellectual disability to make independent health decisions, an equitable and accessible informed consent process is needed. 24 However, current health systems have consistently failed to provide this. 21 25 To address this gap, we must first understand the factors that contribute to inequitable and inaccessible consent. To the best of our knowledge, the only current review of informed consent for people with intellectual disability is an integrative review by Goldsmith et al . 26 Many of the included articles focused on assessment of capacity 27–29 and research consent. 30–32 The review’s conclusion supported the functional approach to assess capacity, with minimal focus on how the informed consent processes can be improved. More recently, there has been a move towards ensuring that the consent process is accessible for all individuals, including elderly patients 33 and people with aphasia. 34 However, there remains a paucity of literature about the informed consent process for people with intellectual disability, with no systematic reviews summarising the factors influencing the healthcare consent process for people with intellectual disability.

To identify barriers to and enablers of the informed healthcare consent process for people with intellectual disability, and to understand how this can be made equitable and accessible.

A systematic literature review was conducted following the Preferred Reporting Items for Systematic Reviews and Meta-Analysis Protocols (PRISMA-P) systematic literature review protocol. 35 The PRISMA 2020 checklist 36 and ENhancing Transparency in REporting the synthesis of Qualitative research (ENTREQ) reporting guidelines were also followed. 37 The full study protocol is included in online supplemental appendix 1 .

Supplemental material

No patients or members of the public were involved in this research for this manuscript.

Search strategy

A search strategy was developed to identify articles about intellectual disability, consent and healthcare interventions, described in online supplemental appendix 2 . Multiple databases were searched for articles published between January 1990 to January 2022 (Embase, MEDLINE, PsychINFO, PubMed, SCOPUS, Web of Science and CINAHL). These databases include healthcare and psychology databases that best capture relevant literature on this topic, including medical, nursing, social sciences and bioethical literature. The search was limited to studies published from 1990 as understandings of consent have changed since then. 38 39 This yielded 4853 unique papers which were imported into Covidence, a specialised programme for conducting systematic reviews. 40

Study selection

Citation screening by abstract and titles was completed by two independent researchers (MD and EEP). Included articles had to:

Examine the informed consent process for a healthcare intervention for people with intellectual disability.

Have collected more than 50% of its data from relevant stakeholders, including adults with intellectual disability, families or carers of a person with intellectual disability, and professionals who engage with people with intellectual disability.

Report empirical data from primary research methodology.

Be published in a peer-reviewed journal after January 1990.

Be available in English.

Full text screening was completed by two independent researchers (MD and EEP). Articles were excluded if consent was only briefly discussed or if it focused on consent for research, capacity assessment, or participant knowledge or comprehension. Any conflicts were resolved through discussion with an independent third researcher (IS).

Additional studies were identified through an ancestral search and by hand-searching three major journals relevant to intellectual disability research. Journals were selected if they had published more than one included article for this review or in previous literature reviews conducted by the research team.

Quality assessment

Two independent researchers (MD and IS) assessed study quality with the QualSyst tool, 41 which can assess both qualitative and quantitative research papers. After evaluating the distribution of scores, a threshold value of 55% was used, as suggested by QualSyst 41 to exclude poor-quality studies but capture enough studies overall. Any conflicts between the quality assessment scores were resolved by a third researcher (EEP). For mixed-method studies, both qualitative and quantitative quality scores were calculated, and the higher value used.

Data collection

Two independent researchers (MD and JH) reviewed each study and extracted relevant details, including study size, participant demographics, year, country of publication, study design, data analysis and major outcomes reported. Researchers used standardised data collection forms designed, with input from senior researchers with expertise in qualitative research (IS and EEP), to extract data relevant to the review’s research aims. The form was piloted on one study, and a second iteration made based on feedback. These forms captured data on study design, methods, participants, any factors affecting the process of informed consent and study limitations. Data included descriptions and paragraphs outlining key findings, the healthcare context, verbatim participant quotes and any quantitative analyses or statistics. Missing or unclear data were noted.

Data analysis

A pilot literature search showed significant heterogeneity in methodology of studies, limiting the applicability of traditional quantitative analysis (ie, meta-analysis). Instead, inductive thematic analysis was chosen as an alternative methodology 42 43 that has been used in recent systematic reviews examining barriers and enablers of other health processes. 44 45 The six-phase approach described by Braun and Clarke was used. 46 47 A researcher (MD) independently coded the extracted data of each study line-by-line, with subsequent data grouped into pre-existing codes or new concepts when necessary. Codes were reviewed iteratively and grouped into categories, subthemes and themes framed around the research question. Another independent researcher (JH) collated and analysed the data on study demographics, methods and limitations. The themes were reviewed by two senior researchers (EEP and IS).

Qualitative methods of effect size calculations have been described in the literature, 48 49 which was captured in this review by the number of studies that identified each subtheme, with an assigned frequency rating to compare their relative significance. Subthemes were given a frequency rating of A, B, C or D if they were identified by >10, 7–9, 4–6 or <3 articles, respectively. The overall significance of each theme was estimated by the number of studies that mentioned it and the GRADE framework, a stepwise approach to quality assessment using a four-tier rating system. Each study was evaluated for risk of bias, inconsistency, indirectness, imprecision and publication bias. 50 51 Study sensitivity was assessed by counting the number of distinct subthemes included. 52 The quality of findings was designated high, moderate or low depending on the frequency ratings, the QualSyst score and the GRADE scores of studies supporting the finding. Finally, the relative contributions of each study were evaluated by the number of subthemes described, guided by previously reported methods for qualitative reviews. 52

Co-research

The findings were reviewed by two co-researchers with intellectual disability (JL and SS), with over 30 years combined experience as members and employees of a self-advocacy organisation. Guidance on the findings and an easy read summary was produced in line with best-practice inclusive research 53 54 over multiple discussions. Input from two health professional researchers (MD and EEP) provided data triangulation and sense-checking of findings.

Twenty-three articles were identified ( figure 1 ): 14 qualitative, 6 quantitative and 3 mixed-methods. Two papers included the same population of study participants: McCarthy 55 and McCarthy, 56 but had different research questions. Fovargue et al 57 was excluded due to a quality score of 35%. Common quality limitations were a lack of verification procedures to establish credibility and limited researcher reflexivity. No studies were excluded due to language requirements (as all were in English) or age restrictions (all studies had majority adult participants).

  • Download figure
  • Open in new tab
  • Download powerpoint

PRISMA 2020 flowchart for the systematic review. 36

Studies were published from 1999 to 2020 and involved participant populations from the UK (n=18), USA (n=3), Sweden (n=1) and Ireland (n=1). Participant numbers ranged from 9 to 604 (median 21), and participants included people with intellectual disability (n=817), health professionals (n=272), carers and support people (n=48), and other professionals that work with people with intellectual disability (n=137, community service agency directors, social workers, administrative staff and care home staff). Ages of participants ranged from 8 to 84 years, though only Aman et al 58 included participants <18 years of age. This study was included as the article states very few children were included. Studies examined consent in different contexts, including contraception and sexual health (6/23 articles), 58–60 medications (5/23 articles), 58–62 emergency healthcare, 63 cervical screening, 64 community referrals, 58–61 65 mental health, 66 hydrotherapy, 64 blood collection 67 and broad decision-making consent without a specific context. 65 68–71 A detailed breakdown of each study is included in online supplemental appendix 3 .

Six major themes were identified from the studies, summarised in figure 2 . An overview of included studies showing study sensitivity, effect size, QualSyst and GRADE scores is given in online supplemental appendix 4 . Studies with higher QualSyst and GRADE scores contributed more to this review’s findings and tended to include more subthemes; specifically, Rogers et al , 66 Sowney and Barr, 63 Höglund and Larsson, 72 and McCarthy 55 and McCarthy. 56 Figure 3 gives the easy read version of theme 1, with the full easy read summary in online supplemental appendix 5 .

Summary of the identified six themes and subthemes.

Theme 1 of the easy read summary.

Theme 1—Health professionals’ attitudes and lack of education about informed consent

Health professionals’ attitudes and practices were frequently (18/21) identified as factors affecting the informed consent process, with substantial evidence supporting this theme. Studies noted the lack of training for health professionals in supporting informed consent for people with intellectual disability, their desire for further education, and stereotypes and discrimination perpetuated by health professionals.

Lack of health professional education on informed consent and disability discrimination legislation

Multiple studies reported inconsistent informed consent practices, for various reasons: some reported that health professionals ‘forgot’ to or ‘did not realise consent was necessary’, 63 73 but inconsistent consent practices were also attributed to healthcare providers’ unfamiliarity with consent guidelines and poor education on this topic. Carlson et al 73 reported that only 44% of general practitioners (GPs) were aware of consent guidelines, and there was the misconception that consent was unnecessary for people with intellectual disability. Similarly, studies of psychologists 66 and nurses 63 found that many were unfamiliar with their obligations to obtain consent, despite the existence of anti-discrimination legislation. People with intellectual disability describe feeling discriminated against by health professionals, reflected in comments such as ‘I can tell, my doctor just thinks I’m stupid – I'm nothing to him’. 74 Poor consent practices by health professionals were observed in Goldsmith et al , 67 while health professionals surveyed by McCarthy 56 were unaware of their responsibility to provide accessible health information to women with intellectual disability. Improving health professional education and training was suggested by multiple studies as a way to remove this barrier. 63 65–67 69 73

Lack of training on best practices for health professions caring for people with intellectual disability

A lack of training in caring for and communicating with people with intellectual disability was also described by midwives, 72 psychologists, 66 nurses, 63 pharmacists 61 and GPs. 56 72 75 Health professionals lacked knowledge about best practice approaches to providing equitable healthcare consent processes through reasonable adjustments such as accessible health information, 56 60 66 longer appointments times, 60 72 simple English 62 67 and flexible approaches to patient needs. 63 72

Health professionals’ stereotyping and assumptions of incapacity

Underlying stereotypes contributed to some health professionals’ (including nurses, 63 GPs 56 and physiotherapists 64 ) belief that people with intellectual disability lack capacity and therefore, do not require opportunities for informed consent. 56 64 In a survey of professionals referring people with intellectual disability to a disability service, the second most common reason for not obtaining consent was ‘patient unable to understand’. 73

Proxy consent as an inappropriate alternative

People with intellectual disability are rarely the final decision-maker in their medical choices, with many health providers seeking proxy consent from carers, support workers and family members, despite its legal invalidity. In McCarthy’s study (2010), 18/23 women with intellectual disability said the decision to start contraception was made by someone else. Many GPs appeared unaware that proxy consent is invalid in the UK. 56 Similar reports came from people with intellectual disability, 55 56 60 64 69 76 health professionals (nurses, doctors, allied health, psychologists), 56 63 64 66 77 support people 64 77 and non-medical professionals, 65 73 and capacity was rarely documented. 56 62 77

Exclusion of people with intellectual disability from decision-making discussions

Studies described instances where health professionals made decisions for their patients with intellectual disability or coerced patients into a choice. 55 72 74 76 77 In Ledger et al 77 , only 62% of women with intellectual disability were involved in the discussion about contraception, and only 38% made the final decision, and others stated in Wiseman and Ferrie 74 : ‘I was not given the opportunity to explore the different options. I was told what one I should take’. Three papers outlined instances where the choices of people with intellectual disability were ignored despite possessing capacity 65 66 69 and when a procedure continued despite them withdrawing consent. 69

Theme 2—Inadequate accessible health information

Lack of accessible health information.

The lack of accessible health information was the most frequently identified subtheme (16/23 studies). Some studies reported that health professionals provided information to carers instead, 60 avoided providing easy read information due to concerns about ‘offending’ patients 75 or only provided verbal information. 56 67 Informed consent was supported when health professionals recognised the importance of providing medical information 64 and when it was provided in an accessible format. 60 Alternative approaches to health information were explored, including virtual reality 68 and in-person education sessions, 59 with varying results. Overall, the need to provide information in different formats tailored to an individual’s communication needs, rather than a ‘one size fits all’ approach, was emphasised by both people with intellectual disability 60 and health professionals. 66

Insufficient information provided

Studies described situations where insufficient information was provided to people with intellectual disability to make informed decisions. For example, some people felt the information from their GP was often too basic to be helpful (Fish et al 60 ) and wanted additional information on consent forms (Rose et al 78 ).

Theme 3—The involvement of support people

Support people (including carers, family members and group home staff) were identified in 11 articles as both enablers of and barriers to informed consent. The antagonistic nature of these findings and lower frequency of subthemes are reflected in the lower quality assessments of evidence.

Support people facilitated communication with health professionals

Some studies reported carers bridging communication barriers with health to support informed consent. 63 64 McCarthy 56 found 21/23 of women with intellectual disability preferred to see doctors with a support person due to perceived benefits: ‘Sometimes I don’t understand it, so they have to explain it to my carer, so they can explain it to me easier’. Most GPs in this study (93%) also agreed that support people aided communication.

Support people helped people with intellectual disability make decisions

By advocating for people with intellectual disability, carers encouraged decision-making, 64 74 provided health information, 74 77 emotional support 76 and assisted with reading or remembering health information. 55 58 76 Some people with intellectual disability explicitly appreciated their support person’s involvement, 60 such as in McCarthy’s 55 study where 18/23 participants felt supported and safer when a support person was involved.

Support people impeded individual autonomy

The study by Wiseman and Ferrie 74 found that while younger participants with intellectual disability felt family members empowered their decision-making, older women felt family members impaired their ability to give informed consent. This was reflected in interviews with carers who questioned the capacity of the person with intellectual disability they supported and stated they would guide them to pick the ‘best choice’ or even over-ride their choices. 64 Studies of psychologists and community service directors described instances where the decision of family or carers was prioritised over the wishes of the person with intellectual disability. 65 66 Some women with intellectual disability in McCarthy’s studies (2010, 2009) 55 56 appeared to have been coerced into using contraception by parental pressures or fear of losing group home support.

Theme 4—Systemic constraints within healthcare systems

Time restraints affect informed consent and accessible healthcare.

Resource limitations create time constraints that impair the consent process and have been identified as a barrier by psychologists, 66 GPs, 56 hospital nurses 63 and community disability workers. 73 Rogers et al 66 highlighted that a personalised approach that could improve informed decision-making is restricted by inflexible medical models. Only two studies described flexible patient-centred approaches to consent. 60 72 A survey of primary care practices in 2007 reported that most did not modify their cervical screening information for patients with intellectual disability because it was not practical. 75

Inflexible models of consent

Both people with intellectual disability 76 and health professionals 66 recognised that consent is traditionally obtained through one-off interactions prior to an intervention. Yet, for people with intellectual disability, consent should ideally be an ongoing process that begins before an appointment and continues between subsequent ones. Other studies have tended to describe one-off interactions where decision-making was not revisited at subsequent appointments. 56 60 72 76

Lack of systemic supports

In one survey, self-advocates highlighted a lack of information on medication for people with intellectual disability and suggested a telephone helpline and a centralised source of information to support consent. 60 Health professionals also want greater systemic support, such as a health professional specialised in intellectual disability care to support other staff, 72 or a pharmacist specifically to help patients with intellectual disability. 61 Studies highlighted a lack of guidelines about healthcare needs of people with intellectual disabilities such as contraceptive counselling 72 or primary care. 75

Theme 5—Person-centred informed consent

Ten studies identified factors related to a person-centred approach to informed consent, grouped below into three subthemes. Health professionals should tailor their practice when obtaining informed consent from people with intellectual disability by considering how these subthemes relate to the individual. Each subtheme was described five times in the literature with a relative frequency rating of ‘C’, contributing to overall lower quality scores.

Previous experience with decision-making

Arscott et al 71 found that the ability of people with intellectual disability to consent changed with their verbal and memory skills and in different clinical vignettes, supporting the view of ‘functional’ capacity specific to the context of the medical decision. Although previous experiences with decision-making did not influence informed consent in this paper, other studies suggest that people with intellectual disability accustomed to independent decision-making were more able to make informed medical decisions, 66 70 and those who live independently were more likely to make independent healthcare decisions. 56 Health professionals should be aware that their patients with intellectual disability will have variable experience with decision-making and provide individualised support to meet their needs.

Variable awareness about healthcare rights

Consent processes should be tailored to the health literacy of patients, including emphasising available choices and the option to refuse treatment. In some studies, medical decisions were not presented to people with intellectual disability as a choice, 64 and people with intellectual disability were not informed of their legal right to accessible health information. 56

Power differences and acquiescence

Acquiescence by people with intellectual disability due to common and repeated experiences of trauma—that is, their tendency to agree with suggestions made by carers and health professionals, often to avoid upsetting others—was identified as an ongoing barrier. In McCarthy’s (2009) interviews with women with intellectual disability, some participants implicitly rejected the idea that they might make their own healthcare decisions: ‘They’re the carers, they have responsibility for me’. Others appeared to have made decisions to appease their carers: ‘I have the jab (contraceptive injection) so I can’t be blamed for getting pregnant’. 55 Two studies highlighted that health professionals need to be mindful of power imbalances when discussing consent with people with intellectual disability to ensure the choices are truly autonomous. 61 66

Theme 6—Effective communication between health professionals and patients

Implementation of reasonable adjustments for verbal and written information.

Simple language was always preferred by people with intellectual disability. 60 67 Other communication aids used in decision-making included repetition, short sentences, models, pictures and easy read brochures. 72 Another reasonable adjustment is providing the opportunity to ask questions, which women with intellectual disability in McCarthy’s (2009) study reported did not occur. 55

Tailored communication methods including non-verbal communication

Midwives noted that continuity of care allows them to develop rapport and understand the communication preferences of people with intellectual disability. 72 This is not always possible; for emergency nurses, the lack of background information about patients with intellectual disability made it challenging to understand their communication preferences. 63 The use of non-verbal communication, such as body language, was noted as underutilised 62 66 and people with intellectual disability supported the use of hearing loops, braille and sign language. 60

To the best of our knowledge, this is the first systematic review investigating the barriers and enablers of the informed consent process for healthcare procedures for people with intellectual disability. The integrative review by Goldsmith et al 26 examined capacity assessment and shares only three articles with this systematic review. 69 71 73 Since the 2000s, there has been a paradigm shift in which capacity is no longer considered a fixed ability that only some individuals possess 38 39 but instead as ‘functional’: a flexible ability that changes over time and in different contexts, 79 reflected in Goldsmith’s review. An individual’s capacity can be supported through various measures, including how information is communicated and how the decision-making process is approached. 18 80 By recognising the barriers and enablers identified in this review, physicians can help ensure the consent process for their patients with intellectual disability is both valid and truly informed. This review has highlighted the problems of inaccessible health information, insufficient clinical education on how to make reasonable adjustments and lack of person-centred trauma-informed care.

Recommendations

Health professionals require training in the informed consent process for people with intellectual disability, particularly in effective and respectful communication, reasonable adjustments and trauma-informed care. Reasonable adjustments include offering longer or multiple appointments, using accessible resources (such as easy read information or shared decision-making tools) and allowing patient choices (such as to record a consultation or involve a support person). Co-researchers reported that many people with intellectual disability prefer to go without a support person because they find it difficult to challenge their decisions and feel ignored if the health professional only talks to the support person. People with intellectual disability also feel they cannot seek second opinions before making medical decisions or feel pressured to provide consent, raising the possibility of coercion. These experiences contribute to healthcare trauma. Co-researchers raised the importance of building rapport with the person with intellectual disability and of making reasonable adjustments, such as actively advocating for the person’s autonomy, clearly stating all options including the choice to refuse treatment, providing opportunities to contribute to discussions and multiple appointments to ask questions and understand information. They felt that without these efforts to support consent, health professionals can reinforce traumatic healthcare experiences for people with intellectual disability. Co-researchers noted instances where choices were made by doctors without discussion and where they were only given a choice after requesting one and expressed concern that these barriers are greater for those with higher support needs.

Co-researchers showed how these experiences contributed to mistrust of health professionals and poorer health outcomes. In one situation, a co-researcher was not informed of a medication’s withdrawal effects, resulting in significant side-effects when it was ceased. Many people with intellectual disability describe a poor relationship with their health professionals, finding it difficult to trust health information provided due to previous traumatic experiences of disrespect, coercion, lack of choice and inadequate support. Many feel they cannot speak up due to the power imbalance and fear of retaliation. Poor consent practices and lack of reasonable adjustments directly harm therapeutic alliances by reducing trust, contribute to healthcare trauma and lead to poorer health outcomes for people with intellectual disability.

Additional education and training for health professionals is urgently needed in the areas of informed consent, reasonable adjustments and effective communication with people with intellectual disability. The experiences of health professionals within the research team confirmed that there is limited training in providing high-quality healthcare for people with intellectual disability, including reasonable adjustments and accessible health information. Co-researchers also suggested that education should be provided to carers and support people to help them better advocate for people with intellectual disability.

Health information should be provided in a multimodal format, including written easy read information. Many countries have regulation protecting the right to accessible health information and communication support to make an informed choice, such as UK’s Accessible Information Standard, 81 and Australia’s Charter of Health Care Rights, 24 yet these are rarely observed. Steps to facilitate this include routinely asking patients about information requirements, system alerts for an individual’s needs or routinely providing reasonable adjustments. 82 Co-researchers agreed that there is a lack of accessible health information, particularly about medications, and that diagrams and illustrations are underutilised. There is a critical need for more inclusive and accessible resources to help health professionals support informed consent in a safe and high-quality health system. These resources should be created through methods of inclusive research, such as co-production, actively involving people with intellectual disability in the planning, creation, and feedback process. 53

Strengths and limitations

This systematic review involved two co-researchers with intellectual disability in sense-checking findings and co-creating the easy read summary. Two co-authors who are health professionals provided additional sense-checking of findings from a different stakeholder perspective. In future research, this could be extended by involving people with intellectual disability in the design and planning of the study as per recommendations for best-practice inclusive research. 53 83

The current literature is limited by low use of inclusive research practices in research involving people with intellectual disability, increasing vulnerability to external biases (eg, inaccessible questionnaires, involvement of carers in data collection, overcompliance or acquiescence and absence of researcher reflexivity). Advisory groups or co-research with people with intellectual disability were only used in five studies. 58 60 68 74 76 Other limitations include unclear selection criteria, low sample sizes, missing data, using gatekeepers in patient selection and predominance of UK-based studies—increasing the risk of bias and reducing transferability. Nine studies (out of 15 involving people with intellectual disability) explicitly excluded those with severe or profound intellectual disability, reflecting a selection bias; only one study specifically focused on people with intellectual disability with higher support needs. Studies were limited to a few healthcare contexts, with a focus on consent about sexual health, contraception and medications.

The heterogeneity and qualitative nature of studies made it challenging to apply traditional meta-analysis. However, to promote consistency in qualitative research, the PRISMA and ENTREQ guidelines were followed. 36 37 Although no meta-analyses occurred, the duplication of study populations in McCarthy 2009 and 2010 likely contributed to increased significance of findings reported in both studies. Most included studies (13/23) were published over 10 years ago, reducing the current relevance of this review’s findings. Nonetheless, the major findings reflect underlying systemic issues within the health system, which are unlikely to have been resolved since the articles were published, as the just-released final report of the Australian Royal Commission into Violence, Abuse, Neglect and Exploitation of People with Disability highlights. 84 There is an urgent need for more inclusive studies to explore the recommendations and preferences of people with intellectual disability about healthcare choices.

Informed consent processes for people with intellectual disability should include accessible information and reasonable adjustments, be tailored to individuals’ needs and comply with consent and disability legislation. Resources, guidelines and healthcare education are needed and should cover how to involve carers and support people, address systemic healthcare problems, promote a person-centred approach and ensure effective communication. These resources and future research must use principles of inclusive co-production—involving people with intellectual disability at all stages. Additionally, research is needed on people with higher support needs and in specific contexts where informed consent is vital but under-researched, such as cancer screening, palliative care, prenatal and newborn screening, surgical procedures, genetic medicine and advanced therapeutics such as gene-based therapies.

Ethics statements

Patient consent for publication.

Not applicable.

Ethics approval

  • Maulik PK ,
  • Mascarenhas MN ,
  • Mathers CD , et al
  • World Health Organisation
  • Council for Intellectual Disability
  • Emerson E ,
  • Shogren KA ,
  • Wehmeyer ML ,
  • Reese RM , et al
  • Cordasco KM
  • Hallock JL ,
  • Jordens CFC ,
  • McGrath C , et al
  • Brenner LH ,
  • Brenner AT ,
  • United Nations Population Fund
  • Australian Commission on Safety and Quality in Health Care
  • The Joint Commission
  • Beauchamp TL ,
  • Childress JF
  • New South Wales Attorney General
  • United Nations General Assembly
  • Strnadová I ,
  • Loblinzk J ,
  • Scully JL , et al
  • MacPhail C ,
  • McKay K , et al
  • Keywood K ,
  • Fovargue S ,
  • Goldsmith L ,
  • Skirton H ,
  • Cash J , et al
  • Morris CD ,
  • Niederbuhl JM ,
  • Arscott K ,
  • Fisher CB ,
  • Davidson PW , et al
  • Giampieri M
  • Shamseer L ,
  • Clarke M , et al
  • McKenzie JE ,
  • Bossuyt PM , et al
  • Flemming K ,
  • McInnes E , et al
  • Appelbaum PS
  • ↵ Covidence systematic review software . Melbourne, Australia ,
  • Proudfoot K
  • Papadopoulos I ,
  • Koulouglioti C ,
  • Lazzarino R , et al
  • Onwuegbuzie AJ
  • BMJ Best Practice
  • Guyatt GH ,
  • Vist GE , et al
  • Garcia-Lee B
  • Brimblecombe J , et al
  • Benson BA ,
  • Farmer CA , et al
  • Ferguson L ,
  • Graham YNH ,
  • Gerrard D ,
  • Laight S , et al
  • Huneke NTM ,
  • Halder N , et al
  • Ferguson M ,
  • Jarrett D ,
  • McGuire BE , et al
  • Woodward V ,
  • Jackson L , et al
  • Conboy-Hill S ,
  • Leafman J ,
  • Nehrenz GM , et al
  • Höglund B ,
  • Carlson T ,
  • English S , et al
  • Wiseman P ,
  • Walmsley J ,
  • Tilley E , et al
  • Khatkar HS , et al
  • Holland AJ , et al
  • Beauchamp TL
  • England National Health Service
  • National Health Service England
  • Royal Commission into Violence, Abuse, Neglect and Exploitation of People with Disability

Supplementary materials

Supplementary data.

This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.

  • Data supplement 1
  • Data supplement 2
  • Data supplement 3
  • Data supplement 4
  • Data supplement 5

Contributors MD, EEP and IS conceived the idea for the systematic review. MD drafted the search strategy which was refined by EEP and IS. MD and EEP completed article screening. MD and IS completed quality assessments of included articles. MD and JH completed data extraction. MD drafted the original manuscript. JL and SS were co-researchers who sense-checked findings and were consulted to formulate dissemination plans. JL and SS co-produced the easy read summary with MD, CM, JH, EEP and IS. MD, JLS, EEP and IS reviewed manuscript wording. All authors critically reviewed the manuscript and approved it for publication. The corresponding author attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted. MD is the guarantor responsible for the overall content of this manuscript.

Funding This systematic literature review was funded by the National Health & Medical Research Council (NHMRC), Targeted Call for Research (TCR) into Improving health of people with intellectual disability. Research grant title "GeneEQUAL: equitable and accessible genomic healthcare for people with intellectual disability". NHMRC application ID: 2022/GNT2015753.

Competing interests None declared.

Provenance and peer review Not commissioned; externally peer reviewed.

Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.

Linked Articles

  • Editorial It is up to healthcare professionals to talk to us in a way that we can understand: informed consent processes in people with an intellectual disability Jonathon Ding Richard Keagan-Bull Irene Tuffrey-Wijne BMJ Quality & Safety 2024; 33 277-279 Published Online First: 30 Jan 2024. doi: 10.1136/bmjqs-2023-016830

Read the full text or download the PDF:

Log in using your username and password

  • Search More Search for this keyword Advanced search
  • Latest content
  • Current issue
  • Hosted content
  • BMJ Journals More You are viewing from: Google Indexer

You are here

  • Online First
  • Inappropriate use of proton pump inhibitors in clinical practice globally: a systematic review and meta-analysis
  • Article Text
  • Article info
  • Citation Tools
  • Rapid Responses
  • Article metrics

Download PDF

  • http://orcid.org/0000-0002-5111-7861 Amit K Dutta 1 ,
  • http://orcid.org/0000-0003-2472-3409 Vishal Sharma 2 ,
  • Abhinav Jain 3 ,
  • Anshuman Elhence 4 ,
  • Manas K Panigrahi 5 ,
  • Srikant Mohta 6 ,
  • Richard Kirubakaran 7 ,
  • Mathew Philip 8 ,
  • http://orcid.org/0000-0003-1700-7543 Mahesh Goenka 9 ,
  • Shobna Bhatia 10 ,
  • http://orcid.org/0000-0002-9435-3557 Usha Dutta 2 ,
  • D Nageshwar Reddy 11 ,
  • Rakesh Kochhar 12 ,
  • http://orcid.org/0000-0002-1305-189X Govind K Makharia 4
  • 1 Gastroenterology , Christian Medical College and Hospital Vellore , Vellore , India
  • 2 Gastroenterology , Post Graduate Institute of Medical Education and Research , Chandigarh , India
  • 3 Gastroenterology , Gastro 1 Hospital , Ahmedabad , India
  • 4 Gastroenterology and Human Nutrition , All India Institute of Medical Sciences , New Delhi , India
  • 5 Gastroenterology , All India Institute of Medical Sciences - Bhubaneswar , Bhubaneswar , India
  • 6 Department of Gastroenterology , Narayana Superspeciality Hospital , Kolkata , India
  • 7 Center of Biostatistics and Evidence Based Medicine , Vellore , India
  • 8 Lisie Hospital , Cochin , India
  • 9 Apollo Gleneagles Hospital , Kolkata , India
  • 10 Gastroenterology , National Institute of Medical Science , Jaipur , India
  • 11 Asian Institute of Gastroenterology , Hyderabad , India
  • 12 Gastroenterology , Paras Hospitals, Panchkula , Chandigarh , India
  • Correspondence to Dr Amit K Dutta, Gastroenterology, Christian Medical College and Hospital Vellore, Vellore, Tamil Nadu, India; akdutta1995{at}gmail.com

https://doi.org/10.1136/gutjnl-2024-332154

Statistics from Altmetric.com

Request permissions.

If you wish to reuse any or all of this article please use the link below which will take you to the Copyright Clearance Center’s RightsLink service. You will be able to get a quick price and instant permission to reuse the content in many different ways.

  • PROTON PUMP INHIBITION
  • META-ANALYSIS

We read with interest the population-based cohort studies by Abrahami et al on proton pump inhibitors (PPI) and the risk of gastric and colon cancers. 1 2 PPI are used at all levels of healthcare and across different subspecialties for various indications. 3 4 A recent systematic review on the global trends and practices of PPI recognised 28 million PPI users from 23 countries, suggesting that 23.4% of the adults were using PPI. 5 Inappropriate use of PPI appears to be frequent, although there is a lack of compiled information on the prevalence of inappropriate overuse of PPI. Hence, we conducted a systematic review and meta-analysis on the inappropriate overuse of PPI globally.

Supplemental material

Overall, 79 studies, including 20 050 patients, reported on the inappropriate overuse of PPI and were included in this meta-analysis. The pooled proportion of inappropriate overuse of PPI was 0.60 (95% CI 0.55 to 0.65, I 2 97%, figure 1 ). The proportion of inappropriate overuse by dose was 0.17 (0.08 to 0.33) and by duration of use was 0.17 (0.07 to 0.35). Subgroup analysis was done to assess for heterogeneity ( figure 2A ). No significant differences in the pooled proportion of inappropriate overuse were noted based on the study design, setting (inpatient or outpatient), data source, human development index of the country, indication for use, sample size estimation, year of publication and study quality. However, regional differences were noted (p<0.01): Australia—40%, North America—56%, Europe—61%, Asia—62% and Africa—91% ( figure 2B ). The quality of studies was good in 27.8%, fair in 62.03% and low in 10.12%. 6

  • Download figure
  • Open in new tab
  • Download powerpoint

Forest plot showing inappropriate overuse of proton pump inhibitors.

(A) Subgroup analysis of inappropriate overuse of proton pump inhibitors (PPI). (B) Prevalence of inappropriate overuse of PPI across different countries of the world. NA, data not available.

This is the first systematic review and meta-analysis on global prescribing inappropriateness of PPI. The results of this meta-analysis are concerning and suggest that about 60% of PPI prescriptions in clinical practice do not have a valid indication. The overuse of PPI appears to be a global problem and across all age groups including geriatric subjects (63%). Overprescription increases the patient’s cost, pill burden and risk of adverse effects. 7–9 The heterogeneity in the outcome data persisted after subgroup analysis. Hence, this may be inherent to the practice of PPI use rather than related to factors such as study design, setting or study quality.

Several factors (both physician and patient-related) may contribute to the high magnitude of PPI overuse. These include a long list of indications for use, availability of the drug ‘over the counter’, an exaggerated sense of safety, and lack of awareness about the correct indications, dose and duration of therapy. A recently published guideline makes detailed recommendations on the accepted indications for the use of PPI, including the dose and duration, and further such documents may help to promote its rational use. 3 Overall, there is a need for urgent adoption of PPI stewardship practices, as is done for antibiotics. Apart from avoiding prescription when there is no indication, effective deprescription strategies are also required. 10 We hope the result of the present systematic review and meta-analysis will create awareness about the current situation and translate into a change in clinical practice globally.

Ethics statements

Patient consent for publication.

Not applicable.

Ethics approval

  • Abrahami D ,
  • McDonald EG ,
  • Schnitzer ME , et al
  • Jearth V , et al
  • Malfertheiner P ,
  • Megraud F ,
  • Rokkas T , et al
  • Shanika LGT ,
  • Reynolds A ,
  • Pattison S , et al
  • O’Connell D , et al
  • Choudhury A ,
  • Gillis KA ,
  • Lees JS , et al
  • Paynter S , et al
  • Targownik LE ,
  • Fisher DA ,

Supplementary materials

Supplementary data.

This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.

  • Data supplement 1

X @drvishal82

Contributors AKD: concept, study design, data acquisition and interpretation, drafting the manuscript and approval of the manuscript. VS: study design, data acquisition, analysis and interpretation, drafting the manuscript and approval of the manuscript. AJ, AE, MKP, SM: data acquisition and interpretation, critical revision of the manuscript, and approval of the manuscript. RK: study design, data analysis and interpretation, critical revision of the manuscript and approval of the manuscript. MP, MG, SB, UD, DNR, RK: data interpretation, critical revision of the manuscript and approval of the manuscript. GKM: concept, study design, data interpretation, drafting the manuscript, critical revision and approval of the manuscript.

Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

Competing interests None declared.

Provenance and peer review Not commissioned; internally peer reviewed.

Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.

Read the full text or download the PDF:

This paper is in the following e-collection/theme issue:

Published on 29.4.2024 in Vol 26 (2024)

The Applications of Artificial Intelligence for Assessing Fall Risk: Systematic Review

Authors of this article:

Author Orcid Image

  • Ana González-Castro 1 , PT, MSc   ; 
  • Raquel Leirós-Rodríguez 2 , PT, PhD   ; 
  • Camino Prada-García 3 , MD, PhD   ; 
  • José Alberto Benítez-Andrades 4 , PhD  

1 Nursing and Physical Therapy Department, Universidad de León, Ponferrada, Spain

2 SALBIS Research Group, Nursing and Physical Therapy Department, Universidad de León, Ponferrada, Spain

3 Department of Preventive Medicine and Public Health, Universidad de Valladolid, Valladolid, Spain

4 SALBIS Research Group, Department of Electric, Systems and Automatics Engineering, Universidad de León, León, Spain

Corresponding Author:

Ana González-Castro, PT, MSc

Nursing and Physical Therapy Department

Universidad de León

Astorga Ave

Ponferrada, 24401

Phone: 34 987442000

Email: [email protected]

Background: Falls and their consequences are a serious public health problem worldwide. Each year, 37.3 million falls requiring medical attention occur. Therefore, the analysis of fall risk is of great importance for prevention. Artificial intelligence (AI) represents an innovative tool for creating predictive statistical models of fall risk through data analysis.

Objective: The aim of this review was to analyze the available evidence on the applications of AI in the analysis of data related to postural control and fall risk.

Methods: A literature search was conducted in 6 databases with the following inclusion criteria: the articles had to be published within the last 5 years (from 2018 to 2024), they had to apply some method of AI, AI analyses had to be applied to data from samples consisting of humans, and the analyzed sample had to consist of individuals with independent walking with or without the assistance of external orthopedic devices.

Results: We obtained a total of 3858 articles, of which 22 were finally selected. Data extraction for subsequent analysis varied in the different studies: 82% (18/22) of them extracted data through tests or functional assessments, and the remaining 18% (4/22) of them extracted through existing medical records. Different AI techniques were used throughout the articles. All the research included in the review obtained accuracy values of >70% in the predictive models obtained through AI.

Conclusions: The use of AI proves to be a valuable tool for creating predictive models of fall risk. The use of this tool could have a significant socioeconomic impact as it enables the development of low-cost predictive models with a high level of accuracy.

Trial Registration: PROSPERO CRD42023443277; https://tinyurl.com/4sb72ssv

Introduction

According to alarming figures reported by the World Health Organization in 2021, falls cause 37.3 million injuries annually that require medical attention and result in 684,000 deaths [ 1 ]. These figures indicate a significant impact of falls on the health care system and on society, both directly and indirectly [ 2 , 3 ].

Life expectancy has progressively increased over the years, leading to an aging population [ 4 ]. By 2050, it is estimated that 16% of the population will be >65 years of age. In this group, the incidence of falls has steadily risen, becoming the leading cause of accidental injury and death (accounting for 55.8% of such deaths, according to some research) [ 5 , 6 ]. It is estimated that 30% of this population falls at least once a year, negatively impacting their physical and psychological well-being [ 7 , 8 ].

Physically, falls are often associated with severe complications that can lead to extended hospitalizations [ 9 ]. These hospitalizations are usually due to serious injuries, often cranioencephalic trauma, fractures, or soft tissue injuries [ 10 , 11 ]. Psychologically, falls among the older adult population tend to result in self-imposed limitations due to the fear of falling again [ 10 , 12 ]. These limitations lead to social isolation as individuals avoid participating in activities or even individual mobility [ 13 ]. Consequently, falls can lead to psychological conditions such as anxiety and depression [ 14 , 15 ]. Numerous research studies on the risk of falls are currently underway, with ongoing investigations into various innovations and intervention ideas [ 16 - 19 ]. These studies encompass the identification of fall risk factors [ 20 , 21 ], strategies for prevention [ 22 , 23 ], and the outcomes following rehabilitation [ 23 , 24 ].

In the health care field, artificial intelligence (AI) is characterized by data management and processing, offering new possibilities to the health care paradigm [ 24 ]. Some applications of AI in the health care domain include assessing tumor interaction processes [ 25 ], serving as a tool for image-based diagnostics [ 26 , 27 ], participating in virus detection [ 28 ], and, most importantly, as a statistical and predictive method [ 29 - 32 ].

Several publications have combined AI techniques to address health care issues [ 33 - 35 ]. Within the field of predictive models, it is important to understand certain differentiations. In AI, we have machine learning and deep learning [ 36 - 38 ]. Machine learning encompasses a set of techniques applied to data and can be done in a supervised or unsupervised manner [ 39 , 40 ]. On the other hand, deep learning is typically used to work with larger data sets compared to machine learning, and its computational cost is higher [ 41 , 42 ].

Some examples of AI techniques include the gradient boosting machine [ 43 ], learning method, and the long short-term memory (LSTM) [ 44 ] and the convolutional neural network (CNN) [ 45 ], all of them are deep learning methods.

For all the reasons mentioned in the preceding section, it was considered necessary to conduct a systematic review to analyze the scientific evidence of AI applications in the analysis of data related to postural control and the risk of falls.

Data Sources and Searches

This systematic review and meta-analysis were prospectively registered on PROSPERO (ID CRD42023443277) and followed the Meta-Analyses of Observational Studies in Epidemiology checklist [ 46 ] and the recommendations of the Cochrane Collaboration [ 47 ].

The search was conducted in January 2024 on the following databases: PubMed, Scopus, ScienceDirect, Web of Science, CINAHL, and Cochrane Library. The Medical Subject Headings (MeSH) terms used for the search included machine learning , artificial intelligent , accidental falls , rehabilitation , and physical therapy specialty . The terms “predictive model” and “algorithms” were also used. These terms were combined using the Boolean operators AND and OR ( Textbox 1 ).

  • (“machine learning”[MeSH Terms] OR “artificial intelligent”[MeSH Terms]) AND “accidental falls”[MeSH Terms]
  • (“machine learning”[MeSH Terms] OR “artificial intelligent”) AND (“rehabilitation”[MeSH Terms] OR “physical therapy specialty”[MeSH Terms])
  • “accidental falls” [Title/Abstract] AND “algorithms” [Title/Abstract]
  • “accidental falls”[Title/Abstract] AND “predictive model” [Title/Abstract]
  • TITLE-ABS-KEY (“machine learning” OR “artificial intelligent”) AND TITLE-ABS-KEY (“accidental falls”)
  • TITLE-ABS-KEY (“machine learning” OR “artificial intelligent”) AND TITLE-ABS-KEY (“rehabilitation” OR “physical therapy specialty”)
  • TITLE-ABS-KEY (“accidental falls” AND “algorithms”)
  • TITLE-ABS-KEY (“accidental falls” AND “predictive model”)

ScienceDirect

  • Title, abstract, keywords: (“machine learning” OR “artificial intelligent”) AND “accidental falls”
  • Title, abstract, keywords: (“machine learning” OR “artificial intelligent”) AND (“rehabilitation” OR “physical therapy specialty”)
  • Title, abstract, keywords: (“accidental falls” AND “algorithms”)
  • Title, abstract, keywords: (“accidental falls” AND “predictive model”)

Web of Science

  • TS=(“machine learning” OR “artificial intelligent”) AND TS=“accidental falls”
  • TS=(“machine learning” OR “artificial intelligent”) AND TS= (“rehabilitation” OR “physical therapy specialty”)
  • AB= (“accidental falls” AND “algorithms”)
  • AB= (“accidental falls” AND “predictive model”)
  • (MH “machine learning” OR MH “artificial intelligent”) AND MH “accidental falls”
  • (MH “machine learning” OR MH “artificial intelligent”) AND (MH “rehabilitation” OR MH “physical therapy specialty”)
  • (AB “accidental falls”) AND (AB “algorithms”)
  • (AB “accidental falls”) AND (AB “predictive model”)

Cochrane Library

  • (“machine learning” OR “artificial intelligent”) in Title Abstract Keyword AND “accidental falls” in Title Abstract Keyword
  • (“machine learning” OR “artificial intelligent”) in Title Abstract Keyword AND (“rehabilitation” OR “physical therapy specialty”) in Title Abstract Keyword
  • “accidental falls” in Title Abstract Keyword AND “algorithms” in Title Abstract Keyword
  • “accidental falls” in Title Abstract Keyword AND “predictive model” in Title Abstract Keyword

Study Selection

After removing duplicates, 2 reviewers (AGC and RLR) independently screened articles for eligibility. In the case of disagreement, a third reviewer (JABA) finally decided whether the study should be included or not. We calculated the κ coefficient and percentage agreement scores to assess reliability before any consensus and estimated the interrater reliability using κ. Interrater reliability was estimated using κ>0.7 indicating a high level of agreement between the reviewers, κ of 0.5 to 0.7 indicating a moderate level of agreement, and κ<0.5 indicating a low level of agreement [ 48 ].

For the selection of results, the inclusion criteria were established as follows: (1) articles should have been published in the last 5 years (from 2018 to the present); (2) they must apply some AI method; (3) AI analyses should be applied to data from samples of humans; and (4) the sample analyzed should consist of people with independent walking, with or without the use of external orthopedic devices.

After screening the data, extracting, obtaining, and screening the titles and abstracts for inclusion criteria, the selected abstracts were obtained in full texts. Titles and abstracts lacking sufficient information regarding inclusion criteria were also obtained as full texts. Full-text articles were selected in case of compliance with inclusion criteria by the 2 reviewers using a data extraction form.

Data Extraction and Quality Assessment

The 2 reviewers mentioned independently extracting data from the included studies using a customized data extraction table in Excel (Microsoft Corporation). In case of disagreement, both reviewers debated until an agreement was reached.

The data extracted from the included articles for further analysis were: demographic information (title, authors, journal, and year), characteristics of the sample (age, inclusion and exclusion criteria, and number of participants), study-specific parameters (study type, AI techniques applied, and data analyzed), and the results obtained. Tables were used to describe both the studies’ characteristics and the extracted data.

Assessment of Risk of Bias

The methodological quality of the selected articles was evaluated using the Critical Review Form for Quantitative Studies [ 49 ]. The ROBINS-E (Risk of Bias in Nonrandomized Studies of Exposures) tool was used to evaluate the risk of bias [ 50 ].

Characteristics of the Selected Studies

A total of 3858 articles were initially retrieved, with 1563 duplicates removed. From the remaining 2295 articles, 2271 were excluded based on the initial selection criteria, leaving 24 articles for the subsequent analysis. In this second analysis, 2 articles were removed as they were systematic reviews, and 22 articles were finally selected [ 51 - 72 ] ( Figure 1 ). After the first reading of all candidate full texts, the kappa score for inclusion of the results of reviewers 1 and 2 was 0.98, indicating a very high level of agreement.

The methodological quality of the 22 analyzed studies (Table S1 in Multimedia Appendix 1 [ 51 , 52 , 54 , 56 , 58 , 59 , 61 , 63 , 64 , 69 , 70 , 72 ]) ranged from 11 points in 2 (9.1%) studies [ 52 , 65 ] to 16 points in 7 (32%) studies [ 53 , 54 , 56 , 63 , 69 - 71 ].

literature review data extraction

Study Characteristics and Risk of Bias

All the selected articles were cross-sectional observational studies ( Table 1 ).

In total, 34 characteristics affecting the risk of falls were extracted and classified into high fall-risk and low fall-risk groups with the largest sample sizes significantly differing from the rest. Studies based on data collected from various health care systems had larger sample sizes, ranging from 22,515 to 265,225 participants [ 60 , 65 , 67 ]. In contrast, studies that applied some form of evaluation test had sample sizes ranging from 8 participants [ 56 ] to 746 participants [ 55 ].

It is worth noting the various studies conducted by Dubois et al [ 54 , 72 ], whose publications on fall risk and machine learning started in 2018 and progressed until 2021. A total of 9.1% (2/22) of the articles by this author were included in the final selection [ 54 , 72 ]. Both articles used samples with the same characteristics, even though the first one was composed of 43 participants [ 54 ] and the last one had 30 participants [ 72 ]. All 86.4% (19/22) of the articles used samples of individuals aged ≥65 years [ 51 - 60 , 62 - 65 , 68 - 72 ]. In the remaining 13.6% (3/22) of the articles, the ages ranged between 16 and 62 years [ 61 , 66 , 67 ].

Althobaiti et al [ 61 ] used a sample of participants between the ages of 19 and 35 years for their research, where these participants had to reproduce examples of falls for subsequent analysis. In 2022, Ladios-Martin et al [ 67 ] extracted medical data from participants aged >16 years for their research. Finally, in 2023, the study by Maray et al [ 66 ] used 3 types of samples, with ages ranging from 21 to 62 years. Among the 22 selected articles, only 1 (4.5%) of them did not describe the characteristics of its sample [ 52 ].

Finally, regarding the sex of the samples, 13.6% (3/22) of the articles specified in the characteristics of their samples that only female individuals were included among their participants [ 53 , 59 , 70 ].

a AI: artificial intelligence.

b ML: machine learning.

c nd: none described.

d ADL: activities of daily living.

e TUG: Timed Up and Go.

f BBS: Berg Balance Scale.

g ASM: associative skill memories.

h CNN: convolutional neural network.

i FP: fall prevention.

j IMU: inertial measurement unit.

k AUROC: area under the receiver operating characteristic curve.

l AUPR: area under the precision-recall curve.

m MFS: Morse Fall Scale.

n XGB: extreme gradient boosting.

o MCT: motor control test.

p GBM: gradient boosting machine.

q RF: random forest.

r LOOCV: leave-one-out cross-validation.

s LSTM: long short-term memory.

Applied Assessment Procedures

All articles initially analyzed the characteristics of their samples to subsequently create a predictive model of the risk of falls. However, they did not all follow the same evaluation process.

Regarding the applied assessment procedures, 3 main options stood out: studies with tests or assessments accompanied by sensors or accelerometers [ 51 - 57 , 59 , 61 - 63 , 66 , 70 - 72 ], studies with tests or assessments accompanied by cameras [ 68 , 69 ], or studies based on medical records [ 58 , 60 , 65 , 67 ] ( Figure 2 ). Guillan et al [ 64 ] performed a physical and functional evaluation of the participants. In their study, they evaluated parameters such as walking speed, stride frequency and length, and the minimum space between the toes. Afterward, they asked them to record the fall events they had during the past 2 years in a personal diary.

literature review data extraction

In total, 22.7% (5/22) of the studies used the Timed Up and Go test [ 53 , 54 , 69 , 71 , 72 ]. In 18.2% (4/22) of them, the participants performed the test while wearing a sensor to collect data [ 53 , 54 , 71 , 72 ]. In 1 (4.5%) study, the test was recorded with a camera for later analysis [ 69 ]. Another commonly used method in studies was to ask participants to perform everyday tasks or activities of daily living while a sensor collected data for subsequent analysis. Specifically, 18.2% (4/22) of the studies used this method to gather data [ 51 , 56 , 61 , 62 ].

A total of 22.7% (5/22) of the studies asked participants to simulate falls and nonfalls while a sensor collected data [ 52 , 61 - 63 , 66 ]. In this way, the data obtained were used to create the predictive model of falls. As for the tests used, Eichler et al [ 68 ] asked participants to perform the Berg Balance Scale while a camera recorded their performance.

Finally, other authors created their own battery of tests for data extraction [ 55 , 59 , 64 , 70 ]. Gillain et al [ 64 ] used gait records to analyze speed, stride length, frequency, symmetry, regularity, and foot separation. Hu et al [ 59 ] asked their participants to perform normal walking, the postural reflexive response test, and the motor control test. In the study by Noh et al [ 55 ], gait tests were conducted, involving walking 20 m at different speeds. Finally, Greene et al [ 70 ] created a 12-question questionnaire and asked their participants to maintain balance while holding a mobile phone in their hand.

AI Techniques

The selected articles used various techniques within AI. They all had the same objective in applying these techniques, which was to achieve a predictive and classification model for the risk of falls [ 51 - 72 ].

In chronological order, in 2018, Nait Aicha et al [ 51 ] compared single-task learning models with multitask learning, obtaining better evaluation results through multitask learning. In the same year, Dubois et al [ 54 ] applied AI techniques that analyzed multiple parameters to classify the risk of falls in their sample. Qiu et al [ 53 ], also in the same year, used 6 machine learning models (logistic regression, naïve Bayes, decision tree, RF, boosted tree, and support vector machine) in their research.

In contrast, in 2019, Ferrete et al [ 52 ] compared the applicability of 2 different deep learning models: the classifier based on associative skill memories and a CNN classifier. In the same year, after confirming the applicability of AI as a predictive method for the risk of falls, various authors investigated through methods such as the RF to identify factors that can predict and quantify the risk of falls [ 63 , 65 ].

Among the selected articles, 5 (22.7%) were published in 2020 [ 58 - 62 ]. The research conducted by Tunca et al [ 62 ] compared the applicability of deep learning LSTM networks with traditional machine learning applied to the risk of falls. Hu et al [ 59 ] first used cross-validation, where algorithms were trained randomly, and then used the gradient boosting machine algorithm to classify participants as high or low risk. Ye et al [ 60 ] and Hsu et al [ 58 ] both used the extreme gradient boosting (XGBoost) algorithm based on machine learning to create their predictive model. In the same year, Althobaiti et al [ 61 ] trained machine learning models for their research.

In 2021, Lockhart et al [ 57 ] started using 3 deep learning techniques simultaneously with the same goal as before: to create a predictive model for the risk of falls. Specifically, they used the RF, RF with feature engineering, and RF with feature engineering and linear and nonlinear variables. Noh et al [ 55 ], again in the same year, used the XGBoost algorithm, while Roshdibenam et al [ 71 ], on the other hand, used the CNN algorithm for each location of the wearable sensors used in their research. Various machine learning techniques were used for classifying the risk of falls and for balance loss events in the research by Hauth et al [ 56 ]. Dubois et al [ 72 ] used the following algorithms: decision tree, adaptive boosting, neural net, naïve Bayes, k-nearest neighbors, linear support vector machine, radial basis function support vector machine, RF, and quadratic discriminant analysis. Hauth et al [ 56 ], on the other hand, used regularized logistic regression and bidirectional LSTM networks. In the research conducted by Greene et al [ 70 ], AI was used, but the specific procedure that they followed is not described.

Tang et al [ 69 ] published their research with innovation up to that point. In their study, they used a smart gait analyzer with the help of deep learning techniques to assess the diagnostic accuracy of fall risk through vision. Months later, in August 2022, Ladios-Martin et al [ 67 ] published their research, in which they compared 2 deep learning models to achieve the best results in terms of specificity and sensitivity in detecting fall risk. The first model used the Bayesian Point Machine algorithm with a fall prevention variable, and the second one did not use the variable. They obtained better results when using that variable, a mitigating factor defined as a set of care interventions carried out by professionals to prevent the patient from experiencing a fall during hospitalization. Particularly controversial, as its exclusion could obscure the model’s performance. Eichler et al [ 68 ], on the other hand, used machine learning–based classifier training and later tested the performance of RFs in score predictions.

Finally, in January 2023, Maray et al [ 66 ] published their research, linking the previously mentioned terms (AI and fall risk) with 3 wearable devices that are commonly used today. They collected data through these devices and applied transfer learning to generalize the model across heterogeneous devices.

The results of the 22 articles provided promising data, and all of them agreed on the feasibility of applying various AI techniques as a method for predicting and classifying the risk of falls. Specifically, the accuracy values obtained in the studies exceed 70%. Noh et al [ 55 ] achieved the “lowest” accuracy among the studies conducted, with a 70% accuracy rate. Ribeiro et al [ 52 ] obtained an accuracy of 92.7% when using CNN to differentiate between normal gait and fall events. Hsu et al [ 58 ] further demonstrated that the XGBoost model is more sensitive than the Morse Fall Scale. Similarly, in their comparative study, Nait Aicha et al [ 51 ] also showed that a predictive model created from accelerometer data with AI is comparable to conventional models for assessing the risk of falls. More specifically, Dubois et al [ 54 ] concluded that using 1 gait-related parameter (excluding velocity) in combination with another parameter related to seated position allowed for the correct classification of individuals according to their risk of falls.

Principal Findings

The aim of this research was to analyze the scientific evidence regarding the applications of AI in the analysis of data related to postural control and the risk of falls. On the basis of the analysis of results, it can be asserted that the following risk factors were identified in the analyzed studies: age [ 65 ], daily habits [ 65 ], clinical diagnoses [ 65 ], environmental and hygiene factors [ 65 ], sex [ 64 ], stride length [ 55 , 72 ], gait speed [ 55 ], and posture [ 55 ]. This aligns with other research that also identifies sex [ 73 , 74 ], age [ 73 ], and gait speed [ 75 ].

On the other hand, the “fear of falling” has been identified in various studies as a risk factor and a predictor of falls [ 73 , 76 ], but it was not identified in any of the studies included in this review.

As for the characteristics of the analyzed samples, only 9.1% (2/22) of the articles used a sample composed exclusively of women [ 53 , 59 ], and no article used a sample composed exclusively of men. This fact is incongruent with reality, as women have a longer life expectancy than men, and therefore, the number of women aged >65 years is greater than the number of men of the same age [ 77 ]. Furthermore, women experience more falls than men [ 78 ]. The connection between menopause and its consequences, including osteopenia, suggests a higher risk of falls among older women than among men of the same age [ 79 , 80 ].

Within the realm of analysis tools, the most frequently used devices to analyze participants were accelerometers [ 51 - 57 , 59 , 61 - 63 , 66 , 70 - 72 ]. However, only 36.4% (8/22) of the studies provided all the information regarding the characteristics of these devices [ 51 , 53 , 59 , 61 , 63 , 66 , 70 , 72 ]. On the other hand, 18.2% (4/22) of the studies used the term “inertial measurement unit” as the sole description of the devices used [ 55 - 57 , 71 ].

The fact that most of the analyzed procedures involved the use of inertial sensors reflects the current widespread use of these devices for postural control analysis. These sensors, in general (and triaxial accelerometers in particular), have demonstrated great diagnostic capacity for balance [ 81 ]. In addition, they exhibit good sensitivity and reliability, combined with their portability and low economic cost [ 82 ]. Another advantage of triaxial accelerometers is their versatility in both adult and pediatric populations [ 83 - 86 ], although the studies included in this review did not address the pediatric population.

The remaining studies extracted data from cameras [ 68 , 69 ], medical records [ 58 , 60 , 65 , 67 ], and other functional and clinical tests [ 59 , 64 , 70 ]. Regarding the AI techniques used, out of the 18.2% (4/22) of articles that used deep learning techniques [ 52 , 57 , 62 , 71 ], only 4.5% (1/22) did not provide a description of the sample characteristics [ 52 ]. In this case, the authors focused on the AI landscape, while the rest of the articles strike a balance between AI and health sciences.

Regarding the validity of the generated models, only 40.9% (9/22) of the articles assessed this characteristic [ 52 , 53 , 55 , 61 - 64 , 68 , 69 ]. The authors of these 9 (N=22, 40.9%) articles evaluated the validity of the models through accuracy. All the results obtained reflected accuracies exceeding 70%, with Ribeiro et al [ 52 ] achieving a notable accuracy of 92.7% and 100%. Specifically, they obtained a 92.7% accuracy through the CNN model for distinguishing normal gait, the prefall condition, and the falling situation, considering the step before the fall, and 100% when not considering it [ 52 ].

The positive results of sensitivity and specificity can only be compared between the studies of Qiu et al [ 53 ] and Gillain et al [ 64 ], as they were the only ones to take them into account, and in both investigations, they were very high. Similarly, in the case of the F 1 -score, only Althobaiti et al [ 61 ] examined this validity measure. This measure is the result of combining precision and recall into a single figure, and the outcome obtained by these researchers was promising.

Despite these differences, the 22 studies obtained promising results in the health care field [ 51 - 72 ]. Specifically, their outcomes highlight the potential of AI integration into clinical settings. However, further research is necessary to explore how health care professionals can effectively use these predictive models. Consequently, future research should focus on studying the application and integration of the already-developed models. In this context, fall prevention plans could be implemented for the target populations identified by the predictive models. This approach would allow for a retrospective analysis to determine if the combination of predictive models with prevention programs effectively reduces the prevalence of falls in the population.

Limitations

Regarding limitations, the articles showed significant variation in the sample sizes selected. Moreover, even in the study with the largest sample size (with 265,225 participants [ 60 ]), the amount of data analyzed was relatively small. In addition, several of the databases used were not generated specifically for the published research but rather derived from existing medical records [ 58 , 60 , 65 , 67 ]. This could explain the significant variability in the variables analyzed across different studies.

Despite the limitations, this research has strengths, such as being the first systematic review on the use of AI as a tool to analyze postural control and the risk of falls. Furthermore, a total of 6 databases were used for the literature search, and a comprehensive article selection process was carried out by 3 researchers. Finally, only cross-sectional observational studies were selected, and they shared the same objective.

Conclusions

The use of AI in the analysis of data related to postural control and the risk of falls proves to be a valuable tool for creating predictive models of fall risk. It has been identified that most AI studies analyze accelerometer data from sensors, with triaxial accelerometers being the most frequently used.

For future research, it would be beneficial to provide more detailed descriptions of the measurement procedures and the AI techniques used. In addition, exploring larger databases could lead to the development of more robust models.

Conflicts of Interest

None declared.

Quality scores of reviewed studies (Critical Review Form for Quantitative Studies tool results).

PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) checklist.

  • Step safely: strategies for preventing and managing falls across the life-course. World Health Organization. 2021. URL: https://www.who.int/publications/i/item/978924002191-4 [accessed 2024-04-02]
  • Keall MD, Pierse N, Howden-Chapman P, Guria J, Cunningham CW, Baker MG. Cost-benefit analysis of fall injuries prevented by a programme of home modifications: a cluster randomised controlled trial. Inj Prev. Feb 2017;23(1):22-26. [ CrossRef ] [ Medline ]
  • Almada M, Brochado P, Portela D, Midão L, Costa E. Prevalence of falls and associated factors among community-dwelling older adults: a cross-sectional study. J Frailty Aging. 2021;10(1):10-16. [ CrossRef ] [ Medline ]
  • Menéndez-González L, Izaguirre-Riesgo A, Tranche-Iparraguirre S, Montero-Rodríguez Á, Orts-Cortés MI. [Prevalence and associated factors of frailty in adults over 70 years in the community]. Aten Primaria. Dec 2021;53(10):102128. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Guirguis-Blake JM, Michael YL, Perdue LA, Coppola EL, Beil TL. Interventions to prevent falls in older adults: updated evidence report and systematic review for the US preventive services task force. JAMA. Apr 24, 2018;319(16):1705-1716. [ CrossRef ] [ Medline ]
  • Pereira CB, Kanashiro AM. Falls in older adults: a practical approach. Arq Neuropsiquiatr. May 2022;80(5 Suppl 1):313-323. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Byun M, Kim J, Kim M. Physical and psychological factors affecting falls in older patients with arthritis. Int J Environ Res Public Health. Feb 09, 2020;17(3):1098. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Goh HT, Nadarajah M, Hamzah NB, Varadan P, Tan MP. Falls and fear of falling after stroke: a case-control study. PM R. Dec 04, 2016;8(12):1173-1180. [ CrossRef ] [ Medline ]
  • Alanazi FK, Lapkin S, Molloy L, Sim J. The impact of safety culture, quality of care, missed care and nurse staffing on patient falls: a multisource association study. J Clin Nurs. Oct 12, 2023;32(19-20):7260-7272. [ CrossRef ] [ Medline ]
  • Hossain A, Lall R, Ji C, Bruce J, Underwood M, Lamb SE. Comparison of different statistical models for the analysis of fracture events: findings from the Prevention of Falls Injury Trial (PreFIT). BMC Med Res Methodol. Oct 02, 2023;23(1):216. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Williams CT, Whyman J, Loewenthal J, Chahal K. Managing geriatric patients with falls and fractures. Orthop Clin North Am. Jul 2023;54(3S):e1-12. [ CrossRef ] [ Medline ]
  • Gadhvi C, Bean D, Rice D. A systematic review of fear of falling and related constructs after hip fracture: prevalence, measurement, associations with physical function, and interventions. BMC Geriatr. Jun 23, 2023;23(1):385. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Lohman MC, Fallahi A, Mishio Bawa E, Wei J, Merchant AT. Social mediators of the association between depression and falls among older adults. J Aging Health. Aug 12, 2023;35(7-8):593-603. [ CrossRef ] [ Medline ]
  • Smith AD, Silva AO, Rodrigues RA, Moreira MA, Nogueira JD, Tura LF. Assessment of risk of falls in elderly living at home. Rev Lat Am Enfermagem. Apr 06, 2017;25:e2754. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Koh V, Matchar DB, Chan A. Physical strength and mental health mediate the association between pain and falls (recurrent and/or injurious) among community-dwelling older adults in Singapore. Arch Gerontol Geriatr. Sep 2023;112:105015. [ CrossRef ] [ Medline ]
  • Soh SE, Morgan PE, Hopmans R, Barker AL, Ackerman IN. The feasibility and acceptability of a falls prevention e-learning program for physiotherapists. Physiother Theory Pract. Mar 18, 2023;39(3):631-640. [ CrossRef ] [ Medline ]
  • Morat T, Snyders M, Kroeber P, De Luca A, Squeri V, Hochheim M, et al. Evaluation of a novel technology-supported fall prevention intervention - study protocol of a multi-centre randomised controlled trial in older adults at increased risk of falls. BMC Geriatr. Feb 18, 2023;23(1):103. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • You T, Koren Y, Butts WJ, Moraes CA, Yeh GY, Wayne PM, et al. Pilot studies of recruitment and feasibility of remote Tai Chi in racially diverse older adults with multisite pain. Contemp Clin Trials. May 2023;128:107164. [ CrossRef ] [ Medline ]
  • Aldana-Benítez D, Caicedo-Pareja MJ, Sánchez DP, Ordoñez-Mora LT. Dance as a neurorehabilitation strategy: a systematic review. J Bodyw Mov Ther. Jul 2023;35:348-363. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Jawad A, Baattaiah BA, Alharbi MD, Chevidikunnan MF, Khan F. Factors contributing to falls in people with multiple sclerosis: the exploration of the moderation and mediation effects. Mult Scler Relat Disord. Aug 2023;76:104838. [ CrossRef ] [ Medline ]
  • Warren C, Rizo E, Decker E, Hasse A. A comprehensive analysis of risk factors associated with inpatient falls. J Patient Saf. Oct 01, 2023;19(6):396-402. [ CrossRef ] [ Medline ]
  • Gross M, Roigk P, Schoene D, Ritter Y, Pauly P, Becker C, et al. Bundesinitiative Sturzprävention. [Update of the recommendations of the federal falls prevention initiative-identification and prevention of the risk of falling in older people living at home]. Z Gerontol Geriatr. Oct 11, 2023;56(6):448-457. [ CrossRef ] [ Medline ]
  • Li S, Li Y, Liang Q, Yang WJ, Zi R, Wu X, et al. Effects of tele-exercise rehabilitation intervention on women at high risk of osteoporotic fractures: study protocol for a randomised controlled trial. BMJ Open. Nov 07, 2022;12(11):e064328. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Jiang F, Jiang Y, Zhi H, Dong Y, Li H, Ma S, et al. Artificial intelligence in healthcare: past, present and future. Stroke Vasc Neurol. Dec 2017;2(4):230-243. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Ye Y, Wu X, Wang H, Ye H, Zhao K, Yao S, et al. Artificial intelligence-assisted analysis for tumor-immune interaction within the invasive margin of colorectal cancer. Ann Med. Dec 2023;55(1):2215541. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Kuwahara T, Hara K, Mizuno N, Haba S, Okuno N, Fukui T, et al. Current status of artificial intelligence analysis for the treatment of pancreaticobiliary diseases using endoscopic ultrasonography and endoscopic retrograde cholangiopancreatography. DEN Open. Apr 30, 2024;4(1):e267. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Yokote A, Umeno J, Kawasaki K, Fujioka S, Fuyuno Y, Matsuno Y, et al. Small bowel capsule endoscopy examination and open access database with artificial intelligence: the SEE-artificial intelligence project. DEN Open. Apr 22, 2024;4(1):e258. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Ramalingam M, Jaisankar A, Cheng L, Krishnan S, Lan L, Hassan A, et al. Impact of nanotechnology on conventional and artificial intelligence-based biosensing strategies for the detection of viruses. Discov Nano. Dec 01, 2023;18(1):58. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Yerukala Sathipati S, Tsai MJ, Shukla SK, Ho SY. Artificial intelligence-driven pan-cancer analysis reveals miRNA signatures for cancer stage prediction. HGG Adv. Jul 13, 2023;4(3):100190. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Liu J, Dan W, Liu X, Zhong X, Chen C, He Q, et al. Development and validation of predictive model based on deep learning method for classification of dyslipidemia in Chinese medicine. Health Inf Sci Syst. Dec 06, 2023;11(1):21. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Carou-Senra P, Ong JJ, Castro BM, Seoane-Viaño I, Rodríguez-Pombo L, Cabalar P, et al. Predicting pharmaceutical inkjet printing outcomes using machine learning. Int J Pharm X. Dec 2023;5:100181. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Li X, Zhu Y, Zhao W, Shi R, Wang Z, Pan H, et al. Machine learning algorithm to predict the in-hospital mortality in critically ill patients with chronic kidney disease. Ren Fail. Dec 2023;45(1):2212790. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Bonnin M, Müller-Fouarge F, Estienne T, Bekadar S, Pouchy C, Ait Si Selmi T. Artificial intelligence radiographic analysis tool for total knee arthroplasty. J Arthroplasty. Jul 2023;38(7 Suppl 2):S199-207.e2. [ CrossRef ] [ Medline ]
  • Kao DP. Intelligent artificial intelligence: present considerations and future implications of machine learning applied to electrocardiogram interpretation. Circ Cardiovasc Qual Outcomes. Sep 2019;12(9):e006021. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • van der Stigchel B, van den Bosch K, van Diggelen J, Haselager P. Intelligent decision support in medical triage: are people robust to biased advice? J Public Health (Oxf). Aug 28, 2023;45(3):689-696. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Jakhar D, Kaur I. Artificial intelligence, machine learning and deep learning: definitions and differences. Clin Exp Dermatol. Jan 09, 2020;45(1):131-132. [ CrossRef ] [ Medline ]
  • Ghosh M, Thirugnanam A. Introduction to artificial intelligence. In: Srinivasa KG, Siddesh GM, Sekhar SR, editors. Artificial Intelligence for Information Management: A Healthcare Perspective. Cham, Switzerland. Springer; 2021;88-44.
  • Taulli T. Artificial Intelligence Basics: A Non-Technical Introduction. Berkeley, CA. Apress Berkeley; 2019.
  • Patil S, Joda T, Soffe B, Awan KH, Fageeh HN, Tovani-Palone MR, et al. Efficacy of artificial intelligence in the detection of periodontal bone loss and classification of periodontal diseases: a systematic review. J Am Dent Assoc. Sep 2023;154(9):795-804.e1. [ CrossRef ] [ Medline ]
  • Quek LJ, Heikkonen MR, Lau Y. Use of artificial intelligence techniques for detection of mild cognitive impairment: a systematic scoping review. J Clin Nurs. Sep 10, 2023;32(17-18):5752-5762. [ CrossRef ] [ Medline ]
  • Tan D, Mohd Nasir NF, Abdul Manan H, Yahya N. Prediction of toxicity outcomes following radiotherapy using deep learning-based models: a systematic review. Cancer Radiother. Sep 2023;27(5):398-406. [ CrossRef ] [ Medline ]
  • Rabilloud N, Allaume P, Acosta O, De Crevoisier R, Bourgade R, Loussouarn D, et al. Deep learning methodologies applied to digital pathology in prostate cancer: a systematic review. Diagnostics (Basel). Aug 14, 2023;13(16):2676. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Li K, Yao S, Zhang Z, Cao B, Wilson C, Kalos D, et al. Efficient gradient boosting for prognostic biomarker discovery. Bioinformatics. Mar 04, 2022;38(6):1631-1638. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Chen T, Chen Y, Li H, Gao T, Tu H, Li S. Driver intent-based intersection autonomous driving collision avoidance reinforcement learning algorithm. Sensors (Basel). Dec 16, 2022;22(24):9943. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Huynh QT, Nguyen PH, Le HX, Ngo LT, Trinh NT, Tran MT, et al. Automatic acne object detection and acne severity grading using smartphone images and artificial intelligence. Diagnostics (Basel). Aug 03, 2022;12(8):1879. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Brooke BS, Schwartz TA, Pawlik TM. MOOSE reporting guidelines for meta-analyses of observational studies. JAMA Surg. Aug 01, 2021;156(8):787-788. [ CrossRef ] [ Medline ]
  • Scholten RJ, Clarke M, Hetherington J. The Cochrane collaboration. Eur J Clin Nutr. Aug 28, 2005;59 Suppl 1(S1):S147-S196. [ CrossRef ] [ Medline ]
  • Warrens MJ. Kappa coefficients for dichotomous-nominal classifications. Adv Data Anal Classif. Apr 07, 2020;15(1):193-208. [ CrossRef ]
  • Law M, Stewart D, Letts L, Pollock N, Bosch J. Guidelines for critical review of qualitative studies. McMaster University Occupational Therapy Evidence-Based Practice Research Group. URL: https://www.canchild.ca/system/tenon/assets/attachments/000/000/360/original/qualguide.pdf [accessed 2024-04-05]
  • Higgins JP, Morgan RL, Rooney AA, Taylor KW, Thayer KA, Silva RA, et al. Risk of bias in non-randomized studies - of exposure (ROBINS-E). ROBINS-E tool. URL: https://www.riskofbias.info/welcome/robins-e-tool [accessed 2024-04-02]
  • Nait Aicha A, Englebienne G, van Schooten KS, Pijnappels M, Kröse B. Deep learning to predict falls in older adults based on daily-life trunk accelerometry. Sensors (Basel). May 22, 2018;18(5):1654. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Ribeiro NF, André J, Costa L, Santos CP. Development of a strategy to predict and detect falls using wearable sensors. J Med Syst. Apr 04, 2019;43(5):134. [ CrossRef ] [ Medline ]
  • Qiu H, Rehman RZ, Yu X, Xiong S. Application of wearable inertial sensors and a new test battery for distinguishing retrospective fallers from non-fallers among community-dwelling older people. Sci Rep. Nov 05, 2018;8(1):16349. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Dubois A, Bihl T, Bresciani JP. Automatic measurement of fall risk indicators in timed up and go test. Inform Health Soc Care. Sep 13, 2019;44(3):237-245. [ CrossRef ] [ Medline ]
  • Noh B, Youm C, Goh E, Lee M, Park H, Jeon H, et al. XGBoost based machine learning approach to predict the risk of fall in older adults using gait outcomes. Sci Rep. Jun 09, 2021;11(1):12183. [ CrossRef ] [ Medline ]
  • Hauth J, Jabri S, Kamran F, Feleke EW, Nigusie K, Ojeda LV, et al. Automated loss-of-balance event identification in older adults at risk of falls during real-world walking using wearable inertial measurement units. Sensors (Basel). Jul 07, 2021;21(14):4661. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Lockhart TE, Soangra R, Yoon H, Wu T, Frames CW, Weaver R. Prediction of fall risk among community-dwelling older adults using a wearable system. Sci Rep. 2021;11(1):20976. [ CrossRef ]
  • Hsu YC, Weng HH, Kuo CY, Chu TP, Tsai YH. Prediction of fall events during admission using eXtreme gradient boosting: a comparative validation study. Sci Rep. Oct 08, 2020;10(1):16777. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Hu Y, Bishnoi A, Kaur R, Sowers R, Hernandez ME. Exploration of machine learning to identify community dwelling older adults with balance dysfunction using short duration accelerometer data. In: Proceedings of the 42nd Annual International Conference of the IEEE Engineering in Medicine & Biology Society. 2020. Presented at: EMBC '20; July 20-24, 2020;812-815; Montreal, QC. URL: https://ieeexplore.ieee.org/document/9175871 [ CrossRef ]
  • Ye C, Li J, Hao S, Liu M, Jin H, Zheng L, et al. Identification of elders at higher risk for fall with statewide electronic health records and a machine learning algorithm. Int J Med Inform. May 2020;137:104105. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Althobaiti T, Katsigiannis S, Ramzan N. Triaxial accelerometer-based falls and activities of daily life detection using machine learning. Sensors (Basel). Jul 06, 2020;20(13):3777. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Tunca C, Salur G, Ersoy C. Deep learning for fall risk assessment with inertial sensors: utilizing domain knowledge in spatio-temporal gait parameters. IEEE J Biomed Health Inform. Jul 2020;24(7):1994-2005. [ CrossRef ]
  • Kim K, Yun G, Park SK, Kim DH. Fall detection for the elderly based on 3-axis accelerometer and depth sensor fusion with random forest classifier. In: Proceedings of the 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society. 2019. Presented at: EMBC '19; July 23-27, 2019;4611-4614; Berlin, Germany. URL: https://ieeexplore.ieee.org/document/8856698 [ CrossRef ]
  • Gillain S, Boutaayamou M, Schwartz C, Brüls O, Bruyère O, Croisier JL, et al. Using supervised learning machine algorithm to identify future fallers based on gait patterns: a two-year longitudinal study. Exp Gerontol. Nov 2019;127:110730. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Lo Y, Lynch SF, Urbanowicz RJ, Olson RS, Ritter AZ, Whitehouse CR, et al. Using machine learning on home health care assessments to predict fall risk. Stud Health Technol Inform. Aug 21, 2019;264:684-688. [ CrossRef ] [ Medline ]
  • Maray N, Ngu AH, Ni J, Debnath M, Wang L. Transfer learning on small datasets for improved fall detection. Sensors (Basel). Jan 18, 2023;23(3):1105. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Ladios-Martin M, Cabañero-Martínez MJ, Fernández-de-Maya J, Ballesta-López FJ, Belso-Garzas A, Zamora-Aznar FM, et al. Development of a predictive inpatient falls risk model using machine learning. J Nurs Manag. Nov 30, 2022;30(8):3777-3786. [ CrossRef ] [ Medline ]
  • Eichler N, Raz S, Toledano-Shubi A, Livne D, Shimshoni I, Hel-Or H. Automatic and efficient fall risk assessment based on machine learning. Sensors (Basel). Feb 17, 2022;22(4):1557. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Tang YM, Wang YH, Feng XY, Zou QS, Wang Q, Ding J, et al. Diagnostic value of a vision-based intelligent gait analyzer in screening for gait abnormalities. Gait Posture. Jan 2022;91:205-211. [ CrossRef ] [ Medline ]
  • Greene BR, McManus K, Ader LG, Caulfield B. Unsupervised assessment of balance and falls risk using a smartphone and machine learning. Sensors (Basel). Jul 13, 2021;21(14):4770. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Roshdibenam V, Jogerst GJ, Butler NR, Baek S. Machine learning prediction of fall risk in older adults using timed up and go test kinematics. Sensors (Basel). May 17, 2021;21(10):3481. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Dubois A, Bihl T, Bresciani JP. Identifying fall risk predictors by monitoring daily activities at home using a depth sensor coupled to machine learning algorithms. Sensors (Basel). Mar 11, 2021;21(6):1957. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Vo MT, Thonglor R, Moncatar TJ, Han TD, Tejativaddhana P, Nakamura K. Fear of falling and associated factors among older adults in Southeast Asia: a systematic review. Public Health. Sep 2023;222:215-228. [ CrossRef ] [ Medline ]
  • Torun E, Az A, Akdemir T, Solakoğlu GA, Açiksari K, Güngörer B. Evaluation of the risk factors for falls in the geriatric population presenting to the emergency department. Ulus Travma Acil Cerrahi Derg. Aug 2023;29(8):897-903. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Son NK, Ryu YU, Jeong HW, Jang YH, Kim HD. Comparison of 2 different exercise approaches: Tai Chi versus Otago, in community-dwelling older women. J Geriatr Phys Ther. 2016;39(2):51-57. [ CrossRef ] [ Medline ]
  • Sawa R, Doi T, Tsutsumimoto K, Nakakubo S, Kurita S, Kiuchi Y, et al. Overlapping status of frailty and fear of falling: an elevated risk of incident disability in community-dwelling older adults. Aging Clin Exp Res. Sep 11, 2023;35(9):1937-1944. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Calazans JA, Permanyer I. Levels, trends, and determinants of cause-of-death diversity in a global perspective: 1990-2019. BMC Public Health. Apr 05, 2023;23(1):650. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Kakara R, Bergen G, Burns E, Stevens M. Nonfatal and fatal falls among adults aged ≥65 years - United States, 2020-2021. MMWR Morb Mortal Wkly Rep. Sep 01, 2023;72(35):938-943. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Dostan A, Dobson CA, Vanicek N. Relationship between stair ascent gait speed, bone density and gait characteristics of postmenopausal women. PLoS One. Mar 22, 2023;18(3):e0283333. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Zheng Y, Wang X, Zhang ZK, Guo B, Dang L, He B, et al. Bushen Yijing Fang reduces fall risk in late postmenopausal women with osteopenia: a randomized double-blind and placebo-controlled trial. Sci Rep. Feb 14, 2019;9(1):2089. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Woelfle T, Bourguignon L, Lorscheider J, Kappos L, Naegelin Y, Jutzeler CR. Wearable sensor technologies to assess motor functions in people with multiple sclerosis: systematic scoping review and perspective. J Med Internet Res. Jul 27, 2023;25:e44428. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Abdollah V, Dief TN, Ralston J, Ho C, Rouhani H. Investigating the validity of a single tri-axial accelerometer mounted on the head for monitoring the activities of daily living and the timed-up and go test. Gait Posture. Oct 2021;90:137-140. [ CrossRef ] [ Medline ]
  • Mielke GI, de Almeida Mendes M, Ekelund U, Rowlands AV, Reichert FF, Crochemore-Silva I. Absolute intensity thresholds for tri-axial wrist and waist accelerometer-measured movement behaviors in adults. Scand J Med Sci Sports. Sep 12, 2023;33(9):1752-1764. [ CrossRef ] [ Medline ]
  • Löppönen A, Delecluse C, Suorsa K, Karavirta L, Leskinen T, Meulemans L, et al. Association of sit-to-stand capacity and free-living performance using Thigh-Worn accelerometers among 60- to 90-yr-old adults. Med Sci Sports Exerc. Sep 01, 2023;55(9):1525-1532. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • García-Soidán JL, Leirós-Rodríguez R, Romo-Pérez V, García-Liñeira J. Accelerometric assessment of postural balance in children: a systematic review. Diagnostics (Basel). Dec 22, 2020;11(1):8. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Leirós-Rodríguez R, García-Soidán JL, Romo-Pérez V. Analyzing the use of accelerometers as a method of early diagnosis of alterations in balance in elderly people: a systematic review. Sensors (Basel). Sep 09, 2019;19(18):3883. [ FREE Full text ] [ CrossRef ] [ Medline ]

Abbreviations

Edited by A Mavragani; submitted 28.11.23; peer-reviewed by E Andrade, M Behzadifar, A Suárez; comments to author 09.01.24; revised version received 30.01.24; accepted 13.02.24; published 29.04.24.

©Ana González-Castro, Raquel Leirós-Rodríguez, Camino Prada-García, José Alberto Benítez-Andrades. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 29.04.2024.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.

IMAGES

  1. How to extract study data for your systematic review

    literature review data extraction

  2. (PDF) Data Extraction Templates in Systematic Literature Reviews: How

    literature review data extraction

  3. (PDF) Development, testing and use of data extraction forms in

    literature review data extraction

  4. Literature review and data extraction flow chart The entire review

    literature review data extraction

  5. PPT

    literature review data extraction

  6. Literature review and data extraction flow

    literature review data extraction

VIDEO

  1. How To Extract Data from Systematic Review and Meta-Analysis HD 1080 WEB H264 4000

  2. AI in Research... literature review, data analysis, text management and ethics

  3. 12 Important Practice Questions /Research Methodology in English Education /Unit-1 /B.Ed. 4th Year

  4. 5. Data extraction from graphs for most data types & practice using web plot digitizer

  5. Project 8: Keyword Extraction using Rapid Automatic Keyword Extraction (RAKE)

  6. Day 4 session1 FDP on " Systematic Literature Review & Data Collection" @ REC

COMMENTS

  1. Systematic Reviews: Step 7: Extract Data from Included Studies

    A librarian can advise you on data extraction for your systematic review, including: What the data extraction stage of the review entails; Finding examples in the literature of similar reviews and their completed data tables; How to choose what data to extract from your included articles ; How to create a randomized sample of citations for a ...

  2. Data extraction methods for systematic review (semi)automation: Update

    Background: The reliable and usable (semi)automation of data extraction can support the field of systematic review by reducing the workload required to gather information about the conduct and results of the included studies. This living systematic review examines published approaches for data extraction from reports of clinical studies. Methods: We systematically and continually search PubMed ...

  3. Summarising good practice guidelines for data extraction for systematic

    Data extraction is the process of a systematic review that occurs between identifying eligible studies and analysing the data, whether it can be a qualitative synthesis or a quantitative synthesis involving the pooling of data in a meta-analysis. The aims of data extraction are to obtain information about the included studies in terms of the characteristics of each study and its population and ...

  4. A practical guide to data analysis in general literature reviews

    This article is a practical guide to conducting data analysis in general literature reviews. The general literature review is a synthesis and analysis of published research on a relevant clinical issue, and is a common format for academic theses at the bachelor's and master's levels in nursing, physiotherapy, occupational therapy, public health and other related fields.

  5. Chapter 5: Collecting data

    Training of data extractors is intended to familiarize them with the review topic and methods, the data collection form or data system, and issues that may arise during data extraction. Results of the pilot testing of the form should prompt discussion among review authors and extractors of ambiguous questions or responses to establish consistency.

  6. A Guide to Evidence Synthesis: 10. Data Extraction

    Excel is the most basic tool for the management of the screening and data extraction stages of the systematic review process. Customized workbooks and spreadsheets can be designed for the review process. A more advanced approach to using Excel for this purpose is the PIECES approach, designed by a librarian at Texas A&M.

  7. Development, testing and use of data extraction forms in systematic

    Methods. We reviewed guidance on the development and pilot testing of data extraction forms and the data extraction process. We reviewed four types of sources: 1) methodological handbooks of systematic review organisations (SRO); 2) textbooks on conducting systematic reviews; 3) method documents from health technology assessment (HTA) agencies and 4) journal articles.

  8. Guidance on Conducting a Systematic Literature Review

    The entire literature review process, including literature search, data extraction and analysis, and reporting, should be tailored to answer the research question (Kitchenham and Charters 2007). Second, choose a review type suitable for the review purpose.

  9. Data extraction methods for systematic review (semi)automation: A

    1. Review published methods and tools aimed at automating or semi-automating the process of data extraction in the context of a systematic review of medical research studies. 2. Review this evidence in the scope of a living review, keeping information up to date and relevant to the challenges faced by systematic reviewers at any time.

  10. JABSOM Library: Systematic Review Toolbox: Data Extraction

    Extracting data from reviewed studies should be done in accordance to pre-established guidelines, such as the ones from PRISMA. From each included study, the following data may need to be extracted, depending on the review's purpose: title, author, year, journal, research question and specific aims, conceptual framework, hypothesis, research ...

  11. Data extraction and comparison for complex systematic reviews: a step

    Data extraction (DE) is a challenging step in systematic reviews (SRs). Complex SRs can involve multiple interventions and/or outcomes and encompass multiple research questions. Attempts have been made to clarify DE aspects focusing on the subsequent meta-analysis; there are, however, no guidelines for DE in complex SRs. Comparing datasets extracted independently by pairs of reviewers to ...

  12. Systematic Reviews and Meta-Analyses: Data Extraction

    Data Extraction Templates. Data extraction is often performed using a single form to extract data from all included (relevant) studies in a uniform manner.Because the data extraction stage is driven by the scope and goals of a systematic review, there is not a gold standard or one-size-fits all approach to developing a data extraction form.. However, there are templates and guidance available ...

  13. Data Extraction/Coding/Study characteristics/Results

    The data extraction forms can be used to produce a summary table of study characteristics that were considered important for inclusion. In the final report in the results section the characteristics of the studies that were included in the review should be reported for PRISMA Item 18 as:

  14. Data Extraction

    Data Collection. "data slide" by bionicteaching is licensed under CC BY-NC 2.0. This stage of the systematic review process involves transcribing information from each study using a structured piloted format designed to consistently and objectively capture the relevant details. Two reviewers working independently are preferred for accuracy.

  15. PDF Data Extraction Templates in Systematic Literature Reviews: How

    Systematic literature reviews (SLR) are the foundation informing clinical and cost-effectiveness analyses in healthcare decision-making. Established guidelines have encouraged the use of standardised data extraction templates (DET) to guide extraction, ensure transparency in information collected across the studies and allow qualitative and/or ...

  16. Five tips for developing useful literature summary tables for writing

    Literature reviews offer a critical synthesis of empirical and theoretical literature to assess the strength of evidence, develop guidelines for practice and policymaking, and identify areas for future research.1 It is often essential and usually the first task in any research endeavour, particularly in masters or doctoral level education. For effective data extraction and rigorous synthesis ...

  17. Validity of data extraction in evidence synthesis practice of adverse

    Objectives To investigate the validity of data extraction in systematic reviews of adverse events, the effect of data extraction errors on the results, and to develop a classification framework for data extraction errors to support further methodological research. Design Reproducibility study. Data sources PubMed was searched for eligible systematic reviews published between 1 January 2015 and ...

  18. Choice of data extraction tools for systematic reviews depends on

    Abstract. Objective: To assist investigators planning, coordinating, and conducting systematic reviews in the selection of data-extraction tools for conducting systematic reviews. Study design and setting: We constructed an initial table listing available data-collection tools and reflecting our experience with these tools and their performance.

  19. Automating data extraction in systematic reviews: a systematic review

    Despite their widely acknowledged usefulness [], the process of systematic review, specifically the data extraction step (step 4), can be time-consuming.In fact, it typically takes 2.5-6.5 years for a primary study publication to be included and published in a new systematic review [].Further, within 2 years of the publication of systematic reviews, 23 % are out of date because they have not ...

  20. Automating data extraction in systematic reviews: a systematic review

    Background: Automation of the parts of systematic review process, specifically the data extraction step, may be an important strategy to reduce the time necessary to complete a systematic review. However, the state of the science of automatically extracting data elements from full texts has not been well described. This paper performs a systematic review of published and unpublished methods to ...

  21. The role of sociodemographic factors on the acceptability of digital

    The data charting tool that will be used is subject to an Iterative process. As data are extracted, additional categories may emerge that warrant inclusion in the charting tool. As recommended in the JBI Manual of Evidence Synthesis , the data charting tool will be piloted by extracting data from 15 studies. This process will help ensure that ...

  22. Frontiers

    2.3 Literature screening and data extraction. Two researchers (MX-F and RM-L) independently screened and extracted the literatures, and cross-checked them. Any disagreement during this process were subject to discussion and negotiation or the decision of a third expert (WL-G).

  23. Untangling the mess of CGRP levels as a migraine biomarker: an in-depth

    Calcitonin gene-related peptide (CGRP) is the most promising candidate to become the first migraine biomarker. However, literature shows clashing results and suggests a methodological source for such discrepancies. We aimed to investigate some of these methodological factors to evaluate the actual role of CGRP as biomarker. Previous to the experimental part, we performed a literature review of ...

  24. Development, testing and use of data extraction forms in systematic

    Data extraction forms link systematic reviews with primary research and provide the foundation for appraising, analysing, summarising and interpreting a body of evidence. This makes their development, pilot testing and use a crucial part of the systematic reviews process. Several studies have shown that data extraction errors are frequent in systematic reviews, especially regarding outcome data.

  25. The Impact of Vaping on the Ocular Surface: A Systematic Review ...

    Methods: A systematic review of the literature was undertaken by keyword searching on the Embase, Medline, and Web of Science databases. Articles identified through the search underwent title/abstract screening, full-text screening, and data extraction. Results: A total of 18 studies were included in this review.

  26. Automating data extraction in systematic reviews: a systematic review

    Automation of the parts of systematic review process, specifically the data extraction step, may be an important strategy to reduce the time necessary to complete a systematic review. However, the state of the science of automatically extracting data elements from full texts has not been well described. This paper performs a systematic review of published and unpublished methods to automate ...

  27. Equitable and accessible informed healthcare consent process for people

    A systematic literature review was conducted following the Preferred Reporting Items for Systematic Reviews and Meta-Analysis ... with input from senior researchers with expertise in qualitative research (IS and EEP), to extract data relevant to the review's research aims. The form was piloted on one study, and a second iteration made based ...

  28. Inappropriate use of proton pump inhibitors in clinical practice

    The literature was searched from inception until 19 July 2023 (PubMed, Embase and Scopus) for studies that reported on the inappropriate overuse of PPI. Two investigators independently performed each step, from screening to study selection to data extraction and quality assessment. ... we conducted a systematic review and meta-analysis on the ...

  29. Journal of Medical Internet Research

    Background: Falls and their consequences are a serious public health problem worldwide. Each year, 37.3 million falls requiring medical attention occur. Therefore, the analysis of fall risk is of great importance for prevention. Artificial intelligence (AI) represents an innovative tool for creating predictive statistical models of fall risk through data analysis.