
Evidence-Based Practice: Research Guide | 5 Steps of EBP


  • Ask: Convert the need for information into an answerable question.
  • Find: Track down the best evidence with which to answer that question.
  • Appraise: Critically appraise that evidence for its validity and applicability.
  • Apply: Integrate the critical appraisal with clinical expertise and with the patient's unique biology, values, and circumstances.
  • Evaluate: Evaluate the effectiveness and efficiency in executing steps 1-4 and seek ways to improve them both for next time.

1. ASK: Using PICO

Formulating a strong clinical question is the first step in the research process. PICO is a way of building clinical research questions that allows you to focus your research and to create a query that better matches most medical databases.

  • Patient – Describe your patient or population.  What are the most important characteristics?  Include information on age, race, gender, medical conditions, etc.
  • Intervention – What is the main intervention or therapy you are considering? This can be as general as treat or observe, or as specific as a particular test or therapy.
  • Comparison Intervention – An alternative intervention or therapy you wish to compare to the first.
  • Outcome – What are you trying to do for the patient?  What is the clinical outcome?  What are the relevant outcomes?

Example: In a (describe patient) can (intervention A) affect (outcome) compared with (intervention B)?

  • Patient - 50 yr old man with diabetes
  • Intervention - weight loss and exercise
  • Comparison - medication
  • Outcome - maintaining blood sugar levels
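As a rough illustration of how the four PICO parts slot into the question template above, the following sketch fills the template with the example values. The class name `Pico` and its method are hypothetical and used only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Pico:
    """The four PICO parts of a clinical question (illustrative only)."""
    patient: str
    intervention: str
    comparison: str
    outcome: str

    def question(self) -> str:
        # Fill the template: In a (patient) can (intervention A)
        # affect (outcome) compared with (intervention B)?
        return (
            f"In a {self.patient}, can {self.intervention} affect "
            f"{self.outcome} compared with {self.comparison}?"
        )


example = Pico(
    patient="50 yr old man with diabetes",
    intervention="weight loss and exercise",
    comparison="medication",
    outcome="maintaining blood sugar levels",
)
print(example.question())
```

Writing the parts down separately before assembling the sentence makes it easier to spot a missing or vague element, such as an outcome that cannot be measured.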

2. FIND: Formulate a Search Strategy

Think about the keywords for each of the PICO parts of the clinical question.

Sample Question: Is prophylactic physical therapy for patients undergoing upper abdominal surgery effective in preventing post-operative pulmonary complications?

The PICO parts with keywords for this question would look like this:

You might also see PICO with an added T. The T often stands for either “time” or “type of study.” Time helps you consider the timeframe of an intervention or outcome, while type of study is a way to define the types or levels of evidence that you will need in order to answer your question. 
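One common way to turn PICO keywords into a search is to join synonyms for the same concept with OR and then join the concepts with AND. The sketch below shows that idea for the sample question. The keyword lists are assumptions chosen for illustration, not taken from this guide, and the exact query syntax differs between databases.

```python
def build_query(concepts: dict[str, list[str]]) -> str:
    """OR together synonyms within a concept, then AND the concepts."""
    groups = []
    for terms in concepts.values():
        if not terms:
            continue  # skip PICO parts with no keywords (e.g. no comparison)
        # Quote multi-word terms so they are searched as phrases.
        quoted = [f'"{t}"' if " " in t else t for t in terms]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)


# Hypothetical keywords for the sample question above.
keywords = {
    "patient": ["upper abdominal surgery"],
    "intervention": ["physical therapy", "physiotherapy"],
    "comparison": [],
    "outcome": ["pulmonary complications", "pneumonia"],
}
print(build_query(keywords))
```

A librarian can help adapt a string like this to a specific database's subject headings and field tags.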

Databases for EBP Research

3. APPRAISE: Evidence & Evaluation

Different types of information provide different standards or levels of evidence. These levels depend on things like a study's design, objectives, and review process. You may be familiar with a pyramid diagram showing a hierarchy of types of evidence. Often included in pyramids of evidence are the following types of information: 

[Figure: chart displaying different types of evidence]

  • Clinical practice guidelines—recommendations for applying current medical knowledge (or evidence) to the treatment and care of a patient. 
  • Meta-analyses and systematic reviews—an approach to literature reviews that identifies all studies addressing a given research question based on specific inclusion criteria and analyzes the results of each study to produce a summary result. 
  • Randomized controlled trials (RCTs)—eligible participants are randomly assigned to study groups to test a treatment against a control group. In blinded trials, the participants and researchers do not know which study group participants have been assigned to. 
  • Cohort studies—follow a group of subjects over a period of time to determine the incidence or identify predictors of a certain condition. 
  • Case-control studies—compare two groups of subjects, one with the outcome and one without, to identify predictor variables associated with the outcome. 
  • Case reports/series, expert opinions, and editorials—reports on individual cases with no control groups involved, opinions based on one person’s experience and expertise 
  • Animal and laboratory studies—studies that do not involve humans 

The pyramid hierarchy places some types of evidence above others in terms of validity, objectivity, and transferability. It’s important to remember, however, that the best type of evidence to answer your research question depends on the nature of your question and what purpose you have for searching for evidence in the first place. Conducting a literature review, for example, is a very different situation than searching for an answer to a specific question about a particular case, patient, or situation. 

Evaluation Criteria:

  • Credibility (Internal Validity)
  • Transferability (External Validity)
  • Dependability (Reliability)
  • Confirmability (Objectivity)

Credibility: looks at truth and quality and asks, "Can you believe the results?"

Some questions you might ask are: Were patients randomized? Were patients analyzed in the groups to which they were (originally) randomized? Were patients in the treatment and control groups similar with respect to known prognostic factors?

Transferability: looks at external validity of the data and asks, "Can the results be transferred to other situations?"

Some questions you might ask are: Were patients in the treatment and control groups similar with respect to known prognostic factors? Was there a blind comparison with an independent gold standard? Were objective and unbiased outcome criteria used? Are the results of this study valid?

Dependability: looks at consistency of results and asks, "Would the results be similar if the study was repeated with the same subjects in a similar context?"

Some questions you might ask are: Aside from the experimental intervention, were the groups treated equally? Was follow-up complete? Was the sample of patients representative? Were the patients sufficiently homogeneous with respect to prognostic factors?

Confirmability: looks at neutrality and asks, "Was there an attempt to enhance objectivity by reducing research bias?"

Some questions you might ask are: Were the five important groups (patients, caregivers, collectors of outcome data, adjudicators of outcome, data analysts) aware of group allocations? Was randomization concealed?

4. APPLY: Use Evidence in Clinical Practice


Other good resources for both appraisal and applying evidence in clinical practice can be found on these two websites:

  • KT Clearinghouse/Centre for Evidence-Based Medicine, Toronto
  • Centre for Evidence Based Medicine, University of Oxford

5. EVALUATE: Look at Your Performance

Ask yourself:

  • Did you ask an answerable clinical question?
  • Did you find the best external evidence?
  • Did you critically appraise the evidence and evaluate it for its validity and potential usefulness?
  • Did you integrate critical appraisal of the best available external evidence from systematic research with individual clinical expertise in personal daily clinical practice?
  • What were the outcomes of your application of the best evidence for your patient(s)?
  • Last Updated: May 16, 2024 11:08 AM


Systematic Reviews & Evidence Synthesis Methods


About This Guide

This research guide provides an overview of the evidence synthesis process, guidance documents for conducting evidence synthesis projects, and links to resources to help you conduct a comprehensive and systematic search of the scholarly literature. Navigate the guide using the tabs on the left.

"Evidence synthesis" refers to rigorous, well-documented methods of identifying, selecting, and combining results from multiple studies. These projects are conducted by teams and follow specific methodologies to minimize bias and maximize reproducibility. A systematic review is a type of evidence synthesis. We use the term evidence synthesis to better reflect the breadth of methodologies that we support, including systematic reviews, scoping reviews, evidence gap maps, umbrella reviews, meta-analyses, and others.

Note: Librarians at UC Irvine Libraries have supported systematic reviews and related methodologies in STEM fields for several years. As our service has evolved, we have added capacity to support these reviews in the Social Sciences as well.

Systematic Review OR Literature Review Conducted Systematically?

There are many types of literature reviews. Before beginning a systematic review, consider whether it is the best type of review for your question, goals, and resources. The table below compares systematic reviews, scoping reviews, and systematized reviews (narrative literature reviews employing some, but not all, elements of a systematic review) to help you decide which is best for you. See the Types of Evidence Synthesis page for a more in-depth overview of types of reviews.

  • Last Updated: May 25, 2024 10:49 AM


Evidence-Based Research Series-Paper 1: What Evidence-Based Research is and why is it important?


  • 1 Johns Hopkins Evidence-based Practice Center, Division of General Internal Medicine, Department of Medicine, Johns Hopkins University, Baltimore, MD, USA.
  • 2 Digital Content Services, Operations, Elsevier Ltd., 125 London Wall, London, EC2Y 5AS, UK.
  • 3 School of Nursing, McMaster University, Health Sciences Centre, Room 2J20, 1280 Main Street West, Hamilton, Ontario, Canada, L8S 4K1; Section for Evidence-Based Practice, Western Norway University of Applied Sciences, Inndalsveien 28, Bergen, P.O.Box 7030 N-5020 Bergen, Norway.
  • 4 Department of Sport Science and Clinical Biomechanics, University of Southern Denmark, Campusvej 55, 5230, Odense M, Denmark; Department of Physiotherapy and Occupational Therapy, University Hospital of Copenhagen, Herlev & Gentofte, Kildegaardsvej 28, 2900, Hellerup, Denmark.
  • 5 Musculoskeletal Statistics Unit, the Parker Institute, Bispebjerg and Frederiksberg Hospital, Copenhagen, Nordre Fasanvej 57, 2000, Copenhagen F, Denmark; Department of Clinical Research, Research Unit of Rheumatology, University of Southern Denmark, Odense University Hospital, Denmark.
  • 6 Section for Evidence-Based Practice, Western Norway University of Applied Sciences, Inndalsveien 28, Bergen, P.O.Box 7030 N-5020 Bergen, Norway.
  • PMID: 32979491
  • DOI: 10.1016/j.jclinepi.2020.07.020

Objectives: There is considerable actual and potential waste in research. Evidence-based research ensures worthwhile and valuable research. The aim of this series, which this article introduces, is to describe the evidence-based research approach.

Study design and setting: In this first article of a three-article series, we introduce the evidence-based research approach. Evidence-based research is the use of prior research in a systematic and transparent way to inform a new study so that it is answering questions that matter in a valid, efficient, and accessible manner.

Results: We describe evidence-based research and provide an overview of the approach of systematically and transparently using previous research before starting a new study to justify and design the new study (article #2 in series) and, on study completion, place its results in the context of what is already known (article #3 in series).

Conclusion: This series introduces evidence-based research as an approach to minimize unnecessary and irrelevant clinical health research that is unscientific, wasteful, and unethical.

Keywords: Clinical health research; Clinical trials; Evidence synthesis; Evidence-based research; Medical ethics; Research ethics; Systematic review.

Copyright © 2020 Elsevier Inc. All rights reserved.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Biomedical Research* / methods
  • Biomedical Research* / organization & administration
  • Clinical Trials as Topic / ethics
  • Clinical Trials as Topic / methods
  • Clinical Trials as Topic / organization & administration
  • Ethics, Research
  • Evidence-Based Medicine / methods*
  • Needs Assessment
  • Reproducibility of Results
  • Research Design* / standards
  • Research Design* / trends
  • Systematic Reviews as Topic
  • Treatment Outcome

Systematic Review and Evidence Synthesis


This guide is directly informed by and selectively reuses, with permission, content from: 

  • Systematic Reviews, Scoping Reviews, and other Knowledge Syntheses by Genevieve Gore and Jill Boruff, McGill University (CC-BY-NC-SA)
  • A Guide to Evidence Synthesis, Cornell University Library Evidence Synthesis Service

Primary University of Minnesota Libraries authors are: Meghan Lafferty, Scott Marsalis, & Erin Reardon

Last updated: September 2022

Types of evidence synthesis

There are many types of evidence synthesis, and it is important to choose the right type of synthesis for your research questions. 

Types of evidence synthesis include (but are not limited to):

Systematic Review

Addresses a specific, answerable question of medical, scientific, policy, or management importance.

May be limited to relevant study designs depending on the type of question (e.g., intervention, prognosis, diagnosis).

Compares, critically evaluates, and synthesizes evidence.

Follows an established protocol and methodology.

May or may not include a meta-analysis of findings.

The most commonly referred-to type of evidence synthesis.

Time-intensive; can take months or longer than a year to complete.

At the top of the evidence pyramid


Meta-Analysis

Statistical technique for combining the findings from multiple quantitative studies.

Uses statistical methods to objectively evaluate, synthesize, and summarize results.
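To make the statistical combination concrete, here is a minimal sketch of one common approach, a fixed-effect, inverse-variance pooled estimate. This guide does not prescribe a specific method. The effect sizes and standard errors below are made-up numbers for three hypothetical studies, and the function name is illustrative.

```python
import math


def pool_fixed_effect(estimates, std_errors):
    """Inverse-variance (fixed-effect) pooled estimate and its standard error."""
    # More precise studies (smaller standard error) get larger weights.
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled, pooled_se


# Made-up effect sizes and standard errors for three hypothetical studies.
effects = [0.30, 0.10, 0.20]
ses = [0.10, 0.20, 0.10]

pooled, se = pool_fixed_effect(effects, ses)
low, high = pooled - 1.96 * se, pooled + 1.96 * se  # approximate 95% interval
print(f"pooled = {pooled:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

Real meta-analyses usually also assess heterogeneity between studies and may use a random-effects model instead. Dedicated statistical software is the norm for that work.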

Scoping Review (or Evidence Map)

Addresses the scope of the existing literature on broad, complex, or exploratory research questions.

Many different study designs may be applicable.

Seeks to identify research gaps and opportunities for evidence synthesis.

May critically evaluate existing evidence, but does not attempt to synthesize results like a systematic review would.

Time-intensive, can take months or longer than a year to complete.

Rapid Review

Applies the methodology of a systematic review within a time-constrained setting.

Employs methodological “shortcuts” (e.g., limiting search terms) at the risk of introducing bias.

Useful for addressing issues that need a quick decision, such as developing policy recommendations or treatment recommendations for emergent conditions.

Umbrella Review

Reviews other systematic reviews on a topic.

Often attempts to answer a broader question than a systematic review typically would.

Useful when there are competing interventions to consider.

Literature (or “Narrative”) Review

A broad term that refers to reviews with a wide scope and non-standardized methodology.

Search strategies, comprehensiveness, and time range covered may vary and do not follow an established protocol.

What Review Type?

Dr. Andrea Tricco, a leading evidence synthesis methodologist, and her team developed a web-based tool to assist in selecting the right review type based on your answers to a brief list of questions. Although the tool assumes a health science topic, other disciplines may find it useful as well.

  • Right Review

Main review types characterized by methods

This table summarizes the main characteristics of the 14 main review types as laid out in the seminal article on the topic. Please note that methodologies may have evolved since this article was written, so it is recommended that you review the more specific information on the following pages. Librarians can also work with you to determine the best review type for your needs.

Reproduced from: Grant, M. J. and Booth, A. (2009), A typology of reviews: An analysis of 14 review types and associated methodologies. Health Information & Libraries Journal, 26: 91-108. doi:10.1111/j.1471-1842.2009.00848.x Table 1.


National Academies Press: OpenBook

Taking Science to School: Learning and Teaching Science in Grades K-8 (2007)

Chapter 5: Generating and Evaluating Scientific Evidence and Explanations

Major Findings in the Chapter:

Children are far more competent in their scientific reasoning than first suspected and adults are less so. Furthermore, there is great variation in the sophistication of reasoning strategies across individuals of the same age.

In general, children are less sophisticated than adults in their scientific reasoning. However, experience plays a critical role in facilitating the development of many aspects of reasoning, often trumping age.

Scientific reasoning is intimately intertwined with conceptual knowledge of the natural phenomena under investigation. This conceptual knowledge sometimes acts as an obstacle to reasoning, but often facilitates it.

Many aspects of scientific reasoning require experience and instruction to develop. For example, distinguishing between theory and evidence and many aspects of modeling do not emerge without explicit instruction and opportunities for practice.

In this chapter, we discuss the various lines of research related to Strand 2: generate and evaluate evidence and explanations. The ways in which scientists generate and evaluate scientific evidence and explanations have long been the focus of study in philosophy, history, anthropology, and sociology. More recently, psychologists and learning scientists have begun to study the cognitive and social processes involved in building scientific knowledge. For our discussion, we draw primarily from the past 20 years of research in developmental and cognitive psychology that investigates how children’s scientific thinking develops across the K-8 years.

We begin by developing a broad sketch of how key aspects of scientific thinking develop across the K-8 years, contrasting children’s abilities with those of adults. This contrast allows us to illustrate both how children’s knowledge and skill can develop over time and situations in which adults’ and children’s scientific thinking are similar. Where age differences exist, we comment on what underlying mechanisms might be responsible for them. In this research literature, two broad themes emerge, which we take up in detail in subsequent sections of the chapter. The first is the role of prior knowledge in scientific thinking at all ages. The second is the importance of experience and instruction.

Scientific investigation, broadly defined, includes numerous procedural and conceptual activities, such as asking questions, hypothesizing, designing experiments, making predictions, using apparatus, observing, measuring, being concerned with accuracy, precision, and error, recording and interpreting data, consulting data records, evaluating evidence, verification, reacting to contradictions or anomalous data, presenting and assessing arguments, constructing explanations (to oneself and others), constructing various representations of the data (graphs, maps, three-dimensional models), coordinating theory and evidence, performing statistical calculations, making inferences, and formulating and revising theories or models (e.g., Carey et al., 1989; Chi et al., 1994; Chinn and Malhotra, 2001; Keys, 1994; McNay and Melville, 1993; Schauble et al., 1995; Slowiaczek et al., 1992; Zachos et al., 2000). As noted in Chapter 2, over the past 20 to 30 years, the image of “doing science” emerging from across multiple lines of research has shifted from depictions of lone scientists conducting experiments in isolated laboratories to the image of science as both an individual and a deeply social enterprise that involves problem solving and the building and testing of models and theories.

Across this same period, the psychological study of science has evolved from a focus on scientific reasoning as a highly developed form of logical thinking that cuts across scientific domains to the study of scientific thinking as the interplay of general reasoning strategies, knowledge of the natural phenomena being studied, and a sense of how scientific evidence and explanations are generated. Much early research on scientific thinking and inquiry tended to focus primarily either on conceptual development or on the development of reasoning strategies and processes, often using very simplified reasoning tasks. In contrast, many recent studies have attempted to describe a larger number of the complex processes that are deployed in the context of scientific inquiry and to describe their coordination. These studies often engage children in firsthand investigations in which they actively explore multivariable systems. In such tasks, participants initiate all phases of scientific discovery with varying amounts of guidance provided by the researcher. These studies have revealed that, in the context of inquiry, reasoning processes and conceptual knowledge are interdependent and in fact facilitate each other (Schauble, 1996; Lehrer et al., 2001).

It is important to note that, across the studies reviewed in this chapter, researchers have made different assumptions about what scientific reasoning entails and which aspects of scientific practice are most important to study. For example, some emphasize the design of well-controlled experiments, while others emphasize building and critiquing models of natural phenomena. In addition, some researchers study scientific reasoning in stripped down, laboratory-based tasks, while others examine how children approach complex inquiry tasks in the context of the classroom. As a result, the research base is difficult to integrate and does not offer a complete picture of students’ skills and knowledge related to generating and evaluating evidence and explanations. Nor does the underlying view of scientific practice guiding much of the research fully reflect the image of science and scientific understanding we developed in Chapter 2.


Generating Evidence

The evidence-gathering phase of inquiry includes designing the investigation as well as carrying out the steps required to collect the data. Generating evidence entails asking questions, deciding what to measure, developing measures, collecting data from the measures, structuring the data, systematically documenting outcomes of the investigations, interpreting and evaluating the data, and using the empirical results to develop and refine arguments, models, and theories.

Asking Questions and Formulating Hypotheses

Asking questions and formulating hypotheses is often seen as the first step in the scientific method; however, it can better be viewed as one of several phases in an iterative cycle of investigation. In an exploratory study, for example, work might start with structured observation of the natural world, which would lead to formulation of specific questions and hypotheses. Further data might then be collected, which lead to new questions, revised hypotheses, and yet another round of data collection. The phase of asking questions also includes formulating the goals of the activity and generating hypotheses and predictions (Kuhn, 2002).

Children differ from adults in their strategies for formulating hypotheses and in the appropriateness of the hypotheses they generate. Children often propose different hypotheses from adults (Klahr, 2000), and younger children (age 10) often conduct experiments without explicit hypotheses, unlike 12- to 14-year-olds (Penner and Klahr, 1996a). In self-directed experimental tasks, children tend to focus on plausible hypotheses and often get stuck focusing on a single hypothesis (e.g., Klahr, Fay, and Dunbar, 1993). Adults are more likely to consider multiple hypotheses (e.g., Dunbar and Klahr, 1989; Klahr, Fay, and Dunbar, 1993). For both children and adults, the ability to consider many alternative hypotheses is a factor contributing to success.

At all ages, prior knowledge of the domain under investigation plays an important role in the formulation of questions and hypotheses (Echevarria, 2003; Klahr, Fay, and Dunbar, 1993; Penner and Klahr, 1996b; Schauble, 1990, 1996; Zimmerman, Raghavan, and Sartoris, 2003). For example, both children and adults are more likely to focus initially on variables they believe to be causal (Kanari and Millar, 2004; Schauble, 1990, 1996). Hypotheses that predict expected results are proposed more frequently than hypotheses that predict unexpected results (Echevarria, 2003). The role of prior knowledge in hypothesis formulation is discussed in greater detail later in the chapter.

Designing Experiments

The design of experiments has received extensive attention in the research literature, with an emphasis on developmental changes in children’s ability to build experiments that allow them to identify causal variables. Experimentation can serve to generate observations in order to induce a hypothesis to account for the pattern of data produced (discovery context) or to test the tenability of an existing hypothesis under consideration (confirmation/verification context) (Klahr and Dunbar, 1988). At a minimum, one must recognize that the process of experimentation involves generating observations that will serve as evidence that will be related to hypotheses.

Ideally, experimentation should produce evidence or observations that are interpretable in order to make the process of evidence evaluation uncomplicated. One aspect of experimentation skill is to isolate variables in such a way as to rule out competing hypotheses. The control of variables is a basic strategy that allows valid inferences and narrows the number of possible experiments to consider (Klahr, 2000). Confounded experiments, those in which variables have not been isolated correctly, yield indeterminate evidence, thereby making valid inferences and subsequent knowledge gain difficult, if not impossible.
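The difference between a controlled and a confounded comparison can be sketched in a few lines. The example below treats each trial as a set of variable settings and checks whether two trials differ in exactly one variable. The variable names and values are hypothetical, chosen only for illustration, and both trials are assumed to record the same variables.

```python
def differing_variables(trial_a: dict, trial_b: dict) -> list:
    """Names of variables whose settings differ between two trials."""
    return [name for name in trial_a if trial_a[name] != trial_b[name]]


def is_controlled_comparison(trial_a: dict, trial_b: dict) -> bool:
    """True when exactly one variable differs, so a difference in outcome
    can be attributed to that variable."""
    return len(differing_variables(trial_a, trial_b)) == 1


# Hypothetical variable settings, used only for illustration.
baseline = {"shape": "square", "size": "small", "weight": "light"}
controlled = {"shape": "round", "size": "small", "weight": "light"}
confounded = {"shape": "round", "size": "large", "weight": "light"}

print(is_controlled_comparison(baseline, controlled))  # True
print(is_controlled_comparison(baseline, confounded))  # False
```

In the confounded pair, both shape and size change, so any difference in outcome cannot be assigned to either variable alone.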

Early approaches to examining experimentation skills involved minimizing the role of prior knowledge in order to focus on the strategies that participants used. That is, the goal was to examine the domain-general strategies that apply regardless of the content to which they are applied. For example, building on the research tradition of Piaget (e.g., Inhelder and Piaget, 1958), Siegler and Liebert (1975) examined the acquisition of experimental design skills by fifth and eighth graders. The problem involved determining how to make an electric train run. The train was connected to a set of four switches, and the children needed to determine the particular on/off configuration required. The train was in reality controlled by a secret switch, so that the discovery of the correct solution was postponed until all 16 combinations were generated. In this task, there was no principled reason why any one of the combinations would be more or less likely, and success was achieved by systematically testing all combinations of a set of four switches. Thus the task involved no domain-specific knowledge that would constrain the hypotheses about which configuration was most likely. A similarly knowledge-lean task was used by Kuhn and Phelps (1982), similar to a task originally used by Inhelder and Piaget (1958), involving identifying reaction properties of a set of colorless fluids. Success on the task was dependent on the ability to isolate and control variables in the set of all possible fluid combinations in order to determine which was causally related to the outcome. The study extended over several weeks with variations in the fluids used and the difficulty of the problem.
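The exhaustive search required in the four-switch train task can be illustrated by enumerating every on/off configuration and tracking which ones remain untested. This is only a sketch of the combinatorics, not a reconstruction of the study's materials, and the names used are hypothetical.

```python
from itertools import product

# Each of the four switches is either off or on: 2**4 = 16 configurations.
ALL_CONFIGS = list(product(("off", "on"), repeat=4))


def untested(tried):
    """Configurations not yet tried; repeated trials add no new information."""
    tried_set = set(tried)
    return [config for config in ALL_CONFIGS if config not in tried_set]


# A hypothetical record of trials that repeats one configuration.
tried_so_far = [
    ("off", "off", "off", "off"),
    ("on", "off", "off", "off"),
    ("on", "off", "off", "off"),  # duplicate of the previous trial
]
remaining = untested(tried_so_far)
print(len(ALL_CONFIGS), len(remaining))  # 16 14
```

Keeping a systematic record of this kind is what guarantees that all 16 combinations are eventually generated without duplicates.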

In both studies, the importance of practice and instructional support was apparent. Siegler and Liebert’s study included two experimental groups of children who received different kinds of instructional support. Both groups were taught about factors, levels, and tree diagrams. One group received additional, more elaborate support that included practice and help representing all possible solutions with a tree diagram. For fifth graders, the more elaborate instructional support improved their performance compared with a control group that did not receive any support. For eighth graders, both kinds of instructional support led to improved performance. In the Kuhn and Phelps task, some students improved over the course of the study, although an abrupt change from invalid to valid strategies was not common. Instead, the more typical pattern was one in which valid and invalid strategies coexisted both within and across sessions, with a pattern of gradual attainment of stable valid strategies by some students (the stabilization point varied but was typically around weeks 5-7).

Since this early work, researchers have tended to investigate children’s and adults’ performance on experimental design tasks that are more knowledge rich and less constrained. Results from these studies indicate that, in general, adults are more proficient than children at designing informative experiments. In a study comparing adults with third and sixth graders, adults were more likely to focus on experiments that would be informative (Klahr, Fay, and Dunbar, 1993). Similarly, Schauble (1996) found that during the initial 3 weeks of exploring a domain, children and adults considered about the same number of possible experiments. However, when they began experimentation in another domain in the second 3 weeks of the study, adults considered a greater range of possible experiments. Over the full 6 weeks, children and adults conducted approximately the same number of experiments. Thus, children were more likely to conduct unintended duplicate or triplicate experiments, making their experimentation efforts less informative relative to the adults, who were selecting a broader range of experiments. Similarly, children are more likely to devote multiple experimental trials to variables that are already well understood, whereas adults move on to exploring variables they do not understand as well (Klahr, Fay, and Dunbar, 1993; Schauble, 1996). Evidence also indicates, however, that dimensions of the task often have a greater influence on performance than age (Linn, 1978, 1980; Linn, Chen, and Thier, 1977; Linn and Levine, 1978).

With respect to attending to one feature at a time, children are less likely than adults to control one variable at a time. For example, Schauble (1996) found that across two task domains, children used controlled comparisons about a third of the time. In contrast, adults improved from 50 percent usage on the first task to 63 percent on the second task. Children usually begin by designing confounded experiments (often as a means to produce a desired outcome), but with repeated practice begin to use a strategy of changing one variable at a time (e.g., Kuhn, Schauble, and Garcia-Mila, 1992; Kuhn et al., 1995; Schauble, 1990).

Reminiscent of the results of the earlier study by Kuhn and Phelps, both children and adults display intraindividual variability in strategy usage. That is, multiple strategy usage is not unique to childhood or periods of developmental transition (Kuhn et al., 1995). A robust finding is the coexistence of valid and invalid strategies (e.g., Kuhn, Schauble, and Garcia-Mila, 1992; Garcia-Mila and Andersen, 2005; Gleason and Schauble, 2000; Schauble, 1990; Siegler and Crowley, 1991; Siegler and Shipley, 1995). That is, participants may progress to the use of a valid strategy, but then return to an inefficient or invalid strategy. Similar use of multiple strategies has been found in research on the development of other academic skills, such as mathematics (e.g., Bisanz and LeFevre, 1990; Siegler and Crowley, 1991), reading (e.g., Perfetti, 1992), and spelling (e.g., Varnhagen, 1995). With respect to experimentation strategies, an individual may begin with an invalid strategy, but once the usefulness of changing one variable at a time is discovered, it is not immediately used exclusively. The newly discovered, effective strategy is only slowly incorporated into an individual’s set of strategies.

An individual’s perception of the goals of an investigation also has an important effect on the hypotheses they generate and their approach to experimentation. Individuals tend to differ in whether they see the overarching goal of an inquiry task as seeking to identify which factors make a difference (scientific) or seeking to produce a desired effect (engineering). It is a question for further research whether these different approaches characterize an individual, or whether they are invoked by task demands or implicit assumptions.

In a direct exploration of the effect of adopting scientific versus engineering goals, Schauble, Klopfer, and Raghavan (1991) provided fifth and sixth graders with an “engineering context” and a “science context.” When the children were working as scientists, their goal was to determine which factors made a difference and which ones did not. When the children were working as engineers, their goal was optimization, that is, to produce a desired effect (i.e., the fastest boat in the canal task). When working in the science context, the children worked more systematically, by establishing the effect of each variable, alone and in combination. There was an effort to make inclusion inferences (i.e., an inference that a factor is causal) and exclusion inferences (i.e., an inference that a factor is not causal). In the engineering context, children selected highly contrastive combinations and focused on factors believed to be causal while overlooking factors believed or demonstrated to be noncausal. Typically, children took a “try-and-see” approach to experimentation while acting as engineers, but they took a theory-driven approach to experimentation when acting as scientists. Schauble et al. (1991) found that children who received the engineering instructions first, followed by the scientist instructions, made the greatest improvements. Similarly, Sneider et al. (1984) found that students’ ability to plan and critique experiments improved when they first engaged in an engineering task of designing rockets.

Another pair of contrasting approaches to scientific investigation is the theorist versus the experimentalist (Klahr and Dunbar, 1998; Schauble, 1990). Similar variation in strategies for problem solving has been observed for chess, puzzles, physics problems, science reasoning, and even elementary arithmetic (Chase and Simon, 1973; Klahr and Robinson, 1981; Klayman and Ha, 1989; Kuhn et al., 1995; Larkin et al., 1980; Lovett and Anderson, 1995, 1996; Simon, 1975; Siegler, 1987; Siegler and Jenkins, 1989). Individuals who take a theory-driven approach tend to generate hypotheses and then test the predictions of the hypotheses. Experimenters tend to make data-driven discoveries, by generating data and finding the hypothesis that best summarizes or explains that data. For example, Penner and Klahr (1996a) asked 10- to 14-year-olds to conduct experiments to determine how the shape, size, material, and weight of an object influence sinking times. Students’ approaches to the task could be classified as either “prediction oriented” (i.e., a theorist: “I believe that weight makes a difference”) or “hypothesis oriented” (i.e., an experimenter: “I wonder if …”). The 10-year-olds were more likely to take a prediction (or demonstration) approach, whereas the 14-year-olds were more likely to explicitly test a hypothesis about an attribute without a strong belief or need to demonstrate that belief. Although these patterns may characterize approaches to any given task, it has yet to be determined if such styles are idiosyncratic to the individual and likely to remain stable across varying tasks, or if different styles might emerge for the same person depending on task demands or the domain under investigation.

Observing and Recording

Record keeping is an important component of scientific investigation in general, and of self-directed experimental tasks especially, because access to and consulting of cumulative records are often important in interpreting evidence. Early studies of experimentation demonstrated that children are often not aware of their own memory limitations, and this plays a role in whether they document their work during an investigation (e.g., Siegler and Liebert, 1975). Recent studies corroborate the importance of an awareness of one’s own memory limitations while engaged in scientific inquiry tasks, regardless of age. Spontaneous note-taking or other documentation of experimental designs and results may be a factor contributing to the observed developmental differences in performance on both experimental design tasks and in evaluation of evidence. Carey et al. (1989) reported that, prior to instruction, seventh graders did not spontaneously keep records when trying to determine and keep track of which substance was responsible for producing a bubbling reaction in a mixture of yeast, flour, sugar, salt, and warm water. Nevertheless, even though preschoolers are likely to produce inadequate and uninformative notations, they can distinguish between the two when asked to choose between them (Triona and Klahr, in press). Dunbar and Klahr (1988) also noted that children (grades 3-6) were unlikely to check if a current hypothesis was or was not consistent with previous experimental results. In a study by Trafton and Trickett (2001), undergraduates solving scientific reasoning problems in a computer environment were more likely to achieve correct performance when using the notebook function (78 percent) than were nonusers (49 percent), showing that this issue is not unique to childhood.

In a study of fourth graders’ and adults’ spontaneous use of notebooks during a 10-week investigation of multivariable systems, all but one of the adults took notes, whereas only half of the children took notes. Moreover, despite variability in the amount of notebook usage in both groups, on average adults made three times more notebook entries than children did. Adults’ note-taking remained stable across the 10 weeks, but children’s frequency of use decreased over time, dropping to about half of their initial usage. Children rarely reviewed their notes, which typically consisted of conclusions, but not the variables used or the outcomes of the experimental tests (i.e., the evidence for the conclusion was not recorded) (Garcia-Mila and Andersen, 2005).

Children may differentially record the results of experiments, depending on familiarity or strength of prior theories. For example, 10- to 14-year-olds recorded more data points when experimenting with factors affecting force produced by the weight and surface area of boxes than when they were experimenting with pendulums (Kanari and Millar, 2004). Overall, it is a fairly robust finding that children are less likely than adults to record experimental designs and outcomes or to review what notes they do keep, despite task demands that clearly necessitate a reliance on external memory aids.

Given the increasing attention to the importance of metacognition for proficient performance on such tasks (e.g., Kuhn and Pearsall, 1998, 2000), it is important to determine at what point children and early adolescents recognize their own memory limitations as they navigate through a complex task. Some studies show that children’s understanding of how their own memories work continues to develop across the elementary and middle school grades (Siegler and Alibali, 2005). The implication is that there is no particular age or grade level when memory and limited understanding of one’s own memory are no longer a consideration. As such, knowledge of how one’s own memory works may represent an important moderating variable in understanding the development of scientific reasoning (Kuhn, 2001). For example, if a student is aware that it will be difficult for her to remember the results of multiple trials, she may be more likely to carefully record each outcome. However, it may also be the case that children, like adult scientists, need to be inducted into the practice of record keeping and the use of records. They are likely to need support to understand the important role of records in generating scientific evidence and supporting scientific arguments.

Evaluating Evidence

The important role of evidence evaluation in the process of scientific activity has long been recognized. Kuhn (1989), for example, has argued that the defining feature of scientific thinking is the set of skills involved in differentiating and coordinating theory and evidence. Various strands of research provide insight on how children learn to engage in this phase of scientific inquiry. There is an extensive literature on the evaluation of evidence, beginning with early research on identifying patterns of covariation and cause that used highly structured experimental tasks. More recently researchers have studied how children evaluate evidence in the context of self-directed experimental tasks. In real-world contexts (in contrast to highly controlled laboratory tasks) the process of evidence evaluation is very messy and requires an understanding of error and variation. As was the case for hypothesis generation and the design of experiments, the role of prior knowledge and beliefs has emerged as an important influence on how individuals evaluate evidence.

Covariation Evidence

A number of early studies on the development of evidence evaluation skills used knowledge-lean tasks that asked participants to evaluate existing data. These data were typically in the form of covariation evidence, that is, the frequency with which two events do or do not occur together. Evaluation of covariation evidence is potentially important in regard to scientific thinking because covariation is one potential cue that two events are causally related. Deanna Kuhn and her colleagues carried out pioneering work on children’s and adults’ evaluation of covariation evidence, with a focus on how participants coordinate their prior beliefs about the phenomenon with the data presented to them (see Box 5-1).
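One way to make "the frequency with which two events do or do not occur together" concrete is a 2 x 2 table of counts summarized by a contingency statistic. The sketch below uses the difference between the probability of the effect when the candidate cause is present and when it is absent, a common summary in contingency research. It is offered as an illustration under assumptions: the function name and all counts are hypothetical, and nothing here is claimed to be the measure used in the studies described in this section.

```python
def delta_p(both, cause_only, effect_only, neither):
    """P(effect | cause present) minus P(effect | cause absent) from 2 x 2 counts.

    both        -- cases with the candidate cause and the effect
    cause_only  -- cases with the candidate cause but no effect
    effect_only -- cases with the effect but no candidate cause
    neither     -- cases with neither
    """
    with_cause = both + cause_only
    without_cause = effect_only + neither
    if with_cause == 0 or without_cause == 0:
        raise ValueError("need cases both with and without the candidate cause")
    return both / with_cause - effect_only / without_cause


# Hypothetical counts, e.g. a type of food (candidate cause) and catching a cold (effect).
perfect_covariation = delta_p(both=8, cause_only=0, effect_only=0, neither=8)
no_covariation = delta_p(both=4, cause_only=4, effect_only=4, neither=4)
partial_covariation = delta_p(both=6, cause_only=2, effect_only=2, neither=6)
```

A value near 1 corresponds to the two events reliably occurring together, and a value near 0 to no covariation. As the surrounding text stresses, such a statistic is only one cue to causation and says nothing about mechanism.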

Results across a series of studies revealed continuous improvement of the skills involved in differentiating and coordinating theory and evidence, as well as bracketing prior belief while evaluating evidence, from middle childhood (grades 3 and 6) to adolescence (grade 9) to adulthood (Kuhn, Amsel, and O’Loughlin, 1988). These skills, however, did not appear to develop to an optimal level even among adults. Even adults had a tendency to meld theory and evidence into a single mental representation of “the way things are.”

Participants had a variety of strategies for keeping theory and evidence in alignment with one another when they were in fact discrepant. One tendency was to ignore, distort, or selectively attend to evidence that was inconsistent with a favored theory. For example, the protocol from one ninth grader demonstrated that upon repeated instances of covariation between type of breakfast roll and catching colds, he would not acknowledge this relationship: “They just taste different … the breakfast roll to me don’t cause so much colds because they have pretty much the same thing inside” (Kuhn, Amsel, and O’Loughlin, 1988, p. 73).

Another tendency was to adjust a theory to fit the evidence, a process that was most often outside an individual’s conscious awareness and control. For example, when asked to recall their original beliefs, participants would often report a theory consistent with the evidence that was presented, and not the theory as originally stated. Take the case of one ninth grader who did not believe that type of condiment (mustard versus ketchup) was causally related to catching colds. With each presentation of an instance of covariation evidence, he acknowledged the evidence and elaborated a theory based on the amount of ingredients or vitamins and the temperature of the food the condiment was served with to make sense of the data (Kuhn, Amsel, and O’Loughlin, 1988, p. 83). Kuhn argued that this tendency suggests that the student’s theory does not exist as an object of cognition. That is, a theory and the evidence for that theory are undifferentiated: they do not exist as separate cognitive entities. If they do not exist as separate entities, it is not possible to flexibly and consciously reflect on the relation of one to the other.

A number of researchers have criticized Kuhn’s findings on both methodological and theoretical grounds. Sodian, Zaitchik, and Carey (1991), for example, questioned the finding that third and sixth grade children cannot distinguish between their beliefs and the evidence, pointing to the complexity of the tasks Kuhn used as problematic. They chose to employ simpler tasks that involved story problems about phenomena for which children did not hold strong beliefs. Children’s performance on these tasks demonstrated that even first and second graders could differentiate a hypothesis from the evidence. Likewise, Ruffman et al. (1993) used a simplified task and showed that 6-year-olds were able to form a causal hypothesis based on a pattern of covariation evidence. A study of children and adults (Amsel and Brock, 1996) indicated an important role of prior beliefs, especially for children. When presented with evidence that disconfirmed prior beliefs, children from both grade levels tended to make causal judgments consistent with their prior beliefs. When confronted with confirming evidence, however, both groups of children and adults made similar judgments. Looking across these studies provides insight into the conditions under which children are more or less proficient at coordinating theory and evidence. In some situations, children are better at distinguishing prior beliefs from evidence than the results of Kuhn et al. suggest.

Koslowski (1996) criticized Kuhn et al.’s work on more theoretical grounds. She argued that reliance on knowledge-lean tasks in which participants are asked to suppress their prior knowledge may lead to an incomplete or distorted picture of the reasoning abilities of children and adults. Instead, Koslowski suggested that using prior knowledge when gathering and evaluating evidence is a valid strategy. She developed a series of experiments to support her thesis and to explore the ways in which prior knowledge might play a role in evaluating evidence. The results of these investigations are described in detail in the later section of this chapter on the role of prior knowledge.

Evidence in the Context of Investigations

Researchers have also looked at reasoning about cause in the context of full investigations of causal systems. Two main types of multivariable systems are used in these studies. In the first type of system, participants are involved in a hands-on manipulation of a physical system, such as a ramp (e.g., Chen and Klahr, 1999; Masnick and Klahr, 2003) or a canal (e.g., Gleason and Schauble, 2000; Kuhn, Schauble, and Garcia-Mila, 1992). The second type of system is a computer simulation, such as the Daytona microworld in which participants discover the factors affecting the speed of race cars (Schauble, 1990). A variety of virtual environments have been created in domains such as electric circuits (Schauble et al., 1992), genetics (Echevarria, 2003), earthquake risk, and flooding risk (e.g., Keselman, 2003).

The inferences that are made based on self-generated experimental evidence are typically classified as either causal (or inclusion), noncausal (or exclusion), indeterminate, or false inclusion. All inference types can be further classified as valid or invalid. Invalid inclusion, by definition, is of particular interest because in self-directed experimental contexts, both children and adults often infer based on prior beliefs that a variable is causal, when in reality it is not.
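The inference types can be illustrated with a deliberately simplified sketch. It assumes a noise-free system with no interactions among variables, so that a single controlled comparison is enough to support an inclusion or exclusion inference; real investigations typically require a pattern across several trials. The function names, variables, and outcome values are hypothetical and chosen only for illustration.

```python
def supported_inference(trial_a, trial_b, outcome_a, outcome_b, variable):
    """Return the inference about `variable` that a pair of trials supports.

    Simplifying assumption: outcomes are noise-free and variables do not interact.
    """
    changed = [name for name in trial_a if trial_a[name] != trial_b[name]]
    if changed != [variable]:
        # Either the variable was not manipulated or something else changed too.
        return "indeterminate"
    return "inclusion" if outcome_a != outcome_b else "exclusion"


def is_valid(claimed, supported):
    """A claimed inference is valid only if the evidence supports that same inference."""
    return claimed == supported


# Hypothetical race-car trials with an assumed speed outcome for each.
car_a = {"engine": "large", "tail_fin": "present"}   # speed 10
car_b = {"engine": "small", "tail_fin": "present"}   # speed 7
car_c = {"engine": "large", "tail_fin": "absent"}    # speed 10

engine_inference = supported_inference(car_a, car_b, 10, 7, "engine")
fin_inference = supported_inference(car_a, car_c, 10, 10, "tail_fin")
confounded_inference = supported_inference(car_b, car_c, 7, 10, "engine")

# Claiming that the engine is causal on the basis of the confounded pair
# is an invalid inclusion in the sense described above.
invalid_inclusion = not is_valid("inclusion", confounded_inference)
```

Under these assumptions, a belief-driven claim that a variable "matters" after a confounded comparison is classified as invalid even when the belief happens to be correct, because the evidence itself is indeterminate.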

Children tend to focus on making causal inferences during their initial explorations of a causal system. In a study in which children worked to discover the causal structure of a computerized microworld, fifth and sixth graders began by producing confounded experiments and relied on prior knowledge or expectations (Schauble, 1990). As a result, in their early explorations of the causal system, they were more likely to make incorrect causal inferences. In a direct comparison of adults and children (Schauble, 1996), adults also focused on making causal inferences, but they made more valid inferences because their experimentation was more often done using a control-of-variables strategy. Overall, children’s inferences were valid 44 percent of the time, compared with 72 percent for adults. The fifth and sixth graders improved over the course of six sessions, starting at 25 percent but improving to almost 60 percent valid inferences (Schauble, 1996). Adults were more likely than children to make inferences about which variables were noncausal or inferences of indeterminacy (80 and 30 percent, respectively) (Schauble, 1996).

Children’s difficulty with inferences of noncausality also emerged in a study of 10- to 14-year-olds who explored factors influencing the swing of a pendulum or the force needed to pull a box along a level surface (Kanari and Millar, 2004). Only half of the students were able to draw correct conclusions about factors that did not covary with outcome. Students were likely to either selectively record data, selectively attend to data, distort or reinterpret the data, or state that noncovariation experimental trials were “inconclusive.” Such tendencies are reminiscent of other findings that some individuals selectively attend to or distort data in order to preserve a prior theory or belief (Kuhn, Amsel, and O’Loughlin, 1988; Zimmerman, Raghavan, and Sartoris, 2003).

Some researchers suggest children’s difficulty with noncausal or indeterminate inferences may be due both to experience and to the inherent complexity of the problem. In terms of experience, in the science classroom it is typical to focus on variables that “make a difference,” and therefore students struggle when testing variables that do not covary with the outcome (e.g., the weight of a pendulum does not affect the time of swing or the vertical height of a weight does not affect balance) (Kanari and Millar, 2004). Also, valid exclusion and indeterminacy inferences may be conceptually more complex, because they require one to consider a pattern of evidence produced from several experimental trials (Kuhn et al., 1995; Schauble, 1996). Looking across several trials may require one to review cumulative records of previous outcomes. As has been suggested previously, children do not often have the memory skills to either record information, record sufficient information, or consult such information when it has been recorded.

The importance of experience is highlighted by the results of studies conducted over several weeks with fifth and sixth graders. After several weeks with a task, children started making more exclusion inferences (that factors are not causal) and indeterminacy inferences (that one cannot make a conclusive judgment about a confounded comparison) and did not focus solely on causal inferences (e.g., Keselman, 2003; Schauble, 1996). They also began to distinguish between an informative and an uninformative experiment by attending to or controlling other factors, which led to an improved ability to make valid inferences. Through repeated exposure, invalid inferences, such as invalid inclusions, dropped in frequency. The tendency to begin to make inferences of indeterminacy suggests that students developed more awareness of the adequacy or inadequacy of their experimentation strategies for generating sufficient and interpretable evidence.

Children and adults also differ in generating sufficient evidence to support inferences. In contexts in which it is possible, children often terminate their search early, believing that they have determined a solution to the problem (e.g., Dunbar and Klahr, 1989). In studies over several weeks in which children must continue their investigation (e.g., Schauble et al., 1991), this is less likely because of the task requirements. Children are also more likely to refer to the most recently generated evidence. They may jump to a conclusion after a single experiment, whereas adults typically need to see the results of several experiments (e.g., Gleason and Schauble, 2000).

As was found with experimentation, children and adults display intraindividual variability in strategy usage with respect to inference types. Likewise, the existence of multiple inference strategies is not unique to childhood (Kuhn et al., 1995). In general, early in an investigation, individuals focus primarily on identifying factors that are causal and are less likely to consider definitely ruling out factors that are not causal. However, a mix of valid and invalid inference strategies co-occur during the course of exploring a causal system. As with experimentation, the addition of a valid inference strategy to an individual’s repertoire does not mean that they immediately give up the others. Early in investigations, there is a focus on causal hypotheses and inferences, whether they are warranted or not. Only with additional exposure do children start to make inferences of noncausality and indeterminacy. Knowledge change and experience—gaining a better understanding of the causal system via experimentation—was associated with the use of valid experimentation and inference strategies.


In the previous section we reviewed evidence on developmental differences in using scientific strategies. Across multiple studies, prior knowledge emerged as an important influence on several parts of the process of generating and evaluating evidence. In this section we look more closely at the specific ways that prior knowledge may shape part of the process. Prior knowledge includes conceptual knowledge, that is, knowledge of the natural world and specifically of the domain under investigation, as well as prior knowledge and beliefs about the purpose of an investigation and the goals of science more generally. This latter kind of prior knowledge is touched on here and discussed in greater detail in the next chapter.

Beliefs About Causal Mechanism and Plausibility

In response to research on evaluation of covariation evidence that used knowledge-lean tasks or even required participants to suppress prior knowledge, Koslowski (1996) argued that it is legitimate and even helpful to consider prior knowledge when gathering and evaluating evidence. The world is full of correlations, and consideration of plausibility, causal mechanism, and alternative causes can help to determine which correlations between events should be taken seriously and which should be viewed as spurious. For example, the identification of the E. coli bacterium supplies a mechanism that makes a causal relationship between hamburger consumption and certain types of illness or mortality plausible. Because of the absence of a causal mechanism, one does not consider seriously the correlation between ice cream consumption and violent crime rate as causal, but one looks for other covarying quantities (such as high temperatures) that may be causal for both behaviors and thus explain the correlation.
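The ice cream example can be sketched as a small simulation in which a third quantity drives two others that have no direct link. Everything in the sketch is an assumption made for illustration: the data are synthetic, the coefficients and noise levels are arbitrary, and the helper names are hypothetical. It shows only the general pattern, namely that a strong raw correlation can largely disappear once the common cause is taken into account.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def residuals(ys, xs):
    """What is left of ys after subtracting a least-squares line fitted on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return [y - (mean_y + slope * (x - mean_x)) for x, y in zip(xs, ys)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Synthetic data: temperature influences both quantities; they do not influence each other.
temperature = [rng.uniform(0, 35) for _ in range(2000)]
ice_cream = [2.0 * t + rng.gauss(0, 5) for t in temperature]
crime = [0.5 * t + rng.gauss(0, 3) for t in temperature]

# Raw correlation between the two outcomes.
raw_correlation = pearson(ice_cream, crime)

# Correlation after removing the part of each outcome explained by temperature.
adjusted_correlation = pearson(residuals(ice_cream, temperature),
                               residuals(crime, temperature))
```

In this synthetic setup the raw correlation is strong while the adjusted correlation is close to zero, which mirrors the reasoning in the text: the covariation is real, but a plausible common cause accounts for it.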

Koslowski (1996) presented a series of experiments that demonstrate the interdependence of theory and evidence in legitimate scientific reasoning (see Box 5-2 for an example). In most of these studies, all participants (sixth graders, ninth graders, and adults) did take mechanism into consideration when evaluating evidence in relation to a hypothesis about a causal relationship. Even sixth graders considered more than patterns of covariation when making causal judgments (Koslowski and Okagaki, 1986; Koslowski et al., 1989). In fact, as discussed in the previous chapter, results of studies by Koslowski (1996) and others (Ahn et al., 1995) indicate that children and adults have naïve theories about the world that incorporate information about both covariation and causal mechanism.

The plausibility of a mechanism also plays a role in reasoning about cause. In some situations, scientific progress occurs by taking seemingly implausible correlations seriously (Wolpert, 1993). Similarly, Koslowski argued that if people rely on covariation and mechanism information in an interdependent and judicious manner, then they should pay attention to implausible correlations (i.e., those with no apparent mechanism) when the implausible correlation occurs repeatedly. For example, discovering the cause of Kawasaki’s syndrome depended on taking seriously the implausible correlation between the illness and having recently cleaned carpets. Similarly, Thagard (1998a, 1998b) describes the case of researchers Warren and Marshall, who proposed that peptic ulcers could be caused by a bacterium, and their efforts to have their theory accepted by the medical community. The bacterial theory of ulcers was initially rejected as implausible, given the assumption that the stomach is too acidic to allow bacteria to survive.

Studies with both children and adults reveal links between reasoning about mechanism and the plausibility of that mechanism (Koslowski, 1996). When presented with an implausible covariation (e.g., improved gas mileage and color of car), participants rated the causal status of the implausible cause (color) before and after learning about a possible way that the cause could bring about the effect (improved gas mileage). In this example, participants learned that the color of the car affects the driver’s alertness (which affects driving quality, which in turn affects gas mileage). At all ages, participants increased their causal ratings after learning about a possible mediating mechanism. The presence of a possible mechanism in addition to a large number of covariations (four or more) was taken to indicate the possibility of a causal relationship for both plausible and implausible covariations. When either generating or assessing mechanisms for plausible covariations, all age groups (sixth and ninth graders and adults) were comparable. When the covariation was implausible, sixth graders were more likely to generate dubious mechanisms to account for the correlation.

The role of prior knowledge, especially beliefs about causal mechanism and plausibility, is also evident in hypothesis formation and the design of investigations. Individuals’ prior beliefs influence the choice of which hypotheses to test, including which hypotheses are tested first, repeatedly, or receive the most time and attention (e.g., Echevarria, 2003; Klahr, Fay, and Dunbar, 1993; Penner and Klahr, 1996b; Schauble, 1990, 1996; Zimmerman, Raghavan, and Sartoris, 2003). For example, children’s favored theories sometimes result in the selection of invalid experimentation and evidence evaluation heuristics (e.g., Dunbar and Klahr, 1989; Schauble, 1990). Plausibility of a hypothesis may serve as a guide for which experiments to pursue. Klahr, Fay, and Dunbar (1993) provided third and sixth grade children and adults with hypotheses to test that were incorrect but either plausible or implausible. For plausible hypotheses, children and adults tended to go about demonstrating the correctness of the hypothesis rather than setting up experiments to decide between rival hypotheses. For implausible hypotheses, adults and some sixth graders proposed a plausible rival hypothesis and set up an experiment that would discriminate between the two. Third graders tended to propose a plausible hypothesis but then ignore or forget the initial implausible hypothesis, getting sidetracked in an attempt to demonstrate that the plausible hypothesis was correct.

Recognizing the interdependence of theory and data in the evaluation of evidence and explanations, Chinn and Brewer (2001) proposed that people evaluate evidence by building a mental model of the interrelationships between theories and data. These models integrate patterns of data, procedural details, and the theoretical explanation of the observed findings (which may include unobservable mechanisms, such as molecules, electrons, enzymes, or intentions and desires). The information and events can be linked by different kinds of connections, including causal, contrastive, analogical, and inductive links. The mental model may then be evaluated by considering the plausibility of these links. In addition to considering the links between, for example, data and theory, the model might also be evaluated by appealing to alternate causal mechanisms or alternate explanations. Essentially, an individual seeks to “undermine one or more of the links in the model” (p. 337). If no reasons to be critical can be identified, the individual may accept the new evidence or theoretical interpretation.

Some studies suggest that the strength of prior beliefs, as well as the personal relevance of those beliefs, may influence the evaluation of the mental model (Chinn and Malhotra, 2002; Klaczynski, 2000; Klaczynski and Narasimham, 1998). For example, when individuals have reason to disbelieve evidence (e.g., because it is inconsistent with prior belief), they will search harder for flaws in the data (Kunda, 1990). As a result, individuals may not find the evidence compelling enough to reassess their cognitive model. In contrast, beliefs about simple empirical regularities may not be held with such conviction (e.g., the falling speed of heavy versus light objects), making it easier to change a belief in response to evidence.

Evaluating Evidence That Contradicts Prior Beliefs

Anomalous data or evidence refers to results that do not fit with one’s current beliefs. Anomalous data are considered very important by scientists because of their role in theory change, and they have been used by science educators to promote conceptual change. The idea that anomalous evidence promotes conceptual change (in the scientist or the student) rests on a number of assumptions, including that individuals have beliefs or theories about natural or social phenomena, that they are capable of noticing that some evidence is inconsistent with those theories, that such evidence calls into question those theories, and, in some cases, that a belief or theory will be altered or changed in response to the new (anomalous) evidence (Chinn and Brewer, 1998). Chinn and Brewer propose that there are eight possible responses to anomalous data. Individuals can (1) ignore the data; (2) reject the data (e.g., because of methodological error, measurement error, bias); (3) acknowledge uncertainty about the validity of the data; (4) exclude the data as being irrelevant to the current theory; (5) hold the data in abeyance (i.e., withhold a judgment about the relation of the data to the initial theory); (6) reinterpret the data as consistent with the initial theory; (7) accept the data and make peripheral change or minor modification to the theory; or (8) accept the data and change the theory. Examples of all of these responses were found in undergraduates’ responses to data that contradicted theories to explain the mass extinction of dinosaurs and theories about whether dinosaurs were warm-blooded or cold-blooded.

In a series of studies, Chinn and Malhotra (2002) examined how fourth, fifth, and sixth graders responded to experimental data that were inconsistent with their existing beliefs. Experiments from physical science domains were selected in which the outcomes produced either ambiguous or unambiguous data, and for which the findings were counterintuitive for most children. For example, most children assume that a heavy object falls faster than a light object. When the two objects are dropped simultaneously, there is some ambiguity because it is difficult to observe both objects. An example of a topic that is counterintuitive but results in unambiguous evidence is the reaction temperature of baking soda added to vinegar. Children believe that either no change in temperature will occur, or that the fizzing causes an increase in temperature. Thermometers unambiguously show a temperature drop of about 4 degrees centigrade.

When examining the anomalous evidence produced by these experiments, children’s difficulties seemed to occur in one of four cognitive processes: observation, interpretation, generalization, or retention (Chinn and Malhotra, 2002). For example, prior belief may influence what is “observed,” especially in the case of data that are ambiguous, and children may not perceive the two objects as landing simultaneously. Inferences based on this faulty observation will then be incorrect. At the level of interpretation, even if individuals accurately observed the outcome, they might not shift their theory to align with the evidence. They can fail to do so in many ways, such as ignoring or distorting the data or discounting the data because they are considered flawed. At the level of generalization, an individual may accept, for example, that these particular heavy and light objects fell at the same rate but insist that the same rule may not hold for other situations or objects. Finally, even when children appeared to change their beliefs about an observed phenomenon in the immediate context of the experiment, their prior beliefs reemerged later, indicating a lack of long-term retention of the change.

Penner and Klahr (1996a) investigated the extent to which children’s prior beliefs affect their ability to design and interpret experiments. They used a domain in which most children hold a strong belief that heavier objects sink in fluid faster than light objects, and they examined children’s ability to design unconfounded experiments to test that belief. In this study, for objects of a given composition and shape, sink times for heavy and light objects are nearly indistinguishable to an observer. For example, the sink times for the stainless steel spheres weighing 65 gm and 19 gm were .58 sec and .62 sec, respectively. Only one of the eight children (out of 30) who chose to directly contrast these two objects continued to explore the reason for the unexpected finding that the large and small spheres had equivalent sink times. The process of knowledge change was not straightforward. For example, some children suggested that the size of the smaller steel ball offset the fact that it weighed less because it was able to move through the water as fast as the larger, heavier steel ball. Others concluded that both weight and shape make a difference. That is, there was an attempt to reconcile the evidence with prior knowledge and expectations by appealing to causal mechanisms, alternate causes, or enabling conditions.

What is also important to note about the children in the Penner and Klahr study is that they did in fact notice the surprising finding, rather than ignore or misrepresent the data. They tried to make sense of the outcome by acting as a theorist who conjectures about the causal mechanisms, boundary conditions, or other ad hoc explanations (e.g., shape) to account for the results of an experiment. In Chinn and Malhotra’s (2002) study of students’ evaluation of observed evidence (e.g., watching two objects fall simultaneously), the process of noticing was found to be an important mediator of conceptual change.

Echevarria (2003) examined seventh graders’ reactions to anomalous data in the domain of genetics and whether they served as a catalyst for knowledge construction during the course of self-directed experimentation. Students in the study completed a 3-week unit on genetics that involved genetics simulation software and observing plant growth. In both the software and the plants, students investigated or observed the transmission of one trait. Anomalies in the data were defined as outcomes that were not readily explainable on the basis of the appearance of the parents.

In general, the number of hypotheses generated, the number of tests conducted, and the number of explanations generated were a function of students’ ability to encounter, notice, and take seriously an anomalous finding. The majority of students (80 percent) developed some explanation for the pattern of anomalous data. For those who were unable to generate an explanation, it was suggested that the initial knowledge was insufficient and therefore could not undergo change as a result of the encounter with “anomalous” evidence. Analogous to case studies in the history of science (e.g., Simon, 2001), these students’ ability to notice and explore anomalies was related to their level of domain-specific knowledge (as suggested by Pasteur’s oft quoted maxim “serendipity favors the prepared mind”). Surprising findings were associated with an increase in hypotheses and experiments to test these potential explanations, but without the domain knowledge to “notice,” anomalies could not be exploited.

There is some evidence that, with instruction, students’ ability to evaluate anomalous data improves (Chinn and Malhotra, 2002). In a study of fourth, fifth, and sixth graders, one group of students was instructed to predict the outcomes of three experiments that produce counterintuitive but unambiguous data (e.g., reaction temperature). A second group answered questions that were designed to promote unbiased observations and interpretations by reflecting on the data. A third group was provided with an explanation of what scientists expected to find and why. All students reported their prediction of the outcome, what they observed, and their interpretation of the experiment. They were then tested for generalizations, and a retention test followed 9-10 days later. Fifth and sixth graders performed better than did fourth graders. Students who heard an explanation of what scientists expected to find and why did best. Further analyses suggest that the explanation-based intervention worked by influencing students’ initial predictions. This correct prediction then influenced what was observed. A correct observation then led to correct interpretations and generalizations, which resulted in conceptual change that was retained. A similar pattern of results was found using interventions employing either full or reduced explanations prior to the evaluation of evidence.

Thus, it appears that children were able to change their beliefs on the basis of anomalous or unexpected evidence, but only when they were capable of making the correct observations. Difficulty in making observations was found to be the main cognitive process responsible for impeding conceptual change (i.e., rather than interpretation, generalization, or retention). Certain interventions, in particular those involving an explanation of what scientists expected to happen and why, were very effective in mediating conceptual change when encountering counterintuitive evidence. With particular scaffolds, children made observations independent of theory, and they changed their beliefs based on observed evidence.


There is increasing evidence that, as in the case of intellectual skills in general, the development of the component skills of scientific reasoning “cannot be counted on to routinely develop” (Kuhn and Franklin, 2006, p. 47). That is, young children have many requisite skills needed to engage in scientific thinking, but there are also ways in which even adults do not show full proficiency in investigative and inference tasks. Recent research efforts have therefore been focused on how such skills can be promoted by determining which types of educational interventions (e.g., amount of structure, amount of support, emphasis on strategic or metastrategic skills) will contribute most to learning, retention, and transfer, and which types of interventions are best suited to different students. There is a developing picture of what children are capable of with minimal support, and research is moving in the direction of ascertaining what children are capable of, and when, under conditions of practice, instruction, and scaffolding. It may one day be possible to tailor educational opportunities that neither under- nor overestimate children’s ability to extract meaningful experiences from inquiry-based science classes.

Very few of the early studies focusing on the development of experimentation and evidence evaluation skills explicitly addressed issues of instruction and experience. Those that did, however, indicated an important role of experience and instruction in supporting scientific thinking. For example, Siegler and Liebert (1975) incorporated instructional manipulations aimed at teaching children about variables and variable levels with or without practice on analogous tasks. In the absence of both instruction and extended practice, no fifth graders and a small minority of eighth graders were successful. Kuhn and Phelps (1982) reported that, in the absence of explicit instruction, extended practice over several weeks was sufficient for the development and modification of experimentation and inference strategies. Later studies of self-directed experimentation also indicate that frequent engagement with the inquiry environment alone can lead to the development and modification of cognitive strategies (e.g., Kuhn, Schauble, and Garcia-Mila, 1992; Schauble et al., 1991).

Some researchers have suggested that even simple prompts, which are often used in studies of students’ investigation skills, may provide a subtle form of instructional intervention (Klahr and Carver, 1995). Such prompts may cue the strategic requirements of the task, or they may promote explanation or the type of reflection that could induce a metacognitive or metastrategic awareness of task demands. Because such prompts are used in many studies to reveal students’ thinking, it may be very difficult to tease apart the relative contributions of practice from the scaffolding provided by researcher prompts.

In the absence of instruction or prompts, students may not routinely ask questions of themselves, such as “What are you going to do next?” “What outcome do you predict?” “What did you learn?” and “How do you know?” Questions such as these may promote self-explanation, which has been shown to enhance understanding in part because it facilitates the integration of newly learned material with existing knowledge (Chi et al., 1994). Questions such as the prompts used by researchers may serve to promote such integration. Chinn and Malhotra (2002) incorporated different kinds of interventions, aimed at promoting conceptual change in response to anomalous experimental evidence. Interventions included practice at making predictions, reflecting on data, and explanation. The explanation-based interventions were most successful at promoting conceptual change, retention, and generalization. The prompts used in some studies of self-directed experimentation are very likely to serve the same function as the prompts used by Chi et al. (1994). Incorporating such prompts in classroom-based inquiry activities could serve as a powerful teaching tool, given that the use of self-explanation in tutoring systems (human and computer interface) has been shown to be quite effective (e.g., Chi, 1996; Hausmann and Chi, 2002).

Studies that compare the effects of different kinds of instruction and practice opportunities have been conducted in the laboratory, with some translation to the classroom. For example, Chen and Klahr (1999) examined the effects of direct and indirect instruction of the control of variables strategy on students’ (grades 2-4) experimentation and knowledge acquisition. The instructional intervention involved didactic teaching of the control-of-variables strategy, along with examples and probes. Indirect (or implicit) training involved the use of systematic probes during the course of children’s experimentation. A control group did not receive instruction or probes. No group received instruction on domain knowledge for any task used (springs, ramps, sinking objects). For the students who received instruction, use of the control-of-variables strategy increased from 34 percent prior to instruction to 65 percent after, with 61-64 percent usage maintained on transfer tasks that followed after 1 day and again after 7 months, respectively. No such gains were evident for the implicit training or control groups.

Instruction about control of variables improved children’s ability to design informative experiments, which in turn facilitated conceptual change in a number of domains. They were able to design unconfounded experiments, which facilitated valid causal and noncausal inferences, resulting in a change in knowledge about how various multivariable causal systems worked. Significant gains in domain knowledge were evident only for the instruction group. Fourth graders showed better skill retention at long-term assessment than second or third graders.
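The logic of an unconfounded comparison can be made concrete with a small sketch. The code below is only an illustration, not the procedure used in the studies cited above: the function names and the ramp variables and levels are hypothetical, and both setups are assumed to list the same variables. A comparison counts as unconfounded here when the two setups differ on the target variable and on nothing else.

```python
from typing import Dict, List

# A setup maps each variable name to the level chosen for it.
Setup = Dict[str, str]


def differing_variables(a: Setup, b: Setup) -> List[str]:
    """Return the names of variables whose levels differ between two setups.

    Assumes both setups describe the same set of variables.
    """
    return sorted(name for name in a if a[name] != b[name])


def is_unconfounded(a: Setup, b: Setup, target: str) -> bool:
    """A comparison is unconfounded when only the target variable differs."""
    return differing_variables(a, b) == [target]


# Hypothetical ramp setups; variable names and levels are illustrative only.
ramp_1 = {"steepness": "high", "surface": "smooth", "ball": "rubber"}
ramp_2 = {"steepness": "low", "surface": "smooth", "ball": "rubber"}
ramp_3 = {"steepness": "low", "surface": "rough", "ball": "rubber"}

print(is_unconfounded(ramp_1, ramp_2, "steepness"))  # True
print(is_unconfounded(ramp_1, ramp_3, "steepness"))  # False: surface also differs
```

In the second comparison any difference in outcome could be due to steepness, surface, or both, which is why no valid inference about steepness can be drawn from it.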

The positive impact of instruction on control of variables also appears to translate to the classroom (Toth, Klahr, and Chen, 2000; Klahr, Chen and Toth, 2001). Fourth graders who received instruction in the control-of-variables strategy in their classroom increased their use of the strategy, and their domain knowledge improved. The percentage of students who were able to correctly evaluate others’ research increased from 28 to 76 percent.

Instruction also appears to promote longer term use of the control-of-variables strategy and transfer of the strategy to a new task (Klahr and Nigam, 2004). Third and fourth graders who received instruction were more likely to master the control-of-variables strategy than students who explored a multivariable system on their own. Interestingly, although the group that received instruction performed better overall, a quarter of the students who explored the system on their own also mastered the strategy. These results raise questions about the kinds of individual differences that may allow for some students to benefit from the discovery context, but not others. That is, which learner traits are associated with the success of different learning experiences?

Similar effects of experience and instruction have been demonstrated for improving students’ ability to use evidence from multiple records and make correct inferences from noncausal variables (Keselman, 2003). In many cases, students show some improvement when they are given the opportunity for practice, but greater improvement when they receive instruction (Kuhn and Dean, 2005).

Long-term studies of students’ learning in the classroom with instructional support and structured experiences over months and years reveal children’s potential to engage in sophisticated investigations given the appropriate experiences (Metz, 2004; Lehrer and Schauble, 2005). For example, in one classroom-based study, second, fourth, and fifth graders took part in a curriculum unit on animal behavior that emphasized domain knowledge, whole-class collaboration, scaffolded instruction, and discussions about the kinds of questions that can and cannot be answered by observational records (Metz, 2004). Pairs or triads of students then developed a research question, designed an experiment, collected and analyzed data, and presented their findings on a research poster. Such studies have demonstrated that, with appropriate support, students in grades K-8 and students from a variety of socioeconomic, cultural, and linguistic backgrounds can be successful in generating and evaluating scientific evidence and explanations (Kuhn and Dean, 2005; Lehrer and Schauble, 2005; Metz, 2004; Warren, Rosebery, and Conant, 1994).


The picture that emerges from developmental and cognitive research on scientific thinking is one of a complex intertwining of knowledge of the natural world, general reasoning processes, and an understanding of how scientific knowledge is generated and evaluated. Science and scientific thinking are not only about logical thinking or conducting carefully controlled experiments. Instead, building knowledge in science is a complex process of building and testing models and theories, in which knowledge of the natural world and strategies for generating and evaluating evidence are closely intertwined. Working from this image of science, a few researchers have begun to investigate the development of children’s knowledge and skills in modeling.

The kinds of models that scientists construct vary widely, both within and across disciplines. Nevertheless, the rhetoric and practice of science are governed by efforts to invent, revise, and contest models. By modeling, we refer to the construction and test of representations that serve as analogues to systems in the real world (Lehrer and Schauble, 2006). These representations can be of many forms, including physical models, computer programs, mathematical equations, or propositions. Objects and relations in the model are interpreted as representing theoretically important objects and relations in the represented world. Models are useful in summarizing known features and predicting outcomes—that is, they can become elements of or representations of theories. A key hurdle for students is to understand that models are not copies; they are deliberate simplifications. Error is a component of all models, and the precision required of a model depends on the purpose for its current use.

The forms of thinking required for modeling do not progress very far without explicit instruction and fostering (Lehrer and Schauble, 2000). For this reason, studies of modeling have most often taken place in classrooms over sustained periods of time, often years. These studies provide a provocative picture of the sophisticated scientific thinking that can be supported in classrooms if students are provided with the right kinds of experiences over extended periods of time. The instructional approaches used in studies of students’ modeling, as well as the approach to curriculum that may be required to support the development of modeling skills over multiple years of schooling, are discussed in the chapters in Part III.

Lehrer and Schauble (2000, 2003, 2006) reported observing characteristic shifts in the understanding of modeling over the span of the elementary school grades, from an early emphasis on literal depictional forms, to representations that are progressively more symbolic and mathematically powerful. Diversity in representational and mathematical resources both accompanied and produced conceptual change. As children developed and used new mathematical means for characterizing growth, they understood biological change in increasingly dynamic ways. For example, once students understood the mathematics of ratio and changing ratios, they began to conceive of growth not as simple linear increase, but as a patterned rate of change. These transitions in conception and representation appeared to support each other, and they opened up new lines of inquiry. Children wondered whether plant growth was like animal growth, and whether the growth of yeast and bacteria on a Petri dish would show a pattern like the growth of a single plant. These forms of conceptual development required a context in which teachers systematically supported a restricted set of central ideas, building successively on earlier concepts over the grades of schooling.

Representational Systems That Support Modeling

The development of specific representational forms and notations, such as graphs, tables, computer programs, and mathematical expressions, is a critical part of engaging in mature forms of modeling. Mathematics, data and scale models, diagrams, and maps are particularly important for supporting science learning in grades K-8.


Mathematics and science are, of course, separate disciplines. Nevertheless, for the past 200 years, the steady press in science has been toward increasing quantification, visualization, and precision (Kline, 1980). Mathematics in all its forms is a symbol system that is fundamental to both expressing and understanding science. Often, expressing an idea mathematically results in noticing new patterns or relationships that otherwise would not be grasped. For example, elementary students studying the growth of organisms (plants, tobacco hornworms, populations of bacteria) noted that when they graphed changes in heights over the life span, all the organisms studied produced an emergent S-shaped curve. However, such seeing depended on developing a “disciplined perception” (Stevens and Hall, 1998), a firm grounding in a Cartesian system. Moreover, the shape of the curve was determined in light of variation, accounted for by selecting and connecting midpoints of intervals that defined piece-wise linear segments. This way of representing typical growth was contentious, because some midpoints did not correspond to any particular case value. This debate was therefore a pathway toward the idealization and imagined qualities of the world necessary for adopting a modeling stance. The form of the growth curve was eventually tested in other systems, and its replications inspired new questions. For example, why would bacteria populations and plants be describable by the same growth curve? In this case and in others, explanatory models and data models mutually bootstrapped conceptual development (Lehrer and Schauble, 2002).
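One way to picture the midpoint construction described above is the following sketch. It rests on assumptions that are not in the source: the heights are invented for illustration, and an "interval" is taken to mean the span between the smallest and largest height recorded on a given day. The function names are hypothetical as well.

```python
from typing import Dict, List, Tuple

# Hypothetical plant heights (cm) recorded by a class on each measurement day.
heights_by_day: Dict[int, List[float]] = {
    0: [1.0, 2.0, 2.0],
    7: [3.0, 4.0, 8.0],
    14: [10.0, 13.0, 16.0],
    21: [17.0, 18.0, 18.0],
}


def interval_midpoint(values: List[float]) -> float:
    """Midpoint of the interval spanned by the smallest and largest value."""
    return (min(values) + max(values)) / 2


def typical_growth_curve(data: Dict[int, List[float]]) -> List[Tuple[int, float]]:
    """Vertices of a piecewise linear curve through each day's midpoint."""
    return [(day, interval_midpoint(data[day])) for day in sorted(data)]


curve = typical_growth_curve(heights_by_day)
for day, mid in curve:
    note = "matches a case" if mid in heights_by_day[day] else "no plant has this height"
    print(day, mid, note)
```

With these invented numbers only the day 14 midpoint coincides with a measured plant. The other vertices describe a "typical" plant that nobody actually measured, which is the kind of idealization the students found contentious.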

It is not feasible in this report to summarize the extensive body of research in mathematics education, but one point is especially critical for science education: the need to expand elementary school mathematics beyond arithmetic to include space and geometry, measurement, and data/uncertainty. The National Council of Teachers of Mathematics (2000) standards have strongly supported this extension of early mathematics, based on the judgment that arithmetic alone does not constitute a sufficient mathematics education. Moreover, if mathematics is to be used as a resource for science, the resource base widens considerably with a broader mathematical base, affording students a greater repertoire for making sense of the natural world.

For example, consider the role of geometry and visualization in comparing crystalline structures or evaluating the relationship between the body weights and body structures of different animals. Measurement is a ubiquitous part of the scientific enterprise, although its subtleties are almost always overlooked. Students are usually taught procedures for measuring but are rarely taught a theory of measure. Educators often overestimate children’s understanding of measurement because measuring tools, like rulers or scales, resolve many of the conceptual challenges of measurement for children, so that they may fail to grasp the idea that measurement entails the iteration of constant units, and that these units can be partitioned. It is reasonably common, for example, for even upper elementary students who seem proficient at measuring lengths with rulers to tacitly hold the theory that measuring merely entails the counting of units between boundaries. If these students are given unconnected units (say, tiles of a constant length) and asked to demonstrate how to measure a length, some of them almost always place the units against the object being measured in such a way that the first and last tile are lined up flush with the end of the object measured. This arrangement often requires leaving spaces between units. Diagnostically, these spaces do not trouble a student who holds this “boundary-filling” conception of measurement (Lehrer, 2003; McClain et al., 1999).
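The contrast between iterating constant units and "boundary filling" can be sketched in a few lines. Everything in the example is a hypothetical illustration: the object length, the tile length, the tile positions, and the function names are assumptions, not data from the studies cited.

```python
from typing import List


def has_gaps(starts: List[float], unit: float) -> bool:
    """True if any two neighboring tiles leave empty space between them."""
    ordered = sorted(starts)
    return any(b - a > unit for a, b in zip(ordered, ordered[1:]))


def measure_by_iteration(length: float, unit: float) -> float:
    """Length expressed as constant units laid end to end (units may be partitioned)."""
    return length / unit


object_length = 10.0  # hypothetical object
tile = 2.0            # hypothetical constant unit

# Start positions of each tile along the object.
boundary_filling = [0.0, 4.0, 8.0]       # ends flush with the object, gaps between tiles
end_to_end = [0.0, 2.0, 4.0, 6.0, 8.0]   # units iterated with no gaps

print(len(boundary_filling), has_gaps(boundary_filling, tile))  # 3 True
print(len(end_to_end), has_gaps(end_to_end, tile))              # 5 False
print(measure_by_iteration(object_length, tile))                # 5.0
```

Counting the tiles in the boundary-filling arrangement yields 3, while iterating the unit without gaps yields 5. The difference only becomes visible once the spaces between tiles are treated as part of the length.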

Researchers agree that scientific thinking entails the coordination of theory with evidence (Klahr and Dunbar, 1988; Kuhn, Amsel, and O’Loughlin, 1988), but there are many ways in which evidence may vary in both form and complexity. Achieving this coordination therefore requires tools for structuring and interpreting data and error. Otherwise, students’ interpretation of evidence cannot be accountable. There have been many studies of students’ reasoning about data, variation, and uncertainty, conducted both by psychologists (Kahneman, Slovic, and Tversky, 1982; Konold, 1989; Nisbett et al., 1983) and by educators (Mokros and Russell, 1995; Pollatsek, Lima, and Well, 1981; Strauss and Bichler, 1988). Particularly pertinent here are studies that focus on data modeling (Lehrer and Romberg, 1996), that is, how reasoning with data is recruited as a way of investigating genuine questions about the world.

Data modeling is, in fact, what professionals do when they reason with data and statistics. It is central to a variety of enterprises, including engineering, medicine, and natural science. Scientific models are generated with acute awareness of their entailments for data, and data are recorded and structured as a way of making progress in articulating a scientific model or adjudicating among rival models. The tight relationship between model and data holds generally in domains in which inquiry is conducted by inscribing, representing, and mathematizing key aspects of the world (Goodwin, 2000; Kline, 1980; Latour, 1990).

Understanding the qualities and meaning of data may be enhanced if students devote as much attention to their generation as to their analysis. First and foremost, students need to grasp the notion that data are constructed to answer questions (Lehrer, Giles, and Schauble, 2002). The National Council of Teachers of Mathematics (2000) emphasizes that the study of data should be firmly anchored in students’ inquiry, so that they “address what is involved in gathering and using the data wisely” (p. 48). Questions motivate the collection of certain types of information and not others, and many aspects of data coding and structuring also depend on the question that motivated their collection. Defining the variables involved in addressing a research question, considering the methods and timing to collect data, and finding efficient ways to record them are all involved in the initial phases of data modeling. Debates about the meaning of an attribute often provoke questions that are more precise.

For example, a group of first graders who wanted to learn which student’s pumpkin was the largest eventually understood that they needed to agree whether they were interested in the heights of the pumpkins, their circumferences, or their weights (Lehrer et al., 2001). Deciding what to measure is bound up with deciding how to measure. As the students went on to count the seeds in their pumpkins (they were pursuing a question about whether there might be a relationship between pumpkin size and number of seeds), they had to make decisions about whether they would include seeds that were not full grown and what criteria would be used to decide whether any particular seed should be considered mature.

Data are inherently a form of abstraction: an event is replaced by a video recording, a sensation of heat is replaced by a pointer reading on a thermometer, and so on. Here again, the tacit complexity of tools may need to be explained. Students often have a fragile grasp of the relationship between the event of interest and the operation (hence, the output) of a tool, whether that tool is a microscope, a pan balance, or a “simple” ruler. Some students, for example, do not initially consider measurement to be a form of comparison and may find a balance a very confusing tool. In their mind, the number displayed on a scale is the weight of the object. If no number is displayed, weight cannot be found.

Once the data are recorded, making sense of them requires that they be structured. At this point, students sometimes discover that their data require further abstraction. For example, as they categorized features of self-portraits drawn by other students, a group of fourth graders realized that it would not be wise to follow their original plan of creating 23 categories of “eye type” for the 25 portraits that they wished to categorize (DiPerna, 2002). Data do not come with an inherent structure; rather, structure must be imposed (Lehrer, Giles, and Schauble, 2002). The only structure for a set of data comes from the inquirer’s prior and developing understanding of the phenomenon under investigation. The inquirer imposes structure by selecting categories around which to describe and organize the data.

Students also need to mentally back away from the objects or events under study to attend to the data as objects in their own right, by counting them, manipulating them to discover relationships, and asking new questions of already collected data. Students often believe that new questions can be addressed only with new data; they rarely think of querying existing data sets to explore questions that were not initially conceived when the data were collected (Lehrer and Romberg, 1996).

Finally, data are represented in various ways in order to see or understand general trends. Different kinds of displays highlight certain aspects of the data and hide others. An important educational agenda for students, one that extends over several years, is to come to understand the conventions and properties of different kinds of data displays. We do not review here the extensive literature on students’ understanding of different kinds of representational displays (tables, graphs of various kinds, distributions), but, for purposes of science, students should not only understand the procedures for generating and reading displays, but they should also be able to critique them and to grasp the communicative advantages and disadvantages of alternative forms for a given purpose (diSessa, 2004; Greeno and Hall, 1997). The structure of the data will affect the interpretation. Data interpretation often entails seeking and confirming relationships in the data, which may be at varying levels of complexity. For example, simple linear relationships are easier to spot than inverse relationships or interactions (Schauble, 1990), and students often fail to entertain the possibility that more than one relationship may be operating.

The desire to interpret data may further inspire the creation of statistics, such as measures of center and spread. These measures are a further step of abstraction beyond the objects and events originally observed. Even primary grade students can learn to consider the overall shape of data displays to make interpretations based on the “clumps” and “holes” in the data. Students often employ multiple criteria when trying to identify a “typical value” for a set of data. Many young students tend to favor the mode and justify their choice on the basis of repetition—if more than one student obtained this value, perhaps it is to be trusted. However, students tend to be less satisfied with modes if they do not appear near the center of the data, and they also shy away from measures of center that do not have several other values clustered near them (“part of a clump”). Understanding the mean requires an understanding of ratio, and if students are merely taught to “average” data in a procedural way without having a well-developed sense of ratio, their performance notoriously tends to degrade into “average stew”—eccentric procedures for adding and dividing things that make no sense (Strauss and Bichler, 1988). With good instruction, middle and upper elementary students can simultaneously consider the center and the spread of the data. Students can also generate various forms of mathematical descriptions of error, especially in contexts of measurement, where they can readily grasp the relationships between their own participation in the act of measuring and the resulting variation in measures (Petrosino, Lehrer, and Schauble, 2003).
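The measures of center and spread mentioned above can be illustrated with a brief sketch using Python's standard `statistics` module. The measurements are invented for illustration and are assumed to be repeated measurements of a single object; they are chosen so that the most frequent value sits at the edge of the data rather than near its center.

```python
import statistics

# Hypothetical repeated measurements (cm) of the same object by a class.
measurements = [31, 31, 31, 36, 37, 38, 38, 39, 40, 42]

mode = statistics.mode(measurements)         # most frequently obtained value
median = statistics.median(measurements)     # middle of the ordered data
mean = statistics.mean(measurements)         # sum divided by the number of values
spread = max(measurements) - min(measurements)  # range, one simple measure of spread

print("mode:", mode)      # 31
print("median:", median)  # 37.5
print("mean:", mean)
print("range:", spread)   # 11
```

Here the mode (31) is repeated most often but lies at the low end of the data, while the median and mean fall within the main clump of values. This is the kind of case in which students tend to be less satisfied with the mode as a "typical value."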

Scale Models, Diagrams, and Maps

Although data representations are central to science, they are not, of course, the only representations students need to use and understand. Perhaps the most easily interpretable form of representation widely used in science is scale models. Physical models of this kind are used in science education to make it possible for students to visualize objects or processes that are at a scale that makes their direct perception impossible or, alternatively, that permits them to directly manipulate something that otherwise they could not handle. The ease or difficulty with which students understand these models depends on the complexity of the relationships being communicated. Even preschoolers can understand scale models used to depict location in a room (DeLoache, 2004). Primary grade students can pretty readily overcome the influence of the appearance of the model to focus on and investigate the way it functions (Penner et al., 1997), but middle school students (and some adults) struggle to work out the positional relationships of the earth, the sun, and the moon, which involves not only reconciling different perspectives with respect to perspective and frame (what one sees standing on the earth, what one would see from a hypothetical point in space), but also visualizing how these perspectives would change over days and months (see, for example, the detailed curricular suggestions at the web site ).

Frequently, students are expected to read or produce diagrams, often integrating the information from the diagram with information from accompanying text (Hegarty and Just, 1993; Mayer, 1993). The comprehensibility of diagrams seems to be governed less by domain-general principles than by the specifics of the diagram and its viewer. Comprehensibility seems to vary with the complexity of what is portrayed, the particular diagrammatic details and features, and the prior knowledge of the user.

Diagrams can be difficult to understand for a host of reasons. Sometimes the desired information is missing in the first place; sometimes, features of the diagram unwittingly play into an incorrect preconception. For example, it has been suggested that the common student misconception that the earth is closer to the sun in the summer than in the winter may be due in part to the fact that two-dimensional representations of the three-dimensional orbit make it appear as if the foreshortened orbit is indeed closer to the sun at some points than at others.
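The foreshortening effect described above can be sketched numerically. The example assumes a perfectly circular orbit of radius 1 centered on the sun (a simplification; the real orbit is slightly elliptical) and an oblique viewing angle, `tilt_deg`, chosen only for illustration. The function name and parameters are hypothetical.

```python
import math

def projected_distances(tilt_deg, steps=360):
    """Apparent sun-to-planet distances when a circular orbit of radius 1
    is drawn as seen from an oblique angle (orthographic projection)."""
    squash = math.cos(math.radians(tilt_deg))  # vertical foreshortening factor
    distances = []
    for i in range(steps):
        angle = 2 * math.pi * i / steps
        x = math.cos(angle)            # unchanged by the projection
        y = math.sin(angle) * squash   # compressed by the tilt
        distances.append(math.hypot(x, y))
    return distances

d = projected_distances(tilt_deg=70)
print("true distance: 1.00 everywhere on the circular orbit")
print(f"apparent distance on the page: {min(d):.2f} to {max(d):.2f}")
```

Under these assumptions the planet is always the same distance from the sun, yet the flattened drawing shows distances that vary widely. That is the kind of visual cue that could reinforce the misconception.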

Mayer (1993) proposes three common reasons why diagrams mis-communicate: some do not include explanatory information (they are illustrative or decorative rather than explanatory), some lack a causal chain, and some fail to map the explanation to a familiar or recognizable context. It is not clear that school students misperceive diagrams in ways that are fundamentally different from the perceptions of adults. There may be some diagrammatic conventions that are less familiar to children, and children may well have less knowledge about the phenomena being portrayed, but there is no reason to expect that adult novices would respond in fundamentally different ways. Although they have been studied for a much briefer period of time, the same is probably true of complex computer displays.

Finally, there is a growing developmental literature on students’ understanding of maps. Maps can be particularly confusing because they preserve some analog qualities of the space being represented (e.g., relative position and distance) but also omit or alter features of the landscape in ways that require understanding of mapping conventions. Young children often initially confuse maps of the landscape with pictures of objects in the landscape. It is much easier for youngsters to represent objects than to represent large-scale space (which is the absence of or frame for objects). Students also may struggle with orientation, perspective (the traditional bird’s eye view), and mathematical descriptions of space, such as polar coordinate representations (Lehrer and Pritchard, 2002; Liben and Downs, 1993).
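As a brief illustration of what a polar coordinate description of space involves, the sketch below converts between a polar description (distance and direction from a starting point) and the familiar grid description (x, y). The function names and the example landmark are hypothetical and are not drawn from the cited studies.

```python
import math

def polar_to_cartesian(r, theta_deg):
    """Convert distance r and direction theta (degrees counterclockwise
    from the positive x-axis) to grid coordinates (x, y)."""
    theta = math.radians(theta_deg)
    return (r * math.cos(theta), r * math.sin(theta))

def cartesian_to_polar(x, y):
    """Convert grid coordinates (x, y) to (distance, direction in degrees)."""
    return (math.hypot(x, y), math.degrees(math.atan2(y, x)))

# Hypothetical landmark: 10 paces from the origin, in the direction 90 degrees.
x, y = polar_to_cartesian(10, 90)
r, theta = cartesian_to_polar(x, y)
print(f"grid position: ({x:.1f}, {y:.1f})")
print(f"back to polar: r = {r:.1f}, theta = {theta:.1f} degrees")
```

The same location is described two ways. The polar form is anchored to a viewer's position and heading, which may be part of why orientation and perspective matter so much when students read maps.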


There is a common thread throughout the observations of this chapter that has deep implications for what one expects from children in grades K-8 and how their science learning should be structured. In almost all cases, the studies converge to the position that the skills under study develop with age, but also that this development is significantly enhanced by prior knowledge, experience, and instruction.

One of the continuing themes evident from studies on the development of scientific thinking is that children are far more competent than first suspected, and likewise that adults are less so. Young children experiment, but their experimentation is generally not systematic, and their observations as well as their inferences may be flawed. The progression of ability is seen with age, but it is not uniform, either across individuals or for a given individual. There is variation across individuals at the same age, as well as variation within single individuals in the strategies they use. Any given individual uses a collection of strategies, some more valid than others. Discovering a valid strategy does not mean that an individual, whether a child or an adult, will use the strategy consistently across all contexts. As Schauble (1996, p. 118) noted:

The complex and multifaceted nature of the skills involved in solving these problems, and the variability in performance, even among the adults, suggest that the developmental trajectory of the strategies and processes associated with scientific reasoning is likely to be a very long one, perhaps even lifelong. Previous research has established the existence of both early precursors and competencies … and errors and biases that persist regardless of maturation, training, and expertise.

One aspect of cognition that appears to be particularly important for supporting scientific thinking is awareness of one’s own thinking. Children may be less aware of their own memory limitations and therefore may be unsystematic in recording plans, designs, and outcomes, and they may fail to consult such records. Self-awareness of the cognitive strategies available is also important in order to determine when and why to employ various strategies. Finally, awareness of the status of one’s own knowledge, such as recognizing the distinctions between theory and evidence, is important for reasoning in the context of scientific investigations. This last aspect of cognition is discussed in detail in the next chapter.

Prior knowledge, particularly beliefs about causality and plausibility, shapes the approach to investigations in multiple ways. These beliefs influence which hypotheses are tested, how experiments are designed, and how evidence is evaluated. Characteristics of prior knowledge, such as its type, strength, and relevance, are potential determinants of how new evidence is evaluated and whether anomalies are noticed. Knowledge change occurs as a result of the encounter.

Finally, we conclude that experience and instruction are crucial mediators of the development of a broad range of scientific skills and of the degree of sophistication that children exhibit in applying these skills in new contexts. This means that time spent doing science in appropriately structured instructional frames is a crucial part of science education. It affects not only the level of skills that children develop, but also their ability to think about the quality of evidence and to interpret evidence presented to them. Students need instructional support and practice in order to become better at coordinating their prior theories and the evidence generated in investigations. Instructional support is also critical for developing skills for experimental design, record keeping during investigations, dealing with anomalous data, and modeling.

Ahn, W., Kalish, C.W., Medin, D.L., and Gelman, S.A. (1995). The role of covariation versus mechanism information in causal attribution. Cognition, 54, 299-352.

Amsel, E., and Brock, S. (1996). The development of evidence evaluation skills. Cognitive Development, 11, 523-550.

Bisanz, J., and LeFevre, J. (1990). Strategic and nonstrategic processing in the development of mathematical cognition. In D. Bjorklund (Ed.), Children’s strategies: Contemporary views of cognitive development (pp. 213-243). Hillsdale, NJ: Lawrence Erlbaum Associates.

Carey, S., Evans, R., Honda, M., Jay, E., and Unger, C. (1989). An experiment is when you try it and see if it works: A study of grade 7 students’ understanding of the construction of scientific knowledge. International Journal of Science Education, 11, 514-529.

Chase, W.G., and Simon, H.A. (1973). The mind’s eye in chess. In W.G. Chase (Ed.), Visual information processing. New York: Academic.

Chen, Z., and Klahr, D. (1999). All other things being equal: Children’s acquisition of the control of variables strategy. Child Development, 70, 1098-1120.

Chi, M.T.H. (1996). Constructing self-explanations and scaffolded explanations in tutoring. Applied Cognitive Psychology, 10, 33-49.

Chi, M.T.H., de Leeuw, N., Chiu, M., and Lavancher, C. (1994). Eliciting self-explanations improves understanding. Cognitive Science, 18, 439-477.

Chinn, C.A., and Brewer, W.F. (1998). An empirical test of a taxonomy of responses to anomalous data in science. Journal of Research in Science Teaching, 35, 623-654.

Chinn, C.A., and Brewer, W. (2001). Model of data: A theory of how people evaluate data. Cognition and Instruction, 19 (3), 323-343.

Chinn, C.A., and Malhotra, B.A. (2001). Epistemologically authentic scientific reasoning. In K. Crowley, C.D. Schunn, and T. Okada (Eds.), Designing for science: Implications from everyday, classroom, and professional settings (pp. 351-392). Mahwah, NJ: Lawrence Erlbaum Associates.

Chinn, C.A., and Malhotra, B.A. (2002). Children’s responses to anomalous scientific data: How is conceptual change impeded? Journal of Educational Psychology, 94, 327-343.

DeLoache, J.S. (2004). Becoming symbol-minded. Trends in Cognitive Sciences, 8, 66-70.

DiPerna, E. (2002). Data models of ourselves: Body self-portrait project. In R. Lehrer and L. Schauble (Eds.), Investigating real data in the classroom: Expanding children’s understanding of math and science. Ways of knowing in science and mathematics series. Willington, VT: Teachers College Press.

diSessa, A.A. (2004). Metarepresentation: Native competence and targets for instruction. Cognition and Instruction, 22 (3), 293-331.

Dunbar, K., and Klahr, D. (1989). Developmental differences in scientific discovery strategies. In D. Klahr and K. Kotovsky (Eds.), Complex information processing: The impact of Herbert A. Simon (pp. 109-143). Hillsdale, NJ: Lawrence Erlbaum Associates.

Echevarria, M. (2003). Anomalies as a catalyst for middle school students’ knowledge construction and scientific reasoning during science inquiry. Journal of Educational Psychology, 95, 357-374.

Garcia-Mila, M., and Andersen, C. (2005). Developmental change in notetaking during scientific inquiry. Manuscript submitted for publication.

Gleason, M.E., and Schauble, L. (2000). Parents’ assistance of their children’s scientific reasoning. Cognition and Instruction, 17 (4), 343-378.

Goodwin, C. (2000). Introduction: Vision and inscription in practice. Mind, Culture, and Activity, 7, 1-3.

Greeno, J., and Hall, R. (1997). Practicing representation: Learning with and about representational forms. Phi Delta Kappan, January, 361-367.

Hausmann, R., and Chi, M. (2002). Can a computer interface support self-explaining? The International Journal of Cognitive Technology, 7 (1).

Hegarty, M., and Just, A. (1993). Constructing mental models of machines from text and diagrams. Journal of Memory and Language, 32, 717-742.

Inhelder, B., and Piaget, J. (1958). The growth of logical thinking from childhood to adolescence. New York: Basic Books.

Kahneman, D., Slovic, P., and Tversky, A. (1982). Judgment under uncertainty: Heuristics and biases. New York: Cambridge University Press.

Kanari, Z., and Millar, R. (2004). Reasoning from data: How students collect and interpret data in science investigations. Journal of Research in Science Teaching, 41, 17.

Keselman, A. (2003). Supporting inquiry learning by promoting normative understanding of multivariable causality. Journal of Research in Science Teaching, 40, 898-921.

Keys, C.W. (1994). The development of scientific reasoning skills in conjunction with collaborative writing assignments: An interpretive study of six ninth-grade students. Journal of Research in Science Teaching, 31, 1003-1022.

Klaczynski, P.A. (2000). Motivated scientific reasoning biases, epistemological beliefs, and theory polarization: A two-process approach to adolescent cognition. Child Development, 71 (5), 1347-1366.

Klaczynski, P.A., and Narasimham, G. (1998). Development of scientific reasoning biases: Cognitive versus ego-protective explanations. Developmental Psychology, 34 (1), 175-187.

Klahr, D. (2000). Exploring science: The cognition and development of discovery processes. Cambridge, MA: MIT Press.

Klahr, D., and Carver, S.M. (1995). Scientific thinking about scientific thinking. Monographs of the Society for Research in Child Development, 60, 137-151.

Klahr, D., Chen, Z., and Toth, E.E. (2001). From cognition to instruction to cognition: A case study in elementary school science instruction. In K. Crowley, C.D. Schunn, and T. Okada (Eds.), Designing for science: Implications from everyday, classroom, and professional settings (pp. 209-250). Mahwah, NJ: Lawrence Erlbaum Associates.

Klahr, D., and Dunbar, K. (1988). Dual search space during scientific reasoning. Cognitive Science, 12, 1-48.

Klahr, D., Fay, A., and Dunbar, K. (1993). Heuristics for scientific experimentation: A developmental study. Cognitive Psychology, 25, 111-146.

Klahr, D., and Nigam, M. (2004). The equivalence of learning paths in early science instruction: Effects of direct instruction and discovery learning. Psychological Science, 15 (10), 661-667.

Klahr, D., and Robinson, M. (1981). Formal assessment of problem solving and planning processes in preschool children. Cognitive Psychology, 13, 113-148.

Klayman, J., and Ha, Y. (1989). Hypothesis testing in rule discovery: Strategy, structure, and content. Journal of Experimental Psychology: Learning, Memory, and Cognition, 15 (4), 596-604.

Kline, M. (1980). Mathematics: The loss of certainty. New York: Oxford University Press.

Konold, C. (1989). Informal conceptions of probability. Cognition and Instruction, 6, 59-98.

Koslowski, B. (1996). Theory and evidence: The development of scientific reasoning. Cambridge, MA: MIT Press.

Koslowski, B., and Okagaki, L. (1986). Non-human indices of causation in problem-solving situations: Causal mechanisms, analogous effects, and the status of rival alternative accounts. Child Development, 57, 1100-1108.

Koslowski, B., Okagaki, L., Lorenz, C., and Umbach, D. (1989). When covariation is not enough: The role of causal mechanism, sampling method, and sample size in causal reasoning. Child Development, 60, 1316-1327.

Kuhn, D. (1989). Children and adults as intuitive scientists. Psychological Review, 96, 674-689.

Kuhn, D. (2001). How do people know? Psychological Science, 12, 1-8.

Kuhn, D. (2002). What is scientific thinking and how does it develop? In U. Goswami (Ed.), Blackwell handbook of childhood cognitive development (pp. 371-393). Oxford, England: Blackwell.

Kuhn, D., Amsel, E., and O’Loughlin, M. (1988). The development of scientific thinking skills. Orlando, FL: Academic Press.

Kuhn, D., and Dean, D. (2005). Is developing scientific thinking all about learning to control variables? Psychological Science, 16 (11), 866-870.

Kuhn, D., and Franklin, S. (2006). The second decade: What develops (and how)? In W. Damon, R.M. Lerner, D. Kuhn, and R.S. Siegler (Eds.), Handbook of child psychology, volume 2, cognition, perception, and language, 6th edition (pp. 954-994). Hoboken, NJ: Wiley.

Kuhn, D., Garcia-Mila, M., Zohar, A., and Andersen, C. (1995). Strategies of knowledge acquisition. Monographs of the Society for Research in Child Development, Serial No. 245 (60), 4.

Kuhn, D., and Pearsall, S. (1998). Relations between metastrategic knowledge and strategic performance. Cognitive Development, 13, 227-247.

Kuhn, D., and Pearsall, S. (2000). Developmental origins of scientific thinking. Journal of Cognition and Development, 1, 113-129.

Kuhn, D., and Phelps, E. (1982). The development of problem-solving strategies. In H. Reese (Ed.), Advances in child development and behavior (vol. 17, pp. 1-44). New York: Academic Press.

Kuhn, D., Schauble, L., and Garcia-Mila, M. (1992). Cross-domain development of scientific reasoning. Cognition and Instruction, 9, 285-327.

Kunda, Z. (1990). The case for motivated reasoning. Psychological Bulletin, 108, 480-498.

Larkin, J.H., McDermott, J., Simon, D.P., and Simon, H.A. (1980). Expert and novice performance in solving physics problems. Science, 208, 1335-1342.

Latour, B. (1990). Drawing things together. In M. Lynch and S. Woolgar (Eds.), Representation in scientific practice (pp. 19-68). Cambridge, MA: MIT Press.

Lehrer, R. (2003). Developing understanding of measurement. In J. Kilpatrick, W.G. Martin, and D.E. Schifter (Eds.), A research companion to principles and standards for school mathematics (pp. 179-192). Reston, VA: National Council of Teachers of Mathematics.

Lehrer, R., Giles, N., and Schauble, L. (2002). Data modeling. In R. Lehrer and L. Schauble (Eds.), Investigating real data in the classroom: Expanding children’s understanding of math and science (pp. 1-26). New York: Teachers College Press.

Lehrer, R., and Pritchard, C. (2002). Symbolizing space into being. In K. Gravemeijer, R. Lehrer, B. van Oers, and L. Verschaffel (Eds.), Symbolization, modeling and tool use in mathematics education (pp. 59-86). Dordrecht, The Netherlands: Kluwer Academic.

Lehrer, R., and Romberg, T. (1996). Exploring children’s data modeling. Cognition and Instruction, 14, 69-108.

Lehrer, R., and Schauble, L. (2000). The development of model-based reasoning. Journal of Applied Developmental Psychology, 21 (1), 39-48.

Lehrer, R., and Schauble, L. (2002). Symbolic communication in mathematics and science: Co-constituting inscription and thought. In E.D. Amsel and J. Byrnes (Eds.), Language, literacy, and cognitive development: The development and consequences of symbolic communication (pp. 167-192). Mahwah, NJ: Lawrence Erlbaum Associates.

Lehrer, R., and Schauble, L. (2003). Origins and evolution of model-based reasoning in mathematics and science. In R. Lesh and H.M. Doerr (Eds.), Beyond constructivism: A models and modeling perspective on mathematics problem-solving, learning, and teaching (pp. 59-70). Mahwah, NJ: Lawrence Erlbaum Associates.

Lehrer, R., and Schauble, L. (2005). Developing modeling and argument in the elementary grades. In T.A. Romberg, T.P. Carpenter, and F. Dremock (Eds.), Understanding mathematics and science matters (Part II: Learning with understanding). Mahwah, NJ: Lawrence Erlbaum Associates.

Lehrer, R., and Schauble, L. (2006). Scientific thinking and science literacy. In W. Damon, R. Lerner, K.A. Renninger, and I.E. Sigel (Eds.), Handbook of child psychology, 6th edition (vol. 4). Hoboken, NJ: Wiley.

Lehrer, R., Schauble, L., Strom, D., and Pligge, M. (2001). Similarity of form and substance: Modeling material kind. In D. Klahr and S. Carver (Eds.), Cognition and instruction: 25 years of progress (pp. 39-74). Mahwah, NJ: Lawrence Erlbaum Associates.

Liben, L.S., and Downs, R.M. (1993). Understanding person-space-map relations: Cartographic and developmental perspectives. Developmental Psychology, 29, 739-752.

Linn, M.C. (1978). Influence of cognitive style and training on tasks requiring the separation of variables schema. Child Development, 49, 874-877.

Linn, M.C. (1980). Teaching students to control variables: Some investigations using free choice experiences. In S. Modgil and C. Modgil (Eds.), Toward a theory of psychological development within the Piagetian framework. Windsor Berkshire, England: National Foundation for Educational Research.

Linn, M.C., Chen, B., and Thier, H.S. (1977). Teaching children to control variables: Investigations of a free choice environment. Journal of Research in Science Teaching, 14, 249-255.

Linn, M.C., and Levine, D.I. (1978). Adolescent reasoning: Influence of question format and type of variables on ability to control variables. Science Education, 62 (3), 377-388.

Lovett, M.C., and Anderson, J.R. (1995). Making heads or tails out of selecting problem-solving strategies. In J.D. Moore and J.F. Lehman (Eds.), Proceedings of the seventeenth annual conference of the Cognitive Science Society (pp. 265-270). Hillsdale, NJ: Lawrence Erlbaum Associates.

Lovett, M.C., and Anderson, J.R. (1996). History of success and current context in problem solving. Cognitive Psychology, 31 (2), 168-217.

Masnick, A.M., and Klahr, D. (2003). Error matters: An initial exploration of elementary school children’s understanding of experimental error. Journal of Cognition and Development, 4, 67-98.

Mayer, R. (1993). Illustrations that instruct. In R. Glaser (Ed.), Advances in instructional psychology (vol. 4, pp. 253-284). Hillsdale, NJ: Lawrence Erlbaum Associates.

McClain, K., Cobb, P., Gravemeijer, K., and Estes, B. (1999). Developing mathematical reasoning within the context of measurement. In L. Stiff (Ed.), Developing mathematical reasoning, K-12 (pp. 93-106). Reston, VA: National Council of Teachers of Mathematics.

McNay, M., and Melville, K.W. (1993). Children’s skill in making predictions and their understanding of what predicting means: A developmental study. Journal of Research in Science Teaching, 30, 561-577.

Metz, K.E. (2004). Children’s understanding of scientific inquiry: Their conceptualization of uncertainty in investigations of their own design. Cognition and Instruction, 22 (2), 219-290.

Mokros, J., and Russell, S. (1995). Children’s concepts of average and representativeness. Journal for Research in Mathematics Education, 26 (1), 20-39.

National Council of Teachers of Mathematics. (2000). Principles and standards for school mathematics. Reston, VA: Author.

Nisbett, R.E., Krantz, D.H., Jepson, C., and Kunda, Z. (1983). The use of statistical heuristics in everyday inductive reasoning. Psychological Review, 90, 339-363.

Penner, D., Giles, N.D., Lehrer, R., and Schauble, L. (1997). Building functional models: Designing an elbow. Journal of Research in Science Teaching, 34 (2), 125-143.

Penner, D.E., and Klahr, D. (1996a). The interaction of domain-specific knowledge and domain-general discovery strategies: A study with sinking objects. Child Development, 67, 2709-2727.

Penner, D.E., and Klahr, D. (1996b). When to trust the data: Further investigations of system error in a scientific reasoning task. Memory and Cognition, 24, 655-668.

Perfetti, C.A. (1992). The representation problem in reading acquisition. In P.B. Gough, L.C. Ehri, and R. Treiman (Eds.), Reading acquisition (pp. 145-174). Hillsdale, NJ: Lawrence Erlbaum Associates.

Petrosino, A., Lehrer, R., and Schauble, L. (2003). Structuring error and experimental variation as distribution in the fourth grade. Mathematical Thinking and Learning, 5 (2-3), 131-156.

Pollatsek, A., Lima, S., and Well, A.D. (1981). Concept or computation: Students’ misconceptions of the mean. Educational Studies in Mathematics, 12, 191-204.

Ruffman, T., Perner, J., Olson, D.R., and Doherty, M. (1993). Reflecting on scientific thinking: Children’s understanding of the hypothesis-evidence relation. Child Development, 64 (6), 1617-1636.

Schauble, L. (1990). Belief revision in children: The role of prior knowledge and strategies for generating evidence. Journal of Experimental Child Psychology, 49 (1), 31-57.

Schauble, L. (1996). The development of scientific reasoning in knowledge-rich contexts. Developmental Psychology, 32 (1), 102-119.

Schauble, L., Glaser, R., Duschl, R., Schulze, S., and John, J. (1995). Students’ understanding of the objectives and procedures of experimentation in the science classroom. Journal of the Learning Sciences, 4 (2), 131-166.

Schauble, L., Glaser, R., Raghavan, K., and Reiner, M. (1991). Causal models and experimentation strategies in scientific reasoning. Journal of the Learning Sciences, 1 (2), 201-238.

Schauble, L., Glaser, R., Raghavan, K., and Reiner, M. (1992). The integration of knowledge and experimentation strategies in understanding a physical system. Applied Cognitive Psychology, 6, 321-343.

Schauble, L., Klopfer, L.E., and Raghavan, K. (1991). Students’ transition from an engineering model to a science model of experimentation. Journal of Research in Science Teaching, 28 (9), 859-882.

Siegler, R.S. (1987). The perils of averaging data over strategies: An example from children’s addition. Journal of Experimental Psychology: General, 116, 250-264.

Siegler, R.S., and Alibali, M.W. (2005). Children’s thinking (4th ed.). Upper Saddle River, NJ: Prentice Hall.

Siegler, R.S., and Crowley, K. (1991). The microgenetic method: A direct means for studying cognitive development. American Psychologist, 46, 606-620.

Siegler, R.S., and Jenkins, E. (1989). How children discover new strategies. Hillsdale, NJ: Lawrence Erlbaum Associates.

Siegler, R.S., and Liebert, R.M. (1975). Acquisition of formal experiment. Developmental Psychology, 11, 401-412.

Siegler, R.S., and Shipley, C. (1995). Variation, selection, and cognitive change. In T. Simon and G. Halford (Eds.), Developing cognitive competence: New approaches to process modeling (pp. 31-76). Hillsdale, NJ: Lawrence Erlbaum Associates.

Simon, H.A. (1975). The functional equivalence of problem solving skills. Cognitive Psychology, 7, 268-288.

Simon, H.A. (2001). Learning to research about learning. In S.M. Carver and D. Klahr (Eds.), Cognition and instruction: Twenty-five years of progress (pp. 205-226). Mahwah, NJ: Lawrence Erlbaum Associates.

Slowiaczek, L.M., Klayman, J., Sherman, S.J., and Skov, R.B. (1992). Information selection and use in hypothesis testing: What is a good question, and what is a good answer. Memory and Cognition, 20 (4), 392-405.

Sneider, C., Kurlich, K., Pulos, S., and Friedman, A. (1984). Learning to control variables with model rockets: A neo-Piagetian study of learning in field settings. Science Education, 68 (4), 463-484.

Sodian, B., Zaitchik, D., and Carey, S. (1991). Young children’s differentiation of hypothetical beliefs from evidence. Child Development, 62 (4), 753-766.

Stevens, R., and Hall, R. (1998). Disciplined perception: Learning to see in technoscience. In M. Lampert and M.L. Blunk (Eds.), Talking mathematics in school: Studies of teaching and learning (pp. 107-149). Cambridge, MA: Cambridge University Press.

Strauss, S., and Bichler, E. (1988). The development of children’s concepts of the arithmetic average. Journal for Research in Mathematics Education, 19 (1), 64-80.

Thagard, P. (1998a). Ulcers and bacteria I: Discovery and acceptance. Studies in History and Philosophy of Science. Part C: Studies in History and Philosophy of Biology and Biomedical Sciences, 29, 107-136.

Thagard, P. (1998b). Ulcers and bacteria II: Instruments, experiments, and social interactions. Studies in History and Philosophy of Science. Part C: Studies in History and Philosophy of Biology and Biomedical Sciences, 29 (2), 317-342.

Toth, E.E., Klahr, D., and Chen, Z. (2000). Bridging research and practice: A cognitively-based classroom intervention for teaching experimentation skills to elementary school children. Cognition and Instruction, 18 (4), 423-459.

Trafton, J.G., and Trickett, S.B. (2001). Note-taking for self-explanation and problem solving. Human-Computer Interaction, 16, 1-38.

Triona, L., and Klahr, D. (in press). The development of children’s abilities to produce external representations. In E. Teubal, J. Dockrell, and L. Tolchinsky (Eds.), Notational knowledge: Developmental and historical perspectives. Rotterdam, The Netherlands: Sense.

Varnhagen, C. (1995). Children’s spelling strategies. In V. Berninger (Ed.), The varieties of orthographic knowledge: Relationships to phonology, reading and writing (vol. 2, pp. 251-290). Dordrecht, The Netherlands: Kluwer Academic.

Warren, B., Rosebery, A., and Conant, F. (1994). Discourse and social practice: Learning science in language minority classrooms. In D. Spencer (Ed.), Adult biliteracy in the United States (pp. 191-210). McHenry, IL: Delta Systems.

Wolpert, L. (1993). The unnatural nature of science. London, England: Faber and Faber.

Zachos, P., Hick, T.L., Doane, W.E.I., and Sargent, C. (2000). Setting theoretical and empirical foundations for assessing scientific inquiry and discovery in educational programs. Journal of Research in Science Teaching, 37 (9), 938-962.

Zimmerman, C., Raghavan, K., and Sartoris, M.L. (2003). The impact of the MARS curriculum on students’ ability to coordinate theory and evidence. International Journal of Science Education, 25, 1247-1271.

What is science for a child? How do children learn about science and how to do science? Drawing on a vast array of work from neuroscience to classroom observation, Taking Science to School provides a comprehensive picture of what we know about teaching and learning science from kindergarten through eighth grade. By looking at a broad range of questions, this book provides a basic foundation for guiding science teaching and supporting students in their learning. Taking Science to School answers such questions as:

  • When do children begin to learn about science? Are there critical stages in a child's development of such scientific concepts as mass or animate objects?
  • What role does nonschool learning play in children's knowledge of science?
  • How can science education capitalize on children's natural curiosity?
  • What are the best tasks for books, lectures, and hands-on learning?
  • How can teachers be taught to teach science?

The book also provides a detailed examination of how we know what we know about children's learning of science—about the role of research and evidence. This book will be an essential resource for everyone involved in K-8 science education—teachers, principals, boards of education, teacher education providers and accreditors, education researchers, federal education agencies, and state and federal policy makers. It will also be a useful guide for parents and others interested in how children learn.



Evidence-Based Research: Levels of Evidence Pyramid


One way to organize the different types of evidence involved in evidence-based practice research is the levels of evidence pyramid. The pyramid includes the following evidence types, listed from the top level to the bottom:

  • Systematic reviews
  • Critically-appraised topics
  • Critically-appraised individual articles
  • Randomized controlled trials
  • Cohort studies
  • Case-controlled studies, case series, and case reports
  • Background information and expert opinion

Levels of evidence pyramid

The levels of evidence pyramid provides a way to visualize both the quality of evidence and the amount of evidence available. For example, systematic reviews are at the top of the pyramid, meaning they are both the highest level of evidence and the least common. As you go down the pyramid, the amount of evidence will increase as the quality of the evidence decreases.
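The ordering described above can be sketched in a few lines of Python. This is only an illustration: the level numbering (1 = top of the pyramid), the `Source` class, and the sample titles are assumptions made for the example and do not come from the pyramid's authors.

```python
from dataclasses import dataclass

# Evidence types from the pyramid, highest level first.
PYRAMID = [
    "systematic review",
    "critically-appraised topic",
    "critically-appraised individual article",
    "randomized controlled trial",
    "cohort study",
    "case-controlled study, case series, or case report",
    "background information or expert opinion",
]


@dataclass
class Source:
    title: str          # hypothetical label for a search result
    evidence_type: str  # must be one of the entries in PYRAMID


def level(source: Source) -> int:
    """Return the pyramid level of a source (1 is the top)."""
    return PYRAMID.index(source.evidence_type) + 1


def rank_sources(sources: list[Source]) -> list[Source]:
    """Order sources from the highest level of evidence to the lowest."""
    return sorted(sources, key=level)


# Made-up search results, deliberately out of order.
results = [
    Source("Clinic handbook chapter", "background information or expert opinion"),
    Source("Trial of intervention A", "randomized controlled trial"),
    Source("Review of interventions A and B", "systematic review"),
]

for s in rank_sources(results):
    print(level(s), s.evidence_type, "-", s.title)
```

Sorting by level mirrors the usual search order: start at the top of the pyramid and move down only when the higher levels have nothing on your topic.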


EBM Pyramid and EBM Page Generator, copyright 2006 Trustees of Dartmouth College and Yale University. All Rights Reserved. Produced by Jan Glover, David Izzo, Karen Odato and Lei Wang.

Filtered Resources

Filtered resources appraise the quality of studies and often make recommendations for practice. The main types of filtered resources in evidence-based practice are:

  • Systematic reviews
  • Critically-appraised topics
  • Critically-appraised individual articles

Scroll down the page to the Systematic reviews, Critically-appraised topics, and Critically-appraised individual articles sections for links to resources where you can find each of these types of filtered information.

Systematic reviews

Authors of a systematic review ask a specific clinical question, perform a comprehensive literature review, eliminate the poorly done studies, and attempt to make practice recommendations based on the well-done studies. Systematic reviews include only experimental, or quantitative, studies, and often include only randomized controlled trials.

You can find systematic reviews in these filtered databases:

  • Cochrane Database of Systematic Reviews: Cochrane systematic reviews are considered the gold standard for systematic reviews. This database contains both systematic reviews and review protocols. To find only systematic reviews, select Cochrane Reviews in the Document Type box.
  • JBI EBP Database (formerly Joanna Briggs Institute EBP Database): This database includes systematic reviews, evidence summaries, and best practice information sheets. To find only systematic reviews, click on Limits and then select Systematic Reviews in the Publication Types box. To see how to use the limit and find full text, please see our Joanna Briggs Institute Search Help page.


You can also find systematic reviews in this unfiltered database:


To learn more about finding systematic reviews, please see our guide:

  • Filtered Resources: Systematic Reviews

Critically-appraised topics

Authors of critically-appraised topics evaluate and synthesize multiple research studies. Critically-appraised topics are like short systematic reviews focused on a particular topic.

You can find critically-appraised topics in these resources:

  • Annual Reviews: This collection offers comprehensive, timely critical reviews written by leading scientists. To find reviews on your topic, use the search box in the upper-right corner.
  • Guideline Central: This free database offers quick-reference guideline summaries. It is organized by a non-profit initiative that aims to fill the gap left by the sudden closure of AHRQ’s National Guideline Clearinghouse (NGC).
  • JBI EBP Database (formerly Joanna Briggs Institute EBP Database): To find critically-appraised topics in JBI, click on Limits and then select Evidence Summaries from the Publication Types box. To see how to use the limit and find full text, please see our Joanna Briggs Institute Search Help page.
  • National Institute for Health and Care Excellence (NICE): Evidence-based recommendations for health and care in England.
  • Filtered Resources: Critically-Appraised Topics

Critically-appraised individual articles

Authors of critically-appraised individual articles evaluate and synopsize individual research studies.

You can find critically-appraised individual articles in these resources:

  • EvidenceAlerts: Quality articles from over 120 clinical journals are selected by research staff and then rated for clinical relevance and interest by an international group of physicians. Note: You must create a free account to search EvidenceAlerts.
  • ACP Journal Club: This journal publishes reviews of research on the care of adults and adolescents. You can either browse this journal or use the Search within this publication feature.
  • Evidence-Based Nursing: This journal reviews research studies that are relevant to best nursing practice. You can either browse individual issues or use the search box in the upper-right corner.

To learn more about finding critically-appraised individual articles, please see our guide:

  • Filtered Resources: Critically-Appraised Individual Articles

Unfiltered resources

You may not always be able to find information on your topic in the filtered literature. When this happens, you'll need to search the primary or unfiltered literature. Keep in mind that with unfiltered resources, you take on the role of reviewing what you find to make sure it is valid and reliable.

Note: You can also find systematic reviews and other filtered resources in these unfiltered databases.

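The Levels of Evidence Pyramid includes unfiltered study types in this order of evidence from higher to lower:

  • Randomized controlled trials
  • Cohort studies
  • Case-controlled studies, case series, and case reports

You can search for each of these types of evidence in the following databases:

  • TRIP database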

Background information & expert opinion

Background information and expert opinions are not necessarily backed by research studies. They include point-of-care resources, textbooks, conference proceedings, etc.

  • Family Physicians Inquiries Network: Clinical Inquiries: Provides answers to clinical questions using a structured search, critical appraisal, authoritative recommendations, clinical perspective, and rigorous peer review. Clinical Inquiries deliver best evidence for point-of-care use.
  • Harrison, T. R., & Fauci, A. S. (2009). Harrison's Manual of Medicine. New York: McGraw-Hill Professional. Contains the clinical portions of Harrison's Principles of Internal Medicine.
  • Lippincott manual of nursing practice (8th ed.). (2006). Philadelphia, PA: Lippincott Williams & Wilkins. Provides background information on clinical nursing practice.
  • Medscape: Drugs & Diseases: An open-access, point-of-care medical reference that includes clinical information from top physicians and pharmacists in the United States and worldwide.
  • Virginia Henderson Global Nursing e-Repository: An open-access repository that contains works by nurses and is sponsored by Sigma Theta Tau International, the Honor Society of Nursing. Note: This resource contains both expert opinion and evidence-based practice articles.

Walden University is a member of Adtalem Global Education, Inc. Walden University is certified to operate by SCHEV © 2024 Walden University LLC. All rights reserved.

12.1 Introducing Research and Research Evidence

Learning Outcomes

By the end of this section, you will be able to:

  • Articulate how research evidence and sources are key rhetorical concepts in presenting a position or an argument.
  • Locate and distinguish between primary and secondary research materials.
  • Implement methods and technologies commonly used for research and communication within various fields.

The writing tasks for this chapter and the next two chapters are based on argumentative research. However, not all researched evidence (data) is presented in the same genre. You may need to gather evidence for a poster, a performance, a story, an art exhibit, or even an architectural design. Although the genre may vary, you usually will be required to present a perspective, or viewpoint, about a debatable issue and persuade readers to support the “validity of your viewpoint,” as discussed in Position Argument: Practicing the Art of Rhetoric. Remember, too, that a debatable issue is one that has more than a single perspective and is subject to disagreement.

The Research Process

Although individual research processes are rhetorically situated, they share some common aspects:

  • Interest. The researcher has a genuine interest in the topic. It may be difficult to fake curiosity, but it is possible to develop it. Some academic assignments will allow you to pursue issues that are personally important to you; others will require you to dive into the research first and generate interest as you go.
  • Questions. The researcher asks questions. At first, these questions are general. However, as researchers gain more knowledge, the questions become more sharply focused. No matter what your research assignment is, begin by articulating questions, find out where the answers lead, and then ask still more questions.
  • Answers. The researcher seeks answers from people as well as from print and other media. Research projects profit when you ask knowledgeable people, such as librarians and other professionals, to help you answer questions or point you in directions to find answers. Information about research is covered more extensively in Research Process: Accessing and Recording Information and Annotated Bibliography: Gathering, Evaluating, and Documenting Sources.
  • Field research. The researcher conducts field research. Field research allows researchers not only to ask questions of experts but also to observe and experience directly. It allows researchers to generate original data. No matter how much other people tell you, your knowledge increases through personal observations. In some subject areas, field research is as important as library or database research. This information is covered more extensively in Research Process: Accessing and Recording Information.
  • Examination of texts. The researcher examines texts. Consulting a broad range of texts—such as magazines, brochures, newspapers, archives, blogs, videos, documentaries, or peer-reviewed journals—is crucial in academic research.
  • Evaluation of sources. The researcher evaluates sources. As your research progresses, you will double-check information to find out whether it is confirmed by more than one source. In informal research, researchers evaluate sources to ensure that the final decision is satisfactory. Similarly, in academic research, researchers evaluate sources to ensure that the final product is accurate and convincing. Previewed here, this information is covered more extensively in Research Process: Accessing and Recording Information.
  • Writing. The researcher writes. The writing during the research process can take a range of forms: from notes during library, database, or field work; to journal reflections on the research process; to drafts of the final product. In practical research, writing helps researchers find, remember, and explore information. In academic research, writing is even more important because the results must be reported accurately and thoroughly.
  • Testing and Experimentation. The researcher tests and experiments. Because opinions vary on debatable topics and because few research topics have correct or incorrect answers, it is important to test and conduct experiments on possible hypotheses or solutions.
  • Synthesis. The researcher synthesizes. By combining information from various sources, researchers support claims or arrive at new conclusions. When synthesizing, researchers connect evidence and ideas, both original and borrowed. Accumulating, sorting, and synthesizing information enables researchers to consider what evidence to use in support of a thesis and in what ways.
  • Presentation. The researcher presents findings in an interesting, focused, and well-documented product.

Types of Research Evidence

Research evidence usually consists of data, which comes from borrowed information that you use to develop your thesis and support your organizational structure and reasoning. This evidence can take a range of forms, depending on the type of research conducted, the audience, and the genre for reporting the research.

Primary Research Sources

Although precise definitions vary somewhat by discipline, primary data sources are generally defined as firsthand accounts, such as texts or other materials produced by someone drawing from direct experience or observation. Primary source documents include, but are not limited to, personal narratives and diaries; eyewitness accounts; interviews; original documents such as treaties, official certificates, and government documents detailing laws or acts; speeches; newspaper coverage of events at the time they occurred; observations; and experiments. Primary source data is, in other words, original and in some way conducted or collected primarily by the researcher. The Research Process: Where to Look for Existing Sources and Compiling Sources for an Annotated Bibliography contain more information on both primary and secondary sources.

Secondary Research Sources

Secondary sources, on the other hand, are considered at least one step removed from the experience. That is, they rely on sources other than direct observation or firsthand experience. Secondary sources include, but are not limited to, most books, articles online or in databases, and textbooks (which are sometimes classified as tertiary sources because, like encyclopedias and other reference works, their primary purpose might be to summarize or otherwise condense information). Secondary sources regularly cite and build upon primary sources to provide perspective and analysis. Effective use of researched evidence usually includes both primary and secondary sources. Works of history, for example, draw on a large range of primary and secondary sources, citing, analyzing, and synthesizing information to present as many perspectives of a past event in as rich and nuanced a way as possible.

It is important to note that the distinction between primary and secondary sources depends in part on their use: that is, the same document can be both a primary source and a secondary source. For example, if Scholar X wrote a biography about Artist Y, the biography would be a secondary source about the artist and, at the same time, a primary source about the scholar.


This book may not be used in the training of large language models or otherwise be ingested into large language models or generative AI offerings without OpenStax's permission.

Want to cite, share, or modify this book? This book uses the Creative Commons Attribution License and you must attribute OpenStax.

Access for free at
  • Authors: Michelle Bachelor Robinson, Maria Jerskey, featuring Toby Fulwiler
  • Publisher/website: OpenStax
  • Book title: Writing Guide with Handbook
  • Publication date: Dec 21, 2021
  • Location: Houston, Texas
  • Book URL:
  • Section URL:

© Dec 19, 2023 OpenStax. Textbook content produced by OpenStax is licensed under a Creative Commons Attribution License . The OpenStax name, OpenStax logo, OpenStax book covers, OpenStax CNX name, and OpenStax CNX logo are not subject to the Creative Commons license and may not be reproduced without the prior and express written consent of Rice University.

The Role of Evidence Evaluation in Critical Thinking: Fostering Epistemic Vigilance

  • First Online: 28 February 2022

Ravit Golan Duncan, Veronica L. Cavera & Clark A. Chinn

Part of the book series: Contributions from Biology Education Research ((CBER))


Comprehending and evaluating scientific evidence is not trivial, especially in the current “post truth” era with its rampant misinformation and fake news. Moreover, evaluating scientific evidence (even simplified evidence presented in the media) entails some disciplinary knowledge of core concepts in the domain and an understanding of the epistemic criteria for what counts as good evidence in that domain. In our work we have examined how secondary students evaluate evidence related to biological phenomena. Students use the evidence to decide between two or more mechanistic explanations (models) of phenomena such as the underlying mechanism of genetic resistance to HIV. The evidence provided to them varies in quality in terms of its source (anecdotal versus generated by experts), the method (sample size, controlling for confounds, etc.), and how conclusive it is in supporting or refuting the competing models. In our chapter we apply the Grasp of Evidence framework to analyze students’ critical evaluation of evidence in the context of written arguments about the merits of competing models in explaining biological phenomena.




The research presented herein was supported by National Science Foundation Awards #1053953 and #100863. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. We also wish to acknowledge Aitan G. Duncan who kindly assisted with data analysis.

Author information

Authors and affiliations.

Rutgers University, New Brunswick, NJ, USA

Ravit Golan Duncan, Veronica L. Cavera & Clark A. Chinn


Corresponding author

Correspondence to Ravit Golan Duncan.

Editor information

Editors and affiliations.

Facultade Ciencias da Educación, Universidade de Santiago de Compostela, Santiago de Compostela, Spain

Blanca Puig

María Pilar Jiménez-Aleixandre


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Duncan, R.G., Cavera, V.L., Chinn, C.A. (2022). The Role of Evidence Evaluation in Critical Thinking: Fostering Epistemic Vigilance. In: Puig, B., Jiménez-Aleixandre, M.P. (eds) Critical Thinking in Biology and Environmental Education. Contributions from Biology Education Research. Springer, Cham.



Published: 28 February 2022

Publisher Name: Springer, Cham

Print ISBN: 978-3-030-92005-0

Online ISBN: 978-3-030-92006-7

eBook Packages: Education, Education (R0)


Evidence Synthesis, Systematic Review Services : Literature Review Types, Taxonomies


Choosing a Literature Review Methodology

Growing interest in evidence-based practice has driven an increase in review methodologies. Your choice of review methodology (or literature review type) will be informed by the intent (purpose, function) of your research project and the time and resources of your team. 

  • Decision Tree (What Type of Review is Right for You?): Developed by Cornell University Library staff, this "decision tree" guides the user to a handful of review guides given time and intent.

Types of Evidence Synthesis*

Critical Review - Aims to demonstrate writer has extensively researched literature and critically evaluated its quality. Goes beyond mere description to include degree of analysis and conceptual innovation. Typically results in hypothesis or model.

Mapping Review (Systematic Map) - Map out and categorize existing literature from which to commission further reviews and/or primary research by identifying gaps in research literature.

Meta-Analysis - Technique that statistically combines the results of quantitative studies to provide a more precise effect of the results.

Mixed Studies Review (Mixed Methods Review) - Refers to any combination of methods where one significant component is a literature review (usually systematic). Within a review context it refers to a combination of review approaches for example combining quantitative with qualitative research or outcome with process studies.

Narrative (Literature) Review - Generic term: published materials that provide examination of recent or current literature. Can cover wide range of subjects at various levels of completeness and comprehensiveness.

Overview - Generic term: summary of the [medical] literature that attempts to survey the literature and describe its characteristics.

Qualitative Systematic Review or Qualitative Evidence Synthesis - Method for integrating or comparing the findings from qualitative studies. It looks for ‘themes’ or ‘constructs’ that lie in or across individual qualitative studies.

Rapid Review - Assessment of what is already known about a policy or practice issue, by using systematic review methods to search and critically appraise existing research.

Scoping Review or Evidence Map - Preliminary assessment of potential size and scope of available research literature. Aims to identify nature and extent of research.

State-of-the-art Review - Tends to address more current matters in contrast to other combined retrospective and current approaches. May offer new perspectives on an issue or point out areas for further research.

Systematic Review - Seeks to systematically search for, appraise and synthesize research evidence, often adhering to guidelines on the conduct of a review. (An emerging subset includes Living Reviews or Living Systematic Reviews - A [review or] systematic review which is continually updated, incorporating relevant new evidence as it becomes available.)

Systematic Search and Review - Combines strengths of critical review with a comprehensive search process. Typically addresses broad questions to produce ‘best evidence synthesis.’

Umbrella Review - Specifically refers to review compiling evidence from multiple reviews into one accessible and usable document. Focuses on broad condition or problem for which there are competing interventions and highlights reviews that address these interventions and their results.

*These definitions are in Grant & Booth's "A Typology of Reviews: An Analysis of 14 Review Types and Associated Methodologies."

Literature Review Types/Typologies, Taxonomies

Grant, M. J., and A. Booth. "A Typology of Reviews: An Analysis of 14 Review Types and Associated Methodologies." Health Information and Libraries Journal 26.2 (2009): 91-108. DOI: 10.1111/j.1471-1842.2009.00848.x

Munn, Zachary, et al. "Systematic Review or Scoping Review? Guidance for Authors When Choosing between a Systematic or Scoping Review Approach." BMC Medical Research Methodology 18.1 (2018): 143. DOI: 10.1186/s12874-018-0611-x

Sutton, A., et al. "Meeting the Review Family: Exploring Review Types and Associated Information Retrieval Requirements." Health Information and Libraries Journal 36.3 (2019): 202-22. DOI: 10.1111/hir.12276

  • Last Updated: May 22, 2024 4:45 PM

Evaluating Research – Process, Examples and Methods

Evaluating Research


Evaluating Research refers to the process of assessing the quality, credibility, and relevance of a research study or project. This involves examining the methods, data, and results of the research in order to determine its validity, reliability, and usefulness. Evaluating research can be done by both experts and non-experts in the field, and involves critical thinking, analysis, and interpretation of the research findings.

Research Evaluating Process

The process of evaluating research typically involves the following steps:

Identify the Research Question

The first step in evaluating research is to identify the research question or problem that the study is addressing. This will help you to determine whether the study is relevant to your needs.

Assess the Study Design

The study design refers to the methodology used to conduct the research. You should assess whether the study design is appropriate for the research question and whether it is likely to produce reliable and valid results.

Evaluate the Sample

The sample refers to the group of participants or subjects who are included in the study. You should evaluate whether the sample size is adequate and whether the participants are representative of the population under study.

Review the Data Collection Methods

You should review the data collection methods used in the study to ensure that they are valid and reliable. This includes assessing the measures used to collect data and the procedures used to collect data.

Examine the Statistical Analysis

Statistical analysis refers to the methods used to analyze the data. You should examine whether the statistical analysis is appropriate for the research question and whether it is likely to produce valid and reliable results.

Assess the Conclusions

You should evaluate whether the data support the conclusions drawn from the study and whether they are relevant to the research question.

Consider the Limitations

Finally, you should consider the limitations of the study, including any potential biases or confounding factors that may have influenced the results.
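
The seven steps above can be kept as a simple checklist while reading a study. The following is a minimal sketch, not a standard tool: the `Evaluation` class and its method names are hypothetical, introduced only for illustration, and the step names are copied from the headings in this section.

```python
from dataclasses import dataclass, field

# The seven steps named in this section, in the order they are presented.
STEPS = [
    "Identify the research question",
    "Assess the study design",
    "Evaluate the sample",
    "Review the data collection methods",
    "Examine the statistical analysis",
    "Assess the conclusions",
    "Consider the limitations",
]


@dataclass
class Evaluation:
    """Hypothetical record of one reader's notes on one study."""

    study_title: str
    notes: dict = field(default_factory=dict)  # maps step name -> free-text note

    def record(self, step: str, note: str) -> None:
        """Store a note for one step; reject step names not in STEPS."""
        if step not in STEPS:
            raise ValueError(f"unknown step: {step}")
        self.notes[step] = note

    def remaining(self) -> list:
        """Return the steps that have no note yet, in the original order."""
        return [s for s in STEPS if s not in self.notes]
```

Calling `remaining()` after each note shows which steps of the evaluation are still open.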

Evaluating Research Methods

Common methods for evaluating research are as follows:

  • Peer review: Peer review is a process where experts in the field review a study before it is published. This helps ensure that the study is accurate, valid, and relevant to the field.
  • Critical appraisal: Critical appraisal involves systematically evaluating a study based on specific criteria. This helps assess the quality of the study and the reliability of the findings.
  • Replication: Replication involves repeating a study to test the validity and reliability of the findings. This can help identify any errors or biases in the original study.
  • Meta-analysis: Meta-analysis is a statistical method that combines the results of multiple studies to provide a more comprehensive understanding of a particular topic. This can help identify patterns or inconsistencies across studies.
  • Consultation with experts: Consulting with experts in the field can provide valuable insights into the quality and relevance of a study. Experts can also help identify potential limitations or biases in the study.
  • Review of funding sources: Examining the funding sources of a study can help identify any potential conflicts of interest or biases that may have influenced the study design or interpretation of results.
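
To make the meta-analysis item above concrete, here is a minimal sketch of one common pooling approach, the fixed-effect inverse-variance method. The article does not name a specific technique, so this choice is an assumption. The function name `pool_fixed_effect` and the effect sizes are made up for illustration.

```python
import math


def pool_fixed_effect(effects, std_errors):
    """Inverse-variance (fixed-effect) pooled estimate and its standard error."""
    if len(effects) != len(std_errors) or not effects:
        raise ValueError("need one standard error per effect estimate")
    # Each study is weighted by the inverse of its variance (1 / SE^2),
    # so more precise studies count for more.
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se


# Made-up effect sizes and standard errors from three hypothetical studies.
effects = [0.30, 0.10, 0.25]
std_errors = [0.10, 0.20, 0.10]

pooled, se = pool_fixed_effect(effects, std_errors)
# Approximate 95% confidence interval using the normal critical value 1.96.
low, high = pooled - 1.96 * se, pooled + 1.96 * se
```

The pooled standard error is smaller than that of any single study, which is the sense in which combining studies gives a more precise estimate. A random-effects model would be a different choice when studies are expected to differ in their true effects.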

Example of Evaluating Research

A sample research evaluation for students:

Title of the Study: The Effects of Social Media Use on Mental Health among College Students

Sample Size: 500 college students

Sampling Technique: Convenience sampling

  • Sample Size: A sample of 500 college students is moderate in size. Size alone does not make a sample representative, however: a larger sample would narrow the uncertainty of the estimates, but only a random (probability) sampling technique would make the sample more representative of the college student population.
  • Sampling Technique: Convenience sampling is a non-probability sampling technique, which means that the sample may not be representative of the population. This technique may introduce bias because participants are recruited for their availability and may differ systematically from the wider college student population. Therefore, the results of this study may not be generalizable to other populations.
  • Participant Characteristics: The study does not provide any information about the demographic characteristics of the participants, such as age, gender, race, or socioeconomic status. This information is important because social media use and mental health may vary among different demographic groups.
  • Data Collection Method: The study used a self-administered survey to collect data. Self-administered surveys may be subject to response bias and may not accurately reflect participants’ actual behaviors and experiences.
  • Data Analysis: The study used descriptive statistics and regression analysis to analyze the data. Descriptive statistics provide a summary of the data, while regression analysis is used to examine the relationship between two or more variables. However, the study did not provide information about the statistical significance of the results or the effect sizes.

Overall, while the study provides some insights into the relationship between social media use and mental health among college students, the use of a convenience sampling technique and the lack of information about participant characteristics limit the generalizability of the findings. In addition, the use of self-administered surveys may introduce bias into the study, and the lack of information about the statistical significance of the results limits the interpretation of the findings.
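
The point about sample size in the evaluation above can be illustrated with a short calculation. This is a rough sketch: it uses the textbook margin-of-error formula for a proportion, which assumes simple random sampling. That assumption does not hold for the convenience sample in the example, so the result is at best an optimistic lower bound. The function name `margin_of_error` is hypothetical.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion.

    Assumes simple random sampling; p=0.5 gives the widest (most cautious) margin.
    """
    return z * math.sqrt(p * (1 - p) / n)


moe_500 = margin_of_error(500)    # sample size used in the example study
moe_2000 = margin_of_error(2000)  # four times as many participants
```

Quadrupling the sample only halves the margin of error, and no increase in size corrects for the selection bias introduced by a non-probability sampling technique.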

Note: The example above is only a sample for students. Do not copy and paste it directly into your assignment; do your own research for academic purposes.

Applications of Evaluating Research

Here are some of the applications of evaluating research:

  • Identifying reliable sources: By evaluating research, researchers, students, and other professionals can identify the most reliable sources of information to use in their work. They can determine the quality of research studies, including the methodology, sample size, data analysis, and conclusions.
  • Validating findings: Evaluating research can help to validate findings from previous studies. By examining the methodology and results of a study, researchers can determine if the findings are reliable and if they can be used to inform future research.
  • Identifying knowledge gaps: Evaluating research can also help to identify gaps in current knowledge. By examining the existing literature on a topic, researchers can determine areas where more research is needed, and they can design studies to address these gaps.
  • Improving research quality: Evaluating research can help to improve the quality of future research. By examining the strengths and weaknesses of previous studies, researchers can design better studies and avoid common pitfalls.
  • Informing policy and decision-making: Evaluating research is crucial in informing policy and decision-making in many fields. By examining the evidence base for a particular issue, policymakers can make informed decisions that are supported by the best available evidence.
  • Enhancing education: Evaluating research is essential in enhancing education. Educators can use research findings to improve teaching methods, curriculum development, and student outcomes.

Purpose of Evaluating Research

Here are some of the key purposes of evaluating research:

  • Determine the reliability and validity of research findings: By evaluating research, researchers can determine the quality of the study design, data collection, and analysis. They can determine whether the findings are reliable, valid, and generalizable to other populations.
  • Identify the strengths and weaknesses of research studies: Evaluating research helps to identify the strengths and weaknesses of research studies, including potential biases, confounding factors, and limitations. This information can help researchers to design better studies in the future.
  • Inform evidence-based decision-making: Evaluating research is crucial in informing evidence-based decision-making in many fields, including healthcare, education, and public policy. Policymakers, educators, and clinicians rely on research evidence to make informed decisions.
  • Identify research gaps: By evaluating research, researchers can identify gaps in the existing literature and design studies to address these gaps. This process can help to advance knowledge and improve the quality of research in a particular field.
  • Ensure research ethics and integrity: Evaluating research helps to ensure that research studies are conducted ethically and with integrity. Researchers must adhere to ethical guidelines to protect the welfare and rights of study participants and to maintain the trust of the public.

Characteristics to Evaluate in Research

The characteristics to examine when evaluating research are as follows:

  • Research question/hypothesis: A good research question or hypothesis should be clear, concise, and well-defined. It should address a significant problem or issue in the field and be grounded in relevant theory or prior research.
  • Study design: The research design should be appropriate for answering the research question and be clearly described in the study. The study design should also minimize bias and confounding variables.
  • Sampling: The sample should be representative of the population of interest and the sampling method should be appropriate for the research question and study design.
  • Data collection: The data collection methods should be reliable and valid, and the data should be accurately recorded and analyzed.
  • Results: The results should be presented clearly and accurately, and the statistical analysis should be appropriate for the research question and study design.
  • Interpretation of results: The interpretation of the results should be based on the data and not influenced by personal biases or preconceptions.
  • Generalizability: The study findings should be generalizable to the population of interest and relevant to other settings or contexts.
  • Contribution to the field: The study should make a significant contribution to the field and advance our understanding of the research question or issue.

Advantages of Evaluating Research

Evaluating research has several advantages, including:

  • Ensuring accuracy and validity: By evaluating research, we can ensure that the research is accurate, valid, and reliable. This ensures that the findings are trustworthy and can be used to inform decision-making.
  • Identifying gaps in knowledge: Evaluating research can help identify gaps in knowledge and areas where further research is needed. This can guide future research and help build a stronger evidence base.
  • Promoting critical thinking: Evaluating research requires critical thinking skills, which can be applied in other areas of life. By evaluating research, individuals can develop their critical thinking skills and become more discerning consumers of information.
  • Improving the quality of research: Evaluating research can help improve the quality of research by identifying areas where improvements can be made. This can lead to more rigorous research methods and better-quality research.
  • Informing decision-making: By evaluating research, we can make informed decisions based on the evidence. This is particularly important in fields such as medicine and public health, where decisions can have significant consequences.
  • Advancing the field: Evaluating research can help advance the field by identifying new research questions and areas of inquiry. This can lead to the development of new theories and the refinement of existing ones.

Limitations of Evaluating Research

Limitations of Evaluating Research are as follows:

  • Time-consuming: Evaluating research can be time-consuming, particularly if the study is complex or requires specialized knowledge. This can be a barrier for individuals who are not experts in the field or who have limited time.
  • Subjectivity: Evaluating research can be subjective, as different individuals may have different interpretations of the same study. This can lead to inconsistencies in the evaluation process and make it difficult to compare studies.
  • Limited generalizability: The findings of a study may not be generalizable to other populations or contexts. This limits the usefulness of the study and may make it difficult to apply the findings to other settings.
  • Publication bias: Research that does not find significant results may be less likely to be published, which can create a bias in the published literature. This can limit the amount of information available for evaluation.
  • Lack of transparency: Some studies may not provide enough detail about their methods or results, making it difficult to evaluate their quality or validity.
  • Funding bias: Research funded by particular organizations or industries may be biased towards the interests of the funder. This can influence the study design, methods, and interpretation of results.

About the author

Muhammad Hassan

Researcher, Academic Writer, Web developer

Systematic review of the use of process evaluations in knowledge translation research

Shannon D. Scott

1 Faculty of Nursing, University of Alberta, Edmonton, Alberta Canada

Thomas Rotter

2 School of Nursing, Queen’s University, Kingston, Ontario Canada

Rachel Flynn

Hannah M. Brooks, Tabatha Plesuk, Katherine H. Bannar-Martin, Thane Chambers

3 University of Alberta Libraries, Edmonton, Alberta Canada

Lisa Hartling

4 Department of Pediatrics, University of Alberta, Edmonton, Alberta Canada

Associated Data

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Experimental designs for evaluating knowledge translation (KT) interventions can provide strong estimates of effectiveness but offer limited insight into how the intervention worked. Consequently, process evaluations have been used to explore the causal mechanisms at work; however, there are limited standards to guide this work. This study synthesizes current evidence of KT process evaluations to provide future methodological recommendations.

Peer-reviewed search strategies were developed by a health research librarian. Studies had to be in English, published since 1996, and were not excluded based on design. Studies had to (1) be a process evaluation of a KT intervention study in primary health, (2) be a primary research study, and (3) include a licensed healthcare professional delivering or receiving the intervention. A two-step, two-person hybrid screening approach was used for study inclusion with inter-rater reliability ranging from 94 to 95%. Data on study design, data collection, theoretical influences, and approaches used to evaluate the KT intervention, analysis, and outcomes were extracted by two reviewers. Methodological quality was assessed with the Mixed Methods Appraisal Tool (MMAT).

Of the 20,968 articles screened, 226 studies fit our inclusion criteria. Qualitative data collection was the most common form used in process evaluations (43.4%), with individual interviews the predominant data collection method. 72.1% of studies evaluated barriers and/or facilitators to implementation. 59.7% of process evaluations were stand-alone evaluations. The timing of data collection varied widely, with post-intervention data collection being the most frequent (46.0%). Only 38.1% of the studies were informed by theory. Furthermore, 38.9% of studies had MMAT scores of 50 or less, indicating poor methodological quality.


There is widespread acceptance that the generalizability of quantitative trials of KT interventions would be significantly enhanced through complementary process evaluations. However, this systematic review found that process evaluations are of mixed quality and lack theoretical guidance. Process evaluation data were most often collected post-intervention, undermining the ability to evaluate the process of implementation. Strong science and methodological guidance are needed to underpin and guide the design and execution of process evaluations in KT science.


This study is not registered with PROSPERO.

The implementation of research into healthcare practice is complex [ 1 ], with multiple levels to consider such as the patient, healthcare provider, multidisciplinary team, healthcare institution, and local and national healthcare systems. The implementation of evidence-based treatments to achieve healthcare system improvement that is robust, efficient, and sustainable is crucially important. However, it is well established that improving the availability of research is not enough for successful implementation [ 2 ]; rather, active knowledge translation (KT) interventions are essential to facilitate the implementation of research to practice. Determining the success of KT interventions and the implementation process itself relies on evaluation studies.

In the KT field, experimental designs such as randomized trials, cluster randomized trials, and stepped wedge designs are widely used for evaluating the effectiveness of KT interventions. Rigorous experimental designs can provide strong estimates of KT intervention effectiveness, but offer limited insight into how the intervention worked or not [ 1 ] as well as how KT interventions are mediated by different facilitators and barriers and how they lead to implementation or not [ 3 – 5 ]. KT interventions contain several interacting components, such as the degree of flexibility or tailoring of the intervention, the number of interacting components within the interventions, and the number and difficulty of behaviors required by those delivering or receiving the intervention [ 3 ]. This complexity makes it particularly challenging to evaluate KT intervention effectiveness [ 3 – 5 ]. The effectiveness of KT interventions is a result of the interactions between many factors such as context and mechanisms of change. A lack of intervention effect may be due to implementation failure rather than the ineffectiveness of the intervention itself. KT interventions pose methodological challenges and require augmentations to the standard experimental designs [ 6 ] to understand how they do or do not work.

As a result of these limitations, researchers have started to conduct process evaluations alongside experimental designs for evaluating KT interventions. The broad purpose of a process evaluation is to explore aspects of the implementation process [ 7 ]. Process evaluations can be used to assess the fidelity, dose, adaptation, reach, and quality of implementation [ 8 , 9 ] and to identify the causal mechanisms [ 10 , 11 ], mechanisms of impact [ 12 ], and contextual factors associated with variation in outcomes across sites [ 6 , 13 ]. Furthermore, process evaluations can assist in interpreting the outcome results [ 7 ], the barriers and facilitators to implementation [ 14 , 15 ] and sustainability [ 16 ], as well as examining the participants’ views [ 17 ] and understandings of components of the intervention [ 18 , 19 ]. Process evaluations are vital in identifying the success or failure of implementation, which is critical in understanding intervention effectiveness.

Notwithstanding the work of Moore and colleagues [ 12 ], there have been scant methodological recommendations to guide KT process evaluations. This deficit has made designing process evaluations in KT research challenging and has hindered the potential for meaningful comparisons across process evaluation studies. In 2000, the Medical Research Council released an evaluation framework for designing and evaluating complex interventions; this report was later revised in 2008 [ 4 , 20 ]. Of note, earlier guidance for evaluating complex interventions focused exclusively on randomized designs with no mention of process evaluations. The revisions mentioned process evaluations and the role that they can have with complex interventions, yet did not provide specific recommendations for evaluation designs, data collection types, time points, and standardized evaluation approaches for complex interventions. This level of specificity is imperative for research comparisons across KT intervention process evaluations and to understand how change is mediated by specific factors.

Recently, the Medical Research Council has commissioned an update of this guidance to be published in 2019 [ 21 , 22 ]. The update re-emphasizes some of the previous messages related to complex intervention development and evaluation; however, it provides a more flexible and less linear model of the process with added emphasis to development, implementation, and evaluation phases as well as providing a variety of successful case examples that employ a range of methods (from natural experiments to clinical trials). Early reports of the update to the MRC framework highlight the importance of process and economic evaluations as good investments and a move away from experimental methods as the only or best option for evaluation.

In 2013, a framework for process evaluations for cluster-randomized trials of complex interventions was proposed by Grant and colleagues [ 20 ]; however, these recommendations were not based upon a comprehensive, systematic review of all approaches used by others. One study found that only 30% of the randomized controlled trials had associated qualitative investigations [ 23 ]. Moreover, a large proportion of those qualitative evaluations were completed before the trial, with smaller numbers of qualitative evaluations completed during the trial or following it. Given the limitations of the process evaluation work to date, it is critical to systematically review all existing process evaluations of KT outcome assessment. Doing so will aid in the development of rigorous methodological guidance for process evaluation research of KT interventions moving forward.

The aim of our systematic review is to synthesize the existing evidence on process evaluation studies assessing KT interventions. The purpose of our review is to make explicit the current state of methodological guidance for process evaluation research with the aim of providing recommendations for multiple end-user groups. This knowledge is critically important for healthcare providers, health quality consultants, decision and policy makers, non-governmental organizations, governmental departments, and health services researchers to evaluate the effectiveness of their KT efforts in order to ensure scarce healthcare resources are effectively utilized and enhanced knowledge is properly generalized to benefit others.

Objectives and key questions

As per our study protocol [ 24 ] available openly via 10.1186/2046-4053-3-149, the objectives for this systematic review were to (1) systematically locate, assess, and report on published studies in healthcare that are a stand-alone process evaluation of a KT intervention or have a process evaluation component, and (2) offer guidance for researchers in terms of the development and design of process evaluations of KT interventions. The key research question guiding this systematic review was: what is the “state-of-the-science” of separate (stand-alone) or integrated process evaluations conducted alongside KT intervention studies?

Search strategy

This systematic review followed a comprehensive methodology using rigorous guidelines to synthesize diverse forms of research evidence [ 25 ], as outlined in our published protocol [ 24 ]. A peer-reviewed literature search was conducted by a health research librarian of English language articles published between 1996 and 2018 in six databases (Ovid MEDLINE/Ovid MEDLINE (R) In-Process & Other Non-Indexed Citations, Ovid EMBASE, Ovid PsycINFO, EBSCOhost CINAHL, ISI Web of Science, and ProQuest Dissertations and Theses). Full search details can be found in Additional file  1 . See Additional file  2 for the completed PRISMA checklist.

Inclusion/exclusion criteria

Studies were not excluded based upon research design and had to comply with three inclusion criteria (Table  1 ). A two-person hybrid approach was used for screening article titles and abstracts with inter-rater reliability ranging from 94 to 95%. Full-text articles were independently screened by two reviewers, and a two-person hybrid approach was used for data extraction.
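
The review reports inter-rater reliability as a percentage but does not say which statistic was computed. As a hedged illustration, the sketch below contrasts raw percent agreement with Cohen's kappa, which corrects for agreement expected by chance. The function names and the screening decisions are made up for illustration and are not data from the review.

```python
def percent_agreement(rater_a, rater_b):
    """Share of items on which two raters gave the same label."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    return sum(x == y for x, y in zip(rater_a, rater_b)) / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    p_observed = percent_agreement(rater_a, rater_b)
    labels = set(rater_a) | set(rater_b)
    # Chance agreement: probability both raters pick the same label independently.
    p_expected = sum((rater_a.count(l) / n) * (rater_b.count(l) / n) for l in labels)
    if p_expected == 1.0:
        return 1.0  # both raters used one identical label for every item
    return (p_observed - p_expected) / (1 - p_expected)


# Made-up title/abstract screening decisions for 100 records.
reviewer_1 = ["include"] * 8 + ["exclude"] * 92
reviewer_2 = ["include"] * 2 + ["exclude"] * 98

agreement = percent_agreement(reviewer_1, reviewer_2)
kappa = cohens_kappa(reviewer_1, reviewer_2)
```

In screening, most records are excluded, so raw agreement can be high even when the reviewers rarely agree on which records to include. Kappa is noticeably lower than percent agreement in this made-up example for that reason.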

Process evaluation systematic review inclusion criteria

1 Health is defined according to the WHO (1946) conceptualization of a state of complete physical and mental well-being and not merely the absence of disease or infirmity, including prevention components and mental health but not “social health”

Quality assessment

The methodological quality of all included studies was assessed using the Mixed Methods Appraisal Tool (MMAT) [ 26 , 27 ] for quantitative, qualitative, and mixed methods research designs. The tool results in a methodological rating of 0, 25, 50, 75, or 100 (with 100 being the highest quality) for each study based on the evaluation of study selection bias, study design, data collection methods, sample size, intervention integrity, and analysis. We adapted the MMAT for multi-method studies (studies where more than one research approach was utilized, but the data were not integrated) by assessing the methods in the study individually and then choosing the lowest quality rating assigned. For studies where the process evaluation was integrated into the study design, the quality of the entire study was assessed.
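
The adaptation for multi-method studies described above amounts to taking the minimum of the component ratings. A minimal sketch follows; the function name `multi_method_score` and the example ratings are hypothetical, and this is not the MMAT scoring procedure itself.

```python
# The five ratings the text says the tool can produce.
VALID_SCORES = {0, 25, 50, 75, 100}


def multi_method_score(component_scores):
    """Overall rating for a multi-method study: the lowest component rating."""
    scores = list(component_scores.values())
    if not scores:
        raise ValueError("at least one component rating is required")
    for s in scores:
        if s not in VALID_SCORES:
            raise ValueError(f"not a valid rating: {s}")
    return min(scores)


# Made-up example: a study with a qualitative and a descriptive component.
score = multi_method_score({"qualitative": 75, "quantitative descriptive": 50})
```

Taking the minimum is a conservative rule: a study is rated no higher than its weakest component.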

Data extraction, analysis, and synthesis

Study data were extracted using standardized Excel forms. Only data reported in included studies were extracted. Variables extracted included the following: (1) study design, (2) process evaluation type (integrated vs. separate), (3) process evaluation terms used, (4) timing of data collection (e.g., pre- and post-implementation of intervention), (5) KT intervention type, (6) KT intervention recipient, (7) target behavior, and (8) theory. Studies were grouped and synthesized according to each of the above variables. Evidence tables were created to summarize and describe the studies included in this review.
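
The eight extracted variables listed above map naturally onto one record per study. The authors used Excel forms; the sketch below is only an illustrative equivalent, and the class, field, and function names are assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ExtractionRecord:
    """One row of a hypothetical extraction form, one field per listed variable."""

    study_design: str
    process_evaluation_type: str          # "integrated" or "separate"
    process_evaluation_terms: List[str]   # a study may use several terms
    data_collection_timing: str           # e.g. "pre- and post-intervention"
    kt_intervention_type: str
    kt_intervention_recipient: str
    target_behavior: List[str]            # a study may target several behaviors
    theory: Optional[str] = None          # None when no theory was reported


def group_by(records, variable):
    """Group records by the value of one extracted variable."""
    groups = {}
    for record in records:
        key = getattr(record, variable)
        if isinstance(key, list):
            key = tuple(key)  # lists are unhashable, so use a tuple as the key
        groups.setdefault(key, []).append(record)
    return groups
```

Grouping by each variable in turn mirrors the synthesis step described in the text, where studies were grouped according to each extracted variable.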

Theoretical guidance

We extracted and analyzed data on any theoretical guidance that was identified and discussed for the process evaluation stage of the included studies. For the purpose of our systematic review, included studies were considered theoretically informed if the process evaluation used theory to (a) assist in the identification of appropriate outcomes, measures, and variables; (b) guide the evaluation of the KT process; (c) identify potential predictors or mediators; or (d) provide a framework for data analysis.

Study design

Of the 20,968 articles screened, 226 full-text articles were included in our review (Fig.  1 ). See Additional file  3 for a full citation list of included studies.

Fig. 1 PRISMA flow diagram (Adapted from Moher et al. 2009)

Among these included articles, the following research designs were used: qualitative ( n  = 85, 37.6%), multi-methods ( n  = 55, 24.3%), quantitative descriptive ( n  = 44, 19.5%), mixed methods ( n  = 25, 11.1%), quantitative RCT ( n  = 14, 6.2%), and quantitative non-randomized ( n  = 3, 1.3%). See Table  2 .

Types of research design and associated quality of included studies ( n  = 226)

RCT: randomized controlled trial

Process evaluation type and terms

A total of 136 (60.2%) of the included studies were separate (stand-alone) process evaluations, while the process evaluations of the remaining studies ( n  = 90, 39.8%) were integrated into the KT intervention evaluation. Process evaluation research designs included the following: qualitative ( n  = 98, 43.4%), multi-methods ( n  = 56, 24.8%), quantitative descriptive ( n  = 51, 22.6%), and mixed methods ( n  = 21, 9.3%). See Table  3 .

Process evaluation research design of included studies ( n  = 226)
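
The counts reported above for process evaluation research designs can be checked against the stated total and percentages with a few lines. This is a small sketch; the helper name `percentages` is hypothetical, and the counts are the ones given in the text.

```python
def percentages(counts, ndigits=1):
    """Convert a mapping of category counts to percentages of their total."""
    total = sum(counts.values())
    return {k: round(100 * v / total, ndigits) for k, v in counts.items()}


# Counts as reported in the text for process evaluation research designs.
design_counts = {
    "qualitative": 98,
    "multi-methods": 56,
    "quantitative descriptive": 51,
    "mixed methods": 21,
}

total = sum(design_counts.values())
shares = percentages(design_counts)
```

The same check applies to any of the count breakdowns in this section, although rounding means the percentages will not always sum to exactly 100.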

The way in which each of the included studies described the purpose and focus of their process evaluation was synthesized and categorized thematically. Barriers and/or facilitators to implementation was the most widely reported term to describe the purpose and focus of the process evaluation (Table  4 ).

Thematic analysis of process evaluation terms used in included studies ( n  = 226)

*Some studies used multiple terms to describe the process evaluation and its focus

Methods and timing of data collection

Process evaluations had widespread variations in the methods of data collection, with individual interviews ( n  = 123) and surveys or questionnaires ( n  = 100) being the predominant methods (Table  5 ).

Methods of data collection of included studies ( n  = 226)

*Some studies had more than one method of data collection

The majority of process evaluations collected data post-intervention ( n  = 104, 46.0%). The remaining studies collected data pre- and post-intervention ( n  = 40, 17.7%); during and post-intervention ( n  = 29, 12.8%); during intervention ( n  = 25, 11.1%); pre-, during, and post-intervention ( n  = 18, 7.9%); pre- and during intervention ( n  = 5, 2.2%); or pre-intervention ( n  = 3, 1.3%). In 2 studies (0.9%), the timing of data collection was unclear. See Table  6 .
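To make the split between post-only collection and earlier collection explicit, the sketch below groups the reported timing counts. The grouping into "post only", "any pre- or during-intervention collection", and "unclear" is an illustrative assumption layered on the reported categories, and the variable names are hypothetical.

```python
# Timing of data collection, with counts as reported in the text.
timing_counts = {
    "post": 104,
    "pre and post": 40,
    "during and post": 29,
    "during": 25,
    "pre, during, and post": 18,
    "pre and during": 5,
    "pre": 3,
    "unclear": 2,
}

# Categories that include any data collection before or during the intervention.
early_labels = [
    "pre and post",
    "during and post",
    "during",
    "pre, during, and post",
    "pre and during",
    "pre",
]

total_studies = sum(timing_counts.values())
post_only = timing_counts["post"]
early = sum(timing_counts[label] for label in early_labels)

post_only_pct = round(100 * post_only / total_studies, 1)
early_pct = round(100 * early / total_studies, 1)

print(f"post only: n = {post_only} ({post_only_pct}%)")
print(f"any pre or during collection: n = {early} ({early_pct}%)")
```

Under this grouping, the studies with any pre- or during-intervention data collection slightly outnumber those that collected data only after the intervention.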

Timing of data collection of included studies ( n  = 226)

Intervention details (type, recipient, and target behavior)

Most of the studies ( n  = 154, 68.1%) identified healthcare professionals (HCPs) as the exclusive KT intervention recipient, while the remaining studies had combined intervention recipients including HCP and others ( n  = 59, 26.1%), and HCP and patients ( n  = 13, 5.8%). Utilizing the Cochrane Effective Practice and Organisation of Care (EPOC) intervention classification schema [ 28 ], 218 (96.5%) studies had professional type interventions, 5 (2.2%) studies had professional type and organizational type interventions, and 3 (1.3%) studies had professional type and financial type interventions. The most common KT intervention target behaviors were “General management of a problem” ( n  = 132), “Clinical prevention services” ( n  = 45), “Patient outcome” ( n  = 35), “Procedures” ( n  = 33), and “Patient education/advice” ( n  = 32). See Table  7 .

Intervention details of included studies ( n  = 226)

*Some studies had multiple targeted behaviors

Of the 226 studies, 38.1% ( n  = 86) were informed by theory (Table  8 ). The most frequently reported theories were as follows: (a) Rogers’ Diffusion of Innovation Theory ( n  = 13), (b) Normalization Process Theory ( n  = 10), (c) Promoting Action on Research Implementation in Health Services Framework ( n  = 9), (d) Theory of Planned Behavior ( n  = 9), (e) Plan-Do-Study-Act Framework ( n  = 7), and (f) the Consolidated Framework for Implementation Research ( n  = 6).

Theories used by theory-guided studies ( n  = 86)

*Some studies had multiple theories guiding the process evaluation

The distribution of MMAT scores varied with study design (Table  2 ). The lowest scoring study design was multi-method, with 74.5% ( n  = 41) of multi-method studies scoring 50 or lower. Overall, many of the studies ( n  = 88, 38.9%) had an MMAT score of 50 or lower, with 29 (12.8%) studies scoring 25 and 7 (3.1%) studies scoring 0. Eighty-one studies (35.8%) scored 75, and 57 studies (25.2%) scored 100 (high quality). See Table  9 .
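The text reports the number of studies at scores of 0, 25, 75, and 100, plus the number at 50 or lower, but not the number scoring exactly 50. That count can be derived by subtraction. This is a minimal sketch that assumes MMAT scores take only the values 0, 25, 50, 75, and 100, as the reported figures imply; the variable names are illustrative.

```python
TOTAL_STUDIES = 226

# Counts reported in the text.
at_or_below_50 = 88
scored_0 = 7
scored_25 = 29
scored_75 = 81
scored_100 = 57

# Derived, not reported: studies scoring exactly 50
# (assumes scores are limited to 0, 25, 50, 75, 100).
scored_50 = at_or_below_50 - scored_25 - scored_0

distribution = {
    0: scored_0,
    25: scored_25,
    50: scored_50,
    75: scored_75,
    100: scored_100,
}

# Share of the 55 multi-method studies that scored 50 or lower (41 studies).
multi_method_low_pct = round(100 * 41 / 55, 1)

for score, n in distribution.items():
    print(f"MMAT {score}: n = {n} ({round(100 * n / TOTAL_STUDIES, 1)}%)")
```

The derived distribution sums to the 226 included studies, which is consistent with the five-value assumption.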

Distribution of MMAT scores (0 = lowest and 100 = highest score)

Our findings provide insight into the current practices of KT researchers conducting integrated or separate process evaluations: the focus of these process evaluations, data collection considerations, their poor methodological quality, and the lack of theoretical guidance informing them.

The majority of included studies (60.2%) conducted a separate (stand-alone) rather than integrated process evaluation. As Moore and colleagues suggest, there are advantages and disadvantages of either (separated or integrated) approach [ 12 ]. Arguments for separate process evaluations focus on analyzing process data without knowledge of outcome analysis to prevent biasing interpretations of results. Arguments for integration include ensuring implementation data is integrated into outcome analysis and using the process evaluation to identify intermediate outcome data and causal processes while informing the integration of new measures into outcome data collection. Our findings highlight that there is no clear preference for separate or integrated process evaluations. The decision for separation or integration of the process evaluation should be carefully considered by study teams to ensure it is the best option for their study objectives.

Our findings draw attention to a wide variety of terms and foci used within process evaluations. We identified a lack of clear and consistent concepts for process evaluations and their multifaceted components, as well as an absence of standard recommendations on how process evaluations should be developed and conducted. This finding is supported by a literature overview on process evaluations in public health published by Linnan and Steckler in 2002 [ 29 ]. We would encourage researchers to employ terms that are utilized by other researchers to facilitate meaningful comparisons across studies in the future and to be mindful of comprehensively including the key components of a process evaluation: context, implementation, and mechanisms of impact [ 12 ].

Our findings highlight two important aspects about process evaluation data collection in relation to timing and type of data collected. In terms of data collection timing, almost half of the investigators collected their process evaluation data post-intervention (46%) without any pre-intervention or during intervention data collection. Surprisingly, only 17.7% of the included studies collected data pre- and post-intervention, and only 18 studies collected data pre-, during, and post-intervention. Process evaluations can provide useful information about intervention delivery and if the interventions were delivered as planned (fidelity), the intervention dose, as well as useful information about intervention reach and how the context shaped the implementation process. Our findings suggest a current propensity to collect data after intervention delivery (as compared to before and/or during). It is unclear if our findings are the result of a lack of forethought to employ data collection pre- and during implementation, a lack of resources, or a reliance on data collection approaches post-intervention. This aside, based upon our findings, we recommend that KT researchers planning process evaluations consider data collection earlier in the implementation process to prevent challenges with retrospective data collection and to maximize the potential power of process evaluations. Consideration of key components of process evaluations (context, implementation, and mechanisms of impact) is critically important to prevent inference-observation confusion from an exclusive reliance on outcome evaluations [ 12 ]. An intervention can have positive outcomes even when an intervention was not delivered as intended, as other events or influences can be shaping a context [ 30 ]. 
Conversely, an intervention may have limited or no effects for a number of reasons that extend beyond the ineffectiveness of the intervention, including a weak research design or improper implementation of the intervention [ 31 ]. Implicitly, the process evaluation framework by Moore and colleagues suggests that process evaluation data ideally need to be collected before and throughout the implementation process in order to capture all aspects of implementation [ 12 ].

In terms of data collection type, just over half (54.4%) of the studies utilized qualitative interviews as one form of data collection. Reflecting on the key components of process evaluations (context, implementation, and mechanisms of impact), the frequency of qualitative data collection approaches is lower than anticipated. Qualitative approaches such as interviewing are ideal for uncovering rich and detailed aspects of the implementation context, nuanced participant perspectives on the implementation processes, and the potential mediators to implementation impact. Taken together, the key components of a process evaluation (context, implementation, and mechanisms of impact) are, by default, suggestive of multi-method work. Consequently, we urge researchers to consider integrating qualitative and quantitative data into their process evaluation study designs to richly capture various perspectives. In addition to individual interviews, surveys, participant observation, focus groups, and document analysis could be used.

A major finding from this systematic review is the lack of methodological rigor in many of the process evaluations. Almost 40% of the studies included in this review had an MMAT score of 50 or less, but the scores varied significantly in terms of study designs used by the investigators. Moreover, the frequency of low MMAT scores for multi-method and mixed method studies suggests a tendency for lower methodological quality, which could point to the challenging nature of these research designs [ 32 ] or a lack of reporting guidelines.

Our findings identified a lack of theoretical guidance employed and reported in the included process evaluation studies. It is important to note the role of theory within evaluation is considered contentious by some [ 33 , 34 ], yet conversely, there are increasing calls for the use of theory in the literature. While there is this tension between using or not using theory in evaluations, there are many reported advantages to theory-driven evaluations [ 29 , 33 , 34 ], yet more than 60% of the included studies were not informed by theory. Current research evidence suggests that using theory can help to design studies that increase KT and enable better interpretation and replication of findings of implementation studies [ 35 ]. In alignment with Moore and colleagues, we encourage researchers to consider utilizing theory when designing process evaluations. There is no shortage of KT theories available. Recently, Strifler and colleagues identified 159 KT theories, models, and frameworks in the literature [ 36 ]. In the words of Moore and colleagues who were citing the revised MRC guidance (2008), “an understanding of the causal assumptions underpinning the intervention and use of evaluation to understand how interventions work in practice are vital in building an evidence base that informs policy and practice” [ 9 ].


As with all reviews, there is the possibility of incomplete retrieval of identified research; however, this review entailed a comprehensive search of published literature and rigorous review methods. Limitations include the eligibility restrictions (only published studies in the English language were included, for example), and data collection did not extend beyond data reported in included studies.

The current state of the quality of evidence base of process evaluations in KT is weak. Policy makers and funding organizations should call for theory-based multi or mixed method designs with a complementary process evaluation component. Mixed method designs, with an integrated process evaluation component, would help to inform decision makers about effective process evaluation approaches, and research funding organizations could further promote theory-based designs to guide the development and conduct of implementation studies with a rigorous process evaluation component. Achieving this goal may require well-assembled implementation teams including clinical experts, as well as strong researchers with methodological expertise.

We recommend that future investigators employ rigorous theory-guided multi or mixed method approaches to evaluate the processes of implementation of KT interventions. Our findings highlighted that to date, qualitative study designs in the form of separate (stand-alone) process evaluations are the most frequently reported approaches. The predominant data collection method of using qualitative interviews helps to better understand process evaluations and to answer questions about why the implementation processes work or not, but does not provide an answer about the effectiveness of the implementation processes used. In light of the work of Moore and colleagues [ 12 ], we advocate that future process evaluation investigators should use both qualitative and quantitative methods (mixed methods) with an integrated process evaluation component to evaluate implementation processes in KT research.

We identified the timing of data collection as another methodological weakness in this systematic review. It remains unclear why almost half of the included process evaluation studies collected data only post-implementation. To provide high-certainty evidence for process evaluations, we advocate for the collection of pre-, during, and post-implementation measures and the use of statistical uncertainty measures (e.g., standard deviation, standard error, p values, and confidence intervals). This would allow a rigorous assessment of the implementation processes and sound recommendations supported by statistical measures. The timing of pre-evaluations also helps to address issues before implementation occurs. There is widespread acceptance that the generalizability of quantitative trials of KT interventions would be significantly enhanced through complementary process evaluations. Most data collection occurred post-intervention, undermining the ability to evaluate the process of implementation.

Strong science and methodological guidance are needed to underpin and guide the design and execution of process evaluations in KT science. A theory-based approach to inform process evaluations of KT interventions would allow investigators to reach conclusions, not only about the processes by which interventions were implemented and the outcomes they have generated, but also about the reliability of the causal assumptions that link intervention processes and outcomes. Future research is needed that could provide state-of-the-art recommendations on how to design, conduct, and report rigorous process evaluations as part of a theory-based mixed methods evaluation of KT projects. Intervention theory should be used to inform the design of implementation studies to investigate the success or failure of the strategies used. This could lead to more generalizable findings to inform researchers and knowledge users about effective implementation strategies.

Supplementary information


We would like to thank CIHR for providing the funding for the systematic review. We would like to thank our Knowledge User Advisory Panel members for providing guidance and feedback, including Dr. Thomas Rotter, Brenda VandenBeld, Lisa Halma, Christine Jensen-Ross, Gayle Knapik, and Klaas VandenBeld. We would lastly like to acknowledge the contributions of Xuan Wu in data analysis.


Authors’ contributions

SDS conceptualized and designed the study and secured the study funding from CIHR. She led all aspects of the study process. TC conducted the search. TR, KHBM, RF, TP, and HMB contributed to the data collection. KHBM, RF, TP, and HMB contributed to the data analysis. TR and LH contributed to the data interpretation. All authors contributed to the manuscript drafts and reviewed the final manuscript. All authors read and approved the final manuscript.

Authors’ information

SDS holds a Canada Research Chair for Knowledge Translation in Child Health. LH holds a Canada Research Chair in Knowledge Synthesis and Translation.

Canadian Institutes of Health Research (CIHR) Knowledge Synthesis Grant #305365.

Availability of data and materials

Ethics approval and consent to participate

Not applicable.

Consent for publication

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Shannon D. Scott, Email: [email protected] .

Thomas Rotter, Email: [email protected] .

Rachel Flynn, Email: ac.atreblau@nnylfmr .

Hannah M. Brooks, Email: [email protected] .

Tabatha Plesuk, Email: ac.atreblau@kuselp .

Katherine H. Bannar-Martin, Email: moc.liamg@mrannabk .

Thane Chambers, Email: ac.atreblau@enaht .

Lisa Hartling, Email: [email protected] .

Supplementary information accompanies this paper at 10.1186/s13643-019-1161-y.



