What is visual representation?

In the vast landscape of communication, where words alone may fall short, visual representation emerges as a powerful ally. In a world inundated with information, the ability to convey complex ideas, emotions, and data through visual means is becoming increasingly crucial. But what exactly is visual representation, and why does it hold such sway in our understanding?

Defining Visual Representation:

Visual representation is the act of conveying information, ideas, or concepts through visual elements such as images, charts, graphs, maps, and other graphical forms. It’s a means of translating the abstract into the tangible, providing a visual language that transcends the limitations of words alone.

The Power of Images:

The adage “a picture is worth a thousand words” encapsulates the essence of visual representation. Images have an unparalleled ability to evoke emotions, tell stories, and communicate complex ideas in an instant. Whether it’s a photograph capturing a poignant moment or an infographic distilling intricate data, images possess a unique capacity to resonate with and engage the viewer on a visceral level.


Facilitating Understanding:

One of the primary functions of visual representation is to enhance understanding. Humans are inherently visual creatures, and we often process and retain visual information more effectively than text. Complex concepts that might be challenging to grasp through written explanations can be simplified and clarified through visual aids. This is particularly valuable in fields such as science, where intricate processes and structures can be elucidated through diagrams and illustrations.

Visual representation also plays a crucial role in education. In classrooms around the world, teachers leverage visual aids to facilitate learning, making lessons more engaging and accessible. From simple charts that break down historical timelines to interactive simulations that bring scientific principles to life, visual representation is a cornerstone of effective pedagogy.

Data Visualization:

In an era dominated by big data, the importance of data visualization cannot be overstated. Raw numbers and statistics can be overwhelming and abstract, but when presented visually, they transform into meaningful insights. Graphs, charts, and maps are powerful tools for conveying trends, patterns, and correlations, enabling decision-makers to glean actionable intelligence from vast datasets.

Consider the impact of a well-crafted infographic that distills complex research findings into a visually digestible format. Data visualization not only simplifies information but also allows for more informed decision-making in fields ranging from business and healthcare to social sciences and environmental studies.
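To make the idea concrete, the jump from raw numbers to a visual can be sketched in a few lines of code. The function and data below are purely illustrative (not drawn from any dataset mentioned here); they render label/value pairs as a rough text-based bar chart:

```python
def ascii_bar_chart(data: dict[str, int], width: int = 40) -> str:
    """Render label/value pairs as a horizontal text bar chart.

    Bar lengths are scaled so the largest value spans `width` marks,
    which makes relative magnitudes visible at a glance.
    """
    peak = max(data.values())
    lines = []
    for label, value in data.items():
        bar = "#" * round(value / peak * width)
        lines.append(f"{label:>4} | {bar} {value}")
    return "\n".join(lines)

# Hypothetical monthly figures, purely for illustration
sales = {"Jan": 120, "Feb": 135, "Mar": 160, "Apr": 150}
print(ascii_bar_chart(sales))
```

Even this crude chart reveals the March peak faster than scanning the numbers does; dedicated plotting libraries apply the same principle with far richer output.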

Cultural and Artistic Expression:

Visual representation extends beyond the realm of information and education; it is also a potent form of cultural and artistic expression. Paintings, sculptures, photographs, and other visual arts serve as mediums through which individuals can convey their emotions, perspectives, and cultural narratives. Artistic visual representation has the power to transcend language barriers, fostering a shared human experience that resonates universally.

Conclusion:

In a world awash in information, visual representation stands as a beacon of clarity and understanding. Whether it’s simplifying complex concepts, conveying data-driven insights, or expressing the depth of human emotion, visual elements enrich our communication in ways that words alone cannot. As we navigate an increasingly visual society, recognizing and harnessing the power of visual representation is not just a skill but a necessity for effective communication and comprehension. So, let us embrace the visual language that surrounds us, unlocking a deeper, more nuanced understanding of the world.

Initial Thoughts

Perspectives & Resources: What is high-quality mathematics instruction and why is it important?

  • The Importance of High-Quality Mathematics Instruction
  • A Standards-Based Mathematics Curriculum
  • Evidence-Based Mathematics Practices

What evidence-based mathematics practices can teachers employ?

  • Explicit, Systematic Instruction
  • Visual Representations
  • Schema Instruction
  • Metacognitive Strategies
  • Effective Classroom Practices
  • References & Additional Resources
  • Credits


Research Shows

  • Students who use accurate visual representations are six times more likely to correctly solve mathematics problems than are students who do not use them. However, students who use inaccurate visual representations are less likely to correctly solve mathematics problems than those who do not use visual representations at all. (Boonen, van Wesel, Jolles, & van der Schoot, 2014)
  • Students with a learning disability (LD) often do not create accurate visual representations or use them strategically to solve problems. Teaching students to systematically use a visual representation to solve word problems has led to substantial improvements in math achievement for students with learning disabilities. (van Garderen, Scheuermann, & Jackson, 2012; van Garderen, Scheuermann, & Poch, 2014)
  • Students who use visual representations to solve word problems are more likely to solve the problems accurately. This was equally true for students who had LD, were low-achieving, or were average-achieving. (Krawec, 2014)

Visual representations are flexible; they can be used across grade levels and types of math problems. They can be used by teachers to teach mathematics facts and by students to learn mathematics content. Visual representations can take a number of forms. Below are some of the visual representations most commonly used by teachers and students.

How does this practice align?

High-Leverage Practice (HLP)

  • HLP15: Provide scaffolded supports

CCSSM: Standards for Mathematical Practice

  • MP1: Make sense of problems and persevere in solving them.

Number Lines

Definition: A straight line that shows the order of and the relation between numbers.

Common Uses: addition, subtraction, counting

Number line from negative 5 to 5.

Strip Diagrams

Definition: A bar divided into rectangles that accurately represent quantities noted in the problem.

Common Uses: addition, fractions, proportions, ratios

Strip diagram divided into thirds, with two-thirds filled in.

Pictures

Definition: Simple drawings of concrete or real items (e.g., marbles, trucks).

Common Uses: counting, addition, subtraction, multiplication, division

Picture showing 2 basketballs plus 3 basketballs.

Graphs/Charts

Definition: Drawings that depict information using lines, shapes, and colors.

Common Uses: comparing numbers, statistics, ratios, algebra

Example bar graph, line graph, and pie chart.

Graphic Organizers

Definition: Visual that assists students in remembering and organizing information, as well as depicting the relationships between ideas (e.g., word webs, tables, Venn diagrams).

Common Uses: algebra, geometry

Triangles

  • Equilateral: all sides are the same length; all angles are 60°
  • Isosceles: two sides are the same length; two angles are the same
  • Scalene: no sides are the same length; no angles are the same
  • Right: one angle is 90° (a right angle); the side opposite the right angle is the longest side (the hypotenuse)
  • Obtuse: one angle is greater than 90°
  • Acute: all angles are less than 90°
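The classifications in the organizer above follow simple rules that can be expressed directly in code. This is an illustrative sketch (the function names are ours, not part of the module):

```python
def classify_by_sides(a: float, b: float, c: float) -> str:
    """Classify a triangle by side lengths: equilateral, isosceles, or scalene."""
    if a == b == c:
        return "equilateral"
    if a == b or b == c or a == c:
        return "isosceles"
    return "scalene"

def classify_by_angles(x: float, y: float, z: float) -> str:
    """Classify a triangle by its angles (in degrees): right, obtuse, or acute."""
    largest = max(x, y, z)
    if largest == 90:
        return "right"
    if largest > 90:
        return "obtuse"
    return "acute"

print(classify_by_sides(3, 4, 5))      # scalene
print(classify_by_angles(90, 53, 37))  # right
```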

Before they can solve problems, however, students must first know what type of visual representation to create and use for a given mathematics problem. Some students, such as high-achieving and gifted students, do this automatically, whereas others need to be explicitly taught how. This is especially the case for students who struggle with mathematics and those with mathematics learning disabilities. Without explicit, systematic instruction on how to create and use visual representations, these students often create visual representations that are disorganized or contain incorrect or partial information. Consider the examples below.

Elementary Example

Mrs. Aldridge asks her first-grade students to add 2 + 4 by drawing dots.

Talia's drawing of 2 plus 4 equals 6.

Notice that Talia gets the correct answer. However, because Colby draws his dots in haphazard fashion, he fails to count all of them and consequently arrives at the wrong solution.

High School Example

Mr. Huang asks his students to solve the following word problem:

The flagpole needs to be replaced. The school would like to replace it with the same size pole. When Juan stands 11 feet from the base of the pole, the angle of elevation from Juan’s feet to the top of the pole is 70 degrees. How tall is the pole?

Compare the drawings below created by Brody and Zoe to represent this problem. Notice that Brody drew an accurate representation and applied the correct strategy. In contrast, Zoe drew a picture with partially correct information. The 11 is in the correct place, but the 70° is not. As a result of her inaccurate representation, Zoe is unable to move forward and solve the problem. However, given an accurate representation developed by someone else, Zoe is more likely to solve the problem correctly.

Brody’s drawing
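Brody’s accurate representation corresponds to a right triangle in which the pole is the side opposite the 70° angle, so the height is the horizontal distance times the tangent of the angle of elevation. A quick check of that setup (a sketch we added; not part of the original module):

```python
import math

def pole_height(distance_ft: float, elevation_deg: float) -> float:
    """Height of the pole: opposite side = adjacent side * tan(angle)."""
    return distance_ft * math.tan(math.radians(elevation_deg))

print(round(pole_height(11, 70), 1))  # about 30.2 feet
```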

Manipulatives

Some students will not be able to grasp mathematics skills and concepts using only the types of visual representations noted in the table above. Very young children and students who struggle with mathematics often require different types of visual representations known as manipulatives. These concrete, hands-on materials and objects—for example, an abacus or coins—help students to represent the mathematical idea they are trying to learn or the problem they are attempting to solve. Manipulatives can help students develop a conceptual understanding of mathematical topics. (For the purpose of this module, the term concrete objects refers to manipulatives and the term visual representations refers to schematic diagrams.)

It is important that the teacher make explicit the connection between the concrete object and the abstract concept being taught. The goal is for the student to eventually understand the concepts and procedures without the use of manipulatives. For secondary students who struggle with mathematics, teachers should show the abstract along with the concrete or visual representation and explicitly make the connection between them.

A move from concrete objects or visual representations to using abstract equations can be difficult for some students. One strategy teachers can use to help students systematically transition among concrete objects, visual representations, and abstract equations is the Concrete-Representational-Abstract (CRA) framework.


Concrete-Representational-Abstract Framework


  • Concrete — Students interact with and manipulate three-dimensional objects (e.g., algebra tiles or other algebra manipulatives with representations of variables and units).
  • Representational — Students use two-dimensional drawings to represent problems. These pictures may be presented by the teacher or the class curriculum, or students may draw their own representations of the problem.
  • Abstract — Students solve problems with numbers, symbols, and words without any concrete or representational assistance.

CRA is effective across all age levels and can assist students in learning concepts, procedures, and applications. When implementing each component, teachers should use explicit, systematic instruction and continually monitor student work to assess their understanding, asking them questions about their thinking and providing clarification as needed. Concrete and representational activities must reflect the actual process of solving the problem so that students are able to generalize the process to solve an abstract equation. The illustration below highlights each of these components.

CRA framework showing a group of 4 and 6 pencils with matching tallies underneath, accompanied by 4 + 6 = 10.
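The representational and abstract stages in the illustration can be mimicked in a couple of lines; a hypothetical sketch (the tally rendering stands in for a student’s drawing):

```python
def representational(a: int, b: int) -> str:
    """Two groups of tally marks, mirroring the drawing stage of CRA."""
    return "|" * a + " + " + "|" * b

def abstract(a: int, b: int) -> str:
    """The abstract equation stage of CRA."""
    return f"{a} + {b} = {a + b}"

print(representational(4, 6))  # |||| + ||||||
print(abstract(4, 6))          # 4 + 6 = 10
```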

For Your Information

One promising practice for quickly moving secondary students with mathematics difficulties or disabilities from the use of manipulatives and visual representations to the abstract equation is the CRA-I strategy. In this modified version of CRA, the teacher simultaneously presents the content using concrete objects, visual representations of the concrete objects, and the abstract equation. Studies have shown that this framework is effective for teaching algebra to this population of students (Strickland & Maccini, 2012; Strickland & Maccini, 2013; Strickland, 2017).

Kim Paulsen discusses the benefits of manipulatives and a number of things to keep in mind when using them (time: 2:35).

Kim Paulsen, EdD Associate Professor, Special Education Vanderbilt University


Transcript: Kim Paulsen, EdD

Manipulatives are a great way of helping kids understand conceptually. The use of manipulatives really helps students see that conceptually, and it clicks a little more with them. Some of the things, though, that we need to remember when we’re using manipulatives is that it is important to give students a little bit of free time when you’re using a new manipulative so that they can just explore with them. We need to have specific rules for how to use manipulatives, that they aren’t toys, that they really are learning materials, and how students pick them up, how they put them away, the right time to use them, and making sure that they’re not distracters while we’re actually doing the presentation part of the lesson. One of the important things is that we don’t want students to memorize the algorithm or the procedures while they’re using the manipulatives. It really is just to help them understand conceptually. That doesn’t mean that kids are automatically going to understand conceptually or be able to make that bridge between using the concrete manipulatives into them being able to solve the problems. For some kids, it is difficult to use the manipulatives. That’s not how they learn, and so we don’t want to force kids to have to use manipulatives if it’s not something that is helpful for them. So we have to remember that manipulatives are one way to think about teaching math.

I think part of the reason that some teachers don’t use them is because it takes a lot of time, it takes a lot of organization, and they also feel that students get too reliant on using manipulatives. One way to think about using manipulatives is that you do it a couple of lessons when you’re teaching a new concept, and then take those away so that students are able to do just the computation part of it. It is true we can’t walk around life with manipulatives in our hands. And I think one of the other reasons that a lot of schools or teachers don’t use manipulatives is because they’re very expensive. And so it’s very helpful if all of the teachers in the school can pool resources and have a manipulative room where teachers can go check out manipulatives so that it’s not so expensive. Teachers have to know how to use them, and that takes a lot of practice.

Open access | Published: 19 July 2015

The role of visual representations in scientific practices: from conceptual understanding and knowledge generation to ‘seeing’ how science works

Maria Evagorou, Sibel Erduran & Terhi Mäntylä

International Journal of STEM Education, volume 2, Article number: 11 (2015)


The use of visual representations (e.g., photographs, diagrams, models) has long been part of science, making it possible for scientists to interact with and represent complex phenomena not observable in other ways. Despite a wealth of research in science education on visual representations, the emphasis of such research has mainly been on conceptual understanding when using visual representations and less on visual representations as epistemic objects. In this paper, we argue that by positioning visual representations as epistemic objects of scientific practices, science education can bring a renewed focus on how visualization contributes to knowledge formation in science from the learners’ perspective.

This is a theoretical paper, and in order to argue about the role of visualization, we first present a case study, that of the discovery of the structure of DNA, which highlights the epistemic components of visual information in science. The second case study focuses on Faraday’s use of the lines of magnetic force. Faraday is known for his exploratory, creative, and yet systematic way of experimenting, and the visual reasoning leading to theoretical development was an inherent part of his experimentation. Third, we trace a contemporary account from science focusing on experimental practices and how the reproducibility of experimental procedures can be reinforced through video data.

Conclusions

Our conclusions suggest that in teaching science, the emphasis in visualization should shift from cognitive understanding—using the products of science to understand the content—to engaging in the processes of visualization. Furthermore, we suggest that it is essential to design curriculum materials and learning environments that create a social and epistemic context and invite students to engage in the practice of visualization as evidence, reasoning, experimental procedure, or a means of communication, and to reflect on these practices. Implications for teacher education include the need for teacher professional development programs to problematize the use of visual representations as epistemic objects that are part of scientific practices.

During the last decades, research and reform documents in science education across the world have been calling for an emphasis not only on the content but also on the processes of science (Bybee 2014; Eurydice 2012; Duschl and Bybee 2014; Osborne 2014; Schwartz et al. 2012), in order to make science accessible to students and enable them to understand the epistemic foundation of science. Scientific practices, part of the process of science, are the cognitive and discursive activities that are targeted in science education to develop epistemic understanding and appreciation of the nature of science (Duschl et al. 2008) and have been the emphasis of recent reform documents in science education across the world (Achieve 2013; Eurydice 2012). With the term scientific practices, we refer to the processes that take place during scientific discoveries and include, among others: asking questions, developing and using models, engaging in arguments, and constructing and communicating explanations (National Research Council 2012). The emphasis on scientific practices aims to move the teaching of science from knowledge to the understanding of the processes and the epistemic aspects of science. Additionally, by placing an emphasis on engaging students in scientific practices, we aim to help students acquire scientific knowledge in meaningful contexts that resemble the reality of scientific discoveries.

Despite a wealth of research in science education on visual representations, the emphasis of such research has mainly been on conceptual understanding when using visual representations and less on visual representations as epistemic objects. In this paper, we argue that by positioning visual representations as epistemic objects, science education can bring a renewed focus on how visualization contributes to knowledge formation in science from the learners’ perspective. Specifically, the use of visual representations (e.g., photographs, diagrams, tables, charts) has been part of science and over the years has evolved with new technologies (e.g., from drawings to advanced digital images and three-dimensional models). Visualization makes it possible for scientists to interact with complex phenomena (Richards 2003), and it might convey important evidence not observable in other ways. Visual representations as a tool to support cognitive understanding in science have been studied extensively (e.g., Gilbert 2010; Wu and Shah 2004). Studies in science education have explored the use of images in science textbooks (e.g., Dimopoulos et al. 2003; Bungum 2008), students’ representations or models when doing science (e.g., Gilbert et al. 2008; Dori et al. 2003; Lehrer and Schauble 2012; Schwarz et al. 2009), and students’ images of science and scientists (e.g., Chambers 1983). Therefore, studies in the field of science education have been using the term visualization as “the formation of an internal representation from an external representation” (Gilbert et al. 2008, p. 4) or as a tool for conceptual understanding for students.

In this paper, we do not refer to visualization as mental image, model, or presentation only (Gilbert et al. 2008; Philips et al. 2010) but instead focus on visual representations or visualization as epistemic objects. Specifically, we refer to visualization as a process for knowledge production and growth in science. In this respect, modeling is an aspect of visualization, but our focus is not on the use of a model as a tool for cognitive understanding (Gilbert 2010; Wu and Shah 2004) but on the process of modeling as a scientific practice, which includes the construction and use of models, the use of other representations, communication within groups through the visual representation, and an appreciation of the difficulties that scientists face in this process. Therefore, the purpose of this paper is to present, through the history of science, how visualization can be considered not only as a cognitive tool in science education but also as an epistemic object that can potentially support students to understand aspects of the nature of science.

Scientific practices and science education

According to the Next Generation Science Standards (Achieve 2013), scientific practices refer to: asking questions and defining problems; developing and using models; planning and carrying out investigations; analyzing and interpreting data; using mathematical and computational thinking; constructing explanations and designing solutions; engaging in argument from evidence; and obtaining, evaluating, and communicating information. A significant aspect of scientific practices is that science learning is more than just about learning facts, concepts, theories, and laws. A fuller appreciation of science necessitates understanding the science relative to its epistemological grounding and the processes that are involved in the production of knowledge (Hogan and Maglienti 2001; Wickman 2004).

The Next Generation Science Standards are, among other changes, shifting away from science inquiry and towards the inclusion of scientific practices (Duschl and Bybee 2014; Osborne 2014). By comparing the abilities to do scientific inquiry (National Research Council 2000) with the set of scientific practices, it is evident that the latter is about engaging in the processes of doing science and thereby experiencing science in a more authentic way. Engaging in scientific practices, according to Osborne (2014), “presents a more authentic picture of the endeavor that is science” (p. 183) and also helps students to develop a deeper understanding of the epistemic aspects of science. Furthermore, as Bybee (2014) argues, by engaging students in scientific practices, we involve them in an understanding of the nature of science and of the nature of scientific knowledge.

The term scientific practices, which emerged from the work of the philosopher of science Kuhn (Osborne 2014), refers to the processes in which scientists engage during knowledge production and communication. The subsequent work of historians, philosophers, and sociologists of science (Latour 2011; Longino 2002; Nersessian 2008) revealed the scientific practices in which scientists engage, including, among others, theory development and specific ways of talking, modeling, and communicating the outcomes of science.

Visualization as an epistemic object

Schematic, pictorial symbols in the design of scientific instruments and the analysis of the perceptual and functional information stored in those images have been areas of investigation in the philosophy of scientific experimentation (Gooding et al. 1993). The nature of visual perception, the relationship between thought and vision, and the role of reproducibility as a norm for experimental research form a central aspect of this domain of research in philosophy of science. For instance, Rothbart (1997) has argued that visualizations are commonplace in the theoretical sciences even if every scientific theory may not be defined by visualized models.

Visual representations (e.g., photographs, diagrams, tables, charts, models) have been used in science over the years to enable scientists to interact with complex phenomena (Richards 2003) and can convey important evidence not observable in other ways (Barber et al. 2006). Some authors (e.g., Ruivenkamp and Rip 2010) have argued that visualization is a core activity of some scientific communities of practice (e.g., nanotechnology), while others (e.g., Lynch and Edgerton 1988) have differentiated the role of particular visualization techniques (e.g., digital image processing in astronomy). Visualization in science includes the complex process through which scientists develop or produce imagery, schemes, and graphical representations; therefore, what is important in this process is not only the result but also the methodology employed by the scientists, namely, how the result was produced. Visual representations in science may refer to objects that are believed to have some kind of material or physical existence but equally might refer to purely mental, conceptual, and abstract constructs (Pauwels 2006). More specifically, visual representations can be found for: (a) phenomena that are not observable with the eye (i.e., microscopic or macroscopic); (b) phenomena that do not exist as visual representations but can be translated as such (e.g., sound); and (c) experimental settings in which they provide visual data representations (e.g., graphs presenting the velocity of moving objects). Additionally, since science is not only about replicating reality but also about making it more understandable to people (either the public or other scientists), visual representations are not only about reproducing nature but also about: (a) helping to solve a problem, (b) filling gaps in our knowledge, and (c) facilitating knowledge building or transfer (Lynch 2006).

Using or developing visual representations in scientific practice can range from a straightforward to a complicated undertaking. More specifically, scientists can observe a phenomenon (e.g., mitosis) and represent it visually using a picture or diagram, which is quite straightforward. But they can also use a variety of complicated techniques (e.g., crystallography in the case of DNA studies) that are either available or need to be developed or refined in order to acquire the visual information that can be used in the process of theory development (Latour and Woolgar 1979). Furthermore, some visual representations need decoding, and scientists need to learn how to read these images (e.g., radiologists); therefore, using visual representations in the process of science requires learning a new language specific to the medium or methods used (e.g., understanding an X-ray picture is different from understanding an MRI scan) and then communicating that language to other scientists and the public.

Visual representations serve many intents and purposes in scientific practices: for example, to make a diagnosis, to compare, describe, and preserve for future study, to verify and explore new territory, to generate new data (Pauwels 2006), or to present new methodologies. According to Latour and Woolgar (1979) and Knorr Cetina (1999), visual representations can be used as primary data (e.g., an image from a microscope), to help in concept development (e.g., the models of DNA used by Watson and Crick), to uncover relationships, and to make the abstract more concrete (e.g., graphs of sound waves). Therefore, visual representations and visual practices, in all forms, are an important aspect of scientific practices in developing, clarifying, and transmitting scientific knowledge (Pauwels 2006).

Methods and results: merging visualization and scientific practices in science

In this paper, we present three case studies that embody the working practices of scientists in an effort to present visualization as a scientific practice and to argue that visualization is a complex process that can include, among other things, modeling and the use of representations, but is not limited to them. The first case study explores the role of visualization in the construction of knowledge about the structure of DNA, using visuals as evidence. The second case study focuses on Faraday’s use of the lines of magnetic force and the visual reasoning leading to the theoretical development that was an inherent part of the experimentation. The third case study focuses on the current practices of scientists in the context of a peer-reviewed journal, the Journal of Visualized Experiments, in which methodology is communicated through videotaped procedures. The three case studies represent the research interests of the three authors of this paper and were chosen to present how visualization as a practice can be involved in all stages of doing science, from hypothesizing and evaluating evidence (case study 1), to experimenting and reasoning (case study 2), to communicating findings and methodology to the research community (case study 3), and in this way represent the three functions of visualization presented by Lynch (2006). Furthermore, the last case study showcases how the development of visualization technologies has contributed to the communication of findings and methodologies in science and in that way presents an aspect of current scientific practices. In all three cases, our approach is guided by the observation that visual information is an integral part of scientific practices and, furthermore, particularly central to them.

Case study 1: the use of visual representations as evidence in the discovery of DNA

The focus of the first case study is the discovery of the structure of DNA. DNA was first isolated in 1869 by Friedrich Miescher, and by the late 1940s it was known to contain phosphate, sugar, and four nitrogen-containing chemical bases. However, no one had figured out the structure of DNA until Watson and Crick presented their model in 1953. Beyond the social aspects of the discovery of DNA, another important aspect was the role of visual evidence in knowledge development in the area. More specifically, by studying the personal accounts of Watson (1968) and Crick (1988) about the discovery of the structure of DNA, the following main ideas regarding the role of visual representations in the production of knowledge can be identified: (a) the use of visual representations was an important part of knowledge growth and was often dependent upon the discovery of new technologies (e.g., better microscopes or better techniques in crystallography that would provide better visual representations as evidence of the helical structure of DNA); and (b) three-dimensional models were used as a way to represent the visual images (X-ray images) and connect them to the evidence provided by other sources to see whether the theory could be supported. Therefore, the model of DNA was built on a combination of visual evidence and experimental data.

An example showcasing the importance of visual representations in the process of knowledge production in this case is provided by Watson, in his book The Double Helix (1968):

…since the middle of the summer Rosy [Rosalind Franklin] had had evidence for a new three-dimensional form of DNA. It occurred when the DNA molecules were surrounded by a large amount of water. When I asked what the pattern was like, Maurice went into the adjacent room to pick up a print of the new form they called the “B” structure. The instant I saw the picture, my mouth fell open and my pulse began to race. The pattern was unbelievably simpler than those previously obtained (A form). Moreover, the black cross of reflections which dominated the picture could arise only from a helical structure. With the A form the argument for the helix was never straightforward, and considerable ambiguity existed as to exactly which type of helical symmetry was present. With the B form however, mere inspection of its X-ray picture gave several of the vital helical parameters. (p. 167-169)

As suggested by Watson’s personal account of the discovery, the photo taken by Rosalind Franklin (Fig.  1 ) convinced him that the DNA molecule must consist of two chains arranged in a paired helix, resembling a spiral staircase or ladder. On March 7, 1953, Watson and Crick finished and presented their model of the structure of DNA (Watson and Berry 2004 ; Watson 1968 ), which was based on the visual information provided by the X-ray image and their knowledge of chemistry.

X-ray crystallography of DNA

In analyzing the visualization practice in this case study, we observe the following instances that highlight how the visual information played a role:

Asking questions and defining problems: The real world in the model of science can at some points only be observed through visual representations; in the case of DNA, for example, the structure was only observable through the crystallography images produced by Rosalind Franklin in the laboratory. There was no other way to observe the structure of DNA, and hence this part of the real world.

Analyzing and interpreting data: The images that resulted from crystallography as well as their interpretations served as the data for the scientists studying the structure of DNA.

Experimenting: The data in the form of visual information were used to predict the possible structure of the DNA.

Modeling: Based on the prediction, an actual three-dimensional model was prepared by Watson and Crick. The first model did not fit with the real world (refuted by Rosalind Franklin and her research group from King’s College) and Watson and Crick had to go through the same process again to find better visual evidence (better crystallography images) and create an improved visual model.

Example excerpts from Watson’s biography provide further evidence for how visualization practices were applied in the context of the discovery of DNA (Table  1 ).

In summary, by examining the history of the discovery of DNA, we showcased how visual data are used as scientific evidence in science, identifying in this way an aspect of the nature of science that is still underexplored in the history of science and that has been ignored in the teaching of science. Visual representations are used in many ways: as images, as models, as evidence to support or rebut a model, and as interpretations of reality.

Case study 2: applying visual reasoning in knowledge production, the example of the lines of magnetic force

The focus of this case study is Faraday’s use of the lines of magnetic force. Faraday is known for his exploratory, creative, and yet systematic way of experimenting, and the visual reasoning leading to theoretical development was an inherent part of this experimentation (Gooding 2006 ). Faraday’s articles and notebooks do not include mathematical formulations; instead, they include images and illustrations, ranging from experimental devices and setups to recapitulations of his theoretical ideas (Nersessian 2008 ). According to Gooding ( 2006 ), “Faraday’s visual method was designed not to copy apparent features of the world, but to analyse and replicate them” (p. 46).

The lines of force played a central role in Faraday’s research on electricity and magnetism and in the development of his “field theory” (Faraday 1852a ; Nersessian 1984 ). Before Faraday, experiments with iron filings around magnets were already known, and the term “magnetic curves” was used both for the iron filing patterns and for the geometrical constructs derived from the mathematical theory of magnetism (Gooding et al. 1993 ). However, Faraday used the lines of force for explaining his experimental observations and for constructing the theory of forces in magnetism and electricity. Examples of Faraday’s different illustrations of lines of magnetic force are given in Fig.  2 . Faraday gave the following experiment-based definition for the lines of magnetic force:

a Iron filing pattern around a bar magnet drawn by Faraday (Faraday 1852b , Plate IX, p. 158, Fig. 1). b Faraday’s drawing of lines of magnetic force around a cylinder magnet, in which the experimental procedure (a knife blade showing the direction of the lines) is combined into the drawing (Faraday 1855, vol. 1, plate 1)

A line of magnetic force may be defined as that line which is described by a very small magnetic needle, when it is so moved in either direction correspondent to its length, that the needle is constantly a tangent to the line of motion; or it is that line along which, if a transverse wire be moved in either direction, there is no tendency to the formation of any current in the wire, whilst if moved in any other direction there is such a tendency; or it is that line which coincides with the direction of the magnecrystallic axis of a crystal of bismuth, which is carried in either direction along it. The direction of these lines about and amongst magnets and electric currents, is easily represented and understood, in a general manner, by the ordinary use of iron filings. (Faraday 1852a , p. 25 (3071))

The definition describes the connection between the experiments and the visual representation of the results. Initially, the lines of force were just geometric representations, but later, Faraday treated them as physical objects (Nersessian 1984 ; Pocovi and Finlay 2002 ):

I have sometimes used the term lines of force so vaguely, as to leave the reader doubtful whether I intended it as a merely representative idea of the forces, or as the description of the path along which the power was continuously exerted. … wherever the expression line of force is taken simply to represent the disposition of forces, it shall have the fullness of that meaning; but that wherever it may seem to represent the idea of the physical mode of transmission of the force, it expresses in that respect the opinion to which I incline at present. The opinion may be erroneous, and yet all that relates or refers to the disposition of the force will remain the same. (Faraday, 1852a , p. 55-56 (3075))

He also felt that the lines of force had greater explanatory power than the dominant theory of action-at-a-distance:

Now it appears to me that these lines may be employed with great advantage to represent nature, condition, direction and comparative amount of the magnetic forces; and that in many cases they have, to the physical reasoner at least, a superiority over that method which represents the forces as concentrated in centres of action… (Faraday, 1852a , p. 26 (3074))

To give some insight into Faraday’s visual reasoning as an epistemic practice, the following examples of Faraday’s studies of the lines of magnetic force (Faraday 1852a , 1852b ) are presented:

(a) Asking questions and defining problems: The iron filing patterns formed the empirical basis for the visual model: the 2D visualization of the lines of magnetic force presented in Fig.  2 . According to Faraday, these iron filing patterns were suitable for illustrating the direction and form of the magnetic lines of force (emphasis added):

It must be well understood that these forms give no indication by their appearance of the relative strength of the magnetic force at different places, inasmuch as the appearance of the lines depends greatly upon the quantity of filings and the amount of tapping; but the direction and forms of these lines are well given, and these indicate, in a considerable degree, the direction in which the forces increase and diminish . (Faraday 1852b , p.158 (3237))

Despite being static and two-dimensional on paper, the lines of magnetic force were dynamic (Nersessian 1992 , 2008 ) and three-dimensional for Faraday (see Fig.  2 b). For instance, Faraday described the lines of force as “expanding”, “bending,” and “being cut” (Nersessian 1992 ). In Fig.  2 b, Faraday summarized his experiment (bar magnet and knife blade) and its results (lines of force) in one picture.

(b) Analyzing and interpreting data: The model was so powerful for Faraday that he ended up thinking of the lines as physical objects (e.g., Nersessian 1984 ), i.e., making interpretations of the way forces act. He conducted many experiments to show the physical existence of the lines of force, but did not succeed (Nersessian 1984 ). The following quote illuminates Faraday’s use of the lines of force in different situations:

The study of these lines has, at different times, been greatly influential in leading me to various results, which I think prove their utility as well as fertility. Thus, the law of magneto-electric induction; the earth’s inductive action; the relation of magnetism and light; diamagnetic action and its law, and magnetocrystallic action, are the cases of this kind… (Faraday 1852a , p. 55 (3174))

(c) Experimenting: Faraday relied heavily on exploratory experiments; in the case of the lines of magnetic force, he used, e.g., iron filings, magnetic needles, and current-carrying wires (see the quote above). The magnetic field is not directly observable, and the representation of the lines of force was a visual model that included the direction, form, and magnitude of the field.

(d) Modeling: There is no denying that the lines of magnetic force are visual by nature. Faraday’s views of the lines of force developed gradually over the years, and he applied and developed them in different contexts such as electromagnetic, electrostatic, and magnetic induction (Nersessian 1984 ). An example of Faraday’s explanation of the effect of the position of wire b on the experiment is given in Fig.  3 , in which a few magnetic lines of force are drawn; in the quote below, Faraday explains the effect using these magnetic lines of force (emphasis added):

Picture of an experiment with different arrangements of wires ( a , b’ , b” ), magnet, and galvanometer. Note the lines of force drawn around the magnet. (Faraday 1852a , p. 34)

It will be evident by inspection of Fig. 3 , that, however the wires are carried away, the general result will, according to the assumed principles of action, be the same; for if a be the axial wire, and b’, b”, b”’ the equatorial wire, represented in three different positions, whatever magnetic lines of force pass across the latter wire in one position, will also pass it in the other, or in any other position which can be given to it. The distance of the wire at the place of intersection with the lines of force, has been shown, by the experiments (3093.), to be unimportant. (Faraday 1852a , p. 34 (3099))

In summary, by examining the history of Faraday’s use of the lines of force, we showed how visual imagery and reasoning played an important part in Faraday’s construction and representation of his “field theory”. As Gooding has stated, “many of Faraday’s sketches are far more than depictions of observation, they are tools for reasoning with and about phenomena” (2006, p. 59).

Case study 3: visualizing scientific methods, the case of a journal

The focus of the third case study is the Journal of Visualized Experiments (JoVE) , a peer-reviewed publication indexed in PubMed. The journal is devoted to the publication of biological, medical, chemical, and physical research in a video format. The journal describes its history as follows:

JoVE was established as a new tool in life science publication and communication, with participation of scientists from leading research institutions. JoVE takes advantage of video technology to capture and transmit the multiple facets and intricacies of life science research. Visualization greatly facilitates the understanding and efficient reproduction of both basic and complex experimental techniques, thereby addressing two of the biggest challenges faced by today's life science research community: i) low transparency and poor reproducibility of biological experiments and ii) time and labor-intensive nature of learning new experimental techniques. ( http://www.jove.com/ )

By examining the journal’s content, we generated a set of categories that can be considered indicators of epistemic practices of science that have relevance for science education. For example, the quote above illustrates how scientists view some norms of scientific practice, including the norms of “transparency” and “reproducibility” of experimental methods and results, and how the visual format of the journal facilitates the implementation of these norms. “Reproducibility” can be considered an epistemic criterion that sits at the heart of what counts as an experimental procedure in science:

Investigating what should be reproducible and by whom leads to different types of experimental reproducibility, which can be observed to play different roles in experimental practice. A successful application of the strategy of reproducing an experiment is an achievement that may depend on certain idiosyncratic aspects of a local situation. Yet a purely local experiment that cannot be carried out by other experimenters and in other experimental contexts will, in the end, be unproductive in science. (Sarkar and Pfeifer 2006 , p.270)

We now turn to an article on “Elevated Plus Maze for Mice” that is available for free on the journal website ( http://www.jove.com/video/1088/elevated-plus-maze-for-mice ). The purpose of the experiment was to investigate anxiety levels in mice through behavioral analysis. The journal article consists of a 9-min video accompanied by text. The video illustrates the handling of the mice in a soundproofed, dimly lit location; worksheets with the characteristics of the mice; the apparatus and other resources; the setting up of the computer software; and the video recording of mouse behavior on the computer. The authors describe the apparatus used in the experiment and state how procedural differences between research groups lead to difficulties in the interpretation of results:

The apparatus consists of open arms and closed arms, crossed in the middle perpendicularly to each other, and a center area. Mice are given access to all of the arms and are allowed to move freely between them. The number of entries into the open arms and the time spent in the open arms are used as indices of open space-induced anxiety in mice. Unfortunately, the procedural differences that exist between laboratories make it difficult to duplicate and compare results among laboratories.

The authors’ emphasis on the particularity of procedural context echoes in the observations of some philosophers of science:

It is not just the knowledge of experimental objects and phenomena but also their actual existence and occurrence that prove to be dependent on specific, productive interventions by the experimenters (Sarkar and Pfeifer 2006 , pp. 270-271)

The inclusion of a video of the experimental procedure specifies what the apparatus looks like (Fig.  4 ) and how the behavior of the mice is captured through video recording that feeds into a computer (Fig.  5 ). Subsequently, computer software captures different variables such as the distance traveled, the number of entries, and the time spent on each arm of the apparatus. Here, there is visual information at different levels of representation, ranging from reconfigurations of raw video data to representations that analyze the data around the variables in question (Fig.  6 ). This practice of working across levels of visual representation is not particular to the biological sciences; for instance, it is commonplace in nanotechnological practices:

Visual illustration of apparatus

Video processing of experimental set-up

Computer software for video input and variable recording

In the visualization processes, instruments are needed that can register the nanoscale and provide raw data, which needs to be transformed into images. Some Imaging Techniques have software incorporated already where this transformation automatically takes place, providing raw images. Raw data must be translated through the use of Graphic Software and software is also used for the further manipulation of images to highlight what is of interest to capture the (inferred) phenomena -- and to capture the reader. There are two levels of choice: Scientists have to choose which imaging technique and embedded software to use for the job at hand, and they will then have to follow the structure of the software. Within such software, there are explicit choices for the scientists, e.g. about colour coding, and ways of sharpening images. (Ruivenkamp and Rip 2010 , pp.14–15)

In the text that accompanies the video, the authors highlight the role of visualization in their experiment:

Visualization of the protocol will promote better understanding of the details of the entire experimental procedure, allowing for standardization of the protocols used in different laboratories and comparisons of the behavioral phenotypes of various strains of mutant mice assessed using this test.

The software that takes the video data and transforms it into various representations allows the researchers to collect data on mouse behavior more reliably. For instance, the distance traveled across the arms of the apparatus or the time spent on each arm would otherwise have been difficult to observe and record precisely. A further aspect to note is how the visualization of the experiment facilitates the control of bias. The authors illustrate how olfactory bias between experimental procedures carried out on mice in sequence is avoided by cleaning the equipment.
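To make the transformation from raw video data to behavioral variables concrete, the following is a minimal sketch of the kind of computation such tracking software might perform. The coordinates, zone labels, and frame rate are our own illustrative assumptions; the actual JoVE software is not described at this level of detail in the article.

```python
# Hypothetical sketch: deriving behavioral variables (distance traveled,
# time spent per maze zone) from per-frame tracking data. All values below
# are invented for illustration.
import math

FRAME_RATE = 30  # frames per second (assumed)

# Each frame: the mouse's tracked centre point (in cm) and the maze zone
# it occupies in that frame.
frames = [
    ((0.0, 0.0), "center"),
    ((0.0, 3.0), "open_arm"),
    ((0.0, 6.0), "open_arm"),
    ((0.0, 3.0), "open_arm"),
    ((0.0, 0.0), "center"),
    ((3.0, 0.0), "closed_arm"),
]

def distance_traveled(frames):
    """Sum of straight-line distances between successive tracked positions."""
    total = 0.0
    for ((x1, y1), _), ((x2, y2), _) in zip(frames, frames[1:]):
        total += math.hypot(x2 - x1, y2 - y1)
    return total

def time_per_zone(frames):
    """Seconds spent in each zone, counting one frame interval per sample."""
    seconds = {}
    for _, zone in frames:
        seconds[zone] = seconds.get(zone, 0.0) + 1.0 / FRAME_RATE
    return seconds

print(distance_traveled(frames))          # total path length: 15.0 cm
print(time_per_zone(frames)["open_arm"])  # ≈ 0.1 s (3 frames at 30 fps)
```

The point of the sketch is that each "level of representation" in the article corresponds to a mechanical step: raw frames become coordinates, coordinates become summary variables, and only the summary variables enter the analysis.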

Our discussion highlights the role of visualization in science, particularly with respect to presenting visualization as part of scientific practices. We have used case studies from the history of science, highlighting scientists’ accounts of how visualization played a role in the discovery of DNA and of the magnetic field, and a contemporary illustration of a science journal’s practice of incorporating visualization as a way to communicate new findings and methodologies. Our implicit aim in drawing on these case studies was to address the need to align science education with scientific practices, particularly in terms of how visual representations, static or dynamic, can engage students in the processes of science rather than being used only as tools for cognitive development in science. Our approach was guided by the notion of “knowledge-as-practice” as advanced by Knorr Cetina ( 1999 ), who studied scientists and characterized their knowledge as practice, a characterization that shifts focus away from ideas inside scientists’ minds to practices that are cultural and deeply contextualized within fields of science. She suggests that people working together can be examined as epistemic cultures whose collective knowledge exists as practice.

It is important to stress, however, that visual representations are not used in isolation but are supported by other types of evidence or by other theories (i.e., in order to understand the helical form of DNA, knowledge of chemistry was needed). More importantly, this finding can also have implications for teaching science as argument (e.g., Erduran and Jimenez-Aleixandre 2008 ), since the verbal evidence used in the science classroom to maintain an argument could be supported by visual evidence (a model, representation, image, graph, etc.). For example, in a group of students discussing the outcomes of an introduced species in an ecosystem, pictures of the species and the ecosystem over time, and videos showing the changes in the ecosystem and the special characteristics of the different species, could serve as visual evidence to help the students support their arguments (Evagorou et al. 2012 ). Therefore, an important implication for the teaching of science is the use of visual representations as evidence in the science curriculum as part of knowledge production. Even though studies in science education have focused on the use of models and modeling as a way to support students in the learning of science (Dori et al. 2003 ; Lehrer and Schauble 2012 ; Mendonça and Justi 2013 ; Papaevripidou et al. 2007 ) or on the use of images (i.e., Korfiatis et al. 2003 ), with the term using visuals as evidence we refer to the collection of all forms of visuals and the processes involved.

Another aspect identified through the case studies is that of visual reasoning (an integral part of Faraday’s investigations). Both verbalization and visualization were part of the process of generating new knowledge (Gooding 2006 ). Even today, most textbooks use the lines of force (or simply field lines) as a geometrical representation of the field, and the number of field lines is connected to the quantity of flux. Often, the textbooks use the same kind of visual imagery as that used by scientists. However, when images are used, only certain aspects or features of the phenomena or data are captured or highlighted, and often in tacit ways. Especially in textbooks, the process of producing the image is not presented; only the product, the image, is left. This could easily lead to the idea that images (i.e., photos, graphs, visual models) are just representations of knowledge or, in the worst case, to misinterpreted representations of knowledge, as the results of Pocovi and Finlay ( 2002 ) on electric field lines show. In order to avoid this, teachers should be able to explain how the images are produced (what features of the phenomena or data an image captures, on what grounds those features are chosen, and what features are omitted); in this way, the role of visualization in knowledge production can be made “visible” to students by engaging them in the process of visualization.

The implications of these norms for science teaching and learning are numerous. Classroom contexts can model the generation, sharing, and evaluation of evidence and the experimental procedures carried out by students, thereby not only promoting some contemporary cultural norms of scientific practice but also enabling the learning of the criteria, standards, and heuristics that scientists use in making decisions on scientific methods. As we have demonstrated with the three case studies, visual representations are part of the process of knowledge growth and communication in science, as shown in two examples from the history of science and one from current scientific practices. Additionally, visual information, especially with the use of technology, is a part of students’ everyday lives. Therefore, we suggest making use of students’ knowledge and technological skills (i.e., how to produce their own videos showing their experimental method, or how to identify or provide appropriate visual evidence for a given topic) in order to teach them aspects of the nature of science that are often neglected both in the history of science and in the design of curricula. Specifically, what we suggest in this paper is that students should actively engage in visualization processes in order to appreciate the diverse nature of doing science and engage in authentic scientific practices.

However, as a word of caution, we need to distinguish the products and processes involved in visualization practices in science:

If one considers scientific representations and the ways in which they can foster or thwart our understanding, it is clear that a mere object approach, which would devote all attention to the representation as a free-standing product of scientific labor, is inadequate. What is needed is a process approach: each visual representation should be linked with its context of production (Pauwels 2006 , p.21).

The aforementioned suggests that the emphasis in visualization should shift from cognitive understanding, using the products of science to understand the content, to engaging in the processes of visualization. Therefore, an implication for the teaching of science is to design curriculum materials and learning environments that create a social and epistemic context and invite students to engage in the practice of visualization as evidence, reasoning, experimental procedure, or a means of communication (as presented in the three case studies), and to reflect on these practices (Ryu et al. 2015 ).

Finally, a question that arises from including visualization in science education, as well as from including scientific practices in science education, is whether teachers themselves are prepared to include them as part of their teaching (Bybee 2014 ). Teacher preparation programs and teacher education have been critiqued, studied, and rethought since the time they emerged (Cochran-Smith 2004 ). Despite this long history, the debate about initial teacher training and its content persists in our community and in policy circles (Cochran-Smith 2004 ; Conway et al. 2009 ). In recent decades, the debate has shifted from a behavioral view of learning and teaching to a view of teaching as a learning problem, focusing in that way not only on teachers’ knowledge, skills, and beliefs but also on connecting these with how, and whether, pupils learn (Cochran-Smith 2004 ). The Science Education in Europe report recommended that “Good quality teachers, with up-to-date knowledge and skills, are the foundation of any system of formal science education” (Osborne and Dillon 2008 , p.9).

However, questions such as what the emphasis in pre-service and in-service science teacher training should be, especially with the new emphasis on scientific practices, remain unanswered. As Bybee ( 2014 ) argues, starting from the new emphasis on scientific practices in the NGSS, we should consider teacher preparation programs “that would provide undergraduates opportunities to learn the science content and practices in contexts that would be aligned with their future work as teachers” (p.218). Therefore, engaging pre-service and in-service teachers in visualization as a scientific practice should be one of the purposes of teacher preparation programs.

Achieve. (2013). The next generation science standards (pp. 1–3). Retrieved from http://www.nextgenscience.org/ .


Barber, J, Pearson, D, & Cervetti, G. (2006). Seeds of science/roots of reading . California: The Regents of the University of California.

Bungum, B. (2008). Images of physics: an explorative study of the changing character of visual images in Norwegian physics textbooks. NorDiNa, 4 (2), 132–141.

Bybee, RW. (2014). NGSS and the next generation of science teachers. Journal of Science Teacher Education, 25 (2), 211–221. doi: 10.1007/s10972-014-9381-4 .


Chambers, D. (1983). Stereotypic images of the scientist: the draw-a-scientist test. Science Education, 67 (2), 255–265.

Cochran-Smith, M. (2004). The problem of teacher education. Journal of Teacher Education, 55 (4), 295–299. doi: 10.1177/0022487104268057 .

Conway, PF, Murphy, R, & Rath, A. (2009). Learning to teach and its implications for the continuum of teacher education: a nine-country cross-national study .

Crick, F. (1988). What a mad pursuit . USA: Basic Books.

Dimopoulos, K, Koulaidis, V, & Sklaveniti, S. (2003). Towards an analysis of visual images in school science textbooks and press articles about science and technology. Research in Science Education, 33 , 189–216.

Dori, YJ, Tal, RT, & Tsaushu, M. (2003). Teaching biotechnology through case studies—can we improve higher order thinking skills of nonscience majors? Science Education, 87 (6), 767–793. doi: 10.1002/sce.10081 .

Duschl, RA, & Bybee, RW. (2014). Planning and carrying out investigations: an entry to learning and to teacher professional development around NGSS science and engineering practices. International Journal of STEM Education, 1 (1), 12. doi: 10.1186/s40594-014-0012-6 .

Duschl, R., Schweingruber, H. A., & Shouse, A. (2008). Taking science to school . Washington DC: National Academies Press.

Erduran, S, & Jimenez-Aleixandre, MP (Eds.). (2008). Argumentation in science education: perspectives from classroom-based research . Dordrecht: Springer.

Eurydice. (2012). Developing key competencies at school in Europe: challenges and opportunities for policy – 2011/12 (pp. 1–72).

Evagorou, M, Jimenez-Aleixandre, MP, & Osborne, J. (2012). “Should we kill the grey squirrels?” A study exploring students’ justifications and decision-making. International Journal of Science Education, 34 (3), 401–428. doi: 10.1080/09500693.2011.619211 .

Faraday, M. (1852a). Experimental researches in electricity. – Twenty-eighth series. Philosophical Transactions of the Royal Society of London, 142 , 25–56.

Faraday, M. (1852b). Experimental researches in electricity. – Twenty-ninth series. Philosophical Transactions of the Royal Society of London, 142 , 137–159.

Gilbert, JK. (2010). The role of visual representations in the learning and teaching of science: an introduction (pp. 1–19).

Gilbert, J., Reiner, M. & Nakhleh, M. (2008). Visualization: theory and practice in science education . Dordrecht, The Netherlands: Springer.

Gooding, D. (2006). From phenomenology to field theory: Faraday’s visual reasoning. Perspectives on Science, 14 (1), 40–65.

Gooding, D, Pinch, T, & Schaffer, S (Eds.). (1993). The uses of experiment: studies in the natural sciences . Cambridge: Cambridge University Press.

Hogan, K, & Maglienti, M. (2001). Comparing the epistemological underpinnings of students’ and scientists’ reasoning about conclusions. Journal of Research in Science Teaching, 38 (6), 663–687.

Knorr Cetina, K. (1999). Epistemic cultures: how the sciences make knowledge . Cambridge: Harvard University Press.

Korfiatis, KJ, Stamou, AG, & Paraskevopoulos, S. (2003). Images of nature in Greek primary school textbooks. Science Education, 88 (1), 72–89. doi: 10.1002/sce.10133 .

Latour, B. (2011). Visualisation and cognition: drawing things together (pp. 1–32).

Latour, B, & Woolgar, S. (1979). Laboratory life: the construction of scientific facts . Princeton: Princeton University Press.

Lehrer, R, & Schauble, L. (2012). Seeding evolutionary thinking by engaging children in modeling its foundations. Science Education, 96 (4), 701–724. doi: 10.1002/sce.20475 .

Longino, H. E. (2002). The fate of knowledge . Princeton: Princeton University Press.

Lynch, M. (2006). The production of scientific images: vision and re-vision in the history, philosophy, and sociology of science. In L Pauwels (Ed.), Visual cultures of science: rethinking representational practices in knowledge building and science communication (pp. 26–40). Lebanon, NH: Dartmouth College Press.

Lynch, M, & Edgerton, SY, Jr. (1988). Aesthetic and digital image processing: representational craft in contemporary astronomy. In G Fyfe & J Law (Eds.), Picturing power: visual depictions and social relations (pp. 184–220). London: Routledge.

Mendonça, PCC, & Justi, R. (2013). An instrument for analyzing arguments produced in modeling-based chemistry lessons. Journal of Research in Science Teaching, 51 (2), 192–218. doi: 10.1002/tea.21133 .

National Research Council (2000). Inquiry and the national science education standards . Washington DC: National Academies Press.

National Research Council (2012). A framework for K-12 science education . Washington DC: National Academies Press.

Nersessian, NJ. (1984). Faraday to Einstein: constructing meaning in scientific theories . Dordrecht: Martinus Nijhoff Publishers.

Book   Google Scholar  

Nersessian, NJ. (1992). How do scientists think? Capturing the dynamics of conceptual change in science. In RN Giere (Ed.), Cognitive Models of Science (pp. 3–45). Minneapolis: University of Minnesota Press.

Nersessian, NJ. (2008). Creating scientific concepts . Cambridge: The MIT Press.

Osborne, J. (2014). Teaching scientific practices: meeting the challenge of change. Journal of Science Teacher Education, 25 (2), 177–196. doi: 10.1007/s10972-014-9384-1 .

Osborne, J. & Dillon, J. (2008). Science education in Europe: critical reflections . London: Nuffield Foundation.

Papaevripidou, M, Constantinou, CP, & Zacharia, ZC. (2007). Modeling complex marine ecosystems: an investigation of two teaching approaches with fifth graders. Journal of Computer Assisted Learning, 23 (2), 145–157. doi: 10.1111/j.1365-2729.2006.00217.x .

Pauwels, L. (2006). A theoretical framework for assessing visual representational practices in knowledge building and science communications. In L Pauwels (Ed.), Visual cultures of science: rethinking representational practices in knowledge building and science communication (pp. 1–25). Lebanon, NH: Darthmouth College Press.

Philips, L., Norris, S. & McNab, J. (2010). Visualization in mathematics, reading and science education . Dordrecht, The Netherlands: Springer.

Pocovi, MC, & Finlay, F. (2002). Lines of force: Faraday’s and students’ views. Science & Education, 11 , 459–474.

Richards, A. (2003). Argument and authority in the visual representations of science. Technical Communication Quarterly, 12 (2), 183–206. doi: 10.1207/s15427625tcq1202_3 .

Rothbart, D. (1997). Explaining the growth of scientific knowledge: metaphors, models and meaning . Lewiston, NY: Mellen Press.

Ruivenkamp, M, & Rip, A. (2010). Visualizing the invisible nanoscale study: visualization practices in nanotechnology community of practice. Science Studies, 23 (1), 3–36.

Ryu, S, Han, Y, & Paik, S-H. (2015). Understanding co-development of conceptual and epistemic understanding through modeling practices with mobile internet. Journal of Science Education and Technology, 24 (2-3), 330–355. doi: 10.1007/s10956-014-9545-1 .

Sarkar, S, & Pfeifer, J. (2006). The philosophy of science, chapter on experimentation (Vol. 1, A-M). New York: Taylor & Francis.

Schwartz, RS, Lederman, NG, & Abd-el-Khalick, F. (2012). A series of misrepresentations: a response to Allchin’s whole approach to assessing nature of science understandings. Science Education, 96 (4), 685–692. doi: 10.1002/sce.21013 .

Schwarz, CV, Reiser, BJ, Davis, EA, Kenyon, L, Achér, A, Fortus, D, et al. (2009). Developing a learning progression for scientific modeling: making scientific modeling accessible and meaningful for learners. Journal of Research in Science Teaching, 46 (6), 632–654. doi: 10.1002/tea.20311 .

Watson, J. (1968). The Double Helix: a personal account of the discovery of the structure of DNA . New York: Scribner.

Watson, J, & Berry, A. (2004). DNA: the secret of life . New York: Alfred A. Knopf.

Wickman, PO. (2004). The practical epistemologies of the classroom: a study of laboratory work. Science Education, 88 , 325–344.

Wu, HK, & Shah, P. (2004). Exploring visuospatial thinking in chemistry learning. Science Education, 88 (3), 465–492. doi: 10.1002/sce.10126 .

Download references

Acknowledgements

The authors would like to acknowledge all reviewers for their valuable comments that have helped us improve the manuscript.

Author information

Authors and affiliations

University of Nicosia, 46, Makedonitissa Avenue, Egkomi, 1700, Nicosia, Cyprus

Maria Evagorou

University of Limerick, Limerick, Ireland

Sibel Erduran

University of Tampere, Tampere, Finland

Terhi Mäntylä


Corresponding author

Correspondence to Maria Evagorou .

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

ME carried out the introductory literature review, the analysis of the first case study, and drafted the manuscript. SE carried out the analysis of the third case study and contributed towards the “Conclusions” section of the manuscript. TM carried out the second case study. All authors read and approved the final manuscript.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License ( https://creativecommons.org/licenses/by/4.0 ), which permits use, duplication, adaptation, distribution, and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


About this article

Cite this article

Evagorou, M., Erduran, S. & Mäntylä, T. The role of visual representations in scientific practices: from conceptual understanding and knowledge generation to ‘seeing’ how science works. IJ STEM Ed 2 , 11 (2015). https://doi.org/10.1186/s40594-015-0024-x


Received : 29 September 2014

Accepted : 16 May 2015

Published : 19 July 2015

DOI : https://doi.org/10.1186/s40594-015-0024-x


  • Visual representations
  • Epistemic practices
  • Science learning


Visualizations That Really Work

  • Scott Berinato


Not long ago, the ability to create smart data visualizations (or dataviz) was a nice-to-have skill for design- and data-minded managers. But now it’s a must-have skill for all managers, because it’s often the only way to make sense of the work they do. Decision making increasingly relies on data, which arrives with such overwhelming velocity, and in such volume, that some level of abstraction is crucial. Thanks to the internet and a growing number of affordable tools, visualization is accessible for everyone—but that convenience can lead to charts that are merely adequate or even ineffective.

By answering just two questions, Berinato writes, you can set yourself up to succeed: Is the information conceptual or data-driven? and Am I declaring something or exploring something? He leads readers through a simple process of identifying which of the four types of visualization they might use to achieve their goals most effectively: idea illustration, idea generation, visual discovery, or everyday dataviz.
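Berinato's two questions define a simple 2 × 2 decision procedure. As a minimal sketch (the function name and boolean encoding are my own, not from the article), the mapping onto his four types might look like:

```python
def classify_dataviz(conceptual: bool, declarative: bool) -> str:
    """Map Berinato's two questions onto his four visualization types.

    conceptual:  True if the information is conceptual, False if data-driven.
    declarative: True if you are declaring something, False if exploring.
    """
    if conceptual and declarative:
        return "idea illustration"   # communicating a concept
    if conceptual:
        return "idea generation"     # exploring a concept
    if declarative:
        return "everyday dataviz"    # communicating data
    return "visual discovery"        # exploring data
```

For example, `classify_dataviz(False, True)` classifies a routine status chart as everyday dataviz.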

This article is adapted from the author’s just-published book, Good Charts: The HBR Guide to Making Smarter, More Persuasive Data Visualizations.

Know what message you’re trying to communicate before you get down in the weeds.

Idea in Brief

Knowledge workers need greater visual literacy than they used to, because so much data—and so many ideas—are now presented graphically. But few of us have been taught data-visualization skills.

Tools Are Fine…

Inexpensive tools allow anyone to perform simple tasks such as importing spreadsheet data into a bar chart. But that means it’s easy to create terrible charts. Visualization can be so much more: It’s an agile, powerful way to explore ideas and communicate information.

…But Strategy Is Key

Don’t jump straight to execution. Instead, first think about what you’re representing—ideas or data? Then consider your purpose: Do you want to inform, persuade, or explore? The answers will suggest what tools and resources you need.

Not long ago, the ability to create smart data visualizations, or dataviz, was a nice-to-have skill. For the most part, it benefited design- and data-minded managers who made a deliberate decision to invest in acquiring it. That’s changed. Now visual communication is a must-have skill for all managers, because more and more often, it’s the only way to make sense of the work they do.

  • Scott Berinato is a senior editor at Harvard Business Review and the author of Good Charts Workbook: Tips Tools, and Exercises for Making Better Data Visualizations and Good Charts: The HBR Guide to Making Smarter, More Persuasive Data Visualizations .




Creating visual explanations improves learning

Eliza Bobek

1 University of Massachusetts Lowell, Lowell, MA USA

Barbara Tversky

2 Stanford University, Columbia University Teachers College, New York, NY USA


Many topics in science are notoriously difficult for students to learn. Mechanisms and processes outside student experience present particular challenges. While instruction typically involves visualizations, students usually explain in words. Because visual explanations can show parts and processes of complex systems directly, creating them should have benefits beyond creating verbal explanations. We compared learning from creating visual or verbal explanations for two STEM domains, a mechanical system (bicycle pump) and a chemical system (bonding). Both kinds of explanations were analyzed for content, and learning was assessed by a post-test. For the mechanical system, creating a visual explanation increased understanding, particularly for participants of low spatial ability. For the chemical system, creating both visual and verbal explanations improved learning without new teaching. Creating a visual explanation was superior and benefitted participants of both high and low spatial ability. Visual explanations often included crucial yet invisible features. The greater effectiveness of visual explanations appears attributable to the checks they provide for completeness and coherence as well as to their roles as platforms for inference. The benefits should generalize to other domains like the social sciences, history, and archeology where important information can be visualized. Together, the findings provide support for the use of learner-generated visual explanations as a powerful learning tool.

Electronic supplementary material

The online version of this article (doi:10.1186/s41235-016-0031-6) contains supplementary material, which is available to authorized users.

Significance

Uncovering cognitive principles for effective teaching and learning is a central application of cognitive psychology. Here we show: (1) creating explanations of STEM phenomena improves learning without additional teaching; and (2) creating visual explanations is superior to creating verbal ones. There are several notable differences between visual and verbal explanations; visual explanations map thought more directly than words and provide checks for completeness and coherence as well as a platform for inference, notably from structure to process. Extensions of the technique to other domains should be possible. Creating visual explanations is likely to enhance students’ spatial thinking skills, skills that are increasingly needed in the contemporary and future world.

Dynamic systems such as those in science and engineering, but also in history, politics, and other domains, are notoriously difficult to learn (e.g. Chi, DeLeeuw, Chiu, & Lavancher, 1994 ; Hmelo-Silver & Pfeffer, 2004 ; Johnstone, 1991 ; Perkins & Grotzer, 2005 ). Mechanisms, processes, and behavior of complex systems present particular challenges. Learners must master not only the individual components of the system or process (structure) but also the interactions and mechanisms (function), which may be complex and frequently invisible. If the phenomena are macroscopic, sub-microscopic, or abstract, there is an additional level of difficulty. Although the teaching of STEM phenomena typically relies on visualizations, such as pictures, graphs, and diagrams, learning is typically revealed in words, both spoken and written. Visualizations have many advantages over verbal explanations for teaching; can creating visual explanations promote learning?

Learning from visual representations in STEM

Given the inherent challenges in teaching and learning complex or invisible processes in science, educators have developed ways of representing these processes to enable and enhance student understanding. External visual representations, including diagrams, photographs, illustrations, flow charts, and graphs, are often used in science to both illustrate and explain concepts (e.g., Hegarty, Carpenter, & Just, 1990 ; Mayer, 1989 ). Visualizations can directly represent many structural and behavioral properties. They also help to draw inferences (Larkin & Simon, 1987 ), find routes in maps (Levine, 1982 ), spot trends in graphs (Kessell & Tversky, 2011 ; Zacks & Tversky, 1999 ), imagine traffic flow or seasonal changes in light from architectural sketches (e.g. Tversky & Suwa, 2009 ), and determine the consequences of movements of gears and pulleys in mechanical systems (e.g. Hegarty & Just, 1993 ; Hegarty, Kriz, & Cate, 2003 ). The use of visual elements such as arrows is another benefit to learning with visualizations. Arrows are widely produced and comprehended as representing a range of kinds of forces as well as changes over time (e.g. Heiser & Tversky, 2002 ; Tversky, Heiser, MacKenzie, Lozano, & Morrison, 2007 ). Visualizations are thus readily able to depict the parts and configurations of systems; presenting the same content via language may be more difficult. Although words can describe spatial properties, because the correspondences of meaning to language are purely symbolic, comprehension and construction of mental representations from descriptions is far more effortful and error prone (e.g. Glenberg & Langston, 1992 ; Hegarty & Just, 1993 ; Larkin & Simon, 1987 ; Mayer, 1989 ). Given the differences in how visual and verbal information is processed, how learners draw inferences and construct understanding in these two modes warrants further investigation.

Benefits of generating explanations

Learner-generated explanations of scientific phenomena may be an important learning strategy to consider beyond the utility of learning from a provided external visualization. Explanations convey information about concepts or processes with the goal of making clear and comprehensible an idea or set of ideas. Explanations may involve a variety of elements, such as the use of examples and analogies (Roscoe & Chi, 2007 ). When explaining something new, learners may have to think carefully about the relationships between elements in the process and prioritize the multitude of information available to them. Generating explanations may require learners to reorganize their mental models by allowing them to make and refine connections between and among elements and concepts. Explaining may also help learners metacognitively address their own knowledge gaps and misconceptions.

Many studies have shown that learning is enhanced when students are actively engaged in creative, generative activities (e.g. Chi, 2009 ; Hall, Bailey, & Tillman, 1997 ). Generative activities have been shown to benefit comprehension of domains involving invisible components, including electric circuits (Johnson & Mayer, 2010 ) and the chemistry of detergents (Schwamborn, Mayer, Thillmann, Leopold, & Leutner, 2010 ). Wittrock’s ( 1990 ) generative theory stresses the importance of learners actively constructing and developing relationships. Generative activities require learners to select information and choose how to integrate and represent the information in a unified way. When learners make connections between pieces of information, knowledge, and experience, by generating headings, summaries, pictures, and analogies, deeper understanding develops.

The information learners draw upon to construct their explanations is likely important. For example, Ainsworth and Loizou ( 2003 ) found that asking participants to self-explain with a diagram resulted in greater learning than self-explaining from text. How might learners explain with physical mechanisms or materials with multi-modal information?

Generating visual explanations

Learner-generated visualizations have been explored in several domains. Gobert and Clement ( 1999 ) investigated the effectiveness of student-generated diagrams versus student-generated summaries on understanding plate tectonics after reading an expository text. Students who generated diagrams scored significantly higher on a post-test measuring spatial and causal/dynamic content, even though the diagrams contained less domain-related information. Hall et al. ( 1997 ) showed that learners who generated their own illustrations from text performed equally as well as learners provided with text and illustrations. Both groups outperformed learners only provided with text. In a study concerning the law of conservation of energy, participants who generated drawings scored higher on a post-test than participants who wrote their own narrative of the process (Edens & Potter, 2003 ). In addition, the quality and number of concept units present in the drawing/science log correlated with performance on the post-test. Van Meter ( 2001 ) found that drawing while reading a text about Newton’s Laws was more effective than answering prompts in writing.

One aspect to explore is whether visual and verbal productions contain different types of information. Learning advantages for the generation of visualizations could be attributed to learners’ translating across modalities, from a verbal format into a visual format. Translating verbal information from the text into a visual explanation may promote deeper processing of the material and more complete and comprehensive mental models (Craik & Lockhart, 1972 ). Ainsworth and Iacovides ( 2005 ) addressed this issue by asking two groups of learners to self-explain while learning about the circulatory system of the human body. Learners given diagrams were asked to self-explain in writing, and learners given text were asked to explain using a diagram. The results showed no overall differences in learning outcomes; however, the learners given text included significantly more information in their diagrams than the other group did. Aleven and Koedinger ( 2002 ) argue that explanations are most helpful if they can integrate visual and verbal information. Translating across modalities may serve this purpose, although translating is not necessarily an easy task (Ainsworth, Bibby, & Wood, 2002 ).

It is important to remember that not all studies have found advantages to generating explanations. Wilkin ( 1997 ) found that directions to self-explain using a diagram hindered understanding of examples of physical motion when students were presented with text and instructed to draw a diagram. She argues that the diagrams encouraged learners to connect familiar but unrelated knowledge. In particular, “low benefit learners” in her study inappropriately used spatial adjacency and location to connect parts of diagrams, instead of the particular properties of those parts. Wilkin argues that these learners are novices and that experts may not make the same mistake since they have the skills to analyze features of a diagram according to their relevant properties. She also argues that the benefits of self-explaining are highest when the learning activity is constrained so that learners are limited in their possible interpretations. Other studies that have not found a learning advantage from generating drawings have in common an absence of support for the learner (Alesandrini, 1981 ; Leutner, Leopold, & Sumfleth, 2009 ). Another mediating factor may be the learner’s spatial ability.

The role of spatial ability

Spatial thinking involves objects, their size, location, shape, their relation to one another, and how and where they move through space. How then, might learners with different levels of spatial ability gain structural and functional understanding in science and how might this ability affect the utility of learner-generated visual explanations? Several lines of research have sought to explore the role of spatial ability in learning science. Kozhevnikov, Hegarty, and Mayer ( 2002 ) found that low spatial ability participants interpreted graphs as pictures, whereas high spatial ability participants were able to construct more schematic images and manipulate them spatially. Hegarty and Just ( 1993 ) found that the ability to mentally animate mechanical systems correlated with spatial ability, but not verbal ability. In their study, low spatial ability participants made more errors in movement verification tasks. Leutner et al. ( 2009 ) found no effect of spatial ability on the effectiveness of drawing compared to mentally imagining text content. Mayer and Sims ( 1994 ) found that spatial ability played a role in participants’ ability to integrate visual and verbal information presented in an animation. The authors argue that their results can be interpreted within the context of dual-coding theory. They suggest that low spatial ability participants must devote large amounts of cognitive effort into building a visual representation of the system. High spatial ability participants, on the other hand, are more able to allocate sufficient cognitive resources to building referential connections between visual and verbal information.

Benefits of testing

Although not presented that way, creating an explanation could be regarded as a form of testing. Considerable research has documented positive effects of testing on learning. Presumably taking a test requires retrieving and sometimes integrating the learned material and those processes can augment learning without additional teaching or study (e.g. Roediger & Karpicke, 2006 ; Roediger, Putnam, & Smith, 2011 ; Wheeler & Roediger, 1992 ). Hausmann and Vanlehn ( 2007 ) addressed the possibility that generating explanations is beneficial because learners merely spend more time with the content material than learners who are not required to generate an explanation. In their study, they compared the effects of using instructions to self-explain with instructions to merely paraphrase physics (electrodynamics) material. Attending to provided explanations by paraphrasing was not as effective as generating explanations as evidenced by retention scores on an exam 29 days after the experiment and transfer scores within and across domains. Their study concludes, “the important variable for learning was the process of producing an explanation” (p. 423). Thus, we expect benefits from creating either kind of explanation but for the reasons outlined previously, we expect larger benefits from creating visual explanations.

Present experiments

This study set out to answer a number of related questions about the role of learner-generated explanations in learning and understanding of invisible processes. (1) Do students learn more when they generate visual or verbal explanations? We anticipate that learning will be greater with the creation of visual explanations, as they encourage completeness and the integration of structure and function. (2) Does the inclusion of structural and functional information correlate with learning as measured by a post-test? We predict that including greater counts of information, particularly invisible and functional information, will positively correlate with higher post-test scores. (3) Does spatial ability predict the inclusion of structural and functional information in explanations, and does spatial ability predict post-test scores? We predict that high spatial ability participants will include more information in their explanations, and will score higher on post-tests.

Experiment 1

The first experiment examines the effects of creating visual or verbal explanations on the comprehension of a bicycle tire pump’s operation in participants with low and high spatial ability. Although the pump itself is not invisible, the components crucial to its function, notably the inlet and outlet valves, and the movement of air, are located inside the pump. It was predicted that visual explanations would include more information than verbal explanations, particularly structural information, since their construction encourages completeness and the production of a whole mechanical system. It was also predicted that functional information would be biased towards a verbal format, since much of the function of the pump is hidden and difficult to express in pictures. Finally, it was predicted that high spatial ability participants would be able to produce more complete explanations and would thus also demonstrate better performance on the post-test. Explanations were coded for structural and functional content, essential features, invisible features, arrows, and multiple steps.

Participants

Participants were 127 (59 female) seventh and eighth grade students, aged 12–14 years, enrolled in an independent school in New York City. The school’s student body is 70% white, 30% other ethnicities. Approximately 25% of the student body receives financial aid. The sample consisted of three class sections of seventh grade students and three class sections of eighth grade students. Both seventh and eighth grade classes were integrated science (earth, life, and physical sciences) and students were not grouped according to ability in any section. Written parental consent was obtained by means of signed informed consent forms. Each participant was randomly assigned to one of two conditions within each class. The 64 participants in the visual condition explained the bicycle pump’s function by drawing, and the 63 participants in the verbal condition explained it in writing.

The materials consisted of a 12-inch Spalding bicycle pump, a blank 8.5 × 11 in. sheet of paper, and a post-test (Additional file 1 ). The pump’s chamber and hose were made of clear plastic; the handle and piston were black plastic. The parts of the pump (e.g. inlet valve, piston) were labeled.

Spatial ability was assessed using the Vandenberg and Kuse ( 1978 ) mental rotation test (MRT). The MRT is a 20-item test in which two-dimensional drawings of three-dimensional objects are compared. Each item consists of one “target” drawing and four drawings that are to be compared to the target. Two of the four drawings are rotated versions of the target drawing and the other two are not. The task is to identify the two rotated versions of the target. A score was determined by assigning one point to each question if both of the correct rotated versions were chosen. The maximum score was 20 points.
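The MRT scoring rule described above (a point only when both rotated versions of the target are identified) can be sketched as a small function. The data representation, lists of sets of chosen option indices, is an illustrative assumption, since the actual test is administered on paper:

```python
def score_mrt(responses, answer_key):
    """Score the mental rotation test: one point per item only if BOTH
    correct rotated drawings are selected (maximum 20 for 20 items).

    responses:  list of sets of option indices chosen by the participant.
    answer_key: list of sets holding the two correct indices per item.
    """
    return sum(1 for chosen, correct in zip(responses, answer_key)
               if chosen == correct)
```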

The post-test consisted of 16 true/false questions printed on a single sheet of paper measuring 8.5 × 11 in. Half of the questions related to the structure of the pump and the other half related to its function. The questions were adapted from Heiser and Tversky ( 2002 ) in order to be clear and comprehensible for this age group.

The experiment was conducted over the course of two non-consecutive days during the normal school day and during regularly scheduled class time. On the first day, participants completed the MRT as a whole-class activity. After completing an untimed practice test, they were given 3 min for each of the two parts of the MRT. On the second day, occurring between two and four days after completing the MRT, participants were individually asked to study an actual bicycle tire pump and were then asked to generate explanations of its function. The participants were tested individually in a quiet room away from the rest of the class. In addition to the pump, each participant was given one instruction sheet and one blank sheet of paper for their explanations. The post-test was given upon completion of the explanation. The instruction sheet was read aloud to participants and they were instructed to read along. The first set of instructions was as follows: “A bicycle pump is a mechanical device that pumps air into bicycle tires. First, take this bicycle pump and try to understand how it works. Spend as much time as you need to understand the pump.” The next set of instructions differed for participants in each condition. The instructions for the visual condition were as follows: “Then, we would like you to draw your own diagram or set of diagrams that explain how the bike pump works. Draw your explanation so that someone else who has not seen the pump could understand the bike pump from your explanation. Don’t worry about the artistic quality of the diagrams; in fact, if something is hard for you to draw, you can explain what you would draw. What’s important is that the explanation should be primarily visual, in a diagram or diagrams.” The instructions for the verbal condition were as follows: “Then, we would like you to write an explanation of how the bike pump works. Write your explanation so that someone else who has not seen the pump could understand the bike pump from your explanation.” All participants then received these instructions: “You may not use the pump while you create your explanations. Please return it to me when you are ready to begin your explanation. When you are finished with the explanation, you will hand in your explanation to me and I will then give you 16 true/false questions about the bike pump. You will not be able to look at your explanation while you complete the questions.” Study and test were untimed. All students finished within the 45-min class period.

Spatial ability

The mean score on the MRT was 10.56, with a median of 11. Boys scored significantly higher (M = 13.5, SD = 4.4) than girls (M = 8.8, SD = 4.5), F(1, 126) = 19.07, p  < 0.01, a typical finding (Voyer, Voyer, & Bryden, 1995 ). Participants were split into high or low spatial ability by the median. Low and high spatial ability participants were equally distributed in the visual and verbal groups.
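The median split described above can be sketched as follows. How scores falling exactly at the median were assigned is not stated in the text, so treating them as “low” here is an assumption:

```python
from statistics import median

def median_split(scores):
    """Label each participant 'low' or 'high' spatial ability relative to
    the sample median (ties at the median counted as 'low' -- assumption)."""
    m = median(scores)
    return ["low" if s <= m else "high" for s in scores]
```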

Learning outcomes

It was predicted that high spatial ability participants would be better able to mentally animate the bicycle pump system and therefore score higher on the post-test, and that post-test scores would be higher for those who created visual explanations. Table  1 shows the scores on the post-test by condition and spatial ability. A two-way factorial ANOVA revealed a marginally significant main effect of spatial ability, F(1, 124) = 3.680, p  = 0.06, with high spatial ability participants scoring higher on the post-test. There was also a significant interaction between spatial ability and explanation type, F(1, 124) = 4.094, p  < 0.05; see Fig.  1 . Creating a visual explanation of the bicycle pump selectively helped low spatial ability participants.

Post-test scores, by explanation type and spatial ability

Explanation type:   Visual          Verbal          Total
Spatial ability     Mean    SD      Mean    SD      Mean    SD
Low                 11.45   1.93    9.75    2.31    10.60   2.27
High                11.20   1.47    11.60   1.80    11.42   1.65
Total               11.3    1.71    10.74   2.23

Fig. 1. Scores on the post-test by condition and spatial ability

Coding explanations

Explanations (see Fig.  2 ) were coded for structural and functional content, essential features, invisible features, arrows, and multiple steps. A subset of the explanations (20%) was coded by the first author and another researcher using the same coding system as a guide. The agreement between scores was above 90% for all measures. Disagreements were resolved through discussion. The first author then scored the remaining explanations.

Fig. 2. Examples of visual and verbal explanations of the bicycle pump

Coding for structure and function

A maximum score of 12 points was awarded for the inclusion and labeling of six structural components: chamber, piston, inlet valve, outlet valve, handle, and hose. For the visual explanations, 1 point was given for a component drawn correctly and 1 additional point if the component was labeled correctly. For verbal explanations, sentences were divided into propositions, the smallest unit of meaning in a sentence. Descriptions of structural location e.g. “at the end of the piston is the inlet valve,” or of features of the components, e.g. the shape of a part, counted as structural components. Information was coded as functional if it depicted (typically with an arrow) or described the function/movement of an individual part, or the way multiple parts interact. No explanation contained more than ten functional units.
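The 12-point structural scoring scheme described above can be expressed as a simple tally. This is a sketch with hypothetical data; the component names come from the text, but the function and data structures are ours.

```python
# The six structural components of the pump named in the coding scheme
COMPONENTS = ["chamber", "piston", "inlet valve",
              "outlet valve", "handle", "hose"]

def score_structure(drawn, labeled):
    """Award 1 point per component drawn correctly and 1 additional
    point if that component is also labeled correctly (max 12)."""
    score = 0
    for c in COMPONENTS:
        if c in drawn:
            score += 1
            if c in labeled:
                score += 1
    return score

# Hypothetical visual explanation: all six parts drawn, four labeled
s = score_structure(set(COMPONENTS),
                    {"chamber", "piston", "handle", "hose"})  # s == 10
```

For verbal explanations, the same tally would be applied to propositions rather than drawn parts, as described in the text.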

Visual explanations contained significantly more structural components (M = 6.05, SD = 2.76) than verbal explanations (M = 4.27, SD = 1.54), F(1, 126) = 20.53, p < 0.05. The number of functional components did not differ between visual and verbal explanations, as displayed in Figs. 3 and 4. Many visual explanations (67%) contained verbal components; the structural and functional information in explanations was coded as depictive or descriptive. Structural and functional information were equally likely to be expressed in words or pictures in visual explanations. It was predicted that explanations created by high spatial participants would include more functional information. However, there were no significant differences between low spatial (M = 5.15, SD = 2.21) and high spatial (M = 4.62, SD = 2.16) participants in the number of structural units, or between low spatial (M = 3.83, SD = 2.51) and high spatial (M = 4.10, SD = 2.13) participants in the number of functional units.

Fig. 3. Average number of structural and functional components in visual and verbal explanations

Fig. 4. Visual and verbal explanations of chemical bonding

Coding of essential features

To further establish a relationship between the explanations generated and outcomes on the post-test, explanations were also coded for the inclusion of information essential to the pump’s function according to a 4-point scale (adapted from Hall et al., 1997). One point was given if both the inlet and the outlet valve were clearly present in the drawing or described in writing, 1 point was given if the piston inserted into the chamber was shown or described to be airtight, and 1 point was given for each of the two valves if they were shown or described to be opening/closing in the correct direction.

Visual explanations contained significantly more essential information (M = 1.78, SD = 1.0) than verbal explanations (M = 1.20, SD = 1.21), F(1, 126) = 7.63, p < 0.05. Inclusion of essential features correlated positively with post-test scores, r = 0.197, p < 0.05.

Coding arrows and multiple steps

For the visual explanations, three uses of arrows were coded and tallied: labeling a part or action, showing motion, or indicating sequence. Analysis of visual explanations revealed that 87% contained arrows. No significant differences were found between low and high spatial participants’ use of arrows to label, and no significant correlations were found between the use of arrows and learning outcomes measured on the post-test.

The explanations were coded for the number of discrete steps used to explain the process of using the bike pump. The number of steps used by participants ranged from one to six. Participants whose explanations, whether verbal or visual, contained multiple steps scored significantly higher (M = 0.76, SD = 0.18) on the post-test than participants whose explanations consisted of a single step (M = 0.67, SD = 0.19), F(1, 126) = 5.02, p  < 0.05.

Coding invisible features

The bicycle tire pump, like many mechanical devices, contains several structural features that are hidden or invisible and must be inferred from the function of the pump. For the bicycle pump, the invisible features are the inlet and outlet valves and the three phases of the movement of air: entering the pump, moving through the pump, and exiting the pump. Each feature received 1 point, for a total of 5 possible points.

The mean score for the inclusion of invisible features was 3.26 (SD = 1.25). The data were analyzed using linear regression, which revealed that the total score for invisible parts significantly predicted scores on the post-test, F(1, 118) = 3.80, p = 0.05.
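The simple linear regression underlying this result (invisible-feature score predicting post-test score) can be sketched in pure Python. The data below are hypothetical and for illustration only; the function name is ours.

```python
def ols_fit(x, y):
    """Ordinary least-squares fit of y = a + b*x for paired observations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx          # slope
    a = my - b * mx        # intercept
    return a, b

# Hypothetical pairs: invisible-feature score (0-5) vs. post-test score
invisible = [1, 2, 3, 4, 5]
posttest = [6.0, 6.5, 7.5, 8.0, 9.0]
a, b = ols_fit(invisible, posttest)  # b > 0: more invisible features, higher scores
```

A significance test of the slope (the reported F statistic) would additionally require the residual variance, which this sketch omits.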

In the first experiment, students learned the workings of a bicycle pump from interacting with an actual pump and creating a visual or verbal explanation of its function. Understanding the functionality of a bike pump depends on the actions and consequences of parts that are not visible. Overall, the results provide support for the use of learner-generated visual explanations in developing understanding of a new scientific system. The results show that low spatial ability participants were able to learn as successfully as high spatial ability participants when they first generated an explanation in a visual format.

Visual explanations may have led to greater understanding for a number of reasons. As discussed previously, visual explanations encourage completeness. They force learners to decide on the size, shape, and location of parts/objects. Understanding the “hidden” function of the invisible parts is key to understanding the function of the entire system and requires an understanding of how both the visible and invisible parts interact. The visual format may have been able to elicit components and concepts that are invisible and difficult to integrate into the formation of a mental model. The results show that including more of the essential features and showing multiple steps correlated with superior test performance. Understanding the bicycle pump requires understanding how all of these components are connected through movement, force, and function. Many (67%) of the visual explanations also contained written components to accompany their explanation. Arguably, some types of information may be difficult to depict visually and verbal language has many possibilities that allow for specificity. The inclusion of text as a complement to visual explanations may be key to the success of learner-generated explanations and the development of understanding.

A limitation of this experiment is that participants were not provided with detailed instructions for completing their explanations. In addition, this experiment does not fully clarify the role of spatial ability, since high spatial participants in the visual and verbal groups demonstrated equivalent knowledge of the pump on the post-test. One possibility is that the interaction with the bicycle pump prior to generating explanations was a sufficient learning experience for the high spatial participants. Other researchers (e.g. Flick, 1993 ) have shown that hands-on interactive experiences can be effective learning situations. High spatial ability participants may be better able to imagine the movement and function of a system (e.g. Hegarty, 1992 ).

Experiment 1 examined learning a mechanical system with invisible (hidden) parts. Participants were introduced to the system by being able to interact with an actual bicycle pump. While we did not assess participants’ prior knowledge of the pump with a pre-test, participants were randomly assigned to each condition. The findings have promising implications for teaching. Creating visual explanations should be an effective way to improve performance, especially in low spatial students. Instructors can guide the creation of visual explanations toward the features that augment learning. For example, students can be encouraged to show every step and action and to focus on the essential parts, even if invisible. The coding system shows that visual explanations can be objectively evaluated to provide feedback on students’ understanding. The utility of visual explanations may differ for scientific phenomena that are more abstract, or contain elements that are invisible due to their scale. Experiment 2 addresses this possibility by examining a sub-microscopic area of science: chemical bonding.

Experiment 2

In this experiment, we examine visual and verbal explanations in an area of chemistry: ionic and covalent bonding. Chemistry is often regarded as a difficult subject; one of the essential or inherent features of chemistry which presents difficulty is the interplay between the macroscopic, sub-microscopic, and representational levels (e.g. Bradley & Brand, 1985 ; Johnstone, 1991 ; Taber, 1997 ). In chemical bonding, invisible components engage in complex processes whose scale makes them impossible to observe. Chemists routinely use visual representations to investigate relationships and move between the observable, physical level and the invisible particulate level (Kozma, Chin, Russell, & Marx, 2002 ). Generating explanations in a visual format may be a particularly useful learning tool for this domain.

For this topic, we expect that creating a visual rather than verbal explanation will aid students of both high and low spatial abilities. Visual explanations demand completeness; they were predicted to include more information than verbal explanations, particularly structural information. The inclusion of functional information should lead to better performance on the post-test since understanding how and why atoms bond is crucial to understanding the process. Participants with high spatial ability may be better able to explain function since the sub-microscopic nature of bonding requires mentally imagining invisible particles and how they interact. This experiment also asks whether creating an explanation per se can increase learning in the absence of additional teaching by administering two post-tests of knowledge, one immediately following instruction but before creating an explanation and one after creating an explanation. The scores on this immediate post-test were used to confirm that the visual and verbal groups were equivalent prior to the generation of explanations. Explanations were coded for structural and functional information, arrows, specific examples, and multiple representations. Do the acts of selecting, integrating, and explaining knowledge serve learning even in the absence of further study or teaching?

Participants were 126 (58 female) eighth grade students, aged 13–14 years, with written parental consent and enrolled in the same independent school described in Experiment 1. None of the students previously participated in Experiment 1. As in Experiment 1, randomization occurred within-class, with participants assigned to either the visual or verbal explanation condition.

The materials consisted of the MRT (same as Experiment 1), a video lesson on chemical bonding, two versions of the instructions, the immediate post-test, the delayed post-test, and a blank page for the explanations. All paper materials were typed on 8.5 × 11 in. sheets of paper. Both immediate and delayed post-tests consisted of seven multiple-choice items and three free-response items. The video lesson on chemical bonding was 13 min 22 s long. The video began with a brief review of atoms and their structure and introduced the idea that atoms combine to form molecules. Next, the lesson showed that location in the periodic table reveals the behavior and reactivity of atoms, in particular the gain, loss, or sharing of electrons. Examples of atoms, their valence shell structure, stability, charges, transfer and sharing of electrons, and the formation of ionic, covalent, and polar covalent bonds were discussed. The example of NaCl (table salt) was used to illustrate ionic bonding and the examples of O2 and H2O (water) were used to illustrate covalent bonding. Information was presented verbally, accompanied by drawings, written notes of keywords and terms, and a color-coded periodic table.

On the first of three non-consecutive school days, participants completed the MRT as a whole-class activity. On the second day (occurring between two and three days after completing the MRT), participants viewed the recorded lesson on chemical bonding. They were instructed to pay close attention to the material but were not allowed to take notes. Immediately following the video, participants had 20 min to complete the immediate post-test; all finished within this time frame. On the third day (occurring on the next school day after viewing the video and completing the immediate post-test), the participants were randomly assigned to either the visual or verbal explanation condition. The typed instructions were given to participants along with a blank 8.5 × 11 in. sheet of paper for their explanations. The instructions differed for each condition. For the visual condition, the instructions were as follows: “You have just finished learning about chemical bonding. On the next piece of paper, draw an explanation of how atoms bond and how ionic and covalent bonds differ. Draw your explanation so that another student your age who has never studied this topic will be able to understand it. Be as clear and complete as possible, and remember to use pictures/diagrams only. After you complete your explanation, you will be asked to answer a series of questions about bonding.”

For the verbal condition the instructions were: “You have just finished learning about chemical bonding. On the next piece of paper, write an explanation of how atoms bond and how ionic and covalent bonds differ. Write your explanation so that another student your age who has never studied this topic will be able to understand it. Be as clear and complete as possible. After you complete your explanation, you will be asked to answer a series of questions about bonding.”

Participants were instructed to read the instructions carefully before beginning the task. The participants completed their explanations as a whole-class activity. Participants were given unlimited time to complete their explanations. Upon completion of their explanations, participants were asked to complete the ten-question delayed post-test (comparable to but different from the first) and were given a maximum of 20 min to do so. All participants completed their explanations as well as the post-test during the 45-min class period.

The mean score on the MRT was 10.39, with a median of 11. Boys (M = 12.5, SD = 4.8) scored significantly higher than girls (M = 8.0, SD = 4.0), F(1, 125) = 24.49, p  < 0.01. Participants were split into low and high spatial ability based on the median.

The maximum score for both the immediate and delayed post-test was 10 points. A repeated measures ANOVA showed that the difference between the immediate post-test scores (M = 4.63, SD = 0.469) and delayed post-test scores (M = 7.04, SD = 0.299) was statistically significant, F(1, 125) = 18.501, p < 0.05. Without any further instruction, scores increased following the generation of a visual or verbal explanation. Both groups improved significantly: those who created visual explanations (M = 8.22, SD = 0.208), F(1, 125) = 51.24, p < 0.01, Cohen’s d = 1.27, as well as those who created verbal explanations (M = 6.31, SD = 0.273), F(1, 125) = 15.796, p < 0.05, Cohen’s d = 0.71. As seen in Fig. 5, participants who generated visual explanations (M = 0.822, SD = 0.208) scored considerably higher on the delayed post-test than participants who generated verbal explanations (M = 0.631, SD = 0.273), F(1, 125) = 19.707, p < 0.01, Cohen’s d = 0.88. In addition, high spatial participants (M = 0.824, SD = 0.273) scored significantly higher than low spatial participants (M = 0.636, SD = 0.207), F(1, 125) = 19.94, p < 0.01, Cohen’s d = 0.87. The interaction between explanation type and spatial ability was not significant.
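The Cohen's d values reported above express each difference in pooled standard deviation units. A minimal sketch of the computation from group summaries follows; the equal-group-size pooling formula is our assumption, as the paper does not state how SDs were pooled.

```python
from math import sqrt

def cohens_d(m1, sd1, m2, sd2):
    """Cohen's d for two group means, pooling SDs under an
    equal-group-size assumption: d = (m1 - m2) / sd_pooled."""
    sd_pooled = sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return (m1 - m2) / sd_pooled

# Hypothetical summaries: d = (8 - 6) / sqrt((2**2 + 2**2) / 2) == 1.0
d = cohens_d(8.0, 2.0, 6.0, 2.0)
```

With unequal group sizes, the pooled SD would instead weight each variance by its degrees of freedom.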

Fig. 5. Scores on the post-tests by explanation type and spatial ability

Explanations were coded for structural and functional content, arrows, specific examples, and multiple representations. A subset of the explanations (20%) was coded by both the first author and a middle school science teacher with expertise in chemistry. Both scorers used the same coding system as a guide. The percentage of agreement between scores was above 90% for all measures. The first author then scored the remainder of the explanations. As evident from Fig. 4, the visual explanations were individual inventions; they neither resembled each other nor those used in teaching. Most contained language, especially labels and symbolic language such as NaCl.

Structure, function, and modality

Visual and verbal explanations were coded for depicting or describing structural and functional components. The structural components included the following: the correct number of valence electrons, the correct charges of atoms, the bonds between non-metals for covalent molecules and between a metal and non-metal for ionic molecules, the crystalline structure of ionic molecules, and that covalent bonding produces individual molecules. The functional components included the following: transfer of electrons in ionic bonds, sharing of electrons in covalent bonds, attraction between ions of opposite charge, bonding resulting in atoms with neutral charge and stable electron shell configurations, and the outcome of bonding being molecules with an overall neutral charge. The presence of each component was awarded 1 point; the maximum possible score was 5 points for structural and 5 points for functional information. The modality, visual or verbal, of each component was also coded; if the information was given in both formats, both were coded.

As displayed in Fig.  6 , visual explanations contained a significantly greater number of structural components (M = 2.81, SD = 1.56) than verbal explanations (M = 1.30, SD = 1.54), F(1, 125) = 13.69, p  < 0.05. There were no differences between verbal and visual explanations in the number of functional components. Structural information was more likely to be depicted (M = 3.38, SD = 1.49) than described (M = 0.429, SD = 1.03), F(1, 62) = 21.49, p  < 0.05, but functional information was equally likely to be depicted (M = 1.86, SD = 1.10) or described (M = 1.71, SD = 1.87).

[Fig. 6]

Functional information expressed verbally in the visual explanations significantly predicted scores on the post-test, F(1, 62) = 21.603, p < 0.01, while functional information in verbal explanations did not. The inclusion of structural information did not significantly predict test scores. As seen in Fig. 7, explanations created by high spatial participants contained significantly more functional components, F(1, 125) = 7.13, p < 0.05, but there were no ability differences in the amount of structural information in either visual or verbal explanations.

Fig. 7. Average number of structural and functional components created by low and high spatial ability learners

Ninety-two percent of visual explanations contained arrows. Arrows were used to indicate motion as well as to label. The use of arrows was positively correlated with scores on the post-test, r = 0.293, p  < 0.05. There were no significant differences in the use of arrows between low and high spatial participants.

Specific examples

Explanations were coded for the use of specific examples, such as NaCl to illustrate ionic bonding and CO2 and O2 to illustrate covalent bonding. High spatial participants (M = 1.6, SD = 0.69) used specific examples in their verbal and visual explanations more often than low spatial participants (M = 1.07, SD = 0.79), a marginally significant effect, F(1, 125) = 3.65, p = 0.06. Visual and verbal explanations did not differ in the presence of specific examples. The inclusion of a specific example was positively correlated with delayed test scores, r = 0.555, p < 0.05.
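The correlations reported in these results are Pearson product-moment correlations. A pure-Python sketch with hypothetical data and illustrative variable names:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation for paired observations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical: count of specific examples vs. delayed post-test score
examples = [0, 1, 1, 2, 2]
scores = [5, 6, 7, 8, 9]
r = pearson_r(examples, scores)  # r > 0: positive association
```

The p values accompanying each r in the text come from a significance test of the correlation against zero, which this sketch does not include.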

Use of multiple representations

Many of the explanations (65%) contained multiple representations of bonding. For example, ionic bonding and its properties can be represented at the level of individual atoms or at the level of many atoms bonded together in a crystalline compound. The representations coded were as follows: symbolic (e.g. NaCl), atomic (showing the structure of atoms), and macroscopic (visible). Participants who created visual explanations generated significantly more representations (M = 1.79, SD = 1.20) than those who created verbal explanations (M = 1.33, SD = 0.48), F(1, 125) = 6.03, p < 0.05. However, the use of multiple representations did not significantly correlate with delayed post-test scores.

Metaphoric explanations

Although there were too few examples to be included in the statistical analyses, some participants in the visual group created explanations that used metaphors and/or analogies to illustrate the differences between the types of bonding. Figure  4 shows examples of metaphoric explanations. In one example, two stick figures are used to show “transfer” and “sharing” of an object between people. In another, two sharks are used to represent sodium and chlorine, and the transfer of fish instead of electrons.

In the second experiment, students were introduced to chemical bonding, a more abstract and complex set of phenomena than the bicycle pump used in the first experiment. Students were tested immediately after instruction. The following day, half the students created visual explanations and half created verbal explanations. Following creation of the explanations, students were tested again, with different questions. Performance was considerably higher as a consequence of creating either explanation despite the absence of new teaching. Generating an explanation in this way could be regarded as a test of learning. Seen this way, the results echo and amplify previous research showing the advantages of testing over study (e.g. Roediger et al., 2011 ; Roediger & Karpicke, 2006 ; Wheeler & Roediger, 1992 ). Specifically, creating an explanation requires selecting the crucial information, integrating it temporally and causally, and expressing it clearly, processes that seem to augment learning and understanding without additional teaching. Importantly, creating a visual explanation gave an extra boost to learning outcomes over and above the gains provided by creating a verbal explanation. This is most likely due to the directness of mapping complex systems to a visual-spatial format, a format that can also provide a natural check for completeness and coherence as well as a platform for inference. In the case of this more abstract and complex material, generating a visual explanation benefited both low spatial and high spatial participants even if it did not bring low spatial participants up to the level of high spatial participants as for the bicycle pump.

Participants high in spatial ability not only scored better, they also generated better explanations, including more of the information that predicted learning. Their explanations contained more functional information and more specific examples. Their visual explanations also contained more functional information.

As in Experiment 1, qualities of the explanations predicted learning outcomes. Including more arrows, typically used to indicate function, predicted delayed test scores as did articulating more functional information in words in visual explanations. Including more specific examples in both types of explanation also improved learning outcomes. These are all indications of deeper understanding of the processes, primarily expressed in the visual explanations. As before, these findings provide ways that educators can guide students to craft better visual explanations and augment learning.

General discussion

Two experiments examined how learner-generated explanations, particularly visual explanations, can be used to increase understanding in scientific domains, notably those that contain “invisible” components. It was proposed that visual explanations would be more effective than verbal explanations because they encourage completeness and coherence, are more explicit, and are typically multimodal. These two experiments differ meaningfully from previous studies in that the information selected for drawing was not taken from a written text, but from a physical object (bicycle pump) and a class lesson with multiple representations (chemical bonding).

The results show that creating an explanation of a STEM phenomenon benefits learning, even when the explanations are created after learning and in the absence of new instruction. These gains in performance in the absence of teaching bear similarities to recent research showing gains in learning from testing in the absence of new instruction (e.g. Roediger et al., 2011 ; Roediger & Karpicke, 2006 ; Wheeler & Roediger, 1992 ). Many researchers have argued that the retrieval of information required during testing strengthens or enhances the retrieval process itself. Formulating explanations may be an especially effective form of testing for post-instruction learning. Creating an explanation of a complex system requires the retrieval of critical information and then the integration of that information into a coherent and plausible account. Other factors, such as the timing of the creation of the explanations, and whether feedback is provided to students, should help clarify the benefits of generating explanations and how they may be seen as a form of testing. There may even be additional benefits to learners, including increasing their engagement and motivation in school, and increasing their communication and reasoning skills (Ainsworth, Prain, & Tytler, 2011 ). Formulating a visual explanation draws upon students’ creativity and imagination as they actively create their own product.

As in previous research, students with high spatial ability both produced better explanations and performed better on tests of learning (e.g. Uttal et al., 2013). The visual explanations of high spatial students contained more information and more of the information that predicts learning outcomes. For the workings of a bicycle pump, creating a visual as opposed to verbal explanation had little impact on students of high spatial ability but brought students of lower spatial ability up to the level of students with high spatial abilities. For the more difficult set of concepts, chemical bonding, creating a visual explanation led to much larger gains than creating a verbal one for students both high and low in spatial ability. It is likely a mistake to assume that low and high spatial learners will remain that way; there is evidence that spatial ability develops with experience (Baenninger & Newcombe, 1989). It is possible that low spatial learners need more support in constructing explanations that require imagining the movement and manipulation of objects in space. Students learned the function of the bike pump by examining an actual pump and learned bonding through a video presentation. Future work investigating methods of presenting material to students may also help to clarify the utility of generating explanations.

Creating visual explanations had greater benefits than those accruing from creating verbal ones. Surely some of the effectiveness of visual explanations is because they represent and communicate more directly than language. Elements of a complex system can be depicted and arrayed spatially to reflect actual or metaphoric spatial configurations of the system parts. They also allow, indeed, encourage, the use of well-honed spatial inferences to substitute for and support abstract inferences (e.g. Larkin & Simon, 1987 ; Tversky, 2011 ). As noted, visual explanations provide checks for completeness and coherence, that is, verification that all the necessary elements of the system are represented and that they work together properly to produce the outcomes of the processes. Visual explanations also provide a concrete reference for making and checking inferences about the behavior, causality, and function of the system. Thus, creating a visual explanation facilitates the selection and integration of information underlying learning even more than creating a verbal explanation.

Creating visual explanations appears to be an underused method of supporting and evaluating students’ understanding of dynamic processes. Two obstacles to using visual explanations in classrooms seem to be developing guidelines for creating visual explanations and developing objective scoring systems for evaluating them. The present findings give insights into both. Creating a complete and coherent visual explanation entails selecting the essential components and linking them by behavior, process, or causality. This structure and organization is familiar from recipes or construction sets: first the ingredients or parts, then the sequence of actions. It is also the ingredients of theater or stories: the players and their actions. In fact, the creation of visual explanations can be practiced on these more familiar cases and then applied to new ones in other domains. Deconstructing and reconstructing knowledge and information in these ways has more generality than visual explanations: these techniques of analysis serve thought and provide skills and tools that underlie creative thought. Next, we have shown that objective scoring systems can be devised, beginning with separating the information into structure and function, then further decomposing the structure into the central parts or actors and the function into the qualities of the sequence of actions and their consequences. Assessing students’ prior knowledge and misconceptions can also easily be accomplished by having students create explanations at different times in a unit of study. Teachers can see how their students’ ideas change and if students can apply their understanding by analyzing visual explanations as a culminating activity.

Creating visual explanations of a range of phenomena should be an effective way to augment students’ spatial thinking skills, thereby increasing the effectiveness of these explanations as spatial ability increases. The proverbial reading, writing, and arithmetic are routinely regarded as the basic curriculum of school learning and teaching. Spatial skills are not typically taught in schools, but they should be: these skills can be learned and are essential to functioning in the contemporary and future world (see Uttal et al., 2013). In our lives, both daily and professional, we need to understand the maps, charts, diagrams, and graphs that appear in the media and public places, in our apps and appliances, in forms we complete, and in equipment we operate. In particular, spatial thinking underlies the skills needed for professional and amateur understanding in STEM fields, and knowledge and understanding of STEM concepts are increasingly required in fields not traditionally regarded as STEM, notably the largest employers, business and service.

This research has shown that creating visual explanations has clear benefits for students, both specific and potentially general. There are also benefits for teachers: visual explanations reveal misunderstandings and gaps in knowledge. Teachers could use visualizations as a formative assessment tool to guide further instruction, and scoring rubrics could allow the identification of specific misconceptions. The bottom line is clear: creating a visual explanation is an excellent way to learn and master complex systems.

Additional file

Post-tests. (DOC 44 kb)

Acknowledgments

The authors are indebted to the Varieties of Understanding Project at Fordham University and The John Templeton Foundation and to the following National Science Foundation grants for facilitating the research and/or preparing the manuscript: National Science Foundation NSF CHS-1513841, HHC 0905417, IIS-0725223, IIS-0855995, and REC 0440103. We are grateful to James E. Corter for his helpful suggestions and to Felice Frankel for her inspiration. The opinions expressed in this publication are those of the authors and do not necessarily reflect the views of the funders. Please address correspondence to Barbara Tversky at the Columbia Teachers College, 525 W. 120th St., New York, NY 10025, USA. Email: [email protected].

Authors’ contributions

This research was part of EB’s doctoral dissertation under the advisement of BT. Both authors contributed to the design, analysis, and drafting of the manuscript. Both authors read and approved the final manuscript.

Competing interests

The authors declare that they have no competing interests.

  • Ainsworth SE, Bibby PA, Wood DJ. Examining the effects of different multiple representational systems in learning primary mathematics. Journal of the Learning Sciences. 2002;11(1):25–62. doi:10.1207/S15327809JLS1101_2.
  • Ainsworth SE, Iacovides I. Learning by constructing self-explanation diagrams. Paper presented at the 11th Biennial Conference of the European Association for Research on Learning and Instruction, Nicosia, Cyprus; 2005.
  • Ainsworth SE, Loizou AT. The effects of self-explaining when learning with text or diagrams. Cognitive Science. 2003;27(4):669–681. doi:10.1207/s15516709cog2704_5.
  • Ainsworth S, Prain V, Tytler R. Drawing to learn in science. Science. 2011;26:1096–1097. doi:10.1126/science.1204153.
  • Alesandrini KL. Pictorial–verbal and analytic–holistic learning strategies in science learning. Journal of Educational Psychology. 1981;73:358–368. doi:10.1037/0022-0663.73.3.358.
  • Aleven V, Koedinger KR. An effective metacognitive strategy: learning by doing and explaining with a computer-based cognitive tutor. Cognitive Science. 2002;26:147–179.
  • Baenninger M, Newcombe N. The role of experience in spatial test performance: A meta-analysis. Sex Roles. 1989;20(5–6):327–344. doi:10.1007/BF00287729.
  • Bradley JD, Brand M. Stamping out misconceptions. Journal of Chemical Education. 1985;62(4):318. doi:10.1021/ed062p318.
  • Chi MT. Active-Constructive-Interactive: A conceptual framework for differentiating learning activities. Topics in Cognitive Science. 2009;1:73–105. doi:10.1111/j.1756-8765.2008.01005.x.
  • Chi MTH, DeLeeuw N, Chiu M, LaVancher C. Eliciting self-explanations improves understanding. Cognitive Science. 1994;18:439–477.
  • Craik F, Lockhart R. Levels of processing: A framework for memory research. Journal of Verbal Learning and Verbal Behavior. 1972;11:671–684. doi:10.1016/S0022-5371(72)80001-X.
  • Edens KM, Potter E. Using descriptive drawings as a conceptual change strategy in elementary science. School Science and Mathematics. 2003;103(3):135–144. doi:10.1111/j.1949-8594.2003.tb18230.x.
  • Flick LB. The meanings of hands-on science. Journal of Science Teacher Education. 1993;4:1–8. doi:10.1007/BF02628851.
  • Glenberg AM, Langston WE. Comprehension of illustrated text: Pictures help to build mental models. Journal of Memory and Language. 1992;31:129–151. doi:10.1016/0749-596X(92)90008-L.
  • Gobert JD, Clement JJ. Effects of student-generated diagrams versus student-generated summaries on conceptual understanding of causal and dynamic knowledge in plate tectonics. Journal of Research in Science Teaching. 1999;36:39–53. doi:10.1002/(SICI)1098-2736(199901)36:1<39::AID-TEA4>3.0.CO;2-I.
  • Hall VC, Bailey J, Tillman C. Can student-generated illustrations be worth ten thousand words? Journal of Educational Psychology. 1997;89(4):677–681. doi:10.1037/0022-0663.89.4.677.
  • Hausmann RGM, VanLehn K. Explaining self-explaining: A contrast between content and generation. In: Luckin R, Koedinger KR, Greer J, editors. Artificial intelligence in education: Building technology rich learning contexts that work. Amsterdam: IOS Press; 2007. pp. 417–424.
  • Hegarty M. Mental animation: Inferring motion from static displays of mechanical systems. Journal of Experimental Psychology: Learning, Memory & Cognition. 1992;18:1084–1102.
  • Hegarty M, Carpenter PA, Just MA. Diagrams in the comprehension of scientific text. In: Barr R, Kamil MS, Mosenthal P, Pearson PD, editors. Handbook of reading research. New York: Longman; 1990. pp. 641–669.
  • Hegarty M, Just MA. Constructing mental models of machines from text and diagrams. Journal of Memory and Language. 1993;32:717–742. doi:10.1006/jmla.1993.1036.
  • Hegarty M, Kriz S, Cate C. The roles of mental animations and external animations in understanding mechanical systems. Cognition & Instruction. 2003;21(4):325–360. doi:10.1207/s1532690xci2104_1.
  • Heiser J, Tversky B. Diagrams and descriptions in acquiring complex systems. In: Proceedings of the Cognitive Science Society. Hillsdale: Erlbaum; 2002.
  • Hmelo-Silver C, Pfeffer MG. Comparing expert and novice understanding of a complex system from the perspective of structures, behaviors, and functions. Cognitive Science. 2004;28:127–138. doi:10.1207/s15516709cog2801_7.
  • Johnson CI, Mayer RE. Applying the self-explanation principle to multimedia learning in a computer-based game-like environment. Computers in Human Behavior. 2010;26:1246–1252. doi:10.1016/j.chb.2010.03.025.
  • Johnstone AH. Why is science difficult to learn? Things are seldom what they seem. Journal of Chemical Education. 1991;61(10):847–849. doi:10.1021/ed061p847.
  • Kessell AM, Tversky B. Visualizing space, time, and agents: Production, performance, and preference. Cognitive Processing. 2011;12:43–52. doi:10.1007/s10339-010-0379-3.
  • Kozhevnikov M, Hegarty M, Mayer R. Revising the visualizer–verbalizer dimension: Evidence for two types of visualizers. Cognition & Instruction. 2002;20:37–77. doi:10.1207/S1532690XCI2001_3.
  • Kozma R, Chin E, Russell J, Marx N. The roles of representations and tools in the chemistry laboratory and their implication for chemistry learning. Journal of the Learning Sciences. 2002;9(2):105–143. doi:10.1207/s15327809jls0902_1.
  • Larkin J, Simon H. Why a diagram is (sometimes) worth ten thousand words. Cognitive Science. 1987;11:65–100. doi:10.1111/j.1551-6708.1987.tb00863.x.
  • Leutner D, Leopold C, Sumfleth E. Cognitive load and science text comprehension: Effects of drawing and mentally imagining text content. Computers in Human Behavior. 2009;25:284–289. doi:10.1016/j.chb.2008.12.010.
  • Levine M. You-are-here maps: Psychological considerations. Environment and Behavior. 1982;14:221–237. doi:10.1177/0013916584142006.
  • Mayer RE. Systematic thinking fostered by illustrations in scientific text. Journal of Educational Psychology. 1989;81:240–246. doi:10.1037/0022-0663.81.2.240.
  • Mayer RE, Sims VK. For whom is a picture worth a thousand words? Extensions of a dual-coding theory of multimedia learning. Journal of Educational Psychology. 1994;86(3):389–401. doi:10.1037/0022-0663.86.3.389.
  • Perkins DN, Grotzer TA. Dimensions of causal understanding: The role of complex causal models in students’ understanding of science. Studies in Science Education. 2005;41:117–166. doi:10.1080/03057260508560216.
  • Roediger HL, Karpicke JD. Test-enhanced learning: Taking memory tests improves long-term retention. Psychological Science. 2006;17:249–255. doi:10.1111/j.1467-9280.2006.01693.x.
  • Roediger HL, Putnam AL, Smith MA. Ten benefits of testing and their applications to educational practice. In: Ross BH, editor. The psychology of learning and motivation. New York: Elsevier; 2011. pp. 1–36.
  • Roscoe RD, Chi MTH. Understanding tutor learning: Knowledge-building and knowledge-telling in peer tutors’ explanations and questions. Review of Educational Research. 2007;77:534–574. doi:10.3102/0034654307309920.
  • Schwamborn A, Mayer RE, Thillmann H, Leopold C, Leutner D. Drawing as a generative activity and drawing as a prognostic activity. Journal of Educational Psychology. 2010;102:872–879. doi:10.1037/a0019640.
  • Taber KS. Student understanding of ionic bonding: Molecular versus electrostatic framework? School Science Review. 1997;78(285):85–95.
  • Tversky B. Visualizing thought. Topics in Cognitive Science. 2011;3:499–535. doi:10.1111/j.1756-8765.2010.01113.x.
  • Tversky B, Heiser J, MacKenzie R, Lozano S, Morrison JB. Enriching animations. In: Lowe R, Schnotz W, editors. Learning with animation: Research implications for design. New York: Cambridge University Press; 2007. pp. 263–285.
  • Tversky B, Suwa M. Thinking with sketches. In: Markman AB, Wood KL, editors. Tools for innovation. Oxford: Oxford University Press; 2009. pp. 75–84.
  • Uttal DH, Meadow NG, Tipton E, Hand LL, Alden AR, Warren C, et al. The malleability of spatial skills: A meta-analysis of training studies. Psychological Bulletin. 2013;139:352–402. doi:10.1037/a0028446.
  • Van Meter P. Drawing construction as a strategy for learning from text. Journal of Educational Psychology. 2001;93(1):129–140. doi:10.1037/0022-0663.93.1.129.
  • Vandenberg SG, Kuse AR. Mental rotations: A group test of three-dimensional spatial visualization. Perceptual and Motor Skills. 1978;47:599–604. doi:10.2466/pms.1978.47.2.599.
  • Voyer D, Voyer S, Bryden MP. Magnitude of sex differences in spatial abilities: A meta-analysis and consideration of critical variables. Psychological Bulletin. 1995;117:250–270. doi:10.1037/0033-2909.117.2.250.
  • Wheeler MA, Roediger HL. Disparate effects of repeated testing: Reconciling Ballard’s (1913) and Bartlett’s (1932) results. Psychological Science. 1992;3:240–245. doi:10.1111/j.1467-9280.1992.tb00036.x.
  • Wilkin J. Learning from explanations: Diagrams can “inhibit” the self-explanation effect. In: Anderson M, editor. Reasoning with diagrammatic representations II. Menlo Park: AAAI Press; 1997.
  • Wittrock MC. Generative processes of comprehension. Educational Psychologist. 1990;24:345–376. doi:10.1207/s15326985ep2404_2.
  • Zacks J, Tversky B. Bars and lines: A study of graphic communication. Memory and Cognition. 1999;27:1073–1079. doi:10.3758/BF03201236.

The Epistemology of Visual Thinking in Mathematics

Visual thinking is widespread in mathematical practice and has diverse cognitive and epistemic purposes. This entry discusses potential roles of visual thinking in proving and in discovering, with some examples, and considers epistemic difficulties and limitations. It also discusses the bearing of epistemic uses of visual representations on the application of the a priori–a posteriori distinction to mathematical knowledge. A final section looks briefly at how visual means can aid comprehension and deepen understanding of proofs.

  • 1. Introduction
  • 2. Historical background
  • 3.1 The reliability question
  • 3.2 Visual means in non-formal proving
  • 3.3 A dispute: diagrams in proofs in analysis
  • 4.1 Propositional discovery
  • 4.2 Discovering a proof strategy
  • 4.3 Discovering properties and kinds
  • 5. Visual thinking and mental arithmetic
  • 6.1 Evidential uses of visual experience
  • 6.2 An evidential use of visual experience in proving
  • 6.3 A non-evidential use of visual experience
  • 7. Further uses of visual representations
  • 8. Conclusion
  • Other Internet Resources
  • Related entries

1. Introduction

Visual thinking is a feature of mathematical practice across many subject areas and at many levels. It is so pervasive that the question naturally arises: does visual thinking in mathematics have any epistemically significant roles? A positive answer begets further questions. Can we rationally arrive at a belief with the generality and necessity characteristic of mathematical theorems by attending to specific diagrams or images? If visual thinking contributes to warrant for believing a mathematical conclusion, must the outcome be an empirical belief? How, if at all, can visual thinking contribute to understanding abstract mathematical subject matter?

Visual thinking includes thinking with external visual representations (e.g., diagrams, symbol arrays, kinematic computer images) and thinking with internal visual imagery; often the two are used in combination, as when we are required to visually imagine a certain spatial transformation of an object represented by a diagram on paper or on screen. Almost always (and perhaps always) visual thinking in mathematics is used in conjunction with non-visual thinking. Possible epistemic roles include contributions to evidence, proof, discovery, understanding and grasp of concepts. The kinds and the uses of visual thinking in mathematics are numerous and diverse. This entry will deal with some of the topics in this area that have received attention and omit others. Among the omissions is the possible explanatory role of visual representations in mathematics. The topic of explanation within pure mathematics is tricky and best dealt with separately; for this an excellent starting place is the entry on explanation in mathematics (Mancosu 2011). Two other omissions are the development of logic diagrams (Euler, Venn, Peirce and Shin) and the nature and use of geometric diagrams in Euclid’s Elements, both of which are well treated in the entry diagrams (Shin et al. 2013). The focus here is on visual thinking generally, which includes thinking with symbol arrays as well as with diagrams; there will be no attempt here to formulate a criterion for distinguishing between symbolic and diagrammatic thinking. However, the use of visual thinking in proving and in various kinds of discovery will be covered in what follows. Discussions of some related questions and some studies of historical cases not considered here are to be found in the collection Diagrams in Mathematics: History and Philosophy (Mumma and Panza 2012).

“Mathematics can achieve nothing by concepts alone but hastens at once to intuition” wrote Kant (1781/9: A715/B743), before describing the geometrical construction in Euclid’s proof of the angle sum theorem (Euclid, Book 1, proposition 32). In a review of 1816 Gauss echoes Kant:

anybody who is acquainted with the essence of geometry knows that [the logical principles of identity and contradiction] are able to accomplish nothing by themselves, and that they put forth sterile blossoms unless the fertile living intuition of the object itself prevails everywhere. (Ewald 1996 [Vol. 1]: 300)

The word “intuition” here translates the German “Anschauung”, a word which applies to visual imagination and perception, though it also has more general uses.

By the late 19th century a different view had emerged, at least in foundational areas. In a celebrated text giving the first rigorous axiomatization of projective geometry, Pasch wrote: “the theorem is only truly demonstrated if the proof is completely independent of the figure” (Pasch 1882), a view expressed also by Hilbert in writing on the foundations of geometry (Hilbert 1894). A negative attitude to visual thinking was not confined to geometry. Dedekind, for example, wrote of an overpowering feeling of dissatisfaction with appeal to geometric intuitions in basic infinitesimal analysis (Dedekind 1872, Introduction). The grounds were felt to be uncertain, the concepts employed vague and unclear. When such concepts were replaced by precisely defined alternatives without allusions to space, time or motion, our intuitive expectations turned out to be unreliable (Hahn 1933).

In some quarters this view turned into a general disdain for visual thinking in mathematics: “In the best books”, Russell pronounced, “there are no figures at all” (Russell 1901). Although this attitude was opposed by a few mathematicians, notably Klein (1893), others took it to heart. Landau, for example, wrote a calculus textbook without a single diagram (Landau 1934). But the predominant view was not so extreme: thinking in terms of figures was valued as a means of facilitating grasp of formulae and linguistic text, but only reasoning expressed by means of formulae and text could bear any epistemological weight.

By the late 20th century the mood had swung back in favour of visualization: Mancosu (2005) provides an excellent survey. Some books advertise their defiance of anti-visual puritanism in their titles, for example Visual Geometry and Topology (Fomenko 1994) and Visual Complex Analysis (Needham 1997); mathematics educators turn their attention to pedagogical uses of visualization (Zimmerman and Cunningham 1991); the use of computer-generated imagery begins to bear fruit at research level (Hoffman 1987; Palais 1999), and diagrams find their way into research papers in abstract fields: see for example the papers on higher dimensional category theory by Joyal et al. (1996), Leinster (2004) and Lauda (2005, Other Internet Resources). But attitudes to the epistemology of visual thinking remain mixed. The discussion is mostly concerned with the role of diagrams in proofs.

3. Visual thinking and proof

In some cases, it is claimed, a picture alone is a proof (Brown 1999: ch. 3). But that view is rare. Even the editor of Proofs without Words: Exercises in Visual Thinking, writes “Of course, ‘proofs without words’ are not really proofs” (Nelsen 1993: vi). Expressions of the other extreme are rare but can be found:

[the diagram] has no proper place in the proof as such. For the proof is a syntactic object consisting only of sentences arranged in a finite and inspectable array. (Tennant 1986)

Between the extremes we find the view that, even if no picture alone is a proof, visual representations can have a non-superfluous role in reasoning that constitutes a proof. (This is not to deny that there may be another proof of the same conclusion which does not involve any visual representation.) Geometric diagrams, graphs, and maps all carry information. Taking valid deductive reasoning to be the reliable extraction of information from information already obtained, Barwise and Etchemendy (1996: 4) pose the following question: Why cannot the representations composing a proof be visual as well as linguistic? The sole reason for denying this role to visual representations is the thought that, with the possible exception of very restricted cases, visual thinking is unreliable, hence cannot contribute to proof. Is that right?

Our concern here is thinking through the steps in a proof, either for the first time (a first successful attempt to construct a proof) or following a given proof. Clearly we want to distinguish between visual thinking which merely accompanies the process of thinking through the steps in a proof and visual thinking which is essential to the process. This is not always straightforward as a proof can be presented in different ways. How different can distinct presentations be and yet be presentations of the same proof? There is no context-invariant answer to this. Often mathematicians are happy to regard two presentations as presenting the same proof if the central idea is the same in both cases. But if one’s main concern is with what is involved in thinking through a proof, its central idea is not enough to individuate it: the overall structure, the sequence of steps and perhaps some other factors affecting the cognitive processes involved will be relevant.

Once individuation of proofs has been settled, we can distinguish between replaceable thinking and superfluous thinking, where these attributions are understood as relative to a given argument or proof. In the process of thinking through a proof, a given part of the thinking is replaceable if thinking of some other kind could stand in place of the given part in a process that would count as thinking through the same proof. A given part of the thinking is superfluous if its excision without replacement would be a process of thinking through the same proof. Superfluous thinking may be extremely valuable in facilitating grasp of the proof text and in enabling one to understand the idea underlying the proof steps; but it is not necessary for thinking through the proof.

It is uncontentious that the visual thinking involved in symbol manipulations, for example in following the “algebraic” steps of proofs of basic lemmas about groups, can be essential, that is, neither superfluous nor replaceable. The worry is about thinking visually with diagrams, where “diagram” is used widely to include all non-symbolic visual representations. Let us agree that there can be superfluous diagrammatic thinking in thinking through a proof. This leaves several possibilities.

  • (a) All diagrammatic thinking in a process of thinking through a proof is superfluous.
  • (b) Not all diagrammatic thinking in a process of thinking through a proof is superfluous; but if not superfluous it will be replaceable by non-diagrammatic thinking.
  • (c) Some diagrammatic thinking in a process of thinking through a proof is neither superfluous nor replaceable by non-diagrammatic thinking.

The negative view stated earlier that diagrams can have no role in proof entails claim (a). The idea behind (a) is that, because diagrammatic reasoning is unreliable, if a process of thinking through an argument contains some non-superfluous diagrammatic thinking, that process lacks the epistemic security to be a case of thinking through a proof.

This view, claim (a) in particular, is threatened by cases in which the reliability of the diagrammatic thinking is demonstrated non-visually. The clearest kind of example would be provided by a formal system which has diagrams in place of formulas among its syntactic objects, and types of inter-diagram transition for inference rules. Suppose you take in such a formal system and an interpretation of it, and then think through a proof of the system’s soundness with respect to that interpretation; suppose you then inspect a sequence of diagrams, checking along the way that it constitutes a derivation in the system; suppose finally that you recover the interpretation to reach a conclusion. (The order is unimportant: one can go through the derivation first and then follow the soundness proof.) That entire process would constitute thinking through a proof of the conclusion; and the diagrammatic thinking involved would not be superfluous.

Shin et al. (2013) report that formal diagrammatic systems of logic and geometry have been proven to be sound. People have indeed followed proofs in these systems. That is enough to refute claim (a), the claim that all diagrammatic thinking in thinking through a proof is superfluous. For a concrete example, Figure 1 presents a derivation of Euclid’s first theorem, that on any straight line segment an equilateral triangle is constructible, in a formal diagrammatic system of a part of Euclidean geometry (Miller 2001).

Figure 1. [A three-by-three array of panels, each containing a diagram, read left to right, top to bottom: (1) a line segment with a dot at each end; (2) a circle with a radius drawn and a dot at each end of the radius segment; (3) the same as (2), with a second, overlapping circle drawn on the same radius segment, the first circle’s center dot now on the second circle’s perimeter and the first circle’s perimeter dot now the second circle’s center, and dots added at the two points where the circles intersect; (4) the same as (3), with a line segment drawn from the top intersection dot to the first circle’s center dot; (5) the same as (4), with a line segment drawn from the top intersection dot to the second circle’s center dot; …]
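The idea of a formal system whose syntactic objects are diagrams, with inference rules given as permitted diagram-to-diagram transitions, can be made concrete in a toy implementation. This is a minimal sketch under invented assumptions, not Miller's actual system: a "diagram" is just a set of named elements, and each rule only constrains what kind of element a single step may add.

```python
# Toy sketch of a formal diagrammatic system: diagrams are sets of named
# elements; an inference rule licenses one kind of addition per step.
# Element names, rule names, and the rule conditions are all invented
# for illustration.

def added(prev, nxt):
    # The new material of a step, or None if the step deleted something.
    return nxt - prev if prev <= nxt else None

RULES = {
    "draw_circle": lambda new: all(e.startswith("circle") for e in new),
    "draw_segment": lambda new: all(e.startswith("seg") for e in new),
    "mark_intersection": lambda new: all(e.startswith("pt") for e in new),
}

def check_derivation(steps):
    """steps = [(diagram, rule_name), ...]; verify each transition is licensed."""
    for (d1, _), (d2, rule) in zip(steps, steps[1:]):
        new = added(d1, d2)
        if new is None or not new or not RULES[rule](new):
            return False
    return True

# A Euclid I.1-style derivation: segment AB, two circles on it, their
# intersection point C, then the two sides of the equilateral triangle.
derivation = [
    ({"ptA", "ptB", "segAB"}, None),
    ({"ptA", "ptB", "segAB", "circleA"}, "draw_circle"),
    ({"ptA", "ptB", "segAB", "circleA", "circleB"}, "draw_circle"),
    ({"ptA", "ptB", "segAB", "circleA", "circleB", "ptC"}, "mark_intersection"),
    ({"ptA", "ptB", "segAB", "circleA", "circleB", "ptC", "segCA", "segCB"}, "draw_segment"),
]
print(check_derivation(derivation))  # True
```

Checking a sequence of diagrams against such rules is a purely syntactic matter, which is the point of the argument above: the derivation itself can be diagrammatic while remaining formally inspectable.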

What about Tennant’s claim that a proof is “a syntactic object consisting only of sentences” as opposed to diagrams? A proof is never a syntactic object. A formal derivation on its own is a syntactic object but not a proof. Without an interpretation of the language of the formal system the end-formula of the derivation says nothing; and so nothing is proved. Without a demonstration of the system’s soundness with respect to the interpretation, one may lack sufficient reason to believe that all derivable conclusions are true. A formal derivation plus an interpretation and soundness proof can be a proof of the derived conclusion, but that whole package is not a syntactic object. Moreover, the part of the proof which really is a syntactic object, the formal derivation, need not consist solely of sentences; it can consist of diagrams.

With claim (a) disposed of, consider again claim (b) that, while not all diagrammatic thinking in a process of thinking through a proof is superfluous, all non-superfluous diagrammatic thinking will be replaceable by non-diagrammatic thinking in a process of thinking through that same proof. The visual thinking in following the proof of Euclid’s first theorem using Miller’s formal system consists in going through a sequence of diagrams and at each step seeing that the next diagram results from a permitted alteration of the previous diagram. It is clear that in a process that counts as thinking through this proof, the diagrammatic thinking is neither superfluous nor replaceable by non-diagrammatic thinking. That knocks out (b), leaving only (c): some thinking that involves a diagram in thinking through a proof is neither superfluous nor replaceable by non-diagrammatic thinking (without changing the proof).

Mathematical practice almost never proceeds by way of formal systems. Outside the context of formal diagrammatic systems, the use of diagrams is widely felt to be unreliable. A diagram can be unfaithful to the described construction: it may represent something with a property that is ruled out by the description, or without a property that is demanded by the description. This is exemplified by diagrams in the famous argument for the proposition that all triangles are isosceles: the meeting point of an angle bisector and the perpendicular bisector of the opposite side is represented as falling inside the triangle, when it has to be outside (Rouse Ball 1939; Maxwell 1959). Errors of this sort are comparatively rare, usually avoidable with a modicum of care, and not inherent in the nature of diagrams; so they do not warrant a general charge of unreliability.

The major sort of error is unwarranted generalisation. Typically diagrams (and other non-verbal visual representations) do not represent their objects as having a property that is actually ruled out by the intention or specification of the object to be represented. But diagrams very frequently do represent their objects as having properties that, though not ruled out by the specification, are not demanded by it. Verbal descriptions can be discrete, in that they supply no more information than is needed. But visual representations are typically indiscrete, in that they supply too much detail. This is often unavoidable, because for many properties or kinds \(F\), a visual representation cannot represent something as being \(F\) without representing it as being \(F\) in a particular way. Any diagram of a triangle, for instance, must represent it as having three acute angles or as having just two acute angles, even if neither property is required by the specification, as would be the case if the specification were “Let ABC be a triangle”. As a result there is a danger that in using a diagram to reason about an arbitrary instance of class \(K\), we will unwittingly rely on a feature represented in the diagram that is not common to all instances of the class \(K\). Thus the risk of unwarranted generalisation is a danger inherent in the use of many diagrams.

Indiscretion of diagrams is not confined to geometrical figures. The dot or pebble diagrams of ancient mathematics used to convince one of elementary truths of number theory necessarily display particular numbers of dots, though the truths are general. Here is an example, used to justify the formula for the \(n\)th triangular number, i.e., the sum of the first \(n\) positive integers.

[a grid of blue dots 5 wide and 7 deep, on the right side is a brace embracing the right column labeled n+1 and on the bottom a brace embracing the bottom row labeled n]

The conclusion drawn is that the sum of the integers from 1 to \(n\) is \(n(n+1)/2\) for any positive integer \(n\), but the diagram presents the case for \(n = 6\). We can perhaps avoid representing a particular number of dots when we merely imagine a display of the relevant kind; or if a particular number is represented, our experience may not make us aware of the number—just as, when one imagines the sky on a starry night, for no particular number \(k\) are we aware that exactly \(k\) stars are represented. Even so, there is likely to be some extra specificity. For example, in imagining an array of dots of the form just illustrated, one is unlikely to imagine just two columns of three dots, the rectangular array for \(n = 2\). Typically the subject will be aware of imagining an array with more than two columns. This entails that an image is likely to have unintended exclusions. In this case it would exclude the three-by-two array. An image of a triangle representing all angles as acute would exclude triangles with an obtuse angle or a right angle. The danger is that the visual reasoning will not be valid for the cases that are unintentionally excluded by the visual representation, with the result that the step to the conclusion is an unwarranted generalisation.
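The dot-diagram argument pairs the triangular array with its copy to form a rectangle of \(n\) rows and \(n+1\) columns, giving the formula the diagram is meant to justify. Unlike the diagram, which shows only one case, the formula can at least be checked mechanically across many cases:

```python
# The rectangle construction gives 2 * (1 + 2 + ... + n) = n * (n + 1),
# so the nth triangular number is n * (n + 1) / 2. Check the formula
# against the direct sum for many values of n, not just the diagram's case.

for n in range(1, 101):
    assert sum(range(1, n + 1)) == n * (n + 1) // 2

print(sum(range(1, 7)), 6 * 7 // 2)  # 21 21, the diagram's case n = 6
```

Of course, such finite checking is itself only evidence for finitely many instances; it does not remove the epistemological question about how the general claim is warranted, which is the topic of the surrounding discussion.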

What should we make of this? First, let us note that in a few cases the image or diagram will not be over-specific. When in geometry all instances of the relevant class are congruent to one another, for instance all circles or all squares, the image or diagram will not be over-specific for a generalisation about that class; so there will be no unintended exclusions and no danger of unwarranted generalisation. Here then are possibilities for reliable visual thinking in proving.

To get clear about the other cases, where there is a danger of over-generalising, it helps to look at generalisation in ordinary non-visual reasoning. Schematically put, in reasoning about things of kind \(K\), once we have shown that from certain premisses it follows that such-and-such a condition is true of arbitrary instance \(c\), we can validly infer from those same premisses that that condition is true of all \(K\)s, with the proviso that neither the condition nor any premiss mentions \(c\). The proviso is required, because if a premiss or the condition does mention \(c\), the reasoning may depend on a property of \(c\) that is not shared by all other \(K\)s and so the generalisation would be unsafe. For a trivial example consider a step from “\(x = c\)” to “\(\forall x [x = c]\)”.

A question we face is whether, in order to come to know the truth of a conclusion by following an argument involving generalisation on an arbitrary instance (a.k.a. universal generalisation, or universal quantifier introduction), the thinking must include a conscious, explicit check that the proviso is met. It is clearly not enough that the proviso is in fact met. For in that case it might just be the thinker’s good luck that the proviso is met; hence the thinker would not know that the generalisation is valid and so would not have genuinely thought through the proof at that step.

This leaves two options. The strict option is that without a conscious, explicit check one has not really thought through the proof. The relaxed option is that one can properly think through the proof without checking that the proviso is met, but only if one is sensitive to the potential error and would detect it in otherwise similar arguments. For then one is not just lucky that the proviso is met. Being sensitive in this context consists in being alert to dependence on features of the arbitrary instance not shared by all members of the class of generalisation, a state produced by a combination of past experience and current vigilance. Without compelling reason to prefer one of these options, decisions on what is to count as proving or following a proof must be conditional.

How does all this apply to generalising from visual thinking about an arbitrary instance? Take the example of the visual route to the formula for triangular numbers using the diagram of Figure 2. The diagram reveals that the formula holds for the 6th triangular number. The generalisation to all triangular numbers is justified only if the visuo-spatial method used is applicable to the \(n\)th triangular number for all positive integers \(n\), that is, provided that the method used does not depend on a property not shared by all positive integers. A conscious, explicit check that this proviso is met requires making explicit the method exemplified for 6 and proving that the method is applicable for all positive integers in place of 6. (For a similar idea in the context of automating visual arguments, see Jamnik 2001). This is not done in practice when thinking visually, and so if we accept the strict option for thinking through a proof involving generalisation, we would have to accept that the visual route to the formula for triangular numbers does not amount to thinking through a proof of it; and the same would apply to the familiar visual routes to other general positive integer formulas, such as that \(n^2 =\) the sum of the first \(n\) odd numbers.

But what if the strict option for proving by generalisation on an arbitrary instance is too strict, and the relaxed option is right? When arriving at the formula in the visual way indicated, one does not pay attention to the fact that the visual display represents the situation for the 6 th triangular number; it is as if the mind had somehow extracted a general schema of visual reasoning from exposure to the particular case, and had then proceeded to reason schematically, converting a schematic result into a universal proposition. What is required, on the relaxed option, is sensitivity to the possibility that the schema is not applicable to all positive integers; one must be so alert to ways a schema of the given kind can fall short of universal applicability that if one had been presented with a schema that did fall short, one would have detected the failure.

In the example at hand, the schema of visual reasoning involves at the start taking a number \(k\) to be represented by a column of \(k\) dots, thence taking the triangular array of \(n\) columns to represent the sum of the first \(n\) positive integers, thence taking that array combined with an inverted copy to make a rectangular array of \(n\) columns of \(n+1\) dots. For a schema starting this way to be universally applicable, it must be possible, given any positive integer \(n\), for the sum of the first \(n\) positive integers to be represented in the form of a triangular array, so that combined with an inverted copy one gets a rectangular array. This actually fails at the extreme case: \(n = 1\). The formula \(n(n+1)/2\) holds for this case; but that is something we know by substituting “1” for the variable in the formula, not by the visual method indicated. That method cannot be applied to \(n = 1\), because a single dot does not form a triangular array, and combined with a copy it does not form a rectangular array. But we can check that the method works for all positive integers after the first, using visual reasoning to assure ourselves that it works for 2 and that if the method works for \(k\) it works for \(k+1\). Together with this reflective thinking, the visual thinking sketched earlier constitutes following a proof of the formula for the \(n\)th triangular number for all integers \(n > 1\), at least if the relaxed view of thinking through a proof is correct. Similar conclusions hold in the case of other “dot” arguments (Giaquinto 1993, 2007: ch. 8). So in some cases when the visual representation carries unwanted detail, the danger of over-generalisation in visual reasoning can be overcome.

But the fact that this is frequently missed by commentators suggests that the required sensitivity is often absent. Missing an untypical case is a common hazard in attempts at visual proving. A well-known example is the proof of Euler’s formula \(V - E + F = 2\) for polyhedra by “removing triangles” of a triangulated planar projection of a polyhedron. One is easily convinced by the thinking, but only because the polyhedra we normally think of are convex, while the exceptions are not convex. But it is also easy to miss a case which is not untypical or extreme when thinking visually. An example is Cauchy’s attempted proof (Cauchy 1813) of the claim that if a convex polygon is transformed into another polygon keeping all but one of the sides constant, then if some or all of the internal angles at the vertices increase, the remaining side increases, while if some or all of the internal angles at the vertices decrease, the remaining side decreases. The argument proceeds by considering what happens when one transforms a polygon by increasing (or decreasing) angles, angle by angle. But in a trapezoid, changing a single angle can turn a convex polygon into a concave polygon, and this invalidates the argument (Lyusternik 1963).

The frequency of such mistakes indicates that visual arguments (other than symbol manipulations) often lack the transparency required for proof. Even when a visual argument is in fact sound, its soundness may not be clear, in which case the argument is not a way of proving the truth of the conclusion, though it may be a way of discovering it. But this is consistent with the claim that visual non-symbolic thinking can be (and often is) part of a way of proving something.

An example from knot theory will substantiate the modal part of this claim. To present the example, we need some background information, which will be given with a minimum of technical detail.

A knot is a tame closed non-self-intersecting curve in Euclidean 3-space.

In other words, knots are just the tame curves in Euclidean 3-space which are homeomorphic to a circle. The word “tame” here stands for a property intended to rule out certain pathological cases, such as curves with infinitely nested knotting. There is more than one way of making this mathematically precise, but we have no need for these details. A knot has a specific geometric shape, size and axis-relative position. Now imagine it to be made of flexible yet unbreakable yarn that is stretchable and shrinkable, so that it can be smoothly transformed into other knots without cutting or gluing. Since our interest in a knot is the nature of its knottedness regardless of shape, size or axis-relative position, the real focus of interest is not just the knot but all its possible transforms. A way to think of this is to imagine a knot transforming continuously, so that every possible transform is realized at some time. Then the thing of central interest would be the object that persists over time in varying forms, with knots strictly so called being the things captured in each particular freeze frame. Mathematically, we represent the relevant entity as an equivalence class of knots.

Two knots are equivalent iff one can be smoothly deformed into the other by stretching, shrinking, twisting, flipping, repositioning or in any other way that does not involve cutting, gluing or passing one strand through another.

The relevant kind of deformation forbids eliminating a knotted part by shrinking it down to a point. Again there are mathematically precise definitions of knot-equivalence. Figure 3 gives diagrams of equivalent knots, instances of a trefoil.

[a closed line which goes under, over, under, over, under, over itself forming a shape with three nodes]

Diagrams like these are not merely illustrations; they also have an operational role in knot theory. But not any picture of a knot will do for this purpose. We need to specify:

A knot diagram is a regular projection of a knot onto a plane which, when there is a crossing, tells us which strand passes over the other.

Regularity here is a combination of conditions. In particular, regularity entails that not more than two points of the strict knot project to the same point on the plane, and that two points of the strict knot project to the same point on the plane only where there is a crossing. For more on diagrams in knot theory see De Toffoli and Giardino (2014).

A major task of knot theory is to find ways of telling whether two knot diagrams are diagrams of equivalent knots. In particular we will want to know if a given knot diagram represents a knot equivalent to an unknot, that is, a knot representable by a knot diagram without crossings.

One way of showing that a knot diagram represents a knot equivalent to an unknot is to show that the diagram can be transformed into one without crossings by a sequence of atomic moves, known as Reidemeister moves. The relevant background fact is Reidemeister’s theorem, which links the visualizable diagrammatic changes to the mathematically precise definition of knot equivalence: Two knots are equivalent if and only if there is a finite sequence of Reidemeister moves taking a knot diagram of one to a knot diagram of the other. Figure 4 illustrates. Each knot diagram is changed into the adjacent knot diagram by a Reidemeister move; hence the knot represented by the leftmost diagram is equivalent to the unknot.

[a closed line that goes under, under,  under, over, over, over but forming otherwise a shape much like figure 3a]

In contrast to these, the knot presented by the left knot diagram of Figure 3, a trefoil, may seem impossible to deform into an unknot. And in fact it is. To prove it, we can use a knot invariant known as colourability. An arc in a knot diagram is a maximal part between crossings (or the whole thing if there are no crossings). Colourability is this:

A knot diagram is colourable if and only if each of its arcs can be coloured one of three different colours so that (a) at least two colours are used and (b) at each crossing the three arcs are all coloured the same or all coloured differently.

The reference to colours here is inessential. Colourability is in fact a specific case of a kind of combinatorial property known as mod \(p\) labelling (for \(p\) an odd prime). Colourability is a knot invariant in the sense that if one diagram of a knot is colourable, every diagram of that knot and of any equivalent knot is colourable. (By Reidemeister’s theorem this can be proved by showing that each Reidemeister move preserves colourability.) A standard diagram of an unknot, a diagram without crossings, is clearly not colourable because it has only one arc (the whole thing) and so two colours cannot be used. So to complete the proof that the trefoil is not equivalent to an unknot, we need only prove that our trefoil diagram is colourable. This can be done visually. Colour each arc of the knot diagram one of the three colours red, green or blue so that no two arcs have the same colour (or visualize this). Then do a visual check of each crossing, to see that at each crossing the three meeting arcs are all coloured differently. That visual part of the proof is clearly non-superfluous and non-replaceable (without changing the proof). Moreover, the soundness of the argument is quite transparent. So here is a case of a non-formal, non-symbolic visual way of proving a mathematical truth.
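The visual colour-check just described can also be mirrored combinatorially. The sketch below is an illustrative encoding of ours (arc and crossing labels are assumptions, not standard notation): the standard trefoil diagram has three arcs, and all three arcs meet at each of its three crossings.

```python
# Illustrative encoding of colourability for the standard trefoil diagram.
# Arcs are numbered 0, 1, 2; at each of the three crossings all three
# arcs meet (one passing over, two under).
def is_colouring(colours, crossings):
    """colours: dict arc -> colour in {0, 1, 2}.
    crossings: list of triples of the arcs meeting at each crossing.
    Valid iff at least two colours are used and at every crossing the
    three arcs are all the same colour or all different."""
    if len(set(colours.values())) < 2:
        return False
    for x, y, z in crossings:
        if len({colours[x], colours[y], colours[z]}) == 2:
            return False
    return True

trefoil_crossings = [(0, 1, 2), (0, 1, 2), (0, 1, 2)]
print(is_colouring({0: 0, 1: 1, 2: 2}, trefoil_crossings))  # True
```

Giving each of the three arcs a different colour satisfies both conditions, which is exactly the visual check described above; a one-arc diagram of the unknot fails condition (a) because only one colour can be used.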

Where notions involving the infinite are in play, such as many involving limits, the use of diagrams is famously risky. For this reason it has been widely thought that, beyond some very simple cases, arguments in real and complex analysis in which diagrams have a non-superfluous role are not genuine proofs. Bolzano [1817] expressed this attitude with regard to the intermediate value theorem for the real numbers (IVT) before giving a purely analytic proof, arguing that spatial thinking could not be used to help justify the IVT. James Robert Brown (1999) takes issue with Bolzano on this point. The IVT is this:

If \(f\) is a real-valued function of a real variable continuous on the closed interval \([a, b]\) and \(f(a) < c < f(b)\), then for some \(x\) in \((a, b), f(x) = c\).

Brown focuses on the special case when \(c = 0\). As the IVT can be deduced easily from this special case using the theorem that the difference of two continuous functions is continuous, there is no loss of generality here. Alluding to a diagram like Figure 5, Brown (1999) writes

We have a continuous line running from below to above the \(x\)-axis. Clearly, it must cross that axis in doing so. (1999: 26)

Later he claims:

Using the picture alone, we can be certain of this result—if we can be certain of anything. (1999: 28)

[a first quadrant graph, the x-axis labeled near the left with 'a' and near the right with 'b'; the y-axis labeled at the top with 'f(b)', in the middle with 'c' and near the bottom with 'f(a)'. A dotted horizontal line lines up with the 'c'. A solid curve starts at the intersection of 'f(a)' and 'a', rambles horizontally for a short while before rising above the 'c' dotted line, dips below, then rises again, ending at the intersection of 'f(b)' and 'b'.]

Bolzano’s diagram-free proof of the IVT is an argument from what later became known as the Dedekind completeness of the real numbers: every non-empty set of reals bounded above (below) has a least upper bound (greatest lower bound). The value of Bolzano’s deduction of the IVT from the Dedekind completeness of the reals, according to Brown, is not that it proves the IVT but that it gives us confirmation of Dedekind completeness, just as a hypothesis in empirical science gets confirmed by deducing some consequences of the hypothesis and observing those consequences to be true. This view assumes that we already know the IVT to be true by observing a diagram relevantly like Figure 5.

That assumption is challenged by Giaquinto (2011). Once we distinguish graphical concepts from associated analytic concepts, the underlying argument from the diagram is essentially this.

  • 1. Any function \(f\) which is \(\varepsilon\textrm{-}\delta\) continuous on \([a, b]\) with \(f (a) < 0 < f (b)\) has a visually continuous graphical curve from below the horizontal line representing the \(x\)-axis to above.
  • 2. Any visually continuous graphical curve from below a horizontal line to above it meets the line at a crossing point.
  • 3. Any function whose graphical curve meets the line representing the \(x\)-axis at a crossing point has a zero value.
  • 4. So, any \(\varepsilon\textrm{-}\delta\) continuous function \(f\) on \([a, b]\) with \(f (a) < 0< f (b)\) has a zero value.

What is inferred from the diagram is premiss 2. Premisses 1 and 3 are assumptions linking analytical with graphical conditions. These linking assumptions are disputed. With regard to premiss 1 Giaquinto (2011) argues that there are functions on the reals which meet the antecedent condition but do not have graphical curves, such as continuous but nowhere differentiable functions and functions which oscillate with unbounded frequency, e.g., \(f(x) = x \cdot\sin(1/x)\) for non-zero \(x\) in \([-1, 1]\) and \(f(0) = 0\).

With regard to premiss 3 it is argued that, under the standard conventions of graphical representation of functions in a Cartesian co-ordinate frame, the graphical curve for \(x^2 - 2\) in the rationals is the same as the graphical curve for \(x^2- 2\) in the reals. This is because every real is a limit point of rationals; so for every point \(P\) with one or both co-ordinates irrational, there are points arbitrarily close to \(P\) with both co-ordinates rational; so no gaps would appear if irrational points were removed from the curve for \(x^2- 2\) in the reals. But for \(x\) in the rational interval [0, 2] the function \(x^2- 2\) has no zero value, even though it has a graphical curve which visually crosses the line representing the \(x\)-axis. So one cannot read off the existence of a zero of \(x^2- 2\) on the reals from the diagram; one needs to appeal to some property of the reals which the rationals lack, such as Dedekind completeness.
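The point about \(x^2 - 2\) on the rationals can be illustrated with exact rational arithmetic (a sketch of ours, for illustration only): a bisection search that keeps the sign change trapped never lands on a zero, because \(\sqrt{2}\) is irrational.

```python
# Exact-rational bisection on f(x) = x*x - 2 over [0, 2]: the sign
# change persists at every stage, yet no midpoint is ever a zero of f,
# since sqrt(2) is irrational. This mirrors the observation that the
# rational curve visually crosses the axis without taking the value 0.
from fractions import Fraction

def f(x):
    return x * x - 2

lo, hi = Fraction(0), Fraction(2)
for _ in range(50):
    mid = (lo + hi) / 2
    assert f(mid) != 0          # no rational zero is ever found
    if f(mid) < 0:
        lo = mid
    else:
        hi = mid
print(f(lo) < 0 < f(hi))  # the curve still straddles the axis: True
```

The interval shrinks towards \(\sqrt{2}\) without ever containing a rational zero, which is why the existence of a zero cannot be read off the picture and must instead come from a completeness property of the reals.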

This raises some obvious questions. Do any theorems of analysis have proofs in which diagrams have a non-superfluous role? Littlewood (1953: 54–5) thought so and gives an example which is examined in Giaquinto (1994). If so, can we demarcate this class of theorems by some mathematical feature of their content? Another question is whether there is a significantly broad class of functions on the reals for which we could prove an intermediate value theorem (i.e., restricted to that class).

If there are theorems of analysis provable with diagrams we do not yet have a mathematical demarcation criterion for them. A natural place to look would be O-minimal structures on the reals—this was brought to the author’s attention by Ethan Galebach. This is because of some remarkable theorems about such structures which exclude all the pathological (hence vision-defying) functions on the reals (Van den Dries 1998), such as continuous nowhere differentiable functions and “space-filling” curves i.e., continuous surjections \(f:(0, 1)\rightarrow(0, 1)^2\). Is the IVT for functions in an O-minimal structure on the reals provable by visual means? Certainly one objection to the visual argument for the unrestricted IVT does not apply when the restriction is in place. This is the objection that continuous nowhere differentiable functions, having no graphical curve, provide counterexamples to the premiss that any \(\varepsilon\textrm{-}\delta\) continuous function \(f\) on \([a, b]\) with \(f (a) < c < f (b)\) has a visually continuous graphical curve from below the horizontal line representing \(y = c\) to above. But the existence of continuous functions with no graphical curve is not the only objection to the visual argument, contrary to a claim of Azzouni (2013: 327). There are also counterexamples to the premiss that any function that does have a graphical curve which visibly crosses the line representing \(y = c\) takes \(c\) as a value, e.g., the function \(x^2 - 2\) on the rationals with \(c = 0\). So the question of a visual proof of the IVT restricted to functions in an O-minimal structure on the reals is still open at the time of writing.

4. Visual thinking and discovery

Though philosophical discussion of visual thinking in mathematics has concentrated on its role in proof, visual thinking may be more valuable for discovery than proof. Three kinds of discovery important in mathematical practice are these:

  • (1) propositional discovery (discovering, of a proposition, that it is true),
  • (2) discovering a proof strategy (or more loosely, getting the idea for a proof of a proposition), and
  • (3) discovering a property or kind of mathematical entity.

In the following subsections visual discovery of these kinds will be discussed and illustrated.

To discover a truth, as that expression is being used here, is to come to believe it by one’s own lights (as opposed to reading it or being told) in a way that is reliable and involves no violation of epistemic rationality (given one’s epistemic state). One can discover a truth without being the first to discover it (in this context); it is enough that one comes to believe it in an independent, reliable and rational way. The difference between merely discovering a truth and proving it is a matter of transparency: for proving or following a proof the subject must be aware of the way in which the conclusion is reached and the soundness of that way; this is not required for discovery.

Sometimes one discovers something by means of visual thinking using background knowledge, resulting in a cogent argument from which one could construct a proof. A nice example is a visual argument that any knot diagram with a finite number of crossings can be turned into a diagram of an unknot by interchanging the over-strand and under-strand of some of its crossings (Adams 2001: 58–90). That argument is a bit too long to present accessibly here. For a short example, here is a way of discovering that the geometric mean of two positive numbers is less than or equal to their arithmetic mean (Eddy 1985) using Figure 6.

[two circles of differing sizes next to each other and touching at one point, the larger left circle has a vertical diameter line drawn and adjacent, parallel on the left is a double arrow headed line labelled 'a'.  The smaller circle has a similar vertical diameter line with a double arrow headed line labelled 'b' to the right.  The bottom of the diameter lines are connected by a double headed arrow line labeled 'square root of (ab)'. Another line connects the centers of both circles and has a parallel double arrow headed line labeled '(a+b)/2'.  A dashed horizontal line goes horizontally from the center of the smaller circle until it hits the diameter line of the larger circle.  Between this intersection  and the center of the larger circle is a double arrow headed line labeled '(a-b)/2'.]

Two circles (with diameters \(a\) and \(b\)) meet at a single point. A line is drawn between their centres through their common point; its length is \((a + b)/2\), the sum of the two radii. This line is the hypotenuse of a right angled triangle with one other side of length \((a - b)/2\), the difference of the radii. Pythagoras’s theorem is used to infer that the remaining side of the right-angled triangle has length \(\sqrt{(ab)}\). Then visualizing what happens to the triangle when the diameter of the smaller circle varies between 0 and the diameter of the larger circle, one infers that \(0 < \sqrt{(ab)} < (a + b)/2\); then verifying symbolically that \(\sqrt{(ab)} = (a + b)/2\) when \(a = b\), one concludes that for positive \(a\) and \(b\), \(\sqrt{(ab)} \le (a + b)/2\).
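As a quick check on the relations read off the figure (a numerical sketch of ours, not part of Eddy's argument): Pythagoras applied to the triangle with hypotenuse \((a + b)/2\) and one side \((a - b)/2\) does yield \(\sqrt{(ab)}\) as the remaining side, and the inequality follows.

```python
# Numerical check of the figure's relations: ((a+b)/2)^2 - ((a-b)/2)^2
# simplifies to ab, so the remaining side of the right triangle has
# length sqrt(ab); and sqrt(ab) <= (a+b)/2 with equality iff a == b.
import math

for a, b in [(1.0, 4.0), (2.0, 2.0), (0.5, 8.0), (3.0, 7.0)]:
    hyp, side = (a + b) / 2, (a - b) / 2
    remaining = math.sqrt(hyp ** 2 - side ** 2)
    assert math.isclose(remaining, math.sqrt(a * b))
    assert math.sqrt(a * b) <= (a + b) / 2
print("Pythagorean reading of the figure checks out")
```

The \((2.0, 2.0)\) case is the degenerate triangle of the equality case \(a = b\), where the side of length \((a - b)/2\) shrinks to zero.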

This thinking does not constitute a case of proving or following a proof of the conclusion, because it involves a step which we cannot clearly tell is valid. This is the step of attempting to visually imagine what would happen when the smaller circle varies in diameter between 0 and the diameter of the larger circle and inferring from the resulting experience that the line joining the centres of the circles will always be longer than the horizontal line from the centre of the smaller circle to the vertical diameter of the larger circle. This step seems sound (does not lead us into error) and may be sound; but its soundness is opaque. If in fact it is sound, the whole thinking process is a reliable way of reaching the conclusion; so in the absence of factors that would make it irrational to trust the thinking, it would be a way of discovering the conclusion to be true.

In some cases visual thinking inclines one to believe something on the basis of assumptions suggested by the visual representation that remain to be justified given the subject’s current knowledge. In such cases there is always the danger that the subject takes the visual representation to show the correctness of the assumptions and ends up with an unwarranted belief. In such a case, even if the belief is true, the subject has not made a discovery, as the means of belief-acquisition is unreliable. Here is an example using Figure 7 (Montuchi and Page 1988).

[A first quadrant graph. On the x-axis are marked (2√k, 0) and, further to the right, (j, 0). On the y-axis are marked (0, 2√k) and, further up, (0, j). Solid lines connect (0, 2√k) to (2√k, 0) and (0, j) to (j, 0). A dotted line goes from the origin at roughly a 45 degree angle; the point where it intersects the line from (0, 2√k) to (2√k, 0) is labeled (√k, √k). A curve tangent to that point, with one end heading up and the other heading right, is labeled 'xy=k'.]

Using this diagram one can come to think the following about the real numbers. When for a constant \(k\) the positive values of \(x\) and \(y\) are constrained to satisfy the equation \(x \cdot y = k\), the positive values of \(x\) and \(y\) for which \(x + y\) is minimal are \(x = \sqrt{k} = y\). (Let “#” denote this claim.)

Suppose that one knows the conventions for representing functions by graphs in a Cartesian co-ordinate system, knows also that the diagonal represents the function \(y = x\), and that a line segment with gradient –1 from \((0, b)\) to \((b, 0)\) represents the function \(x + y = b\). Then looking at the diagram may incline one to think that for no positive value of \(x\) does the value of \(y\) in the function \(x\cdot y = k\) fall below the value of \(y\) in \(x + y = 2\sqrt{k}\), and that these functions coincide just at the diagonal. From these beliefs the subject may (correctly) infer the conclusion #. But mere attention to the diagram cannot warrant believing that, for a given positive \(x\)-value, the \(y\)-value of \(x\cdot y = k\) never falls below the \(y\)-value of \(x + y = 2\sqrt{k}\) and that the functions coincide just at the diagonal; for the conventions of representation do not rule out that the curve of \(x\cdot y = k\) meets the curve of \(x + y = 2\sqrt{k}\) at two points extremely close to the diagonal, and that the former curve falls under the latter in between those two points. So the visual thinking is not in this case a means of discovering proposition #.

But it is useful because it provides the idea for a proof of the conclusion—one of the major benefits of visual thinking in mathematics. In brief: for each of the equations \(x\cdot y = k\) and \(x + y = 2\sqrt{k}\), if \(x = y\) then their common value is \(\sqrt{k}\). So the functions expressed by those equations meet at the diagonal. To show that, for a fixed positive \(x\)-value, the \(y\)-values of \(x\cdot y = k\) never fall below the \(y\)-values of \(x + y = 2\sqrt{k}\), it suffices to show that \(2\sqrt{k} - x \le k/x\). As a geometric mean is less than or equal to the corresponding arithmetic mean, \(\sqrt{[x \cdot (k/x)]} \le [x + (k/x)]/2\). So \(2\sqrt{k} \le x + (k/x)\). So \(2\sqrt{k} - x \le k/x\).
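A numerical spot-check of claim # (purely illustrative; the sampling grid and tolerances are our choices, not part of the proof):

```python
# Spot-check of claim #: for fixed k > 0, along x*y = k the sum
# x + y = x + k/x is minimised at x = sqrt(k), where it equals 2*sqrt(k).
import math

k = 5.0
xs = [i / 1000 for i in range(1, 10001)]        # sample x in (0, 10]
best_x = min(xs, key=lambda x: x + k / x)
assert abs(best_x - math.sqrt(k)) < 0.01        # minimiser near sqrt(k)
assert all(x + k / x >= 2 * math.sqrt(k) - 1e-9 for x in xs)
print("minimum attained near sqrt(k) =", round(math.sqrt(k), 3))
```

The second assertion is just the inequality \(x + k/x \ge 2\sqrt{k}\) derived above from the arithmetic-geometric mean inequality, checked on the sample grid.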

In this example, visual attention to, and reasoning about, the diagram is not part of a way of discovering the conclusion. But if it gave one the idea for the argument just given, it would be part of what led to a way of discovering the conclusion, and that is important.

Can visual thinking lead to discovery of an idea for a proof in more advanced contexts? Yes. Carter (2010) gives an example from free probability theory. The case is about certain permutations (those denoted by “\(p\)” with a circumflex in Carter 2010) on a finite set of natural numbers. Using specific kinds of diagram, easily seen properties of the diagrams lead one naturally to certain properties of the permutations (crossing and non-crossing, having neighbouring pairs), and to a certain operation (cancellation of neighbouring pairs). All of these have algebraic definitions, but the ideas defined were noticed by thinking in terms of the diagrams. For the relevant permutations \(\sigma\), \(\sigma(\sigma(n)) = n\); so a permutation can be represented by a set of lines joining dots. The permutations represented on the left and right in Figure 8 are non-crossing and crossing respectively, the former with neighbouring pairs \(\{2, 3\}\) and \(\{6, 7\}\).

[a circle with 8 points on the circumference, a point at about 45 degrees is labeled '1', at 15 degrees, '2', at -15 degrees '3', at -45 degrees '4', at -135 degrees '5', at -165 degrees '6', at 165 degrees '7', and at 135 degrees '8'.  Smooth curves in the interior of the circle connect point 1 to 4, 2 to 3, 5 to 8, and 6 to 7.]

A permutation \(\sigma\) of \(\{1, 2, \ldots, 2p\}\) is defined to have a crossing just when there are \(a\), \(b\), \(c\), \(d\) in \(\{1, 2, \ldots, 2p\}\) such that \(a < b < c < d\) and \(\sigma(a) = c\) and \(\sigma(b) = d\). The focus is on the proof of a theorem which employs this notion. (The theorem is that when a permutation of \(\{1, 2, \ldots, 2p\}\) of the relevant kind is non-crossing, there will be exactly \(p+1\) R-equivalence classes, where \(R\) is a certain equivalence relation on \(\{1, 2, \ldots, 2p\}\) defined in terms of the permutation.) Carter says that the proofs of some lemmas “rely on a visualization of the setup”, in that to grasp the correctness of one or more of the steps one needs to visualize the situation. There is also a nice example of some reasoning in terms of a diagram which gives the idea for a proof (“suggests a proof strategy”) for the lemma that every non-crossing permutation has a neighbouring pair. Reflection on a diagram such as Figure 9 does the work.

[A circle, a dashed interior curve connects an unmarked point at about 40 degrees to an unmarked point at -10 degrees (the second point is labeled 'j+1'). Another dashed interior curve connects this point to an unmarked point at about -100 degrees. A solid interior curve connects an unmarked point at about 10 degrees (labeled 'j') to another unmarked point at about -60 degrees (labeled 'j+a'). Between the labels 'j+1' and 'j+a' is another label 'j+2' and then a dotted line between 'j+2' and 'j+a'.]

The reasoning is this. Suppose that \(\pi\) has no neighbouring pair. Choose \(j\) such that \(\pi(j) - j = a\) is minimal, that is, for all \(k, \pi(j) - j \le \pi(k) - k\). As \(\pi\) has no neighbouring pair, \(\pi(j+1) \ne j\). So either \(\pi(j+1)\) is less than \(j\) and we have a crossing, or by minimality of \(\pi(j) - j\), \(\pi(j+1)\) is greater than \(j+a\) and again we have a crossing. Carter reports that this disjunction was initially believed by thinking in terms of the diagram, and the proof of the lemma given in the published paper is a non-diagrammatic “version” of that reasoning. In this case study, visual thinking is shown to contribute to discovery in several ways; in particular, by leading the mathematicians to notice crucial properties—the “definitions are based on the diagrams”—and in giving them the ideas for parts of the overall proof.
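The diagrammatic definitions can be encoded directly. The sketch below is our own encoding (Carter's notation differs): the relevant permutations are represented as fixed-point-free involutions on \(\{1, \ldots, 2p\}\), i.e., sets of chords of a circle as in Figure 8.

```python
# Our encoding of the diagram-derived notions: a permutation sigma with
# sigma[sigma[n]] == n and no fixed points, stored as a dict, so that
# each pair {n, sigma[n]} is a chord of the circle.
def has_crossing(sigma):
    """Crossing: a < b < c < d with sigma[a] == c and sigma[b] == d."""
    pts = sorted(sigma)
    return any(a < b < sigma[a] < sigma[b] for a in pts for b in pts)

def neighbouring_pairs(sigma):
    """Pairs {j, j+1} of adjacent labels swapped by sigma."""
    return [{j, j + 1} for j in sorted(sigma) if sigma.get(j) == j + 1]

# The non-crossing permutation of Figure 8: 1<->4, 2<->3, 5<->8, 6<->7.
fig8 = {1: 4, 4: 1, 2: 3, 3: 2, 5: 8, 8: 5, 6: 7, 7: 6}
print(has_crossing(fig8), neighbouring_pairs(fig8))
```

On the Figure 8 example this reports no crossing and exactly the neighbouring pairs \(\{2, 3\}\) and \(\{6, 7\}\) mentioned above; swapping chords so that two of them interleave makes `has_crossing` come out true.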

In this section I will illustrate and then discuss the use of visual thinking in discovering kinds of mathematical entity, by going through a few of the main steps leading to geometric group theory, a subject which really took off in the 1980s through the work of Mikhail Gromov. The material is set out nicely in greater depth in Starikova (2012).

Sometimes it can be fruitful to think of non-spatial entities, such as algebraic structures, in terms of a spatial representation. An example is the representation of a finitely generated group by a Cayley graph. Let \((G, \cdot)\) be a group and \(S\) a finite subset of \(G\). Let \(S^{-1}\) be the set of inverses of members of \(S\). Then \((G, \cdot)\) is generated by \(S\) if and only if every member of \(G\) is the product (with respect to \(\cdot\)) of members of \(S\cup S^{-1}\). In that case \((G, \cdot, S)\) is said to be a finitely generated group. Here are a couple of examples.

First consider the group \(S_{3}\) of permutations of 3 elements under composition. Letting \(\{a, b, c\}\) be the elements, all six permutations can be generated by \(f\) and \(r\), where

\(f\) (for “flip”) fixes \(a\) and swaps \(b\) with \(c\), i.e., it takes \(\langle a, b, c\rangle\) to \(\langle a, c, b\rangle\), and

\(r\) (for “rotate”) takes \(\langle a, b, c\rangle\) to \(\langle c, a, b\rangle\).

The Cayley graph for \((S_{3}, \cdot, \{f, r\})\) is a graph whose vertices represent the members of \(S_{3}\), with two “colours” of directed edges representing composition with \(f\) and composition with \(r\). Figure 10 illustrates: red directed edges represent composition with \(r\) and black edges represent composition with \(f\). So a red edge from a vertex \(v\) representing \(s\) in \(S_{3}\) ends at a vertex representing \(sr\), and a black edge from \(v\) ends at a vertex representing \(sf\). (Notation: “\(sr\)” abbreviates “\(s \cdot r\)”, which here denotes “\(s\) followed by \(r\)”; same for “\(f\)” in place of “\(r\)”.) A black edge has arrowheads both ways because \(f\) is its own inverse, that is, flipping and flipping again takes you back to where you started. (Sometimes a pair of edges with arrows in opposite directions is used instead.) The symbol “\(e\)” denotes the identity.
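As a check on the description above, the six members of \(S_{3}\) can be generated from \(f\) and \(r\) by composition. A small Python sketch (the tuple encoding of permutations on indices 0, 1, 2 for \(a\), \(b\), \(c\) is an assumption of the sketch):

```python
def then(p, q):
    """The permutation 'p followed by q': apply p first, then q."""
    return tuple(q[p[i]] for i in range(len(p)))

e = (0, 1, 2)   # identity
f = (0, 2, 1)   # flip: fixes a, swaps b and c
r = (2, 0, 1)   # rotate: takes <a, b, c> to <c, a, b>

# Close {e} under right-composition with the generators f and r;
# this traverses the Cayley graph's directed edges.
elements, frontier = {e}, [e]
while frontier:
    p = frontier.pop()
    for gen in (f, r):
        q = then(p, gen)
        if q not in elements:
            elements.add(q)
            frontier.append(q)

print(len(elements))        # 6: all of S_3 is generated
print(then(f, f) == e)      # True: f is its own inverse
```

The traversal visits exactly one vertex per group member, which is why the Cayley graph of \((S_{3}, \cdot, \{f, r\})\) has six vertices.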

[Two red equilateral triangles, one inside the other. The smaller triangle has arrows on each side pointing clockwise; the larger has arrows on each side pointing counterclockwise. Black double-arrow lines connect the corresponding vertices of the two triangles. The top vertex of the outside triangle is labeled 'e', that of the inside triangle 'f'; the bottom left vertex of the outside triangle is labeled 'r', that of the inside triangle 'r'; the bottom right vertex of the outside triangle is labeled 'rr', that of the inside triangle 'fr'.]

An example of a finitely generated group of infinite order is \((\mathbb{Z}, +, \{1\})\). We can get any integer by successively adding 1 or its additive inverse \(-1\). Since 3 added to the inverse of 2 is 1, and 2 added to the inverse of 3 is \(-1\), we can get any integer by adding members of \(\{2, 3\}\) and their inverses. Thus both \(\{1\}\) and \(\{2, 3\}\) are generating sets for \((\mathbb{Z}, +)\). Figure 11 illustrates part of the Cayley graph for \((\mathbb{Z}, +, \{2, 3\})\). The horizontal directed edges represent \(+2\). The directed edges ascending or descending obliquely represent \(+3\).

[Two horizontal parallel black lines with directional arrows pointing to the right. The top line has equidistant points marked '-2', '0', '2', '4' and the bottom line equidistant points marked '-1' (about half way between the upper line's '-2' and '0'), '1', '3', '5'. A red arrow goes from '-2' to '1', from a point off to the left up to '0', from '0' to '3', from '-1' to '2', from '1' to '4', from '2' to '5', and from '3' up to a point off to the right.]

Another example of a generated group of infinite order is \(F_2\), the free group on two generators. The first few iterations of its Cayley graph are shown in Figure 12, where \(\{a, b\}\) is the set of generators and a rightward horizontal move between adjacent vertices represents composition with \(a\), an upward vertical move represents composition with \(b\), and leftward and downward moves represent composition with the inverse of \(a\) and the inverse of \(b\) respectively. The central vertex represents the identity.

[A blue vertical line pointing up labeled 'b' crossed by a red horizontal line pointing right labeled 'a'. Each line is crossed by two smaller copies of the other line on either side of the main intersection. And, in turn, each of those smaller copies of the line are crossed by two smaller copies of the other line, again on either side of their main intersection.]

Thinking of generated groups in terms of their Cayley graphs makes it very natural to view them as metric spaces. A path is a sequence of consecutively adjacent edges, regardless of direction. For example in the Cayley graph for \((\mathbb{Z}, +, \{2, 3\})\) the edges from \(-2\) to 1, from 1 to \(-1\), from \(-1\) to 2 (in that order) constitute a path, representing the action, starting from \(-2\), of adding 3, then adding \(-2\), then adding 3. Taking each edge to have unit length, the metric \(d_S\) for a group \(G\) generated by a finite subset \(S\) of \(G\) is defined: for any \(g\), \(h \in G\), \(d_{S}(g, h) =\) the length of a shortest path from \(g\) to \(h\) in the Cayley graph of \((G, \cdot, S)\). This is the word metric for this generated group.
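Since paths ignore edge direction, the word metric can be computed by breadth-first search on the Cayley graph. A sketch for \((\mathbb{Z}, +, S)\) (the function name and encoding are illustrative assumptions):

```python
from collections import deque

def word_distance(g, h, generators=(2, 3)):
    """d_S(g, h): length of a shortest path from g to h in the Cayley
    graph of (Z, +, S); one step adds a generator or its inverse.
    Terminates because every integer is a sum of members of S and
    their additive inverses (here gcd(2, 3) = 1)."""
    steps = [s for gen in generators for s in (gen, -gen)]
    target = h - g                     # translate to a walk from 0
    seen, queue = {0}, deque([(0, 0)])
    while queue:
        x, d = queue.popleft()
        if x == target:
            return d
        for s in steps:
            if x + s not in seen:
                seen.add(x + s)
                queue.append((x + s, d + 1))

print(word_distance(-2, 2))   # 2: e.g. add 2, then add 2 again
print(word_distance(0, 1))    # 2: e.g. add 3, then add -2
```

Note that the three-edge path from \(-2\) to 2 described above is a path but not a shortest one: adding 2 twice does the job in two steps.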

Viewing a finitely generated group as a metric space allows us to consider its growth function \(\gamma(n)\), which is the cardinality of the “ball” of radius \(n\) centred on the identity (the number of members of the group whose distance from the identity is not greater than \(n\)). A growth function for a given group depends on the set of generators chosen, but when the group is infinite the asymptotic behaviour as \(n \rightarrow \infty\) of the growth functions is independent of the set of generators.
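For \((\mathbb{Z}, +)\) the growth function can be computed directly, and one sees the generator-independence of the asymptotics: for both generating sets below, ball sizes grow linearly in \(n\). A sketch (names are illustrative assumptions):

```python
def growth(n, generators=(1,)):
    """gamma(n): number of group members within word-distance n of the
    identity in the Cayley graph of (Z, +, S)."""
    steps = [s for g in generators for s in (g, -g)]
    ball, frontier = {0}, {0}
    for _ in range(n):
        # Expand one edge at a time; keep only newly reached members.
        frontier = {x + s for x in frontier for s in steps} - ball
        ball |= frontier
    return len(ball)

print([growth(n, (1,)) for n in range(4)])     # [1, 3, 5, 7]
print([growth(n, (2, 3)) for n in range(4)])   # [1, 5, 13, 19]
```

The two sequences differ, but both are linear in \(n\), which is the sense in which asymptotic growth does not depend on the generating set.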

Noticing the possibility of defining a metric on generated groups did not require first viewing diagrams of their Cayley graphs. This is because a word in the generators is just a finite sequence of symbols for the generators or their inverses (we omit the symbol for the group operation), and so has an obvious length visually suggested by the written form of the word, namely the number of symbols in the sequence; and then it is natural to define the distance between group members \(g\) and \(h\) to be the length of a shortest word that gets one from \(g\) to \(h\) by right multiplication, that is, \(\textrm{min}\{\textrm{length}(w): w = g^{-1}h\}\).

However, viewing generated groups by means of their Cayley graphs was the necessary starting point for geometric group theory, which enables us to view finitely generated groups of infinite order not merely as graphs or metric spaces but as geometric entities. The main steps on this route will be sketched briefly here; for more detail see Starikova (2012) and the references therein. The visual key is to start thinking in terms of the “coarse geometry” of the Cayley graph of the generated group, by zooming out in visual imagination so far that the discrete nature of the graph is transformed into a traditional geometrical object. For example, the Cayley graph of a generated group of finite order such as \((S_{3}, \cdot, \{f, r\})\) illustrated in Figure 10 becomes a dot; the Cayley graph for \((\mathbb{Z}, +, \{2, 3\})\) illustrated in Figure 11 becomes an uninterrupted line infinite in both directions.

The word metric of a generated group is discrete: the values are always in \(\mathbb{N}\). How is this visuo-spatial association of a discrete metric space with a continuous geometrical object achieved mathematically? By quasi-isometry. While an isometry from one metric space to another is a distance preserving map, a quasi-isometry is a map which preserves distances to within fixed linear bounds. Precisely put, a map \(f\) from \((S, d)\) to \((S', d')\) is a quasi-isometry iff for some real constants \(L > 0\) and \(K \ge 0\) and all \(x\), \(y\) in \(S\) \[ d(x, y)/L - K \le d'(f(x), f(y)) \le L \cdot d(x, y) + K. \]

The spaces \((S, d)\) and \((S', d')\) are quasi-isometric iff there is a quasi-isometry \(f\) from one to the other that is also quasi-surjective, in the sense that there is a real constant \(M \ge 0\) such that every point of \(S'\) is no further than \(M\) away from some point in the image of \(f\).

For example, \((\mathbb{Z}, d)\) is quasi-isometric to \((\mathbb{R}, d)\) where \(d(x, y) = |y - x|\), because the inclusion map \(\iota\) from \(\mathbb{Z}\) to \(\mathbb{R}\), \(\iota(n) = n\), is an isometry hence a quasi-isometry with \(L = 1\) and \(K = 0\), and each point in \(\mathbb{R}\) is no further than \(1/2\) away from an integer (in \(\mathbb{R}\)). Also, it is easy to see that if \(g(x) =\) the nearest integer to \(x\) (or the greatest integer less than \(x\) if \(x\) is midway between integers) then \(g\) is a quasi-isometry from \(\mathbb{R}\) to \(\mathbb{Z}\) with \(L = 1\) and \(K = 1\): rounding moves each point by at most \(1/2\), so it changes distances by at most 1.
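The quasi-isometry bounds for rounding can be spot-checked numerically. One caution: points just on either side of a half-integer round to integers a full unit apart, so the slack constant must be at least 1; \(K = 1\) works, since each point moves by at most \(1/2\). A sketch (Python's built-in round breaks ties to even, which differs from the convention above but still moves each point by at most \(1/2\)):

```python
import random

def bounds_hold(f, pairs, L, K):
    """Check d(x, y)/L - K <= d'(f(x), f(y)) <= L*d(x, y) + K on samples."""
    for x, y in pairs:
        d, d_image = abs(x - y), abs(f(x) - f(y))
        if not (d / L - K <= d_image <= L * d + K):
            return False
    return True

random.seed(0)
pairs = [(random.uniform(-50, 50), random.uniform(-50, 50))
         for _ in range(10_000)]

print(bounds_hold(round, pairs, L=1, K=1))              # True
print(bounds_hold(round, [(0.49, 0.51)], L=1, K=0.4))   # False: 0.49 and 0.51
                                                        # round a full unit apart
```

Such sampling cannot prove the bounds, of course; it merely illustrates them, and exhibits the counterexample to any slack smaller than the distance rounding can introduce.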

The relation between metric spaces of being quasi-isometric is an equivalence relation. Also, if \(S\) and \(T\) are generating sets of a group \((G, \cdot)\), the Cayley graphs of \((G, \cdot, S)\) and \((G, \cdot, T)\) with their word metrics are quasi-isometric spaces. This means that properties of a generated group which are quasi-isometric invariants will be independent of the choice of generating set, and therefore informative about the group itself.

Moreover, it is easy to show that the Cayley graph of a generated group with word metric is quasi-isometric to a geodesic space. [ 1 ] A triangle with vertices \(x\), \(y\), \(z\) in this space is the union of three geodesic segments, between \(x\) and \(y\), between \(y\) and \(z\), and between \(z\) and \(x\). This is the gateway for the application of Gromov’s insights, some of which can be grasped with the help of visual geometric thinking.

Here are some indications. Recall the Poincaré open disc model of hyperbolic geometry: geodesics are diameters or arcs of circles orthogonal to the boundary, with unit distance represented by ever shorter Euclidean distances as one moves from the centre towards the boundary. (The boundary is not part of the model.) All triangles have angle sum \(< \pi\) (Figure 13, left), and there is a global constant δ such that all triangles are δ-thin in the following sense:

A triangle \(T\) is δ-thin if and only if any point on one side of \(T\) lies within δ of some point on one of the other two sides.

This condition is equivalent to the condition that each side of \(T\) lies within the union of the δ-neighbourhoods of the other two sides, as illustrated in Figure 13, right. There is no constant δ such that all triangles in a Euclidean plane are δ-thin, because for any δ there are triangles large enough that the midpoint of a longest side lies further than δ from all points on the other two sides.
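The Euclidean failure of uniform thinness is easy to exhibit numerically: for an equilateral triangle, the distance from the midpoint of one side to the nearest point on the other two sides grows linearly with the side length (it equals side \(\times \sqrt{3}/4\)), so no single δ serves for all triangles. A sketch (function names are illustrative assumptions):

```python
import math

def dist_to_segment(p, a, b):
    """Euclidean distance from point p to the segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    vx, vy = bx - ax, by - ay
    t = ((px - ax) * vx + (py - ay) * vy) / (vx * vx + vy * vy)
    t = max(0.0, min(1.0, t))          # clamp to the segment
    return math.hypot(px - (ax + t * vx), py - (ay + t * vy))

def base_midpoint_thinness(side):
    """Distance from the midpoint of the base of an equilateral triangle
    of the given side length to the nearest point on the other two sides."""
    a, b = (0.0, 0.0), (side, 0.0)
    c = (side / 2.0, side * math.sqrt(3) / 2.0)
    m = (side / 2.0, 0.0)              # midpoint of the base
    return min(dist_to_segment(m, a, c), dist_to_segment(m, b, c))

for side in (1, 10, 100):
    print(side, base_midpoint_thinness(side))   # grows like side * sqrt(3) / 4
```

Doubling the triangle doubles the midpoint's distance from the other sides, which is exactly why Euclidean space is not hyperbolic in the δ-thin sense.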

[A circle. In the interior are three arcs colored green, blue, and red; each is marked as meeting the circumference of the circle at a 90 degree angle. The green curve may actually be a straight line and goes from about 160 degrees to about -20 degrees. The blue curve goes from about 170 degrees to about 80 degrees. The red curve goes from about 90 degrees to about -25 degrees. The intersection of the green and blue curves is marked as an angle and labelled with the Greek letter alpha; the intersection of the blue and red curves is marked and labelled gamma; the intersection of the red and green curves is marked and labelled beta.]

Figure 13 [ 2 ]

The definition of thin triangles is sufficiently general to apply to any geodesic space and allows a generalisation of the concept of hyperbolicity beyond its original context:

  • A geodesic space is hyperbolic iff for some δ all its triangles are δ-thin.
  • A group is hyperbolic iff it has a Cayley graph quasi-isometric to a hyperbolic geodesic space.

The class of hyperbolic groups is large and includes important subkinds, such as finite groups, free groups and the fundamental groups of surfaces of genus \(\ge 2\). Some striking theorems have been proved for them. For example, for every hyperbolic group the word problem is solvable, and every hyperbolic group has a finite presentation. So we can reasonably conclude that the discovery of this mathematical kind, the hyperbolic groups, has been fruitful.

How important was visual thinking to the discoveries leading to geometric group theory? Visual thinking was needed to discover Cayley graphs as a means of representing finitely generated groups. This is not the triviality it might seem: Cayley graphs must be distinguished from the diagrams we use to present them visually. A Cayley graph is a mathematical representation of a generated group, not a visual representation. It consists of the following components: a set \(V\) (“vertices”), a set \(E\) of ordered pairs of members of \(V\) (“directed edges”) and a partition of \(E\) into distinguished subsets (“colours”, each one representing right multiplication by a particular generator). The Cayley graph of a generated group of infinite order cannot be fully represented by a diagram given the usual conventions of representation for diagrams of graphs, and distinct diagrams may visually represent the same Cayley graph: both diagrams in Figure 14 can be labelled so that under the usual conventions they represent the Cayley graph of \((S_{3}, \cdot, \{f, r\})\), already illustrated by Figure 10. So the Cayley graph cannot be a diagram.

[Two identical red triangles, one above the other and inverted. Both have arrows going clockwise around them. Black lines with arrows pointing both ways link the corresponding vertices.]

Diagrams of Cayley graphs were important in prompting mathematicians to think in terms of the coarse-grained geometry of the graphs, in that this idea arises just when one thinks in terms of “zooming out” visually. Gromov (1993) makes the point in a passage quoted in Starikova (2012: 138):

This space [a Cayley graph with the word metric] may appear boring and uneventful to a geometer’s eye since it is discrete and the traditional (e.g., topological and infinitesimal) machinery does not run in [the group] Γ. To regain the geometric perspective one has to change one’s position and move the observation point far away from Γ. Then the metric in Γ seen from the distance \(d\) becomes the original distance divided by \(d\) and for \(d \rightarrow \infty\) the points in Γ coalesce into a connected continuous solid unity which occupies the visual horizon without any gaps and holes and fills our geometer’s heart with joy.

In saying that one has to move the observation point far away from Γ so that the points coalesce into a unity which occupies the visual horizon, he makes clear that visual imagination is involved in a crucial step on the road to geometric group theory. Visual thinking is again involved in discovering hyperbolicity as a property of general geodesic spaces from thinking about the Poincaré disk model of hyperbolic geometry. It is hard to see how this property would have been discovered without the use of visual resources.

While there is no reason to think that mental arithmetic (mental calculation in the integers and rational numbers) typically involves much visual thinking, there is strong evidence of substantial visual processing in the mental arithmetic of highly trained abacus users.

In earlier times an abacus would be a rectangular board or table surface marked with lines or grooves along which pebbles or counters could be moved. The oldest surviving abacus, the Salamis abacus, dated around 300 BCE, is a white marble slab, with markings designed for monetary calculation (Fernandes 2015, Other Internet Resources). These were superseded by rectangular frames within which wires or rods parallel to the short sides are fixed, with moveable holed beads on them. There are several kinds of modern abacus — the Chinese suanpan, the Russian schoty and the Japanese soroban for example — each kind with variations. Evidence for visual processing in mental arithmetic comes from studies with well trained users of the soroban, an example of which is shown in Figure 15.

[Picture of a soroban with 17 columns of beads, each column has 1 bead above the horizontal bar used to represent 5 and 4 beads below the bar each of which represents 1. Together the beads in each column can represent any digit from 0 to 9.]

Each column of beads represents a power of 10, increasing to the left. The horizontal bar, sometimes called the reckoning bar, separates the beads on each column into one bead of value 5 above and four beads of value 1 below. The number represented in a column is determined by the beads which are pushed against the reckoning bar. A column on which all beads are separated by a gap from the bar represents zero. For example, the number 6059 is represented on a portion of a schematic soroban in Figure 16.
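The column encoding just described is a small exercise to program. A sketch in Python, where each column is recorded as the number of five-beads and one-beads at the reckoning bar (the function name and pair encoding are assumptions of the sketch):

```python
def soroban_columns(n, places):
    """Encode a non-negative integer on `places` columns, leftmost column
    for the highest power of ten. Each column is a pair
    (five_beads, one_beads), representing the digit 5*five_beads + one_beads."""
    digits = [int(ch) for ch in str(n).zfill(places)]
    return [(d // 5, d % 5) for d in digits]

# 6059 on 8 columns: four zero columns, then 6 = 5+1, 0, 5, 9 = 5+4.
print(soroban_columns(6059, 8))
# [(0, 0), (0, 0), (0, 0), (0, 0), (1, 1), (0, 0), (1, 0), (1, 4)]
```

Reading a column back is the inverse computation: the digit is five times the five-bead count plus the one-bead count, which is why each column can represent any digit from 0 to 9.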

[A schematic soroban representing 6059. There are 8 places and the first four from the left are set to 0, then 6, then 0, then 5, then 9 ]

On some sorobans there is a mark on the reckoning bar at every third column; if a user chooses one of these as a unit column, the marks will help the user keep track of which columns represent which powers of ten. Calculations are made by using forefinger and thumb to move beads according to procedures for the standard four numerical operations and for extraction of square and cube roots (Bernazzani 2005, Other Internet Resources). Despite the fact that the soroban has a decimal place representation of numbers, the soroban procedures are not ‘translations’ of the procedures normally taught for the standard operations using arabic numerals. For example, multidigit addition on a soroban starts by adding highest powers of ten and proceeds rightwards to lower powers, instead of starting with units thence proceeding leftwards to tens, hundreds and so on.

People trained to use a soroban often learn to do mental arithmetic by visualizing an abacus and imagining moving beads on it in accordance with the procedures learned for arithmetical calculations (Frank and Barner 2012). Mental abacus (MA), as this kind of mental arithmetic is known, compares favourably with other kinds of mental calculation for speed and accuracy (Kojima 1954) and MA users are often found among the medallists in the Mental Calculation World Cup.

Although visual and manual motor imagery is likely to occur, cognitive scientists have probed the question whether the actual processes of MA calculation consist in or involve imagining performing operations on a physical abacus. Brain imaging studies provide one source of evidence bearing on this question. Comparing well-trained abacus calculators with matched controls, evidence has been found that MA involves neural resources of visuospatial working memory with a form of abacus which does not depend on the modality (visual or auditory) of the numerical inputs (Chen et al. 2006). Another imaging study found that, compared to controls without abacus training, subjects with long term MA training from a young age had enhanced brain white matter related to motor and visuospatial processes (Hu et al. 2011).

Behavioural studies provide more evidence. Tests on expert and intermediate level abacus users strongly suggest that MA calculators mentally manipulate an abacus representation so that it passes through the same states that an actual abacus would pass through in solving an addition problem. Without using an actual abacus MA calculators were able to answer correctly questions about intermediates states unique to the abacus-based solution of a problem; moreover, their response times were a monotonic function of the position of the probed state in the sequence of states of the abacus process for solving the problem (Stigler 1984). On top of the ‘intermediate states’ evidence, there is ‘error type’ evidence. Mental addition tests comparing abacus users with American subjects revealed that abacus users made errors of a kind which the Americans did not make, but which were predictable from the distribution of errors in physical abacus addition (Stigler 1984).

Another study found evidence that when a sequence of numbers is presented auditorily (as a verbal whole “three thousand five hundred and forty seven” or as a digit sequence “Three, five, four, seven”) abacus experts encode it into an imaged abacus display, while non-experts encode it verbally (Hishitani 1990).

Further evidence comes from behavioural interference studies. In these studies subjects have to perform mental calculations, with and without a task of some other kind to be performed during the calculation, with the aim of seeing which kinds of task interfere with calculation as measured by differences of reaction time and error rate. An early study found that a linguistic task interfered weakly with MA performance (unless the linguistic task was to answer a mathematical question), while motor and visual tasks interfered relatively strongly. These findings suggested to the paper’s authors that MA representations are not linguistic in nature but rely on visual mechanisms and, for intermediate practitioners, on motor mechanisms as well (Hatano et al. 1977).

These studies provide impressive evidence that MA does involve mental manipulation of a visualized abacus. However, limits of the known capacities for perceiving or representing pluralities of objects seem to pose a problem. We have a parallel individuation system for keeping track of up to four objects simultaneously and an approximate number system (ANS) which allows us to gauge roughly the cardinality of a set of things, with an error which increases with the size of the set. The parallel individuation system has a limit of three or four objects and the ANS represents cardinalities greater than four only approximately. Yet mental abacus users would need to hold in mind with precision abacus representations involving a much larger number of beads than four (and the way in which those beads are distributed on the abacus). For example, the number 439 requires a precise distribution of twelve beads. Frank and Barner (2012) address this problem. In some circumstances we can perceive a plurality of objects as a single entity, a set, and simultaneously perceive those objects as individuals. There is evidence that we can keep track of up to three such sets in parallel and simultaneously make reliable estimates of the cardinalities of the sets (if not more than four). If the sets themselves can be easily perceived as (a) divided into disjoint subsets, e.g. columns of beads on an abacus, and (b) structured in a familiar way, e.g. as a distribution of four beads below a reckoning bar and one above, we have the resources for recognising a three-digit number from its abacus representation. The findings of (Frank and Barner 2012) suggest that this is what happens in MA: a mental abacus is represented in visuospatial working memory by splitting it into a series of columns each of which is stored as a unit with its own detailed substructure.

These cognitive investigations confirm the self-reports of mental abacus users that they calculate mentally by visualizing operating on an abacus as they would operate on a physical abacus. (See the 20-second movie Brief interview with mental abacus user , at the Stanford Language and Cognition Lab, for one such self-report.) There is good evidence that MA often involves processes linked to motor cognition in addition to active visual imagination. Intermediate abacus users often make hand movements, without necessarily attending to those movements during MA calculation, as shown in the second of the three short movies just mentioned. Experiments to test the possible role of motor processes in MA resulted in findings which led the authors to conclude that premotor processes involved in the planning of hand movements were involved in MA (Brooks et al. 2018).

6. A priori and a posteriori roles of visual experience

In coming to know a mathematical truth visual experience can play a merely “enabling” role. For example, visual experience may have been a factor in a person’s getting certain concepts involved in a mathematical proposition, thus enabling her to understand the proposition, without giving her reason to believe it. Or the visual experience of reading an argument in a text book may enable one to find out just what the argument is, without helping her tell that the argument is sound. In earlier sections visual experience has been presented as having roles in proof and propositional discovery that are not merely enabling. On the face of it this raises a puzzle: mathematics, as opposed to its application to natural phenomena, has traditionally been thought to be an a priori science; but if visual experience plays a role in acquiring mathematical knowledge which is not merely enabling, the result would surely be a posteriori knowledge, not a priori knowledge. Setting aside knowledge acquired by testimony (reading or hearing that such-&-such is the case), there remain plenty of cases where sensory experience seems to play an evidential role in coming to know some mathematical fact.

A plausible example of the evidential use of sensory experience is the case of a child coming to know that \(5 + 3 = 8\) by counting on her fingers. While there may be an important a priori element in the child’s appreciation that she can reliably generalise from the result of her counting experiment, getting that result by counting is an a posteriori route to it. For another example, consider the question: how many vertices does a cube have? With the background knowledge that cubes do not vary in shape and that material cubes do not differ from geometrical cubes in number of vertices (where a “vertex” of a material cube is a corner), one can find the answer by visually inspecting a material cube. Or if one does not have a material cube to hand, one can visually imagine a cube, and by attending to its top and bottom faces extract the information that the vertices of the cube are exactly the vertices of these two quadrangular faces. When one gets the answer by inspecting a material cube, the visual experience contributes to one’s grounds for believing the answer and that contribution is part of what makes the belief state knowledge. So the role of the visual experience is evidential; hence the resulting knowledge is not a priori. When one gets the answer by visually imagining a cube, one is drawing on the accumulated cognitive effects of past experiences of seeing material cubes to bring to mind what a cube looks like; so the experience of visual imagining has an indirectly evidential role in this case.

Do such examples show that mathematics is not an a priori science? Yes, if an a priori science is understood to be one whose knowable truths are all knowable only in an a priori way, without use of sense experience as evidence. No, if an a priori science is one whose knowable truths are all knowable in an a priori way, allowing that some may be knowable also in an a posteriori way.

Many cases of proving something (or following a proof of it) involve making, or imagining making, changes in a symbol array. A standard presentation of the proof of left-cancellation in group theory provides an example. “Left-cancellation” is the claim that for any members \(a\), \(b\), \(c\) of a group with operation \(\cdot\) and identity element \(\mathbf{e}\), if \(a \cdot b = a \cdot c\), then \(b = c\). Here is (the core of) a proof of it: \[ \begin{aligned} a \cdot b &= a \cdot c\\ a^{-1} \cdot (a \cdot b) &= a^{-1} \cdot (a \cdot c)\\ (a^{-1} \cdot a) \cdot b &= (a^{-1} \cdot a) \cdot c\\ \mathbf{e} \cdot b &= \mathbf{e} \cdot c\\ b &= c \end{aligned} \]

Suppose that one comes to know left-cancellation by following this sequence of steps. Is this an a priori way of getting this knowledge? Although following a mathematical proof is thought to be a paradigmatically a priori way of getting knowledge, attention to the role of visual experience here throws this into doubt. The case for claiming that the visual experience has an evidential role is as follows.

The visual experience reveals not only what the steps of the argument are but also that they are valid, thereby contributing to our grounds for accepting the argument and believing its conclusion. Consider, for example, the step from the second equation to the third. The relevant background knowledge, apart from the logic of identity, is that a group operation is associative. This fact is usually represented in the form of an equation that simply relocates brackets in an obvious way: \[ (x \cdot y) \cdot z = x \cdot (y \cdot z) \]

We see that, relocating the brackets in accord with this format, the left-hand term of the second equation is transformed into the left-hand term of the third equation, and the same for the right-hand terms. So the visual experience plays an evidential role in our recognising as valid the step from the second equation to the third. Hence this quite standard route to knowledge of left-cancellation turns out to be a posteriori, even though it is a clear case of following a proof.
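The chain of equations behind left-cancellation can also be certified by a proof checker rather than by eye, with each bracket relocation made a named step. A hedged sketch in Lean 4, assuming Mathlib's `Group` class (lemma names such as `inv_mul_cancel` vary across Mathlib versions):

```lean
import Mathlib.Algebra.Group.Basic

-- Left-cancellation, following the textbook steps; `mul_assoc`
-- is the associativity law that "relocates brackets".
example {G : Type} [Group G] (a b c : G) (h : a * b = a * c) : b = c :=
  calc b = 1 * b       := (one_mul b).symm
    _ = (a⁻¹ * a) * b  := by rw [inv_mul_cancel]
    _ = a⁻¹ * (a * b)  := mul_assoc a⁻¹ a b   -- the bracket relocation
    _ = a⁻¹ * (a * c)  := by rw [h]
    _ = (a⁻¹ * a) * c  := (mul_assoc a⁻¹ a c).symm
    _ = 1 * c          := by rw [inv_mul_cancel]
    _ = c              := one_mul c
```

Here the checker, not visual inspection of bracket positions, warrants each step; how this bears on the epistemological question in the text is of course a further matter.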

Against this, one may argue that the description just given of what is going on in following the proof is not strictly correct, as follows. Exactly the same proof can be expressed in natural language, using “the composition of \(x\) with \(y\)” for “\(x \cdot y\)”, but the result would be hard to take in. Or the proof can be presented using a different notational convention, one which forces a quite different expression of associativity. For example, we can use the Polish convention of putting the operation symbol before the operands: instead of “\(x \cdot y\)” we put “\(\cdot x y\)”. In that case associativity would be expressed in the following way, without brackets: \[ \cdot \cdot x y z = \cdot x \cdot y z \]

The equations of the proof would then need to be re-symbolised; but what is expressed by each equation after re-symbolisation, and the steps from one to the next, would be exactly as before. So we would be following the very same proof, step by step. But this time we would not be using visual experience to notice the relocation of brackets. This suggests that the role of the different visual experiences involved in following the argument in its different guises is merely to give us access to the common reasoning: the role of the experience is merely enabling. On this account the visual experience does not strictly and literally enable us to see that any of the steps are valid; rather, recognition of (or sensitivity to) the validity of the steps results from cognitive processing at a more abstract level.

Which of these rival views is correct? Does our visual experience in following the argument presented with brackets (1) reveal to us the validity of some of the steps, given the relevant background knowledge? Or (2) merely give us access to the argument? The core of the argument against view (1) is this:

Seeing the relocation of brackets is not essential to following the argument.

So seeing merely gives access to the argument; it does not reveal any step to be valid.

The step to this conclusion is faulty. How one follows a proof may, and in this case does, depend on how it is presented, and different ways of following a proof may be different ways of coming to know its conclusion. While seeing the relocation of brackets is not essential to all ways of following this argument, it is essential to the normal way of following the argument when it is symbolically presented with brackets in the way given above.

Associativity, expressed without symbols, is this: When the binary group operation is applied twice in succession on an ordered triple of operands \(\langle a, b, c\rangle\), it makes no difference whether the first application is to the initial two operands or the final two operands. While this is the content of associativity, for ease of processing associativity is almost always expressed as a symbol-manipulation rule. Visual perception is used to tell in particular cases whether the rule thus expressed is correctly implemented, in the context of prior knowledge that the rule is correct. What is going on here is a familiar division of labour in mathematical thinking. We first establish the soundness of a rule of symbol-manipulation (in terms of the governing semantic conventions—in this case the matter is trivial); then we check visually that the rule is correctly implemented. Processing at a more abstract, semantic level is often harder than processing at a purely syntactic level; it is for this reason that we often resort to symbol-manipulation techniques as proxy for reasoning directly with meanings to solve a problem. (What is six eighths divided by three fifths, without using any symbolic technique?) When we do use symbol-manipulation in proving or following a proof, visual experience is required to discern that the moves conform to permitted patterns and thus contributes to our grounds for accepting the argument. Then the way of coming to know the conclusion has an a posteriori element.

Must a use of visual experience in knowledge acquisition be evidential, if the visual experience is not merely enabling? Here is an example which supports a negative answer. Imagine a square or look at a drawing of one. Each of its four sides has a midpoint. Now visualize the “inner” square whose sides run between the midpoints of adjacent sides of the original square (Figure 17, left). Visualizing this figure should make it clear that the original square is composed precisely of the inner square plus four corner triangles, each side of the inner square being the base of a corner triangle. One can now visualize the corner triangles folding over, with creases along the sides of the inner square. The starting and end states of the imagery transformation can be represented by the left and right diagrams of Figure 17.

[Figure 17: two squares identical in size. The first has lines connecting the midpoints of each adjacent pair of sides, forming an inner square. The second has, in addition, lines connecting the midpoints of opposite pairs of sides, and its outer square is drawn with dashed lines instead of solid.]

Visualizing the folding-over within the remembered frame of the original square results in an image of the original square divided into square quarters, its quadrants, with the sides of the inner square appearing as diagonals of the quadrants. Many people conclude that the corner triangles can be arranged to cover the inner square exactly, without any gap or overlap. Thence they infer that the area of the original square is twice that of the inner square. Let us assume that the propositions concerned are about Euclidean figures. Our concern is with the visual route to the following:

The parts of a square beyond its inner square (formed by joining midpoints of adjacent sides of the original square) can be arranged to fit the inner square exactly, without overlap or gap, without change of size or shape.

The experience of visualizing the corner triangles folding over can lead one to this belief. But it cannot provide good evidence for it. This is because visual experience (of sight or imagination) has limited acuity and so does not enable us to discriminate between a situation in which the outer triangles fit the inner square exactly and a situation in which they fit inexactly but well enough for the mismatch to escape visual detection. (This contrasts with the case of discovering the number of vertices of a cube by seeing or visualizing one.) Even though visualizing the square, the inner square and then visualizing the corner triangles folding over is constrained by the results of earlier perceptual experience of scenes with relevant similarities, we cannot draw from it reliable information about exact equality of areas, because perception itself is not reliable about exact equalities (or exact proportions) of continuous magnitudes.

Though the visual experience could not provide good evidence for the belief, it is possible that we erroneously use the experience evidentially in reaching the belief. But it is also possible, when reaching the belief in the way described, that we do not take the experience to provide evidence. A non-evidential use is more likely if, when one arrives at the belief in this way, one feels fairly certain of it while aware that visual perception and imagination have limited acuity and so cannot provide evidence for a claim of exact fit.

But what could the role of the visualizing experience possibly be, if it were neither merely enabling nor evidential? One suggestion is that we already have relevant beliefs and belief-forming dispositions, and the visualizing experience could serve to bring to mind the beliefs and to activate the belief-forming dispositions (Giaquinto 2007). These beliefs and dispositions will have resulted from prior possession of cognitive resources, some subject-specific such as concepts of geometrical figures, some subject-general such as symmetry perception about perceptually salient vertical and horizontal axes. A relevant prior belief in this case might be that a square is symmetric about a diagonal. A relevant disposition might be the disposition to believe that the quadrants of a square are congruent squares upon seeing or visualizing a square with a horizontal base plus the vertical and horizontal line segments joining midpoints of its opposite sides. (These dispositions differ from ordinary perceptual dispositions to believe what we see in that they are not cancelled when we mistrust the accuracy of the visual experience.)

The question whether the resulting belief would be knowledge depends on whether the belief-forming dispositions are reliable (truth-conducive) and the pre-existing belief states are states of knowledge. As these conditions can be met without any violation of epistemic rationality, the visualizing route described incompletely here can be a route to knowledge. In that case we would have an example of a use of visual experience which is integral to a way of knowing a truth, which is not merely enabling and yet not evidential. A fuller account and discussion is given in chapters 3 and 4 of Giaquinto (2007).

There are other significant uses of visual representations in mathematics. This final section briefly presents a couple of them.

Although the use of diagrams in arguments in analysis faces special dangers (as noted in 3.3), the use of diagrams to illustrate symbolically presented operations can be very helpful. Consider, for example, this pair of operations \(\{ f(x) + k, f(x + k) \}\). Grasping them and the difference between them can be aided by a visual illustration; similarly for the sets \(\{ f(x + k), f(x - k) \}\), \(\{ |f(x)|, f(|x|) \}\), \(\{ f(x)^{-1}, f^{-1}(x), f(x^{-1}) \}\). While generalization on the basis of a visual illustration is unreliable, we can use such illustrations as checks against calculation errors and overgeneralization. The same holds for properties. Consider, for example, functions for which \(f(-x) = f(x)\), known as even functions, and functions for which \(f(-x) = -f(x)\), the odd functions: it can be helpful to have in mind the images of the graphs of \(y = x^2\) and \(y = x^{3}\) as instances of evenness and oddness, to remind one that even functions are symmetrical about the \(y\)-axis and odd functions have rotation symmetry by \(\pi\) about the origin. They can serve as a reminder and check against over-generalisation: any general claim true of all odd functions, for example, must be true of \(y = x^{3}\) in particular.
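The defining symmetries can also be checked numerically at a few sample points. The following small sketch (an illustrative aside, not from the original text; the helper names are hypothetical) tests \(y = x^2\) and \(y = x^3\) against both properties:

```python
# Check, at a few nonzero sample points, the defining symmetries of
# even functions (f(-x) = f(x)) and odd functions (f(-x) = -f(x)).
def is_even(f, samples):
    return all(f(-x) == f(x) for x in samples)

def is_odd(f, samples):
    return all(f(-x) == -f(x) for x in samples)

xs = [0.5, 1.0, 2.0, 3.5]
square = lambda x: x ** 2   # even: graph symmetric about the y-axis
cube = lambda x: x ** 3     # odd: half-turn symmetry about the origin
```

The sample points are exactly representable binary fractions, so the equality checks are exact here; for arbitrary floats one would compare within a tolerance.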

The utility of visual representations in real and complex analysis is not confined to such simple cases. Visual representations can help us grasp what motivates certain definitions and arguments, and thereby deepen our understanding. Abundant confirmation of this claim can be gathered from working through the text Visual Complex Analysis (Needham 1997). Some mathematical subjects have natural visual representations, which then give rise to a domain of mathematical entities in their own right. This is true of geometry but is also true of subjects which become algebraic in nature very quickly, such as graph theory, knot theory and braid theory. Techniques of computer graphics now enable us to use moving images. For an example of the power of kinematic visual representations to provide and increase understanding of a subject, see the first two “chapters” of the online introduction to braid theory by Ester Dalvit (2012, Other Internet Resources).

With regard to proofs, a minimal kind of understanding consists in understanding each line (proposition or formula) and grasping the validity of each step to a new line from earlier lines. But we can have that stepwise grasp of a proof without any idea of why it proceeds by those steps. One has a more advanced (or deeper) kind of understanding when one has the minimal understanding and a grasp of the motivating idea(s) and strategy of the proof. The point is sharply expressed by Weyl (1995 [1932]: 453), quoted in Tappenden (2005: 150):

We are not very pleased when we are forced to accept a mathematical truth by virtue of a complicated chain of formal conclusions and computations, which we traverse blindly, link by link, feeling our way by touch. We want first an overview of the aim and the road; we want to understand the idea of the proof, the deeper context.

Occasionally the author of a proof gives readers the desired understanding by adding commentary. But this is not always needed, as the idea of a proof is sometimes revealed in the presentation of the proof itself. Often this is done by using visual representations. An example is Fisk’s proof of Chvátal’s “art gallery” theorem. This theorem is the answer to a combinatorial problem in geometry. Put concretely, the problem is this. Let the \(n\) walls of a single-floored gallery make a polygon. What is the smallest number of stationary guards needed to ensure that every point of the gallery wall can be seen by a guard? If the polygon is convex (all interior angles < 180°), one guard will suffice, as guards may rotate. But if the polygon is not convex, as in Figure 18, one guard may not be enough.

[Figure 18: an irregular 9-sided polygon.]

Chvátal’s theorem gives the answer: for a gallery with \(n\) walls, \(\llcorner n/3\lrcorner\) guards suffice, where \(\llcorner n/3\lrcorner\) is the greatest integer \(\le n/3\). (If this does not sound to you sufficiently like a mathematical theorem, it can be restated as follows: Let \(S\) be a subset of the Euclidean plane. For a subset \(B\) of \(S\) let us say that \(B\) supervises \(S\) iff for each \(x \in S\) there is a \(y \in B\) such that the segment \(xy\) lies within \(S\). Then the smallest number \(f(n)\) such that every set bounded by a simple \(n\)-gon is supervised by a set of \(f(n)\) points is at most \(\llcorner n/3\lrcorner\).)

Here is Steve Fisk’s proof. A short induction shows that every polygon can be triangulated, i.e., non-crossing edges between non-adjacent vertices (“diagonals”) can be added so that the polygon is entirely composed of non-overlapping triangles. So take any \(n\)-sided polygon with a fixed triangulation. Think of it as a graph, a set of vertices and connected edges, as in Figure 19.

[Figure 19: 10 irregularly placed black dots connected by solid black lines to form an irregular 10-sided polygon. One dot has dashed lines going to four non-adjacent dots, and one of its adjacent dots has dashed lines going to three other non-adjacent dots (including one dot that was an endpoint of the first dot’s dashed lines); the dashed lines do not intersect.]

The first part of the proof shows that the graph is 3-colourable, i.e., every vertex can be coloured with one of just three colours (red, white and blue, say) so that no edge connects vertices of the same colour.

The argument proceeds by induction on \(n \ge 3\), the number of vertices.

For \(n = 3\) it is trivial. Assume it holds for all \(k\), where \(3 \le k < n\).

Let triangulated polygon \(G\) have \(n\) vertices. Let \(u\) and \(v\) be any two vertices connected by diagonal edge \(uv\). The diagonal \(uv\) splits \(G\) into two smaller graphs, both containing \(uv\). Give \(u\) and \(v\) different colours, say red and white, as in Figure 20.

[Figure 20: the same figure as before, with one black dot now shown as two red dots side by side and another as two white dots side by side, splitting the previously joined figure into two smaller graphs.]

By the inductive assumption, we may colour each of the smaller graphs with the three colours so that no edge joins vertices of the same colour, keeping fixed the colours of \(u\) and \(v\). Pasting together the two smaller graphs as coloured gives us a 3-colouring of the whole graph.

What remains is to show that \(\llcorner n/3\lrcorner\) or fewer guards can be placed on vertices so that every triangle is in the view of a guard. Let \(b\), \(r\) and \(w\) be the number of vertices coloured blue, red and white respectively. Let \(b\) be minimal in \(\{b, r, w\}\). Then \(b \le r\) and \(b \le w\). Then \(2b \le r + w\). So \(3b \le b + r + w = n\). So \(b \le n/3\) and so \(b \le \llcorner n/3\lrcorner\). Place a guard on each blue vertex. Done.

The central idea of this proof, or the proof strategy, is clear. While the actual diagrams produced here are superfluous to the proof, some visualizing enables us to grasp the central idea.
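The two stages of Fisk’s argument — 3-colour the triangulated polygon by walking the tree-shaped dual of its triangulation, then post guards on the least-used colour — translate directly into a short algorithm. The sketch below is an illustrative aside, not part of the original entry; the function name and the sample fan triangulation of a hexagon are hypothetical, and the triangulation is assumed given as vertex triples.

```python
from collections import deque

def three_colour(triangles):
    """3-colour the vertices of a triangulated polygon so that no
    triangle edge joins two vertices of the same colour."""
    # Index triangles by their edges: two triangles sharing an edge are
    # neighbours in the dual of the triangulation, which is a tree.
    edge_to_tris = {}
    for t in triangles:
        for e in ((t[0], t[1]), (t[1], t[2]), (t[0], t[2])):
            edge_to_tris.setdefault(frozenset(e), []).append(t)

    colour = {}
    # Colour the first triangle arbitrarily, then walk the dual tree:
    # each new triangle shares an edge (two already-coloured vertices)
    # with a processed one, so its third vertex takes the leftover colour.
    for i, v in enumerate(triangles[0]):
        colour[v] = i
    done = {frozenset(triangles[0])}
    queue = deque([triangles[0]])
    while queue:
        t = queue.popleft()
        for e in ((t[0], t[1]), (t[1], t[2]), (t[0], t[2])):
            for nb in edge_to_tris[frozenset(e)]:
                if frozenset(nb) not in done:
                    third = next(v for v in nb if v not in colour)
                    colour[third] = ({0, 1, 2} - {colour[u] for u in e}).pop()
                    done.add(frozenset(nb))
                    queue.append(nb)
    return colour

# A hexagon with vertices 0..5, fan-triangulated from vertex 0.
tris = [(0, 1, 2), (0, 2, 3), (0, 3, 4), (0, 4, 5)]
colour = three_colour(tris)
# Chvátal's bound: the least-used colour class has at most n // 3 members,
# and placing a guard on each of its vertices covers every triangle.
counts = [list(colour.values()).count(k) for k in range(3)]
guards = min(counts)
```

The counting step at the end mirrors the closing lines of the proof: with \(b \le r\) and \(b \le w\), the minimal class has at most \(n/3\) vertices.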

Thinking which involves the use of seen or visualized images, which may be static or moving, is widespread in mathematical practice. Such visual thinking may constitute a non-superfluous and non-replaceable part of thinking through a specific proof. But there is a real danger of over-generalisation when using images, which we need to guard against, and in some contexts, such as real and complex analysis, the apparent soundness of a diagrammatic inference is liable to be illusory.

Even when visual thinking does not contribute to proving a mathematical truth, it may enable one to discover a truth, where to discover a truth is to come to believe it in an independent, reliable and rational way. Visual thinking can also play a large role in discovering a central idea for a proof or a proof-strategy; and in discovering a kind of mathematical entity or a mathematical property.

The (non-superfluous) use of visual thinking in coming to know a mathematical truth does in some cases introduce an a posteriori element into the way one comes to know it, resulting in a posteriori mathematical knowledge. This is not as revolutionary as it may sound, since a truth knowable a posteriori may also be knowable a priori. More interesting is the possibility that one can acquire some mathematical knowledge in a way in which visual thinking is essential but does not contribute evidence; in this case the role of the visual thinking may be to activate one’s prior cognitive resources. This opens the possibility that non-superfluous visual thinking may result in a priori knowledge of a mathematical truth.

Visual thinking may contribute to understanding in more than one way. Visual illustrations may be extremely useful in providing examples and non-examples of analytic concepts, thus helping to sharpen our grasp of those concepts. Also, visual thinking accompanying a proof may deepen our understanding of the proof, giving us an awareness of the direction of the proof so that, as Hermann Weyl put it, we are not forced to traverse the steps blindly, link by link, feeling our way by touch.

  • Adams, C., 2001, The Knot Book , Providence, Rhode Island: American Mathematical Society.
  • Azzouni, J., 2013, “That we see that some diagrammatic proofs are perfectly rigorous”, Philosophia Mathematica , 21: 323–338.
  • Brown, J., 1999, Philosophy of Mathematics: an introduction to the world of proofs and pictures , London: Routledge.
  • Barwise, J. and J. Etchemendy, 1996, “Visual information and valid reasoning”, in Logical Reasoning with Diagrams , G. Allwein and J. Barwise (eds) Oxford: Oxford University Press.
  • Bolzano, B., 1817, “Purely analytic proof of the theorem that between any two values which give results of opposite sign there lies at least one real root of the equation”, in Ewald 1996: vol. 1, 225–248.
  • Brooks, N., D. Barner, M. Frank, and S. Goldin-Meadow, 2018, “The role of gesture in supporting mental representations: The case of mental abacus arithmetic”, Cognitive Science , 45(2): 554–575.
  • Carter, J., 2010, “Diagrams and Proofs in Analysis”, International Studies in the Philosophy of Science , 24: 1–14.
  • Cauchy, A., 1813, “Recherche sur les polyèdres—premier mémoire”, Journal de l’Ecole Polytechnique , 9: 66–86.
  • Chen, F., Z. Hu, X. Zhao, R. Wang, and X. Tang, 2006, “Neural correlates of serial abacus mental calculation in children: a functional MRI study”, Neuroscience Letters , 403(1–2): 46–51.
  • Chvátal, V., 1975, “A Combinatorial Theorem in Plane Geometry”, Journal of Combinatorial Theory , series B, 18: 39–41.
  • Dedekind, R., 1872, “Continuity and the Irrational Numbers”, in Essays on the Theory of Numbers , W. Beman (trans.) New York: Dover Publications.
  • De Toffoli, S. and V. Giardino, 2014, “Forms and Roles of Diagrams in Knot Theory”, Erkenntnis , 79: 829–842.
  • Eddy, R., 1985, “Behold! The Arithmetic-Geometric Mean Inequality”, College Mathematics Journal , 16: 208. Reprinted in Nelsen 1993: 51.
  • Euclid, Elements , Published as Euclid’s Elements: all thirteen books complete in one volume , T. Heath (trans.), D. Densmore (ed.). Santa Fe: Green Lion Press 2002.
  • Ewald, W. (ed.), 1996, From Kant to Hilbert. A Source Book in the Foundations of Mathematics , Volumes 1 and 2. Oxford: Clarendon Press.
  • Fisk, S., 1978, “A Short Proof of Chvátal’s Watchman Theorem”, Journal of Combinatorial Theory , series B, 24: 374.
  • Fomenko, A., 1994, Visual Geometry and Topology , M. Tsaplina (trans.) New York: Springer.
  • Frank, M. and D. Barner, 2012, “Representing exact number visually using mental abacus”, Journal of Experimental Psychology: General , 141(1): 134–149.
  • Giaquinto, M., 1993b, “Visualizing in Arithmetic”, Philosophy and Phenomenological Research , 53: 385–396.
  • –––, 1994, “Epistemology of visual thinking in elementary real analysis”, British Journal for the Philosophy of Science , 45: 789–813.
  • –––, 2007, Visual Thinking in Mathematics , Oxford: Oxford University Press.
  • –––, 2011, “Crossing curves: a limit to the use of diagrams in proofs”, Philosophia Mathematica , 19: 281–307.
  • Gromov, M., 1993, “Asymptotic invariants of infinite groups”, in Geometric Group Theory , A. Niblo and M. Roller (eds.), LMS Lecture Note Series, Vol. 182, Cambridge: Cambridge University Press, (vol. 2).
  • Hahn, H., 1933, “The crisis in intuition”, Translated in Hans Hahn. Empiricism, Logic and Mathematics: Philosophical Papers , B. McGuiness (ed.) Dordrecht: D. Reidel 1980. First published in Krise und Neuaufbau in den exakten Wissenschaften , Fünf Wiener Vorträge, Leipzig and Vienna 1933.
  • Hatano, G., Y. Miyake, and M. Binks, 1977, “Performance of expert abacus operators”, Cognition , 5: 47–55.
  • Hilbert, D., 1894, “Die Grundlagen der Geometrie”, Ch. 2, in David Hilbert’s Lectures on the Foundations of Geometry (1891–1902) , M. Hallett and U. Majer (eds) Berlin: Springer 2004.
  • Hishitani, S., 1990, “Imagery experts: How do expert abacus operators process imagery?”, Applied Cognitive Psychology , 4(1): 33–46.
  • Hoffman, D., 1987, “The Computer-Aided Discovery of New Embedded Minimal Surfaces”, Mathematical Intelligencer , 9: 8–21.
  • Hu, Y., F. Geng, L. Tao, N. Hu, F. Du, K. Fu, and F. Chen, 2011, “Enhanced white matter tracts integrity in children with abacus training”, Human Brain Mapping , 32: 10–21.
  • Jamnik, M., 2001, Mathematical Reasoning with Diagrams: From Intuition to Automation , Stanford, California: CSLI Publications.
  • Joyal, A., R. Street, and D. Verity, 1996, “Traced monoidal categories”, Mathematical Proceedings of the Cambridge Philosophical Society , 119(3) 447–468.
  • Kant, I., 1781/9, Kritik der reinen Vernunft , P. Guyer and A. Wood (trans. & eds), Cambridge: Cambridge University Press, 1998.
  • Klein, F., 1893, “Sixth Evanston Colloquium lecture”, in The Evanston Colloquium Lectures on Mathematics , New York: Macmillan 1911. Partially reprinted in Ewald 1996: vol. 2: 958-65.
  • Kojima, T., 1954, The Japanese Abacus: Its Use and Theory , Tokyo: Tuttle.
  • Landau, E., 1934, Differential and Integral Calculus , Hausner and Davis (trans.), New York: Chelsea 1950.
  • Leinster, T., 2004, “Operads in Higher-dimensional Category theory”, Theory and Applications of Categories , 12(3): 73–194.
  • Littlewood, J., 1953, “Postscript on Pictures”, in Littlewood’s Miscellany , Cambridge: Cambridge University Press 1986.
  • Lyusternik, L., 1963, Convex Figures and Polyhedra , T. Smith (trans.), New York: Dover Publications.
  • Mancosu, P., 2005, “Visualization in Logic and Mathematics”, in P. Mancosu, K. Jørgensen and S. Pedersen (eds), Visualization, Explanation and Reasoning Styles in Mathematics , Dordrecht: Springer.
  • –––, 2011, “Explanation in Mathematics”, The Stanford Encyclopedia of Philosophy (Summer 2011 Edition), Edward N. Zalta (ed.), URL = < https://plato.stanford.edu/archives/sum2011/entries/mathematics-explanation/ >.
  • Maxwell, E., 1959, Fallacies in Mathematics , Cambridge University Press.
  • Montuchi, P. and W. Page, 1988, “Behold! Two extremum problems (and the arithmetic-geometric mean inequality)”, College Mathematics Journal , 19: 347. Reprinted in Nelsen 1993: 52.
  • Miller, N., 2001, A Diagrammatic Formal System for Euclidean Geometry , Ph. D. Thesis, Cornell University.
  • Mumma, J. and M. Panza, 2012, “Diagrams in Mathematics: History and Philosophy”, Synthese , 186: Issue 1.
  • Needham, T., 1997, Visual Complex Analysis , Oxford: Clarendon Press.
  • Nelsen, R., 1993, Proofs Without Words: Exercises in Visual Thinking , Washington DC: The Mathematical Association of America.
  • Palais, R., 1999, “The visualization of mathematics: towards a mathematical exploratorium”, Notices of the American Mathematical Society , 46: 647–658.
  • Pasch, M., 1882, Vorlesungen über neuere Geometrie , Berlin: Springer 1926, 1976 (with introduction by Max Dehn).
  • Rouse Ball, W., 1939, Mathematical Recreations and Essays , revised by H. Coxeter, 11th edition (first published in 1892), New York: Macmillan.
  • Russell, B., 1901, “Recent Work on the Principles of Mathematics”, International Monthly , 4: 83–101. Reprinted as “Mathematics and the Metaphysicians” in Mysticism and Logic , London: George Allen and Unwin 1918.
  • Shin, Sun-Joo, Oliver Lemon, and John Mumma, 2013, “Diagrams”, The Stanford Encyclopedia of Philosophy (Fall 2013 Edition), Edward N. Zalta (ed.), URL = < https://plato.stanford.edu/archives/fall2013/entries/diagrams/ >.
  • Starikova, I., 2012, “From Practice to New Concepts: Geometric Properties of Groups”, Philosophia Scientiae , 16(1): 129–151.
  • Stigler, J., 1984, “‘Mental Abacus’: The effect of abacus training on Chinese children’s mental calculation”, Cognitive Psychology , 16: 145–176.
  • Tappenden, J., 2005, “Proof style and understanding in mathematics I: visualization, unification and axiom choice”, in Mancosu, P., Jørgensen, K. and Pedersen, S. (eds) Visualization, Explanation and Reasoning Styles in Mathematics , Dordrecht: Springer.
  • Tennant, N., 1986, “The Withering Away of Formal Semantics?”, Mind and Language , 1(4): 302–318.
  • Van den Dries, L., 1998, Tame Topology and O-minimal Structures , LMS Lecture Note Series 248, Cambridge University Press.
  • Weyl, H., 1995 [1932], “Topology and abstract algebra as two roads of mathematical comprehension”, American Mathematical Monthly , 435–460 and 646–651. Translated by A. Shenitzer from an article of 1932, Gesammelte Abhandlungen , 3: 348–358.
  • Zimmermann W. and S. Cunningham (eds), 1991, Visualization in Teaching and Learning Mathematics , Washington, DC: Mathematical Association of America.
  • Bernazzani, D., 2005, Soroban Abacus Handbook .
  • Dalvit, E., 2012, A Journey through the Mathematical Theory of Braids .
  • Fernandes, L., 2015, The Abacus: A Brief History .
  • Lauda, A., 2005, Frobenius algebras and planar open string topological field theories , at arXiv.org.

a priori justification and knowledge | Bolzano, Bernard | Dedekind, Richard: contributions to the foundations of mathematics | diagrams | mathematical: explanation | proof theory | quantifiers and quantification | Weyl, Hermann

Copyright © 2020 by Marcus Giaquinto <m.giaquinto@ucl.ac.uk>


The Stanford Encyclopedia of Philosophy is copyright © 2023 by The Metaphysics Research Lab , Department of Philosophy, Stanford University

Library of Congress Catalog Data: ISSN 1095-5054



Visualisation: visual representations of data and information

Course description


After studying this course, you should be able to:

  • understand what is meant by the term 'visualisation' within the context of data and information
  • interpret and create a range of visual representations of data and information
  • recognise a range of visualisation models such as cartograms, choropleth maps and hyperbolic trees
  • select an appropriate visualisation model to represent a given data set
  • recognise when visualisations are presenting information in a misleading way.

First Published: 09/08/2012

Updated: 11/06/2019



The Power of Visualization in Math

Creating visual representations for math students can open up understanding. We have resources you can use in class tomorrow.


When do you know it’s time to try something different in your math lesson?

For me, I knew the moment I read this word problem to my fifth-grade summer school students: “On average, the sun’s energy density reaching Earth’s upper atmosphere is 1,350 watts per square meter. Assume the incident, monochromatic light has a wavelength of 800 nanometers (each photon has an energy of 2.48 × 10⁻¹⁹ joules at this wavelength). How many photons are incident on the Earth’s upper atmosphere in one second?”

Cartoon image of a photon drawn by the author

My students couldn’t get past the language, the sizes of the different numbers, or the science concepts addressed in the question. In short, I had effectively shut them down, and I needed a new approach to bring them back to their learning. So I started drawing on the whiteboard and created something with a little whimsy, a cartoon photon asking how much energy a photon has.

Immediately, students started yelling out, “2.48 × 10⁻¹⁹ joules,” and they could even cite the text where they had learned the information. I knew I was on to something, so the next thing I drew was a series of boxes with our friend the photon.

If all of the photons in the image below were to hit in one second, how much energy is represented in the drawing?

Cartoon image of a series of photons hitting Earth’s atmosphere drawn by the author

Students realized that we were just adding up all the individual energy from each photon and then quickly realized that this was multiplication. And then they knew that the question we were trying to answer was just figuring out the number of photons, and since we knew the total energy in one second, we could compute the number of photons by division.
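The arithmetic the class converged on is a single division, using the figures from the word problem above (a sketch of the computation, not part of the original post):

```python
# Number of photons per second on one square metre of upper atmosphere:
# total energy arriving in one second, divided by the energy per photon.
energy_per_second = 1350.0     # watts per square metre = joules per second
energy_per_photon = 2.48e-19   # joules per photon at 800 nm
photons = energy_per_second / energy_per_photon   # about 5.44 * 10**21
```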

The point being, we reached a place where my students were able to process the learning. The power of the visual representation made all the difference for these students, and being able to sequence through the problem using the visual supports completely changed the interactions they were having with the problem.

If you’re like me, you’re thinking, “So the visual representations worked with this problem, but what about other types of problems? Surely there isn’t a visual model for every problem!”

The power of this moment, the change in the learning environment, and the excitement of my fifth graders as they could not only understand but explain to others what the problem was about convinced me it was worth the effort to pursue visualization and try to answer these questions: Is there a process to unlock visualizations in math? And are there resources already available to help make mathematics visual?

Chart of math resources provided by the author

I realized that the first step in unlocking visualization as a scaffold for students was to change the kind of question I was asking myself. A powerful question to start with is: “How might I represent this learning target in a visual way?” This reframing opens a world of possible representations that we might not otherwise have considered. Thinking about many possible visual representations is the first step in creating a good one for students.

The Progressions published in tandem with the Common Core State Standards for mathematics are one resource for finding specific visual models based on grade level and standard. In my fifth-grade example, what I constructed was a sequenced process to develop a tape diagram—a type of visual model that uses rectangles to represent the parts of a ratio. I didn’t realize it, but to unlock my thinking I had to commit to finding a way to represent the problem in a visual way. Asking yourself a very simple series of questions leads you down a variety of learning paths, and primes you for the next step in the sequence—finding the right resources to complete your visualization journey.

Posing the question of visualization readies your brain to identify the right tool for the desired learning target and your students. That is, you’ll more readily know when you’ve identified the right tool for the job for your students. There are many, many resources available to help make this process even easier, and I’ve created a matrix of clickable tools, articles, and resources.

The process to visualize your math instruction is summarized at the top of my Visualizing Math graphic; below that is a mix of visualization strategies and resources you can use tomorrow in your classroom.

Our job as educators is to set a stage that maximizes the amount of learning done by our students, and teaching students mathematics in this visual way provides a powerful pathway for us to do our job well. The process of visualizing mathematics will test your abilities at first, but you’ll find that it helps both you and your students learn.

Effective Use of Visual Representation in Research and Teaching within Higher Education

September 2020, 7(3): 196–214

Charles Buckley (University of Liverpool) and Chrissi Nerantzi (University of Leeds)

[Figure: phenomenographic outcome space]


Information Visualization

What is information visualization?

Information visualization is the process of representing data in a visual and meaningful way so that a user can better understand it. Dashboards and scatter plots are common examples. By depicting an overview and showing relevant connections, information visualization allows users to draw insights from abstract data efficiently and effectively.

Information visualization plays an important role in making data digestible and turning raw information into actionable insights. It draws from the fields of human-computer interaction, visual design, computer science, and cognitive science, among others. Examples include world map-style representations, line graphs, and 3-D virtual building or town plan designs.

The process of creating information visualization typically starts with understanding the information needs of the target user group. Qualitative research (e.g., user interviews) can reveal how, when, and where the visualization will be used. Taking these insights, a designer can determine which form of data organization is needed for achieving the users’ goals. Once information is organized in a way that helps users understand it better—and helps them apply it so as to reach their goals—visualization techniques are the next tools a designer brings out to use. Visual elements (e.g., maps and graphs) are created, along with appropriate labels, and visual parameters such as color, contrast, distance, and size are used to create an appropriate visual hierarchy and a visual path through the information.

Information visualization is becoming increasingly interactive, especially when used in a website or application. Being interactive allows for manipulation of the visualization by users, making it highly effective in catering to their needs. With interactive information visualization, users are able to view topics from different perspectives, and manipulate their visualizations of these until they reach the desired insights. This is especially useful if users require an explorative experience.

Questions related to Information Visualization

There are many types of information visualization, and different types cater to different needs. The most common forms include charts, graphs, diagrams, and maps. Charts, like bar graphs, succinctly display data trends. Diagrams, such as flowcharts, convey processes. Maps visually represent spatial information, enhancing geographical insights.

Each type serves a unique purpose, offering a comprehensive toolkit for effective information representation.

Information visualization and data visualization share a connection but diverge in scope. Data visualization centers on graphically representing raw data using charts or graphs. Information visualization extends beyond raw data, embracing a comprehensive array of contextual details and intricate datasets. It strives for a complete presentation, often employing interactivity to convey insights. 

Data visualization concentrates on visually representing data points. Conversely, information visualization adopts a holistic approach. It considers the context for deeper comprehension and decision-making. 


Information visualization and infographics play different roles. Human memory is visual: we often remember images and patterns better than raw data, and information visualization capitalizes on this by simplifying complex data through graphics for better understanding.

This article gives valuable insights into the properties of human memory and their significance for information visualization.

Infographics, by contrast, portray information in engaging formats, often for storytelling or marketing. Both use visuals, but information visualization prioritizes clarity and turning data into usable insights, while infographics emphasize engagement and narrative.

No, Information Design and data visualization are distinct in their objectives and applications. Information Design is a broader concept: it organizes and presents information to improve communication in the bigger picture, considering text, images, and layout together to convey information effectively.

On the other hand, data visualization translates raw data into graphical representations. It extracts meaningful insights and patterns. The approach focuses on visual elements to simplify the analysis of complex datasets.

Information visualization is a process that transforms complex data into easy-to-understand visuals. The seven stages include: 

Data collection: Gathering relevant data from diverse sources to form the basis for visualization.

Data analysis: Examining and processing the collected data to identify patterns, trends, and insights.

Data pre-processing: Cleaning and organizing the data to make it suitable for visualization.

Visual representation: Choosing appropriate visualization techniques to represent data accurately and effectively.

Interaction design: Developing user-friendly interfaces that allow meaningful interaction with the visualized data.

Interpretation: Enabling users to interpret and derive insights from the visualized information.

Evaluation: Assessing the effectiveness of the visualization in conveying information and meeting objectives.
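The first four stages above can be sketched as plain functions strung together; every function name and the sample data below are illustrative assumptions, not a standard API.

```python
# A minimal sketch of the visualization pipeline's first four stages.
# The later stages (interaction design, interpretation, evaluation)
# involve a human in the loop, so they are not modeled here.

def collect():                      # 1. Data collection
    return [3, 1, 4, 1, 5, 9, 2, 6]

def analyze(data):                  # 2. Data analysis: basic statistics
    return {"min": min(data), "max": max(data), "mean": sum(data) / len(data)}

def preprocess(data):               # 3. Data pre-processing: sort and dedupe
    return sorted(set(data))

def represent(data):                # 4. Visual representation: text bar chart
    return ["#" * value for value in data]

raw = collect()
summary = analyze(raw)
clean = preprocess(raw)
bars = represent(clean)
print(summary["mean"])
print(bars[-1])
```

Even this toy pipeline shows why the stages are ordered: the representation is only as good as the analysis and cleaning that precede it.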

This article provides a comprehensive overview of the data analysis process and explores key techniques for analysis. 

Information visualization helps people understand data and make decisions. It turns complicated data into easy-to-understand visuals. This makes it easier to see patterns and get a good overall picture. It also helps people communicate by showing information in a visually exciting way. Visualizations empower individuals to interact with data, enhancing engagement and enabling deeper exploration. Additionally, visual representations facilitate easier retention and recall of information.

Data visualization has advantages and disadvantages. One big challenge is misinterpretation. The visualization of data can be misleading if presented inappropriately. It can also lead to false conclusions, especially for those who do not understand the information.

Another major problem is too much information, as this article explains: Information Overload, Why it Matters, and How to Combat It . A crowded or complex visualization can overwhelm users and make communicating difficult.

Also, making good visualizations takes time and skill. This can sometimes be challenging for newbies.

Data visualization is a powerful tool. Creating valuable and impactful visualizations requires a combination of skills: you must understand the data, choose suitable visualization methods, and tell a compelling story. All this requires a good understanding of both data and design.

Interpreting complex data and choosing compelling visualizations can be challenging for beginners. However, leveraging available resources and enhancing skills can simplify data visualization despite the occasional difficulty.

Check out this course to learn more about Information Visualization. The course also explains the connection between the eye and the brain in creating images. It looks at the history of information visualization, how it has evolved, and common mistakes to avoid in visual perception.

It will teach you how to design compelling information visualizations and use various techniques for your projects.

Literature on Information Visualization

Here’s the entire UX literature on Information Visualization by the Interaction Design Foundation, collated in one place:

Learn more about Information Visualization

Take a deep dive into Information Visualization with our course Information Visualization .

Information visualization skills are in high demand, partly thanks to the rise in big data. Tech research giant Gartner Inc. observed that digital transformation has put data at the center of every organization. With the ever-increasing amount of information being gathered and analyzed, there’s an increasing need to present data in meaningful and understandable ways.

In fact, even if you are not involved in big data, information visualization will be able to help in your work processes as a designer. This is because many design processes—including conducting user interviews and analyzing user flows and sales funnels—involve the collation and presentation of information. Information visualization turns raw data into meaningful patterns, which will help you find actionable insights. From designing meaningful interfaces, to processing your own UX research, information visualization is an indispensable tool in your UX design kit.

This course is presented by Alan Dix, a former professor at Lancaster University in the UK. A world-renowned authority in the field of human-computer interaction, Alan is the author of the university-level textbook Human-Computer Interaction . “Information Visualization” is full of simple but practical lessons to guide your development in information visualization. We start with the basics of what information visualization is, including its history and necessity, and then walk you through the initial steps in creating your own information visualizations. While there’s plenty of theory here, we’ve got plenty of practice for you, too.

All open-source articles on Information Visualization

  • Information Overload, Why It Matters and How to Combat It
  • Visual Representation
  • How to Design an Information Visualization
  • How to Visualize Your Qualitative User Research Results for Maximum Impact
  • Preattentive Visual Properties and How to Use Them in Information Visualization
  • How to Conduct Focus Groups
  • The Properties of Human Memory and Their Importance for Information Visualization
  • Information Visualization – A Brief Introduction
  • Visual Mapping – The Elements of Information Visualization
  • Guidelines for Good Visual Information Representations
  • How to Show Hierarchical Data with Information Visualization
  • Information Visualization – An Introduction to Multivariate Analysis
  • How to Display Complex Network Data with Information Visualization
  • Information Visualization – Who Needs It?
  • Vision and Visual Perception Challenges
  • Information Visualization – An Introduction to Transformable Information Representations
  • The Principles of Information Visualization for Basic Network Data
  • The Continuum of Understanding and Information Visualization
  • Information Visualization – A Brief Pre-20th Century History
  • Information Visualization – An Introduction to Manipulable Information Representations

Open Access—Link to us!

We believe in Open Access and the  democratization of knowledge . Unfortunately, world-class educational materials such as this page are normally hidden behind paywalls or in expensive textbooks.

If you want this to change , cite this page , link to us, or join us to help us democratize design knowledge !



Painting Pictures with Data: The Power of Visual Representations


Picture this. A chaotic world of abstract concepts and complex data, like a thousand-piece jigsaw puzzle. Each piece, a different variable, a unique detail.

Alone, they’re baffling, nearly indecipherable.

But together? They’re a masterpiece of visual information, a detailed illustration.

American data pioneer Edward Tufte, founder of the publisher Graphics Press, believed that the art of seeing is not limited to the physical objects around us. He stated, “The commonality between science and art is in trying to see profoundly – to develop strategies of seeing and showing.”

It’s in this context that we delve into the world of data visualization. This is a process where you create visual representations that foster understanding and enhance decision making.

It’s the transformation of data into visual formats. The information could be anything from theoretical frameworks and research findings to word problems. Or anything in-between. And it has the power to change the way you learn, work, and more.

And with the help of modern technology, you can take advantage of data visualization easier than ever today.

What are Visual Representations?

Think of visuals, a smorgasbord of graphical representation, images, pictures, and drawings. Now blend these with ideas, abstract concepts, and data.

You get visual representations . A powerful, potent blend of communication and learning.

As a more formal definition, visual representation is the use of images to represent different types of data and ideas.

They’re more than simply a picture. Visual representations organize information visually , creating a deeper understanding and fostering conceptual understanding. These can be concrete objects or abstract symbols or forms, each telling a unique story. And they can be used to improve understanding everywhere, from a job site to an online article. University professors can even use them to improve their teaching.

But this only scratches the surface of what can be created via visual representation.

Types of Visual Representation for Improving Conceptual Understanding

Graphs, spider diagrams, cluster diagrams – the list is endless!

Each type of visual representation has its specific uses. A mind map template can help you create a detailed illustration of your thought process. It illustrates your ideas or data in an engaging way and reveals how they connect.

Here are a handful of different types of data visualization tools that you can begin using right now.

1. Spider Diagrams


Spider diagrams , or mind maps, are the master web-weavers of visual representation.

They originate from a central concept and extend outwards like a spider’s web. Different ideas or concepts branch out from the center area, providing a holistic view of the topic.

This form of representation is brilliant for showcasing relationships between concepts, fostering a deeper understanding of the subject at hand.

2. Cluster Diagrams


As champions of grouping and classifying information, cluster diagrams are your go-to tools for usability testing or decision making. They help you group similar ideas together, making it easier to digest and understand information.

They’re great for exploring product features, brainstorming solutions, or sorting out ideas.

3. Pie Charts


Pie charts are the quintessential representatives of quantitative information.

They are a type of visual diagram that transforms complex data and word problems into simple symbols. Each slice of the pie tells a story: a visual display of the part-to-whole relationship.

Whether you’re presenting survey results, market share data, or budget allocation, a pie chart offers a straightforward, easily digestible visual representation.
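The part-to-whole arithmetic behind each slice is easy to sketch: a value's share of the total maps to the same share of 360 degrees. The survey numbers below are invented for illustration.

```python
# Convert part-to-whole values into pie-slice angles (degrees).

def pie_slices(values):
    total = sum(values)
    return [round(v / total * 360, 1) for v in values]

# Hypothetical survey results: 50 yes, 30 no, 20 undecided.
angles = pie_slices([50, 30, 20])
print(angles)  # [180.0, 108.0, 72.0]
```

The angles always sum to 360, which is exactly why pie charts work only for part-to-whole data and not for arbitrary comparisons.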

4. Bar Charts


If you’re dealing with comparative data or need a visual for data analysis, bar charts or graphs come to the rescue.

Bar graphs represent different variables or categories against a quantity, making them perfect for representing quantitative information. The vertical or horizontal bars bring the data to life, translating numbers into visual elements that provide context and insights at a glance.

Visual Representations Benefits

1. Deeper Understanding via Visual Perception

Visual representations aren’t just a feast for the eyes; they’re food for thought. They offer a quick way to dig down into more detail when examining an issue.

They mold abstract concepts into concrete objects, breathing life into the raw, quantitative information. As you glimpse into the world of data through these visualization techniques , your perception deepens.

You no longer just see the data; you comprehend it, you understand its story. Complex data sheds its mystifying cloak, revealing itself in a visual format that your mind grasps instantly. It’s like going from a two-dimensional to a three-dimensional picture of the world.

2. Enhanced Decision Making

Navigating through different variables and relationships can feel like walking through a labyrinth. But visualize these with a spider diagram or cluster diagram, and the path becomes clear. Visual representation is one of the most efficient decision making techniques .

Visual representations illuminate the links and connections, presenting a fuller picture. It’s like having a compass in your decision-making journey, guiding you toward the correct answer.

3. Professional Development

Whether you’re presenting research findings, sharing theoretical frameworks, or revealing historical examples, visual representations are your ace. They equip you with a new language, empowering you to convey your message compellingly.

From the conference room to the university lecture hall, they enhance your communication and teaching skills, propelling your professional development. Try to create a research mind map and compare it to a plain text document full of research documentation and see the difference.

4. Bridging the Gap in Data Analysis

What is data visualization if not the mediator between data analysis and understanding? It’s more than a process; it’s a bridge.

It takes you from the shores of raw, complex data to the lands of comprehension and insights. With visualization techniques, such as the use of simple symbols or detailed illustrations, you can navigate through this bridge effortlessly.

5. Enriching Learning Environments

Imagine a teaching setting where concepts are not just told but shown. Where students don’t just listen to word problems but see them represented in charts and graphs. This is what visual representations bring to learning environments.

They transform traditional methods into interactive learning experiences, enabling students to grasp complex ideas and understand relationships more clearly. The result? An enriched learning experience that fosters conceptual understanding.

6. Making Abstract Concepts Understandable

In a world brimming with abstract concepts, visual representations are our saving grace. They serve as translators, decoding these concepts into a language we can understand.

Let’s say you’re trying to grasp a theoretical framework. Reading about it might leave you puzzled. But see it laid out in a spider diagram or a concept map, and the fog lifts. With its different variables clearly represented, the concept becomes tangible.

Visual representations simplify the complex, convert the abstract into concrete, making the inscrutable suddenly crystal clear. It’s the power of transforming word problems into visual displays, a method that doesn’t just provide the correct answer. It also offers a deeper understanding.

How to Make a Cluster Diagram?

Ready to get creative? Let’s make a cluster diagram.

First, choose your central idea or problem. This goes in the center area of your diagram. Next, think about related topics or subtopics. Draw lines from the central idea to these topics. Each line represents a relationship.
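If you were sketching such a diagram programmatically, one minimal way to hold it in code is a central idea mapped to its branches, where each connection is one line to draw. The topics below are purely illustrative.

```python
# A cluster diagram as data: a central idea, branch topics, and the
# lines (relationships) that connect them.

cluster = {
    "central": "Visual Representation",
    "branches": {
        "Charts": ["Pie chart", "Bar chart"],
        "Diagrams": ["Spider diagram", "Cluster diagram"],
    },
}

# Each (central -> branch) and (branch -> leaf) pair is one line to draw.
lines = []
for branch, leaves in cluster["branches"].items():
    lines.append((cluster["central"], branch))
    lines.extend((branch, leaf) for leaf in leaves)

print(len(lines))  # 6
```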

how to create a visual representation

While you can create a picture like this by drawing, there’s a better way.

Mindomo is a mind mapping tool that will enable you to create visuals that represent data quickly and easily. It provides a wide range of templates to kick-start your diagramming process. And since it’s an online site, you can access it from anywhere.

With a mind map template, creating a cluster diagram becomes an effortless process. This is especially the case since you can edit its style, colors, and more to your heart’s content. And when you’re done, sharing is as simple as clicking a button.

A Few Final Words About Information Visualization

To wrap it up, visual representations are not just about presenting data or information. They are about creating a shared understanding, facilitating learning, and promoting effective communication. Whether it’s about defining a complex process or representing an abstract concept, visual representations have it all covered. And with tools like Mindomo , creating these visuals is as easy as pie.

In the end, visual representation isn’t just about viewing data, it’s about seeing, understanding, and interacting with it. It’s about immersing yourself in the world of abstract concepts, transforming them into tangible visual elements. It’s about seeing relationships between ideas in full color. It’s a whole new language that opens doors to a world of possibilities.

The correct answer to ‘what is data visualization?’ is simple. It’s the future of learning, teaching, and decision-making.

Keep it smart, simple, and creative! The Mindomo Team



17 Data Visualization Techniques All Professionals Should Know


  • 17 Sep 2019

There’s a growing demand for business analytics and data expertise in the workforce. But you don’t need to be a professional analyst to benefit from data-related skills.

Becoming skilled at common data visualization techniques can help you reap the rewards of data-driven decision-making , including increased confidence and potential cost savings. Learning how to effectively visualize data could be the first step toward using data analytics and data science to your advantage to add value to your organization.

Several data visualization techniques can help you become more effective in your role. Here are 17 essential data visualization techniques all professionals should know, as well as tips to help you effectively present your data.


What Is Data Visualization?

Data visualization is the process of creating graphical representations of information. This process helps the presenter communicate data in a way that’s easy for the viewer to interpret and draw conclusions.

There are many different techniques and tools you can leverage to visualize data, so you want to know which ones to use and when. Here are some of the most important data visualization techniques all professionals should know.

Data Visualization Techniques

The type of data visualization technique you leverage will vary based on the type of data you’re working with, in addition to the story you’re telling with your data .

Here are some important data visualization techniques to know:

  • Pie Chart
  • Bar Chart
  • Histogram
  • Gantt Chart
  • Heat Map
  • Box and Whisker Plot
  • Waterfall Chart
  • Area Chart
  • Scatter Plot
  • Pictogram Chart
  • Highlight Table
  • Bullet Graph
  • Choropleth Map
  • Network Diagram
  • Correlation Matrices

1. Pie Chart

Pie Chart Example

Pie charts are one of the most common and basic data visualization techniques, used across a wide range of applications. Pie charts are ideal for illustrating proportions, or part-to-whole comparisons.

Because pie charts are relatively simple and easy to read, they’re best suited for audiences who might be unfamiliar with the information or are only interested in the key takeaways. For viewers who require a more thorough explanation of the data, pie charts fall short in their ability to display complex information.

2. Bar Chart

Bar Chart Example

The classic bar chart , or bar graph, is another common and easy-to-use method of data visualization. In this type of visualization, one axis of the chart shows the categories being compared, and the other, a measured value. The length of the bar indicates how each group measures according to the value.

One drawback is that labeling and clarity can become problematic when there are too many categories included. Like pie charts, they can also be too simple for more complex data sets.

3. Histogram

Histogram Example

Unlike bar charts, histograms illustrate the distribution of data over a continuous interval or defined period. These visualizations are helpful in identifying where values are concentrated, as well as where there are gaps or unusual values.

Histograms are especially useful for showing the frequency of a particular occurrence. For instance, if you’d like to show how many clicks your website received each day over the last week, you can use a histogram. From this visualization, you can quickly determine which days your website saw the greatest and fewest number of clicks.
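A minimal sketch of the binning that turns those daily click counts into a histogram; the counts below are invented for illustration.

```python
# Bin each day's clicks into 100-click-wide intervals and count how
# many days fall in each bin.
from collections import Counter

clicks_per_day = {"Mon": 120, "Tue": 95, "Wed": 180, "Thu": 95,
                  "Fri": 210, "Sat": 310, "Sun": 290}

bins = Counter((count // 100) * 100 for count in clicks_per_day.values())
print(sorted(bins.items()))  # [(0, 2), (100, 2), (200, 2), (300, 1)]
```

The bar over each interval is the bin's count; the continuous, adjacent intervals are what distinguish a histogram from an ordinary bar chart.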

4. Gantt Chart

Gantt Chart Example

Gantt charts are particularly common in project management, as they’re useful in illustrating a project timeline or progression of tasks. In this type of chart, tasks to be performed are listed on the vertical axis and time intervals on the horizontal axis. Horizontal bars in the body of the chart represent the duration of each activity.

Utilizing Gantt charts to display timelines can be incredibly helpful, and enable team members to keep track of every aspect of a project. Even if you’re not a project management professional, familiarizing yourself with Gantt charts can help you stay organized.
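The data behind a Gantt chart reduces to a start and duration per task: each horizontal bar runs from start to start + duration. The tasks below are hypothetical.

```python
# Compute the (name, start, end) span of each Gantt bar from
# (name, start_day, duration_days) task records.

tasks = [("Research", 0, 3), ("Design", 2, 4), ("Build", 5, 6)]

bars = [(name, start, start + duration) for name, start, duration in tasks]
print(bars)  # [('Research', 0, 3), ('Design', 2, 6), ('Build', 5, 11)]
```

Overlapping spans, like Research and Design here, are exactly what the chart makes visible at a glance.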

5. Heat Map

Heat Map Example

A heat map is a type of visualization that shows differences in data through variations in color, making it easy for viewers to spot trends at a glance. A clear legend is essential for readers to interpret a heat map correctly.

There are many possible applications of heat maps. For example, if you want to analyze which time of day a retail store makes the most sales, you can use a heat map that shows the day of the week on the vertical axis and time of day on the horizontal axis. Then, by shading in the matrix with colors that correspond to the number of sales at each time of day, you can identify trends in the data that allow you to determine the exact times your store experiences the most sales.
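The store-sales example above can be sketched with matplotlib's `imshow`, with a colorbar serving as the legend; the sales matrix is randomly generated for illustration:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Hypothetical sales counts: 7 days x 12 opening hours
sales = rng.integers(0, 50, size=(7, 12))

fig, ax = plt.subplots()
im = ax.imshow(sales, cmap="YlOrRd")  # darker cells = more sales
fig.colorbar(im, label="Sales")       # the legend that makes the colors readable
ax.set_yticks(range(7))
ax.set_yticklabels(["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"])
ax.set_xlabel("Hour of day (index)")
```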

6. Box and Whisker Plot

Box and Whisker Plot Example

A box and whisker plot, or box plot, provides a visual summary of data through its quartiles. First, a box is drawn from the first quartile to the third quartile of the data set. A line within the box marks the median. “Whiskers,” or lines, are then drawn extending from the box to the minimum (lower extreme) and maximum (upper extreme). Outliers are represented by individual points plotted in line with the whiskers.

This type of chart is helpful for quickly determining whether the data is symmetric or skewed, and it provides an easily interpreted visual summary of the data set.
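A minimal box-plot sketch in matplotlib, with a small invented data set containing one obvious outlier:

```python
import statistics
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

# Hypothetical response times (ms); 40 is an outlier
data = [12, 14, 15, 15, 16, 17, 18, 19, 21, 40]

fig, ax = plt.subplots()
result = ax.boxplot(data)  # draws the box, median line, whiskers, and outlier fliers
median = statistics.median(data)
ax.set_ylabel("Response time (ms)")
```

The returned dictionary exposes the drawn artists (`"medians"`, `"whiskers"`, `"fliers"`), which is handy for styling each part individually.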

7. Waterfall Chart

Waterfall Chart Example

A waterfall chart is a visual representation that illustrates how a value changes as it’s influenced by different factors, such as time. The main goal of this chart is to show the viewer how a value has grown or declined over a defined period. For example, waterfall charts are popular for showing spending or earnings over time.

8. Area Chart

Area Chart Example

An area chart, or area graph, is a variation on a basic line graph in which the area underneath the line is shaded to represent the total value of each data point. When several data series must be compared on the same graph, stacked area charts are used.

This method of data visualization is useful for showing changes in one or more quantities over time, as well as showing how each quantity combines to make up the whole. Stacked area charts are effective in showing part-to-whole comparisons.
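A stacked area chart showing this kind of part-to-whole composition can be sketched with matplotlib's `stackplot`; the two product series below are invented:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

months = list(range(1, 7))
# Hypothetical monthly sales for two products
product_a = [3, 4, 4, 5, 6, 7]
product_b = [1, 2, 3, 3, 4, 4]

fig, ax = plt.subplots()
ax.stackplot(months, product_a, product_b, labels=["Product A", "Product B"])
ax.legend(loc="upper left")
totals = [a + b for a, b in zip(product_a, product_b)]  # the top edge of the stack
```

The top edge of the stack traces the combined total, while each band's thickness shows its share of the whole.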

9. Scatter Plot

Scatter Plot Example

Another commonly used technique is the scatter plot. A scatter plot displays data for two variables as points plotted against the horizontal and vertical axes. This type of data visualization is useful for illustrating relationships between variables and can be used to identify trends or correlations in data.

Scatter plots are most effective for fairly large data sets, since it’s often easier to identify trends when there are more data points present. Additionally, the closer the data points are grouped together, the stronger the correlation or trend tends to be.
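A matplotlib sketch of a scatter plot on synthetic data with a built-in positive trend, along with the correlation coefficient that quantifies how tightly the points group:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Synthetic data: y increases with x, plus some noise
x = rng.uniform(0, 10, 200)
y = 2.0 * x + rng.normal(0, 2.0, 200)

fig, ax = plt.subplots()
ax.scatter(x, y, s=10)
ax.set_xlabel("x")
ax.set_ylabel("y")
r = np.corrcoef(x, y)[0, 1]  # closer to 1 means tighter grouping along the trend
```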

10. Pictogram Chart

Pictogram Example

Pictogram charts, or pictograph charts, are particularly useful for presenting simple data in a more visual and engaging way. These charts use icons to visualize data, with each icon representing a different value or category. For example, data about time might be represented by icons of clocks or watches. Each icon can correspond to either a single unit or a set number of units (for example, each icon represents 100 units).

In addition to making the data more engaging, pictogram charts are helpful in situations where language or cultural differences might be a barrier to the audience’s understanding of the data.

11. Timeline

Timeline Example

Timelines are the most effective way to visualize a sequence of events in chronological order. They’re typically linear, with key events outlined along the axis. Timelines are used to communicate time-related information and display historical data.

Timelines allow you to highlight the most important events that occurred, or need to occur in the future, and make it easy for the viewer to identify any patterns appearing within the selected time period. While timelines are often relatively simple linear visualizations, they can be made more visually appealing by adding images, colors, fonts, and decorative shapes.

12. Highlight Table

Highlight Table Example

A highlight table is a more engaging alternative to traditional tables. By highlighting cells in the table with color, you can make it easier for viewers to quickly spot trends and patterns in the data. These visualizations are useful for comparing categorical data.

Depending on the data visualization tool you’re using, you may be able to add conditional formatting rules to the table that automatically color cells that meet specified conditions. For instance, when using a highlight table to visualize a company’s sales data, you may color cells red if the sales data is below the goal, or green if sales were above the goal. Unlike a heat map, the colors in a highlight table are discrete and represent a single meaning or value.

13. Bullet Graph

Bullet Graph Example

A bullet graph is a variation of a bar graph that can act as an alternative to dashboard gauges to represent performance data. The main use for a bullet graph is to inform the viewer of how a business is performing in comparison to benchmarks that are in place for key business metrics.

In a bullet graph, the darker horizontal bar in the middle of the chart represents the actual value, while the vertical line represents a comparative value, or target. If the horizontal bar passes the vertical line, the target for that metric has been surpassed. Additionally, the segmented colored sections behind the horizontal bar represent range scores, such as “poor,” “fair,” or “good.”

14. Choropleth Map

Choropleth Map Example

A choropleth map uses color, shading, and other patterns to visualize numerical values across geographic regions. These visualizations use a progression of color (or shading) on a spectrum to distinguish high values from low.

Choropleth maps allow viewers to see how a variable changes from one region to the next. A potential downside to this type of visualization is that the exact numerical values aren’t easily accessible because the colors represent a range of values. Some data visualization tools, however, allow you to add interactivity to your map so the exact values are accessible.

15. Word Cloud

Word Cloud Example

A word cloud, or tag cloud, is a visual representation of text data in which the size of the word is proportional to its frequency. The more often a specific word appears in a dataset, the larger it appears in the visualization. In addition to size, words often appear bolder or follow a specific color scheme depending on their frequency.

Word clouds are often used on websites and blogs to identify significant keywords and compare differences in textual data between two sources. They are also useful when analyzing qualitative datasets, such as the specific words consumers used to describe a product.

16. Network Diagram

Network Diagram Example

Network diagrams are a type of data visualization that represent relationships between qualitative data points. These visualizations are composed of nodes and links, also called edges. Nodes are singular data points that are connected to other nodes through edges, which show the relationship between multiple nodes.

There are many use cases for network diagrams, including depicting social networks, highlighting the relationships between employees at an organization, or visualizing product sales across geographic regions.

17. Correlation Matrix

Correlation Matrix Example

A correlation matrix is a table that shows correlation coefficients between variables. Each cell represents the relationship between two variables, and a color scale is used to communicate whether the variables are correlated and to what extent.

Correlation matrices are useful to summarize and find patterns in large data sets. In business, a correlation matrix might be used to analyze how different data points about a specific product might be related, such as price, advertising spend, launch date, etc.
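A correlation matrix can be computed with NumPy and rendered as a color-scaled table with matplotlib; the product metrics below are invented:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Hypothetical product metrics: price, advertising spend, and units sold
price = rng.uniform(10, 20, 100)
ad_spend = rng.uniform(0, 5, 100)
units = -3.0 * price + 10.0 * ad_spend + rng.normal(0, 5, 100)

corr = np.corrcoef([price, ad_spend, units])  # 3x3 matrix of correlation coefficients

fig, ax = plt.subplots()
im = ax.imshow(corr, vmin=-1, vmax=1, cmap="coolwarm")  # color scale spans -1 to 1
fig.colorbar(im, label="Correlation")
labels = ["price", "ad_spend", "units"]
ax.set_xticks(range(3))
ax.set_xticklabels(labels)
ax.set_yticks(range(3))
ax.set_yticklabels(labels)
```

Each cell holds the correlation between a pair of variables; the diagonal is always 1 because every variable correlates perfectly with itself.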

Other Data Visualization Options

While the examples listed above are some of the most commonly used techniques, there are many other ways you can visualize data to become a more effective communicator. Some other data visualization options include:

  • Bubble clouds
  • Circle views
  • Dendrograms
  • Dot distribution maps
  • Open-high-low-close charts
  • Polar areas
  • Radial trees
  • Ring charts
  • Sankey diagrams
  • Span charts
  • Streamgraphs
  • Violin plots
  • Wedge stack graphs


Tips for Creating Effective Visualizations

Creating effective data visualizations requires more than just knowing how to choose the best technique for your needs. There are several considerations you should take into account to maximize your effectiveness when it comes to presenting data.

Related: What to Keep in Mind When Creating Data Visualizations in Excel

One of the most important steps is to evaluate your audience. For example, if you’re presenting financial data to a team that works in an unrelated department, you’ll want to choose a fairly simple illustration. On the other hand, if you’re presenting financial data to a team of finance experts, it’s likely you can safely include more complex information.

Another helpful tip is to avoid unnecessary distractions. Although visual elements like animation can be a great way to add interest, they can also distract from the key points the illustration is trying to convey and hinder the viewer’s ability to quickly understand the information.

Finally, be mindful of the colors you utilize, as well as your overall design. While it’s important that your graphs or charts are visually appealing, there are more practical reasons you might choose one color palette over another. For instance, using low contrast colors can make it difficult for your audience to discern differences between data points. Using colors that are too bold, however, can make the illustration overwhelming or distracting for the viewer.

Related: Bad Data Visualization: 5 Examples of Misleading Data

Visuals to Interpret and Share Information

No matter your role or title within an organization, data visualization is a skill that’s important for all professionals. Being able to effectively present complex data through easy-to-understand visual representations is invaluable when it comes to communicating information with members both inside and outside your business.

There’s no shortage of ways data visualization can be applied in the real world. Data is playing an increasingly important role in the marketplace today, and data literacy is the first step in understanding how analytics can be used in business.

Are you interested in improving your analytical skills? Learn more about Business Analytics, our eight-week online course that can help you use data to generate insights and tackle business decisions.

This post was updated on January 20, 2022. It was originally published on September 17, 2019.

  • Published: 07 March 2024

Learning high-level visual representations from a child’s perspective without strong inductive biases

  • A. Emin Orhan, ORCID: orcid.org/0000-0002-5486-7385 (affiliation 1)
  • Brenden M. Lake, ORCID: orcid.org/0000-0001-8959-3401 (affiliations 1, 2)

Nature Machine Intelligence, volume 6, pages 271–283 (2024)


  • Computer science
  • Human behaviour

This article has been updated

A preprint version of the article is available at arXiv.

Young children develop sophisticated internal models of the world based on their visual experience. Can such models be learned from a child’s visual experience without strong inductive biases? To investigate this, we train state-of-the-art neural networks on a realistic proxy of a child’s visual experience without any explicit supervision or domain-specific inductive biases. Specifically, we train both embedding models and generative models on 200 hours of headcam video from a single child collected over two years and comprehensively evaluate their performance in downstream tasks using various reference models as yardsticks. On average, the best embedding models perform at a respectable 70% of a high-performance ImageNet-trained model, despite substantial differences in training data. They also learn broad semantic categories and object localization capabilities without explicit supervision, but they are less object-centric than models trained on all of ImageNet. Generative models trained with the same data successfully extrapolate simple properties of partially masked objects, like their rough outline, texture, colour or orientation, but struggle with finer object details. We replicate our experiments with two other children and find remarkably consistent results. Broadly useful high-level visual representations are thus robustly learnable from a sample of a child’s visual experience without strong inductive biases.



Data availability

Except for SAYCam, all data used in this study are publicly available. Instructions for accessing the public datasets are detailed in Methods . The SAYCam dataset can be accessed by authorized users with an institutional affiliation from the following Databrary repository: https://doi.org/10.17910/b7.564 . The ‘Labeled S’ evaluation dataset, which is a subset of SAYCam, is also available from the same repository under the session name ‘Labeled S’.

Code availability

All of our pretrained models (over 70 different models), as well as a variety of tools to use and analyse them, are available from the following public repository: https://github.com/eminorhan/silicon-menagerie (ref. 63 ). The repository also contains further examples of (1) attention and class activation maps, (2) t -SNE visualizations of embeddings, (3) nearest neighbour retrievals from the embedding models and (4) unconditional and conditional samples from the generative models. The code used for training and evaluating all the models is also publicly available from the same repository.

Change history

11 June 2024

In the version of the article initially published, the name of Cliona O’Doherty was not included in the peer review information for this article, which has now been amended.

Bomba, P. & Siqueland, E. The nature and structure of infant form categories. J. Exp. Child Psychol. 35 , 294–328 (1983).


Murphy, G. The Big Book of Concepts (MIT, 2002).

Kellman, P. & Spelke, E. Perception of partly occluded objects in infancy. Cogn. Psychol. 15 , 483–524 (1983).

Spelke, E., Breinlinger, K., Macomber, J. & Jacobson, K. Origin of knowledge. Psychol. Rev. 99 , 605–632 (1992).

Ayzenberg, V. & Lourenco, S. Young children outperform feed-forward and recurrent neural networks on challenging object recognition tasks. J. Vis. 20 , 310–310 (2020).

Huber, L. S., Geirhos, R. & Wichmann, F. A. The developmental trajectory of object recognition robustness: children are like small adults but unlike big deep neural networks. J. Vis. 23 , 4 (2023).

Locke, J. An Essay Concerning Human Understanding (ed. Fraser, A. C.) (Clarendon Press, 1894).

Leibniz, G. New Essays on Human Understanding 2nd edn (eds Remnant, P. & Bennett, J.) (Cambridge Univ. Press, 1996).

Spelke, E. Initial knowledge: six suggestions. Cognition 50 , 431–445 (1994).

Markman, E. Categorization and Naming in Children (MIT, 1989).

Merriman, W., Bowman, L. & MacWhinney, B. The mutual exclusivity bias in children’s word learning. Monogr. Soc. Res. Child Dev. 54 , 1–132 (1989).

Elman, J., Bates, E. & Johnson, M. Rethinking Innateness: A Connectionist Perspective on Development (MIT, 1996).

Sullivan, J., Mei, M., Perfors, A., Wojcik, E. & Frank, M. SAYCam: a large, longitudinal audiovisual dataset recorded from the infant’s perspective. Open Mind 5 , 20–29 (2022).

Caron, M. et al. Emerging properties in self-supervised vision transformers. In Proc. IEEE/CVF International Conference on Computer Vision 9650–9660 (IEEE, 2021).

Zhou, P. et al. Mugs: a multi-granular self-supervised learning framework. Preprint at https://arxiv.org/abs/2203.14415 (2022).

He, K. et al. Masked autoencoders are scalable vision learners. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition 15979–15988 (IEEE, 2022).

Dosovitskiy, A. et al. An image is worth 16x16 words: transformers for image recognition at scale. In International Conference on Learning Representations (2020).

Xie, S., Girshick, R., Dollár, P., Tu, Z. & He, K. Aggregated residual transformations for deep neural networks. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 1492–1500 (IEEE, 2017).

Russakovsky, O. et al. ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115 , 211–252 (2015).


Smaira, L. et al. A short note on the Kinetics-700-2020 human action dataset. Preprint at https://arxiv.org/abs/2010.10864 (2020).

Grauman, K. et al. Ego4D: around the world in 3,000 hours of egocentric video. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition . 18995–19012 (IEEE, 2022).

Esser, P., Rombach, R. & Ommer, B. Taming transformers for high-resolution image synthesis. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 12873–12883 (IEEE, 2021).

Radford, A. et al. Language models are unsupervised multitask learners. OpenAI https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf (2019).

Zhou, B., Khosla, A., Lapedriza, A., Oliva, A. & Torralba, A. Learning deep features for discriminative localization. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 2921–2929 (IEEE, 2016).

van der Maaten, L. & Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res . 9 , 2579–2605 (2008).

Kuznetsova, A. et al. The Open Images Dataset V4. Int. J. Comput. Vis. 128 , 1956–1981 (2020).

Smith, L. & Slone, L. A developmental approach to machine learning? Front. Psychol. 8 , 2124 (2017).

Bambach, S., Crandall, D., Smith, L. & Yu, C. Toddler-inspired visual object learning. Adv. Neural Inf. Process. Syst. 31 , 1209–1218 (2018).

Zaadnoordijk, L., Besold, T. & Cusack, R. Lessons from infant learning for unsupervised machine learning. Nat. Mach. Intell. 4 , 510–520 (2022).

Orhan, E., Gupta, V. & Lake, B. Self-supervised learning through the eyes of a child. Adv. Neur. In. 33 , 9960–9971 (2020).


Lee, D., Gujarathi, P. & Wood, J. Controlled-rearing studies of newborn chicks and deep neural networks. Preprint at https://arxiv.org/abs/2112.06106 (2021).

Zhuang, C. et al. Unsupervised neural network models of the ventral visual stream. Proc. Natl Acad. Sci. USA 118 , e2014196118 (2021).

Zhuang, C. et al. How well do unsupervised learning algorithms model human real-time and life-long learning? In Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (2022).

Vong, W. K., Wang, W., Orhan, A. E. & Lake, B. M. Grounded language acquisition through the eyes and ears of a single child. Science 383 , 504–511 (2024).

Locatello, F. et al. Object-centric learning with slot attention. Adv. Neur. In. 33 , 11525–11538 (2020).

Lillicrap, T., Santoro, A., Marris, L., Akerman, C. & Hinton, G. Backpropagation and the brain. Nat. Rev. Neurosci. 21 , 335–346 (2020).

Gureckis, T. & Markant, D. Self-directed learning: a cognitive and computational perspective. Perspect. Psychol. Sci. 7 , 464–481 (2012).

Long, B. et al. The BabyView camera: designing a new head-mounted camera to capture children’s early social and visual environments. Behav. Res. Methods https://doi.org/10.3758/s13428-023-02206-1 (2023).

Moore, D., Oakes, L., Romero, V. & McCrink, K. Leveraging developmental psychology to evaluate artificial intelligence. In 2022 IEEE International Conference on Development and Learning (ICDL) 36–41 (IEEE, 2022).

Frank, M. C. Bridging the data gap between children and large language models. Trends Cogn. Sci. 27 , 990–992 (2023).

Object stimuli. Brady Lab https://bradylab.ucsd.edu/stimuli/ObjectCategories.zip

Konkle, T., Brady, T., Alvarez, G. & Oliva, A. Conceptual distinctiveness supports detailed visual long-term memory for real-world objects. J. Exp. Psychol. Gen. 139 , 558 (2010).

Lomonaco, V. & Maltoni, D. CORe50 Dataset. GitHub https://vlomonaco.github.io/core50 (2017).

Lomonaco, V. & Maltoni, D. CORe50: a new dataset and benchmark for continuous object recognition. In Proc. 1st Annual Conference on Robot Learning (eds Levine, S. et al.) 17–26 (PMLR, 2017).

Russakovsky, O. et al. ImageNet Dataset. https://www.image-net.org/download.php (2015).

Geirhos, R. et al. Shortcut learning in deep neural networks. Nat. Mach. Intell. 2 , 665–673 (2020).

Geirhos, R. et al. Partial success in closing the gap between human and machine vision. Adv. Neur. In. 34 , 23885–23899 (2021).

Geirhos, R. et al. ImageNet OOD Dataset. GitHub https://github.com/bethgelab/model-vs-human (2021).

Mehrer, J., Spoerer, C., Jones, E., Kriegeskorte, N. & Kietzmann, T. An ecologically motivated image dataset for deep learning yields better models of human vision. Proc. Natl Acad. Sci. USA 118 , e2011417118 (2021).

Mehrer, J., Spoerer, C., Jones, E., Kriegeskorte, N. & Kietzmann, T. Ecoset Dataset. Hugging Face https://huggingface.co/datasets/kietzmannlab/ecoset (2021).

Zhou, B., Lapedriza, A., Khosla, A., Oliva, A. & Torralba, A. Places: a 10 million image database for scene recognition. IEEE T. Pattern Anal. 40 , 1452–1464 (2017).

Zhou, B. et al. Places365 Dataset. http://places2.csail.mit.edu (2017).

Pont-Tuset, J. et al. The 2017 DAVIS challenge on video object segmentation. Preprint at https://arxiv.org/abs/1704.00675 (2017).

Pont-Tuset, J. et al. DAVIS-2017 evaluation code, dataset and results. https://davischallenge.org/davis2017/code.html (2017).

Lin, T. et al. Microsoft COCO: common objects in context. In Computer Vision – ECCV 2014 (eds Fleet, D. et al.) 740–755 (2014).

COCO Dataset. https://cocodataset.org/#download (2014).

Jabri, A., Owens, A. & Efros, A. Space-time correspondence as a contrastive random walk. Adv. Neur. In. 33 , 19545–19560 (2020).

Kinetics-700-2020 Dataset. https://github.com/cvdfoundation/kinetics-dataset#kinetics-700-2020 (2020).

Ego4D Dataset. https://ego4d-data.org/ (2022).

Kingma, D. & Ba, J. Adam: A method for stochastic optimization. Preprint at https://arxiv.org/abs/1412.6980 (2014).

VQGAN resources. GitHub https://github.com/CompVis/taming-transformers (2021).

Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B. & Hochreiter, S. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. Adv. Neural Inf. Process. Syst. 30 , 6629–6640 (2017).

Orhan, A. E. eminorhan/silicon-menagerie: v1.0.0-alpha. Zenodo https://doi.org/10.5281/zenodo.8322408 (2023).


Acknowledgements

We thank W. K. Vong, A. Tartaglini and M. Ren for helpful discussions and comments on an earlier version of this paper. This work was supported by the DARPA Machine Common Sense program (B.M.L.) and NSF Award 1922658 NRT-HDR: FUTURE Foundations, Translation and Responsibility for Data Science (B.M.L.).

Author information

Authors and affiliations

Center for Data Science, New York University, New York, NY, USA

A. Emin Orhan & Brenden M. Lake

Department of Psychology, New York University, New York, NY, USA

Brenden M. Lake


Contributions

A.E.O. and B.M.L. conceptualized and designed the study. A.E.O. implemented the experiments. A.E.O. analysed the results with feedback from B.M.L. A.E.O. wrote the first draft. B.M.L. reviewed and edited the paper.

Corresponding author

Correspondence to A. Emin Orhan .

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Machine Intelligence thanks Rhodri Cusack, Cliona O’Doherty, Masataka Sawayama and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary information.

Supplementary Figs. 1–8 and Tables 1 and 2.

Reporting Summary

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Orhan, A.E., Lake, B.M. Learning high-level visual representations from a child’s perspective without strong inductive biases. Nat Mach Intell 6, 271–283 (2024). https://doi.org/10.1038/s42256-024-00802-0


Received: 24 May 2023

Accepted: 05 February 2024

Published: 07 March 2024

Issue Date: March 2024

DOI: https://doi.org/10.1038/s42256-024-00802-0


This article is cited by

Artificial intelligence tackles the nature–nurture debate.

  • Justin N. Wood

Nature Machine Intelligence (2024)


Universal dimensions of visual representation

Do neural network models of vision learn brain-aligned representations because they share architectural constraints and task objectives with biological vision or because they learn universal features of natural image processing? We characterized the universality of hundreds of thousands of representational dimensions from visual neural networks with varied construction. We found that networks with varied architectures and task objectives learn to represent natural images using a shared set of latent dimensions, despite appearing highly distinct at a surface level. Next, by comparing these networks with human brain representations measured with fMRI, we found that the most brain-aligned representations in neural networks are those that are universal and independent of a network’s specific characteristics. Remarkably, each network can be reduced to fewer than ten of its most universal dimensions with little impact on its representational similarity to the human brain. These results suggest that the underlying similarities between artificial and biological vision are primarily governed by a core set of universal image representations that are convergently learned by diverse systems.

1 Introduction

Deep neural networks have a remarkable ability to simulate the representations of biological vision (kriegeskorte2015deep; yamins2014performance; Conwell2022.03.28.485868; elmoznino2024high). However, due to their immense complexity, the principles that govern the brain-aligned representations of deep networks remain poorly understood.

A leading approach interprets neural network representations in terms of their architectures and task objectives, which are thought to function as key constraints on a network’s learned representations (yamins2016using; richards2019deep; cao2024explanatory; doerig2023neuroconnectionist; kanwisher2023using). However, an alternative possibility is that the brain-aligned representations of neural networks are not contingent on specific optimization constraints but instead reflect universal aspects of natural image representation that emerge in diverse systems (guth2024on; huh2024platonic; elmoznino2024high).

Here we sought to determine whether the representations that neural networks share with human vision are universal across networks. We examined over 200,000 dimensions of natural image representation in deep neural networks with varied designs. Our analyses revealed the existence of universal dimensions that are shared across networks and emerge under highly varied optimization conditions. Universal dimensions were observed across the full depth of network layers and across a variety of architectures and task objectives. Visualizations of these dimensions show that they do not simply encode low-level image statistics but also higher-level semantic properties. We next compared these dimensions to the representations of the human brain measured with fMRI, and we found that universal dimensions are highly brain-aligned and underlie conventional measures of representational similarity between neural networks and visual cortex. Together, these findings demonstrate the striking degree to which the shared properties of artificial and biological vision correspond to general-purpose representations that have little to do with the details of a network’s architecture or task objective.

2.1 Assessing universality and brain similarity

We sought to compare two fundamental quantities of representational dimensions in neural networks: 1) their universality across varied networks and 2) their similarity to human brain representations. Here we briefly describe how we computed these two quantities. A more detailed description is provided in the Methods.


As illustrated in Figure 1, we characterized universality and brain similarity by examining the activations of networks and the human brain to a large and diverse set of natural images from the Microsoft Common Objects in Context (COCO) image database \autocite lin2014microsoft. For each latent dimension $d$ of a network’s activations (i.e., each principal component), we computed a universality metric by obtaining its average predictability from the activations of $m$ other networks:

$$\mathrm{universality}(d) = \frac{1}{m}\sum_{i=1}^{m}\mathrm{predictability}\big(d \mid \mathrm{network}_i\big) \tag{1}$$

For each dimension $d$ of a network’s activations, we also computed a brain-similarity metric by obtaining its average predictability from the fMRI activations of $n$ human brains:

$$\mathrm{brain\ similarity}(d) = \frac{1}{n}\sum_{j=1}^{n}\mathrm{predictability}\big(d \mid \mathrm{brain}_j\big) \tag{2}$$
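As a minimal numerical illustration, each of the two metrics reduces to averaging per-predictor prediction scores for a single dimension. The scores below are hypothetical placeholders; in the actual pipeline they come from the cross-validated ridge regression described in the Methods.

```python
import numpy as np

def universality(predictability_per_network):
    """Mean predictability of one dimension across m predictor networks."""
    return float(np.mean(predictability_per_network))

def brain_similarity(predictability_per_subject):
    """Mean predictability of one dimension across n fMRI subjects."""
    return float(np.mean(predictability_per_subject))

# Hypothetical cross-validated prediction scores for a single latent dimension
network_scores = [0.82, 0.79, 0.85]  # m = 3 predictor networks
subject_scores = [0.41, 0.38]        # n = 2 human brains

print(round(universality(network_scores), 3))    # 0.82
print(round(brain_similarity(subject_scores), 3))  # 0.395
```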

We examined the universality of representational dimensions among several sets of vision networks that varied in their random initializations, architectures, or task objectives. The specific networks used for these analyses are described in 4.2.1 and listed in Tables S1 and S2. To assess brain similarity, we compared network representations with image-evoked fMRI responses from the Natural Scenes Dataset (NSD) \autocite allen2022massive, which is the largest existing fMRI dataset of natural scene perception in the human brain. We focused on a portion of this dataset that contains fMRI responses to 872 images shown to each of eight participants. This dataset is ideally suited for assessing whether the dimensions of natural image representations in neural networks can also be found in the representations of the human brain. For our main analyses, we focused on a general region of interest that included all voxels in visual cortex whose activity was modulated by the presentation of visual stimuli (Fig. 1).

2.2 Universality across initialization weights

We used universality and brain similarity to address a central question: Are there universal dimensions of natural image representation that are shared by neural networks and humans? We first performed these analyses for a setting in which we naturally expected to find shared network dimensions. Specifically, we examined universality among a set of networks that were initialized with different random weights but were otherwise identical (i.e., same architecture, task, and training data). These networks were 20 ResNet-18 architectures trained on image classification using the Tiny ImageNet dataset \autocite he2016deep, schurholt2022model, le2015tiny. For each dimension in each network layer, we computed its universality using the other networks as predictors. We examined network layers spanning the full range of model depth, except for the final classification layer. We iterated this analysis over a total of 36,596 dimensions across all networks.

As shown in the left panel of Figure 2 , we observed universality scores spanning the full range from 0 to 1, with a high density of points around 0, indicating that most dimensions are idiosyncratic. The high density of idiosyncratic dimensions could reflect representations present at initialization that remain largely unchanged during training, or they could reflect unique representational strategies learned by specific network instances. In contrast, the universal dimensions at the other end of the scale reflect convergent representations that reliably emerge in all networks despite differences in their starting points. Notably, these universal dimensions account for a relatively small subset of the total number of network dimensions, as illustrated by the lower density of points at the high end of the universality axis.

We next compared these network dimensions to human brain representations, and we found that the universal dimensions exhibit exceptionally strong brain similarity scores (Fig. 2 ). This demonstrates that among the many network dimensions examined here, it is only those that are invariably learned by networks with different initial conditions that are also strongly shared with the representations of the human visual system.

In follow-up analyses, we found that the trend in Figure 2 was consistently observed in each network layer (Fig. 3 ), each individual network (Fig. S1 ), each individual fMRI subject (Fig. S2 ), and in multiple regions of interest in visual cortex (Fig. S3 ). The observation of this effect in all network layers shows that universality is not restricted to low-level features in early layers but instead extends across the full depth of the network. We emphasize that the universality and brain similarity metrics in our analysis pipeline are not guaranteed to be related to one another. Using simulated data, we can trivially obtain a range of universality and brain similarity scores while observing no positive relationship between these two metrics (Fig. S4 ). Together, these findings show that among a preponderance of idiosyncratic network dimensions, there exists a smaller subset of highly convergent dimensions that are learned by different network instances and are shared with human vision.


2.3 Universality across architectures and tasks

We next sought to determine if universal dimensions can be detected among networks with varied architectures and tasks. To this end, we quantified universality and brain similarity for two sets of models. The first was a set of models with different architectures but trained on the same task. Specifically, we examined 19 networks trained to perform ImageNet classification using varied architectures, including convolutional models, vision transformers, and MLP-Mixers. The second was a set of 9 models with the same architecture (ResNet-50) but trained on different tasks. The tasks included object classification and a variety of self-supervised tasks, such as contrastive learning, identifying image rotations, and solving jigsaw puzzles. For all models, we examined layers spanning the full network depth, except for the final layer. Further details about the models can be found in Tables S1 and S2 . In total, we examined 149,743 dimensions in the set of varied architectures and 43,132 dimensions in the set of varied tasks.

The findings for these two sets of models were surprisingly consistent with those observed for models with varied initializations (Fig. 2 ). We again found that most dimensions are idiosyncratic (i.e., specific to a model) and not shared with the brain, as shown by the high density of points near the origin. Again, we also found that a subset of dimensions exhibit exceptionally high scores on both the universality and brain similarity metrics. These latter dimensions correspond to representations that reliably emerge across many models despite variations in their architectures and the tasks that they were trained to perform. Furthermore, the generality of these representations extends beyond artificial vision, as they are also strongly shared with the representations of the human visual system. Remarkably, these findings are highly similar when considering networks that vary in either architectures or task objectives. This suggests that the underlying similarities among the representations of these networks—as well as their similarities to human vision—are only weakly influenced by architecture and task but instead reflect highly general properties of image representations in deep networks.

We also note that there is a striking paucity of points in the upper left quadrant of the plots in Figure 2 . Points in this quadrant would correspond to representations shared with the brain but learned only by networks with specific optimization constraints—namely, specific architectures or tasks. The lack of points in this quadrant suggests that the details of architectures and tasks have a relatively minor role in shaping the brain-aligned representations of neural networks.

In follow-up analyses, we again found that these results are highly robust—they were observed in each network layer (Fig. 3 ), each individual network (Fig. S1 ), each individual fMRI subject (Fig. S2 ), and in multiple regions of interest in visual cortex (Fig. S3 ). Together, these findings reveal the remarkable degree to which vision systems with varied architectures and task objectives can nonetheless converge on a set of general-purpose representations that are shared not only across models but also between artificial and biological vision.

2.4 Universality across untrained networks

Our analyses thus far have focused on sets of trained neural networks, and we have interpreted the dimensions in the upper right quadrant of the plots in Figure 2 as learned representations. However, we expect that shared dimensions can also be found among sets of untrained models due to statistical regularities in the activations that natural images elicit in networks with random filters. We thus wondered whether our findings for the trained networks could be explained by the statistics of image activations alone—without any need for learning—or whether they diverge from the trends observed in randomly initialized networks. To address this question, we examined 20 ResNet-18 architectures that were randomly initialized with the same seeds as the trained models presented in the left panel of Figure 2 . We followed the same procedures as in the preceding analyses. For each dimension in each network layer, we computed its universality using the other networks as predictors, and we iterated this analysis over a total of 9,413 dimensions from all networks.

These analyses showed that, as expected, there is a wide range of universality scores for the untrained network dimensions, with some dimensions that are found in all networks (Fig. 2 ). These universal dimensions of untrained networks correspond to representations that consistently emerge when propagating natural images through a hierarchy of random convolutional filters. They are thus due to image statistics alone and not learned representational properties. However, importantly, the relationship between universality and brain similarity diverges from the relationship that was observed for trained models. Specifically, for untrained networks, we observe a shallow and approximately linear relationship between universality and brain similarity, whereas for trained networks, brain similarity exhibits a sharp nonlinear increase at the high end of the universality axis. As a result, the shared dimensions of untrained networks have substantially lower brain similarity scores than the shared dimensions of trained networks. As in the previous sets of analyses, we again found that these results were consistent in each network layer (Fig. 3 ), each individual network (Fig. S1 ), each individual fMRI subject (Fig. S2 ), and in multiple regions of interest in visual cortex (Fig. S3 ). In sum, when comparing the trends for trained and untrained networks, the findings demonstrate that the universal dimensions of trained networks reflect learned representational properties that cannot be explained by image statistics and random features alone.

2.5 Universality and the visual hierarchy

Previous work has shown that in many neural networks trained on natural images, the first layer contains general-purpose V1-like filters tuned to orientation, frequency, and color, whereas subsequent layers contain filters that appear to be increasingly specialized \autocite NIPS2012_c399862d, yosinski2015understanding. This suggests the possibility that universality may only be prominent in early network layers and then rapidly diminish across the network hierarchy. To address this possibility, we examined the universality and brain similarity of network representations in individual layers along the full depth of each network. Figure 3 shows the results of these analyses for all sets of models. Across all sets of trained models, we found relatively similar distributions of universality scores at all sampled layers, with highly universal dimensions detected even in the deepest layers that we examined. Furthermore, these analyses show that the relationship between universality and brain similarity is consistent across layers. Thus, these findings suggest that at all levels of network depth, we can find general-purpose representations that are reliably learned by diverse networks and are strongly shared with the human brain.

2.6 Universal dimensions and high-level image properties

The findings from the previous section show that universal dimensions are not restricted to early network layers. However, these findings do not directly address the question of whether the shared dimensions of later layers represent high-level semantic properties or low-level image statistics, such as luminance gradients and spatial frequency distributions. To address this question, we performed exploratory visualization analyses of the universal dimensions in later network layers. Specifically, we examined the penultimate layer of the models from the varied-task set, as shown in the middle right panel of Figure 2 . We focused on the varied-task set because the models in this set have the same ResNet-50 architecture, which allowed us to examine representations from the same targeted layer in all networks. We concatenated the dimensions from the penultimate layer across all networks and ranked their universality scores. We then selected the top 100 dimensions with the highest universality scores and visualized the image representations of these 100 dimensions in a 2D space using uniform manifold approximation and projection (UMAP) \autocite mcinnes2018umap-software.
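The dimension-selection step described above can be sketched as follows. The arrays here are simulated stand-ins for the concatenated penultimate-layer dimensions and their universality scores; the 2D embedding itself, computed with UMAP in the paper, is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated penultimate-layer dimensions concatenated across networks:
# rows are images, columns are latent dimensions, each with one
# universality score (both arrays stand in for the real pipeline's outputs).
n_images, n_dims = 872, 500
dimension_activations = rng.standard_normal((n_images, n_dims))
universality_scores = rng.random(n_dims)

# Rank all concatenated dimensions and keep the 100 most universal ones.
k = 100
top_idx = np.argsort(universality_scores)[::-1][:k]
top_universal = dimension_activations[:, top_idx]

# top_universal (872 x 100) is what would then be embedded in 2D with UMAP.
```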

As shown in Figure 4 , these representations exhibit rich high-level organization, with images grouped into clusters of semantically related items, such as people, sports, animals, and food. For comparison, we also generated UMAP embeddings of the 100 dimensions with the lowest universality scores from this layer. We found that these idiosyncratic dimensions exhibit no clear semantic organization (Fig. S5 ). We also generated UMAP embeddings of the top 100 universal dimensions in the penultimate layer of the untrained-network set, which is shown in the rightmost panel of Figure 2 . In contrast to the trained networks, the universal dimensions of untrained networks emphasize prominent low-level features, such as coarse luminance gradients (Fig. S6 ). Together, these visualization analyses show that the universal representations of high-level layers in trained networks encode high-level image properties that group images into semantically meaningful clusters. This suggests that there are common organizing principles of high-level image semantics that are universally learned.


2.7 Universal dimensions and representational similarity analysis

Our findings thus far show that the universal dimensions of networks can be strongly predicted from human brain representations. We next sought to evaluate the effect of universal dimensions on a conventional representational similarity analysis (RSA). Specifically, we performed a targeted analysis to determine if universal dimensions drive the representational similarity scores obtained from comparisons of neural networks with visual cortex. To do so, we conducted a standard RSA on networks with representations reduced to low-dimensional subspaces of their most universal dimensions. We performed this analysis for all sets of networks examined in Figure 2 . Following the RSA procedures in \autocite Conwell2022.03.28.485868, we split the stimuli into training and test sets. We selected the best-performing layer from each network on the training set and computed the final RSA score for the selected layers on the test set. We then reduced each network to a subset of its most universal dimensions and computed RSA scores for these reduced model representations. We analyzed the same general region of interest in visual cortex as in the preceding analyses. We found that even when the networks are reduced to just ten or five universal dimensions, their RSA scores exhibit little or no decrease—in fact, for all three sets of trained networks, they slightly improve (Fig. 5 ). Similar results were observed when performing these analyses in individual subjects (Fig. S7 ) and in other regions of interest (Fig. S8 ). These findings suggest that the representational similarities between neural networks and visual cortex are largely driven by subspaces of network dimensions that are universal.


3 Discussion

Our work reveals universal dimensions of natural image representation that are learned by artificial vision systems and are shared with the human brain. These dimensions emerge in diverse neural networks despite variation in their architectures and task objectives. The role of these dimensions in vision appears to be general-purpose—they are not specialized for any single task but instead support many downstream objectives. Universal representations are found at all levels of visual processing in deep networks, from low-level image properties in early layers to high-level semantics in late layers. Together, these findings suggest that machines and humans share a set of general-purpose visual representations that can be learned under highly varied conditions.

Deep learning is now the standard framework for computational modeling in neuroscience, and many previous efforts have sought to understand these deep learning models in terms of their specialization: that is, what objectives they are specialized for, and what specific network characteristics underlie their similarities to the brain \autocite yamins2016using, richards2019deep, cao2024explanatory, doerig2023neuroconnectionist, kanwisher2023using. Our work views the representations of deep networks from a different perspective. Rather than searching for specific model characteristics that might be associated with stronger alignment with the brain, we sought to discover the elements of network representations that are instead invariant across models. Using this approach, we found that crucial aspects of deep network representations—those that are most strongly shared with the human brain—are, to a remarkable degree, independent of the network characteristics that many previous studies have emphasized. The invariance of these representations implies that they are not primarily governed by the details of a network’s architecture or task objective but instead by more general principles of natural image representation in deep vision systems \autocite guth2024on, huh2024platonic.

Our findings suggest several exciting directions for future work. First, our approach could be extended beyond vision models to examine the representational dimensions that are shared across vision and language. Previous work has shown that language-model embeddings of object names and scene captions are predictive of image representations in high-level visual cortex \autocite carlson2014emergence, bonner2021object, doerig2024visualrepresentationshumanbrain. An open question is whether networks trained on language data alone learn the same universal dimensions of natural scene representation as image-trained networks. Second, our findings show that universal dimensions emerge in networks despite differences in the tasks that the networks are optimized to perform. This raises the intriguing possibility that universal dimensions could be hard-coded into networks at initialization, potentially making the learning process faster and more data efficient. Third, while previous work has revealed similarities between the visual cortex representations of humans and monkeys \autocite kriegeskorte2008matching, we still know little about the degree to which representational dimensions may be universal or species-specific across mammalian vision. This question could be addressed by applying our approach to recordings of cortical responses to the same stimuli in different species.

In sum, our results show that the most brain-aligned representations of visual neural networks are universal and independent of a network’s specific characteristics. What fundamental principles might explain the convergence of networks to universal dimensions? Theories of efficient coding suggest that frequency- and orientation-tuned filters are consistently observed in the first layer of vision systems because they constitute efficient bases that are adapted to the statistics of natural images \autocite olshausen1996emergence, simoncelli2001natural. It remains an open question whether this efficient-coding hypothesis can be extended to a deep hierarchy, which could potentially explain universal dimensions as a consequence of optimal image encoding. An alternative possibility is that deep networks learn shared representations of the true generative factors in the visual world—e.g., the objects, materials, contexts, and optical phenomena that make a scene. This could be the case if the optimal strategy for solving challenging tasks on natural stimuli is to learn the invariant properties of reality \autocite huh2024platonic, and it would suggest that the universal dimensions detected here reflect a shared internal model of the visual environment in machines and humans.

4.1 Natural Scenes Dataset

4.1.1 Stimuli and experimental design

The Natural Scenes Dataset (NSD) is a large-scale publicly available fMRI dataset on human vision that is described in detail in a previous report \autocite allen2022massive. Here we briefly review the key attributes of this dataset. The NSD study sourced color natural scene stimuli from the Microsoft Common Objects in Context (COCO) database \autocite lin2014microsoft and collected 7T fMRI responses (1.8-mm voxels, 1.6-s TR) from eight adult subjects who viewed these stimuli while performing a continuous recognition memory task, namely, responding whether each presented image had been seen earlier in the experiment. Each subject viewed approximately 10,000 stimuli, each presented three times, though some subjects saw fewer stimuli because they did not complete all scanning sessions. Among the stimuli, 872 "shared" images were viewed by all subjects at least once. We used these shared images and the corresponding fMRI data for our main analyses.

4.1.2 Data preprocessing

We used the NSD single-trial betas, preprocessed in 1.8-mm volume space and denoised with the GLMdenoise technique (version 3; betas_fithrf_GLMdenoiseRR). The betas were then transformed to z-scores within each individual scanning session, as recommended by the authors \autocite allen2022massive. For all analyses, we used the betas averaged across repetitions.
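A sketch of this preprocessing, with function names and the toy data being our own illustration rather than the NSD codebase:

```python
import numpy as np

def zscore_within_session(betas, session_ids):
    """Z-score single-trial betas separately within each scanning session.

    betas: (n_trials, n_voxels) array; session_ids: (n_trials,) session labels.
    """
    out = np.empty_like(betas, dtype=float)
    for s in np.unique(session_ids):
        mask = session_ids == s
        x = betas[mask]
        out[mask] = (x - x.mean(axis=0)) / x.std(axis=0)
    return out

def average_repetitions(betas, image_ids):
    """Average the (z-scored) betas across repeated presentations of each image."""
    images = np.unique(image_ids)
    averaged = np.stack([betas[image_ids == i].mean(axis=0) for i in images])
    return averaged, images

# Toy example: 12 trials (2 sessions of 6), 4 voxels, 3 images shown 4 times each
rng = np.random.default_rng(0)
betas = 2.0 + 5.0 * rng.standard_normal((12, 4))
sessions = np.repeat([0, 1], 6)
image_ids = np.tile([0, 1, 2], 4)
z = zscore_within_session(betas, sessions)
averaged, images = average_repetitions(z, image_ids)
```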

4.1.3 Regions of interest

Our main analyses focused on the nsdgeneral region of interest (ROI), which includes a large swath of visual cortex. This ROI, as defined in the NSD study \autocite allen2022massive, contains all voxels that were reliably modulated by the presentation of visual stimuli, comprising approximately 15,000 voxels in each subject. We also conducted follow-up analyses in smaller ROIs, including the ventral, parietal, and lateral streams, using the "streams" ROIs provided in the NSD dataset.

4.2 Deep neural networks

4.2.1 Model sets

We examined four sets of DNN models to probe the universality of representational features across different factors of variation:

Random seeds in trained models

Architectures

Task objectives

Random seeds in untrained models

The first set included 20 pretrained ResNet-18 models examined in a previous study of model hyperparameters \autocite schurholt2022model. These models were initialized with unique random seeds and trained on Tiny ImageNet \autocite le2015tiny. We extracted features from nine rectified linear unit (ReLU) layers that span the depth of the ResNet-18 architecture, except for the final output layer. In total, we extracted 36,596 dimensions from this set.

The second set included 19 pretrained models with varied architectures, including convolutional networks, transformers, and MLP-Mixers (Table S1). All were trained on ImageNet for object classification \autocite russakovsky2015imagenet and obtained from the torchvision library \autocite paszke2019pytorch,rw2019timm. Given the variety of architectures in this set, the sampled layers included ReLU, normalization, attention, multi-layer perceptron, and other model-specific operations. The sampled layers span the full range of layer depth in each model (additional details are provided in Table S1). The complete list of these layers is included in the supplementary file model_layer.csv. A total of 149,743 dimensions were extracted from this set.

The third set included 9 pretrained ResNet-50 models from the torchvision library \autocite paszke2019pytorch and the VISSL model zoo \autocite goyal2021vissl. These models were trained to perform a variety of tasks on ImageNet images \autocite russakovsky2015imagenet (Table S2 ). A total of 43,132 dimensions were extracted from all ReLU layers, except for the final output layer.

The fourth set included 20 untrained ResNet-18 models \autocite he2016deep with different random weights, which were created using Kaiming normal initialization \autocite he2015delving. Each model had a unique random seed. A total of 9,413 dimensions were extracted from all ReLU layers. Note that the lower number of dimensions here relative to the set of trained models with different random seeds is due to the low-rank activation matrices of untrained networks.

4.2.2 Feature extraction

Before computing our universality and brain-similarity metrics, we first needed to extract a set of feature activations from each model layer. We sought to quantify these metrics for the feature representations in each network rather than for the spatial representations. We thus applied global max-pooling to remove spatial information from the activations of each model layer. For convolutional networks, pooling was applied across the height and width dimensions, and for networks with patch embeddings, pooling was applied across patch dimensions. To extract all orthogonal dimensions from each model layer, we first performed PCA on the activations to the 72,128 "unshared" images from NSD, retaining all PCs up to the matrix rank, computed with the default procedure in torch.linalg.matrix_rank in PyTorch \autocite paszke2019pytorch. We then projected the activations for the 872 "shared" images from NSD onto the PC basis and computed universality and brain-similarity metrics for each PC.
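This extraction pipeline can be sketched in NumPy as follows (the paper's implementation uses PyTorch and torch.linalg.matrix_rank; the activations here are simulated, and the shapes are illustrative):

```python
import numpy as np

def global_max_pool(activations):
    """Collapse spatial positions: (N, C, H, W) -> (N, C)."""
    return activations.max(axis=(2, 3))

def fit_pc_basis(train_features):
    """PCA of pooled activations, retaining components up to the matrix rank."""
    mean = train_features.mean(axis=0, keepdims=True)
    centered = train_features - mean
    rank = np.linalg.matrix_rank(centered)
    # SVD of the centered activations; rows of vh are the principal axes.
    _, _, vh = np.linalg.svd(centered, full_matrices=False)
    return vh[:rank].T, mean  # principal axes as columns, plus the training mean

# Fit PCs on the "unshared" images, then project the "shared" images onto them.
rng = np.random.default_rng(0)
unshared = global_max_pool(rng.standard_normal((300, 16, 7, 7)))
shared = global_max_pool(rng.standard_normal((872, 16, 7, 7)))
basis, mean = fit_pc_basis(unshared)
shared_pcs = (shared - mean) @ basis  # one column per latent dimension
```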

4.3 Metrics

4.3.1 Universality

Our universality metric estimates the degree to which a representational dimension is shared across multiple DNNs. For a given dimension in a target network, we used cross-validated ridge regression to predict its activations as a linear combination of the activations from another predictor network. We performed this analysis using the same "shared" NSD images that were used to compute brain similarity (described in 4.3.2). The regressors consisted of activations concatenated across all sampled layers of the predictor network. The procedure for cross-validated ridge regression is described in 4.3.3. We computed the mean Pearson correlation between the predicted and actual responses of the target dimension across all cross-validation folds, and we repeated this process using every network other than the target as the predictor. We then obtained the universality score by taking the median correlation across all predictor networks. We used the median instead of the mean to ensure that the final summary statistic was not driven by a small subset of predictor networks with exceptionally high or low scores. This entire procedure was repeated to obtain universality scores for all dimensions in all sampled layers of the target network, and it was then repeated with each network as the target.
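A simplified sketch of the universality computation for one target dimension is shown below; for brevity, a single train/test split with scikit-learn's RidgeCV stands in for the full nested cross-validation of 4.3.3, and all data are simulated.

```python
import numpy as np
from sklearn.linear_model import RidgeCV

def predictability(predictor_features, target_dim, n_train=600):
    """Pearson r between ridge predictions and a held-out target dimension."""
    alphas = np.logspace(-3, 4, 8)
    model = RidgeCV(alphas=alphas).fit(predictor_features[:n_train], target_dim[:n_train])
    pred = model.predict(predictor_features[n_train:])
    return float(np.corrcoef(pred, target_dim[n_train:])[0, 1])

def universality_score(target_dim, predictor_networks):
    """Median predictability of one target dimension across predictor networks."""
    return float(np.median([predictability(X, target_dim) for X in predictor_networks]))

# Simulated example: a latent signal shared by three predictor networks.
rng = np.random.default_rng(0)
latent = rng.standard_normal(872)
networks = [np.outer(latent, rng.standard_normal(50))
            + 0.1 * rng.standard_normal((872, 50)) for _ in range(3)]
shared_dim = latent + 0.05 * rng.standard_normal(872)
print(universality_score(shared_dim, networks))  # high for a shared dimension
```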

4.3.2 Brain similarity

Our brain similarity metric estimates the degree to which a dimension in a DNN can be predicted from fMRI responses measured in human visual cortex. Given a target dimension from a network, we used cross-validated ridge regression to predict its activations as a linear combination of the trial-averaged fMRI responses to the "shared" images in a single subject. We computed the mean Pearson correlation between the predicted and actual responses of the target dimension across all cross-validation folds, and we repeated this procedure for all fMRI subjects. Brain similarity is defined as the mean score across all subjects.

4.3.3 Cross-validated ridge regression

We computed universality and brain-similarity scores using ridge regression with a nested cross-validation design. The outer loop of this cross-validation design had five folds. We fit the parameters of the ridge regression on four folds of training data. We first selected the optimal ridge penalty for each target dimension from values with equal logarithmic spacing between $10^{-3}$ and $10^{4}$. The optimal ridge penalty was the one that yielded the best performance when applying leave-one-out cross-validation to the training data. We then fit the regression weights using the full set of training data and the optimal ridge penalty, and we applied these regression weights to generate predicted responses in the held-out fold of test data. Performance was evaluated as the correlation between the predicted and actual responses on the held-out test data. This procedure was repeated using all five folds as held-out test data, and the performance scores were averaged across folds.
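This nested procedure can be sketched with scikit-learn, whose RidgeCV performs the inner leave-one-out penalty selection when cv is left at its default. The number of grid points on the penalty range is our choice, as only the endpoints are specified above; the data are simulated.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import KFold

def nested_cv_score(X, y, n_outer_folds=5):
    """Mean Pearson r across outer folds, with an inner LOO search for the penalty."""
    alphas = np.logspace(-3, 4, 15)  # equal log spacing between 1e-3 and 1e4
    scores = []
    for train_idx, test_idx in KFold(n_splits=n_outer_folds).split(X):
        # RidgeCV with default cv selects the penalty by efficient leave-one-out CV
        # on the training folds, then refits on all training data.
        model = RidgeCV(alphas=alphas).fit(X[train_idx], y[train_idx])
        pred = model.predict(X[test_idx])
        scores.append(np.corrcoef(pred, y[test_idx])[0, 1])
    return float(np.mean(scores))

# Simulated example: a target dimension that is a noisy linear readout of X.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 40))
y = X @ rng.standard_normal(40) + 0.1 * rng.standard_normal(500)
print(nested_cv_score(X, y))
```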

4.3.4 Representational similarity analysis

We computed conventional representational similarity analysis (RSA) scores for comparisons of networks and visual cortex \autocite kriegeskorte2008representational, adapting the procedure described in \autocite Conwell2022.03.28.485868. We split the 872 "shared" images into a training set and a test set of 436 images each. In each set, representational dissimilarity matrices (RDMs) were created by calculating Pearson correlation distances for pairwise comparisons of image representations within each network layer and each fMRI subject. For each network layer, we computed RDMs using the same globally pooled channel activations that were used to compute universality and brain similarity scores. RSA scores were obtained by calculating the Spearman correlation between the RDMs for a network layer and an fMRI subject, and these scores were averaged across subjects. For each network, the best-performing layer was selected based on the RSA scores in the training set, and a final RSA score was computed for each network using the selected layer in the held-out test set. We next examined the contribution of universal dimensions to the RSA scores by reducing each network to the subspace spanned by its top ten or five universal dimensions. Specifically, we reconstructed the test-set activations of each network using only the top ten or five most universal dimensions, and we re-computed the final RSA score on these reconstructed test data.
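The core RSA computations can be sketched as follows (function names are ours; the dimension-reduction helper illustrates restricting a network to the subspace of its most universal PCs):

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rdm(features):
    """Lower-triangle RDM: pairwise Pearson correlation distances between images."""
    return pdist(features, metric="correlation")

def rsa_score(layer_features, voxel_responses):
    """Spearman correlation between a network-layer RDM and an fMRI RDM."""
    rho, _ = spearmanr(rdm(layer_features), rdm(voxel_responses))
    return float(rho)

def reduce_to_top_dims(pc_activations, universality_scores, k=10):
    """Keep only the k most universal PC dimensions of a network layer."""
    top = np.argsort(universality_scores)[::-1][:k]
    return pc_activations[:, top]

# Toy usage: identical representations yield a perfect RSA score of 1.
rng = np.random.default_rng(0)
layer = rng.standard_normal((30, 64))    # 30 images x 64 PC dimensions
brain = rng.standard_normal((30, 200))   # 30 images x 200 voxels
scores = rng.random(64)
reduced_layer = reduce_to_top_dims(layer, scores, k=10)
```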

5 Data availability

The Natural Scenes Dataset is available at https://naturalscenesdataset.org/ \autocite allen2022massive.

6 Code availability

Code for all analyses in this study is available at https://github.com/zche377/universal_dimensions.

7 Acknowledgements

This research was supported in part by a JHU Catalyst Award to MFB and grant NSF PHY-2309135 to the Kavli Institute for Theoretical Physics (KITP).

8 Supplementary material


Table S1. Networks with varied architectures. All 19 models were trained on ImageNet for object classification; the first 12 were obtained from PyTorch (torchvision) and the last 7 from Timm.

| Architecture | Architecture type | Source |
| --- | --- | --- |
| ResNet18 | Convolutional | PyTorch |
| ResNet50 | Convolutional | PyTorch |
| ResNeXT50_32x4d | Convolutional | PyTorch |
| Wide_ResNet50_2 | Convolutional | PyTorch |
| AlexNet | Convolutional | PyTorch |
| VGG16 | Convolutional | PyTorch |
| DenseNet121 | Convolutional | PyTorch |
| SqueezeNet1_1 | Convolutional | PyTorch |
| ShuffleNet_v2_x1_0 | Convolutional | PyTorch |
| ConveNeXt_tiny | Convolutional | PyTorch |
| Swin_t | Transformer | PyTorch |
| MaxVit_t | Transformer | PyTorch |
| Cait_xxs24_224 | Transformer | Timm |
| Coat_lite_tiny | Transformer | Timm |
| Deit_tiny_patch16_224 | Transformer | Timm |
| Levit_128 | Transformer | Timm |
| Mixer_b16_224 | MLP-Mixer | Timm |
| ResMLP_12_224 | MLP-Mixer | Timm |
| Dla34 | Convolutional | Timm |

Table S2. ResNet-50 networks with varied task objectives. All 9 models were trained on ImageNet; the supervised model was obtained from PyTorch (torchvision) and the self-supervised models from the VISSL model zoo.

| Learning objective | Training setting | Source |
| --- | --- | --- |
| Object classification | Supervised | PyTorch |
| Jigsaw | Self-supervised | VISSL |
| RotNet | Self-supervised | VISSL |
| ClusterFit | Self-supervised | VISSL |
| NPID++ | Self-supervised | VISSL |
| PIRL | Self-supervised | VISSL |
| SimCLR | Self-supervised | VISSL |
| SwAV | Self-supervised | VISSL |
| DeepClusterV2 | Self-supervised | VISSL |

  • Neuroscience

Neural dynamics of visual working memory representation during sensory distraction

  • Jonas Karolis Degutis
  • Simon Weber
  • John-Dylan Haynes
  • Bernstein Center for Computational Neuroscience Berlin and Berlin Center for Advanced Neuroimaging, Charité Universitätsmedizin Berlin, corporate member of the Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Berlin, Germany
  • Max Planck School of Cognition, Leipzig, Germany
  • Department of Psychology, Humboldt-Universität zu Berlin, Berlin, Germany
  • Research Training Group “Extrospection” and Berlin School of Mind and Brain, Humboldt-Universität zu Berlin, Berlin, Germany
  • Institute of Psychology, Otto von Guericke University, Magdeburg, Germany
  • Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
  • German Center for Neurodegenerative Diseases, Göttingen, Germany
  • Research Cluster of Excellence “Science of Intelligence”, Technische Universität Berlin, Berlin, Germany
  • Collaborative Research Center “Volition and Cognitive Control”, Technische Universität Dresden, Dresden, Germany
  • https://doi.org/10.7554/eLife.99290.1
  • Open access
  • Copyright information

Peer review process

Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, and public reviews.

  • Reviewing Editor: Gui Xue, Beijing Normal University, Beijing, China
  • Senior Editor: Tirin Moore, Howard Hughes Medical Institute, Stanford University, Stanford, United States of America

Reviewer #1 (Public Review):

In this study, the authors re-analyzed Experiment 1 of a public dataset (Rademaker et al, 2019, Nature Neuroscience) which includes fMRI and behavioral data recorded while participants held an oriented grating in visual working memory (WM) and performed a delayed recall task at the end of an extended delay period. In that experiment, participants were pre-cued on each trial as to whether there would be a distracting visual stimulus presented during the delay period (filtered noise or randomly oriented grating). In this manuscript, the authors focused on identifying whether the neural code in the retinotopic cortex for remembered orientation was 'stable' over the delay period, such that the format of the code remained the same, or whether the code was dynamic, such that information was present, but encoded in an alternative format. They identify some time points - especially towards the beginning/end of the delay - where the multivariate activation pattern fails to generalize to other time points and interpret this as evidence for a dynamic code. Additionally, the authors compare the representational format of remembered orientation in the presence vs absence of a distracting stimulus, averaged over the delay period. This analysis suggested a 'rotation' of the representational subspace between distracting orientations and remembered orientations, which may help preserve simultaneous representations of both remembered and viewed stimuli.

Strengths:

(1) Direct comparison of coding subspaces/manifolds between time points and task conditions is an innovative and useful approach for understanding how neural representations are transformed to support cognition.

(2) The re-use of an existing dataset substantially goes beyond the authors' previous findings by comparing the geometry of representational spaces between conditions and time points, and by looking explicitly for dynamic neural representations.

Weaknesses:

(1) Only Experiment 1 of Rademaker et al. (2019) is reanalyzed. The previous study included another experiment (Expt 2) using different types of distractors, which did result in distractor-related costs to neural and behavioral measures of working memory. The Rademaker et al. (2019) study uses these two results to conclude that neural WM representations are protected from distraction when distraction does not impact behavior, but that conditions which do impact behavior also impact neural WM representations. Because this previous result is critical for relating the present manuscript's results to the previous findings, it seems necessary to address Experiment 2's data in the present work.

(2) The primary evidence for 'dynamic coding', especially in the early visual cortex, appears to be related to the transitions between encoding/maintenance and maintenance/recall; the delay-period representations themselves seem overall stable, consistent with previous findings.

(3) The dynamicism index used in Figure 1f quantifies the proportion of off-diagonal cells with significant differences in decoding performance from the diagonal cell. It is unclear why the proportion of time points is the best metric, rather than something like the change in decoding accuracy. This is addressed in the subsequent analysis of coding subspaces, but the utility of the Figure 1f analysis remains weakly justified.

(4) There is no report of how much total variance is explained by the two PCs defining the subspaces of interest in each condition and time point. It could be the case that the first two principal components in one condition (e.g., sensory distractor) explain less variance than the first two principal components of another condition.

(5) Converting a continuous decoding metric (angular error) to "% decoding accuracy" obfuscates the units of the actual results. Decoding precision (e.g., the SD of the decoding-error histogram) would be more interpretable and better related to both the previous study and behavioral measures of WM performance.

(6) This report does not make use of behavioral performance data in the Rademaker et al (2019) dataset.

(7) Given the observed differences between individual retinotopic ROIs in the temporal cross-decoding analyses shown in Figure 1, the lack of data presented for the subspace analyses in the corresponding individual ROIs is a weakness.
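To make point (3) concrete, the proportion-based dynamicism index and the accuracy-drop alternative the reviewer suggests can be contrasted in a short sketch. The function name and toy inputs below are illustrative, not from the paper's analysis code:

```python
import numpy as np

def dynamicism_metrics(gen, sig):
    """gen[train_t, test_t]: temporal-generalization decoding accuracy.
    sig: boolean mask of off-diagonal cells whose accuracy differs
    significantly from the corresponding diagonal cell.
    Returns (proportion-based index, mean off-diagonal accuracy drop)."""
    T = gen.shape[0]
    off = ~np.eye(T, dtype=bool)
    prop = sig[off].mean()                              # index as in Fig. 1f
    drop = (np.diag(gen)[:, None] - gen)[off].mean()    # reviewer's alternative
    return prop, drop

# Toy example: diagonal accuracy 0.8, every off-diagonal cell 0.6.
gen = np.full((4, 4), 0.6)
np.fill_diagonal(gen, 0.8)
sig = ~np.eye(4, dtype=bool)
prop, drop = dynamicism_metrics(gen, sig)   # prop = 1.0, drop = 0.2
```

The drop metric keeps the result in accuracy units, whereas the proportion collapses any size of difference into a binary count.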

  • https://doi.org/10.7554/eLife.99290.1.sa1

Reviewer #2 (Public Review):

In this work, Degutis and colleagues addressed an interesting issue related to the concurrent coding of sensory percepts and visual working memory contents in visual cortices. They used generalization analyses to test whether working memory representations change over time, diverge from sensory percepts, and vary across distraction conditions. Temporal generalization analysis demonstrated that off-diagonal decoding accuracies were lower than on-diagonal decoding accuracies, regardless of the presence of intervening distractors, implying that working memory representations can change over time. They further showed that the coding space for working memory contents underwent subtle but statistically significant changes over time, potentially explaining the impaired off-diagonal decoding performance; the neural coding of sensory distractors, by contrast, remained largely stable. Generalization analyses between target and distractor codes showed overlapping but non-identical representations, and cross-condition decoding had lower accuracy than within-condition decoding. Finally, in the condition with intervening random noise, within-condition decoding revealed more reliable working memory representations than cross-condition decoding with a classifier trained on data from the no-distraction condition, indicating a change in VWM format between noise-distractor and no-distractor trials.

This paper demonstrates a clever use of generalization analysis to show changes in the neural codes of working memory contents across time and distraction conditions. It provides some insights into the differences between representations of working memory and sensory percepts, and how they can potentially coexist in overlapping brain regions.

(1) An alternative interpretation of the temporal dynamic pattern is that working memory representations become less reliable over time. As shown by the authors in Figure 1c and Figure 4a, the on-diagonal decoding accuracy generally decreased over time. This implies that the signal-to-noise ratio was decreasing over time. Classifiers trained with data of relatively higher SNR and lower SNR may rely on different features, leading to poor generalization performance. This issue should be addressed in the paper.

(2) The paper tests against a strong version of stable coding, where neural spaces representing WM contents must remain identical over time. In this version, any changes in the neural space will be evidence of dynamic coding. As the paper acknowledges, there is already ample evidence arguing against this possibility. However, the evidence provided here (dynamic coding cluster, angle between coding spaces) is not as strong as what prior studies have shown for meaningful transformations in neural coding. For instance, the principal angle between coding spaces over time was smaller than 8 degrees, and around 7 degrees between sensory distractors and WM contents. This suggests that the coding space for WM was largely overlapping across time and with that for sensory distractors. Therefore, the major conclusion that working memory contents are dynamically coded is not well-supported by the presented results.

(3) Relatedly, the main conclusions, such as "VWM code in several visual regions did not generalize well between different time points" and "VWM and feature-matching sensory distractors are encoded in separable coding spaces", are somewhat subjective given that cross-condition generalization analyses consistently showed above chance-level performance. These results could be interpreted as evidence of stable coding. The authors should use more objective descriptions, such as 'temporal generalization decoding showed reduced decoding accuracy in off-diagonals compared to on-diagonals'.
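The principal angles discussed in point (2) have a concrete geometric meaning that is easy to check numerically. The sketch below (our own toy construction, not the paper's data) builds two 2-D coding subspaces in a 3-D "voxel" space that share one axis and differ by a 7-degree rotation on the other, the magnitude the review describes:

```python
import numpy as np
from scipy.linalg import subspace_angles

theta = np.deg2rad(7)
# Columns are basis vectors of each subspace.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[1.0, 0.0],
              [0.0, np.cos(theta)],
              [0.0, np.sin(theta)]])
angles = np.rad2deg(subspace_angles(A, B))  # principal angles, descending
# angles ≈ [7.0, 0.0]: the planes share one direction exactly and
# differ by only 7 degrees on the other, i.e., they largely overlap.
```

A principal angle of 0 degrees means identical subspaces and 90 degrees means orthogonal ones, which is why angles of 7–8 degrees can reasonably be read as largely overlapping coding spaces.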

  • https://doi.org/10.7554/eLife.99290.1.sa0



Real-Time Dense Visual SLAM with Neural Factor Representation


1. Introduction

  • We propose a novel scene representation based on neural factors, demonstrating higher-quality scene reconstruction and a more compact model memory footprint in large-scale scenes.
  • To address the poor real-time performance caused by the high MLP query cost of previous neural implicit SLAM methods, we introduce an efficient rendering approach based on feature integration. This improves real-time performance without relying on a customized CUDA framework.
  • We conducted extensive experiments on both synthetic and real-world datasets to validate our design choices, achieving competitive performance against baselines in terms of 3D reconstruction, camera localization, runtime, and memory usage.
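The feature-integration idea in the second bullet can be sketched as follows. This is a schematic NumPy sketch under our own assumptions about the rendering equation, not the authors' implementation; `mlp` stands in for the decoder network:

```python
import numpy as np

def render_ray(feats, sigmas, deltas, mlp):
    """Feature-integration rendering (sketch): instead of querying the MLP
    at every sample along a ray (N queries), alpha-composite the per-sample
    feature vectors first and decode the integrated feature with a single
    MLP query per ray. feats: (N, D); sigmas, deltas: (N,)."""
    alphas = 1.0 - np.exp(-sigmas * deltas)                 # per-sample opacity
    trans = np.cumprod(np.concatenate(([1.0], 1.0 - alphas[:-1])))
    weights = trans * alphas                                # volume-rendering weights
    integrated = weights @ feats                            # (D,) one feature per ray
    return mlp(integrated)                                  # single decoder query

# Sanity check: with an opaque first sample, the ray's feature is that
# sample's feature (later samples receive zero weight).
feats = np.eye(3)                                          # three distinct features
out = render_ray(feats, np.array([1e9, 1.0, 1.0]), np.ones(3), lambda f: f)
```

The cost saving comes from replacing N MLP evaluations per ray with one, while the cheap weighted sum over features preserves the volume-rendering structure.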

2. Related Work

  • 2.1. Traditional Dense Visual SLAM
  • 2.2. Neural Implicit Representations
  • 2.3. Neural Implicit Dense Visual SLAM
  • 3.1. Neural Factors Representation
  • 3.2. Feature Integration Rendering
  • 3.3. Training (3.3.1. Loss Functions; 3.3.2. Tracking; 3.3.3. Mapping)
  • 4. Experiments
  • 4.1. Experimental Setup (4.1.1. Datasets; 4.1.2. Baselines; 4.1.3. Evaluation Metrics; 4.1.4. Hyperparameters; 4.1.5. Post-Processing)
  • 4.2. Reconstruction Evaluation
  • 4.3. Camera Localization Evaluation
  • 4.4. Runtime and Memory Usage Analysis
  • 4.5. Ablation Study
  • 5. Conclusions
  • Author Contributions
  • Data Availability Statement
  • Acknowledgments
  • Conflicts of Interest

  • Mur-Artal, R.; Montiel, J.M.M.; Tardos, J.D. ORB-SLAM: A versatile and accurate monocular SLAM system. IEEE Trans. Robot. 2015, 31, 1147–1163.
  • Mur-Artal, R.; Tardós, J.D. ORB-SLAM2: An open-source SLAM system for monocular, stereo, and RGB-D cameras. IEEE Trans. Robot. 2017, 33, 1255–1262.
  • Campos, C.; Elvira, R.; Rodríguez, J.J.G.; Montiel, J.M.; Tardós, J.D. ORB-SLAM3: An accurate open-source library for visual, visual–inertial, and multimap SLAM. IEEE Trans. Robot. 2021, 37, 1874–1890.
  • Mildenhall, B.; Srinivasan, P.P.; Tancik, M.; Barron, J.T.; Ramamoorthi, R.; Ng, R. NeRF: Representing scenes as neural radiance fields for view synthesis. Commun. ACM 2021, 65, 99–106.
  • Sucar, E.; Liu, S.; Ortiz, J.; Davison, A.J. iMAP: Implicit mapping and positioning in real-time. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 6229–6238.
  • Zhu, Z.; Peng, S.; Larsson, V.; Xu, W.; Bao, H.; Cui, Z.; Oswald, M.R.; Pollefeys, M. NICE-SLAM: Neural implicit scalable encoding for SLAM. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 12786–12796.
  • Johari, M.M.; Carta, C.; Fleuret, F. ESLAM: Efficient dense SLAM system based on hybrid representation of signed distance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 17408–17419.
  • Wang, H.; Wang, J.; Agapito, L. Co-SLAM: Joint coordinate and sparse parametric encodings for neural real-time SLAM. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 13293–13302.
  • Chen, A.; Xu, Z.; Geiger, A.; Yu, J.; Su, H. TensoRF: Tensorial radiance fields. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 333–350.
  • Chen, A.; Xu, Z.; Wei, X.; Tang, S.; Su, H.; Geiger, A. Factor fields: A unified framework for neural fields and beyond. arXiv 2023, arXiv:2302.01226.
  • Han, K.; Xiang, W.; Yu, L. Volume feature rendering for fast neural radiance field reconstruction. arXiv 2023, arXiv:2305.17916.
  • Klein, G.; Murray, D. Parallel tracking and mapping for small AR workspaces. In Proceedings of the 2007 6th IEEE and ACM International Symposium on Mixed and Augmented Reality, Nara, Japan, 13–16 November 2007; IEEE: Piscataway, NJ, USA, 2007; pp. 225–234.
  • Newcombe, R.A.; Lovegrove, S.J.; Davison, A.J. DTAM: Dense tracking and mapping in real-time. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; IEEE: Piscataway, NJ, USA, 2011; pp. 2320–2327.
  • Newcombe, R.A.; Izadi, S.; Hilliges, O.; Molyneaux, D.; Kim, D.; Davison, A.J.; Kohi, P.; Shotton, J.; Hodges, S.; Fitzgibbon, A. KinectFusion: Real-time dense surface mapping and tracking. In Proceedings of the 2011 10th IEEE International Symposium on Mixed and Augmented Reality, Basel, Switzerland, 26–29 October 2011; IEEE: Piscataway, NJ, USA, 2011; pp. 127–136.
  • Schops, T.; Sattler, T.; Pollefeys, M. BAD SLAM: Bundle adjusted direct RGB-D SLAM. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 134–144.
  • Bloesch, M.; Czarnowski, J.; Clark, R.; Leutenegger, S.; Davison, A.J. CodeSLAM—Learning a compact, optimisable representation for dense visual SLAM. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 2560–2568.
  • Teed, Z.; Deng, J. DROID-SLAM: Deep visual SLAM for monocular, stereo, and RGB-D cameras. Adv. Neural Inf. Process. Syst. 2021, 34, 16558–16569.
  • Sucar, E.; Wada, K.; Davison, A. NodeSLAM: Neural object descriptors for multi-view shape reconstruction. In Proceedings of the 2020 International Conference on 3D Vision (3DV), Fukuoka, Japan, 25–28 November 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 949–958.
  • Li, R.; Wang, S.; Gu, D. DeepSLAM: A robust monocular SLAM system with unsupervised deep learning. IEEE Trans. Ind. Electron. 2020, 68, 3577–3587.
  • Takikawa, T.; Litalien, J.; Yin, K.; Kreis, K.; Loop, C.; Nowrouzezahrai, D.; Jacobson, A.; McGuire, M.; Fidler, S. Neural geometric level of detail: Real-time rendering with implicit 3D shapes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 11358–11367.
  • Yu, A.; Li, R.; Tancik, M.; Li, H.; Ng, R.; Kanazawa, A. PlenOctrees for real-time rendering of neural radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 5752–5761.
  • Sun, C.; Sun, M.; Chen, H.T. Direct voxel grid optimization: Super-fast convergence for radiance fields reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 5459–5469.
  • Li, H.; Yang, X.; Zhai, H.; Liu, Y.; Bao, H.; Zhang, G. Vox-Surf: Voxel-based implicit surface representation. IEEE Trans. Vis. Comput. Graph. 2022, 30, 1743–1755.
  • Chan, E.R.; Lin, C.Z.; Chan, M.A.; Nagano, K.; Pan, B.; De Mello, S.; Gallo, O.; Guibas, L.J.; Tremblay, J.; Khamis, S.; et al. Efficient geometry-aware 3D generative adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 16123–16133.
  • Müller, T.; Evans, A.; Schied, C.; Keller, A. Instant neural graphics primitives with a multiresolution hash encoding. ACM Trans. Graph. 2022, 41, 1–15.
  • Yang, X.; Li, H.; Zhai, H.; Ming, Y.; Liu, Y.; Zhang, G. Vox-Fusion: Dense tracking and mapping with voxel-based neural implicit representation. In Proceedings of the 2022 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), Singapore, 17–21 October 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 499–507.
  • Sandström, E.; Li, Y.; Van Gool, L.; Oswald, M.R. Point-SLAM: Dense neural point cloud-based SLAM. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–3 October 2023; pp. 18433–18444.
  • Straub, J.; Whelan, T.; Ma, L.; Chen, Y.; Wijmans, E.; Green, S.; Engel, J.J.; Mur-Artal, R.; Ren, C.; Verma, S.; et al. The Replica dataset: A digital replica of indoor spaces. arXiv 2019, arXiv:1906.05797.
  • Dai, A.; Chang, A.X.; Savva, M.; Halber, M.; Funkhouser, T.; Nießner, M. ScanNet: Richly-annotated 3D reconstructions of indoor scenes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 5828–5839.
  • Sturm, J.; Engelhard, N.; Endres, F.; Burgard, W.; Cremers, D. A benchmark for the evaluation of RGB-D SLAM systems. In Proceedings of the 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, Vilamoura-Algarve, Portugal, 7–12 October 2012; IEEE: Piscataway, NJ, USA, 2012; pp. 573–580.
  • Dai, A.; Nießner, M.; Zollhöfer, M.; Izadi, S.; Theobalt, C. BundleFusion: Real-time globally consistent 3D reconstruction using on-the-fly surface reintegration. ACM Trans. Graph. 2017, 36, 1.
  • Lorensen, W.E.; Cline, H.E. Marching cubes: A high resolution 3D surface construction algorithm. In Seminal Graphics: Pioneering Efforts that Shaped the Field; Association for Computing Machinery: New York, NY, USA, 1998; pp. 347–353.
  • Cignoni, P.; Callieri, M.; Corsini, M.; Dellepiane, M.; Ganovelli, F.; Ranzuglia, G. MeshLab: An open-source mesh processing tool. In Proceedings of the Eurographics Italian Chapter Conference, Salerno, Italy, 2–4 July 2008; Volume 2008, pp. 129–136.


Reconstruction quality on the Replica scenes (Acc., Comp., and Depth L1 in cm; – = cell missing in the source):

Method | Metric | room0 | room1 | room2 | office0 | office1 | office2 | office3 | office4 | Avg.
iMAP* [ ] | Acc. ↓ | 5.75 | 5.44 | 6.32 | 7.58 | 10.25 | 8.91 | 6.89 | 5.34 | 7.06
iMAP* [ ] | Comp. ↓ | 5.96 | 5.38 | 5.21 | 5.16 | 5.49 | 6.04 | 5.75 | 6.57 | 5.69
iMAP* [ ] | Comp. Ratio (%) ↑ | 67.13 | 68.91 | 71.69 | 70.14 | 73.47 | 63.94 | 67.68 | 62.30 | 68.16
iMAP* [ ] | Depth L1 ↓ | 5.55 | 5.47 | 6.93 | 7.63 | 8.13 | 10.61 | 9.66 | 8.44 | 7.8
NICE-SLAM [ ] | Acc. ↓ | 1.47 | 1.21 | 1.66 | 1.28 | 1.02 | 1.54 | 1.95 | 1.60 | 1.47
NICE-SLAM [ ] | Comp. ↓ | 1.51 | 1.27 | 1.65 | 1.74 | 1.04 | 1.62 | 2.57 | 1.67 | 1.63
NICE-SLAM [ ] | Comp. Ratio (%) ↑ | 98.16 | 98.71 | 96.42 | 95.98 | 98.83 | 97.19 | 92.05 | 97.34 | 96.83
NICE-SLAM [ ] | Depth L1 ↓ | 3.38 | 3.03 | 3.76 | 2.62 | 2.31 | 4.12 | 8.19 | 2.73 | 3.77
Vox-Fusion* [ ] | Acc. ↓ | 1.08 | 1.05 | 1.21 | 1.41 | 0.82 | 1.31 | 1.34 | 1.31 | 1.19
Vox-Fusion* [ ] | Comp. ↓ | 1.07 | 1.97 | 1.62 | 1.58 | 0.84 | 1.37 | 1.37 | 1.44 | 1.41
Vox-Fusion* [ ] | Comp. Ratio (%) ↑ | 99.46 | 94.76 | 96.37 | 95.80 | 99.12 | 98.20 | 97.55 | 97.32 | 97.32
Vox-Fusion* [ ] | Depth L1 ↓ | 1.62 | 10.43 | 3.06 | 4.12 | 2.05 | 2.85 | 3.11 | 4.22 | 3.93
Co-SLAM [ ] | Acc. ↓ | 1.11 | 1.14 | 1.17 | 0.99 | 0.76 | 1.36 | 1.44 | 1.24 | 1.15
Co-SLAM [ ] | Comp. ↓ | 1.06 | 1.37 | 1.14 | 0.92 | 0.78 | 1.33 | 1.36 | 1.16 | 1.14
Co-SLAM [ ] | Comp. Ratio (%) ↑ | 99.62 | 97.49 | 97.86 | 99.07 | 99.25 | 98.81 | 98.48 | 98.96 | 98.69
Co-SLAM [ ] | Depth L1 ↓ | 1.54 | 6.41 | 3.05 | 1.66 | 1.68 | 2.71 | 2.55 | 1.82 | 2.68
ESLAM [ ] | Acc. ↓ | 1.07 | 0.85 | 0.93 | 0.85 | 0.83 | 1.02 | 1.21 | – | 0.97
ESLAM [ ] | Comp. ↓ | 1.12 | 0.88 | 1.05 | 0.96 | 0.81 | 1.09 | 1.42 | – | 1.05
ESLAM [ ] | Comp. Ratio (%) ↑ | 99.06 | 99.64 | 98.84 | 98.34 | 98.85 | 98.60 | 96.80 | 98.60 | 98.60
ESLAM [ ] | Depth L1 ↓ | 0.97 | 1.07 | 1.28 | 0.86 | 1.26 | 1.71 | 1.43 | 1.18 | 1.18
Ours | Acc. ↓ | – | – | – | – | – | – | – | – | 1.0
Ours | Comp. ↓ | – | – | – | – | – | – | – | – | 1.07
Ours | Comp. Ratio (%) ↑ | – | – | – | – | – | – | – | – | –
Ours | Depth L1 ↓ | – | – | – | – | – | – | – | – | –
Camera tracking ATE RMSE (cm) on the Replica scenes (– = cell missing in the source):

Method | room0 | room1 | room2 | office0 | office1 | office2 | office3 | office4 | Avg.
iMAP* [ ] | 3.88 | 3.01 | 2.43 | 2.67 | 1.07 | 4.68 | 4.83 | 2.48 | 3.13
NICE-SLAM [ ] | 1.76 | 1.97 | 2.2 | 1.44 | 0.92 | 1.43 | 2.56 | 1.55 | 1.73
VoxFusion* [ ] | 0.73 | 1.1 | 1.1 | 7.4 | 1.26 | 1.87 | 0.93 | 1.49 | 1.98
CoSLAM [ ] | 0.82 | 2.03 | 1.34 | 0.6 | 0.65 | 2.02 | 1.37 | 0.88 | 1.21
ESLAM [ ] | 0.71 | 0.7 | 0.52 | 0.57 | 0.55 | 0.58 | 0.72 | 0.63 | 0.63
Point-SLAM [ ] | 0.61 | 0.41 | 0.37 | 0.38 | 0.48 | 0.54 | 0.69 | 0.72 | 0.52
Ours | – | – | – | – | – | – | – | – | 0.35
ATE RMSE (cm) on ScanNet scenes (– = cell missing in the source):

Method | 0000 | 0059 | 0106 | 0169 | 0181 | 0207 | Avg.
iMAP* [ ] | 32.2 | 17.3 | 12.0 | 17.4 | 27.9 | 12.7 | 19.42
NICE-SLAM [ ] | 13.3 | 12.8 | 7.8 | 13.2 | 13.9 | 6.2 | 11.2
VoxFusion* [ ] | 11.6 | 26.3 | 9.1 | 32.3 | 22.1 | 7.4 | 18.13
CoSLAM [ ] | 7.9 | 12.6 | 9.5 | 6.6 | 12.9 | 7.1 | 9.43
ESLAM [ ] | 7.3 | 8.5 | 7.5 | 6.5 | – | 5.7 | 7.4
Point-SLAM [ ] | 10.24 | 7.81 | 8.65 | 22.16 | 14.77 | 9.54 | 12.19
Ours | – | – | – | – | – | – | 10.3
ATE RMSE (cm) on TUM RGB-D sequences (– = cell missing in the source):

Method | fr1/desk | fr2/xyz | fr3/office | Avg.
iMAP* [ ] | 5.9 | 2.2 | 7.6 | 5.23
NICE-SLAM [ ] | 2.72 | 31 | 15.2 | 16.31
VoxFusion* [ ] | 3.2 | 1.6 | 25.4 | 10.06
CoSLAM [ ] | 2.88 | 1.85 | 2.91 | 2.55
ESLAM [ ] | 2.47 | – | – | 2.00
Point-SLAM [ ] | 4.34 | 1.31 | 3.48 | 3.04
Ours | – | 1.32 | – | 2.48
Runtime and memory usage (– = cell missing in the source):

Method | Tracking Time (ms/it) ↓ | Mapping Time (ms/it) ↓ | FPS (Hz) ↑ | Param. (MB) ↓
iMAP* [ ] | 34.63 | 20.15 | 0.18 | –
NICE-SLAM [ ] | 7.48 | 30.59 | 0.71 | 11.56
Vox-Fusion* [ ] | 11.2 | 52.7 | 1.67 | 1.19
Co-SLAM [ ] | – | 14.33 | – | 0.26
ESLAM [ ] | 7.11 | 20.32 | 5.6 | 6.79
Point-SLAM [ ] | 12.23 | 35.21 | 0.29 | 27.23
Ours | 6.47 | – | 9.81 | 0.15
Ablation study, reconstruction quality on the Replica scenes (– = cell missing in the source):

Method | Metric | room0 | room1 | room2 | office0 | office1 | office2 | office3 | office4 | Avg.
w/o separate factor grids | Acc. ↓ | 1 | 0.82 | 0.86 | 0.76 | 0.66 | 0.98 | 1.13 | 1.08 | 0.91
w/o separate factor grids | Comp. ↓ | 1.04 | 0.85 | 0.96 | 0.8 | 0.72 | 1 | 1.17 | 1.12 | 0.94
w/o separate factor grids | Comp. Ratio (%) ↑ | 99.47 | 99.62 | 99.14 | 99.54 | 99.31 | 99.35 | 98.90 | 99.09 | 99.3
w/o separate factor grids | Depth L1 ↓ | 0.92 | 1.13 | 1.22 | 0.75 | 1.20 | 1.17 | 1.53 | 1.31 | 1.22
w/o multi-level basis grids | Acc. ↓ | 1.11 | 0.97 | 1.03 | 1.05 | 1.06 | 1.0 | 1.18 | 1.12 | 1.07
w/o multi-level basis grids | Comp. ↓ | 1.04 | 1 | 1.24 | 0.93 | 0.98 | 0.98 | 1.19 | 1.16 | 1.06
w/o multi-level basis grids | Comp. Ratio (%) ↑ | 99.63 | 99.49 | 97.46 | 99.58 | 98.62 | 99.52 | 98.99 | 98.92 | 99.03
w/o multi-level basis grids | Depth L1 ↓ | 0.91 | 1.65 | 2.75 | 1.12 | 1.76 | 1.66 | 1.67 | 1.29 | 1.60
w/o feature integration rendering | Acc. ↓ | 0.98 | 0.79 | 0.86 | 0.79 | 0.73 | 0.93 | 1.08 | 1.03 | 0.90
w/o feature integration rendering | Comp. ↓ | 1.01 | 0.81 | 0.98 | 0.82 | 0.71 | 0.96 | 1.14 | 1.09 | 0.94
w/o feature integration rendering | Comp. Ratio (%) ↑ | 99.51 | 99.73 | 98.94 | 99.55 | 99.32 | 99.41 | 98.84 | 99.28 | 99.32
w/o feature integration rendering | Depth L1 ↓ | 0.82 | 0.89 | 1.22 | 0.7 | 1.01 | 1.57 | 1.23 | 0.77 | 1.03
Ours (complete model) | Acc. ↓ | – | – | – | – | – | – | – | – | –
Ours (complete model) | Comp. ↓ | – | – | – | – | – | – | – | – | –
Ours (complete model) | Comp. Ratio (%) ↑ | – | – | – | – | – | – | – | – | –
Ours (complete model) | Depth L1 ↓ | – | – | – | – | – | – | – | – | –
Ablation study, ATE RMSE (cm) on the Replica scenes (– = cell missing in the source):

Method | room0 | room1 | room2 | office0 | office1 | office2 | office3 | office4 | Avg.
w/o separate factor grids | 0.62 | 0.92 | 0.59 | 0.41 | 0.54 | 0.56 | 0.56 | 0.57 | 0.59
w/o multi-level basis grids | 0.63 | 1.67 | 0.67 | 0.92 | 1.48 | 0.56 | 0.60 | 0.86 | 0.92
w/o feature integration rendering | 0.54 | 0.6 | 0.42 | 0.35 | 0.24 | – | 0.47 | 0.36 | 0.42
Ours (complete model) | – | – | – | – | – | 0.43 | – | – | –
Ablation study, ATE RMSE (cm) on ScanNet scenes (– = cell missing in the source):

Method | 0000 | 0059 | 0106 | 0169 | 0181 | 0207 | Avg.
w/o separate factor grids | 7.4 | 8.5 | 8.1 | 6.2 | 10.8 | 4.4 | 7.57
w/o multi-level basis grids | 7.3 | 9.9 | 8.0 | 6.0 | 11.9 | 5.9 | 8.17
w/o feature integration rendering | 7.3 | 8.5 | 7.6 | 6.4 | 10.8 | 4.5 | 7.51
Ours (complete model) | – | – | – | – | – | – | –

Wei, W.; Wang, J.; Xie, X.; Liu, J.; Su, P. Real-Time Dense Visual SLAM with Neural Factor Representation. Electronics 2024, 13, 3332. https://doi.org/10.3390/electronics13163332


Generation (not production) improves the fidelity of visual representations in picture naming

  • Brief Report
  • Published: 26 August 2024


  • Jedidiah W. Whitridge (ORCID: 0000-0003-1237-4977)
  • Chris A. Clark
  • Kathleen L. Hourihan
  • Jonathan M. Fawcett

The production effect refers to the finding that participants better remember items read aloud than items read silently. This pattern has been attributed to aloud items being relatively more distinctive in memory than silent items, owing to the integration of additional sensorimotor features within the encoding episode that are thought to facilitate performance at test. Other theorists have instead argued that producing an item encourages additional forms of processing not limited to production itself. We tested this hypothesis using a modified production task where participants named monochromatic line drawings aloud or silently either by generating the names themselves (no label condition) or reading a provided label (label condition). During a later test, participants were presented with each line drawing a second time and required to reproduce the original color and location using a continuous slider. Production was found to improve memory for visual features, but only when participants were required to generate the label themselves. Our findings support the notion that picture naming improves memory for visual features; however, this benefit appears to be driven by factors related to response generation rather than production itself.


Data availability

Data and analysis scripts are available at https://doi.org/10.5281/zenodo.13270669 . Pictorial stimuli used in the study are available at http://timbrady.org/resources.html .

Code availability

Data and analysis scripts are available at https://doi.org/10.5281/zenodo.13270669.

Although between-subjects production benefits for overall recall appear unreliable (e.g., Jones & Pyc, 2014 ), recent research suggests that production in between-subjects recall paradigms interacts with serial position such that a positive production effect emerges for items near the end of a list, while a reverse production effect (i.e., silent > aloud) occurs for early items (e.g., Gionet et al., 2022 ; Saint-Aubin et al., 2021 ; for a meta-analysis, see Fawcett et al., 2023 ).

Our initial design was conceived as three experiments, with the first excluding the verbal label, the second providing the verbal label concurrent to the object, and the third providing the verbal label preceding the object. However, due to challenges with data acquisition during the recent pandemic, our sample was smaller for the third experiment than anticipated; because no differences were observed between the second and third experiments, they have been combined here to maximize statistical power for the critical comparison between the label and non-label groups. While a balanced design is necessitated by some conventional approaches (e.g., ANOVA; see, e.g., Shaw & Mitchell-Olds, 1993), multilevel models are robust to substantial inequalities across cells (Clarke, 2008 ). Thus, we affirm that the unequal number of participants in each condition had little bearing on the analyses reported herein.

A further 18, 34, and nine participants completed the experiment in the no-label, label presented concurrent, and label presented preceding conditions, respectively, but were excluded for failing attentional checks (i.e., misreporting what they were meant to do for each instruction image), reporting off-task behavior (e.g., watching a movie), or responding to either judgment on average in less than ~ 1 s (suggesting barely enough time to interact with the continuous judgment on most trials). Whilst high for an in-person task, these exclusion rates and justification are typical of online studies as detailed in a recent review of the area (Thomas & Clifford, 2017 ).

A comparable Frequentist ANOVA applied to the absolute angular distance between the target and selected feature revealed the same pattern, with a main effect of production, F(1,206) = 15.67, MSe = 41.77, p < .001, and label, F(1,206) = 13.66, MSe = 577.29, p < .001, but also a significant interaction, F(1,206) = 19.40, MSe = 577.29, p < .001, such that a production effect was observed for the no-label group, t(59) = 5.97, p < .001, but not the label group, t(147) = 0.97, p = .336. Inclusion of the dependent measure (color, location) as an additional factor produced the same results, with neither the main effect of measure, F(1,206) = 0.01, MSe = 244.62, p = .962, nor any of the interactions involving measure, all Fs < 1 and all ps > .40, reaching significance.

Using a paradigm like that of the present study, Overkott and Souza (2022) found that naming unlabeled pictures did not benefit long-term memory for stimulus color. However, participants in Overkott and Souza completed tests of short-term memory every three trials, and their evaluation of long-term memory comprised 288 trials, more than three times that of the present study. Given these differences, and the fact that long-term memory performance was near floor for participants in Overkott and Souza, we do not believe their results are directly comparable to those of the present study.

More recent work by Overkott et al. (2023) demonstrated a mnemonic benefit for color memory in an alternative labeling paradigm, wherein participants labeled the color of an object that was common across trials. Although this paradigm bears similarities to our own, a key feature of the production effect is that the benefit arises only when the productive act is item-specific, distinguishing production from labeling (see, e.g., MacLeod, 2011; MacLeod et al., 2010; Richler et al., 2013). Moreover, Overkott et al. (2023) assessed memory every three trials, which differs substantially from our assessment of participants’ long-term memory for a total of 80 items. Given these critical methodological differences, it seems unlikely that the developmental theoretical framework proposed by Overkott et al. (2023) has direct implications for our findings.

Richler et al. (2013, Experiment 2) found that, relative to silent reading, producing the names of unlabeled pictures yielded a substantial numerical advantage in forced-choice discrimination between targets and similar exemplars. However, this trend was not subjected to statistical analysis.

For the silent no-label condition, the mixture model revealed mnemonic advantages in the probability of remembering features of M = 12.1%, 95% CI [0.03, 0.21], and M = 12.6%, 95% CI [0.03, 0.22], relative to the silent and aloud label conditions, respectively.
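These probability estimates come from a mixture model in the tradition of Zhang and Luck (2008): response errors are modeled as a mixture of a von Mises distribution centered on the target (memory, with fidelity given by its concentration) and a uniform guessing distribution. The sketch below illustrates the model's density; the parameter values are illustrative, not estimates from the study.

```python
# Sketch of the mixture-model density underlying the probability and
# fidelity estimates above (illustrative parameters, not fitted values).
import numpy as np
from scipy.stats import vonmises

def mixture_density(error_rad: float, p_mem: float, kappa: float) -> float:
    """Density of a response error (radians) under the mixture model:
    with probability p_mem the item is remembered (von Mises error,
    concentration kappa); otherwise the response is a uniform guess."""
    return p_mem * vonmises.pdf(error_rad, kappa) + (1 - p_mem) / (2 * np.pi)

# With p_mem = 0 the model reduces to pure guessing (uniform density):
print(mixture_density(0.0, 0.0, 5.0))  # 1 / (2*pi), approximately 0.159
```

The "probability of remembering" reported above corresponds to a difference in fitted p_mem between conditions, while fidelity corresponds to the concentration parameter kappa.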

Bailey, L. M., Bodner, G. E., Matheson, H. E., Stewart, B. M., Roddick, K., O’Neil, K., Simmons, M., Lambert, A. M., Krigolson, O. E., Newman, A. J., & Fawcett, J. M. (2021). Neural correlates of the production effect: An fMRI study. Brain and Cognition, 152, 105757. https://doi.org/10.1016/j.bandc.2021.105757

Bellezza, F. S. (1981). Mnemonic devices: Classification, characteristics, and criteria. Review of Educational Research, 51(2), 247–275. https://doi.org/10.3102/00346543051002247

Bodner, G. E., Huff, M. J., & Taikh, A. (2020). Pure-list production improves item recognition and sometimes also improves source memory. Memory & Cognition, 48, 1281–1294. https://doi.org/10.3758/s13421-020-01044-2

Boucart, M., Meyer, M. E., Pins, D., Humphreys, G. W., Scheiber, C., Gounod, D., & Foucher, J. (2000). Automatic object identification: An fMRI study. NeuroReport, 11(11), 2379–2383. https://doi.org/10.1097/00001756-200008030-00009

Brady, T. F., Konkle, T., Alvarez, G. A., & Oliva, A. (2008). Visual long-term memory has a massive storage capacity for object details. Proceedings of the National Academy of Sciences of the United States of America, 105(38), 14325–14329. https://doi.org/10.1073/pnas.0803390105

Brady, T. F., Konkle, T., Gill, J., Oliva, A., & Alvarez, G. A. (2013). Visual long-term memory has the same limit on fidelity as visual working memory. Psychological Science, 24(6), 981–990. https://doi.org/10.1177/0956797612465439

Bürkner, P. (2017). brms: An R package for Bayesian multilevel models using Stan. Journal of Statistical Software, 80, 1–28. https://doi.org/10.18637/jss.v080.i01

Clarke, P. (2008). When can group level clustering be ignored? Multilevel models versus single-level models with sparse data. Journal of Epidemiology and Community Health, 62(8), 752–761. https://doi.org/10.1136/jech.2007.060798

Conway, M. A., & Gathercole, S. E. (1987). Modality and long-term memory. Journal of Memory and Language, 26(3), 341–361. https://doi.org/10.1016/0749-596X(87)90118-5

de Leeuw, J. R., Gilbert, R. A., & Luchterhandt, B. (2023). jsPsych: Enabling an open-source collaborative ecosystem of behavioral experiments. Journal of Open Source Software, 8(85), 5351. https://doi.org/10.21105/joss.05351

Dell’acqua, R., & Job, R. (1998). Is object recognition automatic? Psychonomic Bulletin & Review, 5, 496–503. https://doi.org/10.3758/BF03208828

Ekstrand, B. R., Wallace, W. P., & Underwood, B. J. (1966). Frequency theory of verbal-discrimination learning. Psychological Review, 73(6), 566–578. https://doi.org/10.1037/h0023876

Fawcett, J. M. (2013). The production effect benefits performance in between-subject designs: A meta-analysis. Acta Psychologica, 142, 1–5. https://doi.org/10.1016/j.actpsy.2012.10.001

Fawcett, J. M., Baldwin, M. M., Whitridge, J. W., Swab, M., Malayang, K., Hiscock, B., Drakes, D. H., & Willoughby, H. V. (2023). Production improves recognition and reduces intrusions in between-subject designs: An updated meta-analysis. Canadian Journal of Experimental Psychology, 77, 35–44. https://doi.org/10.1037/cep0000302

Fawcett, J. M., Bodner, G. E., Paulewicz, B., Rose, J., & Wakeham-Lewis, R. (2022). Production can enhance semantic encoding: Evidence from forced-choice recognition with homophone versus synonym lures. Psychonomic Bulletin & Review, 29(6), 2256–2263. https://doi.org/10.3758/s13423-022-02140-x

Fawcett, J. M., Lawrence, M. A., & Taylor, T. L. (2016). The representational consequences of intentional forgetting: Impairments to both the probability and fidelity of long-term memory. Journal of Experimental Psychology: General, 145, 56–81. https://doi.org/10.1037/xge0000128

Fawcett, J. M., & Ozubko, J. D. (2016). Familiarity, but not recollection, supports the between-subject production effect in recognition memory. Canadian Journal of Experimental Psychology, 70(2), 99–115. https://doi.org/10.1037/cep0000089

Fawcett, J. M., Quinlan, C. K., & Taylor, T. L. (2012). Interplay of the production and picture superiority effects: A signal detection analysis. Memory, 20(7), 655–666. https://doi.org/10.1080/09658211.2012.693510

Forrin, N. D., MacLeod, C. M., & Ozubko, J. D. (2012). Widening the boundaries of the production effect. Memory & Cognition, 40, 1046–1055. https://doi.org/10.3758/s13421-012-0210-8

Gelman, A., & Hill, J. (2006). Data analysis using regression and multilevel/hierarchical models. Cambridge University Press.

Gionet, S., Guitard, D., & Saint-Aubin, J. (2022). The production effect interacts with serial positions: Further evidence from a between-subjects manipulation. Experimental Psychology, 69, 12–22. https://doi.org/10.1027/1618-3169/a000540

Hassall, C. D., Quinlan, C. K., Turk, D. J., Taylor, T. L., & Krigolson, O. E. (2016). A preliminary investigation into the neural basis of the production effect. Canadian Journal of Experimental Psychology, 70(2), 139–146. https://doi.org/10.1037/cep0000093

Hebb, D. O. (1949). The organization of behavior: A neuropsychological theory. Wiley.

Hopkins, R. H., & Edwards, R. E. (1972). Pronunciation effects in recognition memory. Journal of Verbal Learning and Verbal Behavior, 11(4), 534–537. https://doi.org/10.1016/S0022-5371(72)80036-7

Hourihan, K. L., & Churchill, L. A. (2020). Production of picture names improves picture recognition. Canadian Journal of Experimental Psychology, 74, 35–43. https://doi.org/10.1037/cep0000185

Jacoby, L. L. (1983). Remembering the data: Analyzing interactive processes in reading. Journal of Verbal Learning and Verbal Behavior, 22(5), 485–508. https://doi.org/10.1016/S0022-5371(83)90301-8

James, W. (1890). The principles of psychology (Vol. 1). Henry Holt and Co. https://doi.org/10.1037/10538-000

Johnson, C. J., Paivio, A., & Clark, J. M. (1996). Cognitive components of picture naming. Psychological Bulletin, 120, 113–139. https://doi.org/10.1037/0033-2909.120.1.113

Jones, A. C., & Pyc, M. A. (2014). The production effect: Costs and benefits in free recall. Journal of Experimental Psychology: Learning, Memory, and Cognition, 40, 300–305. https://doi.org/10.1037/a0033337

Jurica, P. J., & Shimamura, A. P. (1999). Monitoring item and source information: Evidence for a negative generation effect in source memory. Memory & Cognition, 27(4), 648–656. https://doi.org/10.3758/BF03211558

Lawrence, M. A. (2010). Estimating the probability and fidelity of memory. Behavior Research Methods, 42(4), 957–968. https://doi.org/10.3758/BRM.42.4.957

Lin, O. Y. H., & MacLeod, C. M. (2012). Aging and the production effect: A test of the distinctiveness account. Canadian Journal of Experimental Psychology, 66(3), 212–216. https://doi.org/10.1037/a0028309

MacLeod, C. M. (2011). I said, you said: The production effect gets personal. Psychonomic Bulletin & Review, 18(6), 1197–1202. https://doi.org/10.3758/s13423-011-0168-8

MacLeod, C. M., & Bodner, G. E. (2017). The production effect in memory. Current Directions in Psychological Science, 26(4), 390–395. https://doi.org/10.1177/0963721417691356

MacLeod, C. M., Gopie, N., Hourihan, K. L., Neary, K. R., & Ozubko, J. D. (2010). The production effect: Delineation of a phenomenon. Journal of Experimental Psychology: Learning, Memory, and Cognition, 36(3), 671–685. https://doi.org/10.1037/a0018785

MacLeod, C. M., Ozubko, J. D., Hourihan, K. L., & Major, J. C. (2022). The production effect is consistent over material variations: Support for the distinctiveness account. Memory, 30(8), 1000–1007. https://doi.org/10.1080/09658211.2022.2069270

Mama, Y., Fostick, L., & Icht, M. (2018). The impact of different background noises on the production effect. Acta Psychologica, 185, 235–242. https://doi.org/10.1016/j.actpsy.2018.03.002

Mama, Y., & Icht, M. (2016). Auditioning the distinctiveness account: Expanding the production effect to the auditory modality reveals the superiority of writing over vocalising. Memory, 24(1), 98–113. https://doi.org/10.1080/09658211.2014.986135

Mulligan, N. W. (2004). Generation and memory for contextual detail. Journal of Experimental Psychology: Learning, Memory, and Cognition, 30(4), 838–855. https://doi.org/10.1037/0278-7393.30.4.838

Mulligan, N. W. (2011). Generation disrupts memory for intrinsic context but not extrinsic context. The Quarterly Journal of Experimental Psychology, 64(8), 1543–1562. https://doi.org/10.1080/17470218.2011.562980

Mulligan, N. W., Lozito, J. P., & Rosner, Z. A. (2006). Generation and context memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 32(4), 836–846. https://doi.org/10.1037/0278-7393.32.4.836

Nieznański, M. (2011). Generation difficulty and memory for source. The Quarterly Journal of Experimental Psychology, 64(8), 1593–1608. https://doi.org/10.1080/17470218.2011.566620

Nieznański, M. (2012). Effects of generation on source memory: A test of the resource tradeoff versus processing hypothesis. Journal of Cognitive Psychology, 24(7), 765–780. https://doi.org/10.1080/20445911.2012.690555

Overkott, C., & Souza, A. S. (2022). Verbal descriptions improve visual working memory but have limited impact on visual long-term memory. Journal of Experimental Psychology: General, 151(2), 321–347. https://doi.org/10.1037/xge0001084

Overkott, C., Souza, A. S., & Morey, C. C. (2023). The developing impact of verbal labels on visual memories in children. Journal of Experimental Psychology: General, 152(3), 825–838. https://doi.org/10.1037/xge0001305

Ozubko, J. D., Gopie, N., & MacLeod, C. M. (2012). Production benefits both recollection and familiarity. Memory & Cognition, 40(3), 326–338. https://doi.org/10.3758/s13421-011-0165-1

Ozubko, J. D., & MacLeod, C. M. (2010). The production effect in memory: Evidence that distinctiveness underlies the benefit. Journal of Experimental Psychology: Learning, Memory, and Cognition, 36(6), 1543–1547. https://doi.org/10.1037/a0020604

Ozubko, J. D., Major, J., & MacLeod, C. M. (2014). Remembered study mode: Support for the distinctiveness account of the production effect. Memory, 22(5), 509–524. https://doi.org/10.1080/09658211.2013.800554

Park, D. C., & Mason, D. A. (1982). Is there evidence for automatic processing of spatial and color attributes present in pictures and words? Memory & Cognition, 10, 76–81. https://doi.org/10.3758/BF03197628

Quinlan, C. K., & Taylor, T. L. (2013). Enhancing the production effect in memory. Memory, 21(8), 904–915. https://doi.org/10.1080/09658211.2013.766754

Quinlan, C. K., & Taylor, T. L. (2019). Mechanisms underlying the production effect for singing. Canadian Journal of Experimental Psychology, 73(4), 254–264. https://doi.org/10.1037/cep0000179

R Core Team (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. Retrieved from https://www.R-project.org/. Accessed 1 Jan 2024.

Richler, J. J., Palmeri, T. J., & Gauthier, I. (2013). How does using object names influence visual recognition memory? Journal of Memory and Language, 68, 10–25. https://doi.org/10.1016/j.jml.2012.09.001

Saint-Aubin, J., Yearsley, J. M., Poirier, M., Cyr, V., & Guitard, D. (2021). A model of the production effect over the short-term: The cost of relative distinctiveness. Journal of Memory and Language, 118, 104219. https://doi.org/10.1016/j.jml.2021.104219

Thomas, K. A., & Clifford, S. (2017). Validity and Mechanical Turk: An assessment of exclusion methods and interactive experiments. Computers in Human Behavior, 77, 184–197. https://doi.org/10.1016/j.chb.2017.08.038

Varao Sousa, T. L., Carriere, J. S., & Smilek, D. (2013). The way we encounter reading material influences how frequently we mind wander. Frontiers in Psychology, 4, 892. https://doi.org/10.3389/fpsyg.2013.00892

Willoughby, H. V., Tiller, J., Hourihan, K. L., & Fawcett, J. M. (2019). The pupillometric production effect: Measuring attentional engagement during a production task [Paper presentation]. CSBBCS 2019 Meeting, Waterloo, Canada.

Zhang, B., Meng, Z., Li, Q., Chen, A., & Bodner, G. E. (2023). EEG-based univariate and multivariate analyses reveal that multiple processes contribute to the production effect in recognition. Cortex, 165, 57–69. https://doi.org/10.1016/j.cortex.2023.04.006

Zhang, W., & Luck, S. J. (2008). Discrete fixed-resolution representations in visual working memory. Nature, 453(7192), 233–235. https://doi.org/10.1038/nature06860

Zormpa, E., Brehm, L. E., Hoedemaker, R. S., & Meyer, A. S. (2019a). The production effect and the generation effect improve memory in picture naming. Memory, 27(3), 340–352. https://doi.org/10.1080/09658211.2018.1510966

Zormpa, E., Meyer, A. S., & Brehm, L. E. (2019b). Slow naming of pictures facilitates memory for their names. Psychonomic Bulletin & Review, 26, 1675–1682. https://doi.org/10.3758/s13423-019-01620-x

Jonathan M. Fawcett was supported by a grant from the Natural Sciences and Engineering Research Council of Canada (NSERC).

Author information

Authors and Affiliations

Department of Psychology, Memorial University of Newfoundland, St John’s, NL, Canada

Jedidiah W. Whitridge, Chris A. Clark, Kathleen L. Hourihan & Jonathan M. Fawcett

Psychology Department, Memorial University of Newfoundland, St John’s, NL, Canada

Jonathan M. Fawcett

Corresponding author

Correspondence to Jonathan M. Fawcett.

Ethics declarations

Ethics approval

The study was approved by the Interdisciplinary Committee on Ethics in Human Research (ICEHR) at Memorial University of Newfoundland and Labrador. All aspects of the study adhered to the ethical standards outlined in the 1964 Declaration of Helsinki.

Consent to participate

All participants in this study provided informed consent to participate.

Consent to publish

All participants in this study provided informed consent to have their data published.

Open practices statement

The experiment reported herein was not preregistered. The materials for the experiment are available at http://timbrady.org/resources.html. Data and analysis scripts are available at https://doi.org/10.5281/zenodo.13270669.

Conflicts of interest

The authors declare no competing financial or proprietary interests relevant to the content of the article.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 233 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Whitridge, J.W., Clark, C.A., Hourihan, K.L. et al. Generation (not production) improves the fidelity of visual representations in picture naming. Psychon Bull Rev (2024). https://doi.org/10.3758/s13423-024-02566-5

Accepted: 18 June 2024

Published: 26 August 2024

DOI: https://doi.org/10.3758/s13423-024-02566-5

Keywords

  • Distinctiveness
