Perspect Clin Res, v.14(1), Jan-Mar 2023, PMC10003579

Introduction to qualitative research methods – Part I

Shagufta Bhangu

Department of Global Health and Social Medicine, King's College London, London, United Kingdom

Fabien Provost

Carlo Caduff

Qualitative research methods are widely used in the social sciences and the humanities, but they can also complement quantitative approaches used in clinical research. In this article, we discuss the key features and contributions of qualitative research methods.

INTRODUCTION

Qualitative research methods refer to techniques of investigation that rely on nonstatistical and nonnumerical methods of data collection, analysis, and evidence production. Qualitative research techniques provide a lens for learning about nonquantifiable phenomena such as people's experiences, languages, histories, and cultures. In this article, we describe the strengths and role of qualitative research methods and how these can be employed in clinical research.

Although frequently employed in the social sciences and humanities, qualitative research methods can complement clinical research. These techniques can contribute to a better understanding of the social, cultural, political, and economic dimensions of health and illness. Social scientists and scholars in the humanities rely on a wide range of methods, including interviews, surveys, participant observation, focus groups, oral history, and archival research to examine both structural conditions and lived experience [ Figure 1 ]. Such research can not only provide robust and reliable data but can also humanize and add richness to our understanding of the ways in which people in different parts of the world perceive and experience illness and how they interact with medical institutions, systems, and therapeutics.

Figure 1: Examples of qualitative research techniques

Qualitative research methods should not be seen as tools that can be applied independently of theory. It is important for these tools to be based on more than just method. In their research, social scientists and scholars in the humanities emphasize social theory. Departing from a reductionist psychological model of individual behavior that often blames people for their illness, social theory focuses on relations – disease happens not simply in people but between people. This type of theoretically informed and empirically grounded research thus examines not just patients but interactions between a wide range of actors (e.g., patients, family members, friends, neighbors, local politicians, medical practitioners at all levels, and from many systems of medicine, researchers, policymakers) to give voice to the lived experiences, motivations, and constraints of all those who are touched by disease.

PHILOSOPHICAL FOUNDATIONS OF QUALITATIVE RESEARCH METHODS

In identifying the factors that contribute to the occurrence and persistence of a phenomenon, it is paramount that we begin by asking two questions: what do we know about this reality, and how have we come to know it? These two questions, which we can refer to as the “what” question and the “how” question, are ones that all scientists (natural and social) grapple with in their research. We refer to these as the ontological and epistemological questions a research study must address. Together, they help us create a suitable methodology for any research study[ 1 ] [ Figure 2 ]. Therefore, as with quantitative methods, qualitative methods must rest on a justifiable and logical method for understanding the world. By engaging with these two dimensions, the ontological and the epistemological, we open a path for learning that moves away from commonsensical understandings of the world and the perpetuation of stereotypes, and toward robust scientific knowledge production.

Figure 2: Developing a research methodology

Every discipline has a distinct research philosophy and way of viewing the world and conducting research. Philosophers and historians of science have extensively studied how these divisions and specializations have emerged over centuries.[ 1 , 2 , 3 ] The most important distinction between quantitative and qualitative research techniques lies in the nature of the data they study and analyze. While the former focus on statistical, numerical, and quantitative aspects of phenomena and employ the same in data collection and analysis, qualitative techniques focus on humanistic, descriptive, and qualitative aspects of phenomena.[ 4 ]

For the findings of any research study to be reliable, the study must employ research techniques that are tailored to the phenomena under investigation. To do so, researchers must choose techniques based on their specific research questions and understand the strengths and limitations of the different tools available to them. Since clinical work lies at the intersection of natural and social phenomena, it must study both: biological and physiological phenomena (natural, quantitative, and objective phenomena) and behavioral and cultural phenomena (social, qualitative, and subjective phenomena). Therefore, clinical researchers can gain from both sets of techniques in their efforts to produce medical knowledge and bring forth scientifically informed change.

KEY FEATURES AND CONTRIBUTIONS OF QUALITATIVE RESEARCH METHODS

In this section, we discuss the key features and contributions of qualitative research methods [ Figure 3 ]. We describe the specific strengths and limitations of these techniques and discuss how they can be deployed in scientific investigations.

Figure 3: Key features of qualitative research methods

One of the most important contributions of qualitative research methods is that they provide rigorous, theoretically sound, and rational techniques for the analysis of subjective, nebulous, and difficult-to-pin-down phenomena. We are aware, for example, of the role that social factors play in health care but find it hard to qualify and quantify these in our research studies. Often, we find researchers basing their arguments on “common sense,” developing research studies based on assumptions about the people that are studied. Such commonsensical assumptions are perhaps among the greatest impediments to knowledge production. For example, in trying to understand stigma, surveys often make assumptions about its reasons and frequently associate it with vague and general common sense notions of “fear” and “lack of information.” While these may be at work, making such assumptions based on commonsensical understandings, without conducting research, inhibits us from exploring the multiple social factors that are at work under the guise of stigma.

In unpacking commonsensical understandings and researching experiences, relationships, and other phenomena, qualitative researchers are assisted by their methodological commitment to open-ended research. By open-ended research, we mean that these techniques take an unbiased and exploratory approach in which learnings from the field and from research participants are recorded and analyzed to learn about the world.[ 5 ] This orientation is made possible by qualitative research techniques that are particularly effective in learning about specific social, cultural, economic, and political milieus.

Second, qualitative research methods equip us in studying complex phenomena. Qualitative research methods provide scientific tools for exploring and identifying the numerous contributing factors to an occurrence. Rather than establishing one or the other factor as more important, qualitative methods are open-ended, inductive (ground-up), and empirical. They allow us to understand the object of our analysis from multiple vantage points and in its dispersion and caution against predetermined notions of the object of inquiry. They encourage researchers instead to discover a reality that is not yet given, fixed, and predetermined by the methods that are used and the hypotheses that underlie the study.

Once the multiple factors at work in a phenomenon have been identified, we can employ quantitative techniques and embark on processes of measurement, establish patterns and regularities, and analyze the causal and correlated factors at work through statistical techniques. For example, a doctor may observe that there is a high patient drop-out in treatment. Before carrying out a study which relies on quantitative techniques, qualitative research methods such as conversation analysis, interviews, surveys, or even focus group discussions may prove more effective in learning about all the factors that are contributing to patient default. After identifying the multiple, intersecting factors, quantitative techniques can be deployed to measure each of these factors through techniques such as correlational or regression analyses. Here, the use of quantitative techniques without identifying the diverse factors influencing patient decisions would be premature. Qualitative techniques thus have a key role to play in investigations of complex realities and in conducting rich exploratory studies while embracing rigorous and philosophically grounded methodologies.
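The quantitative follow-up step described above can be illustrated with a minimal sketch of a simple regression. It assumes that earlier qualitative work has already pointed to travel time as one possible factor in missed appointments. The variable names and all numbers are hypothetical and chosen only for illustration; they are not data from the article.

```python
from statistics import mean

def least_squares(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Hypothetical data: travel time to the clinic (minutes) and
# number of missed appointments for six patients.
travel_minutes = [10, 20, 30, 40, 50, 60]
missed_visits = [0, 1, 1, 2, 3, 3]

slope, intercept = least_squares(travel_minutes, missed_visits)
print(f"Each extra minute of travel is associated with {slope:.3f} more missed visits")
```

A positive slope here would only describe an association in this made-up sample. Which factors are worth measuring in the first place is what the qualitative stage is meant to establish.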

Third, apart from subjective, nebulous, and complex phenomena, qualitative research techniques are also effective in making sense of irrational, illogical, and emotional phenomena. These play an important role in understanding logics at work among patients, their families, and societies. Qualitative research techniques are aided by their ability to shift focus away from the individual as a unit of analysis to the larger social, cultural, political, economic, and structural forces at work in health. Because health-care practitioners and researchers focus on biological, physiological, disease, and therapeutic processes, sociocultural, political, and economic conditions are often peripheral or ignored in day-to-day clinical work. However, it is within these latter processes that both health-care practices and patient lives are entrenched. Qualitative researchers are particularly adept at identifying the structural conditions such as the social, cultural, political, local, and economic conditions which contribute to health care and experiences of disease and illness.

For example, the decision to delay treatment by a patient may be understood as an irrational choice impacting the patient's chances of survival, but the same may be a result of the patient treating their child's education as a financial priority over their own health. While this appears as an “emotional” choice, qualitative researchers try to understand the social and cultural factors that structure, inform, and justify such choices. Rather than assuming that it is an irrational choice, qualitative researchers try to understand the norms and logical grounds on which the patient is making this decision. By foregrounding such logics, stories, fears, and desires, qualitative research expands our analytic precision in learning about complex social worlds, recognizing reasons for medical successes and failures, and interrogating our assumptions about human behavior. These in turn can prove useful in arriving at conclusive, actionable findings which can inform institutional and public health policies and have a very important role to play in any change and transformation we may wish to bring to the societies in which we work.

Financial support and sponsorship

Conflicts of interest

There are no conflicts of interest.

Qualitative Psychology: A Practical Guide to Research Methods

  • Jonathan A. Smith - Birkbeck University of London, UK

Undertaking qualitative research in psychology can seem like a daunting and complex process, especially when it comes to selecting the most appropriate approach for your project. This book provides a comprehensive and practical introduction to the key approaches in qualitative psychology research from a world-leading group of academics and researchers.

This Fourth Edition features timely updates that reflect the most current practice in the field.

Supplements

A step-by-step guide to understanding qualitative theories and methods!


introduction to qualitative research methods in psychology

Introduction to Qualitative Research Methods in Psychology

Dennis Howitt

Introduction to Qualitative Research Methods in Psychology Paperback – February 1, 2016

Product details

  • Publisher: Pearson Education; 3rd edition (February 1, 2016)
  • Language: English
  • Paperback: 600 pages
  • ISBN-10: 1292082992
  • ISBN-13: 978-1292082998
  • Item Weight: 1.1 pounds
  • Dimensions: 7.6 x 0.91 x 10.39 inches
  • #4,238 in Medical Psychology Research
  • #4,672 in Popular Psychology Research

Customer reviews

  • 5 star: 45%
  • 4 star: 30%
  • 3 star: 25%
  • 2 star: 0%
  • 1 star: 0%


Essentials of Descriptive-Interpretive Qualitative Research

Robert Elliott and Ladislav Timulak

This practical, step-by-step guide offers a no-nonsense approach to qualitative research in psychology and related fields, explaining the most important principles for using a generic approach to descriptive-interpretive qualitative research. Based on more than 50 years of combined experience doing qualitative research on psychotherapy, the authors offer an overarching framework of best research practices common to a wide range of approaches.

Essentials of Qualitative Methods Series

Essentials of Narrative Analysis

Ruthellen Josselson and Phillip L. Hammack

In this book, Ruthellen Josselson and Phillip L. Hammack introduce readers to narrative analysis, a qualitative method that investigates how people make meaning of their lives and experiences in both social and cultural contexts. This method offers researchers a window into how individuals’ stories are shaped by the categories they inhabit, such as gender, race, class, and sexual identity, and it preserves the voice of the individual through a close textual analysis of their storytelling.

Essentials of Autoethnography

Christopher N. Poulos

In this step-by-step guide to writing autoethnography, Christopher Poulos illustrates its essential features and practices with excerpts from his own and others’ work.

Essentials of Critical-Constructivist Grounded Theory Research

Heidi M. Levitt

This practical, step-by-step guide explains how to use critical-constructivist grounded theory methods, a flexible approach to investigating topics within psychological, interpersonal, and sociocultural contexts.

Essentials of Ideal-Type Analysis

Emily Stapley, Sally O'Keeffe, and Nicholas J. Midgley

This book explains how to conduct a qualitative research study using ideal-type analysis, a method for forming typologies from qualitative data. It is a guide for qualitative researchers who want to explore individual cases in depth but also understand patterns across multiple study participants.

Essentials of Discursive Psychology

Linda M. McMullen

In this step-by-step guide to conducting a research study, Linda McMullen describes the innovative ways in which discursive psychology analyzes language at both the micro and macro levels. Discursive psychologists reconceptualize talk and text as being situated in a social context, rather than thinking of talk as a route to our thoughts. For example, this approach could be used to study how people use arguments for and against the notion of human-induced climate change, or how they criticize each other in face-to-face encounters.

Essentials of Consensual Qualitative Research

Clara E. Hill and Sarah Knox

This concise, practical guide provides step-by-step advice and tips on how to plan and conduct a consensual qualitative research study. In this volume, Clara E. Hill and Sarah Knox describe consensual qualitative research (CQR), an inductive method characterized by open-ended interview questions, small samples, a reliance on words over numbers, the importance of context, an integration of multiple viewpoints (for example, the consensus of the research team and auditors), and a high emphasis on rigor and replicability.

Essentials of Existential Phenomenological Research

Scott D. Churchill

In this book, Scott D. Churchill introduces readers to existential phenomenological research, an approach that seeks an in-depth, embodied understanding of subjective human existence that reflects a person’s values, purposes, ideals, intentions, emotions, and relationships. This method helps researchers understand the lives and needs of others by helping identify and set aside theoretical and ideological prejudgments.

Essentials of Interpretative Phenomenological Analysis

Jonathan A. Smith and Isabella E. Nizza

Essentials of Interpretative Phenomenological Analysis is a step-by-step guide to a research method that investigates how people make sense of their lived experience in the context of their personal and social worlds. It is especially well-suited to exploring experiences perceived as highly significant, such as major life and relationship changes, health challenges, and other emotion-laden events.

Essentials of Critical Participatory Action Research

Michelle Fine and María Elena Torre

In this book, Michelle Fine and María Elena Torre provide an introduction to critical participatory action research, an approach that reveals the everyday stories of struggle and survival of the persons being studied, combats social injustice, and leverages social science research for action. Critical participatory action research challenges the traditional and narrow ways in which research has been conducted and elevates the voices and perspectives of formerly marginalized groups.

Essentials of Thematic Analysis

Gareth Terry and Nikki Hayfield

In this book, Gareth Terry and Nikki Hayfield introduce readers to reflexive thematic analysis, a method of analyzing interview and focus group transcripts, qualitative survey responses, and other qualitative data. Central to this method is the recognition that we are all situated in a particular context, and that we see and speak from that position. This leads researchers to produce knowledge that represents situated truths, providing insights into people’s perspectives on a given topic.

Essentials of Conversation Analysis

Alexa Hepburn and Jonathan Potter

In this book, Alexa Hepburn and Jonathan Potter provide an introduction to conversation analysis, a qualitative approach that examines the actions and interactions that take place in face-to-face conversations, phone calls, texts, and various forms of media. The book is designed as a practical analytic handbook that provides a comprehensive introduction to the different elements and phases in analyzing conversation. The authors guide the reader through data collection, transcription, analysis, and writing papers, providing an invaluable starting point for researchers who wish to explore conversation analysis and get a foothold in its literature.

Introduction to Research Methods in Psychology

There are several different research methods in psychology, each of which can help researchers learn more about the way people think, feel, and behave. If you're a psychology student or just want to know the types of research in psychology, here are the main ones as well as how they work.

Three Main Types of Research in Psychology

Psychology research can usually be classified as one of three major types.

1. Causal or Experimental Research

When most people think of scientific experimentation, research on cause and effect is what most often comes to mind. Experiments on causal relationships investigate the effect of one or more variables on one or more outcome variables. This type of research also determines if one variable causes another variable to occur or change.

An example of this type of research in psychology would be changing the length of a specific mental health treatment and measuring the effect on study participants.
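As a rough illustration of the experimental logic in this example, the sketch below randomly assigns participants to a shorter or longer course of treatment and then compares average outcomes. The function names, participant IDs, and scores are all hypothetical, and a real study would add an appropriate statistical test.

```python
import random
from statistics import mean

def assign_groups(participants, seed=0):
    """Randomly split participants into two equally sized conditions."""
    shuffled = list(participants)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def mean_difference(treatment_scores, control_scores):
    """Difference between the average outcomes of the two conditions."""
    return mean(treatment_scores) - mean(control_scores)

# Hypothetical participant IDs; no real study data are used here.
participants = [f"P{i:02d}" for i in range(1, 9)]
short_course, long_course = assign_groups(participants, seed=42)

# Hypothetical symptom scores measured after treatment (lower is better).
short_scores = [14, 12, 15, 13]
long_scores = [10, 11, 9, 12]

effect = mean_difference(long_scores, short_scores)
print(f"Longer treatment changed the mean score by {effect}")
```

Random assignment is the design choice that lets a difference between groups be read as an effect of treatment length rather than of pre-existing differences between participants.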

2. Descriptive Research

Descriptive research seeks to depict what already exists in a group or population. Three types of psychology research utilizing this method are:

  • Case studies
  • Observational studies
  • Surveys

An example of this psychology research method would be an opinion poll to determine which presidential candidate people plan to vote for in the next election. Descriptive studies don't try to measure the effect of a variable; they seek only to describe it.

3. Relational or Correlational Research

A study that investigates the connection between two or more variables is considered relational research. The variables compared are generally already present in the group or population.

For example, a study that looks at the proportion of males and females that would purchase either a classical CD or a jazz CD would be studying the relationship between gender and music preference.
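The music-preference example can be sketched as a small cross-tabulation. The records below are invented for illustration, and the helper names are assumptions rather than part of any standard method.

```python
from collections import Counter

def cross_tabulate(records):
    """Count how often each (group, choice) pair occurs."""
    return Counter(records)

def choice_share(table, group, choice):
    """Proportion of a group that made a given choice."""
    group_total = sum(n for (g, _), n in table.items() if g == group)
    return table[(group, choice)] / group_total

# Hypothetical purchase records: (gender, CD purchased).
records = [
    ("female", "classical"), ("female", "jazz"), ("female", "jazz"),
    ("male", "classical"), ("male", "classical"), ("male", "jazz"),
]

table = cross_tabulate(records)
for group in ("female", "male"):
    share = choice_share(table, group, "jazz")
    print(f"{group}: {share:.0%} bought the jazz CD")
```

Comparing the shares across groups describes the relationship between the two variables. Nothing is manipulated, so the comparison cannot show that one variable causes the other.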

Theory vs. Hypothesis in Psychology Research

People often confuse the terms theory and hypothesis or are not quite sure of the distinctions between the two concepts. If you're a psychology student, it's essential to understand what each term means, how they differ, and how they're used in psychology research.

A theory is a well-established principle that has been developed to explain some aspect of the natural world. A theory arises from repeated observation and testing and incorporates facts, laws, predictions, and tested hypotheses that are widely accepted.

A hypothesis is a specific, testable prediction about what you expect to happen in your study. For example, an experiment designed to look at the relationship between study habits and test anxiety might have a hypothesis that states, "We predict that students with better study habits will suffer less test anxiety." Unless your study is exploratory in nature, your hypothesis should always explain what you expect to happen during the course of your experiment or research.

While the terms are sometimes used interchangeably in everyday use, the difference between a theory and a hypothesis is important when studying experimental design.

Some other important distinctions to note include:

  • A theory predicts events in general terms, while a hypothesis makes a specific prediction about a specified set of circumstances.
  • A theory has been extensively tested and is generally accepted, while a hypothesis is a speculative guess that has yet to be tested.

The Effect of Time on Research Methods in Psychology

There are two types of time dimensions that can be used in designing a research study:

  • Cross-sectional research takes place at a single point in time. All tests, measures, or variables are administered to participants on one occasion. This type of research seeks to gather data on present conditions instead of looking at the effects of a variable over a period of time.
  • Longitudinal research is a study that takes place over a period of time. Data is first collected at the beginning of the study, and may then be gathered repeatedly throughout the length of the study. Some longitudinal studies may occur over a short period of time, such as a few days, while others may take place over a period of months, years, or even decades.

The effects of aging are often investigated using longitudinal research.
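The difference between the two time dimensions can be shown with a small data sketch. It assumes a hypothetical study in which the same three participants are measured at three waves. A cross-sectional view reads one wave only, while a longitudinal view follows each participant across waves. All names and scores are invented.

```python
# Hypothetical scores for three participants, measured at three waves.
scores_by_wave = {
    "P01": [20, 22, 25],
    "P02": [18, 18, 17],
    "P03": [30, 27, 26],
}

def cross_section(data, wave):
    """Scores for every participant at a single point in time."""
    return {pid: waves[wave] for pid, waves in data.items()}

def change_over_time(data):
    """Change from the first to the last wave for each participant."""
    return {pid: waves[-1] - waves[0] for pid, waves in data.items()}

print(cross_section(scores_by_wave, wave=0))
print(change_over_time(scores_by_wave))
```

Only the longitudinal view can show within-person change, which is why it suits questions about development over time.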

Causal Relationships Between Psychology Research Variables

What do we mean when we talk about a “relationship” between variables? In psychological research, we're referring to a connection between two or more factors that we can measure or systematically vary.

One of the most important distinctions to make when discussing the relationship between variables is the meaning of causation.

A causal relationship is when one variable causes a change in another variable. These types of relationships are investigated by experimental research to determine if changes in one variable actually result in changes in another variable.

Correlational Relationships Between Psychology Research Variables

A correlation is the measurement of the relationship between two variables. These variables already occur in the group or population and are not controlled by the experimenter.

  • A positive correlation is a direct relationship where, as the amount of one variable increases, the amount of a second variable also increases.
  • In a negative correlation , as the amount of one variable goes up, the levels of another variable go down.

In both types of correlation, there is no evidence or proof that changes in one variable cause changes in the other variable. A correlation simply indicates that there is a relationship between the two variables.

The most important concept is that correlation does not equal causation. Many popular media sources make the mistake of assuming that simply because two variables are related, a causal relationship exists.
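The direction of a correlation can be illustrated with a minimal computation of the Pearson correlation coefficient, one common measure of linear association. The variables and values below are hypothetical.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    x_bar, y_bar = mean(xs), mean(ys)
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    var_x = sum((x - x_bar) ** 2 for x in xs)
    var_y = sum((y - y_bar) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def direction(r):
    """Label the sign of a correlation; the sign says nothing about causation."""
    if r > 0:
        return "positive"
    if r < 0:
        return "negative"
    return "none"

# Hypothetical measurements for five students.
hours_studied = [1, 2, 3, 4, 5]
exam_score = [52, 60, 61, 70, 75]
errors_made = [9, 7, 6, 4, 2]

print(direction(pearson_r(hours_studied, exam_score)))
print(direction(pearson_r(hours_studied, errors_made)))
```

The coefficient only summarizes how the two variables move together in the sample. Establishing that one causes the other would require an experimental design.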

By Kendra Cherry, MSEd. Kendra Cherry, MS, is a psychosocial rehabilitation specialist, psychology educator, and author of the "Everything Psychology Book."

Introduction to qualitative methods in psychology

Uploaded by station53.cebu on April 19, 2023

Qualitative Research for Intervention Development and Evaluation:

Qualitative Research

October 2022

This webinar describes how qualitative evaluation can make a vital contribution to every stage of developing and optimizing an intervention.

This program does not offer CE credit.

Lucy Yardley, PhD

University of Bristol and University of Southampton, UK

More in this series

Presents a handful of key strategies to maintain our mental well-being when trying to help others do the same.

October 2022 On Demand Webinar

Emphasizes the basics of classic grounded theory and shows how the original tenets of the method guide the procedures.

Provides practical guidance to help researchers carry out IPA studies which take advantage of the strengths and potential for flexibility within the approach.

September 2022 On Demand Webinar

Reviews four different strategies for integrating qualitative and quantitative data or results that invite a more instrumental role for a qualitative inquiry in contributing analytical insight.

Open Access

Peer-reviewed

Research Article

Prediction of infectious diseases using sentiment analysis on social media data

Roles Conceptualization, Data curation, Formal analysis, Methodology, Resources, Software, Validation, Visualization, Writing – original draft, Writing – review & editing

Affiliation Department of Industrial & Systems Engineering, Dongguk University, Jung-gu, Seoul, South Korea

Roles Conceptualization, Formal analysis, Funding acquisition, Project administration, Supervision, Validation, Writing – original draft, Writing – review & editing

  • Youngchul Song, 
  • Byungun Yoon

PLOS

  • Published: September 4, 2024
  • https://doi.org/10.1371/journal.pone.0309842

As the influence and risk of infectious diseases increase, efforts are being made to predict the number of confirmed infectious disease patients, but research involving the qualitative opinions of social media users is scarce. However, social data can change the psychology and behaviors of crowds through information dissemination, which can affect the spread of infectious diseases. Existing studies have used the number of confirmed cases and spatial data to predict the number of confirmed cases of infectious diseases. However, studies using opinions from social data that affect changes in human behavior in relation to the spread of infectious diseases are inadequate. Therefore, herein, we propose a new approach that applies opinion mining to perform sentiment analysis of social data and uses machine learning techniques to predict the number of confirmed cases of infectious diseases. To build a sentiment dictionary specialized for predicting infectious diseases, we used Word2Vec to expand the existing sentiment dictionary and calculated the daily sentiment polarity from the collected social data, dividing it into positive and negative polarities. Thereafter, we developed an algorithm to predict the number of confirmed infectious patients by using both positive and negative polarities with DNN, LSTM, and GRU. The prediction results for the number of confirmed cases obtained using opinion mining were 1.12% and 3% better than those obtained without opinion mining in the LSTM and GRU models, respectively, and it is expected that social data will be used from a qualitative perspective for predicting the number of confirmed cases of infectious diseases.

Citation: Song Y, Yoon B (2024) Prediction of infectious diseases using sentiment analysis on social media data. PLoS ONE 19(9): e0309842. https://doi.org/10.1371/journal.pone.0309842

Editor: Shady Elbassuoni, American University of Beirut, LEBANON

Received: June 24, 2023; Accepted: August 20, 2024; Published: September 4, 2024

Copyright: © 2024 Song, Yoon. This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data are within the paper and its Supporting Information files.

Funding: This work was supported by the National Research Foundation of Korea under Grant NRF-2021R1I1A2045721 and the funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Infectious diseases are diseases that can spread from person to person and have continued to occur throughout human history. Since the first epidemic was recorded around 430 B.C., many infectious diseases have had huge impacts on mankind, such as the Black Death, smallpox, Spanish flu, and cholera. The Black Death killed approximately a third of Europe’s population, and smallpox has killed more than a billion people thus far. These disease epidemics have had major impacts on the overall economic conditions of the countries in which they occurred. COVID-19, which started in December 2019, has influenced many countries and has changed the lives of modern humankind. The World Health Organization (WHO) declared COVID-19 a pandemic, which is the highest risk level for infectious diseases, in March 2020. The declaration served as a starting point for the establishment of quarantine systems in each country in recognition of the severity of the pandemic. As human and property damage due to the COVID-19 pandemic increases [ 1 ], the pandemic can be classified as a social disaster that has caused large-scale damage at the national level. Consequently, a worldwide need has emerged for health strategies that predict infectious diseases and minimize damage, such as tiered social-distancing measures and COVID-19 support policies.

With the increasing risk and impact of infectious diseases, researchers are uncovering the necessary data and methods to accurately forecast the number of confirmed cases. From a data perspective, most studies have employed daily confirmed case data to make predictions using regression or machine learning (ML) techniques [ 2 – 4 ]. In addition, some studies have been carried out to forecast the number of confirmed cases by identifying additional elements that influence the transmission of infectious illnesses, such as spatial data [ 5 , 6 ]. However, there is a notable deficiency in integrating the subjective parts of social data, such as sentiment analysis, into models used for predicting infectious diseases. Thus, our study anticipates that including social data with these parameters will yield advantages.

This study begins with the assumption that the spread of infectious diseases is related to the sentiment polarity of social media. If many negative sentiments are posted on social media, people will act more carefully, slowing the spread of the epidemic, whereas if reassuring phrases such as "it’s okay" appear frequently, people may act casually and speed up the spread. When information pertaining to the risk of the coronavirus is spread through social networks, negative events can be transmitted through repeated exposure, resulting in acute stress [ 7 ]. The stress of this infectious disease causes people to change their behaviors to cope with it [ 8 ]. Since the start of COVID-19, social media data have been used to understand public psychological responses related to infectious diseases. In a survey, 93.3% of respondents stated that they avoid going to public places, 89.6% of the respondents reduced holiday-related activities, and more than 70% of the respondents said they take precautions to avoid infection [ 9 ]. Changes in people’s behaviors and the implementation of preventive measures in infected areas can affect the population density and quarantine, thereby curbing the spread of infectious diseases [ 10 – 12 ]. Therefore, it is considered meaningful to predict the number of confirmed infectious disease cases by analyzing people’s opinions pertaining to infectious diseases on social networks. This study aims to predict the number of confirmed cases of infectious diseases by using anonymized social media data containing collective public opinions on infectious diseases.

Considering this perspective, search volumes were used to predict the number of confirmed cases [ 13 ]. Sentiment analysis was conducted to explore the qualitative aspect of social data, and in [ 14 ], the number of future vaccinations was predicted on the basis of a sentiment analysis of tweet data. To predict the number of confirmed infectious disease patients, daily numbers of confirmed cases and quantitative approaches to social and public data are being used. However, studies that reflect the qualitative characteristics of social data, which affect people’s psychology, when predicting the number of confirmed infectious disease patients remain insufficient. Therefore, this study analyzes the qualitative characteristics of social data by means of opinion mining to check whether there exists a relationship between people’s sentiment states and prediction of the number of confirmed cases.

The motivation for this study lies in the observation that the social networking behavior of individuals can have an impact on the transmission of infectious diseases. Therefore, it is important to take this factor into account when forecasting the number of confirmed cases. This study utilizes data from social network services (SNS) to examine how the public responds to information about infectious diseases. It uses sentiment analysis, a method within the field of opinion mining, to analyze the sentiment expressed in these answers. The sentiment data that is retrieved is subsequently employed to forecast the quantity of confirmed cases of infectious diseases by utilizing machine learning models, with the objective of evaluating the accuracy of the predictions. The key findings of this study indicate that incorporating social media sentiment data into infectious disease prediction models results in better predictive performance compared to models that do not consider such data. This underscores the potential significance of social media data in improving the accuracy of infectious disease predictions. The study is structured as follows. Background explains the background theory of the contents covered in this study. Research Framework explains the research framework. The methods used herein are described in Results, and the results obtained using these methods are presented in Implications & Discussion. Finally, Conclusion presents the limitations and future directions of this research.

Background

In this section, we review the extant literature on epidemic prediction, the latest opinion mining processes, and ML models used for time-series prediction. First, we review how studies on infectious disease prediction have been conducted thus far, followed by the ML techniques used herein to predict the number of confirmed cases and methods for opinion mining of social data.

Predicting infectious diseases

To predict infectious diseases, Kermack and McKendrick proposed an infectious disease spread model, the SIR (Susceptible, Infectious, Recovered) model, which considers uninfected, infected, and recovered people [ 15 ]. Assuming that the entire population consists of these three groups, a series of differential equations is used to describe the state of the overall population in terms of the number of infections. The model is parameterized by the infection rate and recovery rate of each infectious disease, and studies on infectious diseases are still being conducted by using the SIR model and the modified SEIR (Susceptible, Exposed, Infectious, Recovered) model [ 16 – 18 ].
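
As a rough illustration of the differential equations mentioned above, the following sketch integrates the standard SIR system with a forward-Euler step. The function name `simulate_sir`, the step size, and the parameter values in the usage lines are assumptions chosen for illustration; they are not taken from the studies cited here.

```python
def simulate_sir(beta, gamma, s0, i0, r0, days, dt=0.1):
    """Integrate dS/dt = -beta*S*I/N, dI/dt = beta*S*I/N - gamma*I,
    dR/dt = gamma*I with forward Euler; returns one (S, I, R) tuple per day."""
    n = s0 + i0 + r0                      # total population, assumed constant
    s, i, r = float(s0), float(i0), float(r0)
    history = [(s, i, r)]
    steps_per_day = int(round(1 / dt))
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / n * dt   # S -> I during this step
            new_recoveries = gamma * i * dt          # I -> R during this step
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((s, i, r))
    return history


# Hypothetical parameters: infection rate 0.3/day, recovery rate 0.1/day.
trajectory = simulate_sir(beta=0.3, gamma=0.1, s0=990, i0=10, r0=0, days=60)
peak_day = max(range(len(trajectory)), key=lambda d: trajectory[d][1])
print("day with most infectious people:", peak_day)
```

Only the infection rate and the recovery rate enter the model, which is the limitation that the ML-based approaches discussed next try to relax.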

Moreover, in recent studies, with the advancement of artificial intelligence (AI), the number of confirmed infectious disease patients has been predicted using ML and deep learning (DL) approaches, which differ from the conventional models. The AI-based approaches consider diverse variables that affect infection, rather than merely considering the infection rate and recovery rate, which represent the unique characteristics of existing infectious diseases. This improves the prediction ability in dynamic situations. The number of confirmed cases in the early stages of COVID-19 was predicted using the ARIMA and TP-SMM-AR autoregressive time-series models [ 19 ]. Holt’s time series model was also used for forecasting confirmed cases, relying solely on global confirmed case data to predict future cases [ 4 ]. The ARIMA, Holt, Splines, and TBATS models were also used to predict confirmed cases, deaths, and cured cases for the USA and Italy [ 20 ]. In another study, simulations were conducted to create confirmed scenarios, and the impact and transmission order of spread were studied [ 5 ]. In studies using ML and DL, DNN, LSTM and gated recurrent unit (GRU) were used to predict the number of confirmed infectious disease patients [ 2 , 6 , 18 , 21 – 24 ]. In addition, several ML techniques (K-nearest neighbor (KNN), support vector machine (SVM), and random forest (RF)) have been used to predict the number of people vaccinated [ 14 ]. Another study exploited past pandemic case data to create a nonlinear autoregressive neural network time series model for forecasting confirmed cases. These studies primarily made time series forecasts using solely confirmed case data, although some also used other forms of data such as spatial data. While several studies have made predictions about the number of confirmed cases based on social data, they mostly relied on quantitative indicators obtained from social networks [ 13 ]. The models and data used in the previous studies are shown in Table 1 . Some of these studies argue that social information can be analyzed for predicting confirmed infectious disease patients.

https://doi.org/10.1371/journal.pone.0309842.t001

The best tools and data for predicting a dynamic epidemic such as COVID-19 have not yet been established, and new data and tools for predicting infectious diseases continue to be explored. From the data perspective, a model that employs the results of opinion mining of social data is worth trying.

Opinion mining

Opinion mining is a big data analysis technique for analyzing and processing vast amounts of social text data. At the system level, it calculates the sentiment polarity of text sentences and is also called sentiment analysis. Many people read other people’s writings, and their behaviors are influenced by these writings, which can be analyzed through sentiment analysis [ 25 , 26 ]. Sentiment analysis has yielded significantly better results on opinion-classification tasks than other text mining approaches [ 27 ]. Opinion mining can be used to identify people’s behavioral characteristics and expected phenomena through trend analysis and future prediction by using large numbers of opinions published on the Internet. The opinion mining of text data related to a specific topic facilitates the development of interesting approaches to the topic. Examples include Obama’s successful 2012 election campaign, in which opinion mining was used, and many customer analysis studies that have applied opinion mining to buyers’ and users’ reviews to gain insights [ 28 – 30 ].

Usually, the process of opinion mining is as follows. First, the study targets are identified, and data that the targets have written or that represent the targets are collected and preprocessed. Thereafter, the characteristics to be extracted from the data are selected from attributes such as opinions and attitudes, degrees of positivity/negativity, and satisfaction. In the sentiment analysis conducted herein, positive/negative values are extracted; to extract polarities, sentiment dictionaries and rule-based methods are typically used. A sentiment dictionary approach analyzes text data by using predefined words, rules, and polarities to calculate positive/negative values depending on keyword appearance or rules [ 29 – 31 ]. Recently, sentiment classification methods using ML and DL have also been studied [ 31 ].

Studies on the sentiment dictionaries used in sentiment analysis are being conducted. Because sentiment dictionaries use predefined values, it is important to build a sentiment dictionary that is tailored to the corpus being analyzed. In previous studies, sentiment dictionaries were expanded successfully by using Word2Vec. Word2Vec is a word embedding technique that was introduced in 2013. It offers a continuous bag of words (CBOW) learning method, which predicts one blank (the target word) from multiple input words, and a skip-gram learning method, which predicts the surrounding blanks (context words) from one input word. Each word learned in this manner is assigned a vector. In previous studies, the existing sentiment dictionaries were expanded using the cosine similarity of the Word2Vec results, and word dictionaries that were better optimized for the dataset to be analyzed were established [ 32 – 34 ]. In this study, sentiment analysis of social data is conducted by producing an extended sentiment dictionary by using Word2Vec in line with the changing characteristics of the existing sentiment dictionaries and social data.

Machine learning

ML is being used in many predictive studies. ML is mainly divided into supervised, semi-supervised, and unsupervised learning depending on the learning method. Although ML is often a black box, meaning that how the model arrives at its results is not known, it is widely used in fields such as recognition, classification, and prediction. Moreover, many studies have demonstrated its strengths in time-series prediction, where recurrent neural network (RNN) techniques, which are specialized for time-series analysis because they remember earlier data, are available. In addition, long short-term memory (LSTM) and GRU techniques have been derived from RNN. These models continue to be used for predicting infectious diseases. The present study aims to predict the number of confirmed infectious disease patients by using a deep neural network (DNN), a basic machine learning technique, in conjunction with LSTM and GRU, which are specialized for time-series analysis.

A DNN is an artificial neural network that calculates outputs by multiplying weights across multiple hidden layers [ 35 ]. The DNN structure, illustrated in Fig 1 , consists of an input layer, a hidden layer, and an output layer. These layers are connected to each other, and values are transformed and moved by using weights and activation functions. Each weight is modified by learning, and the network is created using the modified weights. DNNs are mainly used in supervised learning to solve classification and regression problems. When the predetermined learning process is completed, the result value of the new input value is derived using the final calculated weight. This DNN structure is also used for various tasks by connecting it to other ML techniques.

https://doi.org/10.1371/journal.pone.0309842.g001

LSTM is a recurrent neural network technique that was developed to overcome the limitations of the basic RNN, whose learning ability degrades because the influence of past information weakens over time [ 36 ]. The structure of LSTM is depicted in Fig 2 ; LSTM learns by controlling which past information is remembered and which is forgotten. In the figure, the flow of Ct refers to the cell state carried over from previous data. The new input and the previous ht are used to decide whether to preserve or discard stored information; the input gate then combines the outputs of the sigmoid and tanh functions to add new information; and, finally, the cell state is updated. In the output gate, ht is calculated using the sigmoid and tanh functions; it represents the short-term memory status and is the value that flows out of the cell as its output. In summary, the result is learned and derived using long-term memory, short-term memory, and new input information. Owing to these characteristics, LSTM is widely used for time-series analysis and is particularly useful for time series involving volatility. The LSTM model has also been used from a time-series perspective in extant studies on predicting confirmed infectious disease cases [ 2 , 6 , 21 ].
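
The gate description above can be made concrete with a single LSTM time step written in NumPy. This is a minimal sketch of the commonly used LSTM formulation, not the implementation used in the study; the parameter dictionary `p` and its key names (`W_f`, `U_f`, `b_f`, and so on) are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x, h_prev, c_prev, p):
    """One LSTM time step. p maps hypothetical names to weights:
    W_* has shape (hidden, input), U_* (hidden, hidden), b_* (hidden,)."""
    f = sigmoid(p["W_f"] @ x + p["U_f"] @ h_prev + p["b_f"])  # forget gate
    i = sigmoid(p["W_i"] @ x + p["U_i"] @ h_prev + p["b_i"])  # input gate
    o = sigmoid(p["W_o"] @ x + p["U_o"] @ h_prev + p["b_o"])  # output gate
    g = np.tanh(p["W_g"] @ x + p["U_g"] @ h_prev + p["b_g"])  # candidate values
    c = f * c_prev + i * g      # updated cell state (long-term memory)
    h = o * np.tanh(c)          # updated hidden state (short-term memory)
    return h, c
```

The forget gate scales the previous cell state, the input gate scales the new candidate values, and the output gate decides how much of the updated cell state is exposed as ht.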

Structures of LSTM (left) and GRU (right).

https://doi.org/10.1371/journal.pone.0309842.g002

The GRU model evolved from LSTM, and it simplifies LSTM to reduce learning time, thus resulting in similar performance but faster data learning [ 37 , 38 ]. Unlike LSTM, GRU has a reset gate and an update gate, where the reset gate calculates the degree of reflection of the previous state (ht), and its role is similar to that of the forget gate. Meanwhile, the update gate determines the rate at which to reflect the previous state (ht) and the current input state ( Fig 2 ). As with LSTM, the GRU model, too, has been used extensively for time-series analysis in recent years, and it has been used in studies on predicting the number of confirmed cases of infectious diseases [ 22 , 24 ].
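
For comparison with LSTM, a single GRU time step can be sketched in the same style. This follows one common GRU formulation; the parameter names in `p` are hypothetical, and some libraries swap the roles of the update gate and its complement in the final line.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gru_step(x, h_prev, p):
    """One GRU time step. p maps hypothetical names to weights:
    W_* has shape (hidden, input), U_* (hidden, hidden), b_* (hidden,)."""
    r = sigmoid(p["W_r"] @ x + p["U_r"] @ h_prev + p["b_r"])  # reset gate
    z = sigmoid(p["W_z"] @ x + p["U_z"] @ h_prev + p["b_z"])  # update gate
    # Candidate state: the reset gate controls how much of h_prev is used.
    h_cand = np.tanh(p["W_h"] @ x + p["U_h"] @ (r * h_prev) + p["b_h"])
    # The update gate blends the previous state with the candidate state.
    return (1.0 - z) * h_prev + z * h_cand
```

Because there is no separate cell state and only two gates, a GRU has fewer parameters per unit than an LSTM, which is consistent with the shorter learning time noted above.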

Research framework

Overall framework.

In this study, data were obtained from Twitter, a social networking service (SNS) where one can freely write their thoughts. Pre-processing and part-of-speech (POS) tagging of these data were performed, and the positive/negative polarities of the tweets were derived daily using a sentiment dictionary. The number of confirmed cases was then predicted through ML, as shown in Fig 3 .

https://doi.org/10.1371/journal.pone.0309842.g003

Data collection and preprocessing

Among various SNS data, the tweet data of Twitter ( https://twitter.com/ ) can be accessed by everyone. Moreover, people can freely express their thoughts on Twitter, and the amount of data on Twitter is adequate for analysis. Owing to these characteristics, Twitter data were judged to be suitable for this study. Tweet data containing keywords related to COVID-19 were extracted from Twitter. Tweet data covering the 30 months after the first confirmed case of COVID-19 were collected using Python, with a collection and analysis method that complied with the terms and conditions of the data source. The numbers of confirmed COVID-19 patients used in the study were obtained from the Seoul Open Data Plaza ( https://data.seoul.go.kr/ ). Duplicate data were deleted from the collected social data, and news data and promotional posts that did not contain user opinions were excluded. Thereafter, only Korean-language data were retained through preprocessing, and POS tagging was performed using Kkma.

Opinion mining on social data

This study assumes that the information from social data can influence the spread of infectious diseases and that utilizing this data can lead to more accurate predictions of the number of confirmed cases. Therefore, the proposed methodology employs sentiment analysis, a form of opinion mining, to extract meaningful information from the social data. The opinion mining method used herein calculates the polarity of each text as the average of the polarities of its words. To start this process, it is necessary to define a sentiment dictionary that sets the polarity of each word. Although a Korean-language sentiment dictionary, the Korean Sentiment Analysis Corpus (KOSAC) sentiment dictionary [ 39 ], is available, it was expanded to match the characteristics of the collected SNS data because, according to previous studies [ 27 , 40 ], a sentiment dictionary written considering the characteristics of each document provides better results.

In previous studies, the cosine similarity of Word2Vec was used to successfully expand the sentiment dictionary [ 32 – 34 ]. Therefore, in this study, the expansion of the sentiment dictionary using Word2Vec is confirmed to be necessary for better sentiment analysis. Polarities are determined based on the cosine similarity of words corresponding to positive/negative words by using the Word2Vec method. In case of the existing KOSAC Korean sentiment dictionary, each word has a label value for positive/negative as +1 for positive, -1 for negative, and 0 for neutral.

The Word2Vec model was trained on the 1.08 million collected texts. Between the CBOW and skip-gram learning models, we used the skip-gram model, which learns more data. The model was trained with the minimum number of appearances set to 100, which was 0.01% of the amount of text data collected. Words from the existing sentiment dictionary were input into the trained Word2Vec model, and words with high cosine similarity to these positive/negative words were extracted. Cosine similarity is calculated as shown in Eq1. Studies have demonstrated that a sentiment dictionary can be established successfully when the similarity is 0.5 or higher [ 34 ]; to ensure high reliability, this study expanded the sentiment dictionary by treating a word as an equivalent word with the same positive/negative label when its similarity was 0.8 or higher ( Fig 4 ). If a particular word was derived from both positive and negative labels, the mean cosine similarity for each label was compared and the label with the higher similarity was assigned.
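
The expansion rule described above can be sketched as follows. The code uses plain NumPy on toy two-dimensional vectors instead of a trained Word2Vec model, and the function names, the toy vocabulary, and the tie-breaking rule (ties go to the positive label) are assumptions made for illustration only.

```python
import numpy as np


def cosine_similarity(u, v):
    """cos(u, v) = (u . v) / (||u|| * ||v||)."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def expand_dictionary(seed_labels, vectors, threshold=0.8):
    """seed_labels: word -> +1 / -1 / 0; vectors: word -> embedding.
    A new word inherits the label of seed words whose similarity to it
    reaches the threshold; conflicts are resolved by mean similarity."""
    expanded = dict(seed_labels)
    for word, vec in vectors.items():
        if word in seed_labels:
            continue
        sims = {+1: [], -1: []}
        for seed, label in seed_labels.items():
            if label == 0 or seed not in vectors:
                continue  # neutral seeds are not propagated (assumption)
            sim = cosine_similarity(vec, vectors[seed])
            if sim >= threshold:
                sims[label].append(sim)
        if not sims[+1] and not sims[-1]:
            continue  # no sufficiently similar seed word
        mean_pos = np.mean(sims[+1]) if sims[+1] else -np.inf
        mean_neg = np.mean(sims[-1]) if sims[-1] else -np.inf
        expanded[word] = +1 if mean_pos >= mean_neg else -1
    return expanded


# Toy embeddings standing in for trained Word2Vec vectors.
toy_vectors = {
    "safe": [1.0, 0.1], "relieved": [0.9, 0.2],
    "afraid": [-1.0, 0.1], "anxious": [-0.9, 0.3],
    "table": [0.0, 1.0],
}
toy_seed = {"safe": +1, "afraid": -1}
print(expand_dictionary(toy_seed, toy_vectors))
```

With a real model, the toy vectors would be replaced by the embeddings learned from the collected tweets, and the seed labels by the KOSAC entries.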

https://doi.org/10.1371/journal.pone.0309842.g004

The average polarity of each tweet was calculated by looking up its adjectives, verbs, adverbs, nouns, and word roots in the produced sentiment dictionary and substituting their polarities ( Table 2 ). Thereafter, the polarities of the texts posted each day were aggregated, and the daily polarity was calculated and used as the input to the model for predicting the number of confirmed patients. The formula for calculating the sentiment value of each tweet is given in Eq2. In Eq2, t represents each tweet, x represents the set of words in t that have sentiment polarities, and w represents a word in x.
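
A minimal sketch of the tweet-level calculation, read here as the mean polarity over the words of a tweet that appear in the sentiment dictionary. The function name, the English toy tokens, and the choice to return `None` for tweets without any dictionary word are assumptions, as is counting neutral (0) entries in the average.

```python
def tweet_polarity(tokens, sentiment_dict):
    """Mean dictionary polarity over the tokens of one tweet.
    Tokens missing from the dictionary are ignored; returns None when
    no token carries a polarity."""
    scored = [sentiment_dict[tok] for tok in tokens if tok in sentiment_dict]
    if not scored:
        return None
    return sum(scored) / len(scored)


# Toy dictionary and POS-tagged tokens standing in for Kkma output.
toy_dict = {"good": 1, "bad": -1, "normal": 0}
print(tweet_polarity(["today", "good", "bad", "good"], toy_dict))
```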

https://doi.org/10.1371/journal.pone.0309842.t002

Predicting number of confirmed cases

Based on successful cases of predicting the number of confirmed cases using machine learning, this study also employs models from the machine learning family (DNN, LSTM, GRU) that have demonstrated high effectiveness [ 2 , 6 , 18 , 21 – 24 ]. In this part, predictions with and without the daily positive/negative polarities obtained from opinion mining are compared. First, predictions were generated using the DNN, LSTM, and GRU models with only the number of confirmed patients per day, and then predictions were generated under the same conditions with the positive/negative polarities included. To compare prediction accuracy, the Mean Absolute Percentage Error (MAPE) was used; because the number of confirmed patients varies over a large range, MAPE is suitable, as it measures the difference between the predicted and actual values relative to the actual value. To predict the number of confirmed cases of infectious diseases, the DNN, LSTM, and GRU models, each consisting of two hidden layers as shown in Fig 5 , were applied to produce a final linear output. The data used for prediction were the daily positive/negative polarities extracted as described in the opinion mining on social data part and the data on the number of confirmed patients in Korea. These data were divided in a 7:3 ratio into a learning dataset and a verification dataset, and the prediction model was applied to these two datasets. An example of input data is depicted in the blue box in Fig 6 . After predicting the number of confirmed cases on the next day by using the daily number of confirmed cases and positive/negative polarities of the n days before the forecast date, the MAPE between the actual and predicted values was calculated to measure the prediction accuracy.
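
The evaluation setup described above (n-day input windows, a 7:3 split, and MAPE) can be sketched with NumPy as follows. The function names are hypothetical, and splitting chronologically rather than at random is an assumption, since the text does not state how the 7:3 division was made.

```python
import numpy as np


def mape(actual, predicted):
    """Mean Absolute Percentage Error in percent.
    Undefined when any actual value is zero."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100)


def make_windows(features, target, n_days):
    """features: (T, F) daily inputs, e.g. cases and polarities;
    target: (T,) daily confirmed cases.
    X[k] holds days k .. k+n_days-1 and y[k] is the value on day k+n_days."""
    features = np.asarray(features, dtype=float)
    target = np.asarray(target, dtype=float)
    X = np.stack([features[k:k + n_days] for k in range(len(target) - n_days)])
    y = target[n_days:]
    return X, y


def chronological_split(X, y, train_ratio=0.7):
    """First 70% of windows for learning, the rest for verification."""
    cut = int(len(X) * train_ratio)
    return X[:cut], y[:cut], X[cut:], y[cut:]
```

For the comparison in the text, `features` would contain only the daily confirmed cases in one run and the cases plus the positive and negative daily polarities in the other, with the same windows and split used in both.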

https://doi.org/10.1371/journal.pone.0309842.g005

https://doi.org/10.1371/journal.pone.0309842.g006

Before executing the final prediction algorithm, the number of confirmed cases and the daily polarity calculated in the opinion mining on social data part were applied to the model as input values, and the optimal model and duration were determined by conducting several experiments. Subsequently, the next-day predictions obtained using only the number of confirmed cases are compared with those obtained when daily polarities are added to confirm the prediction accuracy ( Fig 5 ). The input data are the daily polarities and the numbers of confirmed cases for the n days before the forecast date, and MAPE values are calculated by comparing the predicted and actual values of the next day to confirm the results.

Search terms were collected using a total of five words, including four Corona-related words (“Corona,” “COVID-19,” “COVID-19 confirmed,” and “COVID-19 Vaccine,” based on Google Trends) and “epidemic.” Prior to collecting data for machine learning techniques, this study considered whether a small amount of data could be used. To measure the daily number of confirmed cases of infectious diseases, data from when the epidemic is active should be used, so the number of data points did not fit perfectly in the category of big data. However, recent papers predicting the number of confirmed cases of infectious diseases using machine learning have also used small amounts of data, as shown in Table 3 . Therefore, although the amount of data is limited, the prediction in this study was conducted using 756 data points. In addition, fields that require actual infectious disease prediction also require rapid response, and the model proposed in this study reflects situations in which less data must be used.

https://doi.org/10.1371/journal.pone.0309842.t003

The data collection period spanned from February 24, 2020, to March 21, 2022. A total of 1,080,000 data points were obtained after preprocessing to exclude duplicate or missing information, as well as advertisement messages from the social media site (Twitter). The collected data include both the date and the corresponding text. On average, 1,423 data points were gathered per day, with a standard deviation of 318.23. Furthermore, data regarding the number of confirmed COVID-19 cases in Korea within the aforementioned time frame were also gathered. POS tagging of these text data was performed using a Kkma POS tagger, and finally, the data were produced, as summarized in Table 4 .

https://doi.org/10.1371/journal.pone.0309842.t004

To match the KOSAC Korean sentiment dictionary to the collected social data, a sentiment dictionary was produced using the Word2Vec technique. First, Word2Vec was trained on the entire POS-tagged text data summarized in Table 1 . The minimum number of appearances was 100, which accounted for 0.01% of the total sentence data, and the skip-gram model was used as the learning method. Words from the KOSAC sentiment dictionary were used as input, and words with a cosine similarity of 0.8 or higher, derived through Word2Vec, were added to the new sentiment dictionary because they were considered to have the same positive/negative sentiment polarities. To account for the morphemes of the words, a sentiment dictionary comprising nouns, verbs, adverbs, and adjectives was collated, and a total of 3,070 sentiment words and values were finally extracted ( Table 5 ).


https://doi.org/10.1371/journal.pone.0309842.t005

The average polarity of each collected text was calculated using the produced sentiment dictionary. Two methods were then considered for calculating the daily polarity values from the per-text polarity values. As illustrated in Fig 7, Case 1 uses a single daily value between -1 and 1 that combines the positive and negative sentiment polarities, and Case 2 uses two daily input values, calculated by separating texts with positive polarities from those with negative polarities.

  • Case 1: Using the average of daily polarities
  • Case 2: Using separate daily means of the positive and the negative polarities


https://doi.org/10.1371/journal.pone.0309842.g007
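The difference between the two cases can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the function names are hypothetical, the per-text polarities are made-up values in [-1, 1], and texts with a polarity of exactly zero are ignored in Case 2.

```python
from statistics import mean


def daily_polarity_case1(polarities):
    """Case 1: a single daily value, the mean of all text polarities."""
    return mean(polarities) if polarities else 0.0


def daily_polarity_case2(polarities):
    """Case 2: two daily values, the mean of the positive texts and the
    mean of the negative texts (zero-polarity texts are ignored here)."""
    positive = [p for p in polarities if p > 0]
    negative = [p for p in polarities if p < 0]
    return (
        mean(positive) if positive else 0.0,
        mean(negative) if negative else 0.0,
    )


# Two hypothetical days with the same overall mean but different intensity.
calm_day = [0.1, -0.1, 0.1, -0.1]
polarized_day = [0.9, -0.9, 0.9, -0.9]

print(daily_polarity_case1(calm_day), daily_polarity_case1(polarized_day))
print(daily_polarity_case2(calm_day), daily_polarity_case2(polarized_day))
```

Both toy days collapse to the same single value in Case 1, whereas Case 2 keeps them apart, which is one way to see why separating the polarities can carry more information into the prediction model.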

The calculation method finally adopted was the one that yielded better prediction results for the number of confirmed infectious disease patients. The MAPE between the predicted and measured values was used as the comparative index, and the results are summarized in Table 6. The minimum MAPE values were 11.57% in Case 1 and 10.09% in Case 2. Therefore, as indicated by Case 2 in Fig 7, the method of calculating the polarity separately for positive and negative texts was adopted. Table 7 summarizes the polarity of each text, and Table 8 contains the normalized daily averages obtained by separating the polarities into positive and negative. The daily polarity represents the degree of positive/negative COVID-19-related opinion expressed by users in the SNS text data on the corresponding date, and it is input into the prediction model in the form shown in Table 8.


https://doi.org/10.1371/journal.pone.0309842.t006


https://doi.org/10.1371/journal.pone.0309842.t007


https://doi.org/10.1371/journal.pone.0309842.t008

Predicting the number of confirmed cases

In this section, the number of confirmed cases is predicted using DNN, LSTM, and GRU, which are the machine learning models proposed in the research framework. The model inputs are the number of confirmed cases in Korea between February 24, 2020, and March 21, 2022, the period when confirmed cases appeared steadily in Korea, and the positive/negative polarities derived through opinion mining. The data were divided in a ratio of 7:3 into training and verification datasets, and learning was performed. For the activation function of the DNN, the ReLU function was applied because it gave the best results in a comparison of the sigmoid, ReLU, and softmax functions; the number of epochs for each model was set to 500. The Adam optimizer was used because it yielded the best experimental results among the candidate optimizers, namely Root Mean Square Propagation (RMSP), Stochastic Gradient Descent (SGD), Adaptive Moment Estimation (Adam), and Nesterov Accelerated Gradient Adam (Nadam).

The prediction results were organized, as shown in Table 9, according to whether the daily polarities were included and according to the scope of data application. Depending on the presence or absence of polarities, the results were divided into polarity applied and not applied. The prediction inclusion period sets the number of preceding data points used to generate a prediction for a given prediction date. For example, if the prediction inclusion period was 14, the value at the prediction point was calculated using the data of the 14 days up to and including the day before the prediction point. Three periods were used in this study: 7 days, the average incubation period expected by the Korea Centers for Disease Control and Prevention; 14 days, the longest officially announced incubation period; and 28 days, a period that accounts for the impact of the preceding incubation period given the nature of the epidemic. The MAPE, MSE, RMSE, and MAE results summarized in Table 9 are averages of 30 prediction results. The number of confirmed cases of infectious diseases has an exponential characteristic. Therefore, if the results were presented using only absolute error figures such as MSE, RMSE, and MAE, the assessment of a model would depend on how large the number of confirmed cases was during the evaluated period. For this reason, this study also presents the MAPE, which expresses the error as a ratio.
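The prediction inclusion period can be read as a sliding window over the daily records. The sketch below shows one way to build such windows and apply the 7:3 split mentioned earlier. The function names, the tuple layout `(cases, positive, negative)`, the toy data, and the choice of a chronological (rather than shuffled) split are all assumptions for illustration; the source does not state how the split was made.

```python
def make_windows(rows, period):
    """Build (input window, target) pairs from daily rows, oldest first.

    Each row is a hypothetical tuple (confirmed_cases, positive_polarity,
    negative_polarity). The window holds the `period` days that end on the
    day before the prediction point; the target is that day's case count.
    """
    inputs, targets = [], []
    for t in range(period, len(rows)):
        inputs.append(rows[t - period:t])
        targets.append(rows[t][0])
    return inputs, targets


def chronological_split(inputs, targets, train_parts=7, test_parts=3):
    """Split samples 7:3 in time order (assumed; the source gives only the ratio)."""
    cut = len(inputs) * train_parts // (train_parts + test_parts)
    return (inputs[:cut], targets[:cut]), (inputs[cut:], targets[cut:])


# Hypothetical toy data: 24 days of slowly rising case counts.
rows = [(10 + day, 0.5, -0.5) for day in range(24)]
X, y = make_windows(rows, period=14)
(train_X, train_y), (test_X, test_y) = chronological_split(X, y)
print(len(X), len(train_X), len(test_X))
```

With 24 toy days and a 14-day inclusion period, ten samples are produced, each holding 14 consecutive days of inputs and the following day's case count as the target.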


https://doi.org/10.1371/journal.pone.0309842.t009
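The four error measures reported in Table 9 follow their standard definitions, sketched below with hypothetical numbers. The example applies the same 10% relative error to a small and a large case count to show why an absolute measure such as MAE changes with scale while MAPE does not. The sketch assumes all measured values are nonzero, since MAPE is undefined otherwise.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(mse(actual, predicted))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical values: the same 10% overestimate at two different scales.
small_actual, small_pred = [100, 200], [110, 220]
large_actual, large_pred = [10000, 20000], [11000, 22000]

print(mae(small_actual, small_pred), mae(large_actual, large_pred))
print(mape(small_actual, small_pred), mape(large_actual, large_pred))
```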

The study found that the GRU model achieved the lowest error rate, 10.093%, when polarities were included and a 14-day period was used. This aligns with the expected incubation period for COVID-19 (1–14 days) announced by the Korea Centers for Disease Control and Prevention. For DNN, the data without polarities exhibited greater predictive power (Fig 8). Conversely, the RNN family models (LSTM and GRU) achieved satisfactory prediction outcomes when using data that included polarities (Figs 9 and 10). A t-test was performed to compare the accuracy of 100 learning/test runs of the LSTM and GRU models on 14-day data, with and without sentiment polarities. The t-tests resulted in p-values of 1.28e-09 for LSTM and 5.92e-153 for GRU. These values indicate that the results obtained from data that included polarities were statistically significantly superior to those obtained from data that excluded polarities. The analysis of 100 learning/test runs supports the reliability of the findings.


DNN results obtained using 14-day data with polarity excluded (left) and included (right).

https://doi.org/10.1371/journal.pone.0309842.g008


LSTM results obtained using 14-day data with polarity excluded (left) and included (right).

https://doi.org/10.1371/journal.pone.0309842.g009


GRU results obtained using 14-day data with polarity excluded (left) and included (right).

https://doi.org/10.1371/journal.pone.0309842.g010
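The source does not state which variant of the t-test was applied to the 100 runs. As a hedged illustration only, the sketch below computes the statistic and degrees of freedom of Welch's unequal-variance t-test for two sets of run errors using the standard library. The function name `welch_t` and the sample values are hypothetical, and turning the statistic into a p-value would additionally require the t-distribution, which is not included here.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic and approximate degrees of freedom.

    Assumption: the two samples are treated as independent with possibly
    unequal variances; the source does not specify the test variant.
    """
    na, nb = len(sample_a), len(sample_b)
    va = variance(sample_a) / na  # squared standard error of mean A
    vb = variance(sample_b) / nb  # squared standard error of mean B
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Hypothetical MAPE values from repeated runs, not the study's data.
with_polarity = [10.1, 10.4, 9.8, 10.0, 10.2]
without_polarity = [11.0, 11.5, 10.9, 11.2, 11.4]
t_stat, dof = welch_t(with_polarity, without_polarity)
print(t_stat, dof)
```

A negative statistic here means the first sample has the lower mean error.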

In addition to the t-test, a binomial test was performed to verify the statistical significance of the win/loss information for each trial. This is important because the proposed strategy might "lose" more comparisons but still have a lower average, or, alternatively, "win" more comparisons in both the 14-day and 28-day settings but have a lower average in the 28-day setting. For the LSTM results over a 14-day period, the model that included polarities won 82 out of 100 comparisons. This result allows the null hypothesis that the win probabilities of the two models are equal to be rejected, with a p-value of 6.14e-11. In the 14-day GRU comparison, which demonstrated the best predictive performance, the model including polarities won all 100 comparisons. These results strongly support the significance of the proposed feature in actual model training. This analysis confirms the effectiveness of the proposed strategy and highlights the importance of incorporating polarities into the model for better predictive performance.
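The win/loss test can be reproduced with an exact binomial calculation. The sketch below assumes a two-sided test against a win probability of 0.5, where the symmetry of the null distribution allows the upper tail to be doubled; the function name is hypothetical. For 82 wins out of 100, it yields a value on the order of 6e-11, consistent with the p-value reported above.

```python
from math import comb


def binomial_two_sided_p(wins, trials):
    """Exact two-sided p-value for H0: win probability = 0.5.

    Under H0 the distribution is symmetric, so the two-sided p-value is
    twice the tail probability of the more extreme side, capped at 1.
    """
    k = max(wins, trials - wins)
    tail = sum(comb(trials, i) for i in range(k, trials + 1))
    return min(1.0, 2 * tail / 2 ** trials)


p_lstm = binomial_two_sided_p(82, 100)   # polarity model won 82 of 100
p_gru = binomial_two_sided_p(100, 100)   # polarity model won all 100
print(p_lstm, p_gru)
```

Integer arithmetic is used for the tail sum so that the very small probabilities are not lost to floating-point rounding before the final division.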

This study also compares its results with other research methods. The ARIMA model, which makes predictions based on time series data, was selected for this comparison [19, 20]. Prior research has indicated that the ARIMA model outperforms the Holt, Splines, and TBATS models in predicting the number of confirmed cases at weekly intervals [20]. Hence, to assess performance, this study forecast the weekly count of confirmed cases and then compared the results. The comparison displays the MAPE values at weekly intervals starting from the initial prediction date [20]. The ARIMA model, which demonstrated superior accuracy in prior research, is compared with the results obtained with and without sentiment polarity. The ARIMA model was evaluated using the ARIMA (2,1,3) parameters suggested in [41], which perform adequately for forecasting the number of COVID-19 cases in Korea. Table 10 shows the MAPE values for these models during a six-week period starting from the initial prediction date, together with their average values over the entire period. On average over the entire period, the GRU model outperformed the ARIMA model in terms of MAPE (Table 10). The trials using the ARIMA model also indicate some improvement when the sentiment polarities are incorporated. Furthermore, with the exception of the data from Period 1, the models incorporating GRU and sentiment polarity consistently performed better on average. This comparison highlights the significance of taking sentiment polarity into account when making predictions: including sentiment polarity reduced the MAPE values even when it was used with the method of previous studies.


https://doi.org/10.1371/journal.pone.0309842.t010
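One possible reading of the weekly comparison is that the MAPE is computed separately for each consecutive 7-day period after the initial prediction date and then averaged over all periods. The sketch below follows that reading. It is an assumption about the procedure, not a description of it, and the function name and numbers are hypothetical.

```python
def weekly_mape(actual, predicted, days_per_period=7):
    """MAPE (in percent) for each consecutive period of `days_per_period` days.

    Assumes daily values in time order and nonzero measured values.
    """
    scores = []
    for start in range(0, len(actual), days_per_period):
        a = actual[start:start + days_per_period]
        p = predicted[start:start + days_per_period]
        scores.append(100.0 * sum(abs((x - y) / x) for x, y in zip(a, p)) / len(a))
    return scores


# Hypothetical two-week example: 10% error in week 1, 5% error in week 2.
measured = [100] * 14
forecast = [110] * 7 + [95] * 7
per_period = weekly_mape(measured, forecast)
overall_average = sum(per_period) / len(per_period)
print(per_period, overall_average)
```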

Implications & discussion

The results of this study show the effect of considering qualitative opinions in social data when predicting the number of confirmed infectious disease patients. In addition, the prediction results obtained using various ML models (DNN, LSTM, GRU) are presented. The best predictive power was obtained when the GRU model was applied to the data that included polarities. Moreover, all RNN family models yielded statistically significantly better predictive results when using the data that included polarities. According to the LSTM and GRU prediction graphs in Figs 9 and 10, obtained using the data that excluded and included polarities, the predicted values are smooth when the polarities are excluded, but the graphs trail the measured values. A trailing graph responds late to changes in the trend because each prediction stays close to the values immediately preceding it, which indicates low usefulness in real environments and can make it difficult to use the prediction results. By contrast, when the polarity is included, the graph is relatively rough, but it seems to yield predicted values that are appropriate for the timing. Thus, beyond the MAPE used as the error value, the characteristics of the graphs also favor the model with polarities. In addition, the results were compared with the ARIMA model used in previous research, and it was confirmed that the model combining GRU and sentiment polarity showed the best performance. Therefore, according to our study, better predictions are generated by considering the qualitative characteristics of social data in the prediction process. Finally, the model in this study was developed to reduce the error between the predicted and measured numbers of confirmed cases; a model that predicts rises and falls in the number of cases is expected to make it more effective in future work.

During the research process, two methods for calculating the daily polarity were proposed to predict the number of confirmed patients. The first method averaged all polarities for each day, and the second method calculated the positive and negative polarities separately. In the experiment, the prediction accuracy increased when the averages were computed separately for the positive and negative polarities and applied to the prediction model. The rationale for this method is that, in line with the central limit theorem, combining many values into a single average keeps the result near a certain level, which reduces the variation that the daily value can express. Moreover, the results were superior when multiple inputs were included. Future studies on opinion mining and sentiment analysis may therefore consider using polarities separated into positive and negative components. In this study, when applying opinion mining to social data, only the method that considered the frequency of words in the existing sentiment dictionaries was used. In future research, this part will be supplemented to reflect advanced research on opinion mining methods. With the recent advancement of NLP, many studies have been conducted in the opinion mining and sentiment analysis domains. For example, studies that measure the polarities of social data using Transformers, including BERT, are actively underway, and if these tools can analyze polarities from various angles, more useful and improved research results can be expected.

It was also meaningful to examine the data period used for predicting the number of confirmed cases. The incubation period proposed by the Korea Centers for Disease Control and Prevention was considered when determining how much previous data to include as input before generating predictions using the ML model. The Korea Centers for Disease Control and Prevention announced that the average and maximum incubation periods were 7 days and 14 days, respectively. Therefore, this study considered periods of up to 28 days: the average incubation period of 7 days, the longest incubation period of 14 days, and an additional 14-day period before the infected person was affected. According to the study results, the LSTM and GRU models yielded the best predictions when using 14-day data that included polarities. The 14-day period corresponds both to twice the average incubation period of 7 days suggested by the Korea Centers for Disease Control and Prevention and to the maximum incubation period of 14 days. These results suggest that further analysis is necessary to determine the significance of the relationship between the incubation period announced by the Korea Centers for Disease Control and Prevention and the use of social data to predict infectious diseases.

In the social data covered intensively in this study, new words and expressions appear over time owing to the characteristics of language. This study proposed a method for including these expressions in sentiment analysis by extending an existing sentiment dictionary using Word2Vec. This method can automatically collect data that reflect the changing characteristics of SNS language without needing a qualitative process involving experts. In addition, it is possible to update the sentiment dictionary to reflect newly emerging language trends and conduct sentiment analysis automatically. This feature ensures that the proposed model can be updated and applied at a future point in time. To utilize the results of this study, users can collect social data expressing positive or negative sentiment toward infectious diseases and use the extracted sentiment polarity of each text as a parameter for infectious disease prediction algorithms. To extract the sentiment polarity of each text, a sentiment dictionary must be established considering the characteristics of each language, and it is expected that analysis can be performed according to the characteristics of each country and epidemic spread. Predicting the number of confirmed cases during a pandemic can keep individuals alert, enable policymakers to plan health-related resources and personnel in advance, and support a quicker end to the pandemic by informing the planning of preventive measures.

Notwithstanding these contributions, it should be noted that the findings presented are applicable only to particular places and circumstances. This study employed qualitative aspects of social data to forecast the number of confirmed cases of infectious diseases. To ensure accurate utilization, it is important to account for the number of people engaged in social data and the regional influence of such data. Furthermore, it is important to incorporate variations in language and grammar structures, disparities in social media usage and recognition patterns, as well as cultural norms and frequency of social media engagement across different nations, since these factors can significantly impact social media dynamics and user behavior. This article presents the findings of a research endeavor that involved the development and validation of an epidemic prediction model. The model was constructed by leveraging opinion mining outcomes derived from social data in Korea, a country characterized by dense population and extensive utilization of social network services. In the future, it will be necessary to construct models using opinion mining in various languages and nations.

This study aimed to propose a methodology for predicting the number of confirmed cases of infectious diseases by using opinion mining, which allows for the inclusion of qualitative opinions from social data in epidemic prediction. To this end, about 1 million data points were collected from Twitter, and a Word2Vec model was trained on the collected social data to expand the existing sentiment dictionary for sentiment analysis. A model was then developed to predict the number of confirmed COVID-19 patients by using the calculated sentiment polarities, and predictions were generated. When sentiment polarities were used, the predictive performances of LSTM and GRU increased by 1.12% and 3%, respectively, compared to those when sentiment polarities were not used, and these differences were statistically significant. The differences were also confirmed through a binomial test on the win/loss outcomes of the two models, and the results were compared using the periodical model comparison method utilized in previous studies. Across these comparisons, using sentiment polarities from social data for prediction was shown to be more effective. Additionally, these results indicate that it is possible to predict the number of confirmed cases by continuously monitoring both the number of confirmed cases and the sentiment state.

Through continuous monitoring of social sentiment states, it is possible to develop and adjust policies that reflect changes in public perception. Policymakers can evaluate the effectiveness of policies based on real-time sentiment data and swiftly adjust them as needed to meet public demands. In addition, it is possible to prevent the spread of misinformation and gain public trust. Based on the results of social media sentiment analysis, tailored messages can be crafted and distributed to the public, and communication strategies can be established to promptly counteract misinformation.

However, the study has limitations in terms of the data and models used. Because only Twitter data were analyzed, data from other media and news were not included. As for the models, the comparative analysis presented herein considers only the DNN, LSTM, and GRU ML models. In addition, the only opinion mining method used was sentiment analysis based on the appearance frequencies of positive/negative keywords in the sentiment dictionary.

In the future, studies should collect large volumes of high-quality social data, conduct experiments using predictive models based on methods different from those used in this study, and present a model that predicts a week or longer ahead to produce practical results. In addition to sentiment analysis, other opinion mining methodologies should be examined in future work using various recently emerged models, including DL.

This study started with the aim of improving the prediction of the number of confirmed patients by incorporating sentiment polarities from social data. The results confirmed that including polarity allowed for statistically significantly higher accuracy in predictions compared to excluding polarity. While many previous studies relied solely on quantitative social data, this study highlighted the importance of qualitative opinions from social data in predicting the number of confirmed infectious disease patients. Therefore, it underscores the need for further research using social data and opinion mining in the field of infectious disease prediction.

Supporting information

S1 Data. Collected social data1.

https://doi.org/10.1371/journal.pone.0309842.s001

S1 File. Collected social data2 and Korea’s daily number of confirmed cases.

https://doi.org/10.1371/journal.pone.0309842.s002

  • 25. Liu B., Sentiment analysis and opinion mining. 2022: Springer Nature.
  • 28. Gräbner D., et al., Classification of customer reviews based on sentiment analysis, in Information and communication technologies in tourism 2012. 2012, Springer. p. 460–470.
  • 30. Singla Z., Randhawa S., and Jain S. Sentiment analysis of customer product reviews using machine learning. in 2017 international conference on intelligent computing and control (I2C2). 2017. IEEE.
  • 37. Cho K., et al., Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078, 2014.
  • 38. Chung J., et al., Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555, 2014.
  • 39. Jang H., Kim M., and Shin H. KOSAC: A full-fledged Korean sentiment analysis corpus. in Proceedings of the 27th Pacific Asia Conference on Language, Information and Computation. 2013. Waseda University.



COMMENTS

  1. Introduction to Qualitative Research Methods in Psychology

    Pearson UK, Feb 27, 2019 - Psychology - 560 pages. Now in its 4th Edition, Introduction to Qualitative Research Methods in Psychology by Dennis Howitt provides a comprehensive, practical and up to date coverage of the area. With a clear and straightforward style, the book introduces qualitative research from data collection to analysis.

  2. Introduction to Qualitative Research Methods in Psychology

    Now in its fourth edition, Introduction to Qualitative Research Methods in Psychology by Dennis Howitt provides a comprehensive, practical and up to date coverage of the area. With a clear and straightforward style, the book introduces qualitative research from data collection to analysis. Examples of real research and practical guidance for ...

  3. Introduction to qualitative research methods

    Qualitative research methods refer to techniques of investigation that rely on nonstatistical and nonnumerical methods of data collection, analysis, and evidence production. Qualitative research techniques provide a lens for learning about nonquantifiable phenomena such as people's experiences, languages, histories, and cultures.

  4. Introduction to Qualitative Research Methods in Psychology: Putting

    Introduction to Qualitative Research Methods in Psychology by Dennis Howitt provides a comprehensive, practical and up to date coverage of the area. For the fourth edition, the text has been extensively revised for easier reading and comprehension. With a clear and straightforward style, the book introduces qualitative research from data collection to analysis. Examples of real research and ...

  5. Introduction to Qualitative Research Methods in Psychology eBook PDF

    Dennis Howitt's Introduction to Qualitative Methods in Psychology is better than ever. This trusted and valuable student resources provides clear explanations and examples that take the reader through qualitative research from data collection to analysis.

  6. The SAGE Handbook of Qualitative Research in Psychology

    The Second Edition of The SAGE Handbook of Qualitative Research in Psychology provides comprehensive coverage of the qualitative methods, strategies, and research issues in psychology. Qualitative research in psychology has been transformed since the first edition's publication. Responding to this evolving field, existing chapters have been updated while three new chapters have been added on ...

  7. The SAGE Handbook of Qualitative Research in Psychology

    The SAGE Handbook of Qualitative Research in Psychology provides comprehensive coverage of the qualitative methods, strategies and research issues in psychology, combining 'how-to-do-it' summaries with an examination of historical and theoretical foundations. Examples from recent research are used to illustrate how each method has been applied ...

  8. The SAGE Handbook of Qualitative Research in Psychology

    Sage Handbook of Qualitative Research in Psychology is comprehensive and bold, celebrating the wide range of methods, approaches, perspectives and applications among qualitative research in psychology.

  9. Qualitative Psychology: A Practical Guide to Research Methods

    Undertaking qualitative research in psychology can seem like a daunting and complex process, especially when it comes to selecting the most appropriate approach for your project. This book provides a comprehensive and practical introduction to the key approaches in qualitative psychology research from a world-leading group of academics and researchers.

  10. Introduction to Qualitative Research Methods in Psychology

    Only 1 left in stock - order soon. Now in its third edition, Dennis Howitt's Introduction to Qualitative Methods in Psychology is better than ever. This trusted and valuable student resources provides clear explanations and examples that take the reader through qualitative research from data collection to analysis.

  11. Teaching Qualitative Research Methods in Psychology: An Introduction to

    The Status of Qualitative Methods in Psychology Qualitative approaches have always formed an important part of psychology's methodological toolkit. As Dennis Howitt (2010) reminds us in his recent introductory text on qualitative methods in psychology, the suggestion that, until relatively recently, 'mainstream psychology was a quantitative monolith smothering any other perspective on what ...

  12. PDF Introduction to Qualitative Research Methods in Psychology

    INTRODUCTION TO QUALITATIVE RESEARCH METHODS IN PSYCHOLOGY: PUTTING THEORY INTO PRACTICE. Step 2. Rough transcription. Study box which provides basic advice on how a transcription should be laid out, and also look at the transcription provided earlier. Remember that these are style guidelines and that some things are probably better left until last. Inserting ...

  13. APA Handbook of Research Methods in Psychology

    Additional chapters cover various aspects of quantitative, qualitative, neuropsychological, and biological research designs, presenting an array of options and their nuanced distinctions. Chapters on techniques for data analysis follow, and important issues in writing up research to share with the community of psychologists are discussed in the handbook's concluding chapters.

  14. Introducing Qualitative Research in Psychology, 3rd edition

    Introducing Qualitative Research in Psychology is a vital resource for students new to qualitative psychology. It offers a clear introduction to the topic by taking eight different approaches to qualitative methods and explaining when each one should be used, the procedures and techniques involved, and any limitations associated with such research.

  15. Qualitative Research Methods: A Practice-Oriented Introduction

    The book examines questions such as why people do such research, how they go about doing it, what results it leads to, and how results can be presented in a plausible and useful way. Its ...

  16. PDF Introduction to Qualitative Research Methods in Psychology Putting

    In accord with contemporary philosophy of science, scientific activity - that is, research - is a means of organizing, sifting, and mak-ing sense in relation to a phenomenon of interest. In qualitative psychology, our science is a collective effort to understand people in the contexts in which they live and function.

  17. PDF Qualitative Research Methods in Psychology

    In the scientific community, and particularly in psychology and health, there has been an active and ongoing debate on the relative merits of adopting either quantitative or qualitative methods, especially when researching into human behaviour (Bowling, 2009; Oakley, 2000; Smith, 1995a, 1995b; Smith, 1998).

  18. Essentials of Qualitative Methods Series

    This book offers a no-nonsense, step-by-step approach to qualitative research in psychology and related fields, presenting principles for using a generic approach to descriptive-interpretive qualitative research.

  19. Research Methods in Psychology

    Research Methods in Psychology. Comprehensive, clear, and practical, Introduction to Research Methods in Psychology is the essential student guide to understanding and undertaking quantitative and qualitative research in psychology. Revised throughout, this new edition includes a new chapter on 'Managing your research project'.

  20. Introduction to Research Methods in Psychology

    There are several different research methods in psychology, each of which can help researchers learn more about the way people think, feel, and behave. If you're a psychology student or just want to know the types of research in psychology, here are the main ones as well as how they work.

  21. Introduction to qualitative methods in psychology

    by Howitt, Dennis Publication date 2010 Topics Psychology -- Research, Psychology -- Research -- Methodology Publisher Harlow, England : Financial Times Prentice Hall Collection internetarchivebooks; inlibrary; printdisabled Contributor Internet Archive Language English Item Size 1.3G xxii, 464 pages : 27 cm Includes bibliographical references ...

  22. (PDF) Qualitative Research Methods in Psychology

    Qualitative Research Methods in Psychology 177. More recently, in the UK, the British Psychological Society now has a members' section for. Qualitative Methods in Psychology (QMiP) which held a ...

  23. Qualitative Research for Intervention Development and Evaluation:

    Topics in Psychology. Explore how scientific research by psychologists can inform our professional lives, family and community relationships, emotional wellness, and more. ... Qualitative Research for Intervention Development and Evaluation: ... Emphasizes the basics of classic grounded theory and shows how the original tenets of the method ...

  24. Prediction of infectious diseases using sentiment analysis on social

    As the influence and risk of infectious diseases increase, efforts are being made to predict the number of confirmed infectious disease patients, but research involving the qualitative opinions of social media users is scarce. However, social data can change the psychology and behaviors of crowds through information dissemination, which can affect the spread of infectious diseases.