

Top 10 real-world data science case studies.


Aditya Sharma

Aditya is a content writer with 5+ years of experience writing for various industries including Marketing, SaaS, B2B, IT, and Edtech among others. You can find him watching anime or playing games when he’s not writing.

Frequently Asked Questions

Real-world data science case studies differ significantly from academic examples. While academic exercises often feature clean, well-structured data and simplified scenarios, real-world projects tackle messy, diverse data sources with practical constraints and genuine business objectives. These case studies reflect the complexities data scientists face when translating data into actionable insights in the corporate world.

Real-world data science projects come with common challenges. Data quality issues, including missing or inaccurate data, can hinder analysis. Domain expertise gaps may result in misinterpretation of results. Resource constraints might limit project scope or access to necessary tools and talent. Ethical considerations, like privacy and bias, demand careful handling.

Lastly, as data and business needs evolve, data science projects must adapt and stay relevant, posing an ongoing challenge.

Real-world data science case studies play a crucial role in helping companies make informed decisions. By analyzing their own data, businesses gain valuable insights into customer behavior, market trends, and operational efficiencies.

These insights empower data-driven strategies, aiding in more effective resource allocation, product development, and marketing efforts. Ultimately, case studies bridge the gap between data science and business decision-making, enhancing a company's ability to thrive in a competitive landscape.

Key takeaways from these case studies for organizations include the importance of cultivating a data-driven culture that values evidence-based decision-making. Investing in robust data infrastructure is essential to support data initiatives. Collaborating closely between data scientists and domain experts ensures that insights align with business goals.

Finally, continuous monitoring and refinement of data solutions are critical for maintaining relevance and effectiveness in a dynamic business environment. Embracing these principles can lead to tangible benefits and sustainable success in real-world data science endeavors.

Data science is a powerful driver of innovation and problem-solving across diverse industries. By harnessing data, organizations can uncover hidden patterns, automate repetitive tasks, optimize operations, and make informed decisions.

In healthcare, for example, data-driven diagnostics and treatment plans improve patient outcomes. In finance, predictive analytics enhances risk management. In transportation, route optimization reduces costs and emissions. Data science empowers industries to innovate and solve complex challenges in ways that were previously unimaginable.


10 Real-World Data Science Case Study Projects with Examples

Top 10 Data Science Case Studies Projects with Examples and Solutions in Python to inspire your data science learning in 2023.


Data science has been a trending buzzword in recent times. With wide applications in sectors like healthcare, education, retail, transportation, media, and banking, data science is at the core of pretty much every industry out there. The possibilities are endless: fraud analysis in the finance sector, personalized recommendations in eCommerce, and much more. We have developed ten exciting data science case studies to explain how data science is leveraged across various industries to make smarter decisions and develop innovative, personalized products tailored to specific customers.


Walmart Sales Forecasting Data Science Project


Table of Contents

Data Science Case Studies in Retail
Data Science Case Study Examples in the Entertainment Industry
Data Analytics Case Study Examples in the Travel Industry
Case Studies for Data Analytics in Social Media
Real-World Data Science Projects in Healthcare
Data Analytics Case Studies in Oil and Gas
What is a Case Study in Data Science?
How Do You Prepare a Data Science Case Study?
10 Most Interesting Data Science Case Studies with Examples


So, without much ado, let's get started with data science business case studies!

With humble beginnings as a simple discount retailer, today Walmart operates 10,500 stores and clubs in 24 countries, plus eCommerce websites, employing around 2.2 million people around the globe. For the fiscal year ended January 31, 2021, Walmart's total revenue was $559 billion, a growth of $35 billion driven by the expansion of the eCommerce sector. Walmart is a data-driven company that works on the principle of 'Everyday low cost' for its consumers. To achieve this goal, it depends heavily on the advances of its data science and analytics department for research and development, also known as Walmart Labs. Walmart is home to the world's largest private cloud, which can manage 2.5 petabytes of data every hour. To analyze this humongous amount of data, Walmart has created 'Data Café,' a state-of-the-art analytics hub located within its Bentonville, Arkansas headquarters. The Walmart Labs team heavily invests in building and managing technologies like cloud, data, DevOps, infrastructure, and security.


Walmart is experiencing massive digital growth as the world's largest retailer. Walmart has been leveraging big data and advances in data science to build solutions that enhance, optimize, and customize the shopping experience and serve its customers better. At Walmart Labs, data scientists focus on creating data-driven solutions that power the efficiency and effectiveness of complex supply chain management processes. Here are some of the applications of data science at Walmart:

i) Personalized Customer Shopping Experience

Walmart analyses customer preferences and shopping patterns to optimize the stocking and displaying of merchandise in its stores. Analysis of big data also helps it understand new item sales, decide which products to discontinue, and evaluate brand performance.

ii) Order Sourcing and On-Time Delivery Promise

Millions of customers view items on Walmart.com, and Walmart provides each customer a real-time estimated delivery date for the items purchased. Walmart runs a backend algorithm that estimates this based on the distance between the customer and the fulfillment center, inventory levels, and shipping methods available. The supply chain management system determines the optimum fulfillment center based on distance and inventory levels for every order. It also has to decide on the shipping method to minimize transportation costs while meeting the promised delivery date.
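The sourcing decision described above can be sketched as a filter-and-minimize step: keep only the centers that can fulfill the whole order, then pick the closest one. The center names, distances, and stock levels below are hypothetical; a real system would also weigh shipping methods, costs, and the promised delivery date:

```python
def choose_fulfillment_center(order_items, centers):
    """Pick the nearest fulfillment center that has every ordered
    item in stock. Returns the center name, or None if no single
    center can fulfill the order."""
    eligible = [
        c for c in centers
        if all(c["stock"].get(item, 0) > 0 for item in order_items)
    ]
    if not eligible:
        return None
    return min(eligible, key=lambda c: c["distance_km"])["name"]

centers = [
    {"name": "DC-East", "distance_km": 120, "stock": {"tv": 3, "toaster": 0}},
    {"name": "DC-West", "distance_km": 300, "stock": {"tv": 5, "toaster": 7}},
]
print(choose_fulfillment_center(["tv", "toaster"], centers))  # DC-West
```

DC-East is closer but has no toasters in stock, so the order is sourced from DC-West.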


iii) Packing Optimization 

Packing optimization, also known as box recommendation, is a daily occurrence in the shipping of items in the retail and eCommerce business. When the items of an order, or of multiple orders for the same customer, are ready for packing, Walmart's recommender system picks the best-sized box that holds all the ordered items with the least wasted in-box space, within a fixed amount of time. This is the Bin Packing problem, a classic NP-hard problem familiar to data scientists.

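Because exact bin packing is NP-hard, practical systems rely on fast heuristics. Here is a minimal sketch of the classic first-fit-decreasing heuristic in Python; the item volumes and box capacity are made up for illustration, and a production system would also account for item dimensions, weights, and fragility:

```python
def first_fit_decreasing(item_volumes, box_capacity):
    """Greedy first-fit-decreasing heuristic for bin packing:
    sort items largest-first, then place each into the first box
    with room, opening a new box only when none fits."""
    boxes = []
    for vol in sorted(item_volumes, reverse=True):
        for box in boxes:
            if sum(box) + vol <= box_capacity:
                box.append(vol)
                break
        else:
            boxes.append([vol])  # no existing box fits: open a new one
    return boxes

packed = first_fit_decreasing([4, 8, 1, 4, 2, 1], box_capacity=10)
print(len(packed))  # 2 boxes
```

First-fit-decreasing is guaranteed to use at most about 22% more boxes than the optimal packing, which is why greedy heuristics like this are a common starting point.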

Here is a link to a sales prediction data science case study to help you understand the applications of data science in the real world. The Walmart Sales Forecasting Project uses historical sales data for 45 Walmart stores located in different regions. Each store contains many departments, and you must build a model to project the sales for each department in each store. This data science case study aims to create a predictive model to predict the sales of each product. You can also try your hand at the Inventory Demand Forecasting Data Science Project to develop a machine learning model that forecasts inventory demand accurately based on historical sales data.


Amazon is an American multinational technology company based in Seattle, USA. It started as an online bookseller, but today it focuses on eCommerce, cloud computing, digital streaming, and artificial intelligence. It hosts an estimated 1,000,000,000 gigabytes of data across more than 1,400,000 servers. Through its constant innovation in data science and big data, Amazon stays ahead in understanding its customers. Here are a few data analytics case study examples at Amazon:

i) Recommendation Systems

Data science models help Amazon understand customers' needs and recommend products before the customer even searches for them; this model uses collaborative filtering. Amazon uses data from 152 million customer purchases to help users decide which products to buy. The company generates 35% of its annual sales through its recommendation-based system (RBS).

Here is a Recommender System Project to help you build a recommendation system using collaborative filtering. 
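To see the core idea of collaborative filtering in a few lines, here is a minimal pure-Python sketch of the user-based variant: score items a target user has not seen by the ratings of similar users, weighted by similarity. The users, items, and ratings below are invented; production systems like Amazon's operate on item-to-item similarities over vastly larger, sparser data:

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse rating dicts {item: rating}."""
    common = set(u) & set(v)
    num = sum(u[i] * v[i] for i in common)
    den = (math.sqrt(sum(x * x for x in u.values()))
           * math.sqrt(sum(x * x for x in v.values())))
    return num / den if den else 0.0

def recommend(target, others, k=2):
    """Rank items the target user has not rated by similarity-weighted
    ratings from other users."""
    scores = {}
    for other in others:
        sim = cosine(target, other)
        for item, rating in other.items():
            if item not in target:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)[:k]

alice = {"book": 5, "kindle": 4}
neighbours = [
    {"book": 5, "kindle": 5, "lamp": 4},  # very similar to alice
    {"book": 1, "mug": 5},                # quite different tastes
]
print(recommend(alice, neighbours))  # ['lamp', 'mug']
```

The "lamp" ranks first because it was rated highly by the neighbour whose tastes most resemble Alice's.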

ii) Retail Price Optimization

Amazon's product prices are optimized by a predictive model that determines the best price so that users are not put off by it, while also accounting for how today's price will affect customers' future buying patterns. The price of a product is determined by your activity on the website, competitors' pricing, product availability, item preferences, order history, expected profit margin, and other factors.

Check Out this Retail Price Optimization Project to build a Dynamic Pricing Model.
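At its simplest, price optimization means searching for the price that maximizes expected profit under a demand model. The linear demand curve, candidate prices, and unit cost below are toy assumptions, not Amazon's actual model:

```python
def expected_demand(price, base_demand=100.0, elasticity=2.0, reference_price=20.0):
    """Toy linear demand curve: each unit of price above the reference
    price loses `elasticity` units of demand (never below zero)."""
    return max(base_demand - elasticity * (price - reference_price), 0.0)

def best_price(candidate_prices, unit_cost):
    """Pick the candidate price maximizing expected profit,
    i.e. demand times per-unit margin."""
    return max(candidate_prices, key=lambda p: expected_demand(p) * (p - unit_cost))

prices = [15, 20, 25, 30, 35, 40, 45]
print(best_price(prices, unit_cost=10))  # 40
```

With this demand curve, profit is (140 - 2p)(p - 10), which peaks at p = 40: raising the price further would lose more sales volume than the extra margin is worth.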

iii) Fraud Detection

Being a significant eCommerce business, Amazon remains at high risk of retail fraud. As a preemptive measure, the company collects historical and real-time data for every order and uses machine learning algorithms to find transactions with a higher probability of being fraudulent. This proactive measure has helped the company restrict clients with an excessive number of product returns.

You can look at this Credit Card Fraud Detection Project to implement a fraud detection model to classify fraudulent credit card transactions.
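A useful intuition for fraud detection is that fraudulent transactions are outliers relative to normal behavior. The sketch below flags orders whose amount is far from the mean in standard-deviation terms; this statistical rule is only a stand-in for the supervised models a real fraud pipeline would train on labeled transactions:

```python
import statistics

def flag_suspicious(amounts, threshold=2.0):
    """Flag transaction amounts more than `threshold` population
    standard deviations from the mean (a simple z-score rule)."""
    mean = statistics.mean(amounts)
    stdev = statistics.pstdev(amounts)
    return [a for a in amounts if stdev and abs(a - mean) / stdev > threshold]

orders = [20, 22, 19, 21, 23, 20, 500]
print(flag_suspicious(orders))  # [500]
```

The $500 order sits roughly 2.4 standard deviations above the mean of this batch, so it is flagged for review while the routine orders pass.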


Let us explore data analytics case study examples in the entertainment industry.


Netflix started as a DVD rental service in 1997 and has since expanded into the streaming business. Headquartered in Los Gatos, California, Netflix is the largest content streaming company in the world. Currently, Netflix has over 208 million paid subscribers worldwide, and with streaming supported on thousands of smart devices, Netflix logs around 3 billion hours watched every month. The secret to Netflix's massive growth and popularity is its advanced use of data analytics and recommendation systems to provide personalized, relevant content recommendations to its users. Netflix collects data on over 100 billion events every day. Here are a few examples of data analysis case studies applied at Netflix:

i) Personalized Recommendation System

Netflix uses over 1,300 recommendation clusters based on consumer viewing preferences to provide a personalized experience. The data Netflix collects from its users includes viewing time, keyword searches on the platform, and metadata related to content abandonment, such as pause time, rewinds, and rewatches. Using this data, Netflix can predict what a viewer is likely to watch and present a personalized watchlist. Some of the algorithms used by the Netflix recommendation system are the Personalized Video Ranker, the Trending Now ranker, and the Continue Watching ranker.

ii) Content Development using Data Analytics

Netflix uses data science to analyze its users' behavior and viewing patterns to recognize themes and categories that audiences prefer to watch. This data was used to produce shows like The Umbrella Academy, Orange Is the New Black, and The Queen's Gambit. Such shows might seem like huge risks, but they were backed by data analytics that assured Netflix they would succeed with its audience. Data analytics helps Netflix come up with content its viewers want to watch even before they know they want to watch it.

iii) Marketing Analytics for Campaigns

Netflix uses data analytics to find the right time to launch shows and ad campaigns for maximum impact on the target audience. Marketing analytics also helps Netflix come up with different trailers and thumbnails for different groups of viewers. For example, the House of Cards Season 5 trailer with a giant American flag was launched during the American presidential elections, as it would resonate well with the audience.

Here is a Customer Segmentation Project using association rule mining to understand the primary grouping of customers based on various parameters.


In a world where purchasing music is a thing of the past and streaming is the current trend, Spotify has emerged as one of the most popular streaming platforms. With 320 million monthly users, around 4 billion playlists, and approximately 2 million podcasts, Spotify leads the pack among well-known streaming platforms like Apple Music, Wynk, Songza, and Amazon Music. Spotify's success has depended largely on data analytics: by analyzing massive volumes of listener data, it provides real-time, personalized services to its listeners. Most of Spotify's revenue comes from paid premium subscriptions. Here are some examples of data analytics case studies used by Spotify to provide enhanced services to its listeners:

i) Personalization of Content using Recommendation Systems

Spotify uses BaRT (Bandits for Recommendations as Treatments) to generate music recommendations for its listeners in real time. BaRT ignores any song a user listens to for less than 30 seconds, and the model is retrained every day to provide updated recommendations. A patent recently granted to Spotify covers an AI application that identifies a user's musical tastes from audio signals and speech attributes such as gender, age, and accent, to make better music recommendations.

Spotify creates daily playlists for its listeners based on their taste profiles, called 'Daily Mixes,' which feature songs the user has added to their playlists or songs by artists the user has included in their playlists. They also include new artists and songs the user might be unfamiliar with but that might fit the playlist. Similar is the weekly 'Release Radar' playlist, which contains newly released songs from artists the listener follows or has liked before.
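The 30-second rule mentioned above is easy to express in code: only plays long enough to count as a genuine listen feed the taste profile. The track names and durations below are made up for illustration:

```python
def listening_signals(play_events, min_seconds=30):
    """Count plays per track, ignoring any play shorter than
    `min_seconds` (mirroring the skip-filtering rule described
    in the text)."""
    counts = {}
    for track, seconds in play_events:
        if seconds >= min_seconds:
            counts[track] = counts.get(track, 0) + 1
    return counts

events = [("song_a", 210), ("song_b", 12), ("song_a", 45), ("song_c", 29)]
print(listening_signals(events))  # {'song_a': 2}
```

Quick skips ("song_b" at 12 seconds, "song_c" at 29) are treated as noise rather than negative or positive preference signals.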

ii) Targeted Marketing through Customer Segmentation

Beyond enhancing personalized song recommendations, Spotify uses this massive dataset for targeted ad campaigns and personalized service recommendations. Spotify uses ML models to analyze listener behavior and group listeners based on music preferences, age, gender, ethnicity, and other attributes. These insights help it create ad campaigns for specific target audiences. One of its well-known ad campaigns was the meme-inspired ads for potential target customers, which were a huge success globally.

iii) CNNs for Classification of Songs and Audio Tracks

Spotify builds audio models to evaluate songs and tracks, which helps develop better playlists and recommendations for its users. These allow Spotify to filter new tracks based on their lyrics and rhythms and recommend them to users who like similar tracks (collaborative filtering). Spotify also uses NLP (natural language processing) to scan articles and blogs and analyze the words used to describe songs and artists. These analytical insights help group and identify similar artists and songs, which can then be leveraged to build playlists.

Here is a Music Recommender System Project for you to start learning. We have listed another music recommendations dataset for you to use in your projects: Dataset1. You can use this dataset of Spotify metadata to classify songs based on artist, mood, and liveliness. Plot histograms and heatmaps to get a better understanding of the dataset, then use classification algorithms like logistic regression and SVM, along with principal component analysis, to generate valuable insights from the dataset.
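Before reaching for logistic regression or SVMs, it helps to see how little machinery song classification needs in principle. Here is a nearest-centroid classifier over two illustrative audio features (say, energy and valence); the labels, feature values, and class names are invented for the sketch:

```python
import math

def centroid(rows):
    """Per-dimension mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def nearest_centroid_classify(features, labeled):
    """Assign `features` to the class whose mean feature vector
    is closest in Euclidean distance."""
    best, best_dist = None, float("inf")
    for label, rows in labeled.items():
        dist = math.dist(features, centroid(rows))
        if dist < best_dist:
            best, best_dist = label, dist
    return best

training = {
    "upbeat": [[0.9, 0.8], [0.8, 0.9], [0.85, 0.75]],
    "mellow": [[0.2, 0.3], [0.3, 0.2], [0.25, 0.35]],
}
print(nearest_centroid_classify([0.7, 0.8], training))  # upbeat
```

Real audio models work on far richer representations (spectrograms fed to CNNs), but the same idea of mapping a track's features to the nearest learned prototype carries over.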


Below you will find case studies for data analytics in the travel and tourism industry.

Airbnb was born in 2007 in San Francisco and has since grown to 4 million hosts and 5.6 million listings worldwide, which have welcomed more than 1 billion guest arrivals in almost every country across the globe. Airbnb is active in every country on the planet except Iran, Sudan, Syria, and North Korea, around 97.95% of the world's countries. Treating data as the voice of its customers, Airbnb uses its large volume of customer reviews and host inputs to understand trends across communities, rate user experiences, and make informed decisions to build a better business model. The data scientists at Airbnb develop exciting new solutions to boost the business and find the best matches between customers and hosts. Airbnb's data servers handle approximately 10 million requests and around one million search queries a day, enabling personalized services that create a perfect match between guests and hosts for a supreme customer experience.

i) Recommendation Systems and Search Ranking Algorithms

Airbnb helps people find 'local experiences' in a place with the help of search algorithms that make searches and listings precise. Airbnb uses a 'listing quality score' to rank homes based on proximity to the searched location and previous guest reviews. Airbnb uses deep neural networks to build models that take a guest's earlier stays and area information into account to find a perfect match. The search algorithms are optimized on guest and host preferences, rankings, pricing, and availability to understand users' needs and provide the best match possible.

ii) Natural Language Processing for Review Analysis

Airbnb characterizes data as the voice of its customers. Customer and host reviews give direct insight into the experience, and star ratings alone cannot capture that experience quantitatively. Hence, Airbnb uses natural language processing to understand reviews and the sentiment behind them. The NLP models are developed using convolutional neural networks.

Practice this Sentiment Analysis Project for analyzing product reviews to understand the basic concepts of natural language processing.
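The simplest possible baseline for review sentiment is lexicon counting: tally positive versus negative words. The word lists below are tiny illustrative assumptions, nothing like the CNN-based models the text describes, but they show what "the sentiment behind a review" means operationally:

```python
# Hypothetical mini-lexicons; real sentiment lexicons contain thousands of entries.
POSITIVE = {"great", "clean", "friendly", "comfortable", "recommend"}
NEGATIVE = {"dirty", "rude", "noisy", "broken", "awful"}

def review_sentiment(text):
    """Score a review by counting positive and negative lexicon hits;
    the sign of the net count gives the label."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(review_sentiment("Great host, clean and comfortable room"))  # positive
```

Lexicon counting fails on negation and sarcasm ("not great at all"), which is exactly why neural models that read words in context replaced it in production systems.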

iii) Smart Pricing using Predictive Analytics

The Airbnb host community uses the service as a source of supplementary income. Vacation homes and guest houses rented to customers raise local community earnings, as Airbnb guests stay 2.4 times longer and spend approximately 2.3 times more money than hotel guests, a significant positive impact on the local neighborhood. Airbnb uses predictive analytics to predict listing prices and help hosts set competitive, optimal prices. A host's overall profitability depends on factors like the time invested and responsiveness to changing demand across seasons. The factors that drive real-time smart pricing are the listing's location, proximity to transport options, the season, and the amenities available in the neighborhood.

Here is a Price Prediction Project to help you understand the concept of predictive analysis, which is widely used in case studies for data analytics.

Uber is the biggest global taxi service provider. As of December 2018, Uber had 91 million monthly active consumers and 3.8 million drivers, completing 14 million trips each day. Uber uses data analytics and big-data-driven technologies to optimize its business processes and provide enhanced customer service. The data science team at Uber constantly explores new technologies to provide better service. Machine learning and data analytics help Uber make data-driven decisions that enable features like ride-sharing, dynamic price surges, better customer support, and demand forecasting. Here are some of the real-world data science projects used by Uber:

i) Dynamic Pricing for Price Surges and Demand Forecasting

Uber's prices change at peak hours based on demand. Uber uses surge pricing to encourage more cab drivers to sign up with the company and meet passenger demand. When prices increase, both the driver and the passenger are informed about the surge. Uber uses a patented predictive model for price surging called 'Geosurge,' based on the demand for rides and the location.
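The essence of surge pricing is a multiplier driven by the demand/supply ratio. The rule below is a toy sketch with invented numbers and a cap; Uber's patented Geosurge model is far more sophisticated, forecasting demand per location rather than reacting to a single ratio:

```python
def surge_multiplier(ride_requests, available_drivers, cap=3.0):
    """Toy dynamic-pricing rule: the multiplier equals the
    demand/supply ratio, floored at 1.0 (no discount) and capped."""
    if available_drivers == 0:
        return cap
    ratio = ride_requests / available_drivers
    return round(min(max(ratio, 1.0), cap), 2)

print(surge_multiplier(150, 100))  # 1.5  (demand outstrips supply)
print(surge_multiplier(80, 100))   # 1.0  (enough drivers: no surge)
```

The cap matters in practice: unbounded multipliers during emergencies caused public backlash, so real systems bound how far prices can rise.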

ii) One-Click Chat

Uber has developed a machine learning and natural language processing solution called one-click chat (OCC) for coordination between drivers and riders. The feature anticipates responses to commonly asked questions, making it easy for drivers to respond to customer messages with the click of just one button. One-click chat is built on Uber's machine learning platform, Michelangelo, to perform NLP on rider chat messages and generate appropriate responses.

iii) Customer Retention

Failure to meet customer demand for cabs can lead users to opt for other services. Uber uses machine learning models to bridge this demand-supply gap: by predicting demand in any location, Uber retains its customers. Uber also uses a tier-based reward system that segments customers into levels based on usage; the higher the level a user achieves, the better the perks. Uber also provides personalized destination suggestions based on the user's history and frequently traveled destinations.

You can take a look at this Python Chatbot Project and build a simple chatbot application to better understand the techniques used in natural language processing. You can also practice building a demand forecasting model with this project using time series analysis, or look at this project, which uses time series forecasting and clustering on a dataset containing geospatial data to forecast customer demand for Ola rides.
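Any time-series demand model should first beat a trivial baseline. Here is the moving-average forecast that serves as that baseline; the hourly ride counts are invented for illustration:

```python
def moving_average_forecast(hourly_demand, window=3):
    """Forecast next-hour demand as the mean of the last `window`
    observations: the classic naive baseline for time series."""
    recent = hourly_demand[-window:]
    return sum(recent) / len(recent)

demand = [120, 130, 125, 140, 150, 160]  # rides per hour
print(moving_average_forecast(demand))   # 150.0
```

If an ARIMA or gradient-boosted model cannot out-forecast this one-liner on held-out hours, it is not adding value, which is why baselines like this are always worth computing first.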


7) LinkedIn 

LinkedIn is the largest professional social networking site with nearly 800 million members in more than 200 countries worldwide. Almost 40% of the users access LinkedIn daily, clocking around 1 billion interactions per month. The data science team at LinkedIn works with this massive pool of data to generate insights to build strategies, apply algorithms and statistical inferences to optimize engineering solutions, and help the company achieve its goals. Here are some of the real world data science projects at LinkedIn:

i) LinkedIn Recruiter Implement Search Algorithms and Recommendation Systems

LinkedIn Recruiter helps recruiters build and manage a talent pool to optimize the chances of hiring candidates successfully. This sophisticated product works on search and recommendation engines. LinkedIn Recruiter handles complex queries and filters on a constantly growing, large dataset, and the results delivered have to be relevant and specific. The initial search model was based on linear regression but was eventually upgraded to gradient-boosted decision trees to capture non-linear correlations in the dataset. In addition to these models, LinkedIn Recruiter also uses a generalized linear mixed model to improve prediction and deliver personalized results.

ii) Recommendation Systems Personalized for News Feed

The LinkedIn news feed is the heart and soul of the professional community. A member's newsfeed is a place to discover conversations among connections, career news, posts, suggestions, photos, and videos. Every time a member visits LinkedIn, machine learning algorithms identify the best exchanges to be displayed on the feed by sorting through posts and ranking the most relevant results on top. The algorithms help LinkedIn understand member preferences and help provide personalized news feeds. The algorithms used include logistic regression, gradient boosted decision trees and neural networks for recommendation systems.

iii) CNNs to Detect Inappropriate Content

Providing a professional space where people can trust and express themselves safely has been a critical goal at LinkedIn. LinkedIn has invested heavily in building solutions to detect fake accounts and abusive behavior on its platform. Any form of spam, harassment, or inappropriate content is immediately flagged and taken down; such content can range from profanity to advertisements for illegal services. LinkedIn uses a machine learning model based on convolutional neural networks, trained on a dataset of accounts labeled either "inappropriate" or "appropriate." The inappropriate list consists of accounts containing "blocklisted" phrases or words, plus a small portion of manually reviewed accounts reported by the user community.

Here is a Text Classification Project to help you understand NLP basics for text classification. You can find a news recommendation system dataset to help you build a personalized news recommender system. You can also use this dataset to build a classifier using logistic regression, Naive Bayes, or Neural networks to classify toxic comments.
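Before a CNN is trained, the labeling pipeline described above can be approximated by simple rules: blocklisted phrases plus community reports. The blocklist entries and threshold below are hypothetical placeholders, not LinkedIn's actual lists:

```python
# Hypothetical blocklist; real moderation lists are large and curated.
BLOCKLIST = {"scam-offer", "illegal-service-ad"}

def is_inappropriate(profile_text, manual_reports=0, report_threshold=3):
    """Rule-based stand-in for a content classifier: flag a profile
    if it contains a blocklisted phrase, or if enough community
    reports have accumulated against it."""
    words = set(profile_text.lower().split())
    return bool(words & BLOCKLIST) or manual_reports >= report_threshold

print(is_inappropriate("buy now scam-offer guaranteed"))   # True
print(is_inappropriate("data scientist at acme corp"))     # False
```

Rules like these typically generate the initial labeled dataset; the neural classifier then generalizes beyond the exact blocklisted phrases to paraphrases and obfuscated spellings.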


Pfizer is a multinational pharmaceutical company headquartered in New York, USA, and one of the largest pharmaceutical companies globally, known for developing a wide range of medicines and vaccines in disciplines like immunology, oncology, cardiology, and neurology. Pfizer became a household name in 2020, when its COVID-19 vaccine was the first to receive FDA emergency use authorization. In early November 2021, the CDC approved the Pfizer vaccine for kids aged 5 to 11. Pfizer has been using machine learning and artificial intelligence to develop drugs and streamline trials, which played a massive role in developing and deploying the COVID-19 vaccine. Here are a few data analytics case studies by Pfizer:

i) Identifying Patients for Clinical Trials

Artificial intelligence and machine learning are used to streamline and optimize clinical trials and increase their efficiency. Natural language processing and exploratory data analysis of patient records can help identify suitable patients for clinical trials, for example those with distinct symptoms. They can also help examine interactions with potential trial members' specific biomarkers and predict drug interactions and side effects, which helps avoid complications. Pfizer's AI implementation helped rapidly identify signals within the noise of millions of data points across its 44,000-candidate COVID-19 clinical trial.

ii) Supply Chain and Manufacturing

Data science and machine learning techniques help pharmaceutical companies better forecast demand for vaccines and drugs and distribute them efficiently. Machine learning models can help identify efficient supply systems by automating and optimizing production steps, making it possible to supply drugs customized to small pools of patients with specific gene profiles. Pfizer uses machine learning to predict the maintenance costs of its equipment; predictive maintenance using AI is the next big step for pharmaceutical companies looking to reduce costs.
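Predictive maintenance boils down to extrapolating a degradation trend to estimate remaining useful life. The sketch below fits a least-squares line to equally spaced sensor readings and asks when it crosses a failure threshold; the vibration values, units, and threshold are invented for illustration:

```python
def hours_until_threshold(readings, failure_level):
    """Fit a least-squares line to hourly sensor readings and
    extrapolate how many hours remain until the trend crosses
    `failure_level`. Returns None if the readings are not rising."""
    n = len(readings)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(readings) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, readings))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    if slope <= 0:
        return None  # no upward trend: no failure predicted
    crossing_time = (failure_level - intercept) / slope
    return crossing_time - (n - 1)  # hours beyond the last reading

vibration = [1.0, 1.2, 1.4, 1.6, 1.8]  # rising ~0.2 units per hour
print(hours_until_threshold(vibration, failure_level=3.0))  # 6.0
```

Scheduling the repair just before the predicted crossing, rather than on a fixed calendar, is what saves the maintenance cost the text refers to.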

iii) Drug Development

Computer simulations of proteins, tests of their interactions, and yield analysis help researchers develop and test drugs more efficiently. In 2016, IBM Watson Health and Pfizer announced a collaboration to use IBM Watson for Drug Discovery to help accelerate Pfizer's research in immuno-oncology, an approach to cancer treatment that uses the body's immune system to help fight cancer. Deep learning models have recently been used for bioactivity and synthesis prediction for drugs and vaccines, in addition to molecular design. Deep learning has been a revolutionary technique for drug discovery, as it factors in everything from new applications of medications to possible toxic reactions, which can save millions in drug trials.

You can create a machine learning model to predict molecular activity to help design medicine using this dataset. You may build a CNN or a deep neural network for this data analyst case study project.


9) Shell Data Analyst Case Study Project

Shell is a global group of energy and petrochemical companies with over 80,000 employees in around 70 countries. Shell uses advanced technologies and innovations to help build a sustainable energy future, and it is going through a significant transition to become a clean energy company by 2050 as the world needs more, and cleaner, energy solutions. This requires substantial changes in the way energy is used. Digital technologies, including AI and machine learning, play an essential role in this transformation: they enable more efficient exploration and energy production, more reliable manufacturing, more nimble trading, and a personalized customer experience. Using AI across the organization helps Shell achieve this goal and stay competitive in the market. Here are a few data analytics case studies in the petrochemical industry:

i) Precision Drilling

Shell's operations span the oil and gas supply chain, from mining hydrocarbons to refining fuel to retailing it to customers. Recently, Shell has applied reinforcement learning to control the drilling equipment used in mining. Reinforcement learning works on a reward system based on the outcome of the AI model. The algorithm is designed to guide drills as they move through the subsurface, based on historical data from drilling records, including the size of drill bits, temperatures, pressures, and knowledge of seismic activity. This model helps the human operator understand the environment better, leading to better and faster results with minor damage to the machinery used.
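The reward-based loop at the heart of reinforcement learning can be shown with the simplest possible setting, a multi-armed bandit: try actions, observe rewards, and shift toward the action with the best running-average payoff. The three actions and their reward probabilities below are toy stand-ins, nothing to do with Shell's actual drilling controls:

```python
import random

def run_bandit(reward_probs, steps=5000, epsilon=0.1, seed=42):
    """Epsilon-greedy multi-armed bandit: with probability epsilon
    explore a random action, otherwise exploit the action with the
    highest estimated reward; estimates are incremental running means.
    Returns the index of the action the agent believes is best."""
    rng = random.Random(seed)
    counts = [0] * len(reward_probs)
    values = [0.0] * len(reward_probs)
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(len(reward_probs))  # explore
        else:
            arm = values.index(max(values))         # exploit
        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # running mean
    return values.index(max(values))

print(run_bandit([0.2, 0.5, 0.8]))  # learns that the third action pays best
```

Full reinforcement learning adds state (the drill's position and sensor readings) on top of this explore/exploit loop, but the reward-driven update is the same idea.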

ii) Efficient Charging Terminals

Due to climate change, governments have encouraged people to switch to electric vehicles to reduce carbon dioxide emissions. However, the lack of public charging terminals has deterred people from switching to electric cars. Shell uses AI to monitor and predict demand for terminals and provide an efficient supply. Multiple vehicles charging from a single terminal can create considerable grid load, and demand predictions help make this process more efficient.

iii) Monitoring Service and Charging Stations

Another Shell initiative, trialed in Thailand and Singapore, uses computer vision cameras to watch for potentially hazardous activities, such as lighting a cigarette in the vicinity of the pumps while refueling. The model is built to process the captured images and label and classify their content. The algorithm can then alert the staff, reducing the risk of fires. The model could be further trained to detect rash driving or thefts in the future.

Here are two project ideas to practice these techniques: build a multiclass image classifier like the one described above, or use the Hourly Energy Consumption Dataset to build an energy consumption prediction model with time series features and XGBoost.
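The energy-prediction idea starts with time-series feature engineering. The sketch below is a minimal, illustrative version in plain Python: the feature names and the inline sample readings are invented, and in practice the resulting rows would be fed to an XGBoost regressor.

```python
from datetime import datetime

def make_features(rows, n_lags=2):
    """Turn (timestamp, consumption_mw) pairs into supervised-learning rows.

    Each output row holds calendar features plus lagged consumption values,
    the standard setup for tree-based forecasters such as XGBoost.
    """
    feats = []
    for i in range(n_lags, len(rows)):
        ts, y = rows[i]
        dt = datetime.fromisoformat(ts)
        feats.append({
            "hour": dt.hour,
            "dayofweek": dt.weekday(),   # 0 = Monday
            "month": dt.month,
            "lag_1": rows[i - 1][1],     # consumption one step earlier
            "lag_2": rows[i - 2][1],     # consumption two steps earlier
            "target": y,
        })
    return feats

# Illustrative hourly readings (timestamp, megawatts)
sample = [
    ("2018-01-01T00:00:00", 1350.0),
    ("2018-01-01T01:00:00", 1290.0),
    ("2018-01-01T02:00:00", 1255.0),
    ("2018-01-01T03:00:00", 1240.0),
]
features = make_features(sample)
print(features[0])
```

Each dict becomes one training row; the `target` column is what the regressor learns to predict from the calendar and lag features.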

10) Zomato Case Study on Data Analytics

Zomato was founded in 2010 and is currently one of the most well-known food tech companies. Zomato offers services like restaurant discovery, home delivery, online table reservation, and online payments for dining. Zomato partners with restaurants to provide tools to acquire more customers while also providing delivery services and easy procurement of ingredients and kitchen supplies. Currently, Zomato has over 2 lakh restaurant partners and around 1 lakh delivery partners, and it has closed over ten crore delivery orders to date. Zomato uses ML and AI to boost its business growth, drawing on the massive amount of data collected over the years from food orders and user consumption patterns. Here are a few examples of data analytics projects developed by the data scientists at Zomato:

i) Personalized Recommendation System for Homepage

Zomato uses data analytics to create personalized homepages for its users. Zomato uses data science to provide order personalization, like giving recommendations to the customers for specific cuisines, locations, prices, brands, etc. Restaurant recommendations are made based on a customer's past purchases, browsing history, and what other similar customers in the vicinity are ordering. This personalized recommendation system has led to a 15% improvement in order conversions and click-through rates for Zomato. 

You can use the Restaurant Recommendation Dataset to build a restaurant recommendation system to predict what restaurants customers are most likely to order from, given the customer location, restaurant information, and customer order history.
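As a hedged sketch of how such a recommender might start (this is not Zomato's production system), the snippet below ranks restaurants a customer hasn't tried by how often they co-occur with the customer's past orders in other customers' histories; all names and histories are invented.

```python
from collections import Counter

def recommend(order_history, customer, top_n=2):
    """Recommend unseen restaurants, ranked by how often they co-occur with
    the customer's past orders in other customers' histories (item-to-item
    collaborative filtering in its simplest form)."""
    seen = order_history.get(customer, set())
    scores = Counter()
    for other, restaurants in order_history.items():
        if other == customer or not (restaurants & seen):
            continue  # only learn from customers with overlapping taste
        for r in restaurants - seen:
            scores[r] += len(restaurants & seen)  # weight by overlap size
    return [r for r, _ in scores.most_common(top_n)]

# Illustrative order histories: customer -> restaurants ordered from
history = {
    "alice": {"dosa_hub", "pizza_ria", "wok_star"},
    "bob": {"dosa_hub", "pizza_ria", "taco_town"},
    "cara": {"pizza_ria", "taco_town"},
    "dev": {"dosa_hub"},
}
print(recommend(history, "dev"))
```

A production system would fold in the location, restaurant, and order-history features the dataset provides, but the co-occurrence scoring above is the core idea.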

ii) Analyzing Customer Sentiment

Zomato uses natural language processing and machine learning to understand customer sentiment from social media posts and customer reviews. These help the company gauge the inclination of its customer base towards the brand. Deep learning models analyze the sentiment of brand mentions on social networking sites like Twitter, Instagram, LinkedIn, and Facebook. These analytics give the company insights that help build the brand and understand the target audience.

iii) Predicting Food Preparation Time (FPT)

Food preparation time is an essential component of the estimated delivery time for an order placed on Zomato. It depends on numerous factors, such as the number of dishes ordered, the time of day, footfall in the restaurant, and the day of the week. Accurately predicting food preparation time enables a better estimate of delivery time, making it less likely that delivery partners will breach it. Zomato uses a bidirectional LSTM-based deep learning model that considers all these features and predicts the food preparation time for each order in real time.

Data scientists are companies' secret weapons for analyzing customer sentiment and behavior and leveraging them to drive conversion, loyalty, and profits. These 10 data science case studies, with examples and solutions, show how various organizations use data science technologies to succeed and stay at the top of their fields. To summarize, data science has not only accelerated companies' performance but also made it possible to manage and sustain that performance with ease.

FAQs on Data Analysis Case Studies

A case study in data science is an in-depth analysis of a real-world problem using data-driven approaches. It involves collecting, cleaning, and analyzing data to extract insights and solve challenges, offering practical insights into how data science techniques can address complex issues across various industries.

To create a data science case study, identify a relevant problem, define objectives, and gather suitable data. Clean and preprocess data, perform exploratory data analysis, and apply appropriate algorithms for analysis. Summarize findings, visualize results, and provide actionable recommendations, showcasing the problem-solving potential of data science techniques.


About the Author


ProjectPro is the only online platform designed to help professionals gain practical, hands-on experience in big data, data engineering, data science, and machine learning related technologies, with over 270 reusable project templates in data science and big data featuring step-by-step walkthroughs.


Data Analytics Case Study: Complete Guide in 2024

What Are Data Analytics Case Study Interviews?

When you’re trying to land a data analyst job, the last thing to stand in your way is the data analytics case study interview.

One reason they’re so challenging is that case studies don’t typically have a right or wrong answer.

Instead, case study interviews require you to come up with a hypothesis for an analytics question and then produce data to support or validate your hypothesis. In other words, it’s not just about your technical skills; you’re also being tested on creative problem-solving and your ability to communicate with stakeholders.

This article provides an overview of how to answer data analytics case study interview questions. You can find an in-depth course in the data analytics learning path.

How to Solve Data Analytics Case Questions

Check out our video below on how to solve a data analytics case study problem:

Data Analytics Case Study Video Guide

With data analyst case questions, you will need to answer two key questions:

  • What metrics should I propose?
  • How do I write a SQL query to get the metrics I need?

In short, to ace a data analytics case interview, you not only need to brush up on case questions, but you also should be adept at writing all types of SQL queries and have strong data sense.

These questions are especially challenging to answer if you don't have a framework for approaching them. To help you prepare, we created this step-by-step guide to answering data analytics case questions.

We show you how to use a framework to answer case questions, provide example analytics questions, and help you understand the difference between analytics case studies and product metrics case studies.

Data Analytics Cases vs Product Metrics Questions

Product case questions sometimes get lumped in with data analytics cases.

Ultimately, the type of case question you are asked will depend on the role. For example, product analysts will likely face more product-oriented questions.

Product metrics cases tend to focus on a hypothetical situation. You might be asked to:

Investigate Metrics - One of the most common types will ask you to investigate a metric, usually one that’s going up or down. For example, “Why are Facebook friend requests falling by 10 percent?”

Measure Product/Feature Success - A lot of analytics cases revolve around the measurement of product success and feature changes. For example, “We want to add X feature to product Y. What metrics would you track to make sure that’s a good idea?”

With product data cases, the key difference is that you may or may not be required to write the SQL query to find the metric.

Instead, these interviews are more theoretical and are designed to assess your product sense and ability to think about analytics problems from a product perspective. Product metrics questions may also show up in the data analyst interview, but likely only for product data analyst roles.



Data Analytics Case Study Question: Sample Solution


Let’s start with an example data analytics case question:

You’re given a table that represents search results from searches on Facebook. The query column is the search term, the position column represents each position the search result came in, and the rating column represents the human rating from 1 to 5, where 5 is high relevance, and 1 is low relevance.

Each row in the search_events table represents a single search, with the has_clicked column representing if a user clicked on a result or not. We have a hypothesis that the CTR is dependent on the search result rating.

Write a query to return data to support or disprove this hypothesis.

search_results table:

Column       Type
query        VARCHAR
position     INTEGER
rating       INTEGER
(not shown)  INTEGER

search_events table:

Column       Type
(not shown)  INTEGER
query        VARCHAR
has_clicked  BOOLEAN

Step 1: With Data Analytics Case Studies, Start by Making Assumptions

Hint: Start by making assumptions and thinking out loud. With this question, focus on coming up with a metric to support the hypothesis. If the question is unclear or if you think you need more information, be sure to ask.

Answer. The hypothesis is that CTR is dependent on search result rating. Therefore, we want to focus on the CTR metric, and we can assume:

  • If CTR is high when search result ratings are high, and CTR is low when the search result ratings are low, then the hypothesis is correct.
  • If CTR is low when the search ratings are high, or there is no proven correlation between the two, then our hypothesis is not proven.

Step 2: Provide a Solution for the Case Question

Hint: Walk the interviewer through your reasoning. Talking about the decisions you make and why you’re making them shows off your problem-solving approach.

Answer. One way we can investigate the hypothesis is to look at the results split into different search rating buckets. For example, if we measure the CTR for results rated at 1, then those rated at 2, and so on, we can identify if an increase in rating is correlated with an increase in CTR.

First, I’d write a query to get the number of results for each query in each bucket. We want to look at the distribution of results that are less than a rating threshold, which will help us see the relationship between search rating and CTR.

This CTE aggregates the number of results that fall below a given rating threshold. Later, we can use this to see the percentage in each bucket. If we re-join to the search_events table, we can then calculate the CTR grouped by bucket.

Step 3: Use Analysis to Backup Your Solution

Hint: Be prepared to justify your solution. Interviewers will follow up with questions about your reasoning, and ask why you make certain assumptions.

Answer. Using the CASE WHEN statement, I assigned each query to a ratings bucket by checking whether all of its search results were rated at or below 1, 2, or 3: subtract the count within the bucket from the total and check whether the difference equals 0.

I did that to move away from averages in the bucketing system, since outliers would make it harder to measure the effect of bad ratings. For example, if one result had a 1 rating and another had a 5 rating, that would average out to 3. In my solution, by contrast, a query whose results are all rated at or below 1, 2, or 3 is known to have genuinely bad ratings.
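The query itself is not reproduced in the text, so the following is one possible reconstruction of the approach described above, runnable with Python's built-in SQLite. The column names, bucket labels, join key (assumed to be query), and sample rows are all assumptions for illustration.

```python
import sqlite3

SQL = """
WITH buckets AS (
    -- assign each search query to a bucket: do ALL of its results fall
    -- at or below a rating threshold? (total - count_in_bucket = 0)
    SELECT query,
           CASE
               WHEN COUNT(*) - SUM(rating <= 1) = 0 THEN 'all_rated_1'
               WHEN COUNT(*) - SUM(rating <= 2) = 0 THEN 'all_rated_2_or_less'
               WHEN COUNT(*) - SUM(rating <= 3) = 0 THEN 'all_rated_3_or_less'
               ELSE 'has_higher_ratings'
           END AS rating_bucket
    FROM search_results
    GROUP BY query
)
SELECT b.rating_bucket,
       AVG(e.has_clicked) AS ctr   -- has_clicked stored as 0/1
FROM search_events e
JOIN buckets b ON b.query = e.query
GROUP BY b.rating_bucket
ORDER BY b.rating_bucket;
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE search_results (query TEXT, position INTEGER, rating INTEGER);
CREATE TABLE search_events (search_id INTEGER, query TEXT, has_clicked INTEGER);
INSERT INTO search_results VALUES
    ('cats', 1, 1), ('cats', 2, 1),      -- every result rated 1
    ('dogs', 1, 5), ('dogs', 2, 4);      -- well-rated results
INSERT INTO search_events VALUES
    (1, 'cats', 0), (2, 'cats', 0),
    (3, 'dogs', 1), (4, 'dogs', 1), (5, 'dogs', 0);
""")
rows = conn.execute(SQL).fetchall()
print(rows)
```

If CTR rises as the buckets move from all-low-rated queries to higher-rated ones, the data supports the hypothesis; a flat CTR across buckets would argue against it.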

Product Data Case Question: Sample Solution

product analytics on screen

In product metrics interviews, you’ll likely be asked about analytics, but the discussion will be more theoretical. You’ll propose a solution to a problem, and supply the metrics you’ll use to investigate or solve it. You may or may not be required to write a SQL query to get those metrics.

We’ll start with an example product metrics case study question:

Let’s say you work for a social media company that has just done a launch in a new city. Looking at weekly metrics, you see a slow decrease in the average number of comments per user from January to March in this city.

The company has been consistently growing new users in the city from January to March.

What are some reasons why the average number of comments per user would be decreasing and what metrics would you look into?

Step 1: Ask Clarifying Questions Specific to the Case

Hint: This question is very vague. It’s all hypothetical, so we don’t know very much about users, what the product is, and how people might be interacting. Be sure you ask questions upfront about the product.

Answer: Before I jump into an answer, I’d like to ask a few questions:

  • Who uses this social network? How do they interact with each other?
  • Have there been any performance issues that might be causing the problem?
  • What are the goals of this particular launch?
  • Have there been any changes to the comment features in recent weeks?

For the sake of this example, let’s say we learn that it’s a social network similar to Facebook with a young audience, and the goals of the launch are to grow the user base. Also, there have been no performance issues and the commenting feature hasn’t been changed since launch.

Step 2: Use the Case Question to Make Assumptions

Hint: Look for clues in the question. For example, this case gives you a metric, “average number of comments per user.” Consider if the clue might be helpful in your solution. But be careful, sometimes questions are designed to throw you off track.

Answer: From the question, we can hypothesize a little bit. For example, we know that user count is increasing linearly. That means two things:

  • The decreasing comments issue isn’t a result of a declining user base.
  • The cause isn’t users abandoning the platform.

We can also model out the data to help us get a better picture of the average number of comments per user metric:

  • January: 10000 users, 30000 comments, 3 comments/user
  • February: 20000 users, 50000 comments, 2.5 comments/user
  • March: 30000 users, 60000 comments, 2 comments/user

One thing to note: Although this is an interesting metric, I’m not sure if it will help us solve this question. For one, average comments per user doesn’t account for churn. We might assume that during the three-month period users are churning off the platform. Let’s say the churn rate is 25% in January, 20% in February and 15% in March.
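The modeled figures above, and the effect of the assumed churn rates, can be checked with a few lines of arithmetic (all numbers are the illustrative ones from the answer):

```python
# (users, comments) per month, as modeled above
months = {
    "January": (10_000, 30_000),
    "February": (20_000, 50_000),
    "March": (30_000, 60_000),
}
# assumed churn rates from the answer
churn = {"January": 0.25, "February": 0.20, "March": 0.15}

# headline metric: comments per (total) user
raw = {m: comments / users for m, (users, comments) in months.items()}

# churn-adjusted: spread comments over the users estimated to remain active
adjusted = {m: months[m][1] / (months[m][0] * (1 - churn[m])) for m in months}

for m in months:
    print(f"{m}: {raw[m]:.2f} comments/user, {adjusted[m]:.2f} per retained user")
```

The raw metric reproduces the 3 / 2.5 / 2 decline, while the churn-adjusted version shows why the headline number can mislead when the retained base differs from the total base.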

Step 3: Make a Hypothesis About the Data

Hint: Don’t worry too much about making a correct hypothesis. Instead, interviewers want to get a sense of your product intuition and whether you’re on the right track. Also, be prepared to measure your hypothesis.

Answer. I would say that average comments per user isn’t a great metric to use, because it doesn’t reveal insights into what’s really causing this issue.

That’s because it doesn’t account for active users, who are the users actually commenting. Better metrics to investigate would be retained users and monthly active users.

What I suspect is causing the issue is that active users are commenting frequently and are responsible for the increase in comments month-to-month. New users, on the other hand, aren’t as engaged and aren’t commenting as often.

Step 4: Provide Metrics and Data Analysis

Hint: Within your solution, include key metrics that you’d like to investigate that will help you measure success.

Answer: I’d say there are a few ways we could investigate the cause of this problem, but the one I’d be most interested in would be the engagement of monthly active users.

If the growth in comments is coming from active users, that would help us understand how we’re doing at retaining users. Plus, it will also show if new users are less engaged and commenting less frequently.

One way that we could dig into this would be to segment users by their onboarding date, which would help us to visualize engagement and see how engaged some of our longest-retained users are.

If engagement of new users is the issue, that will give us some options in terms of strategies for addressing the problem. For example, we could test new onboarding or commenting features designed to generate engagement.

Step 5: Propose a Solution for the Case Question

Hint: In the majority of cases, your initial assumptions might be incorrect, or the interviewer might throw you a curveball. Be prepared to make new hypotheses or discuss the pitfalls of your analysis.

Answer. If the cause wasn’t due to a lack of engagement among new users, then I’d want to investigate active users. One potential cause would be active users commenting less. In that case, we’d know that our earliest users were churning out, and that engagement among new users was potentially growing.

Again, I think we’d want to focus on user engagement since the onboarding date. That would help us understand if we were seeing higher levels of churn among active users, and we could start to identify some solutions there.

Tip: Use a Framework to Solve Data Analytics Case Questions

Analytics case questions can be challenging, but they’re much more challenging if you don’t use a framework. Without a framework, it’s easier to get lost in your answer, get stuck, and lose the confidence of your interviewer. Find helpful frameworks for data analytics questions in our data analytics learning path and our product metrics learning path.

Once you have the framework down, what’s the best way to practice? Mock interviews with our coaches are very effective, as you’ll get feedback and helpful tips as you answer. You can also learn a lot by practicing P2P mock interviews with other Interview Query students. No data analytics background? Check out how to become a data analyst without a degree.

Finally, if you’re looking for sample data analytics case questions and other types of interview questions, see our guide on the top data analyst interview questions.


Top 25 Data Science Case Studies [2024]

In an era where data is the new gold, harnessing its power through data science has led to groundbreaking advancements across industries. From personalized marketing to predictive maintenance, the applications of data science are not only diverse but transformative. This compilation of the top 25 data science case studies showcases the profound impact of intelligent data utilization in solving real-world problems. These examples span various sectors, including healthcare, finance, transportation, and manufacturing, illustrating how data-driven decisions shape business operations’ future, enhance efficiency, and optimize user experiences. As we delve into these case studies, we witness the incredible potential of data science to innovate and drive success in today’s data-centric world.

Related: Interesting Data Science Facts


Case Study 1 – Personalized Marketing (Amazon)

Challenge:  Amazon aimed to enhance user engagement by tailoring product recommendations to individual preferences, requiring the real-time processing of vast data volumes.

Solution:  Amazon implemented a sophisticated machine learning algorithm known as collaborative filtering, which analyzes users’ purchase history, cart contents, product ratings, and browsing history, along with the behavior of similar users. This approach enables Amazon to offer highly personalized product suggestions.

Overall Impact:

  • Increased Customer Satisfaction:  Tailored recommendations improved the shopping experience.
  • Higher Sales Conversions:  Relevant product suggestions boosted sales.

Key Takeaways:

  • Personalized Marketing Significantly Enhances User Engagement:  Demonstrating how tailored interactions can deepen user involvement and satisfaction.
  • Effective Use of Big Data and Machine Learning Can Transform Customer Experiences:  These technologies redefine the consumer landscape by continuously adapting recommendations to changing user preferences and behaviors.

This strategy has proven pivotal in increasing Amazon’s customer loyalty and sales by making the shopping experience more relevant and engaging.
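As a minimal, illustrative sketch of item-to-item collaborative filtering (not Amazon's production system), the snippet below scores products a user hasn't rated by the cosine similarity of their rating vectors to products the user has rated; all users, items, and ratings are invented.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse rating vectors (dicts user -> rating)."""
    common = set(u) & set(v)
    num = sum(u[k] * v[k] for k in common)
    den = (math.sqrt(sum(x * x for x in u.values()))
           * math.sqrt(sum(x * x for x in v.values())))
    return num / den if den else 0.0

def recommend(ratings, user, top_n=1):
    """Rank unrated items by similarity to the items the user rated, weighted
    by how highly the user rated them."""
    # invert to item -> {user: rating}
    by_item = {}
    for u, items in ratings.items():
        for item, r in items.items():
            by_item.setdefault(item, {})[u] = r
    seen = ratings[user]
    scores = {}
    for cand in by_item:
        if cand in seen:
            continue
        scores[cand] = sum(cosine(by_item[cand], by_item[s]) * r
                           for s, r in seen.items())
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Illustrative purchase ratings: user -> {item: rating 1..5}
ratings = {
    "u1": {"book": 5, "lamp": 4},
    "u2": {"book": 4, "lamp": 5, "mug": 5},
    "u3": {"mug": 4, "sock": 2},
    "u4": {"book": 5},
}
print(recommend(ratings, "u4"))
```

Items bought together by similar users end up with similar rating vectors, so the unseen item most aligned with the user's history rises to the top, which is the essence of "customers who bought this also bought".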

Case Study 2 – Real-Time Pricing Strategy (Uber)

Challenge:  Uber needed to adjust its pricing dynamically to reflect real-time demand and supply variations across different locations and times, aiming to optimize driver incentives and customer satisfaction without manual intervention.

Solution:  Uber introduced a dynamic pricing model called “surge pricing.” This system uses data science to automatically calculate fares in real time based on current demand and supply data. The model incorporates traffic conditions, weather forecasts, and local events to adjust prices appropriately.

  • Optimized Ride Availability:  The model reduced customer wait times by incentivizing more drivers to be available during high-demand periods.
  • Increased Driver Earnings:  Drivers benefitted from higher earnings during surge periods, aligning their incentives with customer demand.
  • Efficient Balance of Supply and Demand:  Dynamic pricing matches ride availability with customer needs.
  • Importance of Real-Time Data Processing:  The real-time processing of data is crucial for responsive and adaptive service delivery.

Uber’s implementation of surge pricing illustrates the power of using real-time data analytics to create a flexible and responsive pricing system that benefits both consumers and service providers, enhancing overall service efficiency and satisfaction.

Case Study 3 – Fraud Detection in Banking (JPMorgan Chase)

Challenge:  JPMorgan Chase faced the critical need to enhance its fraud detection capabilities to safeguard the institution and its customers from financial losses. The primary challenge was detecting fraudulent transactions swiftly and accurately in a vast stream of legitimate banking activities.

Solution:  The bank implemented advanced machine learning models that analyze real-time transaction patterns and customer behaviors. These models are continuously trained on vast amounts of historical fraud data, enabling them to identify and flag transactions that significantly deviate from established patterns, which may indicate potential fraud.

  • Substantial Reduction in Fraudulent Transactions:  The advanced detection capabilities led to a marked decrease in fraud occurrences.
  • Enhanced Security for Customer Accounts:  Customers experienced greater security and trust in their transactions.
  • Effectiveness of Machine Learning in Fraud Detection:  Machine learning models are highly effective at identifying fraudulent activity within large datasets.
  • Importance of Ongoing Training and Updates:  Continuous training and updating of models are crucial to adapt to evolving fraudulent techniques and maintain detection efficacy.

JPMorgan Chase’s use of machine learning for fraud detection demonstrates how financial institutions can leverage advanced analytics to enhance security measures, protect financial assets, and build customer trust in their banking services.

Case Study 4 – Optimizing Healthcare Outcomes (Mayo Clinic)

Challenge:  The Mayo Clinic aimed to enhance patient outcomes by predicting diseases before they reach critical stages. This involved analyzing large volumes of diverse data, including historical patient records and real-time health metrics from various sources like lab results and patient monitors.

Solution:  The Mayo Clinic employed predictive analytics to integrate and analyze this data to build models that predict patient risk for diseases such as diabetes and heart disease, enabling earlier and more targeted interventions.

  • Improved Patient Outcomes:  Early identification of at-risk patients allowed for timely medical intervention.
  • Reduction in Healthcare Costs:  Preventing disease progression reduces the need for more extensive and costly treatments later.
  • Early Identification of Health Risks:  Predictive models are essential for identifying at-risk patients early, improving the chances of successful interventions.
  • Integration of Multiple Data Sources:  Combining historical and real-time data provides a comprehensive view that enhances the accuracy of predictions.

Case Study 5 – Streamlining Operations in Manufacturing (General Electric)

Challenge:  General Electric needed to optimize its manufacturing processes to reduce costs and downtime by predicting when machines would likely require maintenance to prevent breakdowns.

Solution:  GE leveraged data from sensors embedded in machinery to monitor their condition continuously. Data science algorithms analyze this sensor data to predict when a machine is likely to fail, facilitating preemptive maintenance and scheduling.

  • Reduction in Unplanned Machine Downtime:  Predictive maintenance helped avoid unexpected breakdowns.
  • Lower Maintenance Costs and Improved Machine Lifespan:  Regular maintenance based on predictive data reduced overall costs and extended the life of machinery.
  • Predictive Maintenance Enhances Operational Efficiency:  Using data-driven predictions for maintenance can significantly reduce downtime and operational costs.
  • Value of Sensor Data:  Continuous monitoring and data analysis are crucial for forecasting equipment health and preventing failures.
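A minimal sketch of the sensor-data idea (not GE's actual system): flag a machine for maintenance when a rolling mean of a sensor reading drifts past a threshold above its baseline. The readings, window size, and threshold below are invented for illustration.

```python
from collections import deque

def maintenance_alert(readings, window=5, threshold=1.2, baseline=1.0):
    """Return the index at which the rolling mean of sensor readings first
    drifts past threshold * baseline, or None if it never does.
    A stand-in for 'predict failure before it happens' from sensor data."""
    buf = deque(maxlen=window)
    for i, r in enumerate(readings):
        buf.append(r)
        if len(buf) == window and sum(buf) / window > threshold * baseline:
            return i
    return None

# Illustrative vibration readings: stable at first, then a gradual upward drift
vibration = [1.0, 1.01, 0.99, 1.0, 1.02,
             1.05, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6]
alert_at = maintenance_alert(vibration)
print(alert_at)
```

The rolling mean smooths out single-reading noise, so the alert fires on a sustained drift rather than a one-off spike; a production system would learn the baseline and threshold per machine rather than hard-coding them.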

Related: Data Engineering vs. Data Science

Case Study 6 – Enhancing Supply Chain Management (DHL)

Challenge:  DHL sought to optimize its global logistics and supply chain operations to decrease expenses and enhance delivery efficiency. This required handling complex data from various sources for better route planning and inventory management.

Solution:  DHL implemented advanced analytics to process and analyze data from its extensive logistics network. This included real-time tracking of shipments, analysis of weather conditions, traffic patterns, and inventory levels to optimize route planning and warehouse operations.

  • Enhanced Efficiency in Logistics Operations:  More precise route planning and inventory management improved delivery times and reduced resource wastage.
  • Reduced Operational Costs:  Streamlined operations led to significant cost savings across the supply chain.
  • Critical Role of Comprehensive Data Analysis:  Effective supply chain management depends on integrating and analyzing data from multiple sources.
  • Benefits of Real-Time Data Integration:  Real-time data enhances logistical decision-making, leading to more efficient and cost-effective operations.

Case Study 7 – Predictive Maintenance in Aerospace (Airbus)

Challenge:  Airbus faced the challenge of predicting potential failures in aircraft components to enhance safety and reduce maintenance costs. The key was to accurately forecast the lifespan of parts under varying conditions and usage patterns, which is critical in the aerospace industry where safety is paramount.

Solution:  Airbus tackled this challenge by developing predictive models that utilize data collected from sensors installed on aircraft. These sensors continuously monitor the condition of various components, providing real-time data that the models analyze. The predictive algorithms assess the likelihood of component failure, enabling maintenance teams to schedule repairs or replacements proactively before actual failures occur.

  • Increased Safety:  The ability to predict and prevent potential in-flight failures has significantly improved the safety of Airbus aircraft.
  • Reduced Costs:  By optimizing maintenance schedules and minimizing unnecessary checks, Airbus has been able to cut down on maintenance expenses and reduce aircraft downtime.
  • Enhanced Safety through Predictive Analytics:  The use of predictive analytics in monitoring aircraft components plays a crucial role in preventing failures, thereby enhancing the overall safety of aviation operations.
  • Valuable Insights from Sensor Data:  Real-time data from operational use is critical for developing effective predictive maintenance strategies. This data provides insights for understanding component behavior under various conditions, allowing for more accurate predictions.

This case study demonstrates how Airbus leverages advanced data science techniques in predictive maintenance to ensure higher safety standards and more efficient operations, setting an industry benchmark in the aerospace sector.

Case Study 8 – Enhancing Film Recommendations (Netflix)

Challenge:  Netflix aimed to improve customer retention and engagement by enhancing the accuracy of its recommendation system. This task involved processing and analyzing vast amounts of data to understand diverse user preferences and viewing habits.

Solution:  Netflix employed collaborative filtering techniques, analyzing user behaviors (like watching, liking, or disliking content) and similarities between content items. This data-driven approach allows Netflix to refine and personalize recommendations continuously based on real-time user interactions.

  • Increased Viewer Engagement:  Personalized recommendations led to longer viewing sessions.
  • Higher Customer Satisfaction and Retention Rates:  Tailored viewing experiences improved overall customer satisfaction, enhancing loyalty.
  • Tailoring User Experiences:  Machine learning is pivotal in personalizing media content, significantly impacting viewer engagement and satisfaction.
  • Importance of Continuous Updates:  Regularly updating recommendation algorithms is essential to maintain relevance and effectiveness in user engagement.

Case Study 9 – Traffic Flow Optimization (Google)

Challenge:  Google needed to optimize traffic flow within its Google Maps service to reduce congestion and improve routing decisions. This required real-time analysis of extensive traffic data to predict and manage traffic conditions accurately.

Solution:  Google Maps integrates data from multiple sources, including satellite imagery, sensor data, and real-time user location data. These data points are used to model traffic patterns and predict future conditions dynamically, which informs updated routing advice.

  • Reduced Traffic Congestion:  More efficient routing reduced overall traffic buildup.
  • Enhanced Accuracy of Traffic Predictions and Routing:  Improved predictions led to better user navigation experiences.
  • Integration of Multiple Data Sources:  Combining various data streams enhances the accuracy of traffic management systems.
  • Advanced Modeling Techniques:  Sophisticated models are crucial for accurately predicting traffic patterns and optimizing routes.
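
One simplified way to picture how live congestion data changes routing advice is to scale each road's free-flow travel time by a congestion factor derived from sensor and user data, then run a standard shortest-path search. The road graph and factors below are hypothetical, and Google's actual models are far richer.

```python
import heapq

# Hypothetical road graph: edge -> (free-flow minutes, congestion factor >= 1.0).
roads = {
    "A": {"B": (4, 1.0), "C": (2, 3.0)},
    "B": {"D": (5, 1.0)},
    "C": {"D": (4, 1.0)},
    "D": {},
}

def fastest_route(src, dst):
    # Dijkstra over congestion-adjusted travel times.
    pq = [(0.0, src, [src])]
    best = {}
    while pq:
        t, node, path = heapq.heappop(pq)
        if node == dst:
            return t, path
        if node in best and best[node] <= t:
            continue
        best[node] = t
        for nxt, (base, factor) in roads[node].items():
            heapq.heappush(pq, (t + base * factor, nxt, path + [nxt]))
    return float("inf"), []

t, path = fastest_route("A", "D")
print(t, path)  # congestion on A->C makes A->B->D the faster route
```

As the congestion factors update in real time, the recommended path changes with them, which is the core of dynamic rerouting.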

Case Study 10 – Risk Assessment in Insurance (Allstate)

Challenge:  Allstate sought to refine its risk assessment processes to offer more accurately priced insurance products, challenging the limitations of traditional actuarial models through more nuanced data interpretations.

Solution:  Allstate enhanced its risk assessment framework by integrating machine learning, allowing for granular risk factor analysis. This approach utilizes individual customer data such as driving records, home location specifics, and historical claim data to tailor insurance offerings more accurately.

  • More Precise Risk Assessment:  Improved risk evaluation led to more tailored insurance offerings.
  • Increased Market Competitiveness:  Enhanced pricing accuracy boosted Allstate’s competitive edge in the insurance market.
  • Nuanced Understanding of Risk:  Machine learning provides a deeper, more nuanced understanding of risk than traditional models, leading to better risk pricing.
  • Personalized Pricing Strategies:  Leveraging detailed customer data in pricing strategies enhances customer satisfaction and business performance.
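
A minimal sketch of risk-based pricing is a logistic score over individual risk factors that feeds into a premium. The weights, factors, and dollar amounts below are hand-set assumptions purely for illustration; a real insurer would fit them to historical claims data.

```python
import math

# Illustrative hand-set weights; a production model would be fit to claims history.
WEIGHTS = {"at_fault_accidents": 0.9, "speeding_tickets": 0.5, "high_theft_area": 0.7}
BIAS = -2.0

def claim_probability(driver):
    # Logistic score: higher risk factors -> higher chance of a claim.
    z = BIAS + sum(WEIGHTS[k] * driver.get(k, 0) for k in WEIGHTS)
    return 1 / (1 + math.exp(-z))

def annual_premium(driver, base=600.0, expected_claim=4000.0):
    # Premium = base cost plus the expected claim payout for this risk profile.
    return round(base + expected_claim * claim_probability(driver), 2)

safe = {"at_fault_accidents": 0, "speeding_tickets": 0, "high_theft_area": 0}
risky = {"at_fault_accidents": 2, "speeding_tickets": 3, "high_theft_area": 1}
print(annual_premium(safe), annual_premium(risky))
```

The point of the granular approach is visible even in the toy version: two customers with the same car get different, individually justified prices.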

Related: Can you move from Cybersecurity to Data Science?

Case Study 11 – Energy Consumption Reduction (Google DeepMind)

Challenge:  Google DeepMind aimed to significantly reduce the high energy consumption required for cooling Google’s data centers, which are crucial for maintaining server performance but also represent a major operational cost.

Solution:  DeepMind implemented advanced AI algorithms to optimize the data center cooling systems. These algorithms predict temperature fluctuations and adjust cooling processes accordingly, saving energy and reducing equipment wear and tear.

  • Reduction in Energy Consumption:  Achieved a 40% reduction in energy used for cooling.
  • Decrease in Operational Costs and Environmental Impact:  Lower energy usage resulted in cost savings and reduced environmental footprint.
  • AI-Driven Optimization:  AI can significantly decrease energy usage in large-scale infrastructure.
  • Operational Efficiency Gains:  Efficiency improvements in operational processes lead to cost savings and environmental benefits.
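
The control loop can be caricatured in a few lines: predict the next temperature reading, then choose the lowest cooling level that keeps it under a limit. The extrapolation rule, limit, and cooling levels below are invented stand-ins; DeepMind's system uses deep neural networks over thousands of sensors.

```python
# Hypothetical limit and cooling levels for illustration only.
TEMP_LIMIT = 27.0
COOLING_EFFECT = {0: 0.0, 1: 1.5, 2: 3.0}  # °C reduction per cooling level

def predict_next(temps):
    # Naive linear extrapolation from the last two readings.
    return temps[-1] + (temps[-1] - temps[-2])

def choose_cooling(temps):
    # Pick the cheapest level whose predicted result stays under the limit.
    predicted = predict_next(temps)
    for level in sorted(COOLING_EFFECT):
        if predicted - COOLING_EFFECT[level] <= TEMP_LIMIT:
            return level
    return max(COOLING_EFFECT)

print(choose_cooling([24.0, 25.0]), choose_cooling([26.0, 28.5]))  # → 0 2
```

Because cooling is only increased when the forecast demands it, energy is not wasted overcooling during stable periods.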

Case Study 12 – Improving Public Safety (New York City Police Department)

Challenge:  The NYPD needed to enhance its crime prevention strategies by better predicting where and when crimes were most likely to occur, requiring sophisticated analysis of historical crime data and environmental factors.

Solution:  The NYPD implemented a predictive policing system that utilizes data analytics to identify potential crime hotspots based on trends and patterns in past crime data. Officers are preemptively dispatched to these areas to deter criminal activities.

  • Reduction in Crime Rates:  Notable decrease in crime in areas targeted by predictive policing.
  • More Efficient Use of Police Resources:  Enhanced allocation of resources where needed.
  • Effectiveness of Data-Driven Crime Prevention:  Targeting resources based on data analytics can significantly reduce crime.
  • Proactive Law Enforcement:  Predictive analytics enable a shift from reactive to proactive law enforcement strategies.
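
The simplest form of hotspot detection is spatial binning: divide the city into grid cells, count historical incidents per cell, and rank the cells. The coordinates below are hypothetical; real systems layer time-of-day, weather, and other environmental factors on top of this.

```python
from collections import Counter

# Hypothetical incident coordinates (x, y) in km within a precinct.
incidents = [(0.2, 0.3), (0.4, 0.1), (0.3, 0.4), (2.1, 2.2), (2.4, 2.3),
             (2.2, 2.4), (2.3, 2.1), (5.0, 1.0)]

def hotspots(points, cell_km=1.0, top=1):
    # Bin incidents into grid cells and rank cells by incident count.
    cells = Counter((int(x // cell_km), int(y // cell_km)) for x, y in points)
    return cells.most_common(top)

print(hotspots(incidents))  # → [((2, 2), 4)]
```

Patrol resources can then be weighted toward the top-ranked cells rather than spread uniformly.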

Case Study 13 – Enhancing Agricultural Yields (John Deere)

Challenge:  John Deere aimed to help farmers increase agricultural productivity and sustainability by optimizing various farming operations from planting to harvesting.

Solution:  Utilizing data from sensors on equipment and satellite imagery, John Deere developed algorithms that provide actionable insights for farmers on optimal planting times, water usage, and harvest schedules.

  • Increased Crop Yields:  More efficient farming methods led to higher yields.
  • Enhanced Sustainability of Farming Practices:  Improved resource management contributed to more sustainable agriculture.
  • Precision Agriculture:  Significantly improves productivity and resource efficiency.
  • Data-Driven Decision-Making:  Enables better farming decisions through timely and accurate data.

Case Study 14 – Streamlining Drug Discovery (Pfizer)

Challenge:  Pfizer faced the need to accelerate the drug discovery process and improve the success rates of clinical trials.

Solution:  Pfizer employed data science to simulate and predict outcomes of drug trials using historical data and predictive models, optimizing trial parameters and improving the selection of drug candidates.

  • Accelerated Drug Development:  Reduced time to market for new drugs.
  • Increased Efficiency and Efficacy in Clinical Trials:  More targeted trials led to better outcomes.
  • Reduction in Drug Development Time and Costs:  Data science streamlines the R&D process.
  • Improved Clinical Trial Success Rates:  Predictive modeling enhances the accuracy of trial outcomes.
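
Trial simulation often boils down to Monte Carlo: draw many simulated trials from assumed response rates and estimate how often the trial would demonstrate the required improvement. The rates and thresholds below are invented for illustration, not Pfizer's actual parameters.

```python
import random

def simulate_trial(n_patients, drug_rate, placebo_rate, threshold=0.1,
                   runs=500, seed=7):
    # Monte Carlo estimate of the chance the trial shows at least
    # `threshold` improvement of the drug arm over the placebo arm.
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        drug = sum(rng.random() < drug_rate for _ in range(n_patients)) / n_patients
        placebo = sum(rng.random() < placebo_rate for _ in range(n_patients)) / n_patients
        if drug - placebo >= threshold:
            hits += 1
    return hits / runs

# Larger trials detect the same true effect far more reliably.
p_small = simulate_trial(50, 0.55, 0.40)
p_large = simulate_trial(500, 0.55, 0.40)
print(p_small, p_large)
```

Running such simulations before enrolling a single patient lets teams size trials appropriately and drop candidates with little chance of a clear result.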

Case Study 15 – Media Buying Optimization (Procter & Gamble)

Challenge:  Procter & Gamble aimed to maximize the ROI of their extensive advertising budget by optimizing their media buying strategy across various channels.

Solution:  P&G analyzed extensive data on consumer behavior and media consumption to identify the most effective times and channels for advertising, allowing for highly targeted ads that reach the intended audience at optimal times.

  • Improved Effectiveness of Advertising Campaigns:  More effective ads increased campaign impact.
  • Increased Sales and Better Budget Allocation:  Enhanced ROI from more strategic media spending.
  • Enhanced Media Buying Strategies:  Data analytics significantly improves media buying effectiveness.
  • Insights into Consumer Behavior:  Understanding consumer behavior is crucial for optimizing advertising ROI.
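
A stripped-down version of media-buy optimization is a greedy budget allocation: rank ad slots by expected audience reach per dollar and buy down the list until the budget runs out. The channel names, costs, and reach figures below are hypothetical.

```python
# Hypothetical ad slots: (name, cost in $, expected target-audience reach).
slots = [
    ("prime_tv", 500, 900),
    ("social_evening", 120, 300),
    ("radio_morning", 80, 150),
    ("search_ads", 60, 140),
]

def plan_buys(options, budget):
    # Greedy: buy slots in order of reach per dollar until the budget runs out.
    chosen = []
    for name, cost, reach in sorted(options, key=lambda s: s[2] / s[1], reverse=True):
        if cost <= budget:
            chosen.append(name)
            budget -= cost
    return chosen

print(plan_buys(slots, budget=700))  # → ['social_evening', 'search_ads', 'radio_morning']
```

Note how the most expensive, highest-reach slot loses to several cheaper slots with better reach per dollar, which is exactly the reallocation data analysis tends to reveal.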

Related: Is Data Science Certificate beneficial for your career?

Case Study 16 – Reducing Patient Readmission Rates with Predictive Analytics (Mount Sinai Health System)

Challenge:  Mount Sinai Health System sought to reduce patient readmission rates, a significant indicator of healthcare quality and a major cost factor. The challenge involved identifying patients at high risk of being readmitted within 30 days of discharge.

Solution:  The health system implemented a predictive analytics platform that analyzes real-time patient data and historical health records. The system detects patterns and risk factors contributing to high readmission rates by utilizing machine learning algorithms. Factors such as past medical history, discharge conditions, and post-discharge care plans were integrated into the predictive model.

  • Reduced Readmission Rates:  Early identification of at-risk patients allowed for targeted post-discharge interventions, significantly reducing readmission rates.
  • Enhanced Patient Outcomes: Patients received better follow-up care tailored to their health risks.
  • Predictive Analytics in Healthcare:  Effective for managing patient care post-discharge.
  • Holistic Patient Data Utilization: Integrating various data points provides a more accurate prediction and better healthcare outcomes.
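
At its core, a readmission model is a classifier trained on discharge records. The toy sketch below fits a two-feature logistic regression by plain gradient descent on invented records; the real platform integrates far more variables and data sources.

```python
import math

# Hypothetical records: (prior admissions, lives-alone flag) -> readmitted within 30 days?
X = [(0, 0), (1, 0), (0, 1), (3, 1), (4, 0), (5, 1), (2, 1), (1, 1)]
y = [0, 0, 0, 1, 1, 1, 1, 0]

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Fit logistic regression with plain stochastic gradient descent.
w = [0.0, 0.0]
b = 0.0
lr = 0.1
for _ in range(5000):
    for (x1, x2), target in zip(X, y):
        p = sigmoid(w[0] * x1 + w[1] * x2 + b)
        err = p - target
        w[0] -= lr * err * x1
        w[1] -= lr * err * x2
        b -= lr * err

def risk(prior_admissions, lives_alone):
    # Predicted probability of readmission for a newly discharged patient.
    return sigmoid(w[0] * prior_admissions + w[1] * lives_alone + b)

print(risk(0, 0), risk(4, 1))
```

Patients whose predicted risk crosses a chosen threshold can then be flagged for follow-up calls and tailored post-discharge care plans.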

Case Study 17 – Enhancing E-commerce Customer Experience with AI (Zalando)

Challenge:  Zalando aimed to enhance the online shopping experience by improving the accuracy of size recommendations, a common issue that leads to high return rates in online apparel shopping.

Solution:  Zalando developed an AI-driven size recommendation engine that analyzes past purchase and return data in combination with customer feedback and preferences. This system utilizes machine learning to predict the best-fit size for customers based on their unique body measurements and purchase history.

  • Reduced Return Rates:  More accurate size recommendations decreased returns due to poor fit.
  • Improved Customer Satisfaction: Customers experienced a more personalized shopping journey, enhancing overall satisfaction.
  • Customization Through AI:  Personalizing customer experience can significantly impact satisfaction and business metrics.
  • Data-Driven Decision-Making: Utilizing customer data effectively can improve business outcomes by reducing costs and enhancing the user experience.
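
One intuition for a size recommender is that every past purchase is a vote: a kept item strongly endorses its size, while a return for "too big" or "too small" weakly endorses the adjacent size. The history and vote weights below are invented for illustration; Zalando's engine also uses body measurements and article-level fit data.

```python
from collections import Counter

# Hypothetical purchase history for one customer: (size, outcome).
history = [
    (40, "kept"), (41, "returned_too_big"), (40, "kept"),
    (39, "returned_too_small"), (40, "kept"),
]

def recommend_size(purchases):
    votes = Counter()
    for size, outcome in purchases:
        if outcome == "kept":
            votes[size] += 2          # strong signal: this size fit
        elif outcome == "returned_too_big":
            votes[size - 1] += 1      # weak signal: try one size down
        elif outcome == "returned_too_small":
            votes[size + 1] += 1      # weak signal: try one size up
    return votes.most_common(1)[0][0]

print(recommend_size(history))  # → 40
```

Even the returns contribute useful signal here, which is why return data is worth mining rather than treating purely as a cost.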

Case Study 18 – Optimizing Energy Grid Performance with Machine Learning (Enel Group)

Challenge:  Enel Group, one of the largest power companies, faced challenges in managing and optimizing the performance of its vast energy grids. The primary goal was to increase the efficiency of energy distribution and reduce operational costs while maintaining reliability in the face of fluctuating supply and demand.

Solution:  Enel Group implemented a machine learning-based system that analyzes real-time data from smart meters, weather stations, and IoT devices across the grid. This system is designed to predict peak demand times, potential outages, and equipment failures before they occur. By integrating these predictions with automated grid management tools, Enel can dynamically adjust energy flows, allocate resources more efficiently, and schedule maintenance proactively.

  • Enhanced Grid Efficiency:  Improved distribution management, reduced energy wastage, and optimized resource allocation.
  • Reduced Operational Costs: Predictive maintenance and better grid management decreased the frequency and cost of repairs and outages.
  • Predictive Maintenance in Utility Networks:  Advanced analytics can preemptively identify issues, saving costs and enhancing service reliability.
  • Real-Time Data Integration: Leveraging data from various sources in real-time enables more agile and informed decision-making in energy management.

Case Study 19 – Personalizing Movie Streaming Experience (WarnerMedia)

Challenge:  WarnerMedia sought to enhance viewer engagement and subscription retention rates on its streaming platforms by providing more personalized content recommendations.

Solution:  WarnerMedia deployed a sophisticated data science strategy, utilizing deep learning algorithms to analyze viewer behaviors, including viewing history, ratings given to shows and movies, search patterns, and demographic data. This analysis helped create highly personalized viewer profiles, which were then used to tailor content recommendations, homepage layouts, and promotional offers specifically to individual preferences.

  • Increased Viewer Engagement:  Personalized recommendations resulted in extended viewing times and increased interactions with the platform.
  • Higher Subscription Retention: Tailored user experiences improved overall satisfaction, leading to lower churn rates.
  • Deep Learning Enhances Personalization:  Deep learning algorithms enable a more nuanced understanding of consumer preferences and behavior.
  • Data-Driven Customization is Key to User Retention: Providing a customized experience based on data analytics is critical for maintaining and growing a subscriber base in the competitive streaming market.

Case Study 20 – Improving Online Retail Sales through Customer Sentiment Analysis (Zappos)

Challenge:  Zappos, an online shoe and clothing retailer, aimed to enhance customer satisfaction and boost sales by better understanding customer sentiments and preferences across various platforms.

Solution:  Zappos implemented a comprehensive sentiment analysis program that utilized natural language processing (NLP) techniques to gather and analyze customer feedback from social media, product reviews, and customer support interactions. This data was used to identify emerging trends, customer pain points, and overall sentiment towards products and services. The insights derived from this analysis were subsequently used to customize marketing strategies, enhance product offerings, and improve customer service practices.

  • Enhanced Product Selection and Marketing:  Insight-driven adjustments to inventory and marketing strategies increased relevancy and customer satisfaction.
  • Improved Customer Experience: By addressing customer concerns and preferences identified through sentiment analysis, Zappos enhanced its overall customer service, increasing loyalty and repeat business.
  • Power of Sentiment Analysis in Retail:  Understanding and reacting to customer emotions and opinions can significantly impact sales and customer satisfaction.
  • Strategic Use of Customer Feedback: Leveraging customer feedback to drive business decisions helps align product offerings and services with customer expectations, fostering a positive brand image.
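
The simplest NLP baseline for this kind of program is lexicon-based sentiment scoring: count positive and negative words and report their balance. The word lists and reviews below are tiny invented examples; production systems use trained models that handle negation, sarcasm, and context.

```python
# Hypothetical sentiment lexicons for illustration only.
POSITIVE = {"love", "great", "comfortable", "fast", "perfect"}
NEGATIVE = {"hate", "slow", "tight", "broken", "disappointed"}

def sentiment(review):
    # Polarity in [-1, 1]: (positive hits - negative hits) / total hits.
    words = review.lower().split()
    pos = sum(w.strip(".,!?") in POSITIVE for w in words)
    neg = sum(w.strip(".,!?") in NEGATIVE for w in words)
    if pos + neg == 0:
        return 0.0
    return (pos - neg) / (pos + neg)

reviews = [
    "Love these shoes, so comfortable and shipping was fast!",
    "Disappointed, the fit was tight and the box arrived broken.",
]
print([sentiment(r) for r in reviews])  # → [1.0, -1.0]
```

Aggregated over thousands of reviews and support tickets per product, even this crude score surfaces which items delight customers and which generate complaints.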

Related: Data Science Industry in the US

Case Study 21 – Streamlining Airline Operations with Predictive Analytics (Delta Airlines)

Challenge:  Delta Airlines faced operational challenges, including flight delays, maintenance scheduling inefficiencies, and customer service issues, which impacted passenger satisfaction and operational costs.

Solution:  Delta implemented a predictive analytics system that integrates data from flight operations, weather reports, aircraft sensor data, and historical maintenance records. The system predicts potential delays using machine learning models and suggests optimal maintenance scheduling. Additionally, it forecasts passenger load to optimize staffing and resource allocation at airports.

  • Reduced Flight Delays:  Predictive insights allowed for better planning and reduced unexpected delays.
  • Enhanced Maintenance Efficiency:  Maintenance could be scheduled proactively, decreasing the time planes spend out of service.
  • Improved Passenger Experience: With better resource management, passenger handling became more efficient, enhancing overall customer satisfaction.
  • Operational Efficiency Through Predictive Analytics:  Leveraging data for predictive purposes significantly improves operational decision-making.
  • Data Integration Across Departments: Coordinating data from different sources provides a holistic view crucial for effective airline management.

Case Study 22 – Enhancing Financial Advisory Services with AI (Morgan Stanley)

Challenge:  Morgan Stanley sought to offer clients more personalized and effective financial guidance. The challenge was seamlessly integrating vast financial data with individual client profiles to deliver tailored investment recommendations.

Solution:  Morgan Stanley developed an AI-powered platform that utilizes natural language processing and ML to analyze financial markets, client portfolios, and historical investment performance. The system identifies patterns and predicts market trends while considering each client’s financial goals, risk tolerance, and investment history. This integrated approach enables financial advisors to offer highly customized advice and proactive investment strategies.

  • Improved Client Satisfaction:  Clients received more relevant and timely investment recommendations, enhancing their overall satisfaction and trust in the advisory services.
  • Increased Efficiency: Advisors were able to manage client portfolios more effectively, using AI-driven insights to make faster and more informed decisions.
  • Personalization through AI:  Advanced analytics and AI can significantly enhance the personalization of financial services, leading to better client engagement.
  • Data-Driven Decision Making: Leveraging diverse data sets provides a comprehensive understanding crucial for tailored financial advising.

Case Study 23 – Optimizing Inventory Management in Retail (Walmart)

Challenge:  Walmart sought to improve inventory management across its vast network of stores and warehouses to reduce overstock and stockouts, which affect customer satisfaction and operational efficiency.

Solution:  Walmart implemented a robust data analytics system that integrates real-time sales data, supply chain information, and predictive analytics. This system uses machine learning algorithms to forecast demand for thousands of products at a granular level, considering factors such as seasonality, local events, and economic trends. The predictive insights allow Walmart to dynamically adjust inventory levels, optimize restocking schedules, and manage distribution logistics more effectively.

  • Reduced Inventory Costs:  More accurate demand forecasts helped minimize overstock and reduce waste.
  • Enhanced Customer Satisfaction: Improved stock availability led to better in-store experiences and higher customer satisfaction.
  • Precision in Demand Forecasting:  Advanced data analytics and machine learning significantly enhance demand forecasting accuracy in retail.
  • Integrated Data Systems:  Combining various data sources provides a comprehensive view of inventory needs, improving overall supply chain efficiency.
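
A standard baseline for seasonality-aware demand forecasting is the seasonal naive method: predict each future period with the most recent observation from the same point in the season. The sales figures and short season length below are invented to keep the example readable.

```python
# Two "years" of hypothetical weekly unit sales (season length = 4 for brevity).
sales = [100, 120, 90, 150, 110, 125, 95, 160]

def seasonal_naive(history, season, horizon):
    # Forecast each future period with the latest same-season value.
    return [history[-season + (h % season)] for h in range(horizon)]

forecast = seasonal_naive(sales, season=4, horizon=4)
print(forecast)  # → [110, 125, 95, 160]

# Restocking decision: order the gap between forecast demand and stock on hand.
on_hand = [80, 130, 60, 100]
reorder = [max(0, f - s) for f, s in zip(forecast, on_hand)]
print(reorder)  # → [30, 0, 35, 60]
```

Walmart-scale systems replace the naive forecast with machine learning models per product and store, but the downstream logic, turning a demand forecast into a restocking plan, has the same shape.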

Case Study 24 – Enhancing Network Security with Predictive Analytics (Cisco)

Challenge:  Cisco encountered difficulties protecting its extensive network infrastructure from increasingly complex cyber threats. The objective was to bolster its security protocols by anticipating potential breaches before they happen.

Solution:  Cisco developed a predictive analytics solution that leverages ML algorithms to analyze patterns in network traffic and identify anomalies that could suggest a security threat. By integrating this system with their existing security protocols, Cisco can dynamically adjust defenses and alert system administrators about potential vulnerabilities in real-time.

  • Improved Security Posture:  The predictive system enabled proactive responses to potential threats, significantly reducing the incidence of successful cyber attacks.
  • Enhanced Operational Efficiency: Automating threat detection and response processes allowed Cisco to manage network security more efficiently, with fewer resources dedicated to manual monitoring.
  • Proactive Security Measures:  Employing predictive cybersecurity analytics helps organizations avoid potential threats.
  • Integration of Machine Learning: Machine learning is crucial for effectively detecting patterns and anomalies that human analysts might overlook, leading to stronger security measures.
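
A first-pass anomaly detector for network traffic is a z-score test: flag any reading that sits too many standard deviations from the recent baseline. The traffic numbers below are hypothetical; real deployments learn multivariate patterns across many signals rather than thresholding a single metric.

```python
import statistics

# Hypothetical requests-per-minute readings from a network sensor.
baseline = [120, 118, 125, 122, 119, 121, 124, 117, 123, 120]

def is_anomalous(value, history, z_threshold=3.0):
    # Flag readings more than `z_threshold` standard deviations from the mean.
    mu = statistics.mean(history)
    sigma = statistics.stdev(history)
    return abs(value - mu) / sigma > z_threshold

print(is_anomalous(121, baseline), is_anomalous(480, baseline))  # → False True
```

Such flags would feed the alerting and automated-defense layer described above, so administrators investigate anomalies instead of watching raw traffic.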

Case Study 25 – Improving Agricultural Efficiency with IoT and AI (Bayer Crop Science)

Challenge:  Bayer Crop Science aimed to enhance agricultural efficiency and crop yields for farmers worldwide, facing the challenge of varying climatic conditions and soil types that affect crop growth differently.

Solution:  Bayer deployed an integrated platform that merges IoT sensors, satellite imagery, and AI-driven analytics. This platform gathers real-time weather conditions, soil quality, and crop health data. Utilizing machine learning models, the system processes this data to deliver precise agricultural recommendations to farmers, including optimal planting times, watering schedules, and pest management strategies.

  • Increased Crop Yields:  Tailored agricultural practices led to higher productivity per hectare.
  • Reduced Resource Waste: Efficient water use, fertilizers, and pesticides minimized environmental impact and operational costs.
  • Precision Agriculture:  Leveraging IoT and AI enables more precise and data-driven agricultural practices, enhancing yield and efficiency.
  • Sustainability in Farming:  Advanced data analytics enhance the sustainability of farming by optimizing resource utilization and minimizing waste.

Related: Is Data Science Overhyped?

The power of data science in transforming industries is undeniable, as demonstrated by these 25 compelling case studies. Through the strategic application of machine learning, predictive analytics, and AI, companies are solving complex challenges and gaining a competitive edge. The insights gleaned from these cases highlight the critical role of data science in enhancing decision-making processes, improving operational efficiency, and elevating customer satisfaction. As we look to the future, the role of data science is set to grow, promising even more innovative solutions and smarter strategies across all sectors. These case studies inspire and serve as a roadmap for harnessing the transformative power of data science in the journey toward digital transformation.



Qualitative case study data analysis: an example from practice

Affiliation: School of Nursing and Midwifery, National University of Ireland, Galway, Republic of Ireland.

PMID: 25976531
DOI: 10.7748/nr.22.5.8.e1307

Aim: To illustrate an approach to data analysis in qualitative case study methodology.

Background: There is often little detail in case study research about how data were analysed. However, it is important that comprehensive analysis procedures are used because there are often large sets of data from multiple sources of evidence. Furthermore, the ability to describe in detail how the analysis was conducted ensures rigour in reporting qualitative research.

Data sources: The research example used is a multiple case study that explored the role of the clinical skills laboratory in preparing students for the real world of practice. Data analysis was conducted using a framework guided by the four stages of analysis outlined by Morse (1994): comprehending, synthesising, theorising and recontextualising. The specific strategies for analysis in these stages centred on the work of Miles and Huberman (1994), which has been successfully used in case study research. The data were managed using NVivo software.

Review methods: Literature examining qualitative data analysis was reviewed and strategies illustrated by the case study example provided.

Discussion: Each stage of the analysis framework is described with illustration from the research example for the purpose of highlighting the benefits of a systematic approach to handling large data sets from multiple sources.

Conclusion: By providing an example of how each stage of the analysis was conducted, it is hoped that researchers will be able to consider the benefits of such an approach to their own case study analysis.

Implications for research/practice: This paper illustrates specific strategies that can be employed when conducting data analysis in case study research and other qualitative research designs.

Keywords: Case study data analysis; case study research methodology; clinical skills research; qualitative case study methodology; qualitative data analysis; qualitative research.



Data Analytics Case Study Guide 2024

by Sam McKay, CFA | Data Analytics


Data analytics case studies reveal how businesses harness data for informed decisions and growth.

For aspiring data professionals, mastering the case study process will enhance your skills and increase your career prospects.

So, how do you approach a case study?


Use these steps to process a data analytics case study:

Understand the Problem: Grasp the core problem or question addressed in the case study.

Collect Relevant Data: Gather data from diverse sources, ensuring accuracy and completeness.

Apply Analytical Techniques: Use appropriate methods aligned with the problem statement.

Visualize Insights: Utilize visual aids to showcase patterns and key findings.

Derive Actionable Insights: Focus on deriving meaningful actions from the analysis.

This article will give you detailed steps to navigate a case study effectively and understand how it works in real-world situations.

By the end of the article, you will be better equipped to approach a data analytics case study, strengthening your analytical prowess and practical application skills.

Let’s dive in!


What is a Data Analytics Case Study?

A data analytics case study is a real or hypothetical scenario where analytics techniques are applied to solve a specific problem or explore a particular question.

It’s a practical approach that uses data analytics methods, assisting in deciphering data for meaningful insights. This structured method helps individuals or organizations make sense of data effectively.

Additionally, it’s a way to learn by doing, where there’s no single right or wrong answer in how you analyze the data.

So, what are the components of a case study?

Key Components of a Data Analytics Case Study


A data analytics case study comprises essential elements that structure the analytical journey:

Problem Context: A case study begins with a defined problem or question. It provides the context for the data analysis, setting the stage for exploration and investigation.

Data Collection and Sources: It involves gathering relevant data from various sources, ensuring data accuracy, completeness, and relevance to the problem at hand.

Analysis Techniques: Case studies employ different analytical methods, such as statistical analysis, machine learning algorithms, or visualization tools, to derive meaningful conclusions from the collected data.

Insights and Recommendations: The ultimate goal is to extract actionable insights from the analyzed data, offering recommendations or solutions that address the initial problem or question.

Now that you have a better understanding of what a data analytics case study is, let’s talk about why we need and use them.

Why Case Studies are Integral to Data Analytics


Case studies serve as invaluable tools in the realm of data analytics, offering multifaceted benefits that bolster an analyst’s proficiency and impact:

Real-Life Insights and Skill Enhancement: Examining case studies provides practical, real-life examples that expand knowledge and refine skills. These examples offer insights into diverse scenarios, aiding in a data analyst’s growth and expertise development.

Validation and Refinement of Analyses: Case studies demonstrate the effectiveness of data-driven decisions across industries, validating analytical approaches and showing how organizations benefit from data analytics. This also helps analysts refine their own methodologies.

Showcasing Data Impact on Business Outcomes: These studies show how data analytics directly affects business results, like increasing revenue, reducing costs, or delivering other measurable advantages. Understanding these impacts helps articulate the value of data analytics to stakeholders and decision-makers.

Learning from Successes and Failures: By exploring a case study, analysts glean insights from others’ successes and failures, acquiring new strategies and best practices. This learning experience facilitates professional growth and the adoption of innovative approaches within their own data analytics work.

Including case studies in a data analyst’s toolkit helps gain more knowledge, improve skills, and understand how data analytics affects different industries.

Using these real-life examples boosts confidence and success, guiding analysts to make better and more impactful decisions in their organizations.

But not all case studies are the same.

Let’s talk about the different types.

Types of Data Analytics Case Studies

Data analytics encompasses various approaches tailored to different analytical goals:

Exploratory Case Study: These involve delving into new datasets to uncover hidden patterns and relationships, often without a predefined hypothesis. They aim to gain insights and generate hypotheses for further investigation.

Predictive Case Study: These utilize historical data to forecast future trends, behaviors, or outcomes. By applying predictive models, they help anticipate potential scenarios or developments.

Diagnostic Case Study: This type focuses on understanding the root causes or reasons behind specific events or trends observed in the data. It digs deep into the data to provide explanations for occurrences.

Prescriptive Case Study: These go beyond analysis to provide actionable recommendations or strategies derived from the analyzed data, guiding decision-making by suggesting optimal courses of action based on the insights gained.

Each type has a specific role in using data to find important insights, helping in decision-making, and solving problems in various situations.

Regardless of the type of case study you encounter, here are some steps to help you process them.

Roadmap to Handling a Data Analysis Case Study

Embarking on a data analytics case study requires a systematic, step-by-step approach to derive valuable insights effectively.

Here are the steps to help you through the process:

Step 1: Understanding the Case Study Context: Immerse yourself in the intricacies of the case study. Delve into the industry context, understanding its nuances, challenges, and opportunities.

Identify the central problem or question the study aims to address. Clarify the objectives and expected outcomes, ensuring a clear understanding before diving into data analytics.

Step 2: Data Collection and Validation: Gather data from diverse sources relevant to the case study. Prioritize accuracy, completeness, and reliability during data collection. Conduct thorough validation processes to rectify inconsistencies, ensuring high-quality and trustworthy data for subsequent analysis.

Step 3: Problem Definition and Scope: Define the problem statement precisely. Articulate the objectives and limitations that shape the scope of your analysis. Identify influential variables and constraints, providing a focused framework to guide your exploration.

Step 4: Exploratory Data Analysis (EDA): Leverage exploratory techniques to gain initial insights. Visualize data distributions, patterns, and correlations, fostering a deeper understanding of the dataset. These explorations serve as a foundation for more nuanced analysis.
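To make Step 4 concrete, here is a minimal sketch of exploratory analysis in Python with pandas; the sales dataset and column names are invented for illustration:

```python
import pandas as pd

# Hypothetical sales data; column names are invented for illustration.
df = pd.DataFrame({
    "region": ["North", "South", "North", "West", "South", "West"],
    "units":  [120, 85, 130, 60, 90, 75],
    "price":  [9.99, 12.50, 9.99, 14.00, 12.50, 14.00],
})

# Distributions at a glance.
print(df.describe())

# Group-level pattern: average units sold per region.
print(df.groupby("region")["units"].mean())

# Pairwise correlation between the numeric columns.
print(df[["units", "price"]].corr())
```

Even on a toy table like this, the grouped means and the correlation matrix surface the kinds of patterns (here, cheaper items selling in higher volumes) that guide the deeper analysis in later steps.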

Step 5: Data Preprocessing and Transformation: Cleanse and preprocess the data to eliminate noise, handle missing values, and ensure consistency. Transform data formats or scales as required, preparing the dataset for further analysis.
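As a small illustration of Step 5, the sketch below (using pandas, with made-up values) fills missing entries with column medians and min-max scales each column; real projects would choose imputation and scaling strategies to fit their data:

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with missing values and very different scales.
raw = pd.DataFrame({
    "age":    [25, np.nan, 47, 31],
    "income": [32000, 54000, np.nan, 41000],
})

# Impute missing entries with each column's median.
clean = raw.fillna(raw.median())

# Min-max scale every column into the [0, 1] range.
scaled = (clean - clean.min()) / (clean.max() - clean.min())
print(scaled)
```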

Step 6: Data Modeling and Method Selection: Select analytical models aligning with the case study’s problem, employing statistical techniques, machine learning algorithms, or tailored predictive models.

In this phase, it’s important to develop data modeling skills. This helps create visuals of complex systems using organized data, which helps solve business problems more effectively.

Understand key data modeling concepts, utilize essential tools like SQL for database interaction, and practice building models from real-world scenarios.

Furthermore, strengthen data cleaning skills for accurate datasets, and stay updated with industry trends to ensure relevance.
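One hedged sketch of the model selection described in Step 6, using scikit-learn on synthetic data; the candidate models and dataset here are illustrative stand-ins, not a prescription:

```python
# Synthetic data and two candidate models, compared on held-out accuracy.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit each candidate and report its score on unseen data.
for model in (LogisticRegression(max_iter=1000), DecisionTreeClassifier(random_state=0)):
    model.fit(X_train, y_train)
    print(type(model).__name__, round(model.score(X_test, y_test), 3))
```

The point is the workflow, not the particular models: hold out data, fit each candidate, and compare on data the model has never seen.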

Step 7: Model Evaluation and Refinement: Evaluate the performance of applied models rigorously. Iterate and refine models to enhance accuracy and reliability, ensuring alignment with the objectives and expected outcomes.
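Model evaluation can be as simple as comparing predictions against ground truth; the toy labels below are invented, but the confusion-matrix arithmetic is standard:

```python
# Toy ground-truth and predicted labels (invented for illustration).
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Tally the confusion-matrix cells.
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))

precision = tp / (tp + fp)
recall = tp / (tp + fn)
accuracy = (tp + tn) / len(y_true)
print(precision, recall, accuracy)  # 0.75 0.75 0.75
```

Tracking several metrics like these across refinement iterations is what makes the evaluation "rigorous" rather than a single accuracy number.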

Step 8: Deriving Insights and Recommendations: Extract actionable insights from the analyzed data. Develop well-structured recommendations or solutions based on the insights uncovered, addressing the core problem or question effectively.

Step 9: Communicating Results Effectively: Present findings, insights, and recommendations clearly and concisely. Utilize visualizations and storytelling techniques to convey complex information compellingly, ensuring comprehension by stakeholders.

Step 10: Reflection and Iteration: Reflect on the entire analysis process and outcomes. Identify potential improvements and lessons learned. Embrace an iterative approach, refining methodologies for continuous enhancement and future analyses.

This step-by-step roadmap provides a structured framework for thorough and effective handling of a data analytics case study.

Now, after handling data analytics comes a crucial step: presenting the case study.

Presenting Your Data Analytics Case Study

Presenting a data analytics case study is a vital part of the process. When presenting your case study, clarity and organization are paramount.

To achieve this, follow these key steps:

Structuring Your Case Study: Start by outlining relevant and accurate main points. Ensure these points align with the problem addressed and the methodologies used in your analysis.

Crafting a Narrative with Data: Start with a brief overview of the issue, then explain your method and steps, covering data collection, cleaning, stats, and advanced modeling.

Visual Representation for Clarity: Utilize various visual aids—tables, graphs, and charts—to illustrate patterns, trends, and insights. Ensure these visuals are easy to comprehend and seamlessly support your narrative.

Highlighting Key Information: Use bullet points to emphasize essential information, maintaining clarity and allowing the audience to grasp key takeaways effortlessly. Bold key terms or phrases to draw attention and reinforce important points.

Addressing Audience Queries: Anticipate and be ready to answer audience questions regarding methods, assumptions, and results. Demonstrating a profound understanding of your analysis instills confidence in your work.

Integrity and Confidence in Delivery: Maintain a neutral tone and avoid exaggerated claims about findings. Present your case study with integrity, clarity, and confidence to ensure the audience appreciates and comprehends the significance of your work.

By organizing your presentation well, telling a clear story through your analysis, and using visuals wisely, you can effectively share your data analytics case study.

This method helps people understand better, stay engaged, and draw valuable conclusions from your work.

We hope that by now you are feeling confident processing a case study. But as with any process, there are challenges you may encounter.

Key Challenges in Data Analytics Case Studies

A data analytics case study can present various hurdles that necessitate strategic approaches for successful navigation:

Challenge 1: Data Quality and Consistency

Challenge: Inconsistent or poor-quality data can impede analysis, leading to erroneous insights and flawed conclusions.

Solution: Implement rigorous data validation processes, ensuring accuracy, completeness, and reliability. Employ data cleansing techniques to rectify inconsistencies and enhance overall data quality.
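A minimal sketch of such validation checks in pandas; the columns and the quality rules (non-negative age, unique customer IDs) are illustrative assumptions:

```python
import pandas as pd

# Hypothetical records exhibiting common quality problems:
# a duplicated ID, a missing value, and an impossible age.
df = pd.DataFrame({
    "customer_id": [101, 102, 102, 103],
    "age": [34, 29, 29, -5],
    "spend": [250.0, None, 120.0, 80.0],
})

# Simple validation report before any analysis begins.
issues = {
    "duplicate_ids": int(df.duplicated(subset="customer_id").sum()),
    "missing_spend": int(df["spend"].isna().sum()),
    "invalid_age": int((df["age"] < 0).sum()),
}
print(issues)  # {'duplicate_ids': 1, 'missing_spend': 1, 'invalid_age': 1}
```

Running a report like this before modeling turns "data quality" from an abstract worry into a concrete checklist of fixes.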

Challenge 2: Complexity and Scale of Data

Challenge: Managing vast volumes of data with diverse formats and complexities poses analytical challenges.

Solution: Utilize scalable data processing frameworks and tools capable of handling diverse data types. Implement efficient data storage and retrieval systems to manage large-scale datasets effectively.

Challenge 3: Interpretation and Contextual Understanding

Challenge: Interpreting data without contextual understanding or domain expertise can lead to misinterpretations.

Solution: Collaborate with domain experts to contextualize data and derive relevant insights. Invest in understanding the nuances of the industry or domain under analysis to ensure accurate interpretations.

Challenge 4: Privacy and Ethical Concerns

Challenge: Balancing data access for analysis while respecting privacy and ethical boundaries poses a challenge.

Solution: Implement robust data governance frameworks that prioritize data privacy and ethical considerations. Ensure compliance with regulatory standards and ethical guidelines throughout the analysis process.

Challenge 5: Resource Limitations and Time Constraints

Challenge: Limited resources and time constraints hinder comprehensive analysis and exhaustive data exploration.

Solution: Prioritize key objectives and allocate resources efficiently. Employ agile methodologies to iteratively analyze and derive insights, focusing on the most impactful aspects within the given timeframe.

Recognizing these challenges is key; it helps data analysts adopt proactive strategies to mitigate obstacles. This enhances the effectiveness and reliability of insights derived from a data analytics case study.

Now, let’s talk about the best software tools you should use when working with case studies.

Top 5 Software Tools for Case Studies

In the realm of case studies within data analytics, leveraging the right software tools is essential.

Here are some top-notch options:

Tableau: Renowned for its data visualization prowess, Tableau transforms raw data into interactive, visually compelling representations, ideal for presenting insights within a case study.

Python and R Libraries: These flexible programming languages provide many tools for handling data, doing statistics, and working with machine learning, meeting various needs in case studies.

Microsoft Excel: A staple tool for data analytics, Excel provides a user-friendly interface for basic analysis, making it useful for initial data exploration in a case study.

SQL Databases: Structured Query Language (SQL) databases assist in managing and querying large datasets, essential for organizing case study data effectively.

Statistical Software (e.g., SPSS, SAS): Specialized statistical software enables in-depth statistical analysis, aiding in deriving precise insights from case study data.

Choosing the best mix of these tools, tailored to each case study’s needs, greatly boosts analytical abilities and results in data analytics.
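As one concrete illustration of the SQL querying mentioned above, here is a self-contained sketch using Python's built-in sqlite3 module with made-up order data:

```python
import sqlite3

# In-memory database with made-up order data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("North", 120.0), ("South", 80.0), ("North", 200.0), ("West", 50.0)],
)

# Aggregate revenue per region, largest first.
rows = conn.execute(
    "SELECT region, SUM(amount) AS total FROM orders "
    "GROUP BY region ORDER BY total DESC"
).fetchall()
print(rows)  # [('North', 320.0), ('South', 80.0), ('West', 50.0)]
```

The same GROUP BY / ORDER BY pattern scales from this toy table to the large datasets a production SQL database would hold.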

Final Thoughts

Case studies in data analytics are helpful guides. They give real-world insights, improve skills, and show how data-driven decisions work.

Using case studies helps analysts learn, be creative, and make essential decisions confidently in their data work.

Check out our latest clip below to further your learning!

Frequently Asked Questions

What are the key steps to analyzing a data analytics case study?

When analyzing a case study, you should follow these steps:

Clarify the problem: Ensure you thoroughly understand the problem statement and the scope of the analysis.

Make assumptions: Define your assumptions to establish a feasible framework for analyzing the case.

Gather context: Acquire relevant information and context to support your analysis.

Analyze the data: Perform calculations, create visualizations, and conduct statistical analysis on the data.

Provide insights: Draw conclusions and develop actionable insights based on your analysis.

How can you effectively interpret results during a data scientist case study job interview?

During your next data science interview, interpret case study results succinctly and clearly. Utilize visual aids and numerical data to bolster your explanations, ensuring comprehension.

Frame the results in an audience-friendly manner, emphasizing relevance. Concentrate on deriving insights and actionable steps from the outcomes.

How do you showcase your data analyst skills in a project?

To demonstrate your skills effectively, consider these essential steps. Begin by selecting a problem that allows you to exhibit your capacity to handle real-world challenges through analysis.

Methodically document each phase, encompassing data cleaning, visualization, statistical analysis, and the interpretation of findings.

Utilize descriptive analysis techniques and effectively communicate your insights using clear visual aids and straightforward language. Ensure your project code is well-structured, with detailed comments and documentation, showcasing your proficiency in handling data in an organized manner.

Lastly, emphasize your expertise in SQL queries, programming languages, and various analytics tools throughout the project. These steps collectively highlight your competence and proficiency as a skilled data analyst, demonstrating your capabilities within the project.

Can you provide an example of a successful data analytics project using key metrics?

A prime illustration is utilizing analytics in healthcare to forecast hospital readmissions. Analysts leverage electronic health records, patient demographics, and clinical data to identify high-risk individuals.

Implementing preventive measures based on these key metrics helps curtail readmission rates, enhancing patient outcomes and cutting healthcare expenses.

This demonstrates how data analytics, driven by metrics, effectively tackles real-world challenges, yielding impactful solutions.
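The readmission example above could be prototyped along these lines; this is a deliberately tiny, illustrative sketch with invented features and values, not a clinical model:

```python
# Toy training data: [prior_admissions, chronic_conditions] -> readmitted?
from sklearn.linear_model import LogisticRegression

X = [[0, 0], [1, 0], [0, 1], [3, 2], [4, 3], [2, 2], [5, 1], [0, 0]]
y = [0, 0, 0, 1, 1, 1, 1, 0]

model = LogisticRegression().fit(X, y)

# Rank two new (made-up) patients by predicted readmission risk.
patients = [[0, 1], [4, 2]]
risk = model.predict_proba(patients)[:, 1]
print(risk)
```

Ranking patients by predicted risk like this is what lets a hospital target preventive measures at the highest-risk individuals first.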

Why would a company invest in data analytics?

Companies invest in data analytics to gain valuable insights, enabling informed decision-making and strategic planning. This investment helps optimize operations, understand customer behavior, and stay competitive in their industry.

Ultimately, leveraging data analytics empowers companies to make smarter, data-driven choices, leading to enhanced efficiency, innovation, and growth.


Data Analysis Case Study: Learn From Humana’s Automated Data Analysis Project

Lillian Pierson, P.E.

Got data? Great! Looking for that perfect data analysis case study to help you get started using it? You’re in the right place.

If you’ve ever struggled to decide what to do next with your data projects, to actually find meaning in the data, or even to decide what kind of data to collect, then KEEP READING…

Deep down, you know what needs to happen. You need to initiate and execute a data strategy that really moves the needle for your organization. One that produces seriously awesome business results.

But how? You’re in the right place to find out.

As a data strategist who has worked with 10 percent of Fortune 100 companies, today I’m sharing with you a case study that demonstrates just how real businesses are making real wins with data analysis. 

In the post below, we’ll look at:

  • A shining data success story;
  • What went on ‘under-the-hood’ to support that successful data project; and
  • The exact data technologies used by the vendor, to take this project from pure strategy to pure success

If you prefer to watch this information rather than read it, it’s captured in the video below:

Here’s the url too: https://youtu.be/xMwZObIqvLQ

3 Action Items You Need To Take

To actually use the data analysis case study you’re about to get – you need to take 3 main steps. Those are:

  • Reflect upon your organization as it is today (I left you some prompts below – to help you get started)
  • Review winning data case collections (starting with the one I’m sharing here) and identify 5 that seem the most promising for your organization given its current set-up
  • Assess your organization AND those 5 winning case collections. Based on that assessment, select the “QUICK WIN” data use case that offers your organization the most bang for its buck

Step 1: Reflect Upon Your Organization

Whenever you evaluate data case collections to decide if they’re a good fit for your organization, the first thing you need to do is organize your thoughts with respect to your organization as it is today.

Before moving into the data analysis case study, STOP and ANSWER THE FOLLOWING QUESTIONS – just to remind yourself:

  • What is the business vision for our organization?
  • What industries do we primarily support?
  • What data technologies do we already have up and running, that we could use to generate even more value?
  • What team members do we have to support a new data project? And what are their data skillsets like?
  • What type of data are we mostly looking to generate value from? Structured? Semi-Structured? Un-structured? Real-time data? Huge data sets? What are our data resources like?

Jot down some notes while you’re here. Then keep them in mind as you read on to find out how one company, Humana, used its data to achieve a 28 percent increase in customer satisfaction, along with a 63 percent increase in employee engagement. (That’s such a seriously impressive outcome, right?!)

Step 2: Review Data Case Studies

Here we are, already at step 2. It’s time for you to start reviewing data analysis case studies (starting with the one I’m sharing below). Identify 5 that seem the most promising for your organization given its current set-up.

Humana’s Automated Data Analysis Case Study

The key thing to note here is that the approach to creating a successful data program varies from industry to industry.

Let’s start with one to demonstrate the kind of value you can glean from these kinds of success stories.

Humana has provided health insurance to Americans for over 50 years. It is a service company focused on fulfilling the needs of its customers. A great deal of Humana’s success as a company rides on customer satisfaction, and the frontline of that battle for customers’ hearts and minds is Humana’s customer service center.

Call centers are hard to get right. A lot of emotions can arise during a customer service call, especially one relating to health and health insurance. Sometimes people are frustrated. At times, they’re upset. Also, there are times the customer service representative becomes aggravated, and the overall tone and progression of the phone call goes downhill. This is of course very bad for customer satisfaction.

Humana wanted to find a way to use artificial intelligence to monitor their phone calls and help their agents do a better job connecting with their customers in order to improve customer satisfaction (and thus, customer retention rates & profits per customer).

In light of their business need, Humana worked with a company called Cogito, which specializes in voice analytics technology.

Cogito offers a piece of AI technology called Cogito Dialogue. It’s been trained to identify certain conversational cues as a way of helping call center representatives and supervisors stay actively engaged in a call with a customer.

The AI listens to cues like the customer’s voice pitch.

If it’s rising, or if the call representative and the customer talk over each other, then the dialogue tool will send out electronic alerts to the agent during the call.

Humana fed the dialogue tool customer service data from 10,000 calls and allowed it to analyze cues such as keywords, interruptions, and pauses, and these cues were then linked with specific outcomes. For example, if the representative is receiving a particular type of cues, they are likely to get a specific customer satisfaction result.
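Cogito’s actual models are proprietary, but the cue-to-alert idea described above can be sketched as a simple rule engine; the cue names and thresholds below are invented for illustration:

```python
# Toy rule engine mapping conversational cues to agent alerts.
# Cue names and thresholds are invented; Cogito's real models are proprietary.
def alerts_for(cues: dict) -> list:
    out = []
    if cues.get("pitch_trend", 0) > 0.5:       # customer's pitch is rising
        out.append("tone of voice is too tense")
    if cues.get("overlap_seconds", 0) > 2:     # parties talking over each other
        out.append("speaking at the same time")
    if cues.get("words_per_minute", 0) > 180:  # agent is speaking too fast
        out.append("speed of speaking is high")
    return out

print(alerts_for({"pitch_trend": 0.8, "overlap_seconds": 3, "words_per_minute": 150}))
# ['tone of voice is too tense', 'speaking at the same time']
```

In the real system, the thresholds would be learned from labeled call outcomes rather than hand-set, which is exactly what feeding it 10,000 analyzed calls enables.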

The Outcome

Customers were happier, and customer service representatives were more engaged.

This automated solution for data analysis has now been deployed in 200 Humana call centers and the company plans to roll it out to 100 percent of its centers in the future.

The initiative was so successful, Humana has been able to focus on next steps in its data program. The company now plans to begin predicting the type of calls that are likely to go unresolved, so they can send those calls over to management before they become frustrating to the customer and customer service representative alike.

What does this mean for you and your business?

Well, if you’re looking for new ways to generate value by improving the quantity and quality of the decision support that you’re providing to your customer service personnel, then this may be a perfect example of how you can do so.

Humana’s Business Use Cases

Humana’s data analysis case study includes two key business use cases:

  • Analyzing customer sentiment; and
  • Suggesting actions to customer service representatives.

Analyzing Customer Sentiment

First things first, before you go ahead and collect data, you need to ask yourself who and what is involved in making things happen within the business.

In the case of Humana, the actors were:

  • The health insurance system itself
  • The customer, and
  • The customer service representative

As you can see in the use case diagram above, the relational aspect is pretty simple. You have a customer service representative and a customer. They are both producing audio data, and that audio data is being fed into the system.

Humana focused on collecting the key data points, shown in the image below, from their customer service operations.

By collecting data about speech style, pitch, silence, stress in customers’ voices, length of call, speed of customers’ speech, intonation, articulation, and representatives’ manner of speaking, Humana was able to analyze customer sentiment and introduce techniques for improved customer satisfaction.

Having strategically defined these data points, the Cogito technology was able to generate reports about customer sentiment during the calls.

Suggesting Actions to Customer Service Representatives

The second use case for the Humana data program follows on from the data gathered in the first case.

In Humana’s case, Cogito generated a host of call analyses and reports about key call issues.

In the second business use case, Cogito was able to suggest actions to customer service representatives, in real time, to make use of incoming data and help improve customer satisfaction on the spot.

The technology Humana used provided suggestions via text message to the customer service representative, offering the following types of feedback:

  • The tone of voice is too tense
  • The speed of speaking is high
  • The customer representative and customer are speaking at the same time

These alerts allowed the Humana customer service representatives to alter their approach immediately, improving the quality of the interaction and, subsequently, the customer satisfaction.

The preconditions for success in this use case were:

  • The call-related data must be collected and stored
  • The AI models must be in place to generate analysis on the data points that are recorded during the calls

Evidence of success can subsequently be found in a system that offers real-time suggestions for courses of action that the customer service representative can take to improve customer satisfaction.

Thanks to this data-intensive business use case, Humana was able to increase customer satisfaction, improve customer retention rates, and drive profits per customer.

The Technology That Supports This Data Analysis Case Study

I promised to dip into the tech side of things. This is especially for those of you who are interested in the ins and outs of how projects like this one are actually rolled out.

Here’s a little rundown of the main technologies we discovered when we investigated how Cogito runs in support of its clients like Humana.

  • For cloud data management, Cogito uses AWS, specifically the Athena product
  • For on-premise big data management, the company uses Apache HDFS, the distributed file system for storing big data
  • They utilize MapReduce for processing their data
  • Cogito also runs traditional relational database management systems such as PostgreSQL
  • For analytics and data visualization, Cogito makes use of Tableau
  • And for its machine learning technology, these use cases required people with knowledge of Python, R, and SQL, as well as deep learning (Cogito uses the PyTorch and TensorFlow libraries)

These data science skill sets support the effective computing, deep learning, and natural language processing applications employed by Humana for this use case.

If you’re looking to hire people to help with your own data initiative, then people with those skills listed above, and with experience in these specific technologies, would be a huge help.

Step 3: Select The “Quick Win” Data Use Case

Still there? Great!

It’s time to close the loop.

Remember those notes you took before you reviewed the study? I want you to STOP here and assess. Does this Humana case study seem applicable and promising as a solution, given your organization’s current set-up…

YES ▶ Excellent!

Earmark it and continue exploring other winning data use cases until you’ve identified 5 that seem like great fits for your business’s needs. Evaluate those against your organization’s needs, and select the very best fit to be your “quick win” data use case. Develop your data strategy around that.

NO, Lillian – it’s not applicable. ▶ No problem.

Discard the information and continue exploring the winning data use cases we’ve categorized for you according to business function and industry. Save time by dialing down into the business function you know your business really needs help with now. Identify 5 winning data use cases that seem like great fits for your business’s needs. Evaluate those against your organization’s needs, and select the very best fit to be your “quick win” data use case. Develop your data strategy around that data use case.

12 Data Science Case Studies: Across Various Industries

Data science has become popular in the last few years due to its successful application in making business decisions. Data scientists have been using data science techniques to solve challenging real-world issues in healthcare, agriculture, manufacturing, automotive, and many more. For this purpose, a data enthusiast needs to stay updated with the latest technological advancements in AI. An excellent way to achieve this is through reading industry data science case studies. I recommend checking out the Data Science With Python course syllabus to start your data science journey.

In this discussion, I will present some case studies to you that contain detailed and systematic data analysis of people, objects, or entities focusing on multiple factors present in the dataset. Almost every industry uses data science in some way. You can learn more about data science fundamentals in this Data Science course content.

Let’s look at the top data science case studies in this article so you can understand how businesses from many sectors have benefitted from data science to boost productivity, revenues, and more.

List of Data Science Case Studies 2024

  • Hospitality: Airbnb focuses on growth by analyzing customer voice using data science. Qantas uses predictive analytics to mitigate losses
  • Healthcare: Novo Nordisk is driving innovation with NLP. AstraZeneca harnesses data for innovation in medicine
  • Covid 19: Johnson and Johnson uses data science to fight the pandemic
  • E-commerce: Amazon uses data science to personalize shopping experiences and improve customer satisfaction
  • Supply chain management: UPS optimizes supply chain with big data analytics
  • Meteorology: IMD leveraged data science to achieve a record 1.2m evacuation before cyclone “Fani”
  • Entertainment Industry: Netflix uses data science to personalize content and improve recommendations. Spotify uses big data to deliver a rich user experience for online music streaming
  • Banking and Finance: HDFC utilizes Big Data Analytics to increase income and enhance the banking experience
  • Urban Planning and Smart Cities: Traffic management in smart cities such as Pune and Bhubaneswar
  • Agricultural Yield Prediction: Farmers Edge in Canada uses data science to help farmers improve their produce
  • Transportation Industry: Uber optimizes its ride-sharing feature and tracks delivery routes through data analysis
  • Environmental Industry: NASA utilizes data science to predict potential natural disasters; World Wildlife analyzes deforestation to protect the environment

Top 12 Data Science Case Studies

1. Data Science in the Hospitality Industry

In the hospitality sector, data analytics assists hotels with pricing strategies, customer analysis, brand marketing, tracking market trends, and more.

Airbnb focuses on growth by analyzing customer voice using data science. A famous example in this sector is the unicorn "Airbnb", a startup that focused on data science early in order to grow and adapt to the market faster. The company witnessed 43,000 percent hypergrowth in as little as five years with the help of data science. It used data science techniques to process its data, translate it into a better understanding of the voice of the customer, and apply the resulting insights to decision making, then scaled the approach to cover all aspects of the organization. Airbnb uses statistics to analyze and aggregate individual experiences and establish trends across the community; these trends inform its business choices and help it grow further.

Travel industry and data science

Predictive analytics benefits many parts of the travel industry. Travel companies can pair recommendation engines with data science to achieve higher personalization and improved user interactions, and they can cross-sell by recommending relevant products to drive sales and increase revenue. Data science is also employed to analyze social media posts for sentiment analysis, yielding invaluable travel-related insights. Knowing whether these views are positive, negative, or neutral helps agencies understand their user demographics and the experiences their target audiences expect. Such insights are essential for developing competitive pricing strategies that attract customers and for customizing travel packages and allied services. Travel agencies like Expedia and Booking.com use predictive analytics for personalized recommendations, product development, and effective marketing. Airlines benefit from the same approach: they frequently face losses due to flight cancellations, disruptions, and delays, and data science helps them identify patterns and predict possible bottlenecks, mitigating losses and improving the overall customer travel experience.
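As a toy illustration of the sentiment analysis described above, the sketch below scores travel reviews against tiny hand-written word lists. The lexicons and function are invented for illustration only; production systems use trained NLP models rather than keyword counting.

```python
# Minimal lexicon-based sentiment scoring for travel reviews.
# Illustrative only: the word lists are tiny placeholder lexicons.

POSITIVE = {"great", "comfortable", "friendly", "clean", "amazing"}
NEGATIVE = {"delayed", "dirty", "rude", "cancelled", "awful"}

def sentiment(review: str) -> str:
    """Classify a review as positive, negative, or neutral."""
    words = review.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("The staff were friendly and the room was clean"))  # positive
print(sentiment("My flight was delayed and the crew was rude"))     # negative
```

Real pipelines tokenize properly, handle negation ("not clean"), and learn the word weights from labeled data, but the positive/negative/neutral output shape is the same.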

How Qantas uses predictive analytics to mitigate losses  

Qantas, one of Australia's largest airlines, leverages data science to reduce losses caused by flight delays, disruptions, and cancellations. It also uses it to provide a better travel experience for its customers by reducing the number and length of delays caused by heavy air traffic, weather conditions, or operational difficulties. Back in 2016, when heavy storms badly struck Australia's east coast, only 15 of 436 Qantas flights were cancelled thanks to its predictive analytics-based system, against competitor Virgin Australia, which cancelled 70 of its 320 flights.

2. Data Science in Healthcare

The healthcare sector is benefiting immensely from advancements in AI. Data science, especially in medical imaging, has been helping healthcare professionals arrive at better diagnoses and more effective treatments for patients. Similarly, several advanced healthcare analytics tools have been developed to generate clinical insights for improving patient care; these tools also assist in defining personalized medications, reducing operating costs for clinics and hospitals. Apart from medical imaging and computer vision, Natural Language Processing (NLP) is frequently used in the healthcare domain to study published textual research data.

A. Pharmaceutical

Driving innovation with NLP: Novo Nordisk. Novo Nordisk uses the Linguamatics NLP platform to mine text from internal and external data sources, including scientific abstracts, patents, grants, news, tech-transfer offices at universities worldwide, and more. These NLP queries run across sources for the key therapeutic areas of interest to the Novo Nordisk R&D community, and several NLP algorithms have been developed for topics such as safety, efficacy, randomized controlled trials, patient populations, dosing, and devices. Novo Nordisk employs a data pipeline to extend the tools' success to real-world data, and uses interactive dashboards and cloud services to visualize the standardized, structured information from the queries, exploring commercial effectiveness, market situations, potential, and gaps in product documentation. Through data science, the company automates the generation of insights, saves time, and provides better evidence for decision making.

How AstraZeneca harnesses data for innovation in medicine. AstraZeneca is a globally known biotech company that leverages data and AI technology to discover and deliver new, effective medicines faster. Within its R&D teams, AI is used to decode big data so that diseases like cancer, respiratory disease, and heart, kidney, and metabolic diseases can be better understood and more effectively treated. Using data science, the company can identify new targets for innovative medications. In 2021, it selected its first two AI-generated drug targets, in Chronic Kidney Disease and Idiopathic Pulmonary Fibrosis, in collaboration with BenevolentAI.

Data science is also helping AstraZeneca design better clinical trials, pursue personalized medication strategies, and innovate the process of developing new medicines. Its Center for Genomics Research uses data science and AI with the aim of analyzing around two million genomes by 2026. For imaging purposes, the company is training its AI systems to examine scans for disease indicators and biomarkers, an approach that helps analyze samples more accurately and effortlessly and can cut analysis time by around 30%.

AstraZeneca also utilizes AI and machine learning to optimize the process at different stages and minimize the overall time for the clinical trials by analyzing the clinical trial data. Summing up, they use data science to design smarter clinical trials, develop innovative medicines, improve drug development and patient care strategies, and many more.

C. Wearable Technology  

Wearable technology is a multi-billion-dollar industry. With an increasing awareness about fitness and nutrition, more individuals now prefer using fitness wearables to track their routines and lifestyle choices.  

Fitness wearables are convenient to use, assist users in tracking their health, and encourage them to lead a healthier lifestyle. The medical devices in this domain are beneficial since they help monitor the patient's condition and communicate in an emergency situation. The regularly used fitness trackers and smartwatches from renowned companies like Garmin, Apple, FitBit, etc., continuously collect physiological data of the individuals wearing them. These wearable providers offer user-friendly dashboards to their customers for analyzing and tracking progress in their fitness journey.

3. Covid 19 and Data Science

In the past two years of the Pandemic, the power of data science has been more evident than ever. Pharmaceutical companies across the globe were able to accelerate Covid 19 vaccine development by analyzing data to understand the trends and patterns of the outbreak. Data science made it possible to track the virus in real time, predict patterns, devise effective strategies to fight the Pandemic, and more.

How Johnson and Johnson uses data science to fight the Pandemic   

The data science team at Johnson and Johnson leverages real-time data to track the spread of the virus. It built a global surveillance dashboard, granular down to the county level, that helps track the Pandemic's progress, predict potential hotspots, and narrow down the likely places to test its investigational COVID-19 vaccine candidate. The team works with in-country experts to determine whether official numbers are accurate and to find the most valid information on case numbers, hospitalizations, mortality and testing rates, social compliance, and local policies to populate the dashboard. It also uses the data to build models that identify groups of individuals at risk from the virus and to explore effective treatments that improve patient outcomes.

4. Data Science in E-commerce  

In the  e-commerce sector , big data analytics can assist in customer analysis, reduce operational costs, forecast trends for better sales, provide personalized shopping experiences to customers, and many more.  

Amazon uses data science to personalize shopping experiences and improve customer satisfaction. Amazon is a globally leading e-commerce platform offering a wide range of online shopping services. As a result, it generates a massive amount of data that can be leveraged to understand consumer behavior and gain insights into competitors' strategies. Data science case studies reveal how Amazon uses this data to recommend products and services to its users, nudging consumers toward additional purchases; an estimated 35% of Amazon's yearly revenue comes from these recommendations. Additionally, Amazon collects consumer data to enable faster order tracking and better deliveries.

Similarly, Amazon's virtual assistant, Alexa, can converse in different languages and uses speakers and a camera to interact with users. Amazon utilizes users' audio commands to improve Alexa and deliver a better user experience.

5. Data Science in Supply Chain Management

Predictive analytics and big data are driving innovation in the supply chain domain. They offer greater visibility into company operations, reduce costs and overheads, forecast demand, enable predictive maintenance and smarter product pricing, minimize supply chain interruptions, optimize routes, improve fleet management, and drive better overall performance.

Optimizing supply chain with big data analytics: UPS

UPS is a renowned package delivery and supply chain management company. With thousands of packages delivered every day, a UPS driver makes about 100 deliveries each business day on average, and on-time, safe delivery is crucial to UPS's success. Hence, UPS built an optimized navigation tool, "ORION" (On-Road Integrated Optimization and Navigation), which uses highly advanced big data processing algorithms to give drivers routes optimized for fuel, distance, and time. UPS applies supply chain data analysis to every aspect of its shipping process: data about packages and deliveries is captured through radars and sensors, and deliveries and routes are optimized using big data systems. Overall, this approach has helped UPS save 1.6 million gallons of gasoline in transportation every year, significantly reducing delivery costs.
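ORION's actual algorithms are proprietary, but the core idea of ordering stops to shorten a route can be illustrated with a greedy nearest-neighbour heuristic over straight-line distances, a deliberate simplification that ignores traffic, time windows, and fuel:

```python
import math

# Toy route optimization: visit the nearest remaining stop first.
# Real routing engines solve far harder constrained problems; this
# only illustrates ordering stops to shorten total travel.

def route(depot, stops):
    """Return stops ordered greedily by nearest-neighbour from the depot."""
    remaining = list(stops)
    path, current = [], depot
    while remaining:
        nxt = min(remaining, key=lambda p: math.dist(current, p))
        path.append(nxt)
        remaining.remove(nxt)
        current = nxt
    return path

deliveries = [(4, 4), (1, 0), (0, 2)]
print(route((0, 0), deliveries))  # [(1, 0), (0, 2), (4, 4)]
```

Greedy nearest-neighbour is not optimal in general, which is exactly why production systems invest in heavier optimization, but it conveys the shape of the problem.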

6. Data Science in Meteorology

Weather prediction is an interesting  application of data science . Businesses like aviation, agriculture and farming, construction, consumer goods, sporting events, and many more are dependent on climatic conditions. The success of these businesses is closely tied to the weather, as decisions are made after considering the weather predictions from the meteorological department.   

Besides, weather forecasts are extremely helpful for individuals to manage their allergic conditions. One crucial application of weather forecasting is natural disaster prediction and risk management.  

Weather forecasts begin with collecting large amounts of data on current environmental conditions (wind speed, temperature, humidity, and cloud cover at a specific location and time) using sensors on IoT (Internet of Things) devices and satellite imagery. This gathered data is then analyzed using an understanding of atmospheric processes, and machine learning models are built to predict upcoming weather conditions such as rainfall or snow. Although data science cannot prevent natural calamities like floods, hurricanes, or forest fires, tracking these phenomena well ahead of their arrival is invaluable: such predictions give governments sufficient time to take the steps and measures needed to ensure the population's safety.
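The pattern-matching idea behind such forecasts can be shown in miniature: the toy nearest-neighbour classifier below predicts "rain tomorrow" from synthetic (humidity, pressure) readings. The historical data is invented, and real forecasting combines physics-based atmospheric models with ML at a far larger scale.

```python
import math

# Toy nearest-neighbour "rain tomorrow" classifier on synthetic
# (humidity %, pressure hPa) readings.

HISTORY = [  # (humidity, pressure) -> did it rain?
    ((90, 1002), True), ((85, 1000), True), ((88, 1005), True),
    ((40, 1020), False), ((35, 1025), False), ((50, 1018), False),
]

def predict_rain(humidity, pressure, k=3):
    """Vote among the k most similar past days."""
    ranked = sorted(HISTORY, key=lambda h: math.dist((humidity, pressure), h[0]))
    votes = [label for _, label in ranked[:k]]
    return votes.count(True) > k // 2

print(predict_rain(87, 1003))  # True
print(predict_rain(42, 1021))  # False
```

The "model" here is just the stored history plus a similarity vote, which is the essence of recognizing that current conditions resemble past rainy days.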

IMD leveraged data science to achieve a record 1.2 million evacuations before cyclone "Fani"

Data scientists in meteorology rely on satellite images to make short-term forecasts, decide whether a forecast is correct, and validate models. Machine learning is also used for pattern matching: if a model recognizes a past pattern, it can forecast similar future weather conditions. With dependable equipment, sensor data also helps produce accurate local forecasts. IMD (the India Meteorological Department) used satellite pictures to study the low-pressure zones forming off the Odisha coast of India. In April 2019, thirteen days before cyclone "Fani" reached the area, IMD warned that a massive storm was underway, and the authorities began preparing safety measures.

It was one of the most powerful cyclones to strike India in the last 20 years, and a record 1.2 million people were evacuated in less than 48 hours, thanks to the power of data science.

7. Data Science in the Entertainment Industry

Due to the Pandemic, demand for OTT (Over-the-top) media platforms has grown significantly. People prefer watching movies and web series or listening to the music of their choice at leisure in the convenience of their homes. This sudden growth in demand has given rise to stiff competition. Every platform now uses data analytics in different capacities to provide better-personalized recommendations to its subscribers and improve user experience.   

How Netflix uses data science to personalize the content and improve recommendations  

Netflix is an extremely popular internet television platform with streamable content offered in several languages, catering to varied audiences. In 2006, having entered the media streaming market, Netflix sought to improve the prediction accuracy of its existing "Cinematch" system by 10% and offered a $1 million prize to the winning team. The approach was successful: at the end of the competition, a solution developed by the BellKor team improved prediction accuracy by 10.06%, the result of over 200 work hours and an ensemble of 107 algorithms. These winning algorithms became part of the Netflix recommendation system.

Netflix also employs Ranking Algorithms to generate personalized recommendations of movies and TV Shows appealing to its users.   
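The recommendation idea behind systems like Cinematch can be sketched as a miniature item-item collaborative filter. Everything below (users, titles, and the ratings matrix) is invented, and Netflix's production ranking algorithms are vastly larger ensembles; this only shows the cosine-similarity mechanic.

```python
import math

# Item-item collaborative filtering sketch: recommend the unseen title
# whose rating column is most similar to the user's favourite title.

RATINGS = {  # user -> {title: rating}
    "ana":  {"Drama A": 5, "Drama B": 4},
    "ben":  {"Drama A": 4, "Drama C": 5, "Action Y": 1},
    "cara": {"Drama A": 1, "Action Y": 5, "Drama C": 2},
}

def cosine(a, b):
    """Cosine similarity over the users two titles share."""
    users = set(a) & set(b)
    if not users:
        return 0.0
    dot = sum(a[u] * b[u] for u in users)
    return dot / (math.sqrt(sum(a[u] ** 2 for u in users)) *
                  math.sqrt(sum(b[u] ** 2 for u in users)))

def recommend(user):
    """Suggest the unseen title most similar to the user's top-rated one."""
    cols = {}  # title -> {user: rating}
    for u, prefs in RATINGS.items():
        for title, r in prefs.items():
            cols.setdefault(title, {})[u] = r
    seen = RATINGS[user]
    favourite = max(seen, key=seen.get)
    unseen = [t for t in cols if t not in seen]
    return max(unseen, key=lambda t: cosine(cols[favourite], cols[t]))

print(recommend("ana"))  # Drama C
```

Here "ana" loved "Drama A", and "Drama C" attracts similar ratings from the same users, so it outranks "Action Y"; production systems add implicit signals, time decay, and learned ranking on top of this core idea.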

Spotify uses big data to deliver a rich user experience for online music streaming  

Personalized online music streaming is another area where data science is being used. Spotify, a well-known on-demand music service launched in 2008, has effectively leveraged big data to create personalized experiences for each user. A huge platform with more than 24 million subscribers and a database of nearly 20 million songs, Spotify feeds this big data into various algorithms and machine learning models to provide personalized content. Its "Discover Weekly" feature generates a personalized playlist of fresh, unheard songs matching the user's taste every week, and its "Wrapped" feature gives users a December overview of their favorite or most frequently played songs of the year. Spotify also leverages the data to run targeted ads and grow its business. Thus, Spotify combines its user data, which is big data, with some external data to deliver a high-quality user experience.

8. Data Science in Banking and Finance

Data science is extremely valuable in the banking and finance industry. It powers several high-priority areas: credit risk modeling (estimating the likelihood that a loan will be repaid), fraud detection (spotting malicious or irregular transaction patterns using machine learning), customer lifetime value (predicting bank performance based on existing and potential customers), and customer segmentation (profiling customers by behavior and characteristics to personalize offers and services). Finally, data science is also used in real-time predictive analytics (computational techniques to predict future events).
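As a rough sketch of how credit risk modeling maps applicant features to a repayment-risk probability, the logistic scoring function below uses invented coefficients; real models are trained on historical repayment data across many more features.

```python
import math

# Credit risk scoring sketch with a logistic function. The weights are
# hypothetical: higher debt ratio and missed payments raise risk,
# higher income lowers it.

def default_probability(income_k, debt_ratio, missed_payments):
    """Map applicant features to a probability of default in (0, 1)."""
    z = -1.0 - 0.02 * income_k + 3.0 * debt_ratio + 0.8 * missed_payments
    return 1 / (1 + math.exp(-z))

def decide(income_k, debt_ratio, missed_payments, threshold=0.5):
    """Approve when estimated default risk is below the threshold."""
    p = default_probability(income_k, debt_ratio, missed_payments)
    return "reject" if p >= threshold else "approve"

print(decide(80, 0.2, 0))  # approve
print(decide(30, 0.7, 3))  # reject
```

In practice the coefficients come from fitting a logistic regression (or a more complex model) to labeled loan outcomes, and the decision threshold is tuned against business costs of false approvals versus false rejections.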

How HDFC utilizes Big Data Analytics to increase revenues and enhance the banking experience    

One of the major private banks in India, HDFC Bank, was an early adopter of AI. It started with big data analytics in 2004, intending to grow its revenue and understand its customers and markets better than its competitors. A trendsetter at the time, it set up an enterprise data warehouse to track the differentiation to be offered to customers based on their relationship value with the bank. Data science and analytics have been crucial in helping HDFC Bank segment its customers and offer customized personal or commercial banking services. Its analytics engine and use of SaaS tools help the bank cross-sell relevant offers to customers; beyond routine fraud prevention, they also track customer credit histories and underpin the bank's speedy loan approvals.

9. Data Science in Urban Planning and Smart Cities  

Data science can help the dream of smart cities come true! Everything from traffic flow to energy usage can be optimized using data science techniques: data fetched from multiple sources helps planners understand trends and organize urban living more effectively.

A significant data science case study is traffic management in Pune. The city controls and modifies its traffic signals dynamically by tracking traffic flow: real-time data is fetched from the signals through installed cameras and sensors, and traffic is managed based on this information. With this proactive approach, congestion is kept under control and traffic flows more smoothly. A similar case study comes from Bhubaneswar, where the municipality provides platforms for people to offer suggestions and actively participate in decision-making. The government reviews all the input before making decisions, setting rules, or arranging the things its residents actually need.

10. Data Science in Agricultural Prediction   

Have you ever wondered how helpful it would be to predict your agricultural yield? That is exactly what data science is helping farmers do. They can estimate how much a given area can produce based on different environmental factors and soil types, and use this information to make informed decisions that benefit both themselves and their buyers in multiple ways.

Data Science in Agricultural Yield Prediction

Farmers across the globe use various data science techniques to understand multiple aspects of their farms and crops. A famous example of data science in the agricultural industry is the work done by Farmers Edge, a Canadian company that takes real-time images of farms around the world and combines them with related data. Farmers use this data to make decisions relevant to their yield and improve their produce. Similarly, farmers in countries like Ireland use satellite-based information to move beyond traditional methods and multiply their yield strategically.
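A minimal version of yield prediction is a least-squares regression of yield on a single environmental factor. The rainfall and yield figures below are synthetic, and services like Farmers Edge combine imagery, weather, and soil data in far richer models:

```python
# Simple least-squares regression of crop yield on seasonal rainfall.
# All numbers are synthetic, chosen only to illustrate the fit.

rainfall = [300, 450, 500, 650, 700]   # mm per season
yields   = [2.1, 3.0, 3.3, 4.2, 4.5]   # tonnes per hectare

n = len(rainfall)
mean_x = sum(rainfall) / n
mean_y = sum(yields) / n
# Ordinary least squares for a single predictor.
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(rainfall, yields)) \
        / sum((x - mean_x) ** 2 for x in rainfall)
intercept = mean_y - slope * mean_x

def predict_yield(mm):
    """Predicted tonnes per hectare for a given seasonal rainfall."""
    return intercept + slope * mm

print(round(predict_yield(600), 2))  # 3.9
```

One predictor is rarely enough in the field; real models add soil type, temperature, and imagery-derived features, but the fit-then-predict workflow is the same.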

11. Data Science in the Transportation Industry   

Transportation keeps the world moving. People and goods commute from one place to another for various purposes, and it is fair to say the world would come to a standstill without efficient transportation. That is why keeping the transportation industry running as smoothly as possible is crucial, and data science helps greatly here. With technological progress, devices such as traffic sensors, monitoring display systems, mobility management tools, and numerous others have emerged.

Many cities have already adopted multi-modal transportation systems, using GPS trackers, geo-location, and CCTV cameras to monitor and manage transportation. Uber is the perfect case study for understanding data science in the transportation industry: the company optimizes its ride-sharing feature and tracks delivery routes through data analysis. This data-driven approach has enabled Uber to serve more than 100 million users, making transportation easy and convenient. Moreover, Uber uses the data it gathers from users daily to offer cost-effective, quickly available rides.

12. Data Science in the Environmental Industry    

Increasing pollution, global warming, climate change, and other harmful environmental impacts have forced the world to pay attention to the environmental industry. Multiple initiatives are underway across the globe to preserve the environment and make the world a better place. Though industry recognition and these efforts are still in their early stages, the impact is significant and growth is fast.

A popular use of data science in the environmental industry comes from NASA and other research organizations worldwide. NASA collects data on current climate conditions, which is used to inform remedial policies that can make a difference. Data science also helps researchers predict natural disasters well ahead of time, preventing or at least considerably reducing potential damage. A similar case study involves the World Wildlife Fund, which uses data science to track deforestation and help reduce the illegal cutting of trees, thereby preserving the environment.

Where to Find Full Data Science Case Studies?  

Data science is a highly evolving domain with many practical applications and a huge open community. Hence, the best way to keep updated with the latest trends in this domain is by reading case studies and technical articles. Usually, companies share their success stories of how data science helped them achieve their goals to showcase their potential and benefit the greater good. Such case studies are available online on the respective company websites and dedicated technology forums like Towards Data Science or Medium.  

Additionally, we can get some practical examples in recently published research papers and textbooks in data science.  

What Are the Skills Required for Data Scientists?  

Data scientists play an important role in the data science process, as they work on the data end to end. To work on a data science case study, a data scientist needs several skills: a good grasp of data science fundamentals, deep knowledge of statistics, excellent programming skills in Python or R, experience with data manipulation and analysis, the ability to create creative and compelling data visualizations, and good knowledge of big data, machine learning, and deep learning concepts for model building and deployment. Apart from these technical skills, data scientists also need to be good storytellers, with an analytical mind and strong communication skills.


Conclusion  

These were some interesting  data science case studies  across different industries. There are many more domains where data science has exciting applications, like in the Education domain, where data can be utilized to monitor student and instructor performance, develop an innovative curriculum that is in sync with the industry expectations, etc.   

Almost all companies looking to leverage the power of big data begin with a SWOT analysis to narrow down the problems they intend to solve with data science. They then need to assess their competitors to develop relevant data science tools and strategies to address these challenges. Thus, the utility of data science in several sectors is clearly visible, much is left to be explored, and more is yet to come. Nonetheless, data science will continue to boost the performance of organizations in this age of big data.

Frequently Asked Questions (FAQs)

A case study in data science requires a systematic and organized approach for solving the problem. Generally, four main steps are needed to tackle every data science case study: 

  • Define the problem statement and a strategy to solve it
  • Gather and pre-process the data, making relevant assumptions
  • Select tools and appropriate algorithms to build machine learning or deep learning models
  • Make predictions, accept solutions based on evaluation metrics, and improve the model if necessary
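The four steps above can be sketched as a minimal end-to-end workflow. The churn data, threshold "model", and function names below are all hypothetical stand-ins for a real problem, real pre-processing, and a real learning algorithm.

```python
# Hypothetical problem (step 1): predict churn (1/0) from usage minutes,
# with a simple threshold rule standing in for a real ML model.

def preprocess(rows):
    """Step 2: drop records with missing usage values."""
    return [(m, churned) for m, churned in rows if m is not None]

def fit_threshold(train):
    """Step 3: pick the usage cutoff that best separates churners."""
    candidates = sorted({m for m, _ in train})
    def accuracy(t):
        return sum((m < t) == bool(c) for m, c in train) / len(train)
    return max(candidates, key=accuracy)

def evaluate(model_t, test):
    """Step 4: accept or refine based on an evaluation metric."""
    return sum((m < model_t) == bool(c) for m, c in test) / len(test)

data = [(5, 1), (10, 1), (200, 0), (None, 0), (250, 0), (8, 1)]
clean = preprocess(data)
t = fit_threshold(clean)
print(evaluate(t, clean))  # evaluating on training data only for brevity
```

In a real case study, evaluation would use a held-out test set rather than the training data, and the threshold rule would be replaced by a trained model.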

Getting data for a case study starts with a reasonable understanding of the problem. This gives us clarity about what we expect the dataset to include. Finding relevant data for a case study requires some effort. Although it is possible to collect relevant data using traditional techniques like surveys and questionnaires, we can also find good quality data sets online on different platforms like Kaggle, UCI Machine Learning repository, Azure open data sets, Government open datasets, Google Public Datasets, Data World and so on.  

Data science projects involve multiple steps to process the data and bring valuable insights. A data science project includes different steps - defining the problem statement, gathering relevant data required to solve the problem, data pre-processing, data exploration & data analysis, algorithm selection, model building, model prediction, model optimization, and communicating the results through dashboards and reports.  

Profile

Devashree Madhugiri

Devashree holds an M.Eng degree in Information Technology from Germany and a background in Data Science. She likes working with statistics and discovering hidden insights in varied datasets to create stunning dashboards. She enjoys sharing her knowledge in AI by writing technical articles on various technological platforms. She loves traveling, reading fiction, solving Sudoku puzzles, and participating in coding competitions in her leisure time.



Case Study – Methods, Examples and Guide

Case Study Research

A case study is a research method that involves an in-depth examination and analysis of a particular phenomenon or case, such as an individual, organization, community, event, or situation.

It is a qualitative research approach that aims to provide a detailed and comprehensive understanding of the case being studied. Case studies typically involve multiple sources of data, including interviews, observations, documents, and artifacts, which are analyzed using various techniques, such as content analysis, thematic analysis, and grounded theory. The findings of a case study are often used to develop theories, inform policy or practice, or generate new research questions.

Types of Case Study

Types and Methods of Case Study are as follows:

Single-Case Study

A single-case study is an in-depth analysis of a single case. This type of case study is useful when the researcher wants to understand a specific phenomenon in detail.

For Example , A researcher might conduct a single-case study on a particular individual to understand their experiences with a particular health condition or a specific organization to explore their management practices. The researcher collects data from multiple sources, such as interviews, observations, and documents, and uses various techniques to analyze the data, such as content analysis or thematic analysis. The findings of a single-case study are often used to generate new research questions, develop theories, or inform policy or practice.

Multiple-Case Study

A multiple-case study involves the analysis of several cases that are similar in nature. This type of case study is useful when the researcher wants to identify similarities and differences between the cases.

For Example, a researcher might conduct a multiple-case study on several companies to explore the factors that contribute to their success or failure. The researcher collects data from each case, compares and contrasts the findings, and uses various techniques to analyze the data, such as comparative analysis or pattern-matching. The findings of a multiple-case study can be used to develop theories, inform policy or practice, or generate new research questions.

Exploratory Case Study

An exploratory case study is used to explore a new or understudied phenomenon. This type of case study is useful when the researcher wants to generate hypotheses or theories about the phenomenon.

For Example, a researcher might conduct an exploratory case study on a new technology to understand its potential impact on society. The researcher collects data from multiple sources, such as interviews, observations, and documents, and uses various techniques to analyze the data, such as grounded theory or content analysis. The findings of an exploratory case study can be used to generate new research questions, develop theories, or inform policy or practice.

Descriptive Case Study

A descriptive case study is used to describe a particular phenomenon in detail. This type of case study is useful when the researcher wants to provide a comprehensive account of the phenomenon.

For Example, a researcher might conduct a descriptive case study on a particular community to understand its social and economic characteristics. The researcher collects data from multiple sources, such as interviews, observations, and documents, and uses various techniques to analyze the data, such as content analysis or thematic analysis. The findings of a descriptive case study can be used to inform policy or practice or generate new research questions.

Instrumental Case Study

An instrumental case study is used to understand a particular phenomenon that is instrumental in achieving a particular goal. This type of case study is useful when the researcher wants to understand the role of the phenomenon in achieving the goal.

For Example, a researcher might conduct an instrumental case study on a particular policy to understand its impact on achieving a particular goal, such as reducing poverty. The researcher collects data from multiple sources, such as interviews, observations, and documents, and uses various techniques to analyze the data, such as content analysis or thematic analysis. The findings of an instrumental case study can be used to inform policy or practice or generate new research questions.

Case Study Data Collection Methods

Here are some common data collection methods for case studies:

Interviews

Interviews involve asking questions of individuals who have knowledge or experience relevant to the case study. Interviews can be structured (where the same questions are asked of all participants) or unstructured (where the interviewer follows up on responses with further questions). Interviews can be conducted in person, over the phone, or through video conferencing.

Observations

Observations involve watching and recording the behavior and activities of individuals or groups relevant to the case study. Observations can be participant (where the researcher actively participates in the activities) or non-participant (where the researcher observes from a distance). Observations can be recorded using notes, audio or video recordings, or photographs.

Documents

Documents can be used as a source of information for case studies. They can include reports, memos, emails, letters, and other written materials related to the case study. Documents can be collected from the case study participants or from public sources.

Surveys

Surveys involve asking a set of questions to a sample of individuals relevant to the case study. Surveys can be administered in person, over the phone, through mail or email, or online, and can be used to gather information on attitudes, opinions, or behaviors related to the case study.

Artifacts

Artifacts are physical objects relevant to the case study. They can include tools, equipment, products, or other objects that provide insights into the case study phenomenon.

How to Conduct Case Study Research

Conducting case study research involves several steps that must be followed to ensure the quality and rigor of the study. Here are the steps:

  • Define the research questions: The first step in conducting a case study research is to define the research questions. The research questions should be specific, measurable, and relevant to the case study phenomenon under investigation.
  • Select the case: The next step is to select the case or cases to be studied. The case should be relevant to the research questions and should provide rich and diverse data that can be used to answer the research questions.
  • Collect data: Data can be collected using various methods, such as interviews, observations, documents, surveys, and artifacts. The data collection method should be selected based on the research questions and the nature of the case study phenomenon.
  • Analyze the data: The data collected from the case study should be analyzed using various techniques, such as content analysis, thematic analysis, or grounded theory. The analysis should be guided by the research questions and should aim to provide insights and conclusions relevant to the research questions.
  • Draw conclusions: The conclusions drawn from the case study should be based on the data analysis and should be relevant to the research questions. The conclusions should be supported by evidence and should be clearly stated.
  • Validate the findings: The findings of the case study should be validated by reviewing the data and the analysis with participants or other experts in the field. This helps to ensure the validity and reliability of the findings.
  • Write the report: The final step is to write the report of the case study research. The report should provide a clear description of the case study phenomenon, the research questions, the data collection methods, the data analysis, the findings, and the conclusions. The report should be written in a clear and concise manner and should follow the guidelines for academic writing.
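The "Analyze the data" step often begins with simple tallies of coded material. As a minimal, hypothetical sketch (the participants, codes, and excerpts below are invented for illustration), a frequency count can show which themes recur across interview transcripts:

```python
from collections import Counter

# Hypothetical coded excerpts: (participant, theme code) pairs assigned
# by the researcher while reading interview transcripts.
coded_excerpts = [
    ("participant_1", "workload"),
    ("participant_1", "communication"),
    ("participant_2", "workload"),
    ("participant_2", "training"),
    ("participant_3", "communication"),
    ("participant_3", "workload"),
]

# How often each theme appears across all transcripts.
theme_counts = Counter(code for _, code in coded_excerpts)

# How many distinct participants mention each theme; breadth across
# participants is often more telling than raw excerpt frequency.
participants_per_theme = {
    theme: len({p for p, code in coded_excerpts if code == theme})
    for theme in theme_counts
}

for theme, count in theme_counts.most_common():
    print(f"{theme}: {count} excerpts, {participants_per_theme[theme]} participant(s)")
```

Counting distinct participants per theme, as above, guards against a single talkative participant inflating a theme's apparent importance.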

Examples of Case Study

Here are some examples of case study research:

  • The Hawthorne Studies: Conducted between 1924 and 1932, the Hawthorne Studies were a series of case studies conducted by Elton Mayo and his colleagues to examine the impact of the work environment on employee productivity. The studies were conducted at the Hawthorne Works plant of the Western Electric Company in Cicero, Illinois, near Chicago, and included interviews, observations, and experiments.
  • The Stanford Prison Experiment: Conducted in 1971, the Stanford Prison Experiment was a case study conducted by Philip Zimbardo to examine the psychological effects of power and authority. The study involved simulating a prison environment and assigning participants to the role of guards or prisoners. The study was controversial due to the ethical issues it raised.
  • The Challenger Disaster: The Challenger Disaster was a case study conducted to examine the causes of the Space Shuttle Challenger explosion in 1986. The study included interviews, observations, and analysis of data to identify the technical, organizational, and cultural factors that contributed to the disaster.
  • The Enron Scandal: The Enron Scandal was a case study conducted to examine the causes of the Enron Corporation’s bankruptcy in 2001. The study included interviews, analysis of financial data, and review of documents to identify the accounting practices, corporate culture, and ethical issues that led to the company’s downfall.
  • The Fukushima Nuclear Disaster: The Fukushima Nuclear Disaster was a case study conducted to examine the causes of the nuclear accident that occurred at the Fukushima Daiichi Nuclear Power Plant in Japan in 2011. The study included interviews, analysis of data, and review of documents to identify the technical, organizational, and cultural factors that contributed to the disaster.

Application of Case Study

Case studies have a wide range of applications across various fields and industries. Here are some examples:

Business and Management

Case studies are widely used in business and management to examine real-life situations and develop problem-solving skills. Case studies can help students and professionals to develop a deep understanding of business concepts, theories, and best practices.

Healthcare

Case studies are used in healthcare to examine patient care, treatment options, and outcomes. Case studies can help healthcare professionals to develop critical thinking skills, diagnose complex medical conditions, and develop effective treatment plans.

Education

Case studies are used in education to examine teaching and learning practices. Case studies can help educators to develop effective teaching strategies, evaluate student progress, and identify areas for improvement.

Social Sciences

Case studies are widely used in social sciences to examine human behavior, social phenomena, and cultural practices. Case studies can help researchers to develop theories, test hypotheses, and gain insights into complex social issues.

Law and Ethics

Case studies are used in law and ethics to examine legal and ethical dilemmas. Case studies can help lawyers, policymakers, and ethics professionals to develop critical thinking skills, analyze complex cases, and make informed decisions.

Purpose of Case Study

The purpose of a case study is to provide a detailed analysis of a specific phenomenon, issue, or problem in its real-life context. A case study is a qualitative research method that involves the in-depth exploration and analysis of a particular case, which can be an individual, group, organization, event, or community.

The primary purpose of a case study is to generate a comprehensive and nuanced understanding of the case, including its history, context, and dynamics. Case studies can help researchers to identify and examine the underlying factors, processes, and mechanisms that contribute to the case and its outcomes. This can help to develop a more accurate and detailed understanding of the case, which can inform future research, practice, or policy.

Case studies can also serve other purposes, including:

  • Illustrating a theory or concept: Case studies can be used to illustrate and explain theoretical concepts and frameworks, providing concrete examples of how they can be applied in real-life situations.
  • Developing hypotheses: Case studies can help to generate hypotheses about the causal relationships between different factors and outcomes, which can be tested through further research.
  • Providing insight into complex issues: Case studies can provide insights into complex and multifaceted issues, which may be difficult to understand through other research methods.
  • Informing practice or policy: Case studies can be used to inform practice or policy by identifying best practices, lessons learned, or areas for improvement.

Advantages of Case Study Research

There are several advantages of case study research, including:

  • In-depth exploration: Case study research allows for a detailed exploration and analysis of a specific phenomenon, issue, or problem in its real-life context. This can provide a comprehensive understanding of the case and its dynamics, which may not be possible through other research methods.
  • Rich data: Case study research can generate rich and detailed data, including qualitative data such as interviews, observations, and documents. This can provide a nuanced understanding of the case and its complexity.
  • Holistic perspective: Case study research allows for a holistic perspective of the case, taking into account the various factors, processes, and mechanisms that contribute to the case and its outcomes. This can help to develop a more accurate and comprehensive understanding of the case.
  • Theory development: Case study research can help to develop and refine theories and concepts by providing empirical evidence and concrete examples of how they can be applied in real-life situations.
  • Practical application: Case study research can inform practice or policy by identifying best practices, lessons learned, or areas for improvement.
  • Contextualization: Case study research takes into account the specific context in which the case is situated, which can help to understand how the case is influenced by the social, cultural, and historical factors of its environment.

Limitations of Case Study Research

There are several limitations of case study research, including:

  • Limited generalizability: Case studies are typically focused on a single case or a small number of cases, which limits the generalizability of the findings. The unique characteristics of the case may not be applicable to other contexts or populations, which may limit the external validity of the research.
  • Biased sampling: Case studies may rely on purposive or convenience sampling, which can introduce bias into the sample selection process. This may limit the representativeness of the sample and the generalizability of the findings.
  • Subjectivity: Case studies rely on the interpretation of the researcher, which can introduce subjectivity into the analysis. The researcher’s own biases, assumptions, and perspectives may influence the findings, which may limit the objectivity of the research.
  • Limited control: Case studies are typically conducted in naturalistic settings, which limits the control that the researcher has over the environment and the variables being studied. This may limit the ability to establish causal relationships between variables.
  • Time-consuming: Case studies can be time-consuming to conduct, as they typically involve a detailed exploration and analysis of a specific case. This may limit the feasibility of conducting multiple case studies or conducting case studies in a timely manner.
  • Resource-intensive: Case studies may require significant resources, including time, funding, and expertise. This may limit the ability of researchers to conduct case studies in resource-constrained settings.

About the author


Muhammad Hassan

Researcher, Academic Writer, Web developer

What Is a Case Study? | Definition, Examples & Methods

Published on May 8, 2019 by Shona McCombes. Revised on November 20, 2023.

A case study is a detailed study of a specific subject, such as a person, group, place, event, organization, or phenomenon. Case studies are commonly used in social, educational, clinical, and business research.

A case study research design usually involves qualitative methods, but quantitative methods are sometimes also used. Case studies are good for describing, comparing, evaluating, and understanding different aspects of a research problem.

Table of contents

  • When to do a case study
  • Step 1: Select a case
  • Step 2: Build a theoretical framework
  • Step 3: Collect your data
  • Step 4: Describe and analyze the case

When to do a case study

A case study is an appropriate research design when you want to gain concrete, contextual, in-depth knowledge about a specific real-world subject. It allows you to explore the key characteristics, meanings, and implications of the case.

Case studies are often a good choice in a thesis or dissertation. They keep your project focused and manageable when you don’t have the time or resources to do large-scale research.

You might use just one complex case study where you explore a single subject in depth, or conduct multiple case studies to compare and illuminate different aspects of your research problem.

Case study examples

  • Research question: What are the ecological effects of wolf reintroduction? Case study: Wolf reintroduction in Yellowstone National Park
  • Research question: How do populist politicians use narratives about history to gain support? Case studies: Hungarian prime minister Viktor Orbán and US president Donald Trump
  • Research question: How can teachers implement active learning strategies in mixed-level classrooms? Case study: A local school that promotes active learning
  • Research question: What are the main advantages and disadvantages of wind farms for rural communities? Case studies: Three rural wind farm development projects in different parts of the country
  • Research question: How are viral marketing strategies changing the relationship between companies and consumers? Case study: The iPhone X marketing campaign
  • Research question: How do experiences of work in the gig economy differ by gender, race and age? Case studies: Deliveroo and Uber drivers in London

Step 1: Select a case

Once you have developed your problem statement and research questions, you should be ready to choose the specific case that you want to focus on. A good case study should have the potential to:

  • Provide new or unexpected insights into the subject
  • Challenge or complicate existing assumptions and theories
  • Propose practical courses of action to resolve a problem
  • Open up new directions for future research

Tip: If your research is more practical in nature and aims to simultaneously investigate an issue as you solve it, consider conducting action research instead.

Unlike quantitative or experimental research, a strong case study does not require a random or representative sample. In fact, case studies often deliberately focus on unusual, neglected, or outlying cases which may shed new light on the research problem.

Example of an outlying case study: In the 1960s, the town of Roseto, Pennsylvania was discovered to have extremely low rates of heart disease compared to the US average. It became an important case study for understanding previously neglected causes of heart disease.

However, you can also choose a more common or representative case to exemplify a particular category, experience or phenomenon.

Example of a representative case study: In the 1920s, two sociologists used Muncie, Indiana as a case study of a typical American city that supposedly exemplified the changing culture of the US at the time.

Step 2: Build a theoretical framework

While case studies focus more on concrete details than general theories, they should usually have some connection with theory in the field. This way the case study is not just an isolated description but is integrated into existing knowledge about the topic. It might aim to:

  • Exemplify a theory by showing how it explains the case under investigation
  • Expand on a theory by uncovering new concepts and ideas that need to be incorporated
  • Challenge a theory by exploring an outlier case that doesn’t fit with established assumptions

To ensure that your analysis of the case has a solid academic grounding, you should conduct a literature review of sources related to the topic and develop a theoretical framework. This means identifying key concepts and theories to guide your analysis and interpretation.

Step 3: Collect your data

There are many different research methods you can use to collect data on your subject. Case studies tend to focus on qualitative data using methods such as interviews, observations, and analysis of primary and secondary sources (e.g., newspaper articles, photographs, official records). Sometimes a case study will also collect quantitative data.

Example of a mixed methods case study: For a case study of a wind farm development in a rural area, you could collect quantitative data on employment rates and business revenue, collect qualitative data on local people’s perceptions and experiences, and analyze local and national media coverage of the development.
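As a hypothetical sketch of how such mixed data might be organized for comparison (all figures below are invented, not taken from a real study), quantitative indicators collected before and after the development can be paired with counts of qualitative interview themes:

```python
# Hypothetical mixed-methods data for a rural wind farm case study:
# quantitative indicators before/after the development, plus qualitative
# interview themes with counts of supporting excerpts.
quantitative = {
    "employment_rate": {"before": 0.62, "after": 0.66},
    "business_revenue_musd": {"before": 4.1, "after": 4.8},
}
qualitative_themes = {
    "new local jobs": 9,
    "landscape concerns": 14,
    "noise complaints": 6,
}

# Change in each indicator, to be read alongside the interview themes.
changes = {
    name: round(values["after"] - values["before"], 2)
    for name, values in quantitative.items()
}

# The theme with the most supporting excerpts.
most_frequent_theme = max(qualitative_themes, key=qualitative_themes.get)

print(changes)
print(most_frequent_theme)
```

Setting the computed changes beside the dominant interview themes makes it easier to see where the numbers and local perceptions agree or diverge.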

The aim is to gain as thorough an understanding as possible of the case and its context.

Step 4: Describe and analyze the case

In writing up the case study, you need to bring together all the relevant aspects to give as complete a picture as possible of the subject.

How you report your findings depends on the type of research you are doing. Some case studies are structured like a standard scientific paper or thesis, with separate sections or chapters for the methods, results, and discussion.

Others are written in a more narrative style, aiming to explore the case from various angles and analyze its meanings and implications (for example, by using textual analysis or discourse analysis ).

In all cases, though, make sure to give contextual details about the case, connect it back to the literature and theory, and discuss how it fits into wider patterns or debates.


Cite this Scribbr article

McCombes, S. (2023, November 20). What Is a Case Study? | Definition, Examples & Methods. Scribbr. Retrieved September 4, 2024, from https://www.scribbr.com/methodology/case-study/


Case Study Research in Software Engineering: Guidelines and Examples by Per Runeson, Martin Höst, Austen Rainer, Björn Regnell

DATA ANALYSIS AND INTERPRETATION

5.1 Introduction

Once data has been collected, the focus shifts to analysis. In this phase, the data is used to understand what actually happened in the studied case: the researcher works through the details of the case and seeks patterns in the data. Inevitably, some analysis already takes place during the data collection phase, where the data is studied, for example when data from an interview is transcribed. The understanding gained in those earlier phases is of course also valid and important, but this chapter focuses on the separate phase that starts after the data has been collected.

Data analysis is conducted differently for quantitative and qualitative data. Sections 5.2 – 5.5 describe how to analyze qualitative data and how to assess the validity of this type of analysis. In Section 5.6 , a short introduction to quantitative analysis methods is given. Since quantitative analysis is covered extensively in textbooks on statistical analysis, and case study research to a large extent relies on qualitative data, this section is kept short.
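Since the chapter's treatment of quantitative analysis is deliberately brief, a minimal sketch may help: descriptive statistics such as the mean, median, and sample standard deviation are usually the first summary computed over numeric case data. The defect counts below are invented for illustration and are not from the book's examples:

```python
import statistics

# Hypothetical numeric data a software engineering case study might
# collect alongside its qualitative material: defect counts per release.
defects_per_release = [12, 7, 15, 9, 11, 8]

mean = statistics.mean(defects_per_release)
median = statistics.median(defects_per_release)
stdev = statistics.stdev(defects_per_release)  # sample standard deviation

print(f"mean: {mean:.1f}, median: {median}, stdev: {stdev:.1f}")
```

Such summaries complement, rather than replace, the qualitative analysis: they describe the magnitude of an effect, while the qualitative data explains it.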

5.2 ANALYSIS OF DATA IN FLEXIBLE RESEARCH

5.2.1 Introduction

As case study research is a flexible research method, qualitative data analysis methods are commonly used [176]. The basic objective of the analysis is, as in any other analysis, to derive conclusions from the data, keeping a clear chain of evidence. The chain of evidence means that a reader ...



Qualitative case study data analysis: an example from practice

Catherine Houghton, Lecturer, School of Nursing and Midwifery, National University of Ireland, Galway, Republic of Ireland; Kathy Murphy, Professor of Nursing, National University of Ireland, Galway, Ireland; David Shaw, Lecturer, Open University, Milton Keynes, UK; Dympna Casey, Senior Lecturer, National University of Ireland, Galway, Ireland

Aim To illustrate an approach to data analysis in qualitative case study methodology.

Background There is often little detail in case study research about how data were analysed. However, it is important that comprehensive analysis procedures are used because there are often large sets of data from multiple sources of evidence. Furthermore, the ability to describe in detail how the analysis was conducted ensures rigour in reporting qualitative research.

Data sources The research example used is a multiple case study that explored the role of the clinical skills laboratory in preparing students for the real world of practice. Data analysis was conducted using a framework guided by the four stages of analysis outlined by Morse (1994): comprehending, synthesising, theorising and recontextualising. The specific strategies for analysis in these stages centred on the work of Miles and Huberman (1994), which has been successfully used in case study research. The data were managed using NVivo software.

Review methods Literature examining qualitative data analysis was reviewed and strategies illustrated by the case study example provided.

Discussion Each stage of the analysis framework is described with illustration from the research example for the purpose of highlighting the benefits of a systematic approach to handling large data sets from multiple sources.

Conclusion By providing an example of how each stage of the analysis was conducted, it is hoped that researchers will be able to consider the benefits of such an approach to their own case study analysis.

Implications for research/practice This paper illustrates specific strategies that can be employed when conducting data analysis in case study research and other qualitative research designs.

Nurse Researcher. 22, 5, 8-12. doi: 10.7748/nr.22.5.8.e1307

This article has been subject to double blind peer review

Conflict of interest: None declared

Received: 02 February 2014

Accepted: 16 April 2014

Keywords: case study data analysis, case study research methodology, clinical skills research, qualitative case study methodology, qualitative data analysis, qualitative research

15 May 2015 / Vol 22 issue 5



Methodologic and Data-Analysis Triangulation in Case Studies: A Scoping Review

Margarithe Charlotte Schlunegger

1 Department of Health Professions, Applied Research & Development in Nursing, Bern University of Applied Sciences, Bern, Switzerland

2 Faculty of Health, School of Nursing Science, Witten/Herdecke University, Witten, Germany

Maya Zumstein-Shaha

Rebecca Palm

3 Department of Health Care Research, Carl von Ossietzky University Oldenburg, Oldenburg, Germany

Associated Data

Supplemental material, sj-docx-1-wjn-10.1177_01939459241263011 for Methodologic and Data-Analysis Triangulation in Case Studies: A Scoping Review by Margarithe Charlotte Schlunegger, Maya Zumstein-Shaha and Rebecca Palm in Western Journal of Nursing Research

Objective:

We sought to explore the processes of methodologic and data-analysis triangulation in case studies using the example of research on nurse practitioners in primary health care.

Design and methods:

We conducted a scoping review within Arksey and O’Malley’s methodological framework, considering studies that defined a case study design and used 2 or more data sources, published in English or German before August 2023.

Data sources:

The databases searched were MEDLINE and CINAHL, supplemented with hand searching of relevant nursing journals. We also examined the reference list of all the included studies.

Results:

In total, 63 reports were assessed for eligibility. Ultimately, we included 8 articles. Five studies described within-method triangulation, whereas 3 provided information on between/across-method triangulation. No study reported within-method triangulation of 2 or more quantitative data-collection procedures. The data-collection procedures were interviews, observation, documentation/documents, service records, and questionnaires/assessments. The data-analysis triangulation involved various qualitative and quantitative methods of analysis. Details about comparing or contrasting results from different qualitative and mixed-methods data were lacking.

Conclusions:

Various processes for methodologic and data-analysis triangulation are described in this scoping review but lack detail, thus hampering standardization in case study research, potentially affecting research traceability. Triangulation is complicated by terminological confusion. To advance case study research in nursing, authors should reflect critically on the processes of triangulation and employ existing tools, like a protocol or mixed-methods matrix, for transparent reporting. The only existing reporting guideline should be complemented with directions on methodologic and data-analysis triangulation.

Case study research is defined as “an empirical method that investigates a contemporary phenomenon (the ‘case’) in depth and within its real-world context, especially when the boundaries between phenomenon and context may not be clearly evident. A case study relies on multiple sources of evidence, with data needing to converge in a triangulating fashion.” 1 (p15) This design is described as a stand-alone research approach equivalent to grounded theory and can entail single and multiple cases. 1 , 2 However, case study research should not be confused with single clinical case reports. “Case reports are familiar ways of sharing events of intervening with single patients with previously unreported features.” 3 (p107) As a methodology, case study research encompasses substantially more complexity than a typical clinical case report. 1 , 3

A particular characteristic of case study research is the use of various data sources, such as quantitative data originating from questionnaires as well as qualitative data emerging from interviews, observations, or documents. Therefore, a case study always draws on multiple sources of evidence, and the data must converge in a triangulating manner. 1 When using multiple data sources, a case or cases can be examined more convincingly and accurately, compensating for the weaknesses of the respective data sources. 1 Another characteristic is the interaction of various perspectives. This involves comparing or contrasting perspectives of people with different points of view, eg, patients, staff, or leaders. 4 Through triangulation, case studies contribute to the completeness of the research on complex topics, such as role implementation in clinical practice. 1 , 5 Triangulation involves a combination of researchers from various disciplines, of theories, of methods, and/or of data sources. By creating connections between these sources (ie, investigator, theories, methods, data sources, and/or data analysis), a new understanding of the phenomenon under study can be obtained. 6 , 7

This scoping review focuses on methodologic and data-analysis triangulation because concrete procedures are missing, eg, in reporting guidelines. Methodologic triangulation has been called methods, mixed methods, or multimethods. 6 It can encompass within-method triangulation and between/across-method triangulation. 7 “Researchers using within-method triangulation use at least 2 data-collection procedures from the same design approach.” 6 (p254) Within-method triangulation is either qualitative or quantitative but not both. Therefore, within-method triangulation can also be considered data source triangulation. 8 In contrast, “researchers using between/across-method triangulation employ both qualitative and quantitative data-collection methods in the same study.” 6 (p254) Hence, methodologic approaches are combined as well as various data sources. For this scoping review, the term “methodologic triangulation” is maintained to denote between/across-method triangulation. “Data-analysis triangulation is the combination of 2 or more methods of analyzing data.” 6 (p254)

Although much has been published on case studies, there is little consensus on the quality of the various data sources, the most appropriate methods, or the procedures for conducting methodologic and data-analysis triangulation. 5 According to the EQUATOR (Enhancing the QUAlity and Transparency Of health Research) clearinghouse for reporting guidelines, one standard exists for organizational case studies. 9 Organizational case studies provide insights into organizational change in health care services. 9 Rodgers et al 9 pointed out that, although high-quality studies are being funded and published, they are sometimes poorly articulated and methodologically inadequate. In the reporting checklist by Rodgers et al, 9 a description of the data collection is included, but reporting directions on methodologic and data-analysis triangulation are missing. Therefore, the purpose of this study was to examine the process of methodologic and data-analysis triangulation in case studies. Accordingly, we conducted a scoping review to elicit descriptions of and directions for triangulation methods and analysis, drawing on case studies of nurse practitioners (NPs) in primary health care as an example. Case studies are recommended to evaluate the implementation of new roles in (primary) health care, such as that of NPs. 1 , 5 Case studies on new role implementation can generate a unique and in-depth understanding of specific roles (individual), teams (smaller groups), family practices or similar institutions (organization), and social and political processes in health care systems. 1 , 10 The integration of NPs into health care systems is at different stages of progress around the world. 11 Therefore, studies are needed to evaluate this process.

The methodological framework by Arksey and O’Malley 12 guided this scoping review. We examined the current scientific literature on the use of methodologic and data-analysis triangulation in case studies on NPs in primary health care. The review process included the following stages: (1) establishing the research question; (2) identifying relevant studies; (3) selecting the studies for inclusion; (4) charting the data; (5) collating, summarizing, and reporting the results; and (6) consulting experts in the field. 12 Stage 6 was not performed due to a lack of financial resources. The reporting of the review followed the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Review) guideline by Tricco et al 13 (guidelines for reporting systematic reviews and meta-analyses [ Supplementary Table A ]). Scoping reviews are not eligible for registration in PROSPERO.

Stage 1: Establishing the Research Question

The aim of this scoping review was to examine the process of triangulating methods and analysis in case studies on NPs in primary health care to improve the reporting. We sought to answer the following question: How have methodologic and data-analysis triangulation been conducted in case studies on NPs in primary health care? To answer the research question, we examined the following elements of the selected studies: the research question, the study design, the case definition, the selected data sources, and the methodologic and data-analysis triangulation.

Stage 2: Identifying Relevant Studies

A systematic database search was performed in the MEDLINE (via PubMed) and CINAHL (via EBSCO) databases between July and September 2020 to identify relevant articles. The following terms were used as keyword search strategies: (“Advanced Practice Nursing” OR “nurse practitioners”) AND (“primary health care” OR “Primary Care Nursing”) AND (“case study” OR “case studies”). Searches were limited to English- and German-language articles. Hand searches were conducted in the journals Nursing Inquiry , BMJ Open , and BioMed Central ( BMC ). We also screened the reference lists of the studies included. The database search was updated in August 2023. The complete search strategy for all the databases is presented in Supplementary Table B .
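The Boolean search strategy above combines three OR-blocks with AND. As an illustrative sketch (not the authors' actual query script), the string can be assembled programmatically, which makes it easy to adapt for other databases:

```python
# Keyword blocks from the search strategy (terms taken from the text)
population = ['"Advanced Practice Nursing"', '"nurse practitioners"']
context = ['"primary health care"', '"Primary Care Nursing"']
design = ['"case study"', '"case studies"']

def or_block(terms):
    """Join synonyms with OR and wrap them in parentheses."""
    return "(" + " OR ".join(terms) + ")"

# Combine the three concept blocks with AND
query = " AND ".join(or_block(block) for block in [population, context, design])
print(query)
```

Running this reproduces the query string reported in the text.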

Stage 3: Selecting the Studies

Inclusion and exclusion criteria.

We used the inclusion and exclusion criteria reported in Table 1 . We included studies of NPs who had at least a master’s degree in nursing according to the definition of the International Council of Nurses. 14 This scoping review considered studies that were conducted in primary health care practices in rural, urban, and suburban regions. We excluded reviews and study protocols in which no data collection had occurred. Articles were included without limitations on the time period or country of origin.

Inclusion and Exclusion Criteria.

CriteriaInclusionExclusion
Population- NPs with a master’s degree in nursing or higher - Nurses with a bachelor’s degree in nursing or lower
- Pre-registration nursing students
- No definition of master’s degree in nursing described in the publication
Interest- Description/definition of a case study design
- Two or more data sources
- Reviews
- Study protocols
- Summaries/comments/discussions
Context- Primary health care
- Family practices and home visits (including adult practices, internal medicine practices, community health centers)
- Nursing homes, hospital, hospice

Screening process

After the search, we collated and uploaded all the identified records into EndNote v.X8 (Clarivate Analytics, Philadelphia, Pennsylvania) and removed any duplicates. Two independent reviewers (MCS and SA) screened the titles and abstracts for assessment in line with the inclusion criteria. They retrieved and assessed the full texts of the selected studies while applying the inclusion criteria. Any disagreements about the eligibility of studies were resolved by discussion or, if no consensus could be reached, by involving experienced researchers (MZ-S and RP).

Stages 4 and 5: Charting the Data and Collating, Summarizing, and Reporting the Results

The first reviewer (MCS) extracted data from the selected publications. For this purpose, an extraction tool developed by the authors was used. This tool comprised the following criteria: author(s), year of publication, country, research question, design, case definition, data sources, and methodologic and data-analysis triangulation. First, we extracted and summarized information about the case study design. Second, we narratively summarized the way in which the data and methodological triangulation were described. Finally, we summarized the information on within-case or cross-case analysis. This process was performed using Microsoft Excel. One reviewer (MCS) extracted data, whereas another reviewer (SA) cross-checked the data extraction, making suggestions for additions or edits. Any disagreements between the reviewers were resolved through discussion.
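The extraction tool can be modeled as a simple structured record. The following sketch is an assumption based on the criteria listed above; the field names and the example values are illustrative, not data from the review:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ExtractionRecord:
    # Fields mirror the criteria of the authors' extraction tool
    authors: str
    year: int
    country: str
    research_question: str
    design: str
    case_definition: str
    data_sources: List[str] = field(default_factory=list)
    methodologic_triangulation: str = ""
    data_analysis_triangulation: str = ""

# Illustrative entry (hypothetical values, not extracted study data)
record = ExtractionRecord(
    authors="Example et al",
    year=2015,
    country="Canada",
    research_question="How or why question",
    design="Multiple-case studies design",
    case_definition="Primary care practice (organization)",
    data_sources=["semi-structured interviews", "public documents"],
)
print(record.design)
```

A spreadsheet (as used by the authors) serves the same purpose; a typed record simply makes the extraction criteria explicit.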

A total of 149 records were identified in the 2 databases. We removed 20 duplicates and screened 129 reports by title and abstract, of which 46 were assessed for eligibility. Through hand searches, we identified 117 additional records. Of these, 98 were excluded after title and abstract screening, and 17 were assessed for eligibility. In total, 63 reports from the database and hand searches were assessed for eligibility, and 8 articles were ultimately included for data extraction. No further articles were included after screening the reference lists of the included studies. A PRISMA flow diagram of the study selection and inclusion process is presented in Figure 1. As shown in Tables 2 and 3, the articles included in this scoping review were published between 2010 and 2022 in Canada (n = 3), the United States (n = 2), Australia (n = 2), and Scotland (n = 1).


Figure 1. PRISMA flow diagram.

Characteristics of Articles Included.

  • Contandriopoulos et al (Canada): no information on the research question; six qualitative case studies (Robert K. Yin); case defined as a team of health professionals (small group).
  • Flinter (the United States): several how or why research questions; multiple-case studies design (Robert K. Yin); case defined as nurse practitioners (individuals).
  • Hogan et al (the United States): what and how research question; multiple-case studies design (Robert E. Stake); case defined as primary care practices (organization).
  • Hungerford et al (Australia): no information on the research question; case study design (Robert K. Yin); case defined as a community-based NP model of practice (organization).
  • O’Rourke (Canada): several how or why research questions; qualitative single-case study (Robert K. Yin, Robert E. Stake, Sharan Merriam); case defined as an NP-led practice (organization).
  • Roots and MacDonald (Canada): no information on the research question; single-case study design (Robert K. Yin, Sharan Merriam); case defined as primary care practices (organization).
  • Schadewaldt et al (Australia): what research question; multiple-case studies design (Robert K. Yin, Robert E. Stake); no information on case definition.
  • Strachan et al (Scotland): what and why research questions; multiple-case studies design; case defined as a health board (organization).

Overview of Within-Method, Between/Across-Method, and Data-Analysis Triangulation.

Within-method triangulation (at least 2 data-collection procedures from the same design approach):
  • Interviews: 5 studies
  • Observations: 2 studies
  • Public documents: 3 studies
  • Electronic health records: 1 study

Between/across-method triangulation (both qualitative and quantitative data-collection procedures in the same study):
 Qualitative data-collection procedures:
  • Interviews: 3 studies
  • Observations: 2 studies
  • Public documents: 2 studies
  • Electronic health records: 1 study
 Quantitative data-collection procedures:
  • Self-assessment: 1 study
  • Service records: 1 study
  • Questionnaires: 1 study

Data-analysis triangulation (combination of 2 or more methods of analyzing data):
 Mixed-methods analysis, qualitative component:
  • Deductive: 3 studies
  • Inductive: 2 studies
  • Thematic: 2 studies
 Mixed-methods analysis, quantitative component:
  • Descriptive analysis: 3 studies
 Qualitative-only analysis:
  • Deductive: 4 studies
  • Inductive: 2 studies
  • Thematic: 1 study
  • Content: 1 study

Research Question, Case Definition, and Case Study Design

The following sections describe the research question, case definition, and case study design. Case studies are most appropriate when asking “how” or “why” questions. 1 According to Yin, 1 how and why questions are explanatory and lead to the use of case studies, histories, and experiments as the preferred research methods. In 1 study from Canada, eg, the following research question was presented: “How and why did stakeholders participate in the system change process that led to the introduction of the first nurse practitioner-led Clinic in Ontario?” (p7) 19 Once the research question has been formulated, the case should be defined and, subsequently, the case study design chosen. 1 In typical case studies with mixed methods, the 2 types of data are gathered concurrently in a convergent design and the results merged to examine a case and/or compare multiple cases. 10

Research question

“How” or “why” questions were found in 4 studies. 16 , 17 , 19 , 22 Two studies additionally asked “what” questions. Three studies described an exploratory approach, and 1 study presented an explanatory approach. Of these 4 studies, 3 studies chose a qualitative approach 17 , 19 , 22 and 1 opted for mixed methods with a convergent design. 16

In the remaining studies, either the research questions were not clearly stated or no “how” or “why” questions were formulated. For example, “what” questions were found in 1 study. 21 No information was provided on exploratory, descriptive, and explanatory approaches. Schadewaldt et al 21 chose mixed methods with a convergent design.

Case definition and case study design

A total of 5 studies defined the case as an organizational unit. 17 , 18 - 20 , 22 Of the 8 articles, 4 reported multiple-case studies. 16 , 17 , 22 , 23 Another 2 publications involved single-case studies. 19 , 20 Moreover, 2 publications did not state the case study design explicitly.

Within-Method Triangulation

This section describes within-method triangulation, which involves employing at least 2 data-collection procedures within the same design approach. 6 , 7 This can also be called data source triangulation. 8 Next, we present the single data-collection procedures in detail. In 5 studies, information on within-method triangulation was found. 15 , 17 - 19 , 22 Studies describing a quantitative approach and the triangulation of 2 or more quantitative data-collection procedures could not be included in this scoping review.

Qualitative approach

Five studies used qualitative data-collection procedures. Two studies combined face-to-face interviews and documents. 15 , 19 One study mixed in-depth interviews with observations, 18 and 1 study combined face-to-face interviews and documentation. 22 One study contained face-to-face interviews, observations, and documentation. 17 The combination of different qualitative data-collection procedures was used to present the case context in an authentic and complex way, to elicit the perspectives of the participants, and to obtain a holistic description and explanation of the cases under study.

All 5 studies used qualitative interviews as the primary data-collection procedure. 15 , 17 - 19 , 22 Face-to-face, in-depth, and semi-structured interviews were conducted. The topics covered in the interviews included processes in the introduction of new care services and experiences of barriers and facilitators to collaborative work in general practices. Two studies did not specify the type of interviews conducted and did not report sample questions. 15 , 18

Observations

In 2 studies, qualitative observations were carried out. 17 , 18 During the observations, the physical design of the clinical patients’ rooms and office spaces was examined. 17 Hungerford et al 18 did not explain what information was collected during the observations. In both studies, the type of observation was not specified. Observations were generally recorded as field notes.

Public documents

In 3 studies, various qualitative public documents were studied. 15 , 19 , 22 These documents included role description, education curriculum, governance frameworks, websites, and newspapers with information about the implementation of the role and general practice. Only 1 study failed to specify the type of document and the collected data. 15

Electronic health records

In 1 study, qualitative documentation was investigated. 17 This included a review of dashboards (eg, provider productivity reports or provider quality dashboards in the electronic health record) and quality performance reports (eg, practice-wide or co-management team-wide performance reports).

Between/Across-Method Triangulation

This section describes between/across-method triangulation, which involves employing both qualitative and quantitative data-collection procedures in the same study. 6 , 7 This procedure can also be denoted “methodologic triangulation.” 8 Subsequently, we present the individual data-collection procedures. In 3 studies, information on between/across-method triangulation was found. 16 , 20 , 21

Mixed methods

Three studies used qualitative and quantitative data-collection procedures. One study combined face-to-face interviews, documentation, and self-assessments. 16 One study employed semi-structured interviews, direct observation, documents, and service records, 20 and another study combined face-to-face interviews, non-participant observation, documents, and questionnaires. 23

All 3 studies used qualitative interviews as the primary data-collection procedure. 16 , 20 , 23 Face-to-face and semi-structured interviews were conducted. In the interviews, data were collected on the introduction of new care services and experiences of barriers to and facilitators of collaborative work in general practices.

Observation

In 2 studies, direct and non-participant qualitative observations were conducted. 20 , 23 During the observations, the interaction between health professionals or the organization and the clinical context was observed. Observations were generally recorded as field notes.

Public documents

In 2 studies, various qualitative public documents were examined. 20 , 23 These documents included role descriptions, newspapers, websites, and practice documents (eg, flyers). These documents provided information on the role implementation and role description of NPs.

Individual journals

In 1 study, qualitative individual journals were studied. 16 These included reflective journals from NPs, who performed the role in primary health care.

Service records

Only 1 study involved quantitative service records. 20 These service records were obtained from the primary care practices and the respective health authorities. They were collected before and after the implementation of an NP role to identify changes in patients’ access to health care, the volume of patients served, and patients’ use of acute care services.

Questionnaires/Assessment

In 2 studies, quantitative questionnaires were used to gather information about the teams’ satisfaction with collaboration. 16 , 21 In 1 study, 3 validated scales were used. The scales measured experience, satisfaction, and belief in the benefits of collaboration. 21 Psychometric performance indicators of these scales were provided. However, the time points of data collection were not specified; similarly, whether the questionnaires were completed online or by hand was not mentioned. A competency self-assessment tool was used in another study. 16 The assessment comprised 70 items and included topics such as health promotion, protection, disease prevention and treatment, the NP-patient relationship, the teaching-coaching function, the professional role, managing and negotiating health care delivery systems, monitoring and ensuring the quality of health care practice, and cultural competence. Psychometric performance indicators were provided. The assessment was completed online with 2 measurement time points (pre self-assessment and post self-assessment).

Data-Analysis Triangulation

This section describes data-analysis triangulation, which involves the combination of 2 or more methods of analyzing data. 6 Subsequently, we present within-case analysis and cross-case analysis.

Mixed-methods analysis

Three studies combined qualitative and quantitative methods of analysis. 16 , 20 , 21 Two studies involved deductive and inductive qualitative analysis, and qualitative data were analyzed thematically. 20 , 21 One used deductive qualitative analysis. 16 The method of analysis was not specified in the studies. Quantitative data were analyzed using descriptive statistics in 3 studies. 16 , 20 , 23 The descriptive statistics comprised the calculation of the mean, median, and frequencies.
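The descriptive statistics mentioned here (mean, median, and frequencies) can be computed directly with the Python standard library. The service-record values below are illustrative, not data from the included studies:

```python
from statistics import mean, median
from collections import Counter

# Hypothetical monthly appointment counts from a service record
appointment_counts = [12, 15, 9, 15, 20]

print(mean(appointment_counts))     # 14.2
print(median(appointment_counts))   # 15
print(Counter(appointment_counts))  # frequency of each value
```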

Qualitative methods of analysis

Two studies combined deductive and inductive qualitative analysis, 19 , 22 and 2 studies only used deductive qualitative analysis. 15 , 18 Qualitative data were analyzed thematically in 1 study, 22 and data were treated with content analysis in the other. 19 The method of analysis was not specified in the 2 studies.

Within-case analysis

In 7 studies, a within-case analysis was performed. 15 - 20 , 22 Six studies used qualitative data for the within-case analysis, and 1 study employed qualitative and quantitative data. Data were analyzed separately, consecutively, or in parallel. The themes generated from qualitative data were compared and then summarized. The individual cases were presented mostly as a narrative description. Quantitative data were integrated into the qualitative description with tables and graphs. Qualitative and quantitative data were also presented as a narrative description.

Cross-case analyses

Of the multiple-case studies, 5 carried out cross-case analyses. 15 - 17 , 20 , 22 Three studies described the cross-case analysis using qualitative data. Two studies reported a combination of qualitative and quantitative data for the cross-case analysis. In each multiple-case study, the individual cases were contrasted to identify the differences and similarities between the cases. One study did not specify whether a within-case or a cross-case analysis was conducted. 23

Confirmation or contradiction of data

This section describes confirmation or contradiction through qualitative and quantitative data. 1 , 4 Qualitative and quantitative data were reported separately, with little connection between them. As a result, conclusions about confirmation or contradiction between the data types could not always be clearly drawn.

Confirmation or contradiction among qualitative data

In 3 studies, the consistency of the results of different types of qualitative data was highlighted. 16 , 19 , 21 In particular, documentation and interviews or interviews and observations were contrasted:

  • Confirmation between interviews and documentation: The data from these sources corroborated the existence of a common vision for an NP-led clinic. 19
  • Confirmation between interviews and observations: NPs experienced pressure to find and maintain their position within the existing system. Nurse practitioners and general practitioners performed complete episodes of care, each without collaborative interaction. 21
  • Contradiction between interviews and documentation: For example, interviewees mentioned that differentiating the scope of practice between NPs and physicians is difficult because there are too many areas of overlap. However, a clear description of the scope of practice for the 2 roles was provided. 21

Confirmation through a combination of qualitative and quantitative data

Both types of data showed that NPs and general practitioners wanted to have more time in common to discuss patient cases and engage in personal exchanges. 21 In addition, the qualitative and quantitative data confirmed the individual progression of NPs from less competent to more competent. 16 One study pointed out that qualitative and quantitative data obtained similar results for the cases. 20 For example, integrating NPs improved patient access by increasing appointment availability.

Contradiction through a combination of qualitative and quantitative data

Although questionnaire results indicated that NPs and general practitioners experienced high levels of collaboration and satisfaction with the collaborative relationship, the qualitative results drew a more ambivalent picture of NPs’ and general practitioners’ experiences with collaboration. 21

Research Question and Design

The studies included in this scoping review evidenced various research questions. The recommended formats (ie, how or why questions) were not applied consistently. When such questions are absent, the appropriateness of a case study design is questionable, because the research question is the major guide for determining the research design. 2 Furthermore, case definitions and designs were applied variably. The lack of standardization is reflected in differences in the reporting of these case studies. Generally, case study research is viewed as allowing much freedom and flexibility. 5 , 24 However, this flexibility and the lack of uniform specifications lead to confusion.

Methodologic Triangulation

Methodologic triangulation, as described in the literature, can be somewhat confusing as it can refer to either data-collection methods or research designs. 6 , 8 For example, methodologic triangulation can allude to qualitative and quantitative methods, indicating a paradigmatic connection. Methodologic triangulation can also point to qualitative and quantitative data-collection methods, analysis, and interpretation without specific philosophical stances. 6 , 8 Regarding “data-collection methods with no philosophical stances,” we would recommend using the wording “data source triangulation” instead. Thus, the demarcation between the method and the data-collection procedures will be clearer.

Within-Method and Between/Across-Method Triangulation

Yin 1 advocated the use of multiple sources of evidence so that a case or cases can be investigated more comprehensively and accurately. Most studies included multiple data-collection procedures. Five studies employed a variety of qualitative data-collection procedures, and 3 studies used qualitative and quantitative data-collection procedures (mixed methods). In contrast, no study contained 2 or more quantitative data-collection procedures. In particular, quantitative data-collection procedures—such as validated, reliable questionnaires, scales, or assessments—were not used exhaustively. The prerequisites for using multiple data-collection procedures are availability, the knowledge and skill of the researcher, and sufficient financial funds. 1 To meet these prerequisites, research teams consisting of members with different levels of training and experience are necessary. Multidisciplinary research teams need to be aware of the strengths and weaknesses of different data sources and collection procedures. 1

Qualitative methods of analysis and results

When using multiple data sources and analysis methods, it is necessary to present the results in a coherent manner. Although the importance of multiple data sources and analysis has been emphasized, 1 , 5 the description of triangulation has tended to be brief. Thus, traceability of the research process is not always ensured. The sparse description of the data-analysis triangulation procedure may be due to the limited number of words in publications or the complexity involved in merging the different data sources.

Only a few concrete recommendations regarding the operationalization of the data-analysis triangulation with the qualitative data process were found. 25 A total of 3 approaches have been proposed 25 : (1) the intuitive approach, in which researchers intuitively connect information from different data sources; (2) the procedural approach, in which each comparative or contrasting step in triangulation is documented to ensure transparency and replicability; and (3) the intersubjective approach, which necessitates a group of researchers agreeing on the steps in the triangulation process. For each case study, one of these 3 approaches needs to be selected, carefully carried out, and documented. Thus, in-depth examination of the data can take place. Farmer et al 25 concluded that most researchers take the intuitive approach; therefore, triangulation is not clearly articulated. This trend is also evident in our scoping review.

Mixed-methods analysis and results

Few studies in this scoping review used a combination of qualitative and quantitative analysis. However, creating a comprehensive stand-alone picture of a case from both qualitative and quantitative methods is challenging. Findings derived from different data types may not automatically coalesce into a coherent whole. 4 O’Cathain et al 26 described 3 techniques for combining the results of qualitative and quantitative methods: (1) developing a triangulation protocol; (2) following a thread by selecting a theme from 1 component and following it across the other components; and (3) developing a mixed-methods matrix.

The triangulation protocol provides the most detailed description of how to conduct triangulation. The protocol is applied at the interpretation stage of the research process. 26 It was developed for multiple qualitative data sources but can also be applied to a combination of qualitative and quantitative data. 25 , 26 It allows researchers to determine agreement, partial agreement, “silence,” or dissonance between the results of qualitative and quantitative data. The protocol is intended to bring together the various themes from the qualitative and quantitative results and identify overarching meta-themes. 25 , 26
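The four convergence categories of the triangulation protocol (agreement, partial agreement, silence, dissonance) can be illustrated with a small helper. The classification logic below is a simplified assumption for illustration, not the published protocol:

```python
def classify_convergence(qual_finding, quant_finding):
    """Classify how a qualitative and a quantitative finding on the
    same theme relate, using the four protocol categories."""
    if qual_finding is None or quant_finding is None:
        return "silence"        # theme present in only one data type
    if qual_finding == quant_finding:
        return "agreement"      # findings fully coincide
    if qual_finding["direction"] == quant_finding["direction"]:
        return "partial agreement"  # same direction, different detail
    return "dissonance"         # findings point in opposite directions

# Hypothetical findings on the theme "collaboration"
qual = {"theme": "collaboration", "direction": "ambivalent"}
quant = {"theme": "collaboration", "direction": "positive"}
print(classify_convergence(qual, quant))  # dissonance
```

This mirrors the contradiction reported later between ambivalent interview findings and positive questionnaire scores on collaboration.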

The “following a thread” technique is used in the analysis stage of the research process. To begin, each data source is analyzed to identify the most important themes that need further investigation. Subsequently, the research team selects 1 theme from 1 data source and follows it up in the other data source, thereby creating a thread. The individual steps of this technique are not specified. 26 , 27

A mixed-methods matrix is used at the end of the analysis. 26 All the data collected on a defined case are examined together in 1 large matrix, paying attention to cases rather than variables or themes. In a mixed-methods matrix (eg, a table), the rows represent the cases for which both qualitative and quantitative data exist. The columns show the findings for each case. This technique allows the research team to look for congruency, surprises, and paradoxes among the findings as well as patterns across multiple cases. In our review, we identified only one of these 3 approaches in the study by Roots and MacDonald. 20 These authors mentioned that a causal network analysis was performed using a matrix. However, no further details were given, and reference was made to a later publication. We could not find this publication.
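The mixed-methods matrix described above, with cases as rows and findings as columns, can be sketched with standard-library tools. The case names and findings are illustrative assumptions:

```python
# One row per case; each row holds qualitative and quantitative findings
cases = ["Practice A", "Practice B"]
matrix = {
    "Practice A": {"qual: collaboration": "ambivalent",
                   "quant: satisfaction score": 4.2},
    "Practice B": {"qual: collaboration": "positive",
                   "quant: satisfaction score": 4.5},
}

# Collect all finding columns across cases and print a simple table
columns = sorted({col for findings in matrix.values() for col in findings})
header = "case".ljust(12) + " | " + " | ".join(c.ljust(24) for c in columns)
print(header)
for case in cases:
    row = [str(matrix[case].get(c, "")).ljust(24) for c in columns]
    print(case.ljust(12) + " | " + " | ".join(row))
```

Reading across a row shows whether the qualitative and quantitative findings for a case converge; reading down a column reveals patterns across cases.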

Case Studies in Nursing Research and Recommendations

Because it focused on the implementation of NPs in primary health care, the scope of this scoping review was narrow. However, triangulation is essential for research in this area, and this body of work provided a good basis for understanding methodologic and data-analysis triangulation. Despite the limited traceability in the descriptions of data and methodological triangulation, we believe that case studies are an appropriate design for exploring new nursing roles in existing health care systems. This is evidenced by the fact that case study research is widely used in many social science disciplines as well as in professional practice. 1 To strengthen this research method and increase the traceability of the research process, we recommend using the reporting guideline and reporting checklist by Rodgers et al. 9 This reporting checklist should be complemented with items on methodologic and data-analysis triangulation. A procedural approach should be followed in which each comparative step of the triangulation is documented. 25 A triangulation protocol or a mixed-methods matrix can be used for this purpose. 26 If a publication's word limit precludes a full account, the triangulation protocol or mixed-methods matrix used should at least be named. A schematic representation of methodologic and data-analysis triangulation in case studies can be found in Figure 2.


Figure 2. Schematic representation of methodologic and data-analysis triangulation in case studies (own work).

Limitations

This study suffered from several limitations that must be acknowledged. Given the nature of scoping reviews, we did not analyze the evidence reported in the studies. However, 2 reviewers independently reviewed all the full-text reports with respect to the inclusion criteria. The focus on the primary care setting with NPs (master’s degree) was very narrow, and only a few studies qualified. Thus, possible important methodological aspects that would have contributed to answering the questions were omitted. Studies describing the triangulation of 2 or more quantitative data-collection procedures could not be included in this scoping review due to the inclusion and exclusion criteria.

Conclusions

Given the various processes described for methodologic and data-analysis triangulation, we can conclude that triangulation in case studies is poorly standardized. Consequently, the traceability of the research process is not always given. Triangulation is complicated by the confusion of terminology. To advance case study research in nursing, we encourage authors to reflect critically on methodologic and data-analysis triangulation and use existing tools, such as the triangulation protocol or mixed-methods matrix and the reporting guideline checklist by Rodgers et al, 9 to ensure more transparent reporting.

Supplemental Material

Acknowledgments.

The authors thank Simona Aeschlimann for her support during the screening process.

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding: The author(s) received no financial support for the research, authorship, and/or publication of this article.


Supplemental Material: Supplemental material for this article is available online.

Node attribute analysis for cultural data analytics: a case study on Italian XX–XXI century music

  • Open access
  • Published: 05 September 2024
  • Volume 9, article number 56 (2024)


  • Michele Coscia 1  

Cultural data analytics aims to use analytic methods to explore cultural expressions—for instance art, literature, dance, and music. What cultural expressions have in common is that they have multiple qualitatively different facets that interact with each other in nontrivial and non-learnable ways. To support this observation, we use the Italian music record industry from 1902 to 2024 as a case study. In this scenario, a possible research objective could be to discuss the relationships between different music genres as they are performed by different bands. Estimating genre similarity by counting the number of records each band published performing a given genre is not enough, because it assumes bands operate independently from each other. In reality, bands share members and have complex relationships. These relationships cannot be automatically learned, both because we lack the data behind their creation and because they are established serendipitously between artists, without following consistent patterns. However, we can map them in a complex network. We can then use the counts of band records with a given genre as a node attribute in a band network. In this paper we show how recently developed techniques for node attribute analysis are a natural choice to analyze such attributes. Alternative network analysis techniques focus on analyzing nodes, rather than node attributes, and end up either being inapplicable in this scenario or requiring the creation of more complex n-partite high order structures that can be less intuitive. By using node attribute analysis techniques, we show that we are able to describe which music genres concentrate or spread out in this network, which time periods show a balance of exploration versus exploitation, which Italian regions correlate more with which music genres, and a new approach to classifying clusters of coherent music genres or eras of activity by the distance on this network between genres or years.


Introduction

Node attribute analysis has recently been enlarged by the introduction of techniques to calculate the variance of a node attribute (Devriendt et al. 2022 ), estimate distances between two node attributes (Coscia 2020 ), calculate their Pearson correlations (Coscia 2021 ), and cluster them (Damstrup et al. 2023 ) without assuming they live in a simple Euclidean space—or a learnable deformation thereof.

These techniques are useful only insofar as the network being analyzed has rich node attribute data and analyzing the attributes’ relationships is interesting. This is normally the case in cultural analytics, the use of analytic methods for the exploration of contemporary and historical cultures (Manovich 2020 ; Candia et al. 2019 ). Examples range from archaeology—where related artifacts have a number of physical characteristics and can be from different places/ages (Schich et al. 2008 ; Brughmans 2013 ; Mills et al. 2013 ); to art history—where related visual artifacts can be described by a number of meaningful visual characteristics (Salah et al. 2013 ; Hristova 2016 ; Karjus et al. 2023 ); to sociology—where different ideas and opinions distribute over a social network as node attributes (Bail 2014 ; Hohmann et al. 2023 ); to linguistics—with different people in a social network producing content in different languages (Ronen et al. 2014 ); to music—with complex relations between players informing meta-relationships between the genres they play (McAndrew and Everett 2015 ; Vlegels and Lievens 2017 ).

In this paper we aim to show the usefulness of node attribute analysis in cultural analytics. We focus on the Italian record music industry from its beginnings in the early XX century until the present time. We build a temporally evolving bipartite network connecting players with the bands they play in. For each band we know how many records of a given genre they published, whether they published a record in a given year, and from which Italian region they originate—all node attributes of the band. By applying node attribute analysis, we can address a number of interesting questions. For instance:

How related is a particular music genre to a period? Or to a specific Italian region?

Is the production of a specific genre concentrated in a restricted group of bands or generally spread through the network?

Does clustering genres according to their distribution on the collaboration network conform to our expectation of meta-genres or can we discover a new network-based classification?

Can we use the productivity of related bands across the years as the basis to find eras in music production?

The music scene has been the subject of extensive network analysis. Some works focus on music production as an import–export network between countries (Moon et al. 2010 ). Others model composers and performers as nodes connected by collaboration or friendship links (Stebbins 2004 ; Park et al. 2007 ; Gleiser and Danon 2003 ; Teitelbaum et al. 2008 ; McAndrew and Everett 2015 ). Other studies investigate how music consumption can inform us about genres (Vlegels and Lievens 2017 ) and how listeners influence each other (Baym and Ledbetter 2009 ; Pennacchioli et al. 2013 ; Pálovics and Benczúr 2013 ). Differently from these studies, we do not ask questions about the network structure itself. For our work, the network structure is interesting only insofar as it mediates the relationships between node attributes—the genres, years, and regions the bands are active in—rather than being the focus of the analysis.

This is an important qualitative distinction, because if one wanted to perform our genre-regional analysis on the music collaboration network without our node attribute analysis, they would have to deal with complex n-partite objects—a player-band-year-genre-region network—which can become unwieldy and unintuitive. On the other hand, with our approach one can work with a unipartite projection of the player-band relationships, and use years, genres, and regions as node attributes, maintaining a highly intuitive representation.

Deep learning techniques and specifically deep neural networks can handle the richness of our data (Aljalbout et al. 2018 ; Aggarwal et al. 2018 ; Pang et al. 2021 ; Ezugwu et al. 2022 ). These approaches can attempt to learn, e.g., the true non-Euclidean distances between genres played by bands (Mahalanobis 1936 ; Xie et al. 2016 ). The problem is that this learning is severely limited if the space is defined by a complex network (Bronstein et al. 2017 ), as is the case here. Therefore, one would have to use Graph Neural Networks (GNN) (Scarselli et al. 2008 ; Wu et al. 2022 ; Zhou et al. 2020 ). However, GNNs focus on node analysis (Bo et al. 2020 ; Tsitsulin et al. 2020 ; Bianchi et al. 2020 ; Zhou et al. 2020 ), usually via finding the best way of creating node embeddings (Perozzi et al. 2014 ; Hamilton et al. 2017 ). GNNs only use node attributes for the purpose of aiding the analysis of nodes rather than analyzing the attributes themselves (Perozzi et al. 2014 ; Zhang et al. 2019 ; Wang et al. 2019 ; Lin et al. 2021 ; Cheng et al. 2021 ; Yang et al. 2023 ). Previous research shows that, when focusing on node attributes rather than on nodes, the techniques we use here are more suitable than adapting GNNs developed with a different focus (Damstrup et al. 2023 ).

Another class of alternatives for dealing with this data richness is hypergraphs (Bretto 2013 ) and high order networks (Bianconi 2021 ; Benson et al. 2016 ; Lambiotte et al. 2019 ; Xu et al. 2016 ). With these techniques, it is possible to analyze relationships involving multiple actors at the same time—rather than only dyadic relationships as in simpler network representations—and to encode path dependencies—e.g., using high order random walks where a larger portion of the network is taken into account to decide which node to visit next (Kaufman and Oppenheim 2020 ; Carletti et al. 2020 ). While a comparative analysis between these techniques and the ones used in this paper would be interesting, here we focus exclusively on the usefulness of techniques based on node attribute analysis. We leave the comparison with hypergraphs and high order networks as future work.

Our analysis shows that node attribute techniques can help address a number of interesting research tasks in cultural data analytics. We show that we are able to describe the eclecticism required by music genres—or expressed in time periods—by how dispersed they are on the music network. We can determine the geographical connection of specific genres, by estimating their correlation not merely based on how many bands from a specific region play a genre, but on how bands not playing that genre relate to those that do. We can create new genre categories by looking at how close genres are to each other on the music network. We can apply the same logic to discover eras in Italian music production, clustering years into coherent periods.

Finally, we show that our node attribute analysis rests on some assumptions that are likely to be true in our network—that bands tend to share artists if they play similar genres, are active in similar time periods, and hail from similar regions.

We release our data as a public good freely accessible by anyone (Coscia 2024 ), along with all the code necessary to reproduce our analysis.

In this section we present our data model and a summary description of the data’s main features. Supplementary Material Section 1 provides all the details necessary to understand our choices when it comes to data collection, cleaning, and pre-processing.

To obtain a coherent network and to limit the scope of our data collection, we focus exclusively on the record credits of published Italian bands. The data for this project comes from crowd-sourced, user-generated sources, mainly Wikipedia and Discogs. We should note that these sources have a bias favoring English-speaking productions. While this bias does not affect our data collection too much, since we focus on Italy without comparing it to a different country/culture, it makes it more likely that there are Italian records without credits, or that are simply missing.

figure 1

Our bipartite network data model. Artists in blue, bands in red. Edges are labeled with the first and last year in which the collaboration was active. The edge width is proportional to the weight, that is, the number of years in which the artist participated in records released by the band

Figure  1 shows our data model, which is a bipartite network \(G = (V_1, V_2, E)\) . The nodes in the first class \(V_1\) are artists. An artist is a disambiguated physical real person. The nodes in the second class \(V_2\) are bands, which are identified by their name. Note that we consider solo artists as bands, and they are logically different from the artist with the same name. Note how in Fig.  1 we have two nodes labeled “Ginevra Di Marco”, one in red for the band and the other in blue for the artist.

Each edge \((v_1, v_2, t)\) —with \(v_1 \in V_1\) and \(v_2 \in V_2\) —connects an artist to a band if the artist participated in one of the band’s records. The bipartite network is temporal: each edge has a single attribute t reporting the year in which the performance happened. This implies that there can be multiple edges between the same artist and the same band, one per year in which the connection existed. For notational convenience, we use \(w_{v_1,v_2}\) to denote this count for an arbitrary node pair \((v_1, v_2)\) , since it is equivalent to the edge’s weight.

We have multiple attributes on the bands, divided into three classes. First, we have genres. We recover from Discogs 477 different genres/styles that have been used by at least one band in the network. Each of these genres is an attribute of the band, and the value of the attribute is the number of records the band has released with that genre. We use S to indicate the set of all genres, and show an example of these attributes in Table 1 (first section). The second attribute class is the one-hot encoded geographical region of origin, with each region being a binary vector equal to one if the band originates from the region, and zero otherwise. We use R to indicate the set of regions. Table 1 (second section) shows a sample of the values of these attributes. The final attribute class is the activity status of a band in a given year—with Y being the set of years. Similarly to the geographical region, this is a one-hot encoded binary attribute. Table 1 (third section) shows a sample of the values of these attributes.
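As a minimal sketch of this data model (with made-up credit triples, not the paper's dataset), the temporal bipartite network and its edge weights can be represented with plain Python dictionaries:

```python
from collections import defaultdict

# Toy (artist, band, year) credit triples; all names are hypothetical.
credits = [
    ("anna", "band_x", 2001), ("anna", "band_x", 2002),
    ("anna", "band_y", 2002), ("marco", "band_x", 2001),
]

# One temporal edge per (artist, band, year); the weight w_{v1,v2} is the
# number of distinct years the artist appears on the band's records.
years = defaultdict(set)
for artist, band, year in credits:
    years[(artist, band)].add(year)
weights = {pair: len(ys) for pair, ys in years.items()}
```

Band-level attribute tables (genre counts, one-hot regions, one-hot activity years) would then be stored as dictionaries keyed by band, alongside this edge structure.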

Summary description

For the remainder of the paper, we limit the scope of the analysis to a projection of our bipartite network. We focus on the band projection of the network, connecting bands if they share artists. We do so to keep the scope contained and show that even by looking at a limited perspective on the data, node attribute analysis can be versatile and open many possibilities. Supplementary Section 2 contains summary statistics about the bipartite network and the other projection—connecting artists with common bands.

There are many ways to perform this projection (Newman 2001 ; Zhou et al. 2007 ; Yildirim and Coscia 2014 ), which result in different edge weights. Here we weight edges by counting the number of years a shared artist has played for either band. Supplementary Material Section 1 contains more details about this weighting scheme. Since we care about the statistical significance—assuming a certain amount of noise in user-generated data—we deploy a network backboning technique to ensure we are not analyzing random fluctuations (Coscia and Neffke 2017 ).
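One plausible reading of this weighting scheme—summing, per shared artist, the years in which that artist played for either band—can be sketched as follows (toy data, not the paper's; the backboning step is omitted):

```python
from collections import defaultdict
from itertools import combinations

# years[(artist, band)] -> set of active years, as built from credit data.
years = {
    ("anna", "band_x"): {2001, 2002}, ("anna", "band_y"): {2002},
    ("marco", "band_x"): {2001}, ("marco", "band_z"): {2003},
}

bands_of = defaultdict(set)
for artist, band in years:
    bands_of[artist].add(band)

# Band projection: for each artist shared by two bands, add the number of
# years in which that artist played for either band.
proj = defaultdict(int)
for artist, bands in bands_of.items():
    for u, v in combinations(sorted(bands), 2):
        proj[(u, v)] += len(years[(artist, u)] | years[(artist, v)])
```

In the actual pipeline, these raw weights would then be filtered with a network backboning technique before any analysis.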

Table 2 shows that the band projection has a low average degree and density, with a high clustering coefficient and modularity—which indicates that one can find meaningful communities in the band projection. These are typical characteristics of a wide variety of complex networks found in the literature.

Table 3 summarizes the top 10 bands according to three standard centrality measures: degree, closeness, and betweenness centrality. Degree is biased by the density of the hip hop cluster—which, as we will see, is a large quasi-clique, including only hip hop bands. Closeness is mostly dominated by alternative rock bands, as they happen to be in the center of mass of the network. The top bands according to betweenness are those bands that are truly the bridges connecting different times, genres, and Italian regions. Note that we analyze the network as a cumulative structure, therefore these centrality rankings are prone to overemphasize bands that are in the central period of the network, as they naturally bridge the whole final structure. In other words, it is harder to be central for very recent or very old bands.

figure 2

The temporal component of the band projection. Each node is a band. Edges connect bands with significant number of artist overlap. The edge’s color encodes its statistical significance (in increasing significance from bright to dark). The edge’s thickness is proportional to the overlap weight. The node’s size is proportional to its betweenness centrality. The node’s color encodes the average year of the band in the data—from blue (low year, less recent) to red (high year, more recent)

We visualize the band projection to show visually the driving forces behind the edge creation process: temporal and genre assortativity. For this reason we produce two visualizations. First, we take on the temporal component in Fig.  2 . The network has a clear temporal dimension, which we decide to place on a left-to-right axis in the visualization, going from older to more recent.

Second, we show the genre component in Fig.  3 , which instead causes clustering—the tendency of bands playing the same genre to connect to each other more than with any other band. For simplicity, we focus on the big three genres—pop, rock, and electronic—plus hip hop, since the latter creates the strongest and most evident cluster notwithstanding being less popular than the other three. For each node, if the band published more records than a given threshold in one of those four genres, we color the node with the most popular genre among them. If none of those genres meets the threshold, we assign the band to a generic “other” category.

figure 3

The genre component of the band projection. Same legend as Fig.  2 , except for the node’s color. Here, color encodes the dominant genre among pop (green), rock (red), electronic (purple), hip hop (blue), and other (gray)

This node categorization achieves a modularity score of 0.524, which is remarkably high considering that it uses no network information at all—and that it is not a given that this is the correct number of communities. This is a sign that the network is strongly assortative by genre. With our division into four genres plus “other”, we observe an assortativity coefficient of 0.689, which is quite high. The assortativity coefficient for the average year of activity is even higher (0.91).
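The categorical assortativity coefficient used here can be computed from the edge mixing matrix (Newman's formulation). The sketch below implements it for a generic categorical node attribute; edges and labels are toy values, not the paper's data:

```python
def attribute_assortativity(edges, label):
    """Newman's assortativity coefficient for a categorical node attribute.

    edges: list of (u, v) undirected edges; label: dict node -> category.
    Returns 1 for perfectly assortative mixing, negative for disassortative.
    """
    cats = sorted({label[u] for u, v in edges} | {label[v] for u, v in edges})
    idx = {c: i for i, c in enumerate(cats)}
    n = len(cats)
    e = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        i, j = idx[label[u]], idx[label[v]]
        e[i][j] += 1  # count each undirected edge in both directions
        e[j][i] += 1  # so the mixing matrix is symmetric
    total = 2 * len(edges)
    e = [[x / total for x in row] for row in e]
    trace = sum(e[i][i] for i in range(n))
    # sum_i a_i * b_i, with a_i / b_i the row / column marginals of e
    ab = sum(sum(e[i]) * sum(row[i] for row in e) for i in range(n))
    return (trace - ab) / (1 - ab)
```

With the paper's four-genres-plus-other labeling, this is the quantity reported as 0.689.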

We omit showing the network using the regional information on the bands for two reasons. First, there are too many regions (20) to visualize them using different node colors. Second, the structural relationship between the network and the regions is weaker—the assortativity coefficient being 0.223—which would lead to a less clear visualization.

From the figures and the preliminary analysis, it appears quite evident that the structure of the network has a set of complex and interesting interactions with time, genres, and, to a lesser extent, geography. This means that it is meaningful to use the network structure to estimate the relationship between genres, time, and space. This is the main topic of the paper and we now turn our attention to this analysis.

In this section we investigate a number of potential research questions in cultural data analytics. Each of them is tackled with a different node attribute analysis technique: network variance (Devriendt et al. 2022 ), network correlation (Coscia 2021 ; Coscia and Devriendt 2024 ), and Generalized Euclidean distance (Coscia 2020 )—which is at the basis of node attribute clustering (Damstrup et al. 2023 ) and era discovery. Supplementary Material Section 3 explains each of these methods in detail.

Genre specialization

When focusing on the genre attributes of the nodes, their network variance can tell us how concentrated or dispersed they are in the network. A dispersed genre means that the bands playing it do not share artists, not even indirectly: they are scattered across the structure. Vice versa, a low-variance genre implies that there is a clique of artists playing it, and these artists are shared by most of the bands releasing records with that particular genre. Table 4 reports the five most (and least) concentrated genres in the network.
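As a rough illustration of the concept—not the exact measure of Devriendt et al. (2022), which uses effective-resistance distances—one can compute a variance-like score by normalizing the attribute into a distribution over nodes and averaging pairwise shortest-path lengths. The toy path graph below (all values hypothetical) shows that a clustered attribute scores lower than a scattered one:

```python
from collections import deque

# Toy band graph as adjacency lists; attributes count records of one genre.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
a_clustered = {"a": 1, "b": 1, "c": 0, "d": 0}  # neighbors play the genre
a_spread = {"a": 1, "b": 0, "c": 0, "d": 1}     # endpoints play the genre

def bfs_dist(src):
    """Shortest-path lengths from src (unweighted BFS)."""
    dist, q = {src: 0}, deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def network_variance(attr):
    # Var(a) = 1/2 * sum_{u,v} p_u p_v d(u, v), with p the attribute
    # normalized to sum to 1; shortest-path length stands in here for the
    # effective-resistance distance of the original formulation.
    total = sum(attr.values())
    p = {v: attr[v] / total for v in attr}
    return 0.5 * sum(p[u] * p[v] * bfs_dist(u).get(v, 0)
                     for u in adj for v in adj)
```

A genre concentrated in one neighborhood yields a low score, matching the intuition behind Table 4.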

We only focus on genres that have a minimum level of use, in this specific case at least 1% of bands must have released at least one record using that specific genre. The values of network variance should be compared with a null version of the genre—the values themselves do not tell us whether they are significant or if we would get that level of variance simply given the popularity of the genre. For this reason we bootstrap a pseudo p-value for the variance.

Let’s assume that \(\mathcal {S}\) is a \(|V| \times |S|\) genre matrix. The \(\mathcal {S}_{v,s}\) entry tells us how many records with genre s the band v has published. We can create \(\mathcal {S}'\) , a randomized null version of \(\mathcal {S}\) . In \(\mathcal {S}'\) , we ensure that each null genre has the same number of records as it has in \(\mathcal {S}\) . We do so by extracting with replacement at random \(\sum \limits _{v \in V} \mathcal {S}_{v,s}\) bands for genre s . The random extraction is not uniform: each band has a probability of being extracted proportional to \(\sum \limits _{s \in S} \mathcal {S}_{v,s}\) . In this way, \(\mathcal {S}'\) has the same column sums and similar row sums as \(\mathcal {S}\) . In other words, we randomize \(\mathcal {S}\) while preserving the popularity of each genre and each band. Then, we can count the number of such random \(\mathcal {S}'\) s in which the null genre has a higher (lower) variance than the observed genre.
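The popularity-preserving randomization of \(\mathcal {S}\) can be sketched as follows (toy matrix with three hypothetical bands and two genres; the real analysis uses 477 genres). Each null genre keeps its record total, and bands are drawn with probability proportional to their overall output:

```python
import random

random.seed(0)

# Toy genre matrix S: S[band][genre] = records published (illustrative).
S = {"u": {"rock": 3, "pop": 1}, "v": {"rock": 1, "pop": 0},
     "w": {"rock": 0, "pop": 4}}
bands = sorted(S)
genres = ["rock", "pop"]

# Band popularity: total records over all genres (the sampling weights).
pop = {b: sum(S[b].values()) for b in bands}

def null_matrix():
    """One randomized S': same per-genre record totals, with bands drawn
    with replacement, proportionally to their overall popularity."""
    Sn = {b: {g: 0 for g in genres} for b in bands}
    for g in genres:
        total = sum(S[b][g] for b in bands)
        draws = random.choices(bands, weights=[pop[b] for b in bands], k=total)
        for b in draws:
            Sn[b][g] += 1
    return Sn

Sn = null_matrix()
```

Repeating `null_matrix()` many times and counting how often the null genre's variance exceeds the observed one yields the bootstrapped pseudo p-value described above.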

Table 4 shows that stoner rock has a high and significant variance, indicating that bands playing stoner rock have a low degree of specialization. This can be contextualized by the fact that stoner rock was tried out unsystematically by a few unrelated bands, ranging from heavy metal to indie rock. On the other hand, many variants of heavy metal have low variance. This can be explained by the fact that heavy metal is a niche genre in Italy, and all bands playing specific heavy metal variants know each other and share members.

figure 4

Two genres ( a Hip Hop, b Beat) with different variance. Node size, node definition, and edge thickness, color, and definition are the same as in Fig. 2 . The color is proportional to the genre-band node attribute value, with bright colors for low values and dark colors for high values

In Fig.  4 we pick two representative genres—Hip Hop and Beat—which both have the same relatively high popularity in number of bands playing them and a significant (low or high) variance, and we show what they look like on the network. The figure shows that the variance measure does what we intuitively think it should: the Hip Hop bands have low variance and therefore cluster strongly in the network, while the Beat bands are more scattered.

Temporal variety

We are not limited to calculating variances for genres: we can perform the same operation for years. If the variance of a genre tells us how diverse the set of bands playing it is, the variance of a year can tell us how diverse that year was. Figure  5 shows the evolution of variance per year. We test the statistical significance of the observed variance value by shuffling the values of the node attribute for a given year a number of times, testing whether the observation is significantly higher than, lower than, or equal to this expectation.

figure 5

The network variance (y axis) for a given decade (x axis). Background color indicates the statistical significance: red = lower than expected, green = higher than expected, white = not significantly different from expectation

From the figure we can see that there seem to be two phase transitions. In the first regime, we have an infancy phase with low activity and low variance. The first phase transition starts in 1960 and brings the network to a second regime of high activity and high variance. After the peak around 1980, a second phase transition introduces the third regime, from the mid-90s until the present, with high activity but low variance. In the latter years, we see hip hop cannibalizing all genres and compressing record releases into its tightly-knit cluster.

Node attribute correlation

We can now shift our attention from describing a single node attribute at a time—its variance, as we saw in the previous sections—to describing the relationships between pairs of attributes. In this section, we do so by calculating their network correlation. Specifically, we perform a geographical analysis. The ultimate aim is to answer the question: what are some particularly strong genre-region associations? We can answer it by calculating the network correlation between two node attributes, one recording the genre intensity for a band and the other a binary value telling us whether the band is from a specific region. The network correlation is useful here because it grows not only if many bands play that specific genre in that specific region, but also if the other bands in the region that do not play that genre are close in the network to—i.e., share members with—bands that do.
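The network correlation of Coscia (2021) is more involved than what fits here; as a loose, simplified stand-in for the intuition, one can smooth each attribute over the graph (one step of neighbor averaging) before computing a plain Pearson correlation, so that bands close to genre-playing bands also contribute. Everything below—graph, attributes, and the one-step smoothing—is a hypothetical sketch, not the cited method:

```python
import math

# Toy band graph and attributes (hypothetical): `region` flags bands from
# one region, `genre` counts records of one genre.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
region = {"a": 1, "b": 0, "c": 0, "d": 0}
genre = {"a": 5, "b": 3, "c": 0, "d": 0}

def smooth(attr):
    # One averaging step: a band's value also reflects the bands it
    # shares members with, injecting network proximity into the signal.
    return {v: (attr[v] + sum(attr[u] for u in adj[v])) / (1 + len(adj[v]))
            for v in adj}

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x)
                    * sum((b - my) ** 2 for b in y))
    return num / den

nodes = sorted(adj)
r = pearson([smooth(region)[v] for v in nodes],
            [smooth(genre)[v] for v in nodes])
```

Here the region and the genre sit on neighboring nodes, so the smoothed correlation is strongly positive even though only one band has both.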

In Table 5 we report some significant region-genre associations. For each region, we pick the most popular genre in the network with which it correlates at a significant level—and for which it has the highest correlation among all regions that correlate significantly with that genre. The significance is estimated via bootstrapping, by randomly shuffling the region vector—i.e., changing the set of bands associated with the region while respecting its size. Table 5 does not report a genre for every region, because for some regions no genre satisfied the constraints. Note that some regions might correlate more strongly or more significantly with a genre that is not reported in the table; we omit such pairs when another region has a stronger correlation with that genre.

Genre clusters

When we measure the pairwise distances between all node attributes systematically, we can cluster them hierarchically. Here, we perform such a network-based hierarchical clustering on the music genres and styles as recorded by Discogs. The aim is to see whether we can find groups of genres that are similar to each other, potentially informing a data-driven musical classification. Figure  6 shows a bird’s-eye view of the hierarchical clustering, with the similarity matrix and the dendrogram.
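Given a pairwise genre distance matrix (e.g., from the Generalized Euclidean distance on the band network), the agglomeration step itself can be sketched with naive single-linkage clustering; the genres and distances below are made up for illustration:

```python
import itertools

# Toy symmetric genre-genre distances (illustrative values standing in
# for network-based distances between genre attributes).
D = {
    ("rock", "metal"): 1.0, ("pop", "disco"): 1.0,
    ("rock", "pop"): 5.0, ("rock", "disco"): 5.0,
    ("metal", "pop"): 5.0, ("metal", "disco"): 5.0,
}

def dist(a, b):
    return 0.0 if a == b else D.get((a, b), D.get((b, a)))

def single_linkage(items, k):
    """Naive agglomerative clustering: repeatedly merge the two clusters
    with the smallest minimum pairwise distance until k remain."""
    clusters = [{i} for i in items]
    while len(clusters) > k:
        i, j = min(itertools.combinations(range(len(clusters)), 2),
                   key=lambda ij: min(dist(a, b)
                                      for a in clusters[ij[0]]
                                      for b in clusters[ij[1]]))
        clusters[i] |= clusters[j]
        del clusters[j]  # j > i, so indices stay valid
    return clusters

clusters = single_linkage(["rock", "metal", "pop", "disco"], 2)
```

Recording the order and distance of each merge, instead of stopping at k clusters, yields the full dendrogram shown in Fig. 6.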

figure 6

The hierarchical genre clusters. The heatmap shows the pairwise similarity among the genres—from low (dark) to high (bright) similarity. The dendrograms show the hierarchical organization of the clusters

To make sense of it, we have selected some clusters, for illustrative purposes only. Table 6 shows which genres and styles from Discogs end up in the color-highlighted clusters from Fig.  6 . We can see that the clusters include similar genres, which make up coherent sets of more general music styles. The figure also highlights that there is a hierarchical structure of music styles, with meaningful clusters-within-clusters and clear demarcation lines between groups and subgroups.

Recall that these clusters are driven exclusively by the network’s topology and do not use any feature coming from the songs themselves. This means that using a network of shared members among bands is indeed insightful in figuring out the related genres these bands play. Therefore, network-based clustering has the potential to guide the definition of new musical classifications.

Temporal clusters

We now look at the eras of Italian music we can discover in the data. Figure  7 shows the dendrogram, connecting years and groups of years at a small network distance from each other. Each era we identify colors its corresponding branch in the dendrogram. We avoid assigning an era to years pre-1906 and post-2018, due to issues with the representativeness of the data. We also notice that the 1938–1945 period is tumultuous, with many small eras in a handful of years—understandable given the geopolitical situation at the time—so we ignore that period as well.

figure 7

The eras dendrogram. Clusters join at a height proportional to their similarity level (the further right, the less similar). Colors encode the detected eras, with labels on the left

To make sense of the temporal clustering, the standard approach in the literature would be to compare counts of activities across clusters. However, that would ignore the role of the network structure. In our framework, we can characterize eras by applying the same logic used to find them. We calculate the network distance between a node attribute representing the era and each genre. The era’s node attribute simply reports, for each band, a normalized count of the records they released within the bounds of that era. We normalize so that each era attribute sums to one, to avoid overpowering the signal with the scale of the largest and most active eras.

Then, for each era, we report the list of genres that have the smallest distance with that era. Note that some genres might still have a small distance with other eras, but we only report the smallest. These are the genres we use to label the eras in Fig.  7 . These genres are not the most dominant in that era—in almost all cases, pop and rock dominate—but they give an intuition of what was the most characteristic genre of the era, distinguishing it from the others.

We can see that the characterization makes intuitive sense, with the classical genres being particularly correlated with the 1906–1916 era. Beat and rock’n’roll are particularly associated with the 1965–1971 period, the dates corresponding to the British Invasion in Italy. Notably, the punk genre has its closest association with the most recent era we label, 2006–2017, proving that—at least in Italy—punk is indeed not dead.

Explaining the network

Wrapping up the analysis, one key assumption underpinning everything we have done so far is that the connections in the band projection follow a few homophily rules. We can find meaningful genre (Sect. Genre clusters ) and temporal (Sect. Temporal clusters ) clusters using our network distance measures only if bands do tend to connect when they have a genre or temporal similarity. Two bands should be more likely to share members if they play similar genres and if they do so at a similar point in time. More weakly, the correlations between genres and geographical regions (Sect. Node attribute correlation ) also make sense only if bands with similar geographical origins tend to share members more often than expected.

While proving this assumption would require a paper on its own, we can at least provide some evidence in favor of its reasonableness. We do so by running two linear regressions. In the first regression, we want to explain the likelihood that an edge exists in the band projection with the genre, temporal, and geographical similarity between bands, or:

\(Y_{u,v} = \beta _0 + \beta _1 \mathcal {G}_{u,v} + \beta _2 \mathcal {R}_{u,v} + \beta _3 \mathcal {T}_{u,v} + \epsilon \)

In this formula:

\(Y_{u,v}\) is a binary variable, equal to 1 if bands u and v shared at least one member, and zero otherwise;

\(\mathcal {G}_{u,v}\) is the genre similarity, which is the cosine similarity between the vectors recording how many records of a given genre bands u and v have published;

\(\mathcal {R}_{u,v}\) is the region similarity, equal to 1 if the bands originate from the same region, and zero otherwise;

\(\mathcal {T}_{u,v}\) is the temporal similarity, for which we take the logarithm of the number of years in which both bands released a record, plus one to handle the case in which the bands did not share a year;

\(\beta _0\) and \(\epsilon \) are the intercept and the residuals.
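The two similarity features can be assembled as follows; the genre vectors and year counts are illustrative, not taken from the paper's data:

```python
import math

def cosine(x, y):
    """Cosine similarity between two records-per-genre vectors."""
    num = sum(a * b for a, b in zip(x, y))
    den = (math.sqrt(sum(a * a for a in x))
           * math.sqrt(sum(b * b for b in y)))
    return num / den if den else 0.0

# Genre similarity G_{u,v} for one hypothetical band pair.
g_u, g_v = [3, 1, 0], [1, 0, 0]
G_uv = cosine(g_u, g_v)

# Temporal similarity T_{u,v}: log of shared active years plus one,
# so that pairs with no common year get exactly zero.
shared_years = 2
T_uv = math.log(shared_years + 1)
```

The region similarity \(\mathcal {R}_{u,v}\) is simply a same-region indicator, so it needs no computation beyond an equality check.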

Note that \(Y_{u,v}\) contains all links with a weight of at least one, even those that are not statistically significant and were dropped from the visualizations and analyses of the previous sections. Moreover, it also has to contain non-links. However, since the network is sparse, it is not feasible to include all non-links in the regression. Thus, we perform balanced negative sampling: for each link that exists, we sample and include in \(Y_{u,v}\) a link that does not.
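The balanced negative sampling step can be sketched as follows; band names and links are hypothetical:

```python
import random

random.seed(42)

bands = [f"b{i}" for i in range(10)]
# Observed positive band pairs, stored in sorted order (illustrative).
positives = {("b0", "b1"), ("b2", "b3"), ("b4", "b5")}

def sample_negatives(positives, bands):
    """Draw one non-linked band pair per observed link, so the regression
    sees as many negatives as positives."""
    negatives = set()
    while len(negatives) < len(positives):
        u, v = random.sample(bands, 2)
        pair = (min(u, v), max(u, v))  # canonical undirected order
        if pair not in positives and pair not in negatives:
            negatives.add(pair)
    return negatives

negs = sample_negatives(positives, bands)
```

The regression target is then 1 for every pair in `positives` and 0 for every pair in `negs`.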

For \(\mathcal {G}_{u,v}\) we only consider the 38 most popular genres, since sparsely used genres would make bands look more similar than they actually are.
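The pipeline above (cosine genre similarity, shared-year overlap, and balanced negative sampling feeding a linear probability model) can be sketched as follows. This is a minimal illustration on synthetic band pairs, not the paper's actual code; all data and variable names are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

def cosine(a, b):
    """Cosine similarity between two genre record-count vectors."""
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    return float(a @ b / (na * nb)) if na and nb else 0.0

# Synthetic band pairs: genre count vectors, region match, shared active years.
# In the paper these come from Discogs; here they follow a toy homophily rule.
rows = []
for _ in range(200):
    g_sim = cosine(rng.poisson(2, size=4), rng.poisson(2, size=4))
    r_sim = rng.integers(0, 2)             # 1 if same region of origin
    t_sim = np.log1p(rng.integers(0, 10))  # log(shared years + 1)
    score = g_sim + r_sim + t_sim + rng.normal(0, 1)
    y = int(score > 2.0)                   # toy rule: similar pairs link more
    rows.append((y, g_sim, r_sim, t_sim))

data = np.array(rows)
links, non_links = data[data[:, 0] == 1], data[data[:, 0] == 0]

# Balanced negative sampling: one non-existing link per existing link.
k = min(len(links), len(non_links))
neg = non_links[rng.choice(len(non_links), size=k, replace=False)]
sample = np.vstack([links[:k], neg])

# Linear probability model: Y ~ genre + region + temporal similarity.
X = np.column_stack([np.ones(len(sample)), sample[:, 1:]])
beta, *_ = np.linalg.lstsq(X, sample[:, 0], rcond=None)
resid = sample[:, 0] - X @ beta
r2 = 1 - resid.var() / sample[:, 0].var()
```

With real data one would fit the same model with a statistics package (e.g. statsmodels or R) to also obtain the p-values reported in Table 7.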

The first column of Table 7 shows the result of the model. The first thing we can see is that we can explain 28.4% of the variance in the likelihood that an edge exists. This means that 71.6% of the reasons why two bands share a member lie outside our data, be they unrecorded social networks, random chance, impositions from labels, etc.

However, explaining 28.4% of the variance in the edge existence likelihood still provides a valid clue that our homophily assumptions hold. All the similarities we considered play a role in determining the existence of an edge: all of their coefficients are positive and statistically significant. Given that these similarity measures do not share the same units, nor even the same domain, one cannot compare the coefficients directly. However, we can map their contributions to the \(R^2\) by estimating their relative importance (Feldman 2005; Grömping 2007), which we do in Fig. 8. The figure shows that temporal similarity plays the strongest role, closely followed by genre similarity. Spatial similarity, on the other hand, while still statistically significant, adds little to no explanatory power beyond the other factors.
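The relative importance decomposition of Feldman (2005) and Grömping (2007) averages each predictor's incremental \(R^2\) over every order in which predictors can enter the model (the LMG scheme behind the relaimpo R package). A minimal sketch on synthetic data, feasible here because there are only three predictors:

```python
import itertools
import numpy as np

def r_squared(X, y):
    """R^2 of an OLS fit with intercept on the given predictor columns."""
    Z = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    return 1 - resid.var() / y.var()

def lmg_shares(X, y):
    """Average incremental R^2 of each predictor over all entry orders."""
    p = X.shape[1]
    shares = np.zeros(p)
    orders = list(itertools.permutations(range(p)))
    for order in orders:
        included, prev = [], 0.0
        for j in order:
            included.append(j)
            cur = r_squared(X[:, included], y)
            shares[j] += cur - prev  # credit j with the R^2 it adds here
            prev = cur
    return shares / len(orders)

# Synthetic stand-ins for genre, region, and temporal similarity.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))
y = 1.0 * X[:, 0] + 0.2 * X[:, 1] + 1.2 * X[:, 2] + rng.normal(size=300)

shares = lmg_shares(X, y)
```

The shares sum exactly to the full-model \(R^2\), which is the quantity decomposed in Figs. 8 and 9; the exhaustive loop over orderings is only practical for a handful of predictors.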

Figure 8: The relative importance of each explanatory variable in determining the existence of a link between two bands in the band projection

Once we establish that the existence of a connection relates to genre, temporal, and geographical similarity, we can ask the same question about the strength of the relationship between two bands. We apply the same model as before, changing the target variable:

\(\log (W_{u,v}) = \beta _0 + \beta _{\mathcal {G}} \mathcal {G}_{u,v} + \beta _{\mathcal {R}} \mathcal {R}_{u,v} + \beta _{\mathcal {T}} \mathcal {T}_{u,v} + \epsilon \)

Here, \(\log (W_{u,v})\) is the logarithm of the edge weight. Note that we only consider edges with a non-zero weight, i.e. those that exist: we do not want this model to predict edge existence besides edge strength, as the previous model already addressed that question.

Table 7 contains the results in its second column. We can see that, also in this case, all three factors are significant predictors of the edge weight: the number of artists two bands share goes up if the bands play similar genres, overlap in time, and originate from the same region. The \(R^2\) is noticeably lower, though, which means that \(\log (W_{u,v})\) is harder to predict than \(Y_{u,v}\).

Figure 9 shows the same \(R^2\) decomposition we performed in Fig. 8 for \(Y_{u,v}\). All explanatory variables explain less variance than in the previous model. In relative terms, temporal overlap gains importance at the expense of genre similarity.

Figure 9: The relative importance of each explanatory variable in determining the weight of a link between two bands in the band projection

In this paper we have provided a showcase of the analyses and conclusions one can draw in cultural data analytics by using node attribute analysis. We focused on the case study of Italian music from the past 120 years. We built a bipartite network connecting artists to bands and then projected it to analyze a band-band network. We have shown how one can use network variance to identify genres concentrating in such a network, hinting at clusters of bands playing homogeneous genres. We have performed a geographical analysis, calculating the network correlation between the region of origin of bands and the genres they play. We have shown how one can create a new music genre taxonomy by performing node attribute clustering on music genre data. We have also proposed a novel way of performing era detection in a network, by finding clusters of similar consecutive years, where years are node attributes.

While we believe our analysis is insightful, a number of considerations are needed to contextualize our work. We can broadly group the limitations in two categories: those relating to the domain of analysis, and the methodological ones.

When it comes to cultural data analytics, we acknowledge that we are working with user-generated data. There is no guarantee that the data is free from significant mistakes, misleading entries, and incompleteness. Furthermore, our results might not be conclusive: we process data semi-automatically, and the coding process is not complete, meaning we miss a considerable number of lesser known artists. This also means that the data collection could be biased by our decisions on the order in which we explore the structure, which might focus too much or too little on specific areas of Italian music. As a specific example, in our project we have ignored another potentially rich source of node attributes: information about music labels/publishers. This is available on Discogs, and we could envision representing a label as a node vector whose entries count how many records that label published for each band. We plan to use this information in future work. The coding process is still ongoing, and we expect to be able to complete the network in the near future.

On the methodological side, we point out that what we did is only possible in the presence of rich metadata: dozens if not hundreds of node attributes. Networks with scarce node attribute data would not be amenable to analysis with the techniques we propose here. However, cultural data analytics usually enjoys a high richness of metadata. Furthermore, many node attribute techniques only make sense if the node attributes are somehow correlated with the network structure. The musical genre clustering or the era detection would not produce meaningful results if the probability of two nodes connecting were not influenced by their attributes, i.e. if the homophily hypothesis did not hold. In our case, the homophily assumption likely holds, as we show in Sect. Explaining the Network.

When considering some of the specific analyses we performed, other limitations emerge. For instance, our era discovery approach looks exclusively at node activities. However, structural changes in the network's connections also play a key role in determining discontinuities with the past (Berlingerio et al. 2013). In future work we should explore how to integrate our node attribute approach with structural methods. When it comes to network variance, how to properly estimate its confidence intervals without resorting to bootstrapping remains future work. Therefore, the results we present here should be taken with caution, as some of the patterns we highlight might not be statistically significant.

On a more practical note, our node attribute techniques hinge on specific matrix operations. While these can be computed efficiently on GPUs using tensor representations, this puts a limit on the size of the networks analyzed, which have to fit in the GPU's memory.
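As a back-of-the-envelope illustration (our numbers, not the paper's): a dense \(n \times n\) matrix of float32 entries grows quadratically with the node count, so the largest analyzable network is roughly bounded by the square root of the available GPU memory divided by the bytes per entry and the number of working copies the computation keeps alive.

```python
def dense_matrix_gib(n, bytes_per_entry=4):
    """Memory of one dense n x n matrix (float32 by default), in GiB."""
    return n * n * bytes_per_entry / 2**30

def fits_in_gpu(n, gpu_gib=16, n_copies=3):
    """Crude feasibility check: a few working copies must fit at once."""
    return n_copies * dense_matrix_gib(n) <= gpu_gib

# On a hypothetical 16 GiB GPU, three float32 copies of a 30,000-node
# matrix take about 10 GiB, while 100,000 nodes would need over 100 GiB.
```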

Availability of data and materials

All data and code necessary to replicate our results are available at http://www.michelecoscia.com/?page_id=2336 and Coscia (2024).

http://www.michelecoscia.com/?page_id=2336 .

https://it.wikipedia.org .

https://www.discogs.com/ .

Aggarwal CC (2018) Neural networks and deep learning. Springer 10(978):3


Aljalbout E, Golkov V, Siddiqui Y, Strobel M, Cremers D (2018) Clustering with deep learning: taxonomy and new methods. arXiv preprint arXiv:1801.07648

Bail CA (2014) The cultural environment: measuring culture with big data. Theory Soc 43:465–482


Baym NK, Ledbetter A (2009) Tunes that bind? predicting friendship strength in a music-based social network. Inf Commun Soc 12(3):408–427

Benson AR, Gleich DF, Leskovec J (2016) Higher-order organization of complex networks. Science 353(6295):163–166

Berlingerio M, Coscia M, Giannotti F, Monreale A, Pedreschi D (2013) Evolving networks: eras and turning points. Intell Data Anal 17(1):27–48

Bianchi FM, Grattarola D, Alippi C (2020) Spectral clustering with graph neural networks for graph pooling. In: International conference on machine learning, pp 874–883. PMLR

Bianconi G (2021) Higher-order networks. Cambridge University Press, Cambridge


Bo D, Wang X, Shi C, Zhu M, Lu E, Cui P (2020) Structural deep clustering network. In: Proceedings of the web conference 2020, pp 1400–1410

Bretto A (2013) Hypergraph theory. An introduction. Mathematical Engineering. Springer, Cham 1

Bronstein MM, Bruna J, LeCun Y, Szlam A, Vandergheynst P (2017) Geometric deep learning: going beyond Euclidean data. IEEE Signal Process Mag 34(4):18–42

Brughmans T (2013) Thinking through networks: a review of formal network methods in archaeology. J Archaeol Method Theory 20:623–662

Candia C, Jara-Figueroa C, Rodriguez-Sickert C, Barabási A-L, Hidalgo CA (2019) The universal decay of collective memory and attention. Nat Hum Behav 3(1):82–91

Carletti T, Battiston F, Cencetti G, Fanelli D (2020) Random walks on hypergraphs. Phys Rev E 101(2):022308


Cheng J, Wang Q, Tao Z, Xie D, Gao Q (2021) Multi-view attribute graph convolution networks for clustering. In: Proceedings of the twenty-ninth international conference on international joint conferences on artificial intelligence, pp 2973–2979

Coscia M (2024) Italian XX-XXI century music. https://doi.org/10.5281/zenodo.13309793

Coscia M (2020) Generalized Euclidean measure to estimate network distances. Proc Int AAAI Conf Web Soc Media 14:119–129

Coscia M (2021) Pearson correlations on complex networks. J Complex Netw 9(6):036

Coscia M, Devriendt K (2024) Pearson correlations on networks: Corrigendum. arXiv preprint arXiv:2402.09489

Coscia M, Neffke FM (2017) Network backboning with noisy data. In: 2017 IEEE 33rd international conference on data engineering (ICDE), pp 425–436 . IEEE

Damstrup ASR, Madsen ST, Coscia M (2023) Revised learning via network-aware embeddings. arXiv preprint arXiv:2309.10408

Devriendt K, Martin-Gutierrez S, Lambiotte R (2022) Variance and covariance of distributions on graphs. SIAM Rev 64(2):343–359

Ezugwu AE, Ikotun AM, Oyelade OO, Abualigah L, Agushaka JO, Eke CI, Akinyelu AA (2022) A comprehensive survey of clustering algorithms: state-of-the-art machine learning applications, taxonomy, challenges, and future research prospects. Eng Appl Artif Intell 110:104743

Feldman BE (2005) Relative importance and value. Available at SSRN 2255827

Gleiser PM, Danon L (2003) Community structure in jazz. Adv Complex Syst 6(04):565–573

Grömping U (2007) Relative importance for linear regression in r: the package relaimpo. J Stat Softw 17:1–27

Hamilton W, Ying Z, Leskovec J (2017) Inductive representation learning on large graphs. Adv Neural Inf Process Syst 30

Hohmann M, Devriendt K, Coscia M (2023) Quantifying ideological polarization on a network using generalized Euclidean distance. Sci Adv 9(9):2044

Hristova S (2016) Images as data: cultural analytics and Aby Warburg's Mnemosyne. Int J Digital Art History (2)

Karjus A, Solà MC, Ohm T, Ahnert SE, Schich M (2023) Compression ensembles quantify aesthetic complexity and the evolution of visual art. EPJ Data Science 12(1):21

Kaufman T, Oppenheim I (2020) High order random walks: beyond spectral gap. Combinatorica 40:245–281

Lambiotte R, Rosvall M, Scholtes I (2019) From networks to optimal higher-order models of complex systems. Nat Phys 15(4):313–320

Lin Z, Kang Z, Zhang L, Tian L (2021) Multi-view attributed graph clustering. IEEE Trans knowl Data Eng

Mahalanobis P (1936) On the generalized distance in statistics. National Institute of Science of India

Manovich L (2020) Cultural analytics. MIT Press, Cambridge

McAndrew S, Everett M (2015) Music as collective invention: a social network analysis of composers. Cult Sociol 9(1):56–80

Mills BJ, Clark JJ, Peeples MA, Haas WR Jr, Roberts JM Jr, Hill JB, Huntley DL, Borck L, Breiger RL, Clauset A (2013) Transformation of social networks in the late pre-hispanic us southwest. Proc Natl Acad Sci 110(15):5785–5790

Moon S-I, Barnett GA, Lim YS (2010) The structure of international music flows using network analysis. New Media Soc 12(3):379–399

Newman ME (2001) Scientific collaboration networks. ii. Shortest paths, weighted networks, and centrality. Phys Rev E 64(1):016132

Pálovics R, Benczúr AA (2013) Temporal influence over the Last.fm social network. In: Proceedings of the 2013 IEEE/ACM international conference on advances in social networks analysis and mining, pp 486–493

Pang G, Shen C, Cao L, Hengel AVD (2021) Deep learning for anomaly detection: a review. ACM Comput Surv (CSUR) 54(2):1–38

Park J, Celma O, Koppenberger M, Cano P, Buldú JM (2007) The social network of contemporary popular musicians. Int J Bifurc Chaos 17(07):2281–2288

Pennacchioli D, Rossetti G, Pappalardo L, Pedreschi D, Giannotti F, Coscia M (2013) The three dimensions of social prominence. In: International conference on social informatics, pp 319–332 . Springer

Perozzi B, Akoglu L, Iglesias Sánchez P, Müller E (2014) Focused clustering and outlier detection in large attributed graphs. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining, pp 1346–1355

Perozzi B, Al-Rfou R, Skiena S (2014) Deepwalk: online learning of social representations. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining, pp 701–710

Ronen S, Gonçalves B, Hu KZ, Vespignani A, Pinker S, Hidalgo CA (2014) Links that speak: the global language network and its association with global fame. Proc Natl Acad Sci 111(52):5616–5622

Salah AA, Manovich L, Salah AA, Chow J (2013) Combining cultural analytics and networks analysis: Studying a social network site with user-generated content. J Broadcast Electron Media 57(3):409–426

Scarselli F, Gori M, Tsoi AC, Hagenbuchner M, Monfardini G (2008) The graph neural network model. IEEE Trans Neural Netw 20(1):61–80

Schich M, Hidalgo C, Lehmann S, Park J (2008) The network of subject co-popularity in classical archaeology

Stebbins RA (2004) Music among friends: the social networks of amateur musicians. Popular music: critical concepts in media and cultural studies 1:227–245

Teitelbaum T, Balenzuela P, Cano P, Buldú JM (2008) Community structures and role detection in music networks. Chaos Interdiscipl J Nonlinear Sci 18(4)

Tsitsulin A, Palowitch J, Perozzi B, Müller E (2020) Graph clustering with graph neural networks. arXiv preprint arXiv:2006.16904

Vlegels J, Lievens J (2017) Music classification, genres, and taste patterns: a ground-up network analysis on the clustering of artist preferences. Poetics 60:76–89

Wang C, Pan S, Hu R, Long G, Jiang J, Zhang C (2019) Attributed graph clustering: a deep attentional embedding approach. arXiv preprint arXiv:1906.06532

Wu L, Cui P, Pei J, Zhao L, Guo X (2022) Graph neural networks: foundation, frontiers and applications. In: Proceedings of the 28th ACM SIGKDD conference on knowledge discovery and data mining, pp 4840–4841

Xie J, Girshick R, Farhadi A (2016) Unsupervised deep embedding for clustering analysis. In: International Conference on Machine Learning, pp. 478–487 . PMLR

Xu J, Wickramarathne TL, Chawla NV (2016) Representing higher-order dependencies in networks. Sci Adv 2(5):1600028

Yang S, Verma S, Cai B, Jiang J, Yu K, Chen F, Yu S (2023) Variational co-embedding learning for attributed network clustering. Knowl-Based Syst 270:110530

Yildirim MA, Coscia M (2014) Using random walks to generate associations between objects. PLoS ONE 9(8):104813

Zhang C, Song D, Huang C, Swami A, Chawla NV (2019) Heterogeneous graph neural network. In: Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, pp 793–803

Zhou T, Ren J, Medo M, Zhang Y-C (2007) Bipartite network projection and personal recommendation. Phys Rev E 76(4):046115

Zhou J, Cui G, Hu S, Zhang Z, Yang C, Liu Z, Wang L, Li C, Sun M (2020) Graph neural networks: a review of methods and applications. AI Open 1:57–81

Zhou K, Huang X, Li Y, Zha D, Chen R, Hu X (2020) Towards deeper graph neural networks with differentiable group normalization. Adv Neural Inf Process Syst 33:4917–4928

Download references

Acknowledgements

The author is thankful to Amy Ruskin for the project’s idea, and to Seth Pate and Clara Vandeweerdt for insightful discussions.

Author information

Authors and affiliations.

CS Department, IT University of Copenhagen, Copenhagen, Denmark

Michele Coscia


Contributions

M.C. designed and performed all experiments, prepared figures, and wrote and approved the manuscript.

Corresponding author

Correspondence to Michele Coscia .

Ethics declarations

Competing interests.

The author declares no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary material 1.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/ .

Reprints and permissions

About this article

Coscia, M. Node attribute analysis for cultural data analytics: a case study on Italian XX–XXI century music. Appl Netw Sci 9, 56 (2024). https://doi.org/10.1007/s41109-024-00669-5

Download citation

Received: 17 May 2024

Accepted: 27 August 2024

Published: 05 September 2024

DOI: https://doi.org/10.1007/s41109-024-00669-5


  • Cultural data analytics
  • Complex networks
  • Data clustering
  • Temporal analysis


  • Liu, J.; Kang, H.; Tao, W.; Li, H.; He, D.; Ma, L.; Tang, H.; Wu, S.; Yang, K.; Li, X. A Spatial Distribution—Principal Component Analysis (SD-PCA) Model to Assess Pollution of Heavy Metals in Soil. Sci. Total Environ. 2023 , 859 , 160112. [ Google Scholar ] [ CrossRef ]
  • Iheanacho, O.N. Post-Occupancy Evaluation of Outdoor Spaces of Public Housing Eatates for Housing Satisfaction of Middle Income Residents in Enugu, Nigeria. Ph.D. Thesis, University of Nigeria, Nsukka, Nigeria, 2018. [ Google Scholar ]
  • Abdul Aziz, F.; Hussain, N.; Ujang, N. The Implication of Slum Relocations into Low-Cost High-Rises: An Analysis through the Infrastructure of Everyday Life. Environ. Behav. Proc. J. 2016 , 1 , 33. [ Google Scholar ] [ CrossRef ]
  • Bristowe, A.; Heckert, M. How the COVID-19 Pandemic Changed Patterns of Green Infrastructure Use: A Scoping Review. Urban For. Urban Green. 2023 , 81 , 127848. [ Google Scholar ] [ CrossRef ]
  • Cohen, D.A.; Williamson, S.; Han, B. Gender Differences in Physical Activity Associated with Urban Neighborhood Parks: Findings from the National Study of Neighborhood Parks. Women’s Health Issues 2021 , 31 , 236–244. [ Google Scholar ] [ CrossRef ]
  • Yue, Y.; Yang, D.; Van Dyck, D. Urban Greenspace and Mental Health in Chinese Older Adults: Associations across Different Greenspace Measures and Mediating Effects of Environmental Perceptions. Health Place 2022 , 76 , 102856. [ Google Scholar ] [ CrossRef ]
  • Zhang, R.; Zhang, C.-Q.; Lai, P.C.; Kwan, M.-P. Park and Neighbourhood Environmental Characteristics Associated with Park-Based Physical Activity among Children in a High-Density City. Urban For. Urban Green. 2022 , 68 , 127479. [ Google Scholar ] [ CrossRef ]
  • Jiang, Y.; Huang, G. Urban Residential Quarter Green Space and Life Satisfaction. Urban For. Urban Green. 2022 , 69 , 127510. [ Google Scholar ] [ CrossRef ]
  • Mouratidis, K. Urban Planning and Quality of Life: A Review of Pathways Linking the Built Environment to Subjective Well-Being. Cities 2021 , 115 , 103229. [ Google Scholar ] [ CrossRef ]
  • Yang, Y.; Peng, C.; Yeung, C.Y.; Ren, C.; Luo, H.; Lu, Y.; Yip, P.S.F.; Webster, C. Moderation Effect of Visible Urban Greenery on the Association between Neighbourhood Deprivation and Subjective Well-Being: Evidence from Hong Kong. Landsc. Urban Plan. 2023 , 231 , 104660. [ Google Scholar ] [ CrossRef ]
  • Giles-Corti, B.; Broomhall, M.H.; Knuiman, M.; Collins, C.; Douglas, K.; Ng, K.; Lange, A.; Donovan, R.J. Increasing Walking. Am. J. Prev. Med. 2005 , 28 , 169–176. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Whittem, V.; Roetzel, A.; Sadick, A.-M.; Nakai Kidd, A. How Comprehensive Is Post-Occupancy Feedback on School Buildings for Architects? A Conceptual Review Based upon Integral Sustainable Design Principles. Build. Environ. 2022 , 218 , 109109. [ Google Scholar ] [ CrossRef ]
  • Young, D.R.; Hong, B.D.; Lo, T.; Inzhakova, G.; Cohen, D.A.; Sidell, M.A. The Longitudinal Associations of Physical Activity, Time Spent Outdoors in Nature and Symptoms of Depression and Anxiety during COVID-19 Quarantine and Social Distancing in the United States. Prev. Med. 2022 , 154 , 106863. [ Google Scholar ] [ CrossRef ]
  • Carpentier-Postel, S.; Gerber, P.; Guyon, E.; Klein, O. Changes in Residential Satisfaction after Relocation: The Effects of Commuting. A Case Study of Luxembourg Cross-Border Workers. Case Stud. Transp. Policy 2022 , 10 , 1754–1766. [ Google Scholar ] [ CrossRef ]
  • Sun, B.; Liu, J.; Yin, C.; Cao, J. Residential and Workplace Neighborhood Environments and Life Satisfaction: Exploring Chain-Mediation Effects of Activity and Place Satisfaction. J. Transp. Geogr. 2022 , 104 , 103435. [ Google Scholar ] [ CrossRef ]
  • Chan, E.T.H.; Li, T.E. The Effects of Neighbourhood Attachment and Built Environment on Walking and Life Satisfaction: A Case Study of Shenzhen. Cities 2022 , 130 , 103940. [ Google Scholar ] [ CrossRef ]
  • Nolen, J. New Towns for Old: Achievements in Civic Improvement in Some American Small Towns and Neighborhoods ; University of Massachusetts Press: Amherst, MA, USA, 2005; ISBN 978-1-55849-480-0. [ Google Scholar ]
  • Wang, P.; Han, L.; Hao, R.; Mei, R. Understanding the Relationship between Small Urban Parks and Mental Health: A Case Study in Shanghai, China. Urban For. Urban Green. 2022 , 78 , 127784. [ Google Scholar ] [ CrossRef ]
  • Park, J.Y.; Ouf, M.M.; Gunay, B.; Peng, Y.; O’Brien, W.; Kjærgaard, M.B.; Nagy, Z. A Critical Review of Field Implementations of Occupant-Centric Building Controls. Build. Environ. 2019 , 165 , 106351. [ Google Scholar ] [ CrossRef ]
  • Li, H.; Ta, N.; Yu, B.; Wu, J. Are the Accessibility and Facility Environment of Parks Associated with Mental Health? A Comparative Analysis Based on Residential Areas and Workplaces. Landsc. Urban Plan. 2023 , 237 , 104807. [ Google Scholar ] [ CrossRef ]
  • Sheikh Khan, D.; Kolarik, J.; Weitzmann, P. Design and Application of Occupant Voting Systems for Collecting Occupant Feedback on Indoor Environmental Quality of Buildings—A Review. Build. Environ. 2020 , 183 , 107192. [ Google Scholar ] [ CrossRef ]
  • Ding, R.; Ujang, N.; bin Hamid, H.; Manan, M.S.A.; Li, R.; Wu, J. Heuristic Urban Transportation Network Design Method, a Multilayer Coevolution Approach. Phys. A Stat. Mech. Its Appl. 2017 , 479 , 71–83. [ Google Scholar ] [ CrossRef ]
  • Harbishettar, V.; Gowda, M.; Tenagi, S.; Chandra, M. Regulation of Long-Term Care Homes for Older Adults in India. Indian J. Psychol. Med. 2021 , 43 , S88–S96. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Salaheldin, M.H. The Development of a Holistic Framework for the Post Occupancy Evaluation of Polyclinics in Saudi Arabia. Master’s Thesis, King Fahd University of Petroleum and Minerals, Dhahran, Saudi Arabia, 2019. [ Google Scholar ]
  • Harrison, M.; Ryan, T.; Gardiner, C.; Jones, A. Psychological and Emotional Needs, Assessment, and Support Post-Stroke: A Multi-Perspective Qualitative Study. Top. Stroke Rehabil. 2017 , 24 , 119–125. [ Google Scholar ] [ CrossRef ]
  • Moeinaddini, M.; Asadi-Shekari, Z.; Aghaabbasi, M.; Saadi, I.; Shah, M.Z.; Cools, M. Applying Non-Parametric Models to Explore Urban Life Satisfaction in European Cities. Cities 2020 , 105 , 102851. [ Google Scholar ] [ CrossRef ]
  • Youssoufi, S.; Houot, H.; Vuidel, G.; Pujol, S.; Mauny, F.; Foltête, J.-C. Combining Visual and Noise Characteristics of a Neighborhood Environment to Model Residential Satisfaction: An Application Using GIS-Based Metrics. Landsc. Urban Plan. 2020 , 204 , 103932. [ Google Scholar ] [ CrossRef ]
  • Fan, L.; Cao, J.; Hu, M.; Yin, C. Exploring the Importance of Neighborhood Characteristics to and Their Nonlinear Effects on Life Satisfaction of Displaced Senior Farmers. Cities 2022 , 124 , 103605. [ Google Scholar ] [ CrossRef ]
  • Mouratidis, K.; Yiannakou, A. What Makes Cities Livable? Determinants of Neighborhood Satisfaction and Neighborhood Happiness in Different Contexts. Land Use Policy 2022 , 112 , 105855. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Rajkumar, E.; Rajan, A.M.; Daniel, M.; Lakshmi, R.; John, R.; George, A.J.; Abraham, J.; Varghese, J. The Psychological Impact of Quarantine Due to COVID-19: A Systematic Review of Risk, Protective Factors and Interventions Using Socio-Ecological Model Framework. Heliyon 2022 , 8 , e09765. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Ma, L.; Ye, R.; Ettema, D.; van Lierop, D. Role of the Neighborhood Environment in Psychological Resilience. Landsc. Urban Plan. 2023 , 235 , 104761. [ Google Scholar ] [ CrossRef ]
  • Moustafa, Y. Design and Neighborhood Sense of Community: An Integrative and Cross-Culturally Valid Theoretical Framework. Archnet-IJAR Int. J. Archit. Res. 2009 , 3 , 1. [ Google Scholar ] [ CrossRef ]
  • Han, J.; Lee, S.; Kwon, Y. Can Social Capital Improve the Quality of Life Satisfaction for Older Adults? Focusing on the 2016 Quality of Life Survey in Gyeonggi Province, Korea. Cities 2022 , 130 , 103853. [ Google Scholar ] [ CrossRef ]
  • Al Mughairi, M.; Beach, T.; Rezgui, Y. Post-Occupancy Evaluation for Enhancing Building Performance and Automation Deployment. J. Build. Eng. 2023 , 77 , 107388. [ Google Scholar ] [ CrossRef ]
  • Tan, T.H.; Lee, W.C. Life Satisfaction and Perceived and Objective Neighborhood Environments in a Green-Accredited Township: Quantile Regression Approach. Cities 2023 , 134 , 104196. [ Google Scholar ] [ CrossRef ]
  • Groshong, L.; Wilhelm Stanis, S.A.; Kaczynski, A.T.; Hipp, J.A. Attitudes About Perceived Park Safety Among Residents in Low-Income and High Minority Kansas City, Missouri, Neighborhoods. Environ. Behav. 2020 , 52 , 639–665. [ Google Scholar ] [ CrossRef ]
  • Tourinho, A.C.C.; Barbosa, S.A.; Göçer, Ö.; Alberto, K.C. Post-Occupancy Evaluation of Outdoor Spaces on the Campus of the Federal University of Juiz de Fora, Brazil. ARCH 2021 , 15 , 617–633. [ Google Scholar ] [ CrossRef ]
  • Xiao, Y.; Piao, Y.; Pan, C.; Lee, D.; Zhao, B. Using Buffer Analysis to Determine Urban Park Cooling Intensity: Five Estimation Methods for Nanjing, China. Sci. Total Environ. 2023 , 868 , 161463. [ Google Scholar ] [ CrossRef ]
  • Abdul Aziz, F.; Ujang, N.; Abu Bakar, N.A.; Bakar, A.; Faziawati, A. Urban High-Rise Public Housing for Squatter Resettlement: Desa Mentari as a Case Study. New Des. Ideas 2022 , 6 , 159–175. [ Google Scholar ]
  • Abdul Aziz, F. The Investigation of the Implications of Squatter Relocations in High-Risk Neighbourhoods in Malaysia, 2012.
  • Raap, S.; Knibbe, M.; Horstman, K. Clean Spaces, Community Building, and Urban Stage: The Coproduction of Health and Parks in Low-Income Neighborhoods. J. Urban Health 2022 , 99 , 680–687. [ Google Scholar ] [ CrossRef ]


Table: General information of respondent profiles (N = 396)

S/N | Category | Profile | Frequency (No.) | Total responses (No.) | Percentage (%) | Cumulative (%)
1 | Gender | Male | 195 | | 49.2 | 49.2
| | Female | 201 | 396 | 50.8 | 100.0
2 | Age group | 18–30 | 85 | | 21.5 | 21.5
| | 31–45 | 183 | | 46.2 | 67.7
| | 46–55 | 91 | | 23.0 | 90.7
| | 56–65 | 31 | | 7.8 | 98.5
| | >65 | 6 | 396 | 1.5 | 100.0
3 | Educational level | Junior high school or under | 27 | | 6.8 | 6.8
| | Senior high school | 123 | | 31.1 | 37.9
| | College | 211 | | 53.3 | 91.2
| | Postgraduate and above | 35 | 396 | 8.8 | 100.0
4 | Marital status | Single | 62 | | 15.7 | 15.7
| | Married | 303 | | 76.5 | 92.2
| | Divorced | 27 | | 6.8 | 99.0
| | Widowed | 4 | 396 | 1.0 | 100.0
5 | Occupation/nature of employment | Students | 91 | | 23.0 | 23.0
| | Corporate sector | 201 | | 50.8 | 73.7
| | Public sector | 28 | | 7.1 | 80.8
| | Self-employed | 26 | | 6.6 | 87.4
| | Unemployed | 13 | | 3.3 | 90.7
| | Pensioner | 37 | 396 | 9.3 | 100.0
6 | Household registration | Wuxi | 326 | | 82.3 | 82.3
| | Out of town | 70 | 396 | 17.7 | 100.0
7 | Household income (yuan/month/person) | <2000 | 26 | | 6.6 | 6.6
| | 2000–4000 | 82 | | 20.7 | 27.3
| | 4000–6000 | 131 | | 33.1 | 60.4
| | 6000–8000 | 84 | | 21.2 | 81.6
| | >8000 | 73 | 396 | 18.4 | 100.0
8 | Duration of residency | Less than 2 years | 23 | | 5.8 | 5.8
| | 2–5 years | 70 | | 17.7 | 23.5
| | Up to 10 years | 121 | | 30.6 | 54.0
| | Up to 15 years | 81 | | 20.5 | 74.5
| | More than 15 years | 101 | 396 | 25.5 | 100.0
9 | Resident population (per household)/family size | 1–2 people | 76 | | 19.2 | 19.2
| | 3–4 people | 203 | | 51.3 | 70.5
| | 5–6 people | 92 | | 23.2 | 93.7
| | ≥7 people | 25 | 396 | 6.3 | 100.0
10 | Nature of housing | Private house | 315 | | 79.5 | 79.5
| | Rented house | 59 | | 14.9 | 94.4
| | Public house | 22 | 396 | 5.6 | 100.0
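The percentage and cumulative columns in the profile table follow directly from the frequencies over the 396 responses. A minimal Python sketch of that computation (illustrative only; this is not the authors' code, and the helper name is hypothetical):

```python
# Hypothetical helper: rebuild the percentage and cumulative-percentage
# columns of the respondent-profile table from raw frequencies.
def profile_stats(frequencies, total=396):
    """Return (percentage, cumulative) lists, each rounded to 1 decimal."""
    pcts, cums, running = [], [], 0.0
    for f in frequencies:
        p = f / total * 100
        running += p
        pcts.append(round(p, 1))
        cums.append(round(running, 1))
    return pcts, cums

# Gender row of the table: 195 male, 201 female respondents.
pcts, cums = profile_stats([195, 201])
```

Running this on the gender row reproduces the table's 49.2%/50.8% split and the 100.0% cumulative total.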
Table: PCA of modified outdoor spaces (factor loadings, eigenvalues, and percentage of variance)

Factor / item | Factor loading | Eigenvalue | Percentage variance
1. Outdoor recreation | | 7.476 | 12.461
  Creating space for playing by children | 0.769
  Creating space for children's recreational facilities | 0.739
  Creating space for playing by adults | 0.719
  Creating space for outdoor resting | 0.702
  Provision of outdoor seating | 0.701
  Creating space for fitness facilities | 0.695
  Creating space for strolling | 0.665
  Creating space for chess | 0.652
  Creating space for jogging | 0.647
2. Transport facilities | | 4.921 | 8.202
  Creating space for non-motorized charging facilities | 0.748
  Creating space for motor vehicles | 0.739
  Creating space for parking for non-motorized vehicles | 0.720
  Optimizing pavements | 0.711
  Creating space for motor vehicle charging facilities | 0.702
  Repair of pavement drainage spaces | 0.691
  Creating space for the non-motorized shed | 0.688
  Optimizing traffic organization in the neighborhood | 0.683
  Laying of asphalt pavement | 0.653
3. Small park | | 4.921 | 8.202
  Replacement of other hardscapes | 0.750
  Provision of pavilion | 0.735
  Provision of recreational seating | 0.726
  Creating space for softscape | 0.704
  Creating space for a garden path | 0.682
4. Public service facilities | | 4.739 | 7.898
  Public transportation accessibility | 0.766
  Accessibility to educational facilities | 0.753
  Availability of community centers | 0.739
  Accessibility to commercial facilities | 0.733
  Availability of medical stations | 0.715
5. Natural environment condition | | 4.378 | 7.297
  Social environment (public security, organization) | 0.699
  Ecological environment (ecology, pollution, taboos) | 0.676
  Greening and landscape environment | 0.670
  Optimizing planning layout | 0.634
  Quiet neighborhood | 0.629
6. Social and human environment | | 4.125 | 6.875
  Neighborhood | 0.714
  Level of public participation | 0.697
  Settlement recognition | 0.687
  Continuity of historical and cultural values | 0.674
  Organization of residential activities | 0.632
7. Outdoor security | | 3.363 | 5.605
  Creating space for fire protection gadget | 0.707
  Clearing fire exit and entrance | 0.696
  Clearing firefighting landing | 0.685
  Widening the road to meet the requirements of the fire access lane | 0.682
8. Outdoor lighting | | 2.658 | 4.431
  Repairing the unit headlights | 0.700
  Creating space for street lamps | 0.675
  Creating space for courtyard lights | 0.662
9. Entrance structures | | 2.505 | 4.175
  Repairing the main entrance gate | 0.675
  Repairing the sub-entrance gate | 0.670
  Creating space for gate guard post | 0.631
10. Infrastructure | | 2.311 | 3.852
  Repairing the neighborhood wall | 0.673
  Creating space for a ramp for physically challenged people | 0.647
  Creating space for drying | 0.632
11. Public environment | | 1.959 | 3.264
  Environmental health (road, open space cleanliness) | 0.635
  Residential exterior styling and color | 0.628
  Availability of public square space | 0.566
12. Outdoor waste facilities | | 1.738 | 2.897
  Creating space for garbage bin cleaning site | 0.611
  Creating space for garbage bins | 0.586
  Creating space for garbage collection and disposal/garbage collection station | 0.559
Cumulative variance (total) | | | 79.438%
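The percentage-variance column above is a direct function of each factor's eigenvalue. For a PCA on standardized items, every item contributes unit variance, so percentage variance = eigenvalue / number of items × 100. The item count of 60 below is an assumption inferred from the table itself (7.476 / 60 × 100 ≈ 12.46%, matching the first factor); this sketch is illustrative, not the authors' code:

```python
# Percentage of variance explained per PCA factor, assuming 60
# standardized survey items (inferred from the table, not stated here).
def pct_variance(eigenvalues, n_items=60):
    return [round(ev / n_items * 100, 3) for ev in eigenvalues]

# Eigenvalues of factors 1, 2, 4, and 5 from the table.
shares = pct_variance([7.476, 4.921, 4.739, 4.378])
```

The computed shares match the table's percentage-variance column to rounding.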
Table: Mean and standard deviation of items by factor

Factor / item | Mean | SD
1. Outdoor recreation
  Creating space for playing by children | 3.64 | 1.24
  Creating space for children's recreational facilities | 3.64 | 1.24
  Creating space for playing by adults | 3.62 | 1.23
  Creating space for outdoor resting | 3.66 | 1.23
  Provision of outdoor seating | 3.71 | 1.23
  Creating space for fitness facilities | 3.61 | 1.25
  Creating space for strolling | 3.69 | 1.22
  Creating space for chess | 3.56 | 1.24
  Creating space for jogging | 3.75 | 1.22
2. Transport facilities
  Creating space for non-motorized charging facilities | 3.68 | 1.21
  Creating space for motor vehicles | 3.66 | 1.25
  Creating space for parking for non-motorized vehicles | 3.67 | 1.21
  Optimizing pavements | 3.71 | 1.22
  Creating space for motor vehicle charging facilities | 3.58 | 1.24
  Repair of pavement drainage spaces | 3.68 | 1.20
  Creating space for the non-motorized shed | 3.52 | 1.31
  Optimizing traffic organization in the neighborhood | 3.65 | 1.20
  Laying of asphalt pavement | 3.76 | 1.22
3. Small park
  Replacement of other hardscapes | 3.73 | 1.20
  Provision of pavilion | 3.59 | 1.21
  Provision of recreational seating | 3.61 | 1.26
  Creating space for softscape | 3.52 | 1.26
  Creating space for a garden path | 3.62 | 1.23
4. Public service facilities
  Public transportation accessibility | 3.73 | 1.18
  Accessibility to educational facilities | 3.73 | 1.21
  Availability of community centers | 3.72 | 1.20
  Accessibility to commercial facilities | 3.73 | 1.17
  Availability of medical stations | 3.73 | 1.19
5. Natural environment condition
  Social environment (public security, organization) | 3.64 | 1.22
  Ecological environment (ecology, pollution, taboos) | 3.67 | 1.18
  Greening and landscape environment | 3.65 | 1.21
  Optimizing planning layout | 3.65 | 1.20
  Quiet neighborhood | 3.65 | 1.19
6. Social and human environment
  Neighborhood | 3.71 | 1.21
  Level of public participation | 3.62 | 1.21
  Settlement recognition | 3.62 | 1.23
  Continuity of historical and cultural values | 3.58 | 1.22
  Organization of residential activities | 3.57 | 1.22
7. Outdoor security
  Creating space for fire protection gadget | 3.70 | 1.22
  Clearing fire exit and entrance | 3.71 | 1.20
  Clearing firefighting landing | 3.71 | 1.20
  Widening the road to meet the requirements of the fire access lane | 3.68 | 1.21
8. Outdoor lighting
  Repairing the unit headlights | 3.62 | 1.25
  Creating space for street lamps | 3.72 | 1.23
  Creating space for courtyard lights | 3.63 | 1.22
9. Entrance structures
  Repairing the main entrance gate | 3.75 | 1.18
  Repairing the sub-entrance gate | 3.67 | 1.21
  Creating space for gate guard post | 3.69 | 1.20
10. Infrastructure
  Repairing the neighborhood wall | 3.71 | 1.21
  Creating space for a ramp for physically challenged people | 3.70 | 1.23
  Creating space for drying | 3.71 | 1.25
11. Public environment
  Environmental health (road, open space cleanliness) | 3.70 | 1.24
  Residential exterior styling and color | 3.67 | 1.23
  Availability of public square space | 3.62 | 1.26
12. Outdoor waste facilities
  Creating space for garbage bin cleaning site | 3.66 | 1.24
  Creating space for garbage bins | 3.63 | 1.23
  Creating space for garbage collection and disposal/garbage collection station | 3.67 | 1.21
Table: Pearson correlations, sums of squares and cross-products, and covariances across satisfaction dimensions (N = 396 for all pairs; the first row is the reference variable's self-correlation)

Satisfaction dimension | Pearson correlation | Sig. (2-tailed) | Sum of squares and cross-products | Covariance
Outdoor security | 1 | | 468.225 | 1.185
Transport facilities | 0.678 ** | 0.000 | 302.673 | 0.766
Infrastructure | 0.610 ** | 0.000 | 289.672 | 0.733
Public service facilities satisfaction | 0.623 ** | 0.000 | 284.851 | 0.721
Outdoor lighting satisfaction | 0.571 ** | 0.000 | 277.290 | 0.702
Outdoor waste facilities satisfaction | 0.635 ** | 0.000 | 305.773 | 0.774
Entrance structures satisfaction | 0.642 ** | 0.000 | 300.650 | 0.761
Outdoor recreations satisfaction | 0.681 ** | 0.000 | 317.976 | 0.805
Greenery satisfaction | 0.606 ** | 0.000 | 302.269 | 0.765
Small park satisfaction | 0.619 ** | 0.000 | 293.593 | 0.743
Natural environment condition satisfaction | 0.640 ** | 0.000 | 292.379 | 0.740
Public environment satisfaction | 0.636 ** | 0.000 | 313.364 | 0.793
Social and human environment satisfaction | 0.657 ** | 0.000 | 308.440 | 0.781
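The rows of the correlation table are linked by standard identities: Pearson r = Sxy / sqrt(Sxx · Syy), and covariance = Sxy / (N − 1), where Sxy is the sum of cross-products. A small pure-Python check of these relationships on toy data (illustrative only; the survey responses themselves are not reproduced here):

```python
import math

# Compute the three related quantities reported in the table for two
# lists of scores: Pearson r, covariance, and sum of cross-products.
def pearson_stats(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # cross-products
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return {
        "r": sxy / math.sqrt(sxx * syy),
        "covariance": sxy / (n - 1),     # sample covariance, divisor N-1
        "cross_products": sxy,
    }

# Toy 5-point-scale responses from five hypothetical respondents.
stats = pearson_stats([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
```

The same identity explains the table's first data row: a variable's cross-products with itself divided by N − 1 give its variance (468.225 / 395 ≈ 1.185).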

Share and Cite

Zhao, J.; Abdul Aziz, F.; Cheng, Z.; Ujang, N.; Zhang, H.; Xu, J.; Xiao, Y.; Shi, L. Post-Occupancy Evaluation of the Improved Old Residential Neighborhood Satisfaction Using Principal Component Analysis: The Case of Wuxi, China. ISPRS Int. J. Geo-Inf. 2024, 13, 318. https://doi.org/10.3390/ijgi13090318


Supplementary Material

ZIP-Document (ZIP, 3117 KiB)


  • Open access
  • Published: 04 September 2024

In-depth analysis of Bt cotton adoption: farmers' opinions, genetic landscape, and varied perspectives—a case study from Pakistan

  • Shahzad Rahil   ORCID: orcid.org/0000-0002-4111-5037 1 ,
  • Jamil Shakra 1 ,
  • Chaudhry Urooj Fatima 1 ,
  • Rahman Sajid Ur 1 &
  • Iqbal Muhammad Zaffar 1 , 2  

Journal of Cotton Research, volume 7, Article number: 31 (2024)


Although Bt technology played a significant role in controlling bollworms and increasing cotton yield in the early days of its introduction, a subsequent decline in yield became apparent over time. This decline may be attributed to various environmental factors, pest dynamics, or a combination of both. Therefore, the present biophysical survey and questionnaire were designed to evaluate the impact of Bt cotton on bollworm management and its effect on reducing spray costs, targeting farmers with varied landholdings and educational backgrounds. Additionally, data on the varieties farmers cultivated and the prevalence of bollworms and sucking insects in their fields were recorded. Subsequently, about eleven thousand cotton samples from farmers' fields were tested for the Cry1Ac, Cry2Ab and Vip3A genes by strip test.

In this analysis, 83% of the farmers planting approved varieties believe that Bt technology controls bollworms, while 17% hold contradictory views. Similarly, among farmers cultivating unapproved varieties, 77% agree on the effectiveness of Bt technology against bollworms, while 23% disagree. On the other hand, 67% of farmers planting approved varieties believe that Bt technology does not reduce spray costs, while 33% affirm its effectiveness. Similarly, 78% of farmers cultivating unapproved varieties express doubt about its role in reducing spray costs, while 22% favour this notion. The differences in opinion on the effectiveness of Bt cotton in controlling bollworms and reducing spray costs between farmers planting unapproved and approved varieties may stem from several factors. One major cause is the heavy infestation of sucking insects, which is probably due to the narrow genetic variation of the cultivated varieties. The widespread cultivation of unapproved varieties (21.67%) is another important factor behind the differing opinions on the effectiveness of Bt cotton.

Based on our findings, we propose that the ineffective control of pests on the cotton crop may be attributed to the large-scale cultivation of unapproved varieties and the non-inclusion of double- and triple-transgene technologies in the country's sowing plan. We therefore suggest that cotton breeders, regulatory bodies and legislative bodies discourage the cultivation of unapproved varieties and impure seed. Moreover, the adoption of double and triple Bt genes in cotton with broad genetic variation could facilitate the revival of the cotton industry, presenting a promising way forward.

Cotton (Gossypium hirsutum L.) is an important fibre crop, also known as "White Gold" (Ali et al. 2020; Jarwar et al. 2019). Pakistan earns a major part of its foreign exchange from the cotton crop, which contributes significantly to the economy. Pakistan is the 5th largest cotton-producing and 3rd largest cotton-consuming country in the world. Cotton is important for both agriculture and the textile industry, contributing about 0.6% of GDP and 3.1% of value addition in the agriculture sector (Ministry of Finance, Government of Pakistan 2023). Over time, cotton production in Pakistan has declined due to seed adulteration, ineffective use of fertilizers and pesticides, labour mismanagement, unfavourable weather conditions, and irregular input supplies (Ali et al. 2019).

Since the introduction of synthetic insecticides, cotton producers have relied heavily on these products to control insect pests. Certain factors, i.e., insect resistance, secondary pest outbreaks, and pest resurgence, caused an increasing application of synthetic insecticides (Trapero et al. 2016). The bollworms (Heliothis and Helicoverpa spp.) and sucking insects (Bemisia tabaci, Empoasca spp.) developed resistance to traditional pesticides during the 1990s (Spielman et al. 2017). Afterwards, genetically modified (GM) cotton expressing Bacillus thuringiensis (Bt) toxin was introduced to control lepidopteran pests (Jamil et al. 2021a, b). As a result, bollworms that had developed resistance against insecticides were effectively controlled and pesticide use was significantly reduced (Ahmad et al. 2019).

The first official approval for general cultivation of Bt cotton in Pakistan was granted in 2010 by the National Biosafety Committee within the Pakistan Environmental Protection Agency. However, substantial evidence shows cultivation of Bt cotton in farmers' fields prior to its official approval (Ahmad et al. 2021; Almas et al. 2023; Razzaq et al. 2021); these varieties are Cry1Ac (first-generation cry gene) based and are primarily resistant to lepidopteran pests. In the early days of its introduction, the adoption of Bt technology led to a notable surge in cotton production, from 8.7 million bales in 1999 to 14.61 million bales in 2004–2005, within just five to six years (Rehman et al. 2019). Initially, both approved and unapproved Bt varieties showed inconsistent and potentially ineffective transgene expression due to the ineffective regulatory system overseeing the commercialization of transgenic varieties, the release of new varieties, and the distribution of seed of approved varieties (Ahmad et al. 2019). These loopholes in the system, combined with the difficulty farmers face in visually assessing a variety's genuineness and seed quality at purchase, have contributed to the proliferation of spurious or low-quality seed (Ali et al. 2019; Spielman et al. 2017).

Now, the area under Bt cotton cultivation is shrinking and yields have decreased due to increased insect-pest infestations (Arshad et al. 2021) owing to field-evolved resistance in insects (Jaleel et al. 2020; Lei et al. 2021). Technologically advanced countries like the USA have addressed insect resistance development by adopting non-Bt cotton refuge systems and pyramiding multiple toxin genes (Cry1Ac, Cry2Ab, and Vip3A). However, in developing countries like China, India, and Pakistan, similar strategies were not effectively implemented, allowing field-evolved resistance in bollworms to proliferate (Jamil et al. 2021a, b; Karthik et al. 2021). Another issue faced by farmers planting Bt cotton is increased infestation of sucking pests due to reduced use of pesticides (Ali et al. 2019; Shekhawat and Hasmi 2023). Hence, it is believed that the interplay of various factors, i.e., increased insect-pest infestation, field-evolved resistance, cultivation of unapproved and substandard seed, and adverse weather conditions, caused the huge loss of cotton production, from 14 million bales in 2004–2005 to 4.91 million bales in 2023 (Ministry of Finance, Government of Pakistan 2023).

In view of the facts above, a survey was designed to evaluate the impact of Bt technology on cotton production across fifteen core cotton-growing districts of Punjab, Pakistan, to understand the multifaceted factors affecting cotton production, and to find the root causes of declining cotton production. In total, 400 farmers with various landholdings and educational backgrounds were surveyed to document their views on Bt cotton's efficacy against bollworms and its effect on spray costs. Additionally, 10,986 cotton samples were tested in farmers' fields through strip tests to assess the purity of cotton varieties with respect to the Bt (Cry1Ac, Cry2Ab and Vip3A) genes.

The present study was conducted at the Agricultural Biotechnology Research Institute, Ayub Agricultural Research Institute, Faisalabad 38000, Punjab, Pakistan.

Survey site

The survey was carried out in the core cotton-growing area of Pakistan, Punjab province. Punjab is divided into 36 administrative units called "districts" that vary significantly in cotton production. Of the 36 districts, fifteen were selected, i.e. Faisalabad, Toba Tek Singh, Sahiwal, Pakpattan, Multan, Lodhran, Khanewal, Vehari, Muzaffargarh, Layyah, D.G. Khan, Rajanpur, Bahawalpur, R.Y. Khan and Bahawalnagar, on the basis of acreage under cotton cultivation as outlined in AMIS.PK ( http://www.amis.pk/Agristatistics/DistrictWise/DistrictWiseData.aspx ). Subsequently, 400 farmer fields were selected from all "Tehsils" (sub-administrative units), covering various landholdings and diverse educational backgrounds, particularly in regions with intensive Bt cotton cultivation. The GPS coordinates of each farmer's location were recorded using the Latitude-Longitude app (Financept) and are listed in Table 1.

Survey questionnaire

A structured questionnaire comprising six questions was designed to collect data on farmers' demographic factors, landholdings, and viewpoints about the effectiveness of Bt technology in controlling cotton bollworms. The questions covered: (1) the farmer's landholding, classified as small (0–10 acres), medium (11–50 acres) or large (above 50 acres); (2) the farmer's educational background, stratified into uneducated, below matric, matric, bachelor's degree, and master's or above qualifications; (3) the efficacy of Bt cotton in controlling bollworms (yes/no); (4) the role of Bt technology in reducing the frequency of pesticide sprays and the respective pesticide cost to farmers (yes/no); (5) the variety cultivated by the farmer (Table S1); and (6) infestations of insects, i.e. jassid, whitefly, aphid, thrips, mites, American bollworm (AB) and pink bollworm (PB) (low, medium or high). Infestation levels were based on the economic threshold level (ETL) of each insect species: infestations below the ETL were classified as "low", those comparable to the ETL as "medium", and those exceeding the ETL as "high". The reference ETLs for the insect species were as follows: jassid (1 nymph or adult per leaf), whitefly (5 adults per leaf), thrips (8–10 adults per leaf), mites (2 adults per leaf), aphid (20 aphids per leaf), AB (4–5 eggs and larvae per 100 plants), and PB (8% infested bolls) (Ali et al. 2019; Razaq et al. 2019; Rehman et al. 2019).
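The infestation-level rule above can be sketched as a small classifier. Two caveats: the ±10% "comparable to the ETL" band is an assumption (the text does not quantify "comparable"), the single thrips value of 9 is the midpoint of the stated 8–10 range, and AB/PB are omitted because their ETLs use different units (per 100 plants and % infested bolls):

```python
# Per-leaf ETLs from the text; thrips uses 9, the midpoint of "8-10".
ETL = {"jassid": 1, "whitefly": 5, "thrips": 9, "mites": 2, "aphid": 20}

def infestation_level(insect, count_per_leaf, tolerance=0.1):
    """Classify a per-leaf count as low/medium/high relative to the ETL.

    The tolerance band defining "comparable to the ETL" is assumed.
    """
    etl = ETL[insect]
    if count_per_leaf < etl * (1 - tolerance):
        return "low"
    if count_per_leaf <= etl * (1 + tolerance):
        return "medium"
    return "high"
```

For example, 2 whiteflies per leaf classifies as "low" and 30 aphids per leaf as "high" under these thresholds.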

Molecular analysis of Cry1Ac , Cry2Ab and Vip3A genes

Molecular analysis was performed through strip tests for the detection and identification of transgenes at the four hundred farmer fields; a total of 10,986 samples were tested. At each field a minimum of 25 samples was collected and tested, with at least 10 samples per variety; consequently, depending on the number of varieties cultivated, more than 25 samples were tested in some fields. The strip tests were performed using QuickStix combo kits (EnviroLogix), which have built-in antibody coatings for the detection of the Cry1Ac, Cry2Ab, and Vip3A transgenes. The procedure involved pressing the cap of a disposable Eppendorf tube onto two leaves to obtain a double leaf disc (weighing approximately 20 mg). The leaf sample was then finely ground with a disposable pestle by rubbing it against the walls of the Eppendorf tube after adding 0.5 mL of 1X EB2 extraction buffer, and the leaf extract and extraction buffer were homogenized by thorough mixing to ensure the components were evenly combined for accurate and reliable downstream analysis. The QuickStix combo strips were then dipped into the Eppendorf tube containing the leaf extract with the arrow pointing downward. After a 10-min incubation, bands developed on the strips through the antigen-antibody reaction; the strips were analysed for the presence of final bands and the results were recorded (Jamil et al. 2021a, b).

Data analysis

Frequency analysis of Cry1Ac, Cry2Ab, and Vip3A genes was performed using the "dplyr" package to streamline data manipulation and summarization. District-wise opinions of farmers on bollworm management and spray cost reduction were analysed using "tidyverse" functions. The association between Bt technology adoption, farmers' landholding, and education was studied through a heatmap drawn with the "heatmap.2" function of the "gplots" package in R. The data on varieties cultivated by farmers in each district were visualized as a stacked bar chart using the "ggplot2" package. Lastly, insect pest infestation data were analysed using the "dplyr" and "ggplot2" packages (Ross et al. 2017), and Chi-square (χ2) tests were performed to check associations between qualitative variables using the "chisq.test" function in R.
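The analyses were run in R with the packages named above. As a language-agnostic illustration of the district-wise frequency summarization step, a minimal Python sketch (with invented strip-test records standing in for the survey data) might look like:

```python
from collections import defaultdict

# Hypothetical field records as (district, gene, detected) triples; the
# real strip-test results are reported in the paper's tables.
records = [
    ("Lodhran", "Cry1Ac", True),
    ("Lodhran", "Cry2Ab", False),
    ("Multan",  "Cry1Ac", True),
    ("Multan",  "Cry1Ac", False),
    ("Multan",  "Vip3A",  False),
]

# District-wise detection frequency per gene, analogous to a dplyr
# group_by() + summarise() pipeline.
totals = defaultdict(int)
hits = defaultdict(int)
for district, gene, detected in records:
    totals[(district, gene)] += 1
    hits[(district, gene)] += detected

freq = {k: hits[k] / totals[k] for k in totals}
print(freq[("Multan", "Cry1Ac")])  # 0.5
```

The same grouped-frequency structure underlies the district-wise gene occurrence percentages reported in Table 3.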

Survey design and farmers' demographics

A purposive sampling technique was used to assess the viewpoints of farmers with diverse landholdings and educational backgrounds. Landholdings varied among districts, showing a distinct distribution of farmers with small, medium, and large landholdings (Table 1). Notably, the highest proportion of large landholders was found in Sahiwal (43%), followed by Faisalabad (33%), Dera Ghazi Khan (31%), and Rajanpur (29%). Among medium landholders, Rahim Yar Khan district had the highest (74%) and Layyah the lowest (35%) proportion. Among small landholders, Layyah district displayed the highest (50%) and Sahiwal the lowest (7%) ratio. Overall, 60% of the farmers had medium, 18% small, and 22% large landholdings (Table 1).

Similarly, variability was observed among farmers on the basis of academic background (Table 2). The majority of farmers (53%) had completed matric, 22% were below matric, 12% held a bachelor's degree, 7% held a master's degree or above, and merely 6% were uneducated. Sahiwal district had the highest ratio of uneducated farmers (22%), while the highest proportion of farmers with below-matric qualification was observed in Dera Ghazi Khan (39%). Apart from Dera Ghazi Khan, all other analysed districts had a higher proportion of farmers with matric qualification; specifically, Toba Tek Singh exhibited the highest proportion (100%), followed by Pakpattan (70%). Furthermore, Bahawalpur and Layyah districts exhibited the highest proportions of bachelor's degree holders (25%) and of farmers with a master's degree or above (30%), respectively (Table 2).

Genetic landscape and cultivation patterns of varieties

The varieties planted at farmer fields were noted and verified based on tags issued by the Federal Seed Certification and Registration Department (FSC&RD). The varieties planted at farmer fields were then compared with the government database of approved varieties to identify each variety as approved or unapproved. Overall, unapproved varieties were cultivated extensively, covering a significant share of the area (21.67%). Among approved varieties, the most widely cultivated were IUB-13 (15.22%), BS-15 (12.61%), FH-142 (8.26%), and FH-Lalazar (8.04%); the least cultivated was MNH-886 (3.45%). Other approved varieties covered 7.27% of the area in total. The three districts with the largest area under unapproved cotton varieties were Bahawalnagar (40.73%), Layyah (38.24%), and Bahawalpur (32.40%). Conversely, no unapproved varieties were found in Pakpattan or Toba Tek Singh (Fig. 1).
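The approval check described above is essentially a set-membership lookup against the official register. A minimal sketch follows; the approved list is abbreviated to varieties named in the text, and "Hypothetical-1" is an invented variety name used only to exercise the unapproved branch.

```python
# Abbreviated example of the verification step: varieties read from FSC&RD
# tags are checked against the government's approved-variety register.
# The sets below are illustrative, not the full official registers.
approved = {"IUB-13", "BS-15", "FH-142", "FH-Lalazar", "MNH-886"}

field_varieties = ["IUB-13", "Hypothetical-1", "FH-142"]

status = {v: ("approved" if v in approved else "unapproved")
          for v in field_varieties}
print(status["Hypothetical-1"])  # unapproved
```

Aggregating the area under each `status` value yields the approved/unapproved shares reported above.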

Figure 1

Stacked bar-chart showcasing varietal diversity across fifteen cotton-growing districts of the cotton belt of Punjab, Pakistan. BWN: Bahawalnagar; BWP: Bahawalpur; DGK: Dera Ghazi Khan; FSD: Faisalabad; KWL: Khanewal; LDN: Lodhran; LYA: Layyah; MTN: Multan; MZG: Muzaffargarh; PKPTN: Pakpattan; RJNPR: Rajanpur; RYK: Rahim Yar Khan; SWL: Sahiwal; TTS: Toba Tek Singh; VHR: Vehari

Analysing region-specific cultivation of varieties, IUB-13 was the most cultivated variety in Bahawalnagar (12.09%), Bahawalpur (15.10%), Khanewal (19.18%), Multan (30.41%), Muzaffargarh (21.74%), Faisalabad (24.33%), Rahim Yar Khan (21.10%), and Rajanpur (18.49%). FH-142 was the preferred variety in Dera Ghazi Khan (22.51%) and Layyah (10.66%). FH-Lalazar was most commonly cultivated in Lodhran district (25.64%), while BS-18 dominated in Vehari (21.59%). Additionally, BS-15 was prominently cultivated in Toba Tek Singh (51%), Sahiwal (26.00%), and Pakpattan district (25.60%). Toba Tek Singh and Pakpattan districts had the least diversity of cultivated varieties (Fig. 1).

Biochemical testing of Bt cotton

To understand the genetic landscape of cultivated varieties with respect to transgenes, strip tests were performed for the detection and identification of Cry1Ac, Cry2Ab, and Vip3A genes. Across fifteen districts, a total of 10,986 cotton samples were tested. The Cry1Ac gene was present in varying degrees, with the highest occurrence (100%) in Lodhran, Sahiwal, Pakpattan, and Toba Tek Singh districts. Other districts, such as Khanewal, Bahawalpur, Bahawalnagar, Faisalabad, Layyah, Multan, Rajanpur, Rahim Yar Khan, and Vehari, also showed the Cry1Ac gene in more than 80% of farmer fields. In contrast, Dera Ghazi Khan and Muzaffargarh districts displayed relatively lower percentages of the Cry1Ac gene, 69% and 78%, respectively (Table 3).

The Cry2Ab gene exhibited a relatively low overall frequency (9%) throughout the survey area, ranging from 0% in Pakpattan to 15% in Layyah and Toba Tek Singh districts. The frequency of the Cry2Ab gene was no more than 10% in Bahawalnagar, Bahawalpur, Faisalabad, Khanewal, Lodhran, Muzaffargarh, Multan, Pakpattan, Rajanpur, Rahim Yar Khan, and Sahiwal districts. Further, the third Bt gene, Vip3A, which confers broad-spectrum resistance against lepidopteran pests, was not found in a single tested sample throughout the survey area. In summary, the Cry2Ab gene was found throughout the cotton cultivation regions, except Pakpattan, but at a much lower percentage than the Cry1Ac gene (Table 3).

Pest dynamics at farmers’ field

Pest counts were performed in the survey area for major cotton pests, i.e., AB, PB, whitefly, aphid, jassid, thrips, and mites. PB infestation was at a medium level in more than 50% of farmers' fields in most districts, except Bahawalnagar, Dera Ghazi Khan, Khanewal, Pakpattan, and Vehari. Lodhran and Toba Tek Singh recorded low PB levels in 50% of fields, whereas Pakpattan and Vehari recorded high PB invasions in more than 50% of fields. In the case of AB, Lodhran, Muzaffargarh, Pakpattan, and Toba Tek Singh exhibited low AB levels in all fields. However, in Bahawalpur and Layyah, 14% and 20% of fields, respectively, experienced medium AB outbreaks. Notably, in Faisalabad, Layyah, and Sahiwal districts, high AB infestation was observed in 12%, 10%, and 7% of fields, respectively. On average, 93% of fields across all survey regions recorded low AB outbreaks (Table 4).

The whitefly remained the predominant insect throughout the survey area, with high outbreaks in 68% of fields on average. Five districts, including Dera Ghazi Khan, Faisalabad, Muzaffargarh, Rajanpur, and Toba Tek Singh, were whitefly hotspots, with all surveyed fields recording high outbreaks. Other districts, such as Bahawalpur, Khanewal, Multan, Layyah, Rahim Yar Khan, and Sahiwal, exhibited diverse infestation patterns. Although the aphid is a pest of major concern, 77% of farmer fields reported low outbreaks, particularly in Faisalabad, Sahiwal, Pakpattan, and Toba Tek Singh districts, where all observed fields recorded low infestation. On the other hand, 64% of fields in Bahawalpur and 44% in Muzaffargarh recorded medium-level outbreaks, and 28% of fields in Rahim Yar Khan district recorded high-level outbreaks (Table 4).

Apart from whitefly and aphid, jassid was another alarming threat to cotton production, showing high invasion levels in 62% of farmer fields. Jassid outbreaks were high in all fields of Faisalabad, Muzaffargarh, Pakpattan, Rajanpur, Sahiwal, and Toba Tek Singh districts, and in more than 70% of fields of Bahawalnagar, Dera Ghazi Khan, and Vehari districts. Mite counts revealed low infestation in 62% of observed fields. Faisalabad and Pakpattan districts had low infestations in all fields, whereas Dera Ghazi Khan and Muzaffargarh districts recorded medium outbreaks in 73% and 86% of fields, respectively, and 50% of Toba Tek Singh fields recorded high mite outbreaks. Thrips outbreaks were high in 60% of farmer fields on average; Bahawalpur, Khanewal, Lodhran, Muzaffargarh, Rajanpur, Toba Tek Singh, and Vehari recorded high outbreaks in 78%, 78%, 75%, 74%, 89%, 100%, and 69% of fields, respectively, while all fields in Faisalabad showed medium thrips outbreaks (Table 4).

Chi-square test on the associations between different factors

The Chi-square (χ2) test was performed to check the association of 17 pairs of factors, as detailed in Table 5. The association of transgene with varieties was non-significant, meaning that the type of Bt cotton (single or double transgenic cultivars) showed no distinct correlation with approved or unapproved status. Moreover, farmers' education and landholding had no impact on transgene adoption. The association of transgene with thrips was also non-significant, indicating that thrips affect non-Bt, single-gene, and double-gene Bt cotton equally. However, the associations of transgene with AB, PB, whitefly, aphid, jassid, and mite infestation were significant, indicating that AB and PB attack varies with transgene and that these factors are interlinked. Similarly, whitefly, aphid, jassid, and mite infestations also varied among non-Bt, single-gene, and double-gene Bt cotton varieties (Table 5, S2).
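The paper's tests used R's `chisq.test`; the underlying computation for a 2×2 contingency table can be worked through in a few lines of Python. The counts below are invented for illustration and are not the survey data.

```python
# Chi-square test of independence on a 2x2 table of invented counts,
# e.g. rows = cotton type (Bt / non-Bt), columns = infestation (low / high).
table = [[30, 10],
         [15, 25]]

row_totals = [sum(r) for r in table]            # [40, 40]
col_totals = [sum(c) for c in zip(*table)]      # [45, 35]
n = sum(row_totals)                             # 80

chi2 = 0.0
for i in range(len(table)):
    for j in range(len(table[0])):
        expected = row_totals[i] * col_totals[j] / n
        chi2 += (table[i][j] - expected) ** 2 / expected

# A 2x2 table has 1 degree of freedom; the 5% critical value is 3.841,
# so this invented table would show a significant association.
print(round(chi2, 3))  # 11.429
```

R's `chisq.test` applies Yates' continuity correction to 2×2 tables by default, so its statistic would differ slightly from this uncorrected Pearson value.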

Likewise, the association of varieties with AB, aphid, and mite was non-significant, revealing no statistically significant difference in AB, aphid, and mite infestation between approved and unapproved varieties. On the contrary, the association of varieties with PB, whitefly, jassid, and thrips was significant, indicating that infestation of these pests varies between approved and unapproved varieties (Table 5).

Farmer’s opinion on Bt cotton

Farmers' viewpoints on the efficiency of Bt technology in controlling bollworms and reducing spray costs in cotton were analysed. Among farmers cultivating approved varieties, 83% believed in Bt cotton's effectiveness against bollworms, while 17% held the contrary belief. However, variation existed among districts: farmers from Bahawalpur, Faisalabad, Rahim Yar Khan, and Sahiwal unanimously agreed (100%) on Bt cotton's effectiveness against bollworms, but 50% of farmers in Bahawalnagar, 33% in Toba Tek Singh, and some in other districts were not convinced. Among farmers cultivating unapproved varieties, 77% had faith in Bt cotton's usefulness for controlling bollworms, whereas 23% expressed disbelief. All farmers cultivating unapproved varieties in Bahawalpur, Faisalabad, Rahim Yar Khan, Sahiwal, and Toba Tek Singh districts unanimously believed that Bt cotton is effective against bollworms. In contrast, 67% of such farmers in Multan, 43% in Dera Ghazi Khan, 37% each in Layyah and Muzaffargarh, 33% each in Lodhran and Rajanpur, 31% in Vehari, 23% in Bahawalnagar, and 8% in Khanewal were not convinced of this claim. Evidently, farmers cultivating approved varieties expressed higher confidence in the effectiveness of bollworm control by Bt technology than those cultivating unapproved varieties (Table 6).

Similarly, examining the impact of Bt cotton on spray cost reduction revealed a complex scenario. Among farmers planting approved varieties, 33% believed that Bt technology had reduced spray costs, while the majority (67%) disagreed. In particular, farmers in Bahawalnagar, Dera Ghazi Khan, Faisalabad, Muzaffargarh, Rajanpur, Toba Tek Singh, and Vehari districts unanimously disagreed that Bt technology reduced spray costs, while all farmers in Bahawalpur, Rahim Yar Khan, and Sahiwal held the opposite view. Likewise, among farmers cultivating unapproved varieties, 22% expressed confidence in spray cost reduction through the introduction of Bt technology, while 78% held the opposite perspective. In Pakpattan and Bahawalpur, 100% of farmers growing unapproved varieties believed in the reduction of spray costs, while in Multan, Faisalabad, Khanewal, Layyah, Lodhran, Muzaffargarh, Rajanpur, Sahiwal, Toba Tek Singh, and Vehari, all such farmers disagreed with this notion. Overall, the analysis highlighted diverse opinions among farmers about the impact of Bt cotton on spray cost reduction (Table 6).

Amid changing agricultural technology and the persistent challenges faced by cotton farmers, our study delves into the dynamics surrounding the adoption and effectiveness of Bt cotton technology. With a focus on bollworm management and spray cost reduction, our research navigates the perceptions and practices of farmers with diverse educational backgrounds and landholdings and reveals the main factors affecting cotton farming. We unravel the complexities underlying farmer beliefs, technological advancements, and regulatory frameworks, aiming to chart a course towards sustainable solutions for the revitalization of the cotton crop.

We approached farmers from all cotton-growing districts of Punjab with diverse backgrounds, i.e., varying landholdings (Table 1) and educational backgrounds (Table 2), to increase the reliability of the results (O'Connell et al. 2022). Farmers were asked about the effectiveness of Bt technology against cotton bollworms and its impact on spray costs. Overall, 60% of the farmers had medium, 22% large, and 18% small landholdings (Table 1). Likewise, from an education perspective, 53% of farmers had matric qualification, 22% were below matric, 12% and 7% held bachelor's and master's (or above) degrees, respectively, and 6% were uneducated (Table 2), representing a mixed population from each stratum of educational background and landholding, so as to obtain meaningful information (Swami and Parthasarathy 2020).

Farmers' opinions were bifurcated into two categories based on the cultivation of approved and unapproved varieties. Among farmers cultivating approved varieties, 83% held that Bt cotton has controlled bollworms effectively, while 17% held the opposite opinion. Among those cultivating unapproved varieties, 77% thought that bollworms have been controlled since the introduction of Bt cotton, and 23% held opposite views (Table 6). These findings agree with a study reporting that both approved and unapproved varieties have Bt toxin protein levels sufficient to control bollworms effectively (Spielman et al. 2017). Given that AB and PB infestations depend on transgenes (Table 5) and have an antagonistic relationship (Table S2), and considering that nearly all cultivated varieties (approved or unapproved) were transgenic (Table S1), the use of these transgenic varieties is likely the primary factor in controlling bollworms (Kashif et al. 2022). Moreover, according to a previous study, unapproved varieties are as effective in controlling bollworms as approved varieties, both expressing transgenes at levels lethal to pests (Cheema et al. 2016). However, Jamil et al. (2021a, b) hold a contradictory viewpoint, arguing that unapproved varieties are a leading cause of resistance because their low Bt toxin levels provide an ideal environment for field-evolved resistance (Ahmad et al. 2019).

In the earlier years of Bt cotton's introduction, farmers were largely convinced of its efficiency in controlling bollworm invasions, as reported in different geographies (Gore et al. 2002; Kranthi et al. 2005) and in Pakistan (Arshad et al. 2009). However, with the passage of time and without the adoption of refuge plantings (planting 10% non-Bt crop as refuge), bollworms have evolved resistance in the field (Shahid et al. 2021). The situation was further aggravated by little or no adoption of double (Cry1Ac and Cry2Ab) and triple transgene (Cry1Ac, Cry2Ab, and Vip3A) technologies (Table 3). Double and triple transgene cotton confers broad-spectrum resistance through different modes of action and corresponding receptor sites in the insect gut (Chen et al. 2017; Llewellyn et al. 2007). In particular, the Vip3A gene provides broad-spectrum resistance by encoding a Bt toxin that disrupts the digestive system upon ingestion, ultimately leading to insect death. Unlike Cry1Ac, the Vip3A gene acts through a different mode of action, making it effective against pests that may have developed resistance to Cry1Ac. This diversity in toxin mechanisms enhances the overall efficacy of Bt cotton in managing pest populations and reducing crop damage (Chen et al. 2017). Some countries swiftly adopted double and triple gene technologies in their cultivation plans, while Pakistan continues to rely solely on the initially introduced single-gene (Cry1Ac) Bt cotton, which has resulted in the development of resistance in the field (Tabashnik et al. 2013; Tabashnik and Carrière 2017).

Analysis of farmers' perspectives on the efficacy of Bt technology in reducing spray costs revealed that more than 50% of farmers in both categories (planting approved or unapproved varieties) believe that spray costs have not been reduced since the introduction of Bt technology. Specifically, 33% of farmers cultivating approved varieties affirmed that Bt technology effectively reduces spray costs, while 67% held a contrary viewpoint. Among farmers planting unapproved varieties, a higher percentage (78%) expressed suspicion regarding the effectiveness of Bt cotton in reducing spray costs, with only 22% supporting this notion (Table 6). Farmers thus hold differing views on the effectiveness of Bt cotton against bollworms and its impact on spray costs: the majority claimed that Bt cotton has successfully controlled bollworms, while also believing that its introduction has not reduced spray costs. This is attributed to increased pressure from sucking insect pests such as whitefly, aphid, jassid, thrips, and mites (Table 4), which has driven spray costs up instead of producing the anticipated reduction. Sucking pest pressure has increased since the introduction of Bt genotypes owing to their low adaptation to local agro-ecological conditions (Lu et al. 2022) and narrow genetic base (Jamil et al. 2021a, b). These varieties are therefore more vulnerable to sucking pests than earlier, genetically diverse varieties, necessitating frequent pesticide sprays and nullifying the anticipated reduction in spray costs (Arshad et al. 2009).

One significant factor influencing farmers' beliefs about Bt technology is the large-scale cultivation of unapproved varieties (21.67% of the area), particularly in Bahawalnagar, Layyah, and Bahawalpur districts (Fig. 1). This may be a leading cause of farmers' perceptions of Bt cotton's inefficiency in controlling bollworms and reducing spray costs, reflecting mismanagement rather than inherent flaws in the technology. During the formal varietal approval process, varieties pass through certain checks, i.e., disease and insect resistance, adaptability to different geographies, response to different climatic factors, and genetic diversity from cultivated varieties (Ahmad et al. 2023a, b; Iftikhar et al. 2019). However, if a variety escapes this process and reaches farmers' fields merely on the basis of high yield, it may be susceptible to bollworms and sucking insects (Kranthi and Stone 2020). Furthermore, approved varieties may also contain admixtures of non-Bt seed, as reported in one of our previous studies (Jamil et al. 2021a, b), suppressing their genetic potential. All the factors explained above underscore a deficiency on the part of cotton breeders (in both public and private sectors) and regulatory bodies (such as FSC&RD), which have not effectively regulated the supply of unapproved varieties to farmers, lacking proper checks and legislative measures (Shahzad et al. 2022).

Differing opinions among farmers on the effectiveness of Bt cotton may partly be due to the cultivation of unapproved varieties. Moreover, minimal adoption of double and triple transgene technologies and excessive outbreaks of sucking insects, particularly whitefly, jassid, and thrips, exacerbated the situation. To mitigate these challenges, concerted efforts from cotton breeders and regulatory bodies are imperative. There is also a need to promote and disseminate the latest Bt cotton technologies, particularly the Cry2Ab and Vip3A genes, among farmers on a large scale to provide broad-spectrum resistance against bollworms.

Availability of data and materials

All data generated or analysed during this study are included in this published article.

Ahmad S, Cheema HMN, Khan AA, et al. Resistance status of Helicoverpa armigera against Bt cotton in Pakistan. Transgenic Res. 2019;28:199–212. https://doi.org/10.1007/s11248-019-00114-9 .


Ahmad J, Zulkiffal M, Anwar J, et al. MH-21, a novel high-yielding and rusts resistant bread wheat variety for irrigated areas of Punjab Pakistan. SABRAO J Breed Genet. 2023a;55(3):749–59. https://doi.org/10.54910/sabrao2023.55.3.13 .


Ahmad J, Rehman A, Ahmad N, et al. Dilkash-20: a newly approved wheat variety recommended for Punjab, Pakistan with supreme yielding potential and disease resistance. SABRAO J Breed Genet. 2023b;55(2):298–308. https://doi.org/10.54910/sabrao2023.55.2.3 .

Ahmad S, Shahzad R, Jamil S, et al. Regulatory aspects, risk assessment, and toxicity associated with RNAi and CRISPR methods. In: Abd-Elsalam KA, Lim K, et al., editors. CRISPR and RNAi systems. Cambridge: Elsevier Inc.; 2021. p. 687–721.  https://doi.org/10.1016/B978-0-12-821910-2.00013-8 .

Ali M, Kundu S, Alam M, et al. Selection of genotypes and contributing characters to improve seed cotton yield of upland cotton ( Gossypium hirsutum L.). Asian Res J Agri. 2020;13(1):31–41. https://doi.org/10.9734/arja/2020/v13i130095 .

Ali MA, Farooq J, Batool A, et al. Cotton production in Pakistan. In: Jabran K, Chauhan BS, et al., editors. Cotton production. Hoboken: John Wiley & Sons Ltd; 2019. p. 249–76.  https://doi.org/10.1002/9781119385523.ch12 .

Almas HI, Azhar MT, Atif RM, et al. Adaptation of genetically modified crops in Pakistan. In: Nawaz MA, Chung G, Tsatsakis AM, et al., editors. GMOs and political stance. Cambridge: Elsevier Inc; 2023. p. 93–114.  https://doi.org/10.1016/B978-0-12-823903-2.00002-0 .

Arshad M, Suhail A, Gogi MD, et al. Farmers’ perceptions of insect pests and pest management practices in Bt cotton in the Punjab, Pakistan. Int J Pest Manage. 2009;55(1):1–10. https://doi.org/10.1080/09670870802419628 .

Arshad A, Raza MA, Zhang Y, et al. Impact of climate warming on cotton growth and yields in China and Pakistan: a regional perspective. Agriculture. 2021;11(2):97. https://doi.org/10.3390/agriculture11020097 .


Cheema H, Khan A, Khan M, et al. Assessment of Bt cotton genotypes for the Cry1Ac transgene and its expression. J Agri Sci. 2016;154(1):109–17. https://doi.org/10.1017/S0021859615000325 .

Chen WB, Lu GQ, Cheng HM, et al. Transgenic cotton coexpressing Vip3A and Cry1Ac has a broad insecticidal spectrum against lepidopteran pests. J Invertebr Pathol. 2017;149:59–65. https://doi.org/10.1016/j.jip.2017.08.001 .

Gore J, Leonard B, Church G, et al. Behavior of bollworm (Lepidoptera: Noctuidae) larvae on genetically engineered cotton. J Econ Entomol. 2002;95(4):763–9. https://doi.org/10.1603/0022-0493-95.4.763 .

Ministry of Finance, Government of Pakistan. Agriculture. In: Economic survey of Pakistan. Ministry of Finance, Government of Pakistan. 2023. p. 19–30.

Iftikhar MS, Talha GM, Shahzad R, et al. Early response of cotton ( Gossypium hirsutum L.) genotype against drought stress. Inter J Biosci. 2019;14(2):537–44. https://doi.org/10.12692/ijb/14.2.536-543 .

Jaleel W, Saeed S, Naqqash MN, et al. Effects of temperature on baseline susceptibility and stability of insecticide resistance against Plutella xylostella (Lepidoptera: Plutellidae) in the absence of selection pressure. Saudi J Biol Sci. 2020;27(1):1–5. https://doi.org/10.1016/j.sjbs.2019.03.004 .

Jamil S, Shahzad R, Rahman SU, et al. The level of Cry1Ac endotoxin and its efficacy against H. armigera in Bt cotton at large scale in Pakistan. GM Crops & Food. 2021a;12(1):1–17. https://doi.org/10.1080/21645698.2020.1799644 .

Jamil S, Shahzad R, Iqbal MZ, et al. DNA fingerprinting and genetic diversity assessment of GM cotton genotypes for protection of plant breeders rights. Int J Agric Biol. 2021b;25(4):768–76. https://doi.org/10.17957/IJAB/15.1728 .

Jarwar AH, Wang X, Iqbal MS, et al. Genetic divergence on the basis of principal component, correlation and cluster analysis of yield and quality traits in cotton cultivars. Pak J Bot. 2019;51(3):1143–8. https://doi.org/10.30848/PJB2019-3(38) .

Karthik K, Negi J, Rathinam M, et al. Exploitation of novel Bt ICPs for the management of Helicoverpa armigera (Hübner) in cotton (Gossypium hirsutum L.): a transgenic approach. Front Microbiol. 2021;12:661212. https://doi.org/10.3389/fmicb.2021.661212.

Kashif N, Cheema HMN, Khan AA, et al. Expression profiling of transgenes ( Cry1Ac and Cry2A ) in cotton genotypes under different genetic backgrounds. J Integr Agric. 2022;21(10):2818–32. https://doi.org/10.1016/j.jia.2022.07.033 .

Kranthi KR, Stone GD. Long-term impacts of Bt cotton in India. Nat Plants. 2020;6(3):188–96. https://doi.org/10.1038/s41477-020-0750-z .

Kranthi K, Dhawad C, Naidu S, et al. Bt-cotton seed as a source of Bacillus thuringiensis insecticidal Cry1Ac toxin for bioassays to detect and monitor bollworm resistance to Bt-cotton. Curr Sci. 2005;88(5):796–800. https://www.jstor.org/stable/24111269 .


Lei Y, Jaleel W, Shahzad MF, et al. Effect of constant and fluctuating temperature on the circadian foraging rhythm of the red imported fire ant, Solenopsis invicta Buren (Hymenoptera: Formicidae). Saudi J Biol Sci. 2021;28(1):64–72. https://doi.org/10.1016/j.sjbs.2020.08.032 .


Li H, Wu KM, Yang XR, et al. Trend of occurrence of cotton bollworm and control efficacy of Bt cotton in cotton planting region of southern Xinjiang. Sci Agric Sin. 2006;39(1):199–205. https://www.cabidigitallibrary.org/doi/full/10.5555/20073100228 .


Llewellyn DJ, Mares CL, Fitt GP. Field performance and seasonal changes in the efficacy against Helicoverpa armigera (Hübner) of transgenic cotton expressing the insecticidal protein vip3A. Agric for Entomol. 2007;9(2):93–101. https://doi.org/10.1111/j.1461-9563.2007.00332.x .

Lu Y, Wyckhuys KA, Yang L, et al. Bt cotton area contraction drives regional pest resurgence, crop loss, and pesticide use. Plant Biotechnol J. 2022;20(2):390–8. https://doi.org/10.1111/pbi.13721 .

O’Connell C, Osmond D. Why soil testing is not enough: a mixed methods study of farmer nutrient management decision-making among US producers. J Environ Manage. 2022;314:115027. https://doi.org/10.1016/j.jenvman.2022.115027 .

Razaq M, Mensah R, Athar HUR. Insect pest management in cotton. In: Jabran K, Chauhan BS, editors. Cotton production. Hoboken: John Wiley & Sons Ltd; 2019. p. 85–107.  https://doi.org/10.1002/9781119385523.ch5 .

Razzaq A, Zafar MM, ALI A, et al. Cotton germplasm improvement and progress in Pakistan. J Cotton Res. 2021;4(1):1–14. https://doi.org/10.1186/s42397-020-00077-x .

Rehman A, Jingdong L, Chandio AA, et al. Economic perspectives of cotton crop in Pakistan: a time series analysis (1970–2015)(Part 1). J Saudi Soc Agric Sci. 2019;18(1):49–54. https://doi.org/10.1016/j.jssas.2016.12.005 .

Ross Z, Wickham H, Robinson D. Declutter your R workflow with tidy tools. PeerJ Preprints. 2017;5:e3180v1. https://doi.org/10.7287/peerj.preprints.3180v1 .

Shahid MR, Farooq M, Shakeel M, et al. Need for growing non-Bt cotton refugia to overcome Bt resistance problem in targeted larvae of the cotton bollworms, Helicoverpa armigera and Pectinophora gossypiella . Egypt J Biol Pest Co. 2021;31:1–8. https://doi.org/10.1186/s41938-021-00384-8 .

Shahzad K, Mubeen I, Zhang M, et al. Progress and perspective on cotton breeding in Pakistan. J Cotton Res. 2022;5:29. https://doi.org/10.1186/s42397-022-00137-4 .

Shekhawat SS, Hasmi SK. Safety and benefits of Bt and Bt cotton: factures, refute, and allegations. In: Shekhawat SS, Irsad, Hasmi SK, editors. Genetic engineering. New York: Apple Academic Press; 2023. p. 23–52.  https://www.taylorfrancis.com/chapters/edit/10.1201/9781003378273-2 .

Spielman DJ, Zaidi F, Zambrano P, et al. What are farmers really planting? Measuring the presence and effectiveness of Bt cotton in Pakistan. PLoS One. 2017;12(5):e0176592. https://doi.org/10.1371/journal.pone.0176592 .


Swami D, Parthasarathy D. A multidimensional perspective to farmers’ decision making determines the adaptation of the farming community. J Environ Manage. 2020;264:110487. https://doi.org/10.1016/j.jenvman.2020.110487 .

Tabashnik BE, Carrière Y. Surge in insect resistance to transgenic crops and prospects for sustainability. Nat Biotechnol. 2017;35(10):926–35. https://doi.org/10.1038/nbt.3974 .

Tabashnik BE, Brévault T, Carrière Y, et al. Insect resistance to Bt crops: lessons from the first billion acres. Nat Biotechnol. 2013;31(6):510–21. https://doi.org/10.1038/nbt.2597 .

Trapero C, Wilson IW, Stiller WN, et al. Enhancing integrated pest management in GM cotton systems using host plant resistance. Front Plant Sci. 2016;7:500. https://doi.org/10.3389/fpls.2016.00500 .



Acknowledgements

The authors are thankful to Dr. Shakeel Ahmad, Seed Center, Ministry of Environment, Water and Agriculture, Riyadh; Dr. Muqadas Aleem, Department of Plant Breeding and Genetics, University of Agriculture, Faisalabad; and Dr. Waseem Akbar, Maize and Millets Research Institute, Sahiwal, for spending significant time improving the technical aspects of this article, and to Mr. Ahmad Shehzad, Lab Assistant, for assisting in the biophysical survey. The authors also thank the Punjab Agriculture Research Board (PARB) for the provision of funds for carrying out this study under Grant No. PARB 890.

This work was supported by the Punjab Agriculture Research Board under Grant No. PARB 890. Authors S.J., S.U.R., and M.Z.I. have received research support from the Punjab Agriculture Research Board.

Author information

Authors and affiliations

Genetically Modified Organisms Development and Testing Laboratory, Agricultural Biotechnology Research Institute, Ayub Agricultural Research Institute, Faisalabad, Punjab, 38000, Pakistan

Shahzad Rahil, Jamil Shakra, Chaudhry Urooj Fatima, Rahman Sajid Ur & Iqbal Muhammad Zaffar

Centre of Excellence for Olive Research and Trainings (CEFORT), Barani, Agricultural Research Institute, Chakwal, Punjab, Pakistan

Iqbal Muhammad Zaffar


Contributions

Shahzad R, Jamil S, Rahman SU, and Iqbal MZ conceived and designed the analysis; Shahzad R, Jamil S, and Chaudhry UF collected the data; Shahzad R, Chaudhry UF, and Jamil S contributed data or analysis tools; Shahzad R and Chaudhry UF performed the analysis; Shahzad R and Chaudhry UF wrote the paper; Jamil S, Rahman SU, and Iqbal MZ proofread the manuscript. All authors read and approved the final version of the manuscript.

Corresponding author

Correspondence to Shahzad Rahil.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Competing interests

The authors declare that they have no competing interests.

Supplementary Information

Supplementary Table S1. List of varieties cultivated at farmer fields along with their transgene and approval status.

Supplementary Table S2. Frequency Table showing the interaction between cotton type (BT and Non-BT) and various pest infestations, including American bollworm (AB), pink bollworm (PB), whitefly, aphid, jassid, and mite.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article

Shahzad, R., Jamil, S., Chaudhry, U.F. et al. In-depth analysis of Bt cotton adoption: farmers' opinions, genetic landscape, and varied perspectives—a case study from Pakistan. J Cotton Res 7, 31 (2024). https://doi.org/10.1186/s42397-024-00191-0


Received : 21 January 2024

Accepted : 18 July 2024

Published : 04 September 2024

DOI : https://doi.org/10.1186/s42397-024-00191-0


Keywords

  • Cry1Ac, Cry2Ab
  • Farmer’s perception
  • Purposive sampling
  • Sucking insects
  • Unapproved varieties

Journal of Cotton Research

ISSN: 2523-3254
