example of statistical treatment in research paper

Community Blog

Keep up-to-date on postgraduate related issues with our quick reads written by students, postdocs, professors and industry leaders.

Statistical Treatment of Data – Explained & Example

DiscoverPhDs

  • By DiscoverPhDs
  • September 8, 2020

Statistical Treatment of Data in Research

‘Statistical treatment’ is when you apply a statistical method to a data set to draw meaning from it. Statistical treatment can be either descriptive statistics, which describes the relationship between variables in a population, or inferential statistics, which tests a hypothesis by making inferences from the collected data.

Introduction to Statistical Treatment in Research

Every research student, regardless of whether they are a biologist, computer scientist or psychologist, must have a basic understanding of statistical treatment if their study is to be reliable.

This is because designing experiments and collecting data are only a small part of conducting research. The other components, which are often not so well understood by new researchers, are the analysis, interpretation and presentation of the data. This is just as important, if not more important, as this is where meaning is extracted from the study .

What is Statistical Treatment of Data?

Statistical treatment of data is when you apply some form of statistical method to a data set to transform it from a group of meaningless numbers into meaningful output.

Statistical treatment of data involves the use of statistical methods such as:

  • regression,
  • conditional probability,
  • standard deviation and
  • distribution range.

These statistical methods allow us to investigate the statistical relationships between the data and identify possible errors in the study.

In addition to being able to identify trends, statistical treatment also allows us to organise and process our data in the first place. This is because when carrying out statistical analysis of our data, it is generally more useful to draw several conclusions for each subgroup within our population than to draw a single, more general conclusion for the whole population. However, to do this, we need to be able to classify the population into different subgroups so that we can later break down our data in the same way before analysing it.

Statistical Treatment Example – Quantitative Research

Statistical Treatment of Data Example

For a statistical treatment of data example, consider a medical study that is investigating the effect of a drug on the human population. As the drug can affect different people in different ways based on parameters such as gender, age and race, the researchers would want to group the data into different subgroups based on these parameters to determine how each one affects the effectiveness of the drug. Categorising the data in this way is an example of performing basic statistical treatment.

Type of Errors

A fundamental part of statistical treatment is using statistical methods to identify possible outliers and errors. No matter how careful we are, all experiments are subject to inaccuracies resulting from two types of errors: systematic errors and random errors.

Systematic errors are errors associated with either the equipment being used to collect the data or with the method in which they are used. Random errors are errors that occur unknowingly or unpredictably in the experimental configuration, such as internal deformations within specimens or small voltage fluctuations in measurement testing instruments.

These experimental errors, in turn, can lead to two types of conclusion errors: type I errors and type II errors . A type I error is a false positive which occurs when a researcher rejects a true null hypothesis. On the other hand, a type II error is a false negative which occurs when a researcher fails to reject a false null hypothesis.

Dissertation versus Thesis

In the UK, a dissertation, usually around 20,000 words is written by undergraduate and Master’s students, whilst a thesis, around 80,000 words, is written as part of a PhD.

DiscoverPhDs_Annotated_Bibliography_Literature_Review

Find out the differences between a Literature Review and an Annotated Bibliography, whey they should be used and how to write them.

Multistage Sampling explained with Multistage Sample

Multistage sampling is a more complex form of cluster sampling for obtaining sample populations. Learn their pros and cons and how to undertake them.

Join thousands of other students and stay up to date with the latest PhD programmes, funding opportunities and advice.

example of statistical treatment in research paper

Browse PhDs Now

What do you call a professor?

You’ll come across many academics with PhD, some using the title of Doctor and others using Professor. This blog post helps you understand the differences.

Types of Research Design

There are various types of research that are classified by objective, depth of study, analysed data and the time required to study the phenomenon etc.

Hannah-Mae-Lewis-Profile

Hannah is a 1st year PhD student at Cardiff Metropolitan University. The aim of her research is to clarify what strategies are the most effective in supporting young people with dyslexia.

Carina Nicu Profile Picture

Carina is a PhD student at The University of Manchester who has just defended her viva. Her research focuses on dermal white adipose tissue regulates human hair follicle growth and cycling.

Join Thousands of Students

Research Paper Statistical Treatment of Data: A Primer

We can all agree that analyzing and presenting data effectively in a research paper is critical, yet often challenging.

This primer on statistical treatment of data will equip you with the key concepts and procedures to accurately analyze and clearly convey research findings.

You'll discover the fundamentals of statistical analysis and data management, the common quantitative and qualitative techniques, how to visually represent data, and best practices for writing the results - all framed specifically for research papers.

If you are curious on how AI can help you with statistica analysis for research, check Hepta AI .

Introduction to Statistical Treatment in Research

Statistical analysis is a crucial component of both quantitative and qualitative research. Properly treating data enables researchers to draw valid conclusions from their studies. This primer provides an introductory guide to fundamental statistical concepts and methods for manuscripts.

Understanding the Importance of Statistical Treatment

Careful statistical treatment demonstrates the reliability of results and ensures findings are grounded in robust quantitative evidence. From determining appropriate sample sizes to selecting accurate analytical tests, statistical rigor adds credibility. Both quantitative and qualitative papers benefit from precise data handling.

Objectives of the Primer

This primer aims to equip researchers with best practices for:

Statistical tools to apply during different research phases

Techniques to manage, analyze, and present data

Methods to demonstrate the validity and reliability of measurements

By covering fundamental concepts ranging from descriptive statistics to measurement validity, it enables both novice and experienced researchers to incorporate proper statistical treatment.

Navigating the Primer: Key Topics and Audience

The primer spans introductory topics including:

Research planning and design

Data collection, management, analysis

Result presentation and interpretation

While useful for researchers at any career stage, earlier-career scientists with limited statistical exposure will find it particularly valuable as they prepare manuscripts.

How do you write a statistical method in a research paper?

Statistical methods are a critical component of research papers, allowing you to analyze, interpret, and draw conclusions from your study data. When writing the statistical methods section, you need to provide enough detail so readers can evaluate the appropriateness of the methods you used.

Here are some key things to include when describing statistical methods in a research paper:

Type of Statistical Tests Used

Specify the types of statistical tests performed on the data, including:

Parametric vs nonparametric tests

Descriptive statistics (means, standard deviations)

Inferential statistics (t-tests, ANOVA, regression, etc.)

Statistical significance level (often p < 0.05)

For example: We used t-tests and one-way ANOVA to compare means across groups, with statistical significance set at p < 0.05.

Analysis of Subgroups

If you examined subgroups or additional variables, describe the methods used for these analyses.

For example: We stratified data by gender and used chi-square tests to analyze differences between subgroups.

Software and Versions

List any statistical software packages used for analysis, including version numbers. Common programs include SPSS, SAS, R, and Stata.

For example: Data were analyzed using SPSS version 25 (IBM Corp, Armonk, NY).

The key is to give readers enough detail to assess the rigor and appropriateness of your statistical methods. The methods should align with your research aims and design. Keep explanations clear and concise using consistent terminology throughout the paper.

What are the 5 statistical treatment in research?

The five most common statistical treatments used in academic research papers include:

The mean, or average, is used to describe the central tendency of a dataset. It provides a singular value that represents the middle of a distribution of numbers. Calculating means allows researchers to characterize typical observations within a sample.

Standard Deviation

Standard deviation measures the amount of variability in a dataset. A low standard deviation indicates observations are clustered closely around the mean, while a high standard deviation signifies the data is more spread out. Reporting standard deviations helps readers contextualize means.

Regression Analysis

Regression analysis models the relationship between independent and dependent variables. It generates an equation that predicts changes in the dependent variable based on changes in the independents. Regressions are useful for hypothesizing causal connections between variables.

Hypothesis Testing

Hypothesis testing evaluates assumptions about population parameters based on statistics calculated from a sample. Common hypothesis tests include t-tests, ANOVA, and chi-squared. These quantify the likelihood of observed differences being due to chance.

Sample Size Determination

Sample size calculations identify the minimum number of observations needed to detect effects of a given size at a desired statistical power. Appropriate sampling ensures studies can uncover true relationships within the constraints of resource limitations.

These five statistical analysis methods form the backbone of most quantitative research processes. Correct application allows researchers to characterize data trends, model predictive relationships, and make probabilistic inferences regarding broader populations. Expertise in these techniques is fundamental for producing valid, reliable, and publishable academic studies.

How do you know what statistical treatment to use in research?

The selection of appropriate statistical methods for the treatment of data in a research paper depends on three key factors:

The Aim and Objective of the Study

The aim and objectives that the study seeks to achieve will determine the type of statistical analysis required.

Descriptive research presenting characteristics of the data may only require descriptive statistics like measures of central tendency (mean, median, mode) and dispersion (range, standard deviation).

Studies aiming to establish relationships or differences between variables need inferential statistics like correlation, t-tests, ANOVA, regression etc.

Predictive modeling research requires methods like regression, discriminant analysis, logistic regression etc.

Thus, clearly identifying the research purpose and objectives is the first step in planning appropriate statistical treatment.

Type and Distribution of Data

The type of data (categorical, numerical) and its distribution (normal, skewed) also guide the choice of statistical techniques.

Parametric tests have assumptions related to normality and homogeneity of variance.

Non-parametric methods are distribution-free and better suited for non-normal or categorical data.

Testing data distribution and characteristics is therefore vital.

Nature of Observations

Statistical methods also differ based on whether the observations are paired or unpaired.

Analyzing changes within one group requires paired tests like paired t-test, Wilcoxon signed-rank test etc.

Comparing between two or more independent groups needs unpaired tests like independent t-test, ANOVA, Kruskal-Wallis test etc.

Thus the nature of observations is pivotal in selecting suitable statistical analyses.

In summary, clearly defining the research objectives, testing the collected data, and understanding the observational units guides proper statistical treatment and interpretation.

What is statistical techniques in research paper?

Statistical methods are essential tools in scientific research papers. They allow researchers to summarize, analyze, interpret and present data in meaningful ways.

Some key statistical techniques used in research papers include:

Descriptive statistics: These provide simple summaries of the sample and the measures. Common examples include measures of central tendency (mean, median, mode), measures of variability (range, standard deviation) and graphs (histograms, pie charts).

Inferential statistics: These help make inferences and predictions about a population from a sample. Common techniques include estimation of parameters, hypothesis testing, correlation and regression analysis.

Analysis of variance (ANOVA): This technique allows researchers to compare means across multiple groups and determine statistical significance.

Factor analysis: This technique identifies underlying relationships between variables and latent constructs. It allows reducing a large set of variables into fewer factors.

Structural equation modeling: This technique estimates causal relationships using both latent and observed factors. It is widely used for testing theoretical models in social sciences.

Proper statistical treatment and presentation of data are crucial for the integrity of any quantitative research paper. Statistical techniques help establish validity, account for errors, test hypotheses, build models and derive meaningful insights from the research.

Fundamental Concepts and Data Management

Exploring basic statistical terms.

Understanding key statistical concepts is essential for effective research design and data analysis. This includes defining key terms like:

Statistics : The science of collecting, organizing, analyzing, and interpreting numerical data to draw conclusions or make predictions.

Variables : Characteristics or attributes of the study participants that can take on different values.

Measurement : The process of assigning numbers to variables based on a set of rules.

Sampling : Selecting a subset of a larger population to estimate characteristics of the whole population.

Data types : Quantitative (numerical) or qualitative (categorical) data.

Descriptive vs. inferential statistics : Descriptive statistics summarize data while inferential statistics allow making conclusions from the sample to the larger population.

Ensuring Validity and Reliability in Measurement

When selecting measurement instruments, it is critical they demonstrate:

Validity : The extent to which the instrument measures what it intends to measure.

Reliability : The consistency of measurement over time and across raters.

Researchers should choose instruments aligned to their research questions and study methodology .

Data Management Essentials

Proper data management requires:

Ethical collection procedures respecting autonomy, justice, beneficence and non-maleficence.

Handling missing data through deletion, imputation or modeling procedures.

Data cleaning by identifying and fixing errors, inconsistencies and duplicates.

Data screening via visual inspection and statistical methods to detect anomalies.

Data Management Techniques and Ethical Considerations

Ethical data management includes:

Obtaining informed consent from all participants.

Anonymization and encryption to protect privacy.

Secure data storage and transfer procedures.

Responsible use of statistical tools free from manipulation or misrepresentation.

Adhering to ethical guidelines preserves public trust in the integrity of research.

Statistical Methods and Procedures

This section provides an introduction to key quantitative analysis techniques and guidance on when to apply them to different types of research questions and data.

Descriptive Statistics and Data Summarization

Descriptive statistics summarize and organize data characteristics such as central tendency, variability, and distributions. Common descriptive statistical methods include:

Measures of central tendency (mean, median, mode)

Measures of variability (range, interquartile range, standard deviation)

Graphical representations (histograms, box plots, scatter plots)

Frequency distributions and percentages

These methods help describe and summarize the sample data so researchers can spot patterns and trends.

Inferential Statistics for Generalizing Findings

While descriptive statistics summarize sample data, inferential statistics help generalize findings to the larger population. Common techniques include:

Hypothesis testing with t-tests, ANOVA

Correlation and regression analysis

Nonparametric tests

These methods allow researchers to draw conclusions and make predictions about the broader population based on the sample data.

Selecting the Right Statistical Tools

Choosing the appropriate analyses involves assessing:

The research design and questions asked

Type of data (categorical, continuous)

Data distributions

Statistical assumptions required

Matching the correct statistical tests to these elements helps ensure accurate results.

Statistical Treatment of Data for Quantitative Research

For quantitative research, common statistical data treatments include:

Testing data reliability and validity

Checking assumptions of statistical tests

Transforming non-normal data

Identifying and handling outliers

Applying appropriate analyses for the research questions and data type

Examples and case studies help demonstrate correct application of statistical tests.

Approaches to Qualitative Data Analysis

Qualitative data is analyzed through methods like:

Thematic analysis

Content analysis

Discourse analysis

Grounded theory

These help researchers discover concepts and patterns within non-numerical data to derive rich insights.

Data Presentation and Research Method

Crafting effective visuals for data presentation.

When presenting analyzed results and statistics in a research paper, well-designed tables, graphs, and charts are key for clearly showcasing patterns in the data to readers. Adhering to formatting standards like APA helps ensure professional data presentation. Consider these best practices:

Choose the appropriate visual type based on the type of data and relationship being depicted. For example, bar charts for comparing categorical data, line graphs to show trends over time.

Label the x-axis, y-axis, legends clearly. Include informative captions.

Use consistent, readable fonts and sizing. Avoid clutter with unnecessary elements. White space can aid readability.

Order data logically. Such as largest to smallest values, or chronologically.

Include clear statistical notations, like error bars, where applicable.

Following academic standards for visuals lends credibility while making interpretation intuitive for readers.

Writing the Results Section with Clarity

When writing the quantitative Results section, aim for clarity by balancing statistical reporting with interpretation of findings. Consider this structure:

Open with an overview of the analysis approach and measurements used.

Break down results by logical subsections for each hypothesis, construct measured etc.

Report exact statistics first, followed by interpretation of their meaning. For example, “Participants exposed to the intervention had significantly higher average scores (M=78, SD=3.2) compared to controls (M=71, SD=4.1), t(115)=3.42, p = 0.001. This suggests the intervention was highly effective for increasing scores.”

Use present verb tense. And scientific, formal language.

Include tables/figures where they aid understanding or visualization.

Writing results clearly gives readers deeper context around statistical findings.

Highlighting Research Method and Design

With a results section full of statistics, it's vital to communicate key aspects of the research method and design. Consider including:

Brief overview of study variables, materials, apparatus used. Helps reproducibility.

Descriptions of study sampling techniques, data collection procedures. Supports transparency.

Explanations around approaches to measurement, data analysis performed. Bolsters methodological rigor.

Noting control variables, attempts to limit biases etc. Demonstrates awareness of limitations.

Covering these methodological details shows readers the care taken in designing the study and analyzing the results obtained.

Acknowledging Limitations and Addressing Biases

Honestly recognizing methodological weaknesses and limitations goes a long way in establishing credibility within the published discussion section. Consider transparently noting:

Measurement errors and biases that may have impacted findings.

Limitations around sampling methods that constrain generalizability.

Caveats related to statistical assumptions, analysis techniques applied.

Attempts made to control/account for biases and directions for future research.

Rather than detracting value, acknowledging limitations demonstrates academic integrity regarding the research performed. It also gives readers deeper insight into interpreting the reported results and findings.

Conclusion: Synthesizing Statistical Treatment Insights

Recap of statistical treatment fundamentals.

Statistical treatment of data is a crucial component of high-quality quantitative research. Proper application of statistical methods and analysis principles enables valid interpretations and inferences from study data. Key fundamentals covered include:

Descriptive statistics to summarize and describe the basic features of study data

Inferential statistics to make judgments of the probability and significance based on the data

Using appropriate statistical tools aligned to the research design and objectives

Following established practices for measurement techniques, data collection, and reporting

Adhering to these core tenets ensures research integrity and allows findings to withstand scientific scrutiny.

Key Takeaways for Research Paper Success

When incorporating statistical treatment into a research paper, keep these best practices in mind:

Clearly state the research hypothesis and variables under examination

Select reliable and valid quantitative measures for assessment

Determine appropriate sample size to achieve statistical power

Apply correct analytical methods suited to the data type and distribution

Comprehensively report methodology procedures and statistical outputs

Interpret results in context of the study limitations and scope

Following these guidelines will bolster confidence in the statistical treatment and strengthen the research quality overall.

Encouraging Continued Learning and Application

As statistical techniques continue advancing, it is imperative for researchers to actively further their statistical literacy. Regularly reviewing new methodological developments and learning advanced tools will augment analytical capabilities. Persistently putting enhanced statistical knowledge into practice through research projects and manuscript preparations will cement competencies. Statistical treatment mastery is a journey requiring persistent effort, but one that pays dividends in research proficiency.

Avatar of Antonio Carlos Filho

Antonio Carlos Filho @acfilho_dev

13. Study design and choosing a statistical test

Sample size.

example of statistical treatment in research paper

The Beginner's Guide to Statistical Analysis | 5 Steps & Examples

Statistical analysis means investigating trends, patterns, and relationships using quantitative data . It is an important research tool used by scientists, governments, businesses, and other organisations.

To draw valid conclusions, statistical analysis requires careful planning from the very start of the research process . You need to specify your hypotheses and make decisions about your research design, sample size, and sampling procedure.

After collecting data from your sample, you can organise and summarise the data using descriptive statistics . Then, you can use inferential statistics to formally test hypotheses and make estimates about the population. Finally, you can interpret and generalise your findings.

This article is a practical introduction to statistical analysis for students and researchers. We’ll walk you through the steps using two research examples. The first investigates a potential cause-and-effect relationship, while the second investigates a potential correlation between variables.

Table of contents

Step 1: write your hypotheses and plan your research design, step 2: collect data from a sample, step 3: summarise your data with descriptive statistics, step 4: test hypotheses or make estimates with inferential statistics, step 5: interpret your results, frequently asked questions about statistics.

To collect valid data for statistical analysis, you first need to specify your hypotheses and plan out your research design.

Writing statistical hypotheses

The goal of research is often to investigate a relationship between variables within a population . You start with a prediction, and use statistical analysis to test that prediction.

A statistical hypothesis is a formal way of writing a prediction about a population. Every research prediction is rephrased into null and alternative hypotheses that can be tested using sample data.

While the null hypothesis always predicts no effect or no relationship between variables, the alternative hypothesis states your research prediction of an effect or relationship.

  • Null hypothesis: A 5-minute meditation exercise will have no effect on math test scores in teenagers.
  • Alternative hypothesis: A 5-minute meditation exercise will improve math test scores in teenagers.
  • Null hypothesis: Parental income and GPA have no relationship with each other in college students.
  • Alternative hypothesis: Parental income and GPA are positively correlated in college students.

Planning your research design

A research design is your overall strategy for data collection and analysis. It determines the statistical tests you can use to test your hypothesis later on.

First, decide whether your research will use a descriptive, correlational, or experimental design. Experiments directly influence variables, whereas descriptive and correlational studies only measure variables.

  • In an experimental design , you can assess a cause-and-effect relationship (e.g., the effect of meditation on test scores) using statistical tests of comparison or regression.
  • In a correlational design , you can explore relationships between variables (e.g., parental income and GPA) without any assumption of causality using correlation coefficients and significance tests.
  • In a descriptive design , you can study the characteristics of a population or phenomenon (e.g., the prevalence of anxiety in U.S. college students) using statistical tests to draw inferences from sample data.

Your research design also concerns whether you’ll compare participants at the group level or individual level, or both.

  • In a between-subjects design , you compare the group-level outcomes of participants who have been exposed to different treatments (e.g., those who performed a meditation exercise vs those who didn’t).
  • In a within-subjects design , you compare repeated measures from participants who have participated in all treatments of a study (e.g., scores from before and after performing a meditation exercise).
  • In a mixed (factorial) design , one variable is altered between subjects and another is altered within subjects (e.g., pretest and posttest scores from participants who either did or didn’t do a meditation exercise).
  • Experimental
  • Correlational

First, you’ll take baseline test scores from participants. Then, your participants will undergo a 5-minute meditation exercise. Finally, you’ll record participants’ scores from a second math test.

In this experiment, the independent variable is the 5-minute meditation exercise, and the dependent variable is the math test score from before and after the intervention. Example: Correlational research design In a correlational study, you test whether there is a relationship between parental income and GPA in graduating college students. To collect your data, you will ask participants to fill in a survey and self-report their parents’ incomes and their own GPA.

Measuring variables

When planning a research design, you should operationalise your variables and decide exactly how you will measure them.

For statistical analysis, it’s important to consider the level of measurement of your variables, which tells you what kind of data they contain:

  • Categorical data represents groupings. These may be nominal (e.g., gender) or ordinal (e.g. level of language ability).
  • Quantitative data represents amounts. These may be on an interval scale (e.g. test score) or a ratio scale (e.g. age).

Many variables can be measured at different levels of precision. For example, age data can be quantitative (8 years old) or categorical (young). If a variable is coded numerically (e.g., level of agreement from 1–5), it doesn’t automatically mean that it’s quantitative instead of categorical.

Identifying the measurement level is important for choosing appropriate statistics and hypothesis tests. For example, you can calculate a mean score with quantitative data, but not with categorical data.

In a research study, along with measures of your variables of interest, you’ll often collect data on relevant participant characteristics.

Population vs sample

In most cases, it’s too difficult or expensive to collect data from every member of the population you’re interested in studying. Instead, you’ll collect data from a sample.

Statistical analysis allows you to apply your findings beyond your own sample as long as you use appropriate sampling procedures . You should aim for a sample that is representative of the population.

Sampling for statistical analysis

There are two main approaches to selecting a sample.

  • Probability sampling: every member of the population has a chance of being selected for the study through random selection.
  • Non-probability sampling: some members of the population are more likely than others to be selected for the study because of criteria such as convenience or voluntary self-selection.

In theory, for highly generalisable findings, you should use a probability sampling method. Random selection reduces sampling bias and ensures that data from your sample is actually typical of the population. Parametric tests can be used to make strong statistical inferences when data are collected using probability sampling.

But in practice, it’s rarely possible to gather the ideal sample. While non-probability samples are more likely to be biased, they are much easier to recruit and collect data from. Non-parametric tests are more appropriate for non-probability samples, but they result in weaker inferences about the population.

If you want to use parametric tests for non-probability samples, you have to make the case that:

  • your sample is representative of the population you’re generalising your findings to.
  • your sample lacks systematic bias.

Keep in mind that external validity means that you can only generalise your conclusions to others who share the characteristics of your sample. For instance, results from Western, Educated, Industrialised, Rich and Democratic samples (e.g., college students in the US) aren’t automatically applicable to all non-WEIRD populations.

If you apply parametric tests to data from non-probability samples, be sure to elaborate on the limitations of how far your results can be generalised in your discussion section .

Create an appropriate sampling procedure

Based on the resources available for your research, decide on how you’ll recruit participants.

  • Will you have resources to advertise your study widely, including outside of your university setting?
  • Will you have the means to recruit a diverse sample that represents a broad population?
  • Do you have time to contact and follow up with members of hard-to-reach groups?

Your participants are self-selected by their schools. Although you’re using a non-probability sample, you aim for a diverse and representative sample. Example: Sampling (correlational study) Your main population of interest is male college students in the US. Using social media advertising, you recruit senior-year male college students from a smaller subpopulation: seven universities in the Boston area.

Calculate sufficient sample size

Before recruiting participants, decide on your sample size either by looking at other studies in your field or using statistics. A sample that’s too small may be unrepresentative of the sample, while a sample that’s too large will be more costly than necessary.

There are many sample size calculators online. Different formulas are used depending on whether you have subgroups or how rigorous your study should be (e.g., in clinical research). As a rule of thumb, a minimum of 30 units or more per subgroup is necessary.

To use these calculators, you have to understand and input these key components:

  • Significance level (alpha): the risk of rejecting a true null hypothesis that you are willing to take, usually set at 5%.
  • Statistical power : the probability of your study detecting an effect of a certain size if there is one, usually 80% or higher.
  • Expected effect size : a standardised indication of how large the expected result of your study will be, usually based on other similar studies.
  • Population standard deviation: an estimate of the population parameter based on a previous study or a pilot study of your own.

Once you’ve collected all of your data, you can inspect them and calculate descriptive statistics that summarise them.

Inspect your data

There are various ways to inspect your data, including the following:

  • Organising data from each variable in frequency distribution tables .
  • Displaying data from a key variable in a bar chart to view the distribution of responses.
  • Visualising the relationship between two variables using a scatter plot .

By visualising your data in tables and graphs, you can assess whether your data follow a skewed or normal distribution and whether there are any outliers or missing data.

A normal distribution means that your data are symmetrically distributed around a center where most values lie, with the values tapering off at the tail ends.

Mean, median, mode, and standard deviation in a normal distribution

In contrast, a skewed distribution is asymmetric and has more values on one end than the other. The shape of the distribution is important to keep in mind because only some descriptive statistics should be used with skewed distributions.

Extreme outliers can also produce misleading statistics, so you may need a systematic approach to dealing with these values.

Calculate measures of central tendency

Measures of central tendency describe where most of the values in a data set lie. Three main measures of central tendency are often reported:

  • Mode : the most popular response or value in the data set.
  • Median : the value in the exact middle of the data set when ordered from low to high.
  • Mean : the sum of all values divided by the number of values.

However, depending on the shape of the distribution and level of measurement, only one or two of these measures may be appropriate. For example, many demographic characteristics can only be described using the mode or proportions, while a variable like reaction time may not have a mode at all.

Calculate measures of variability

Measures of variability tell you how spread out the values in a data set are. Four main measures of variability are often reported:

  • Range : the highest value minus the lowest value of the data set.
  • Interquartile range : the range of the middle half of the data set.
  • Standard deviation : the average distance between each value in your data set and the mean.
  • Variance : the square of the standard deviation.

Once again, the shape of the distribution and level of measurement should guide your choice of variability statistics. The interquartile range is the best measure for skewed distributions, while standard deviation and variance provide the best information for normal distributions.

Using your table, you should check whether the units of the descriptive statistics are comparable for pretest and posttest scores. For example, are the variance levels similar across the groups? Are there any extreme values? If there are, you may need to identify and remove extreme outliers in your data set or transform your data before performing a statistical test.

From this table, we can see that the mean score increased after the meditation exercise, and the variances of the two scores are comparable. Next, we can perform a statistical test to find out if this improvement in test scores is statistically significant in the population. Example: Descriptive statistics (correlational study) After collecting data from 653 students, you tabulate descriptive statistics for annual parental income and GPA.

It’s important to check whether you have a broad range of data points. If you don’t, your data may be skewed towards some groups more than others (e.g., high academic achievers), and only limited inferences can be made about a relationship.

A number that describes a sample is called a statistic , while a number describing a population is called a parameter . Using inferential statistics , you can make conclusions about population parameters based on sample statistics.

Researchers often use two main methods (simultaneously) to make inferences in statistics.

  • Estimation: calculating population parameters based on sample statistics.
  • Hypothesis testing: a formal process for testing research predictions about the population using samples.

You can make two types of estimates of population parameters from sample statistics:

  • A point estimate : a value that represents your best guess of the exact parameter.
  • An interval estimate : a range of values that represent your best guess of where the parameter lies.

If your aim is to infer and report population characteristics from sample data, it’s best to use both point and interval estimates in your paper.

You can consider a sample statistic a point estimate for the population parameter when you have a representative sample (e.g., in a wide public opinion poll, the proportion of a sample that supports the current government is taken as the population proportion of government supporters).

There’s always error involved in estimation, so you should also provide a confidence interval as an interval estimate to show the variability around a point estimate.

A confidence interval uses the standard error and the z score from the standard normal distribution to convey where you’d generally expect to find the population parameter most of the time.

Hypothesis testing

Using data from a sample, you can test hypotheses about relationships between variables in the population. Hypothesis testing starts with the assumption that the null hypothesis is true in the population, and you use statistical tests to assess whether the null hypothesis can be rejected or not.

Statistical tests determine where your sample data would lie on an expected distribution of sample data if the null hypothesis were true. These tests give two main outputs:

  • A test statistic tells you how much your data differs from the null hypothesis of the test.
  • A p value tells you the likelihood of obtaining your results if the null hypothesis is actually true in the population.

Statistical tests come in three main varieties:

  • Comparison tests assess group differences in outcomes.
  • Regression tests assess cause-and-effect relationships between variables.
  • Correlation tests assess relationships between variables without assuming causation.

Your choice of statistical test depends on your research questions, research design, sampling method, and data characteristics.

Parametric tests

Parametric tests make powerful inferences about the population based on sample data. But to use them, some assumptions must be met, and only some types of variables can be used. If your data violate these assumptions, you can perform appropriate data transformations or use alternative non-parametric tests instead.

A regression models the extent to which changes in a predictor variable results in changes in outcome variable(s).

  • A simple linear regression includes one predictor variable and one outcome variable.
  • A multiple linear regression includes two or more predictor variables and one outcome variable.

Comparison tests usually compare the means of groups. These may be the means of different groups within a sample (e.g., a treatment and control group), the means of one sample group taken at different times (e.g., pretest and posttest scores), or a sample mean and a population mean.

  • A t test is for exactly 1 or 2 groups when the sample is small (30 or less).
  • A z test is for exactly 1 or 2 groups when the sample is large.
  • An ANOVA is for 3 or more groups.

The z and t tests have subtypes based on the number and types of samples and the hypotheses:

  • If you have only one sample that you want to compare to a population mean, use a one-sample test .
  • If you have paired measurements (within-subjects design), use a dependent (paired) samples test .
  • If you have completely separate measurements from two unmatched groups (between-subjects design), use an independent (unpaired) samples test .
  • If you expect a difference between groups in a specific direction, use a one-tailed test .
  • If you don’t have any expectations for the direction of a difference between groups, use a two-tailed test .

The only parametric correlation test is Pearson’s r . The correlation coefficient ( r ) tells you the strength of a linear relationship between two quantitative variables.

However, to test whether the correlation in the sample is strong enough to be important in the population, you also need to perform a significance test of the correlation coefficient, usually a t test, to obtain a p value. This test uses your sample size to calculate how much the correlation coefficient differs from zero in the population.

You use a dependent-samples, one-tailed t test to assess whether the meditation exercise significantly improved math test scores. The test gives you:

  • a t value (test statistic) of 3.00
  • a p value of 0.0028

Although Pearson’s r is a test statistic, it doesn’t tell you anything about how significant the correlation is in the population. You also need to test whether this sample correlation coefficient is large enough to demonstrate a correlation in the population.

A t test can also determine how significantly a correlation coefficient differs from zero based on sample size. Since you expect a positive correlation between parental income and GPA, you use a one-sample, one-tailed t test. The t test gives you:

  • a t value of 3.08
  • a p value of 0.001

The final step of statistical analysis is interpreting your results.

Statistical significance

In hypothesis testing, statistical significance is the main criterion for forming conclusions. You compare your p value to a set significance level (usually 0.05) to decide whether your results are statistically significant or non-significant.

Statistically significant results are considered unlikely to have arisen solely due to chance. There is only a very low chance of such a result occurring if the null hypothesis is true in the population.

This means that you believe the meditation intervention, rather than random factors, directly caused the increase in test scores. Example: Interpret your results (correlational study) You compare your p value of 0.001 to your significance threshold of 0.05. With a p value under this threshold, you can reject the null hypothesis. This indicates a statistically significant correlation between parental income and GPA in male college students.

Note that correlation doesn’t always mean causation, because there are often many underlying factors contributing to a complex variable like GPA. Even if one variable is related to another, this may be because of a third variable influencing both of them, or indirect links between the two variables.

Effect size

A statistically significant result doesn’t necessarily mean that there are important real life applications or clinical outcomes for a finding.

In contrast, the effect size indicates the practical significance of your results. It’s important to report effect sizes along with your inferential statistics for a complete picture of your results. You should also report interval estimates of effect sizes if you’re writing an APA style paper .

With a Cohen’s d of 0.72, there’s medium to high practical significance to your finding that the meditation exercise improved test scores. Example: Effect size (correlational study) To determine the effect size of the correlation coefficient, you compare your Pearson’s r value to Cohen’s effect size criteria.

Decision errors

Type I and Type II errors are mistakes made in research conclusions. A Type I error means rejecting the null hypothesis when it’s actually true, while a Type II error means failing to reject the null hypothesis when it’s false.

You can aim to minimise the risk of these errors by selecting an optimal significance level and ensuring high power . However, there’s a trade-off between the two errors, so a fine balance is necessary.

Frequentist versus Bayesian statistics

Traditionally, frequentist statistics emphasises null hypothesis significance testing and always starts with the assumption of a true null hypothesis.

However, Bayesian statistics has grown in popularity as an alternative approach in the last few decades. In this approach, you use previous research to continually update your hypotheses based on your expectations and observations.

Bayes factor compares the relative strength of evidence for the null versus the alternative hypothesis rather than making a conclusion about rejecting the null hypothesis or not.

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is used by scientists to test specific predictions, called hypotheses , by calculating how likely it is that a pattern or relationship between variables could have arisen by chance.

The research methods you use depend on the type of data you need to answer your research question .

  • If you want to measure something or test a hypothesis , use quantitative methods . If you want to explore ideas, thoughts, and meanings, use qualitative methods .
  • If you want to analyse a large amount of readily available data, use secondary data. If you want data specific to your purposes with control over how they are generated, collect primary data.
  • If you want to establish cause-and-effect relationships between variables , use experimental methods. If you want to understand the characteristics of a research subject, use descriptive methods.

Statistical analysis is the main method for analyzing quantitative research data . It uses probabilities and models to test predictions about a population from sample data.

Is this article helpful?

Other students also liked, a quick guide to experimental design | 5 steps & examples, controlled experiments | methods & examples of control, between-subjects design | examples, pros & cons, more interesting articles.

  • Central Limit Theorem | Formula, Definition & Examples
  • Central Tendency | Understanding the Mean, Median & Mode
  • Correlation Coefficient | Types, Formulas & Examples
  • Descriptive Statistics | Definitions, Types, Examples
  • How to Calculate Standard Deviation (Guide) | Calculator & Examples
  • How to Calculate Variance | Calculator, Analysis & Examples
  • How to Find Degrees of Freedom | Definition & Formula
  • How to Find Interquartile Range (IQR) | Calculator & Examples
  • How to Find Outliers | Meaning, Formula & Examples
  • How to Find the Geometric Mean | Calculator & Formula
  • How to Find the Mean | Definition, Examples & Calculator
  • How to Find the Median | Definition, Examples & Calculator
  • How to Find the Range of a Data Set | Calculator & Formula
  • Inferential Statistics | An Easy Introduction & Examples
  • Levels of measurement: Nominal, ordinal, interval, ratio
  • Missing Data | Types, Explanation, & Imputation
  • Normal Distribution | Examples, Formulas, & Uses
  • Null and Alternative Hypotheses | Definitions & Examples
  • Poisson Distributions | Definition, Formula & Examples
  • Skewness | Definition, Examples & Formula
  • T-Distribution | What It Is and How To Use It (With Examples)
  • The Standard Normal Distribution | Calculator, Examples & Uses
  • Type I & Type II Errors | Differences, Examples, Visualizations
  • Understanding Confidence Intervals | Easy Examples & Formulas
  • Variability | Calculating Range, IQR, Variance, Standard Deviation
  • What is Effect Size and Why Does It Matter? (Examples)
  • What Is Interval Data? | Examples & Definition
  • What Is Nominal Data? | Examples & Definition
  • What Is Ordinal Data? | Examples & Definition
  • What Is Ratio Data? | Examples & Definition
  • What Is the Mode in Statistics? | Definition, Examples & Calculator

U.S. flag

An official website of the United States government

The .gov means it’s official. Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

The site is secure. The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

  • Publications
  • Account settings

Preview improvements coming to the PMC website in October 2024. Learn More or Try it out now .

  • Advanced Search
  • Journal List
  • BMJ Open Access

Logo of bmjgroup

How to write statistical analysis section in medical research

Alok kumar dwivedi.

Department of Molecular and Translational Medicine, Division of Biostatistics and Epidemiology, Texas Tech University Health Sciences Center El Paso, El Paso, Texas, USA

Associated Data

jim-2022-002479supp001.pdf

Data sharing not applicable as no datasets generated and/or analyzed for this study.

Reporting of statistical analysis is essential in any clinical and translational research study. However, medical research studies sometimes report statistical analysis that is either inappropriate or insufficient to attest to the accuracy and validity of findings and conclusions. Published works involving inaccurate statistical analyses and insufficient reporting influence the conduct of future scientific studies, including meta-analyses and medical decisions. Although the biostatistical practice has been improved over the years due to the involvement of statistical reviewers and collaborators in research studies, there remain areas of improvement for transparent reporting of the statistical analysis section in a study. Evidence-based biostatistics practice throughout the research is useful for generating reliable data and translating meaningful data to meaningful interpretation and decisions in medical research. Most existing research reporting guidelines do not provide guidance for reporting methods in the statistical analysis section that helps in evaluating the quality of findings and data interpretation. In this report, we highlight the global and critical steps to be reported in the statistical analysis of grants and research articles. We provide clarity and the importance of understanding study objective types, data generation process, effect size use, evidence-based biostatistical methods use, and development of statistical models through several thematic frameworks. We also provide published examples of adherence or non-adherence to methodological standards related to each step in the statistical analysis and their implications. We believe the suggestions provided in this report can have far-reaching implications for education and strengthening the quality of statistical reporting and biostatistical practice in medical research.

Introduction

Biostatistics is the overall approach to how we realistically and feasibly execute a research idea to produce meaningful data and translate data to meaningful interpretation and decisions. In this era of evidence-based medicine and practice, basic biostatistical knowledge becomes essential for critically appraising research articles and implementing findings for better patient management, improving healthcare, and research planning. 1 However, it may not be sufficient for the proper execution and reporting of statistical analyses in studies. 2 3 Three things are required for statistical analyses, namely knowledge of the conceptual framework of variables, research design, and evidence-based applications of statistical analysis with statistical software. 4 5 The conceptual framework provides possible biological and clinical pathways between independent variables and outcomes with role specification of variables. The research design provides a protocol of study design and data generation process (DGP), whereas the evidence-based statistical analysis approach provides guidance for selecting and implementing approaches after evaluating data with the research design. 2 5 Ocaña-Riola 6 reported a substantial percentage of articles from high-impact medical journals contained errors in statistical analysis or data interpretation. These errors in statistical analyses and interpretation of results do not only impact the reliability of research findings but also influence the medical decision-making and planning and execution of other related studies. A survey of consulting biostatisticians in the USA reported that researchers frequently request biostatisticians for performing inappropriate statistical analyses and inappropriate reporting of data. 7 This implies that there is a need to enforce standardized reporting of the statistical analysis section in medical research which can also help rreviewers and investigators to improve the methodological standards of the study.

Biostatistical practice in medicine has been improving over the years due to continuous efforts in promoting awareness and involving expert services on biostatistics, epidemiology, and research design in clinical and translational research. 8–11 Despite these efforts, the quality of reporting of statistical analysis in research studies has often been suboptimal. 12 13 We noticed that none of the methods reporting documents were developed using evidence-based biostatistics (EBB) theory and practice. The EBB practice implies that the selection of statistical analysis methods for statistical analyses and the steps of results reporting and interpretation should be grounded based on the evidence generated in the scientific literature and according to the study objective type and design. 5 Previous works have not properly elucidated the importance of understanding EBB concepts and related reporting in the write-up of statistical analyses. As a result, reviewers sometimes ask to present data or execute analyses that do not match the study objective type. 14 We summarize the statistical analysis steps to be reported in the statistical analysis section based on review and thematic frameworks.

We identified articles describing statistical reporting problems in medicine using different search terms ( online supplemental table 1 ). Based on these studies, we prioritized commonly reported statistical errors in analytical strategies and developed essential components to be reported in the statistical analysis section of research grants and studies. We also clarified the purpose and the overall implication of reporting each step in statistical analyses through various examples.

Supplementary data

Although biostatistical inputs are critical for the entire research study ( online supplemental table 2 ), biostatistical consultations were mostly used for statistical analyses only 15 . Even though the conduct of statistical analysis mismatched with the study objective and DGP was identified as the major problem in articles submitted to high-impact medical journals. 16 In addition, multivariable analyses were often inappropriately conducted and reported in published studies. 17 18 In light of these statistical errors, we describe the reporting of the following components in the statistical analysis section of the study.

Step 1: specify study objective type and outcomes (overall approach)

The study objective type provides the role of important variables for a specified outcome in statistical analyses and the overall approach of the model building and model reporting steps in a study. In the statistical framework, the problems are classified into descriptive and inferential/analytical/confirmatory objectives. In the epidemiological framework, the analytical and prognostic problems are broadly classified into association, explanatory, and predictive objectives. 19 These study objectives ( figure 1 ) may be classified into six categories: (1) exploratory, (2) association, (3) causal, (4) intervention, (5) prediction and (6) clinical decision models in medical research. 20

An external file that holds a picture, illustration, etc.
Object name is jim-2022-002479f01.jpg

Comparative assessments of developing and reporting of study objective types and models. Association measures include odds ratio, risk ratio, or hazard ratio. AUC, area under the curve; C, confounder; CI, confidence interval; E, exposure; HbA1C: hemoglobin A1c; M, mediator; MFT, model fit test; MST, model specification test; PI, predictive interval; R 2 , coefficient of determinant; X, independent variable; Y, outcome.

The exploratory objective type is a specific type of determinant study and is commonly known as risk factors or correlates study in medical research. In an exploratory study, all covariates are considered equally important for the outcome of interest in the study. The goal of the exploratory study is to present the results of a model which gives higher accuracy after satisfying all model-related assumptions. In the association study, the investigator identifies predefined exposures of interest for the outcome, and variables other than exposures are also important for the interpretation and considered as covariates. The goal of an association study is to present the adjusted association of exposure with outcome. 20 In the causal objective study, the investigator is interested in determining the impact of exposure(s) on outcome using the conceptual framework. In this study objective, all variables should have a predefined role (exposures, confounders, mediators, covariates, and predictors) in a conceptual framework. A study with a causal objective is known as an explanatory or a confirmatory study in medical research. The goal is to present the direct or indirect effects of exposure(s) on an outcome after assessing the model’s fitness in the conceptual framework. 19 21 The objective of an interventional study is to determine the effect of an intervention on outcomes and is often known as randomized or non-randomized clinical trials in medical research. In the intervention objective model, all variables other than the intervention are treated as nuisance variables for primary analyses. The goal is to present the direct effect of the intervention on the outcomes by eliminating biases. 22–24 In the predictive study, the goal is to determine an optimum set of variables that can predict the outcome, particularly in external settings. The clinical decision models are a special case of prognostic models in which high dimensional data at various levels are used for risk stratification, classification, and prediction. In this model, all variables are considered input features. The goal is to present a decision tool that has high accuracy in training, testing, and validation data sets. 20 25 Biostatisticians or applied researchers should properly discuss the intention of the study objective type before proceeding with statistical analyses. In addition, it would be a good idea to prepare a conceptual model framework regardless of study objective type to understand study concepts.

A study 26 showed a favorable effect of the beta-blocker intervention on survival outcome in patients with advanced human epidermal growth factor receptor (HER2)-negative breast cancer without adjusting for all the potential confounding effects (age or menopausal status and Eastern Cooperative Oncology Performance Status) in primary analyses or validation analyses or using a propensity score-adjusted analysis, which is an EBB preferred method for analyzing non-randomized studies. 27 Similarly, another study had the goal of developing a predictive model for prediction of Alzheimer’s disease progression. 28 However, this study did not internally or externally validate the performance of the model as per the requirement of a predictive objective study. In another study, 29 investigators were interested in determining an association between metabolic syndrome and hepatitis C virus. However, the authors did not clearly specify the outcome in the analysis and produced conflicting associations with different analyses. 30 Thus, the outcome should be clearly specified as per the study objective type.

Step 2: specify effect size measure according to study design (interpretation and practical value)

The study design provides information on the selection of study participants and the process of data collection conditioned on either exposure or outcome ( figure 2 ). The appropriate use of effect size measure, tabular presentation of results, and the level of evidence are mostly determined by the study design. 31 32 In cohort or clinical trial study designs, the participants are selected based on exposure status and are followed up for the development of the outcome. These study designs can provide multiple outcomes, produce incidence or incidence density, and are preferred to be analyzed with risk ratio (RR) or hazards models. In a case–control study, the selection of participants is conditioned on outcome status. This type of study can have only one outcome and is preferred to be analyzed with an odds ratio (OR) model. In a cross-sectional study design, there is no selection restriction on outcomes or exposures. All data are collected simultaneously and can be analyzed with a prevalence ratio model, which is mathematically equivalent to the RR model. 33 The reporting of effect size measure also depends on the study objective type. For example, predictive models typically require reporting of regression coefficients or weight of variables in the model instead of association measures, which are required in other objective types. There are agreements and disagreements between OR and RR measures. Due to the constancy and symmetricity properties of OR, some researchers prefer to use OR in studies with common events. Similarly, the collapsibility and interpretability properties of RR make it more appealing to use in studies with common events. 34 To avoid variable practice and interpretation issues with OR, it is recommended to use RR models in all studies except for case–control and nested case–control studies, where OR approximates RR and thus OR models should be used. Otherwise, investigators may report sufficient data to compute any ratio measure. Biostatisticians should educate investigators on the proper interpretation of ratio measures in the light of study design and their reporting. 34 35

An external file that holds a picture, illustration, etc.
Object name is jim-2022-002479f02.jpg

Effect size according to study design.

Investigators sometimes either inappropriately label their study design 36 37 or report effect size measures not aligned with the study design, 38 39 leading to difficulty in results interpretation and evaluation of the level of evidence. The proper labeling of study design and the appropriate use of effect size measure have substantial implications for results interpretation, including the conduct of systematic review and meta-analysis. 40 A study 31 reviewed the frequency of reporting OR instead of RR in cohort studies and randomized clinical trials (RCTs) and found that one-third of the cohort studies used an OR model, whereas 5% of RCTs used an OR model. The majority of estimated ORs from these studies had a 20% or higher deviation from the corresponding RR.

Step 3: specify study hypothesis, reporting of p values, and interval estimates (interpretation and decision)

The clinical hypothesis provides information for evaluating formal claims specified in the study objectives, while the statistical hypothesis provides information about the population parameters/statistics being used to test the formal claims. The inference about the study hypothesis is typically measured by p value and confidence interval (CI). A smaller p value indicates that the data support against the null hypothesis. Since the p value is a conditional probability, it can never tell about the acceptance or rejection of the null hypothesis. Therefore, multiple alternative strategies of p values have been proposed to strengthen the credibility of conclusions. 41 42 Adaption of these alternative strategies is only needed in the explanatory objective studies. Although exact p values are recommended to be reported in research studies, p values do not provide any information about the effect size. Compared with p values, the CI provides a confidence range of the effect size that contains the true effect size if the study were repeated and can be used to determine whether the results are statistically significant or not. 43 Both p value and 95% CI provide complementary information and thus need to be specified in the statistical analysis section. 24 44

Researchers often test one or more comparisons or hypotheses. Accordingly, the side and the level of significance for considering results to be statistically significant may change. Furthermore, studies may include more than one primary outcome that requires an adjustment in the level of significance for multiplicity. All studies should provide the interval estimate of the effect size/regression coefficient in the primary analyses. Since the interpretation of data analysis depends on the study hypothesis, researchers are required to specify the level of significance along with the side (one-sided or two-sided) of the p value in the test for considering statistically significant results, adjustment of the level of significance due to multiple comparisons or multiplicity, and reporting of interval estimates of the effect size in the statistical analysis section. 45

A study 46 showed a significant effect of fluoxetine on relapse rates in obsessive-compulsive disorder based on a one-sided p value of 0.04. Clearly, there was no reason for using a one-sided p value as opposed to a two-sided p value. A review of the appropriate use of multiple test correction methods in multiarm clinical trials published in major medical journals in 2012 identified over 50% of the articles did not perform multiple-testing correction. 47 Similar to controlling a familywise error rate due to multiple comparisons, adjustment of the false discovery rate is also critical in studies involving multiple related outcomes. A review of RCTs for depression between 2007 and 2008 from six journals reported that only limited studies (5.8%) accounted for multiplicity in the analyses due to multiple outcomes. 48

Step 4: account for DGP in the statistical analysis (accuracy)

The study design also requires the specification of the selection of participants and outcome measurement processes in different design settings. We referred to this specific design feature as DGP. Understanding DGP helps in determining appropriate modeling of outcome distribution in statistical analyses and setting up model premises and units of analysis. 4 DGP ( figure 3 ) involves information on data generation and data measures, including the number of measurements after random selection, complex selection, consecutive selection, pragmatic selection, or systematic selection. Specifically, DGP depends on a sampling setting (participants are selected using survey sampling methods and one subject may represent multiple participants in the population), clustered setting (participants are clustered through a recruitment setting or hierarchical setting or multiple hospitals), pragmatic setting (participants are selected through mixed approaches), or systematic review setting (participants are selected from published studies). DGP also depends on the measurements of outcomes in an unpaired setting (measured on one occasion only in independent groups), paired setting (measured on more than one occasion or participants are matched on certain subject characteristics), or mixed setting (measured on more than one occasion but interested in comparing independent groups). It also involves information regarding outcomes or exposure generation processes using quantitative or categorical variables, quantitative values using labs or validated instruments, and self-reported or administered tests yielding a variety of data distributions, including individual distribution, mixed-type distribution, mixed distributions, and latent distributions. Due to different DGPs, study data may include messy or missing data, incomplete/partial measurements, time-varying measurements, surrogate measures, latent measures, imbalances, unknown confounders, instrument variables, correlated responses, various levels of clustering, qualitative data, or mixed data outcomes, competing events, individual and higher-level variables, etc. The performance of statistical analysis, appropriate estimation of standard errors of estimates and subsequently computation of p values, the generalizability of findings, and the graphical display of data rely on DGP. Accounting for DGP in the analyses requires proper communication between investigators and biostatisticians about each aspect of participant selection and data collection, including measurements, occasions of measurements, and instruments used in the research study.

An external file that holds a picture, illustration, etc.
Object name is jim-2022-002479f03.jpg

Common features of the data generation process.

A study 49 compared the intake of fresh fruit and komatsuna juice with the intake of commercial vegetable juice on metabolic parameters in middle-aged men using an RCT. The study was criticized for many reasons, but primarily for incorrect statistical methods not aligned with the study DGP. 50 Similarly, another study 51 highlighted that 80% of published studies using the Korean National Health and Nutrition Examination Survey did not incorporate survey sampling structure in statistical analyses, producing biased estimates and inappropriate findings. Likewise, another study 52 highlighted the need for maintaining methodological standards while analyzing data from the National Inpatient Sample. A systematic review 53 identified that over 50% of studies did not specify whether a paired t-test or an unpaired t-test was performed in statistical analysis in the top 25% of physiology journals, indicating poor transparency in reporting of statistical analysis as per the data type. Another study 54 also highlighted the data displaying errors not aligned with DGP. As per DGP, delay in treatment initiation of patients with cancer defined from the onset of symptom to treatment initiation should be analyzed into three components: patient/primary delay, secondary delay, and tertiary delay. 55 Similarly, the number of cancerous nodes should be analyzed with count data models. 56 However, several studies did not analyze such data according to DGP. 57 58

Step 5: apply EBB methods specific to study design features and DGP (efficiency and robustness)

The continuous growth in the development of robust statistical methods for dealing with a specific problem produced various methods to analyze specific data types. Since multiple methods are available for handling a specific problem yet with varying performances, heterogeneous practices among applied researchers have been noticed. Variable practices could also be due to a lack of consensus on statistical methods in literature, unawareness, and the unavailability of standardized statistical guidelines. 2 5 59 However, it becomes sometimes difficult to differentiate whether a specific method was used due to its robustness, lack of awareness, lack of accessibility of statistical software to apply an alternative appropriate method, intention to produce expected results, or ignorance of model diagnostics. To avoid heterogeneous practices, the selection of statistical methodology and their reporting at each stage of data analysis should be conducted using methods according to EBB practice. 5 Since it is hard for applied researchers to optimally select statistical methodology at each step, we encourage investigators to involve biostatisticians at the very early stage in basic, clinical, population, translational, and database research. We also appeal to biostatisticians to develop guidelines, checklists, and educational tools to promote the concept of EBB. As an effort, we developed the statistical analysis and methods in biomedical research (SAMBR) guidelines for applied researchers to use EBB methods for data analysis. 5 The EBB practice is essential for applying recent cutting-edge robust methodologies to yield accurate and unbiased results. The efficiency of statistical methodologies depends on the assumptions and DGP. Therefore, investigators may attempt to specify the choice of specific models in the primary analysis as per the EBB.

Although details of evidence-based preferred methods are provided in the SAMBR checklists for each study design/objective, 5 we have presented a simplified version of evidence-based preferred methods for common statistical analysis ( online supplemental table 3 ). Several examples are available in the literature where inefficient methods not according to EBB practice have been used. 31 57 60

Step 6: report variable selection method in the multivariable analysis according to study objective type (unbiased)

Multivariable analysis can be used for association, prediction or classification or risk stratification, adjustment, propensity score development, and effect size estimation. 61 Some biological, clinical, behavioral, and environmental factors may directly associate or influence the relationship between exposure and outcome. Therefore, almost all health studies require multivariable analyses for accurate and unbiased interpretations of findings ( figure 1 ). Analysts should develop an adjusted model if the sample size permits. It is a misconception that the analysis of RCT does not require adjusted analysis. Analysis of RCT may require adjustment for prognostic variables. 23 The foremost step in model building is the entry of variables after finalizing the appropriate parametric or non-parametric regression model. In the exploratory model building process due to no preference of exposures, a backward automated approach after including any variables that are significant at 25% in the unadjusted analysis can be used for variable selection. 62 63 In the association model, a manual selection of covariates based on the relevance of the variables should be included in a fully adjusted model. 63 In a causal model, clinically guided methods should be used for variable selection and their adjustments. 20 In a non-randomized interventional model, efforts should be made to eliminate confounding effects through propensity score methods and the final propensity score-adjusted multivariable model may adjust any prognostic variables, while a randomized study simply should adjust any prognostic variables. 27 Maintaining the event per variable (EVR) is important to avoid overfitting in any type of modeling; therefore, screening of variables may be required in some association and explanatory studies, which may be accomplished using a backward stepwise method that needs to be clarified in the statistical analyses. 10 In a predictive study, a model with an optimum set of variables producing the highest accuracy should be used. The optimum set of variables may be screened with the random forest method or bootstrap or machine learning methods. 64 65 Different methods of variable selection and adjustments may lead to different results. The screening process of variables and their adjustments in the final multivariable model should be clearly mentioned in the statistical analysis section.

A study 66 evaluating the effect of hydroxychloroquine (HDQ) showed unfavorable events (intubation or death) in patients who received HDQ compared with those who did not (hazard ratio (HR): 2.37, 95% CI 1.84 to 3.02) in an unadjusted analysis. However, the propensity score-adjusted analyses as appropriate with the interventional objective model showed no significant association between HDQ use and unfavorable events (HR: 1.04, 95% CI 0.82 to 1.32), which was also confirmed in multivariable and other propensity score-adjusted analyses. This study clearly suggests that results interpretation should be based on a multivariable analysis only in observational studies if feasible. A recent study 10 noted that approximately 6% of multivariable analyses based on either logistic or Cox regression used an inappropriate selection method of variables in medical research. This practice was more commonly noted in studies that did not involve an expert biostatistician. Another review 61 of 316 articles from high-impact Chinese medical journals revealed that 30.7% of articles did not report the selection of variables in multivariable models. Indeed, this inappropriate practice could have been identified more commonly if classified according to the study objective type. 18 In RCTs, it is uncommon to report an adjusted analysis based on prognostic variables, even though an adjusted analysis may produce an efficient estimate compared with an unadjusted analysis. A study assessing the effect of preemptive intervention on development outcomes showed a significant effect of an intervention on reducing autism spectrum disorder symptoms. 67 However, this study was criticized by Ware 68 for not reporting non-significant results in unadjusted analyses. If possible, unadjusted estimates should also be reported in any study, particularly in RCTs. 23 68

Step 7: provide evidence for exploring effect modifiers (applicability)

Any variable that modifies the effect of exposure on the outcome is called an effect modifier or modifier or an interacting variable. Exploring the effect modifiers in multivariable analyses helps in (1) determining the applicability/generalizability of findings in the overall or specific subpopulation, (2) generating ideas for new hypotheses, (3) explaining uninterpretable findings between unadjusted and adjusted analyses, (4) guiding to present combined or separate models for each specific subpopulation, and (5) explaining heterogeneity in treatment effect. Often, investigators present adjusted stratified results according to the presence or absence of an effect modifier. If the exposure interacts with multiple variables statistically or conceptually in the model, then the stratified findings (subgroup) according to each effect modifier may be presented. Otherwise, stratified analysis substantially reduces the power of the study due to the lower sample size in each stratum and may produce significant results by inflating type I error. 69 Therefore, a multivariable analysis involving an interaction term as opposed to a stratified analysis may be presented in the presence of an effect modifier. 70 Sometimes, a quantitative variable may emerge as a potential effect modifier for exposure and an outcome relationship. In such a situation, the quantitative variable should not be categorized unless a clinically meaningful threshold is not available in the study. In fact, the practice of categorizing quantitative variables should be avoided in the analysis unless a clinically meaningful cut-off is available or a hypothesis requires for it. 71 In an exploratory objective type, any possible interaction may be obtained in a study; however, the interpretation should be guided based on clinical implications. Similarly, some objective models may have more than one exposure or intervention and the association of each exposure according to the level of other exposure should be presented through adjusted analyses as suggested in the presence of interaction effects. 70

A review of 428 articles from MEDLINE on the quality of reporting from statistical analyses of three (linear, logistic, and Cox) commonly used regression models reported that only 18.5% of the published articles provided interaction analyses, 17 even though interaction analyses can provide a lot of useful information.

Step 8: assessment of assumptions, specifically the distribution of outcome, linearity, multicollinearity, sparsity, and overfitting (reliability)

The assessment and reporting of model diagnostics are important in assessing the efficiency, validity, and usefulness of the model. Model diagnostics include satisfying model-specific assumptions and the assessment of sparsity, linearity, distribution of outcome, multicollinearity, and overfitting. 61 72 Model-specific assumptions such as normal residuals, heteroscedasticity and independence of errors in linear regression, proportionality in Cox regression, proportionality odds assumption in ordinal logistic regression, and distribution fit in other types of continuous and count models are required. In addition, sparsity should also be examined prior to selecting an appropriate model. Sparsity indicates many zero observations in the data set. 73 In the presence of sparsity, the effect size is difficult to interpret. Except for machine learning models, most of the parametric and semiparametric models require a linear relationship between independent variables and a functional form of an outcome. Linearity should be assessed using a multivariable polynomial in all model objectives. 62 Similarly, the appropriate choice of the distribution of outcome is required for model building in all study objective models. Multicollinearity assessment is also useful in all objective models. Assessment of EVR in multivariable analysis can be used to avoid the overfitting issue of a multivariable model. 18

Some review studies highlighted that 73.8%–92% of the articles published in MEDLINE had not assessed the model diagnostics of the multivariable regression models. 17 61 72 Contrary to the monotonically, linearly increasing relationship between systolic blood pressure (SBP) and mortality established using the Framingham’s study, 74 Port et al 75 reported a non-linear relationship between SBP and all-cause mortality or cardiovascular deaths by reanalysis of the Framingham’s study data set. This study identified a different threshold for treating hypertension, indicating the role of linearity assessment in multivariable models. Although a non-Gaussian distribution model may be required for modeling patient delay outcome data in cancer, 55 a study analyzed patient delay data using an ordinary linear regression model. 57 An investigation of the development of predictive models and their reporting in medical journals identified that 53% of the articles had fewer EVR than the recommended EVR, indicating over half of the published articles may have an overfitting model. 18 Another study 76 attempted to identify the anthropometric variables associated with non-insulin-dependent diabetes and found that none of the anthropometric variables were significant after adjusting for waist circumference, age, and sex, indicating the presence of collinearity. A study reported detailed sparse data problems in published studies and potential solutions. 73

Step 9: report type of primary and sensitivity analyses (consistency)

Numerous considerations and assumptions are made throughout the research processes that require assessment, evaluation, and validation. Some assumptions, executions, and errors made at the beginning of the study data collection may not be fixable 13 ; however, additional information collected during the study and data processing, including data distribution obtained at the end of the study, may facilitate additional considerations that need to be verified in the statistical analyses. Consistencies in the research findings via modifications in the outcome or exposure definition, study population, accounting for missing data, model-related assumptions, variables and their forms, and accounting for adherence to protocol in the models can be evaluated and reported in research studies using sensitivity analyses. 77 The purpose and type of supporting analyses need to be specified clearly in the statistical analyses to differentiate the main findings from the supporting findings. Sensitivity analyses are different from secondary or interim or subgroup analyses. 78 Data analyses for secondary outcomes are often referred to as secondary analyses, while data analyses of an ongoing study are called interim analyses and data analyses according to groups based on patient characteristics are known as subgroup analyses.

Almost all studies require some form of sensitivity analysis to validate the findings under different conditions. However, it is often underutilized in medical journals. Only 18%–20.3% of studies reported some forms of sensitivity analyses. 77 78 A review of nutritional trials from high-quality journals reflected that 17% of the conclusions were reported inappropriately using findings from sensitivity analyses not based on the primary/main analyses. 77

Step 10: provide methods for summarizing, displaying, and interpreting data (transparency and usability)

Data presentation includes data summary, data display, and data from statistical model analyses. The primary purpose of the data summary is to understand the distribution of outcome status and other characteristics in the total sample and by primary exposure status or outcome status. Column-wise data presentation should be preferred according to exposure status in all study designs, while row-wise data presentation for the outcome should be preferred in all study designs except for a case–control study. 24 32 Summary statistics should be used to provide maximum information on data distribution aligned with DGP and variable type. The purpose of results presentation primarily from regression analyses or statistical models is to convey results interpretation and implications of findings. The results should be presented according to the study objective type. Accordingly, the reporting of unadjusted and adjusted associations of each factor with the outcome may be preferred in the determinant objective model, while unadjusted and adjusted effects of primary exposure on the outcome may be preferred in the explanatory objective model. In prognostic models, the final predictive models may be presented in such a way that users can use models to predict an outcome. In the exploratory objective model, a final multivariable model should be reported with R 2 or area under the curve (AUC). In the association and interventional models, the assessment of internal validation is critically important through various sensitivity and validation analyses. A model with better fit indices (in terms of R 2 or AUC, Akaike information criterion, Bayesian information criterion, fit index, root mean square error) should be finalized and reported in the causal model objective study. In the predictive objective type, the model performance in terms of R 2 or AUC in training and validation data sets needs to be reported ( figure 1 ). 20 21 There are multiple purposes of data display, including data distribution using bar diagram or histogram or frequency polygons or box plots, comparisons using cluster bar diagram or scatter dot plot or stacked bar diagram or Kaplan-Meier plot, correlation or model assessment using scatter plot or scatter matrix, clustering or pattern using heatmap or line plots, the effect of predictors with fitted models using marginsplot, and comparative evaluation of effect sizes from regression models using forest plot. Although the key purpose of data display is to highlight critical issues or findings in the study, data display should essentially follow DGP and variable types and should be user-friendly. 54 79 Data interpretation heavily relies on the effect size measure along with study design and specified hypotheses. Sometimes, variables require standardization for descriptive comparison of effect sizes among exposures or interpreting small effect size, or centralization for interpreting intercept or avoiding collinearity due to interaction terms, or transformation for achieving model-related assumptions. 80 Appropriate methods of data reporting and interpretation aligned with study design, study hypothesis, and effect size measure should be specified in the statistical analysis section of research studies.

Published articles from reputed journals inappropriately summarized a categorized variable with mean and range, 81 summarized a highly skewed variable with mean and standard deviation, 57 and treated a categorized variable as a continuous variable in regression analyses. 82 Similarly, numerous examples from published studies reporting inappropriate graphical display or inappropriate interpretation of data not aligned with DGP or variable types are illustrated in a book published by Bland and Peacock. 83 84 A study used qualitative data on MRI but inappropriately presented with a Box-Whisker plot. 81 Another study reported unusually high OR for an association between high breast parenchymal enhancement and breast cancer in both premenopausal and postmenopausal women. 85 This reporting makes suspicious findings and may include sparse data bias. 86 A poor tabular presentation without proper scaling or standardization of a variable, missing CI for some variables, missing unit and sample size, and inconsistent reporting of decimal places could be easily noticed in table 4 of a published study. 29 Some published predictive models 87 do not report intercept or baseline survival estimates to use their predictive models in clinical use. Although a direct comparison of effect sizes obtained from the same model may be avoided if the units are different among variables, 35 a study had an objective to compare effect sizes across variables but the authors performed comparisons without standardization of variables or using statistical tests. 88

A sample for writing statistical analysis section in medical journals/research studies

Our primary study objective type was to develop a (select from figure 1 ) model to assess the relationship of risk factors (list critical variables or exposures) with outcomes (specify type from continuous/discrete/count/binary/polytomous/time-to-event). To address this objective, we conducted a (select from figure 2 or any other) study design to test the hypotheses of (equality or superiority or non-inferiority or equivalence or futility) or develop prediction. Accordingly, the other variables were adjusted or considered as (specify role of variables from confounders, covariates, or predictors or independent variables) as reflected in the conceptual framework. In the unadjusted or preliminary analyses as per the (select from figure 3 or any other design features) DGP, (specify EBB preferred tests from online supplemental table 3 or any other appropriate tests) were used for (specify variables and types) in unadjusted analyses. According to the EBB practice for the outcome (specify type) and DGP of (select from figure 3 or any other), we used (select from online supplemental table 1 or specify a multivariable approach) as the primary model in the multivariable analysis. We used (select from figure 1 ) variable selection method in the multivariable analysis and explored the interaction effects between (specify variables). The model diagnostics including (list all applicable, including model-related assumptions, linearity, or multicollinearity or overfitting or distribution of outcome or sparsity) were also assessed using (specify appropriate methods) respectively. In such exploration, we identified (specify diagnostic issues if any) and therefore the multivariable models were developed using (specify potential methods used to handle diagnostic issues). The other outcomes were analyzed with (list names of multivariable approaches with respective outcomes). All the models used the same procedure (or specify from figure 1 ) for variable selection, exploration of interaction effects, and model diagnostics using (specify statistical approaches) depending on the statistical models. As per the study design, hypothesis, and multivariable analysis, the results were summarized with effect size (select as appropriate or from figure 2 ) along with (specify 95% CI or other interval estimates) and considered statistically significant using (specify the side of p value or alternatives) at (specify the level of significance) due to (provide reasons for choosing a significance level). We presented unadjusted and/or adjusted estimates of primary outcome according to (list primary exposures or variables). Additional analyses were conducted for (specific reasons from step 9) using (specify methods) to validate findings obtained in the primary analyses. The data were summarized with (list summary measures and appropriate graphs from step 10), whereas the final multivariable model performance was summarized with (fit indices if applicable from step 10). We also used (list graphs) as appropriate with DGP (specify from figure 3 ) to present the critical findings or highlight (specify data issues) using (list graphs/methods) in the study. The exposures or variables were used in (specify the form of the variables) and therefore the effect or association of (list exposures or variables) on outcome should be interpreted in terms of changes in (specify interpretation unit) exposures/variables. List all other additional analyses if performed (with full details of all models in a supplementary file along with statistical codes if possible).

Concluding remarks

We highlighted 10 essential steps to be reported in the statistical analysis section of any analytical study ( figure 4 ). Adherence to minimum reporting of the steps specified in this report may enforce investigators to understand concepts and approach biostatisticians timely to apply these concepts in their study to improve the overall quality of methodological standards in grant proposals and research studies. The order of reporting information in statistical analyses specified in this report is not mandatory; however, clear reporting of analytical steps applicable to the specific study type should be mentioned somewhere in the manuscript. Since the entire approach of statistical analyses is dependent on the study objective type and EBB practice, proper execution and reporting of statistical models can be taught to the next generation of statisticians by the study objective type in statistical education courses. In fact, some disciplines ( figure 5 ) are strictly aligned with specific study objective types. Bioinformaticians are oriented in studying determinant and prognostic models toward precision medicine, while epidemiologists are oriented in studying association and causal models, particularly in population-based observational and pragmatic settings. Data scientists are heavily involved in prediction and classification models in personalized medicine. A common thing across disciplines is using biostatistical principles and computation tools to address any research question. Sometimes, one discipline expert does the part of others. 89 We strongly recommend using a team science approach that includes an epidemiologist, biostatistician, data scientist, and bioinformatician depending on the study objectives and needs. Clear reporting of data analyses as per the study objective type should be encouraged among all researchers to minimize heterogeneous practices and improve scientific quality and outcomes. In addition, we also encourage investigators to strictly follow transparent reporting and quality assessment guidelines according to the study design ( https://www.equator-network.org/ ) to improve the overall quality of the study, accordingly STROBE (Strengthening the Reporting of Observational Studies in Epidemiology) for observational studies, CONSORT (Consolidated Standards of Reporting Trials) for clinical trials, STARD (Standards for Reporting Diagnostic Accuracy Studies) for diagnostic studies, TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis OR Diagnosis) for prediction modeling, and ARRIVE (Animal Research: Reporting of In Vivo Experiments) for preclinical studies. The steps provided in this document for writing the statistical analysis section is essentially different from other guidance documents, including SAMBR. 5 SAMBR provides a guidance document for selecting evidence-based preferred methods of statistical analysis according to different study designs, while this report suggests the global reporting of essential information in the statistical analysis section according to study objective type. In this guidance report, our suggestion strictly pertains to the reporting of methods in the statistical analysis section and their implications on the interpretation of results. Our document does not provide guidance on the reporting of sample size or results or statistical analysis section for meta-analysis. The examples and reviews reported in this study may be used to emphasize the concepts and related implications in medical research.

An external file that holds a picture, illustration, etc.
Object name is jim-2022-002479f04.jpg

Summary of reporting steps, purpose, and evaluation measures in the statistical analysis section.

An external file that holds a picture, illustration, etc.
Object name is jim-2022-002479f05.jpg

Role of interrelated disciplines according to study objective type.

Acknowledgments

The author would like to thank the reviewers for their careful review and insightful suggestions.

Contributors: AKD developed the concept and design and wrote the manuscript.

Funding: The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

Competing interests: AKD is a Journal of Investigative Medicine Editorial Board member. No other competing interests declared.

Provenance and peer review: Commissioned; externally peer reviewed.

Supplemental material: This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.

Data availability statement

Ethics statements, patient consent for publication.

Not required.

  • Foundations
  • Write Paper

Search form

  • Experiments
  • Anthropology
  • Self-Esteem
  • Social Anxiety
  • Statistics >

Statistical Treatment Of Data

Statistical treatment of data is essential in order to make use of the data in the right form. Raw data collection is only one aspect of any experiment; the organization of data is equally important so that appropriate conclusions can be drawn. This is what statistical treatment of data is all about.

This article is a part of the guide:

  • Statistics Tutorial
  • Branches of Statistics
  • Statistical Analysis
  • Discrete Variables

Browse Full Outline

  • 1 Statistics Tutorial
  • 2.1 What is Statistics?
  • 2.2 Learn Statistics
  • 3 Probability
  • 4 Branches of Statistics
  • 5 Descriptive Statistics
  • 6 Parameters
  • 7.1 Data Treatment
  • 7.2 Raw Data
  • 7.3 Outliers
  • 7.4 Data Output
  • 8 Statistical Analysis
  • 9 Measurement Scales
  • 10 Variables and Statistics
  • 11 Discrete Variables

There are many techniques involved in statistics that treat data in the required manner. Statistical treatment of data is essential in all experiments, whether social, scientific or any other form. Statistical treatment of data greatly depends on the kind of experiment and the desired result from the experiment.

For example, in a survey regarding the election of a Mayor, parameters like age, gender, occupation, etc. would be important in influencing the person's decision to vote for a particular candidate. Therefore the data needs to be treated in these reference frames.

An important aspect of statistical treatment of data is the handling of errors. All experiments invariably produce errors and noise. Both systematic and random errors need to be taken into consideration.

Depending on the type of experiment being performed, Type-I and Type-II errors also need to be handled. These are the cases of false positives and false negatives that are important to understand and eliminate in order to make sense from the result of the experiment.

example of statistical treatment in research paper

Treatment of Data and Distribution

Trying to classify data into commonly known patterns is a tremendous help and is intricately related to statistical treatment of data. This is because distributions such as the normal probability distribution occur very commonly in nature that they are the underlying distributions in most medical, social and physical experiments.

Therefore if a given sample size is known to be normally distributed, then the statistical treatment of data is made easy for the researcher as he would already have a lot of back up theory in this aspect. Care should always be taken, however, not to assume all data to be normally distributed, and should always be confirmed with appropriate testing.

Statistical treatment of data also involves describing the data. The best way to do this is through the measures of central tendencies like mean , median and mode . These help the researcher explain in short how the data are concentrated. Range, uncertainty and standard deviation help to understand the distribution of the data. Therefore two distributions with the same mean can have wildly different standard deviation, which shows how well the data points are concentrated around the mean.

Statistical treatment of data is an important aspect of all experimentation today and a thorough understanding is necessary to conduct the right experiments with the right inferences from the data obtained.

  • Psychology 101
  • Flags and Countries
  • Capitals and Countries

Siddharth Kalla (Apr 10, 2009). Statistical Treatment Of Data. Retrieved Jun 01, 2024 from Explorable.com: https://explorable.com/statistical-treatment-of-data

You Are Allowed To Copy The Text

The text in this article is licensed under the Creative Commons-License Attribution 4.0 International (CC BY 4.0) .

This means you're free to copy, share and adapt any parts (or all) of the text in the article, as long as you give appropriate credit and provide a link/reference to this page.

That is it. You don't need our permission to copy the article; just include a link/reference back to this page. You can use it freely (with some kind of link), and we're also okay with people reprinting in publications like books, blogs, newsletters, course-material, papers, wikipedia and presentations (with clear attribution).

example of statistical treatment in research paper

Want to stay up to date? Follow us!

Save this course for later.

Don't have time for it all now? No problem, save it as a course and come back to it later.

Footer bottom

  • Privacy Policy

example of statistical treatment in research paper

  • Subscribe to our RSS Feed
  • Like us on Facebook
  • Follow us on Twitter

Academia.edu no longer supports Internet Explorer.

To browse Academia.edu and the wider internet faster and more securely, please take a few seconds to  upgrade your browser .

Enter the email address you signed up with and we'll email you a reset link.

  • We're Hiring!
  • Help Center

paper cover thumbnail

Chapter 3 RESEARCH AND METHODOLOGY

Profile image of Rodelito Aramay

Related Papers

Jessa Arcina

Research design is the blue print of the procedures that enable the research to test hypothesis by reaching valid conclusions about the relationship between dependent and in depend variables. It is a plan structure and strategy of research prepared to obtain answer to research questions and to control variances. Before doing the various studies on the present thesis the researcher has fixed the topic and area because it provide the entire draft of the scheme of the research staring from writing the hypothesis their operational implications to the final analysis of the data. The structural of the research is more specific as it provides the outline, the scheme the paradigm o f the operation of the variables. It presents a series of guide posts to enable the researcher to progress in the right direction it gives an outline of the

example of statistical treatment in research paper

Scholarly Communication and the Publish or Perish Pressures of Academia A volume in the Advances in Knowledge Acquisition, Transfer, and Management (AKATM) Book Series

Dr. Naresh A . Babariya , Alka V. Gohel

The most important of research methodology in research study it is necessary for a researcher to design a methodology for the problem chosen and systematically solves the problem. Formulation of the research problem is to decide on a broad subject area on which has thorough knowledge and second important responsibility in research is to compare findings, it is literature review plays an extremely important role. The literature review is part of the research process and makes a valuable contribution to almost every operational step. A good research design provides information concerning with the selection of the sample population treatments and controls to be imposed and research work cannot be undertaken without sampling. Collecting the data and create data structure as organizing the data, analyzing the data help of different statistical method, summarizing the analysis, and using these results for making judgments, decisions and predictions. Keywords: Research Problem, Economical Plan, Developing Ideas, Research Strategy, Sampling Design, Theoretical Procedures, Experimental Studies, Numerical Schemes, Statistical Techniques.

Xochitl Ortiz

The authors felt during their several years of teaching experience that students fail to understand the books written on Research Methodology because generally they are written in technical language. Since this course is not taught before the Master’s degree, the students are not familiar with its vocabulary, methodology and course contents. The authors have made an attempt to write it in very non- technical language. It has been attempted that students who try to understand the research methodology through self-learning may also find it easy. The chapters are written with that approach. Even those students who intend to attain high level of knowledge of the research methodology in social sciences will find this book very helpful in understanding the basic concepts before they read any book on research methodology. This book is useful those students who offer the Research Methodology at Post Graduation and M.Phil. Level. This book is also very useful for Ph.D. Course Work examinations.

Chisomo Mgunda

MD Ashikur Rahman

Roxannie Ibot

Second Language Learning and Teaching

Magdalena Walenta

Lester Millara

the purpose of this paper is to know what are the use of methodology in a research paper.

Asmatullah Ghayasi

RELATED PAPERS

Annales du Groupe Numismatique du Comtat et de Provence, p. 14-21

Jean-Albert CHEVILLON , Jérôme Casta

2016 24th Signal Processing and Communication Application Conference (SIU)

Pedro Ornelas

Arif Emre GÜN

El tiempo de las vacas gordas. Las cifras negras del agroextractivismo ganadero exportador en Bolivia

Marielle Cauthin

Brain Network and Modulation (BNM)

Jimmy Y Zhong, PhD

Lynne Overman

Anais do XXI Simpósio Brasileiro de Engenharia de Software (SBES 2007)

Iranian journal of microbiology

Shaukath Khanum

European Journal of Palliative Care

Mayara Floss

Leonel Preto

Teguh Huget

Diego Limon

Ziadah Amaliyah

Biopolymers

Éric Gaudreau

Theoretical Computer Science

Farid Ablayev

Jurnal Keperawatan Indonesia

Tutik sri hariyati

Chris Allender

Revista Inclusiones

Arkadiy Sedykh

  •   We're Hiring!
  •   Help Center
  • Find new research papers in:
  • Health Sciences
  • Earth Sciences
  • Cognitive Science
  • Mathematics
  • Computer Science
  • Academia ©2024

Comprehensive Understanding of DSM-5 Schizophrenia Criteria

This essay about the DSM-5 criteria for diagnosing schizophrenia explains the key symptoms and the diagnostic process. It covers positive symptoms like delusions and hallucinations, negative symptoms such as reduced emotional expression, and cognitive impairments like memory issues. For a diagnosis, two or more symptoms must be present for at least one month, with significant impairment lasting at least six months. The essay also highlights the exclusion of other conditions, the influence of cultural and developmental backgrounds, and the importance of family history. The structured criteria ensure accurate diagnosis and guide effective treatment and support strategies for individuals with schizophrenia.

How it works

Schizophrenia, a convoluted and frequently misconstrued psychological ailment, intricately alters an individual’s cognition, emotions, and conduct. The fifth iteration of the Diagnostic and Statistical Manual of Mental Disorders (DSM-5), issued by the American Psychiatric Association, furnishes an exhaustive array of parameters for discerning schizophrenia. This discourse delves into these parameters, endeavoring to elucidate the diagnostic procedure for this perplexing affliction.

The DSM-5 delineates several pivotal indicators requisite for diagnosing schizophrenia. These indicators are classified into affirmative indicators, adverse indicators, and cognitive impairments.

Affirmative indicators encompass delusions, hallucinations, chaotic ideation, and grossly chaotic or atypical motor actions. Delusions denote erroneous convictions devoid of factual basis, whereas hallucinations entail perceptual experiences devoid of actual stimuli, such as auditory hallucinations. Chaotic ideation often manifests as incoherent discourse or difficulty in maintaining a rational train of thought. Atypical motor actions can vary from agitation to catatonia, a condition typified by immobility.

Adverse indicators pertain to a decline in normative functional capacity. This encompasses diminished emotive manifestation, reduced capacity to initiate and sustain undertakings, and an overall dearth of enjoyment or engagement in life. These indicators substantially contribute to the debilitation experienced by schizophrenia sufferers and are notably arduous to ameliorate. Though less conspicuous, cognitive indicators may encompass memory deficits, attention deficits, and executive function deficits, impeding routine activities and social interactions.

For a schizophrenia diagnosis, the DSM-5 necessitates the presence of two or more of these indicators for a substantial duration within a one-month span, with at least one of them being delusions, hallucinations, or chaotic speech. Furthermore, these indicators must engender significant debilitation in one or more major domains of functioning, such as employment, interpersonal relationships, or self-care, and must persist for a minimum of six months, with at least one month featuring active-phase indicators.

The DSM-5 also underscores the necessity of excluding alternative conditions that could elucidate the indicators. For instance, schizophrenia cannot be diagnosed if the indicators stem from substance abuse, a medical ailment, or another psychological disorder, such as schizoaffective disorder or bipolar disorder with psychotic attributes. This ensures diagnostic specificity to schizophrenia and precludes misdiagnosis.

Another salient facet of the DSM-5 criteria is the contextualization within the individual’s developmental and cultural backdrop. Schizophrenia indicators may be influenced by cultural norms and personal narrative, and what may be construed as a hallucination or delusion in one culture may be a widely embraced belief in another. Clinicians are urged to incorporate these variables to avert misdiagnosis.

Furthermore, the DSM-5 acknowledges the contribution of familial history and genetics to schizophrenia development. While not a diagnostic criterion per se, a family history of schizophrenia augments the likelihood of the disorder. This underscores the significance of a comprehensive clinical assessment encompassing an exhaustive personal and familial medical narrative.

In practice, schizophrenia diagnosis is a nuanced undertaking mandating scrupulous consideration of the individual’s indicators, history, and holistic functioning. It is not a diagnosis to be made perfunctorily, as it bears profound ramifications for treatment and prognosis. Effective diagnosis and treatment stratagems necessitate a collaborative approach involving psychiatrists, psychologists, social workers, and other healthcare practitioners, alongside the patient and their kin.

In summation, the DSM-5 criteria for schizophrenia furnish a structured framework for diagnosing this intricate psychological ailment. By delineating specific indicators and ensuring the exclusion of alternative etiologies, the DSM-5 facilitates clinicians in rendering precise and dependable diagnoses. Grasping these criteria is imperative for both healthcare practitioners and individuals grappling with schizophrenia, as it guides treatment and support endeavors aimed at ameliorating the quality of life for those afflicted with this formidable condition.

Recall, this discourse serves as a springboard for contemplation and further inquiry. For bespoke assistance and to ascertain compliance with all scholarly benchmarks, contemplate availing the services of professionals at EduBirdie.

owl

Cite this page

Comprehensive Understanding of DSM-5 Schizophrenia Criteria. (2024, May 28). Retrieved from https://papersowl.com/examples/comprehensive-understanding-of-dsm-5-schizophrenia-criteria/

"Comprehensive Understanding of DSM-5 Schizophrenia Criteria." PapersOwl.com , 28 May 2024, https://papersowl.com/examples/comprehensive-understanding-of-dsm-5-schizophrenia-criteria/

PapersOwl.com. (2024). Comprehensive Understanding of DSM-5 Schizophrenia Criteria . [Online]. Available at: https://papersowl.com/examples/comprehensive-understanding-of-dsm-5-schizophrenia-criteria/ [Accessed: 1 Jun. 2024]

"Comprehensive Understanding of DSM-5 Schizophrenia Criteria." PapersOwl.com, May 28, 2024. Accessed June 1, 2024. https://papersowl.com/examples/comprehensive-understanding-of-dsm-5-schizophrenia-criteria/

"Comprehensive Understanding of DSM-5 Schizophrenia Criteria," PapersOwl.com , 28-May-2024. [Online]. Available: https://papersowl.com/examples/comprehensive-understanding-of-dsm-5-schizophrenia-criteria/. [Accessed: 1-Jun-2024]

PapersOwl.com. (2024). Comprehensive Understanding of DSM-5 Schizophrenia Criteria . [Online]. Available at: https://papersowl.com/examples/comprehensive-understanding-of-dsm-5-schizophrenia-criteria/ [Accessed: 1-Jun-2024]

Don't let plagiarism ruin your grade

Hire a writer to get a unique paper crafted to your needs.

owl

Our writers will help you fix any mistakes and get an A+!

Please check your inbox.

You can order an original essay written according to your instructions.

Trusted by over 1 million students worldwide

1. Tell Us Your Requirements

2. Pick your perfect writer

3. Get Your Paper and Pay

Hi! I'm Amy, your personal assistant!

Don't know where to start? Give me your paper requirements and I connect you to an academic expert.

short deadlines

100% Plagiarism-Free

Certified writers

IMAGES

  1. Example Statistical Treatment Of Data In Thesis

    example of statistical treatment in research paper

  2. Statistical Treatment

    example of statistical treatment in research paper

  3. Statistical Report Writing Sample No.4. Introduction

    example of statistical treatment in research paper

  4. Statistical Treatment of Data

    example of statistical treatment in research paper

  5. Statistical Treatment

    example of statistical treatment in research paper

  6. Statistical Treatment

    example of statistical treatment in research paper

VIDEO

  1. Research Methodology and Statistical Analysis Paper Mcom 2023 #questionpaper #exam #ignou #ignoumec

  2. INDEPENDENT SAMPLES T-TEST USING SPSS

  3. Selecting the Appropriate Hypothesis Test [FIL]

  4. WHAT STATISTICAL TREATMENT WILL YOU BE USING IN THE STUDY?

  5. Statistical Treatment of Data : Frequency Distribution, Measures of Central Tendancy

  6. Common problems in experiments

COMMENTS

  1. Statistical Treatment of Data

    For a statistical treatment of data example, consider a medical study that is investigating the effect of a drug on the human population. As the drug can affect different people in different ways based on parameters such as gender, age and race, the researchers would want to group the data into different subgroups based on these parameters to ...

  2. Research Paper Statistical Treatment of Data: A Primer

    Key Takeaways for Research Paper Success. When incorporating statistical treatment into a research paper, keep these best practices in mind: Clearly state the research hypothesis and variables under examination. Select reliable and valid quantitative measures for assessment. Determine appropriate sample size to achieve statistical power

  3. (PDF) Chapter 3 Research Design and Methodology

    Research Design and Methodology. Chapter 3 consists of three parts: (1) Purpose of the. study and research design, (2) Methods, and (3) Statistical. Data analysis procedure. Part one, Purpose of ...

  4. The Beginner's Guide to Statistical Analysis

    This article is a practical introduction to statistical analysis for students and researchers. We'll walk you through the steps using two research examples. The first investigates a potential cause-and-effect relationship, while the second investigates a potential correlation between variables. Example: Causal research question.

  5. Introduction to Research Statistical Analysis: An Overview of the

    Introduction. Statistical analysis is necessary for any research project seeking to make quantitative conclusions. The following is a primer for research-based statistical analysis. It is intended to be a high-level overview of appropriate statistical testing, while not diving too deep into any specific methodology.

  6. Statistical Treatment

    1. Statistical Treatment in Data Analysis. The term "statistical treatment" is a catch all term which means to apply any statistical method to your data. Treatments are divided into two groups: descriptive statistics, which summarize your data as a graph or summary statistic and inferential statistics, which make predictions and test ...

  7. Basic statistical tools in research and data analysis

    Statistical methods involved in carrying out a study include planning, designing, collecting data, analysing, drawing meaningful interpretation and reporting of the research findings. The statistical analysis gives meaning to the meaningless numbers, thereby breathing life into a lifeless data. The results and inferences are precise only if ...

  8. An Introduction to Statistics: Choosing the Correct Statistical Test

    Here, the statistical analysis involves only descriptive statistics. For example, Sridharan et al. aimed to analyze the clinical profile, species distribution, and susceptibility pattern of patients with invasive candidiasis. 3 They used descriptive statistics to express the characteristics of their study sample, including mean (and standard ...

  9. Inferential Statistics

    Inferential statistics have two main uses: making estimates about populations (for example, the mean SAT score of all 11th graders in the US). testing hypotheses to draw conclusions about populations (for example, the relationship between SAT scores and family income). Table of contents. Descriptive versus inferential statistics.

  10. Choosing the Right Statistical Test

    ANOVA and MANOVA tests are used when comparing the means of more than two groups (e.g., the average heights of children, teenagers, and adults). Predictor variable. Outcome variable. Research question example. Paired t-test. Categorical. 1 predictor. Quantitative. groups come from the same population.

  11. PDF Chapter 10. Experimental Design: Statistical Analysis of Data Purpose

    Now, if we divide the frequency with which a given mean was obtained by the total number of sample means (36), we obtain the probability of selecting that mean (last column in Table 10.5). Thus, eight different samples of n = 2 would yield a mean equal to 3.0. The probability of selecting that mean is 8/36 = 0.222.

  12. 13. Study design and choosing a statistical test

    Design. In many ways the design of a study is more important than the analysis. A badly designed study can never be retrieved, whereas a poorly analysed one can usually be reanalysed. (1) Consideration of design is also important because the design of a study will govern how the data are to be analysed. Most medical studies consider an input ...

  13. The Beginner's Guide to Statistical Analysis

    Table of contents. Step 1: Write your hypotheses and plan your research design. Step 2: Collect data from a sample. Step 3: Summarise your data with descriptive statistics. Step 4: Test hypotheses or make estimates with inferential statistics.

  14. PDF Anatomy of a Statistics Paper (with examples)

    As you read papers also notice the construction of the papers (learn from the good and bad examples). Abstract and Introduction { keys for getting readers engaged. Be gentle with your audience. Tell them your story. Writing is work { but ultimately rewarding! 13

  15. (PDF) An Overview of Statistical Data Analysis

    1 Introduction. Statistics is a set of methods used to analyze data. The statistic is present in all areas of science involving the. collection, handling and sorting of data, given the insight of ...

  16. How to write statistical analysis section in medical research

    Abstract. Reporting of statistical analysis is essential in any clinical and translational research study. However, medical research studies sometimes report statistical analysis that is either inappropriate or insufficient to attest to the accuracy and validity of findings and conclusions. Published works involving inaccurate statistical ...

  17. PDF Appendix(A( Statisticaltreatment of(data(

    The easiest method is first to calcu-‐late the standard deviation σ and to store the result in a cell. Let's say you enter the formula =STDEVP(B2:B6) in cell E5. Entering the formula =TINV(0.01,4)*E5/SQRT(5) in cell E6 returns the 99% confidence interval = 5.860814209.

  18. Descriptive Statistics

    Types of descriptive statistics. There are 3 main types of descriptive statistics: The distribution concerns the frequency of each value. The central tendency concerns the averages of the values. The variability or dispersion concerns how spread out the values are. You can apply these to assess only one variable at a time, in univariate ...

  19. Statistical Treatment of Data

    For example, in a survey regarding the election of a Mayor, parameters like age, gender, occupation, etc. would be important in influencing the person's decision to vote for a particular candidate. Therefore the data needs to be treated in these reference frames. An important aspect of statistical treatment of data is the handling of errors.

  20. (Pdf) Basic Statistical Techniques in Research

    Basic Statistical Techniques in Research 3. present, data and conditions; it is also possible to make prediction s. based on this information. It would be observed that descriptive. statistics ...

  21. Chapter 3

    This chapter reveals the methods of research to be employed by the researcher in conducting the study which includes the research design, population of the study, research instrument and its development establishing its validity and reliability, data gathering procedures, and the appropriate statistical treatment of data. Research Design

  22. Chapter 3 RESEARCH AND METHODOLOGY

    A good research design provides information concerning with the selection of the sample population treatments and controls to be imposed and research work cannot be undertaken without sampling. Collecting the data and create data structure as organizing the data, analyzing the data help of different statistical method, summarizing the analysis ...

  23. An Introduction to t Tests

    Revised on June 22, 2023. A t test is a statistical test that is used to compare the means of two groups. It is often used in hypothesis testing to determine whether a process or treatment actually has an effect on the population of interest, or whether two groups are different from one another. t test example.

  24. Comprehensive Understanding of DSM-5 Schizophrenia Criteria

    Essay Example: Schizophrenia, a convoluted and frequently misconstrued psychological ailment, intricately alters an individual's cognition, emotions, and conduct. The fifth iteration of the Diagnostic and Statistical Manual of Mental Disorders (DSM-5), issued by the American Psychiatric Association