4: Distributions
By the end of this chapter, you will be able to:
- Differentiate between categorical distributions and continuous distributions
- Understand the steps for reviewing journal articles and their statistical analysis
- Understand the importance of a normal distribution for continuous variables
- Decide what to do if you have a skewed distribution
Key Terms
- Categorical Distributions
- Continuous Distributions
- Normal Distribution
- Skewed Distribution
Recap
Statistics is about studying variation, so let’s see what kind of variation we have in each of our variables. This is part of the overall process of descriptive statistics. To recap, descriptive statistics simply describe the variation in our variables, and they are the first step in any statistical analysis. Inferential statistics is the second step: conducting statistical tests to determine whether the variation in some variables predicts, or helps us understand, the variation in other variables. This is classic statistical testing, involving the common analyses: t-test, ANOVA, correlation, regression, and Chi-square.
Always do your descriptive statistics first, then your inferential statistics. The goal of descriptive statistics is to assess the quality of the variation in your variables. It is tempting to dive straight into inferential statistics and start running analyses. But just as in cooking, you want to take inventory of your ingredients, make sure they are of good quality, make sure you have the right amounts, and organize how you want to use them before you start cooking. It is the same with statistics. We need to know the quality of the variation in each of our variables before we begin analyzing them. That is why we start with descriptive statistics before we conduct our inferential statistics.
There are many ways to proceed with descriptive statistics. For statisticians, the goal of descriptive statistics is to satisfy the assumptions of the statistical test. What statisticians mean when they state this goal is they need to examine the variables and their variations to determine if these assumptions are met. If the assumptions are not met, then the statistical test cannot be conducted.
What are these assumptions? How do you meet them? How do you know if the assumptions are met or not met? What do you do if the assumptions are met? What do you do if they are not met? And if a statistical test cannot be conducted because its assumptions are not met, what happens then?
We will gather answers to these questions as we proceed. Suffice it to say, descriptive statistics is a process involving our critical thinking about how to evaluate the quality of the variations in our variables.
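As a rough illustration of the "descriptive statistics first" step described above, the following Python sketch summarizes one categorical and one continuous variable using only the standard library. The data, the variable names (`majors`, `exam_scores`), and the helper functions are hypothetical and invented for this example; they are not taken from the chapter.

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical data, invented for illustration only.
majors = ["Psychology", "Biology", "Psychology", "Nursing", "Biology", "Psychology"]
exam_scores = [72, 85, 90, 66, 78, 81]


def describe_categorical(values):
    """Return {category: (count, proportion)} for a categorical variable."""
    counts = Counter(values)
    total = len(values)
    return {category: (n, n / total) for category, n in counts.items()}


def describe_continuous(values):
    """Return the sample size, mean, and sample standard deviation."""
    return {"n": len(values), "mean": mean(values), "sd": stdev(values)}


major_summary = describe_categorical(majors)
score_summary = describe_continuous(exam_scores)

# Categorical variable: frequency counts and proportions.
for category, (n, proportion) in major_summary.items():
    print(f"{category}: n = {n} ({proportion:.0%})")

# Continuous variable: center and spread.
print(
    f"Exam scores: n = {score_summary['n']}, "
    f"mean = {score_summary['mean']:.1f}, sd = {score_summary['sd']:.1f}"
)
```

Only after looking at summaries like these would you move on to an inferential test such as a t-test or Chi-square.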
- 4.1: Distributions – A Picture of Variation
- This page emphasizes the importance of using descriptive statistics to understand variations in variables, advocating for visual representations like distribution plots for effective data analysis. By likening data examination to reviewing a picture before a date, it underscores the value of visual inspection in evaluating distributions, whether categorical or continuous.
- 4.2: Categorical Distributions
- This page discusses categorical distributions related to nominal variables, including demographic categories like gender and race. It explains that the data is represented by frequency counts and arbitrary codes, emphasizing the importance of distinguishing between codes and counts, as well as recognizing the arbitrary nature of category order. Unlike continuous distributions, categorical ones lack a specific shape.
- 4.3: Reviewing Journal Articles for Frequency Distributions for Categorical Variables
- This page discusses the importance of reviewing journal articles, focusing on statistical analyses and categorical distributions. It emphasizes using a checklist to identify the research question, assess sample demographics against the population, evaluate expected representation, and analyze anomalies. The conclusion should determine if the sample accurately reflects the population and if discrepancies impact the study’s conclusions.
- 4.4: Continuous Distributions
- This page discusses the distinctions between continuous and categorical variables, emphasizing that continuous variables (ordinal, interval, and ratio) possess meaningful numerical values that highlight relationships. It notes the importance of the distribution's shape, mentioning that independent observations of a variable often create a normal bell-shaped curve, which reflects the frequency of outcomes over time.
- 4.5: Evaluating the Quality of Frequency Distributions
- This page discusses the assessment of continuous distributions aiming for a normal distribution, characterized by a bell-shaped curve. While achieving a perfect curve is rare, a close resemblance to normality is often adequate. The evaluation process typically starts by creating a histogram, which visually represents the frequency distribution, displaying the range of the variable's values on the x-axis and the number of participants with each score (the frequency) on the y-axis.
- 4.6: Normal Distribution for Continuous Variables
- This page discusses the importance of statistics in understanding variation and significance. It emphasizes the role of normal distribution in making valid comparisons and assumptions for statistical analyses. Normal distribution supports the use of parametric tests like t-tests and ANOVA, while non-normal distributions may require non-parametric tests, highlighting the implications of distribution type on result validity.
- 4.7: Skewed Distribution – The Opposite of Normal Distribution
- This page explains skewed distributions, which occur when scores cluster at one end, resulting in a tail at the opposite end. There are two types: positive skew (scores cluster at the low end, with the tail extending to the right toward higher scores) and negative skew (scores cluster at the high end, with the tail extending to the left toward lower scores). Examples include student discipline data (positive skew) and self-esteem data (negative skew). Perfectly normal distributions are uncommon, which is why we are interested in measuring how far a distribution deviates from normality.
- 4.8: Evaluating a Continuous Distribution
- This page outlines methods for evaluating data distribution normality, emphasizing visual inspection, skewness assessment, and handling outliers. It distinguishes between types of variables, noting that Likert scales show minimal skewness while ratio variables can exhibit positive skew. The text highlights the importance of determining distribution types before statistical analyses and the need for more data if sample sizes are insufficient.
- 4.9: Discussion Questions
- This page discusses the evaluation of categorical versus continuous distributions, highlighting methods for analysis such as frequency counts and proportions for categorical data, and mean and variance for continuous data. It mentions specific techniques like the Chi-squared test and bar charts for categorical variables, and t-tests and boxplots for continuous variables.
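To make the ideas of histograms and positive versus negative skew from the section summaries above more concrete, here is a minimal Python sketch. It assumes a simple moment-based skewness coefficient (third central moment divided by the second central moment raised to the power 1.5), which is one common definition and not necessarily the one used in the sections. The data sets `referrals` and `self_esteem` are hypothetical, chosen only to mimic the discipline and self-esteem examples mentioned above, and the text histogram assumes integer scores.

```python
from collections import Counter
from statistics import mean


def skewness(values):
    """Moment-based skewness: m3 / m2**1.5 (one common definition, assumed here)."""
    m = mean(values)
    n = len(values)
    m2 = sum((x - m) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - m) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5


def text_histogram(values):
    """Build a text histogram: one row per integer score, one '#' per participant."""
    counts = Counter(values)
    rows = []
    for score in range(min(values), max(values) + 1):
        rows.append(f"{score:>2} | {'#' * counts[score]}")
    return "\n".join(rows)


# Hypothetical data, invented for illustration only.
referrals = [0, 0, 0, 0, 1, 1, 1, 2, 2, 3, 5, 9]    # most students have few referrals
self_esteem = [1, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5]  # most students score high

for name, data in [("Discipline referrals", referrals), ("Self-esteem", self_esteem)]:
    print(name)
    print(text_histogram(data))
    print(f"skewness = {skewness(data):.2f}")
    print()
```

In this sketch, the referral data produce a positive coefficient (tail toward higher scores) and the self-esteem data a negative one (tail toward lower scores), while a perfectly symmetric distribution would give a value of zero.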