
Glossary


    Chapter 1 Vocabulary

    Data
    Information collected about individuals, objects, events, or conditions. Each data point represents an observation.
    Observation
    A single row or record in a dataset; one instance of data collected from an individual or object.
    Dataset
    A structured collection of data, typically organized in rows (observations) and columns (variables).
    Variable
    A characteristic that is recorded about each observation and can vary from one observation to another.
    Statistics
    The science of collecting, analyzing, interpreting, and communicating data in the presence of variability.
    Variability
    The tendency for data values to differ from one another; natural differences observed in real-world data.
    Categorical Variable
    A variable that places observations into non-numeric groups or categories.
    Numerical Variable
    A variable that represents quantifiable amounts or measurements; values are numeric and meaningful.
    Nominal Variable
    A categorical variable with no natural order (e.g., color, gender, blood type).
    Ordinal Variable
    A categorical variable with a meaningful order but inconsistent spacing between values (e.g., satisfaction level, fitness difficulty).
    Discrete Variable
    A numerical variable based on counts; values are whole numbers (e.g., number of pets, number of bedrooms).
    Continuous Variable
    A numerical variable based on measurements; values can include decimals and fall along an interval (e.g., height, income, temperature).
    Population
    The entire group of individuals or observations that you're interested in studying or learning about.
    Sample
    A subset of the population that is actually studied or measured. Used to make inferences about the population.
    Sampling
    The process of selecting individuals or units from a population to create a sample.
    Representative Sample
    A sample that accurately reflects the characteristics of the population.
    Simple Random Sample (SRS)
    A sample in which every member of the population has an equal chance of being selected.
    Stratified Sample
    A sample created by dividing the population into subgroups (strata) and randomly sampling from each subgroup.
    Cluster Sample
    A sample created by dividing the population into groups (clusters), then randomly selecting entire clusters.
    Systematic Sample
    A sample chosen by selecting every nth individual from a list or process.
    Convenience Sample
    A non-random sample chosen based on ease of access, often leading to bias.
    Bias
    Systematic error in the way a sample is collected or data is measured, leading to results that do not accurately reflect the population.
    Sampling Bias
    A specific type of bias introduced when some members of the population are less likely to be included in the sample than others.
    Convenience Bias
    Bias introduced when individuals are selected simply because they are easy to reach.
    Nonresponse Bias
    Bias that occurs when individuals selected for a survey do not respond, and their views differ from those who did respond.
    Undercoverage Bias
    Bias caused by systematically missing part of the population during sampling.
    Voluntary Response Bias
    Bias that occurs when participants choose to respond — often those with strong opinions are overrepresented.
    Random Sampling
    A method of sampling where each member of the population has a known and typically equal chance of being selected.
    Statistical Question
    A question that can be answered using data and involves variability in the answers.
    Observational Study
    A study in which researchers observe and measure variables without affecting or influencing subjects.
    Survey
    A method of collecting information from individuals by asking structured questions.
    Experiment
    A study in which researchers deliberately apply a treatment and observe the outcomes, often using random assignment.
    Treatment Group
    The group that receives the experimental condition or change in an experiment.
    Control Group
    The group in an experiment that does not receive the treatment; used as a baseline comparison.
    Confounding Variable
    An outside factor that affects both the explanatory and response variable, potentially obscuring results.
    Random Assignment
    Assigning individuals to groups in an experiment by chance to ensure fairness and reduce bias.
    Blinding
    A method used in experiments to prevent subjects or researchers from knowing which treatment was given, to reduce bias.
    Statistical Process
    A step-by-step approach to answering questions with data: ask a question, collect data, organize/summarize, analyze, interpret, communicate.
    Data Collection Plan
    A plan that outlines what data will be gathered, how it will be obtained, and what variables will be included.
    Variable of Interest
    A specific characteristic or measurement that is central to a particular study or research question.
    Semester Project
    A long-term project where students apply the full statistical process to a real-world topic — in this case, housing affordability.
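
    The sampling designs defined above can be sketched with Python's standard random module. The roster of 100 individuals and the two strata below are hypothetical, chosen only to illustrate the mechanics:

```python
import random

random.seed(42)  # fixed seed so the illustration is reproducible

# Hypothetical population: a numbered roster of 100 individuals
population = list(range(1, 101))

# Simple random sample (SRS): every individual equally likely to be chosen
srs = random.sample(population, 10)

# Systematic sample: every 10th individual after a random starting point
start = random.randrange(10)
systematic = population[start::10]

# Stratified sample: divide into two strata, then draw an SRS from each
stratum_a = population[:50]   # e.g., first-year students (hypothetical)
stratum_b = population[50:]   # e.g., returning students (hypothetical)
stratified = random.sample(stratum_a, 5) + random.sample(stratum_b, 5)
```

    A convenience sample, by contrast, has no random step at all, which is why it tends to introduce bias.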

    Chapter 2 Vocabulary

    Measure of Center
    A single number that represents the “typical” or central value in a dataset. Includes the mean, median, and mode.
    Mean (Arithmetic Average)
    The sum of all values divided by the number of values. Sensitive to outliers. Symbol: \( \bar{x} \)
    Median
    The middle value when the data are sorted. If there’s an even number of values, it’s the average of the two middle ones. Resistant to outliers.
    Mode
    The value (or values) that occur most often in a dataset. There can be no mode, one mode, or multiple modes.
    Measure of Spread
    A statistic that describes how much the values vary around the center. Includes range, IQR, and standard deviation.
    Range
    The difference between the maximum and minimum values in a dataset. Sensitive to outliers.
    Interquartile Range (IQR)
    The range of the middle 50% of values. Calculated as \( Q3 - Q1 \); resistant to extreme values.
    Standard Deviation
    A measure of the typical distance between data values and the mean; it is the square root of the variance. Uses all values in the dataset and is sensitive to outliers.
    Variance
    The average of the squared distances from the mean. Used as an intermediate step when calculating standard deviation.
    Percentile
    A value that separates a certain percent of the data. For example, the 80th percentile is the value that 80% of the dataset lies below.
    Quartiles
    Special percentiles that divide the data into quarters:
    • Q1: 25th percentile
    • Q2: 50th percentile / median
    • Q3: 75th percentile
    Five-Number Summary
    A descriptive summary of a dataset using five values: minimum, Q1, median, Q3, and maximum.
    Outlier
    A data point that is unusually far from the others. Often flagged if it lies beyond 1.5 × IQR below Q1 or above Q3.
    Bessel’s Correction
    The practice of dividing by \( n - 1 \) instead of \( n \) when calculating the sample variance or standard deviation. Helps make estimates less biased when working with sample data.
    Box-and-Whisker Plot (Boxplot)
    A standardized graph that displays the five-number summary. Visually shows a dataset’s spread, skew, and potential outliers.
    Whiskers
    Lines on a boxplot that extend from Q1 to the minimum and from Q3 to the maximum — unless outliers are present, in which case they stop at the nearest non-outlier.
    Skew (in a boxplot)
    Asymmetry in the distribution. A longer whisker on one side suggests skew in that direction.
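
    Most of the summaries in this chapter are available in Python's standard statistics module. A minimal sketch, using made-up rent values (2600 is a deliberate extreme value):

```python
import statistics

# Hypothetical monthly rents, in dollars
rents = [950, 1100, 1150, 1200, 1250, 1300, 1400, 2600]

mean = statistics.mean(rents)
median = statistics.median(rents)
stdev = statistics.stdev(rents)   # sample st. dev.; divides by n - 1 (Bessel's correction)

# Quartiles; statistics.quantiles defaults to the "exclusive" method,
# so other quartile conventions may give slightly different cut points
q1, q2, q3 = statistics.quantiles(rents, n=4)
iqr = q3 - q1

# 1.5 x IQR fences flag potential outliers
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in rents if x < low_fence or x > high_fence]
```

    With these values the 2600 rent falls above the upper fence and is flagged, while the mean is pulled noticeably above the median, illustrating why the median is called resistant to outliers.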

    Chapter 3 Vocabulary

    Bar Chart
    A graph that uses rectangular bars to represent the frequencies or proportions of different categories. The height (or length) of each bar corresponds to its value. Often used with categorical data. Bars do not touch.
    Pie Chart
    A circular chart divided into slices to represent parts of a whole. Each slice shows the proportion of a category relative to the total dataset. Best used with a small number of categories.
    Histogram
    A graph that displays the distribution of quantitative data by grouping values into equal-width intervals (bins). The height of each bar represents the frequency within each interval. Bars touch to show continuity.
    Bin
    An interval used when grouping numerical data for a histogram. Each bin includes a fixed range of values (e.g., 60–70), and all bins should be of equal width unless otherwise noted.
    Dot Plot
    A simple graph where each data point is shown as a dot placed above a number line. Repeated values are stacked vertically. Ideal for small datasets.
    Stem-and-Leaf Plot
    A tabular display where quantitative values are split into stems (typically the leading digits) and leaves (the final digit). This chart preserves the original dataset while showing distribution shape.
    Frequency
    The number of times a specific value or interval appears in a dataset. Frequencies are often shown in tables or bar charts to summarize distributions.
    Relative Frequency
    The proportion or percentage of total values that fall in a particular category or bin. Calculated as: frequency ÷ total sample size.
    Distribution
    The way data values are spread or clustered across a range. A distribution can be described by its center, spread, shape, and presence of outliers.
    Symmetric Distribution
    A distribution where the left and right sides are mirror images of each other. Often has a single peak near the center.
    Skewed Distribution
    Asymmetry in a dataset's shape. If the right tail (larger values) is longer, it’s right-skewed; if the left tail (smaller values) is longer, it’s left-skewed.
    Uniform Distribution
    A distribution where all values (or ranges) are approximately equally frequent. Appears flat with no clear peak.
    Bimodal Distribution
    A distribution with two clear peaks (modes) in different parts of the data. May suggest the combination of two separate groups.
    Time Plot
    A graph that shows how a numeric variable changes over time. Time is placed on the x-axis, and values are plotted on the y-axis, often with points connected by a line.
    Scatter Plot
    A graph that uses points to show the relationship between two numerical variables. Used in time plots (with time on the x-axis) or to display general associations between variables. Lines may or may not be included.
    Axis (Plural: Axes)
    The reference lines in a graph. The x-axis runs horizontally (typically showing categories or time), and the y-axis runs vertically (often showing numerical values or frequency).
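
    Frequencies, relative frequencies, and histogram bins can be tabulated before any chart is drawn. A sketch with hypothetical survey responses and test scores:

```python
from collections import Counter

# Hypothetical categorical survey responses
responses = ["rent", "own", "rent", "rent", "other", "own", "rent", "own"]

freq = Counter(responses)                       # frequency of each category
total = len(responses)
rel_freq = {cat: n / total for cat, n in freq.items()}  # frequency / total

# Binning numerical data for a histogram: equal-width bins of 10,
# labeled by each bin's lower edge (e.g., 70 means the interval 70-79)
scores = [62, 67, 71, 74, 78, 81, 85, 88, 93]
bins = Counter((s // 10) * 10 for s in scores)
```

    The relative frequencies always sum to 1, which is what makes them directly comparable across datasets of different sizes.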

    Chapter 4 Vocabulary

    Probability
    The long-run relative frequency of an event. Describes the proportion of times an event would occur if a random experiment were repeated many times under the same conditions. Values range from 0 to 1.
    Relative Frequency
    The proportion of times an event occurs out of the total number of trials in a simulation or experiment. Approximates probability in practice.
    Experiment
    An activity or process with an observable result. In probability, experiments are repeatable under identical conditions but have uncertain outcomes.
    Sample Space
    The set of all possible outcomes of a random experiment. Usually denoted by the symbol \( S \) or \( \Omega \).
    Event
    A set of outcomes (subset of the sample space) that satisfies a particular condition. Events can consist of one or more outcomes.
    Complement
    The set of all outcomes in the sample space that are not part of a given event. Denoted \( A^c \). Represented as "not A".
    Mutually Exclusive Events
    Two events that cannot occur at the same time. The intersection of mutually exclusive events is the empty set; \( P(A \cap B) = 0 \).
    Addition Rule
    A formula to find the probability that at least one of two events occurs: \[ P(A \cup B) = P(A) + P(B) - P(A \cap B) \] When A and B are mutually exclusive: \[ P(A \cup B) = P(A) + P(B) \]
    Conditional Probability
    The probability that an event A occurs, given that another event B has occurred. Notated as \( P(A \mid B) \) and calculated as: \[ P(A \mid B) = \frac{P(A \cap B)}{P(B)} \quad \text{provided } P(B) > 0 \]
    Independent Events
    Two events A and B are independent if knowing that one occurred does not affect the probability of the other. Mathematically: \[ P(A \cap B) = P(A) \cdot P(B) \quad \text{or equivalently} \quad P(A \mid B) = P(A) \]
    Dependent Events
    Events for which the outcome or occurrence of one affects the probability of the other. For dependent events, \( P(A \mid B) \neq P(A) \).
    Counting Rule (Multiplication Rule of Counting)
    If one task can be done in \( m \) ways and another in \( n \) ways, then the total number of ways to complete both tasks is \( m \times n \).
    Factorial
    The product of all positive integers less than or equal to a given number \( n \). Denoted \( n! \). For example, \( 4! = 4 \times 3 \times 2 \times 1 = 24 \).
    Permutation
    An arrangement of items where order matters. The number of ways to arrange \( r \) items from \( n \) is: \[ P(n, r) = \frac{n!}{(n - r)!} \]
    Combination
    A selection of items where order does not matter. The number of combinations of \( r \) items chosen from \( n \) is: \[ C(n, r) = \binom{n}{r} = \frac{n!}{r!(n - r)!} \]
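
    The counting formulas above are built into Python's math module (math.factorial, math.perm, math.comb), and the addition rule can be checked with exact fractions. The outfit and card-deck numbers are standard textbook examples, not taken from this text:

```python
import math
from fractions import Fraction

# Counting rule: 4 shirts and 3 pants give 4 x 3 possible outfits
outfits = 4 * 3

# Factorial, permutations (order matters), combinations (order doesn't)
fact4 = math.factorial(4)        # 4 * 3 * 2 * 1
perms = math.perm(10, 3)         # 10! / (10 - 3)!
combs = math.comb(10, 3)         # 10! / (3! * 7!)

# Addition rule on a 52-card deck:
# P(heart or face card) = P(heart) + P(face) - P(heart and face)
p_union = Fraction(13, 52) + Fraction(12, 52) - Fraction(3, 52)
```

    Using Fraction keeps the arithmetic exact, so the result 11/26 is not subject to floating-point rounding.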

    Chapter 5 Vocabulary

    Random Variable
    A variable that takes numerical values determined by the outcome of a random process. Each value is associated with a probability.
    Discrete Random Variable
    A random variable with a finite or countable set of possible outcomes. Example: number of heads in 3 coin flips.
    Continuous Random Variable
    A random variable with an infinite number of possible values within a given interval. Example: time to complete a quiz.
    Probability Distribution
    A table, graph, or formula that assigns probabilities to each possible value of a random variable.
    Bernoulli Trial
    An experiment with only two possible outcomes: success or failure.
    Bernoulli Distribution
    A discrete probability distribution for a single Bernoulli trial. Only two values possible: 0 and 1.
    Binomial Distribution
    A probability distribution that models the number of successes in \( n \) independent Bernoulli trials with probability \( p \).
    Binomial Coefficient
    The number of ways to choose \( x \) successes out of \( n \) trials: \( \binom{n}{x} \).
    Probability Mass Function (PMF)
    A function that gives the probability that a discrete random variable is exactly equal to some value.
    Expected Value
    The theoretical mean of a random variable, computed as \( E(X) = \sum x \cdot P(x) \).
    Variance
    The average of the squared differences from the mean: \( \sigma^2 = \sum (x - \mu)^2 P(x) \).
    Standard Deviation
    The square root of the variance. It measures the typical distance between the data values and the mean.
    Net Gain
    The outcome of a situation after subtracting the initial cost or investment. Often used in lottery and game models.
    Normal Distribution
    A continuous, symmetric, bell-shaped distribution commonly found in natural and social processes. Defined by its mean and standard deviation.
    Density Curve
    A curve that represents the probability distribution of a continuous random variable. The area under the curve equals 1.
    Symmetric Distribution
    A distribution where the left and right halves are mirror images of each other.
    Standard Normal Distribution
    A normal distribution with mean \( \mu = 0 \) and standard deviation \( \sigma = 1 \).
    Z-score
    A standardized value that indicates how many standard deviations a data point is from the mean: \( z = \frac{x - \mu}{\sigma} \).
    Empirical Rule (68–95–99.7 Rule)
    In a normal distribution: ~68% of data fall within 1 standard deviation, ~95% within 2, and ~99.7% within 3.
    Percentile
    The pth percentile is the value below which p% of the data fall. Example: scoring at the 90th percentile means 90% scored below that point.
    Z-table (Standard Normal Table)
    A table showing the cumulative probability up to a given z-score in the standard normal distribution.
    Area under the curve
    In a normal distribution, the area under the curve corresponds to probability. The total area is 1.
    Left Tail
    The area under the normal curve to the left of a z-score. Represents \( P(Z < z) \).
    Right Tail
    The area to the right of a z-score: \( P(Z > z) = 1 - P(Z < z) \).
    Standardization
    The process of converting raw data into z-scores so values can be compared across distributions with different scales.
    Between Probability
    The probability that a random variable falls between two values. Calculated as the difference between cumulative areas.
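
    The binomial PMF, expected value, variance, and z-score definitions translate directly into code. A sketch using the 3-coin-flip example from the Discrete Random Variable entry:

```python
import math

# Binomial PMF: P(X = x) = C(n, x) * p^x * (1 - p)^(n - x)
def binom_pmf(x, n, p):
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

# Number of heads in 3 fair coin flips
n, p = 3, 0.5
dist = {x: binom_pmf(x, n, p) for x in range(n + 1)}

# Expected value E(X) = sum of x * P(x); variance = sum of (x - mu)^2 * P(x)
mean = sum(x * pr for x, pr in dist.items())
variance = sum((x - mean) ** 2 * pr for x, pr in dist.items())

# z-score: how many standard deviations a value lies from the mean
def z_score(x, mu, sigma):
    return (x - mu) / sigma
```

    For a binomial the shortcut formulas E(X) = np and Var(X) = np(1 - p) agree with the sums computed above (1.5 and 0.75 here).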

    Chapter 6 Vocabulary

    Population
    The entire group we are interested in studying or making conclusions about.
    Sample
    A subset of individuals taken from the population, used to estimate population characteristics.
    Statistic
    A number calculated from a sample (such as a sample mean or proportion) used to estimate a population parameter.
    Sampling Distribution
    The distribution of a sample statistic (such as a mean or proportion) based on repeated random samples of the same size. It shows how the statistic would vary across repeated samples.
    Sample Mean (\( \bar{x} \))
    The average value of all data points in a sample. Used to estimate the population mean.
    Sample Proportion (\( \hat{p} \))
    The proportion of observations in a sample that meet a certain condition. Used to estimate the population proportion.
    Standard Error (SE)
    The standard deviation of a sampling distribution. It measures how much a sample statistic typically varies from sample to sample.
    Standard Error of the Mean
    The standard deviation of the sampling distribution of the sample mean. Given by \( \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \).
    Standard Error of the Proportion
    The standard deviation of the sampling distribution of a sample proportion. Given by \( \sqrt{\frac{p(1 - p)}{n}} \).
    Central Limit Theorem (CLT)
    A theorem stating that the sampling distribution of the sample mean (or proportion) becomes approximately normal as the sample size increases, regardless of population distribution shape.
    Normal Approximation to the Binomial
    A method of using the normal distribution to approximate binomial probabilities when sample size is large and both \( np \) and \( n(1 - p) \) are at least 10.
    Continuity Correction
    An adjustment used when approximating a discrete binomial distribution with a continuous normal distribution. We add or subtract 0.5 to better align with the binomial probability being approximated.
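
    The standard-error formulas and the normal-approximation conditions above are short enough to compute directly; a sketch, with all numbers hypothetical:

```python
import math

# Standard error of the mean: sigma / sqrt(n)
def se_mean(sigma, n):
    return sigma / math.sqrt(n)

# Standard error of a proportion: sqrt(p(1 - p) / n)
def se_prop(p, n):
    return math.sqrt(p * (1 - p) / n)

# Rule of thumb for the normal approximation to the binomial:
# both np and n(1 - p) should be at least 10
def normal_approx_ok(n, p):
    return n * p >= 10 and n * (1 - p) >= 10

# Continuity correction: to approximate P(X <= 12) for a discrete X,
# use the normal area up to 12.5 (shift the boundary by 0.5)
corrected_boundary = 12 + 0.5
```

    Note how the sqrt(n) in the denominator makes the standard error shrink as the sample size grows, which is the practical content of the Central Limit Theorem entry.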

    Chapter 7 Vocabulary

    Point Estimate
    A single value calculated from a sample that serves as the best guess for a population parameter. Example: the sample mean \( \bar{x} \) is a point estimate of the population mean \( \mu \).
    Interval Estimate
    A range of values around a sample statistic that likely contains the population parameter. The most common type is a confidence interval.
    Confidence Interval
    An interval estimate built from a sample statistic plus or minus a margin of error. A 95% confidence interval means the method succeeds in capturing the population parameter about 95% of the time.
    Confidence Level
    The statistical confidence that the interval contains the population parameter. Common levels include 90%, 95%, and 99%.
    Sample Mean (\( \bar{x} \))
    The average value of a sample; used as a point estimate for the population mean.
    Sample Proportion (\( \hat{p} \))
    The proportion of individuals in a sample with a specific characteristic; used as a point estimate for the population proportion.
    Standard Error (SE)
    The estimated standard deviation of a sampling distribution. It measures how much the sample statistic varies from sample to sample.
    Critical Value
    The number of standard errors to extend from the sample statistic when building a confidence interval. Based on the desired confidence level (e.g., \( z^* = 1.96 \) for 95%).
    Margin of Error
    The amount added to and subtracted from the point estimate to create the confidence interval. It reflects uncertainty due to sampling variability.
    Interpretation of a Confidence Interval
    A proper interpretation states the interval together with its confidence level, e.g., "We are 95% confident that the interval from a to b captures the population parameter." The confidence describes the long-run success rate of the method, not the probability that the parameter lies in one specific interval.
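
    Putting the pieces together, a confidence interval is point estimate plus or minus critical value times standard error. A sketch for a proportion, using z* = 1.96 for 95% confidence and hypothetical survey numbers:

```python
import math

# 95% CI for a proportion: p_hat +/- z* * SE  (z* = 1.96 assumed for 95%)
def ci_proportion(p_hat, n, z_star=1.96):
    se = math.sqrt(p_hat * (1 - p_hat) / n)   # standard error
    moe = z_star * se                          # margin of error
    return p_hat - moe, p_hat + moe

# Hypothetical: 60% of a sample of 400 favor a proposal
low, high = ci_proportion(0.60, 400)
```

    Raising the confidence level to 99% would use a larger critical value (about 2.576), widening the interval; that is the usual trade-off between confidence and precision.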

    Chapter 8 Vocabulary

    Hypothesis
    A testable statement about a population parameter. Hypotheses are used to structure statistical tests and are evaluated based on sample data.
    Null Hypothesis (\( H_0 \))
    The default or starting assumption in a hypothesis test — usually a statement of “no effect,” “no change,” or “no difference.” We test this claim using data.
    Alternative Hypothesis (\( H_A \))
    The rival claim to the null. It represents the outcome we're trying to find evidence for (e.g., an increase, decrease, or difference).
    Test Statistic
    A calculated value (e.g., z, t) that measures how far the sample statistic is from the null hypothesis value. It's used to determine how extreme the result is.
    z-test for a mean
    A hypothesis test used when the population standard deviation is known. Assumes normal distribution. The test statistic is:
    \( Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} \)
    Welch’s t-test
    A two-sample test for comparing means when variances are unequal and sample sizes may differ. It uses sample standard deviations and calculates degrees of freedom based on variability.
    z-test for a proportion
    Used to test a hypothesis about a population proportion. The test statistic is:
    \( Z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} \)
    p-value
    The probability of observing a result as extreme or more extreme than your observed sample statistic, assuming the null hypothesis is true. A small p-value suggests the data are inconsistent with \( H_0 \).
    Significance Level (\( \alpha \))
    The threshold we set for how much risk of a Type I error we are willing to accept. Common values are 0.05, 0.01, and 0.10.
    Statistical Significance
    If the p-value is less than or equal to \( \alpha \), we say the result is statistically significant — meaning it is unlikely to have occurred by chance under the null.
    Practical Significance
    A result that is meaningful or impactful in the real world. Even small statistical differences may not matter practically, depending on the context.
    Type I Error
    Rejecting the null hypothesis when it is actually true. This is a false positive. The probability of this is controlled by \( \alpha \).
    Type II Error
    Failing to reject the null hypothesis when the alternative is actually true. This is a false negative.
    Power of a Test
    The probability that a test will correctly reject a false null hypothesis. It is equal to 1 minus the probability of a Type II Error.
    One-tailed Test
    A hypothesis test in which the alternative hypothesis is directional (e.g., \( p > p_0 \) or \( \mu < \mu_0 \)) — we are only interested in one direction of difference.
    Two-tailed Test
    A hypothesis test in which the alternative hypothesis tests for any difference from the null (e.g., \( \mu \ne \mu_0 \)). Both tails of the distribution are considered.
    Decision Rule
    The rule that guides whether to reject or fail to reject the null hypothesis, usually based on comparing the p-value to \( \alpha \).
    p-hacking
    A problematic practice in which researchers run many statistical analyses until they obtain a significant (p < 0.05) result, often misrepresenting the evidence.
    Effect Size
    A measure of how big a difference or change is — separate from statistical significance. A small p-value may accompany a small effect size, or vice versa.
    Standard Error
    The estimated standard deviation of a sampling distribution. Used in many test statistics, including for means and proportions.
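
    A one-sample z-test for a proportion can be run with nothing beyond math.erf, which gives the standard normal CDF. The 230-of-400 survey below is hypothetical:

```python
import math

def phi(z):
    """Standard normal cumulative distribution function, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def z_test_proportion(p_hat, p0, n, tails=2):
    """Return (test statistic, p-value) for H0: p = p0."""
    se = math.sqrt(p0 * (1 - p0) / n)      # SE under the null
    z = (p_hat - p0) / se
    p_value = tails * (1 - phi(abs(z)))    # two-tailed by default
    return z, p_value

# Hypothetical: 230 of 400 respondents (p_hat = 0.575) vs H0: p = 0.5
z, p_value = z_test_proportion(0.575, 0.5, 400)
reject = p_value <= 0.05   # decision rule at alpha = 0.05
```

    Setting tails=1 gives a one-tailed test; the decision rule is the same comparison of p-value to the chosen significance level.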

    Chapter 9 Vocabulary

    Bivariate Data
    Data that includes two quantitative variables measured for each individual in a study. Often used to examine relationships between variables.
    Scatterplot
    A graph that displays pairs of numerical data as points on a coordinate plane. Each point represents an individual’s value for two variables (x and y).
    Direction of Association
    A description of how one variable tends to change as the other changes. Associations can be positive, negative, or have no clear direction.
    Positive Association
    When values of one variable tend to increase as the values of the other variable also increase. The pattern typically slopes upward from left to right.
    Negative Association
    When values of one variable increase as the other decreases. The pattern typically slopes downward from left to right.
    Correlation (r)
    A numerical measure between –1 and 1 that describes the strength and direction of a linear relationship between two variables. A value close to ±1 indicates a strong linear relationship.
    Correlation Coefficient (r)
    Another name for the correlation measure. It quantifies linear association but does not imply causation.
    Least-Squares Regression Line (LSRL)
    The line that best fits a scatterplot and minimizes the sum of the squared residuals. It models the linear relationship and can be used for prediction.
    Regression Equation
    The formula of the form \( \hat{y} = a + bx \) where \( \hat{y} \) is the predicted value, \( a \) is the intercept, and \( b \) is the slope.
    Slope
    In a regression line, the slope tells how much the predicted value of \( y \) changes for each one-unit increase in \( x \). It represents the rate of change.
    Intercept
    The y-value predicted when \( x = 0 \) in a regression model. It may or may not be meaningful depending on the context and data range.
    Residual
    The difference between an actual data value and the value predicted by a regression model. Residual = actual – predicted.
    Interpolation
    Using a regression model to predict a value within the range of the observed data.
    Extrapolation
    Using a regression model to predict a value outside the range of the observed data. It can be risky because the relationship may not hold beyond the data range.
    Causation
    The idea that one variable has a direct effect on another. Causation implies a change in one variable produces a change in the other. It requires evidence beyond just statistical association.
    Correlation Does Not Imply Causation
    A reminder that even strong relationships between variables do not prove one causes the other. Alternate explanations or lurking variables may be involved.