Mostly Harmless Statistics Formula Packet
Chapter 3 Formulas
Sample Mean: \bar{x} = \frac{\sum x}{n} | Population Mean: \mu = \frac{\sum x}{N} |
Weighted Mean: \bar{x} = \frac{\sum (x \cdot w)}{\sum w} | Range = Max − Min |
Sample Standard Deviation: s = \sqrt{\frac{\sum (x - \bar{x})^{2}}{n-1}} | Population Standard Deviation = \sigma |
Sample Variance: s^{2} = \frac{\sum (x - \bar{x})^{2}}{n-1} | Population Variance = \sigma^{2} |
Coefficient of Variation: CVar = \left(\frac{s}{\bar{x}}\right) \cdot 100\% | Z-Score: z = \frac{x - \bar{x}}{s} |
Percentile Index: i = \frac{(n+1) \cdot p}{100} | Interquartile Range: IQR = Q_{3} - Q_{1} |
Empirical Rule: z = 1, 2, 3 \Rightarrow 68\%, 95\%, 99.7\% | Outlier Lower Limit: Q_{1} - (1.5 \cdot IQR) |
Chebyshev's Inequality: at least \left(1 - \frac{1}{z^{2}}\right) \cdot 100\% of the data lies within z standard deviations of the mean | Outlier Upper Limit: Q_{3} + (1.5 \cdot IQR) |
TI-84: Enter the data in a list and then press [STAT]. Use cursor keys to highlight CALC. Press 1 or [ENTER] to select 1:1-Var Stats. Press [2nd], then press the number key corresponding to your data list. Press [Enter] to calculate the statistics. Note: the calculator always defaults to L1 if you do not specify a data list.
s_{x} is the sample standard deviation. You can arrow down to find more statistics. Use the min and max to calculate the range by hand. To find the variance, simply square the standard deviation.
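The TI-84 output above can be cross-checked with Python's standard `statistics` module; a minimal sketch with a made-up data list:

```python
import statistics as st

# Hypothetical sample data -- any list of numbers works.
data = [2, 4, 4, 4, 5, 5, 7, 9]

xbar = st.mean(data)                 # sample mean: sum(x)/n
s = st.stdev(data)                   # sample standard deviation (n-1 denominator)
s2 = st.variance(data)               # sample variance = s**2
cvar = (s / xbar) * 100              # coefficient of variation, as a percent
z = (9 - xbar) / s                   # z-score of the observation 9
data_range = max(data) - min(data)   # Range = Max - Min
```

Note that `st.stdev` uses the n−1 denominator, matching the sample formulas; `st.pstdev` and `st.pvariance` give the population versions.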
Chapter 4 Formulas
Complement Rules: P(A) + P(A^{C}) = 1, \quad P(A) = 1 - P(A^{C}), \quad P(A^{C}) = 1 - P(A) | Mutually Exclusive Events: P(A \cap B) = 0 |
Union Rule: P(A \cup B) = P(A) + P(B) - P(A \cap B) | Independent Events: P(A \cap B) = P(A) \cdot P(B) |
Intersection Rule: P(A \cap B) = P(B) \cdot P(A|B) | Conditional Probability Rule: P(A|B) = \frac{P(A \cap B)}{P(B)} |
Fundamental Counting Rule: m_{1} \cdot m_{2} \cdots m_{n} | Factorial Rule: n! = n \cdot (n-1) \cdot (n-2) \cdots 3 \cdot 2 \cdot 1 |
Combination Rule: {}_{n}C_{r} = \frac{n!}{r!(n-r)!} | Permutation Rule: {}_{n}P_{r} = \frac{n!}{(n-r)!} |
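The counting rules map directly onto Python's `math` module; a quick sketch:

```python
from math import comb, factorial, perm

# Factorial Rule: n! = n*(n-1)*...*2*1
n_fact = factorial(5)     # 5! = 120

# Combination Rule: nCr = n!/(r!(n-r)!)  (order does not matter)
n_comb = comb(10, 3)      # 10C3 = 120

# Permutation Rule: nPr = n!/(n-r)!      (order matters)
n_perm = perm(10, 3)      # 10P3 = 720
```

Note that nPr = nCr · r!, since a permutation counts each unordered selection in every order.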
Chapter 5 Formulas
Discrete Distribution Table: 0 \leq P(x_{i}) \leq 1, \quad \sum P(x_{i}) = 1 |
Discrete Distribution Mean: \mu = \sum \left(x_{i} \cdot P(x_{i})\right) |
Discrete Distribution Variance: \sigma^{2} = \sum \left(x_{i}^{2} \cdot P(x_{i})\right) - \mu^{2} |
Discrete Distribution Standard Deviation: \sigma = \sqrt{\sigma^{2}} |
Geometric Distribution: P(X = x) = p \cdot q^{x-1}, \quad x = 1, 2, 3, \ldots |
Geometric Distribution Mean: \mu = \frac{1}{p} \quad Variance: \sigma^{2} = \frac{1-p}{p^{2}} \quad Standard Deviation: \sigma = \sqrt{\frac{1-p}{p^{2}}} |
Binomial Distribution: P(X = x) = {}_{n}C_{x} \, p^{x} \cdot q^{n-x}, \quad x = 0, 1, 2, \ldots, n |
Binomial Distribution Mean: \mu = n \cdot p \quad Variance: \sigma^{2} = n \cdot p \cdot q \quad Standard Deviation: \sigma = \sqrt{n \cdot p \cdot q} |
Hypergeometric Distribution: P(X = x) = \frac{{}_{a}C_{x} \cdot {}_{b}C_{n-x}}{{}_{N}C_{n}} |
p = P(\text{success}), \quad q = P(\text{failure}) = 1 - p, \quad n = \text{sample size}, \quad N = \text{population size} |
Unit Change for Poisson Distribution: New \mu = \text{old } \mu \left(\frac{\text{new units}}{\text{old units}}\right) |
Poisson Distribution: P(X = x) = \frac{e^{-\mu} \mu^{x}}{x!} |
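Each pmf above can be written out with only the `math` module; a sketch, with function names chosen here for illustration:

```python
from math import comb, exp, factorial

def geometric_pmf(x, p):
    # P(X=x) = p * q^(x-1), x = 1, 2, 3, ...
    return p * (1 - p) ** (x - 1)

def binomial_pmf(x, n, p):
    # P(X=x) = nCx * p^x * q^(n-x)
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def hypergeometric_pmf(x, a, b, n):
    # P(X=x) = aCx * bC(n-x) / NCn, with N = a + b
    return comb(a, x) * comb(b, n - x) / comb(a + b, n)

def poisson_pmf(x, mu):
    # P(X=x) = e^(-mu) * mu^x / x!
    return exp(-mu) * mu**x / factorial(x)
```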
P(X = x) | P(X \leq x) | P(X \geq x)
---|---|---
Is the same as | Is less than or equal to | Is greater than or equal to
Is equal to | Is at most | Is at least
Is exactly the same as | Is not greater than | Is not less than
Has not changed from | Within | Is more than or equal to
Excel: =BINOM.DIST(x,n,p,0) =HYPGEOM.DIST(x,n,a,N,0) =POISSON.DIST(x,μ,0) | Excel: =BINOM.DIST(x,n,p,1) =HYPGEOM.DIST(x,n,a,N,1) =POISSON.DIST(x,μ,1) | Excel: =1−BINOM.DIST(x−1,n,p,1) =1−HYPGEOM.DIST(x−1,n,a,N,1) =1−POISSON.DIST(x−1,μ,1)
TI Calculator: geometpdf(p,x) binompdf(n,p,x) poissonpdf(μ,x) | TI Calculator: binomcdf(n,p,x) poissoncdf(μ,x) | TI Calculator: 1−binomcdf(n,p,x−1) 1−poissoncdf(μ,x−1)
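The x−1 adjustments in the cumulative rows above are the step students most often miss; a sketch using a hand-rolled binomial cdf (n, p, and x are made up):

```python
from math import comb

def binom_pmf(k, n, p):
    # P(X = k) = nCk * p^k * q^(n-k)
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_cdf(x, n, p):
    # P(X <= x): sum the pmf from 0 through x
    return sum(binom_pmf(k, n, p) for k in range(x + 1))

n, p, x = 10, 0.3, 4
exactly = binom_pmf(x, n, p)            # P(X = 4)   "is exactly"
at_most = binom_cdf(x, n, p)            # P(X <= 4)  "is at most"
at_least = 1 - binom_cdf(x - 1, n, p)   # P(X >= 4) = 1 - P(X <= 3)  "is at least"
```

The identity P(X ≤ 4) + P(X ≥ 4) = 1 + P(X = 4) is a quick sanity check on the complement logic.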
How do you tell them apart?

P(X > x) | P(X < x)
---|---
More than | Less than
Greater than | Below
Above | Lower than
Higher than | Shorter than
Longer than | Smaller than
Bigger than | Decreased
Increased | Reduced
Excel: =1−BINOM.DIST(x,n,p,1) =1−HYPGEOM.DIST(x,n,a,N,1) =1−POISSON.DIST(x,μ,1) | Excel: =BINOM.DIST(x−1,n,p,1) =HYPGEOM.DIST(x−1,n,a,N,1) =POISSON.DIST(x−1,μ,1)
TI Calculator: 1−binomcdf(n,p,x) 1−poissoncdf(μ,x) | TI Calculator: binomcdf(n,p,x−1) poissoncdf(μ,x−1)
Chapter 6 Formulas
Uniform Distribution: f(x) = \frac{1}{b-a}, \text{ for } a \leq x \leq b \quad P(X \geq x) = P(X > x) = \left(\frac{1}{b-a}\right)(b - x) \quad P(X \leq x) = P(X < x) = \left(\frac{1}{b-a}\right)(x - a) \quad P(x_{1} \leq X \leq x_{2}) = P(x_{1} < X < x_{2}) = \left(\frac{1}{b-a}\right)(x_{2} - x_{1}) |
Exponential Distribution: f(x) = \frac{1}{\mu} e^{-x/\mu}, \text{ for } x \geq 0 \quad P(X \geq x) = P(X > x) = e^{-x/\mu} \quad P(X \leq x) = P(X < x) = 1 - e^{-x/\mu} \quad P(x_{1} \leq X \leq x_{2}) = P(x_{1} < X < x_{2}) = e^{-x_{1}/\mu} - e^{-x_{2}/\mu} |
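The uniform and exponential probability rules are simple enough to compute directly; a sketch with made-up values of a, b, μ, and the cutoffs:

```python
from math import exp

# Uniform on [a, b] (hypothetical bounds)
a, b = 0, 10
p_unif_more = (1 / (b - a)) * (b - 7)        # P(X > 7)
p_unif_less = (1 / (b - a)) * (7 - a)        # P(X < 7)
p_unif_between = (1 / (b - a)) * (9 - 4)     # P(4 < X < 9)

# Exponential with hypothetical mean mu
mu = 4
p_exp_more = exp(-2 / mu)                    # P(X > 2)
p_exp_less = 1 - exp(-2 / mu)                # P(X < 2)
p_exp_between = exp(-1 / mu) - exp(-3 / mu)  # P(1 < X < 3)
```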
Standard Normal Distribution: \mu = 0, \sigma = 1 \quad z\text{-score: } z = \frac{x - \mu}{\sigma} \quad x = z\sigma + \mu |
Central Limit Theorem z-score: z = \frac{\bar{x} - \mu}{\left(\sigma/\sqrt{n}\right)} |
In the table below, note that when \mu = 0 and \sigma = 1, use the NORM.S.DIST or NORM.S.INV function in Excel for the standard normal distribution.
P(X \leq x) or P(X < x) | P(x_{1} < X < x_{2}) or P(x_{1} \leq X \leq x_{2}) | P(X \geq x) or P(X > x)
---|---|---
Is less than or equal to | Between | Is greater than or equal to
Is at most | | Is at least
Is not greater than | | Is not less than
Within | | More than
Less than | | Greater than
Below | | Above
Lower than | | Higher than
Shorter than | | Longer than
Smaller than | | Bigger than
Decreased | | Increased
Reduced | | Larger
(area shaded in the left tail) | (area shaded between x_{1} and x_{2}) | (area shaded in the right tail)
Excel Finding a Probability: =NORM.DIST(x,μ,σ,TRUE) Finding a Percentile: =NORM.INV(area,μ,σ) | Excel Finding a Probability: =NORM.DIST(x2,μ,σ,TRUE)−NORM.DIST(x1,μ,σ,TRUE) Finding a Percentile: x1=NORM.INV((1−area)/2,μ,σ), x2=NORM.INV(1−((1−area)/2),μ,σ) | Excel Finding a Probability: =1−NORM.DIST(x,μ,σ,TRUE) Finding a Percentile: =NORM.INV(1−area,μ,σ)
TI Calculator Finding a Probability: normalcdf(−1E99,x,μ,σ) Finding a Percentile: invNorm(area,μ,σ) | TI Calculator Finding a Probability: normalcdf(x1,x2,μ,σ) Finding a Percentile: x1=invNorm((1−area)/2,μ,σ), x2=invNorm(1−((1−area)/2),μ,σ) | TI Calculator Finding a Probability: normalcdf(x,1E99,μ,σ) Finding a Percentile: invNorm(1−area,μ,σ)
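The Excel NORM.DIST / NORM.INV columns above correspond to `statistics.NormalDist` in Python; a sketch with a hypothetical population where μ = 100 and σ = 15:

```python
from statistics import NormalDist

# Hypothetical normal population
nd = NormalDist(mu=100, sigma=15)

p_less = nd.cdf(115)                   # P(X < 115), like =NORM.DIST(115,100,15,TRUE)
p_more = 1 - nd.cdf(115)               # P(X > 115), like =1-NORM.DIST(115,100,15,TRUE)
p_between = nd.cdf(115) - nd.cdf(85)   # P(85 < X < 115)
pct_90 = nd.inv_cdf(0.90)              # 90th percentile, like =NORM.INV(0.9,100,15)
```

`NormalDist()` with no arguments is the standard normal (μ = 0, σ = 1), playing the role of NORM.S.DIST / NORM.S.INV.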
Chapter 7 Formulas
Confidence Interval for One Proportion: \hat{p} \pm z_{\alpha/2} \sqrt{\left(\frac{\hat{p}\hat{q}}{n}\right)} \quad \hat{p} = \frac{x}{n}, \quad \hat{q} = 1 - \hat{p} \quad TI-84: 1\text{-PropZInt} |
Sample Size for Proportion: n = p^{*} \cdot q^{*} \left(\frac{z_{\alpha/2}}{E}\right)^{2} \quad Always round up to a whole number. If p is not given, use p^{*} = 0.5. E = Margin of Error |
Confidence Interval for One Mean: Use a z-interval when \sigma is given. Use a t-interval when s is given. If n < 30, the population needs to be normal. |
Z-Confidence Interval: \bar{x} \pm z_{\alpha/2} \left(\frac{\sigma}{\sqrt{n}}\right) \quad TI-84: ZInterval |
Z-Critical Values: Excel: z_{\alpha/2} = \text{NORM.INV}(1 - \text{area}/2, 0, 1) \quad TI-84: z_{\alpha/2} = \text{invNorm}(1 - \text{area}/2, 0, 1) |
t-Critical Values: Excel: t_{\alpha/2} = \text{T.INV}(1 - \text{area}/2, df) \quad TI-84: t_{\alpha/2} = \text{invT}(1 - \text{area}/2, df) |
t-Confidence Interval: \bar{x} \pm t_{\alpha/2} \left(\frac{s}{\sqrt{n}}\right), \quad df = n - 1 \quad TI-84: TInterval |
Sample Size for Mean: n = \left(\frac{z_{\alpha/2} \cdot \sigma}{E}\right)^{2} \quad Always round up to a whole number. E = Margin of Error |
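A z-interval and the sample-size formula can be sketched together; the summary statistics and the 0.03 margin of error are hypothetical:

```python
from math import ceil, sqrt
from statistics import NormalDist

conf = 0.95
alpha = 1 - conf
z = NormalDist().inv_cdf(1 - alpha / 2)   # z_(alpha/2), like invNorm(1-area/2,0,1)

# Z-confidence interval for one mean (hypothetical summary statistics)
xbar, sigma, n = 50, 8, 64
E = z * sigma / sqrt(n)                   # margin of error
ci = (xbar - E, xbar + E)

# Sample size for a proportion: use p* = 0.5 when p is unknown, E = 0.03
n_needed = ceil(0.5 * 0.5 * (z / 0.03) ** 2)   # always round UP
```

`ceil` implements the "always round up" rule; with 95% confidence and E = 0.03 the familiar n = 1068 falls out.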
Chapter 8 Formulas
Hypothesis Test for One Mean Use z-test when σ is given. Use t-test when s is given. If n<30, population needs to be normal. |
Type I Error - Reject H0 when H0 is true. Type II Error - Fail to reject H0 when H0 is false. |
Z-Test: H_{0}: \mu = \mu_{0} \quad H_{1}: \mu \neq \mu_{0} \quad z = \dfrac{\bar{x} - \mu_{0}}{\left(\sigma/\sqrt{n}\right)} \quad TI-84: Z-Test |
t-Test: H_{0}: \mu = \mu_{0} \quad H_{1}: \mu \neq \mu_{0} \quad t = \dfrac{\bar{x} - \mu_{0}}{\left(s/\sqrt{n}\right)} \quad TI-84: T-Test |
z-Critical Values Excel: Two-tail: zα/2=NORM.INV(1−α/2,0,1) Right-tail: z1−α=NORM.INV(1−α,0,1) Left-tail: zα=NORM.INV(α,0,1) TI-84: Two-tail: zα/2=invNorm(1−α/2,0,1) Right-tail: z1−α=invNorm(1−α,0,1) Left-tail: zα=invNorm(α,0,1) |
t-Critical Values Excel: Two-tail: tα/2=T.INV(1−α/2,df) Right-tail: t1−α=T.INV(1−α,df) Left-tail: tα=T.INV(α,df) TI-84: Two-tail: tα/2=invT(1−α/2,df) Right-tail: t1−α=invT(1−α,df) Left-tail: tα=invT(α,df) |
Hypothesis Test for One Proportion: H_{0}: p = p_{0} \quad H_{1}: p \neq p_{0} \quad z = \dfrac{\hat{p} - p_{0}}{\sqrt{\left(\frac{p_{0} q_{0}}{n}\right)}} \quad TI-84: 1\text{-PropZTest} |
Rejection Rules: P-value method: reject H0 when the p-value ≤α. Critical value method: reject H0 when the test statistic is in the critical region (shaded tails). |
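The p-value rejection rule can be illustrated with a one-proportion z-test; the counts here are made up:

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical data: 58 successes in n = 100 trials, testing H0: p = 0.5
x, n, p0 = 58, 100, 0.5
phat = x / n
z = (phat - p0) / sqrt(p0 * (1 - p0) / n)      # test statistic

p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-tailed p-value
reject_h0 = p_value <= 0.05                    # p-value method: reject when p-value <= alpha
```

For a right-tailed test the p-value would be `1 - NormalDist().cdf(z)`; for a left-tailed test, `NormalDist().cdf(z)`.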
Two-tailed Test | Right-tailed Test | Left-tailed Test
---|---|---
H_{0}: \mu = \mu_{0} or H_{0}: p = p_{0}; H_{1}: \mu \neq \mu_{0} or H_{1}: p \neq p_{0} | H_{0}: \mu = \mu_{0} or H_{0}: p = p_{0}; H_{1}: \mu > \mu_{0} or H_{1}: p > p_{0} | H_{0}: \mu = \mu_{0} or H_{0}: p = p_{0}; H_{1}: \mu < \mu_{0} or H_{1}: p < p_{0}
(critical region shaded in both tails) | (critical region shaded in the right tail) | (critical region shaded in the left tail)
Claim is in the Null Hypothesis | ||
---|---|---|
= | ≤ | ≥ |
Is equal to | Is less than or equal to | Is greater than or equal to |
Is exactly the same as | Is at most | Is at least |
Has not changed from | Is not more than | Is not less than |
Is the same as | Within | Is more than or equal to |
Claim is in the Alternative Hypothesis | ||
---|---|---|
≠ | > | < |
Is not | More than | Less than |
Is not equal to | Greater than | Below |
Is different from | Above | Lower than |
Has changed from | Higher than | Shorter than |
Is not the same as | Longer than | Smaller than |
| Bigger than | Decreased
| Increased | Reduced
Chapter 9 Formulas
Hypothesis Test for Two Dependent Means: H_{0}: \mu_{D} = 0 \quad H_{1}: \mu_{D} \neq 0 \quad t = \dfrac{\bar{D} - \mu_{D}}{\left(s_{D}/\sqrt{n}\right)} \quad TI-84: T-Test |
Confidence Interval for Two Dependent Means: \bar{D} \pm t_{\alpha/2} \left(\frac{s_{D}}{\sqrt{n}}\right) \quad TI-84: TInterval |
Hypothesis Test for Two Independent Means Z-Test: H_{0}: \mu_{1} = \mu_{2} \quad H_{1}: \mu_{1} \neq \mu_{2} \quad z = \dfrac{\left(\bar{x}_{1} - \bar{x}_{2}\right) - \left(\mu_{1} - \mu_{2}\right)_{0}}{\sqrt{\left(\frac{\sigma_{1}^{2}}{n_{1}} + \frac{\sigma_{2}^{2}}{n_{2}}\right)}} \quad TI-84: 2\text{-SampZTest} |
Confidence Interval for Two Independent Means Z-Interval: \left(\bar{x}_{1} - \bar{x}_{2}\right) \pm z_{\alpha/2} \sqrt{\left(\frac{\sigma_{1}^{2}}{n_{1}} + \frac{\sigma_{2}^{2}}{n_{2}}\right)} \quad TI-84: 2\text{-SampZInt} |
Hypothesis Test for Two Independent Means: H_{0}: \mu_{1} = \mu_{2} \quad H_{1}: \mu_{1} \neq \mu_{2} \quad T-Test (assume variances are unequal): t = \dfrac{\left(\bar{x}_{1} - \bar{x}_{2}\right) - \left(\mu_{1} - \mu_{2}\right)_{0}}{\sqrt{\left(\frac{s_{1}^{2}}{n_{1}} + \frac{s_{2}^{2}}{n_{2}}\right)}} \quad df = \dfrac{\left(\frac{s_{1}^{2}}{n_{1}} + \frac{s_{2}^{2}}{n_{2}}\right)^{2}}{\left(\left(\frac{s_{1}^{2}}{n_{1}}\right)^{2}\left(\frac{1}{n_{1}-1}\right) + \left(\frac{s_{2}^{2}}{n_{2}}\right)^{2}\left(\frac{1}{n_{2}-1}\right)\right)} \quad TI-84: 2\text{-SampTTest} \quad T-Test (assume variances are equal): t = \dfrac{\left(\bar{x}_{1} - \bar{x}_{2}\right) - \left(\mu_{1} - \mu_{2}\right)}{\sqrt{\left(\frac{\left(n_{1}-1\right)s_{1}^{2} + \left(n_{2}-1\right)s_{2}^{2}}{n_{1}+n_{2}-2}\right)\left(\frac{1}{n_{1}} + \frac{1}{n_{2}}\right)}} \quad df = n_{1} + n_{2} - 2 |
Confidence Interval for Two Independent Means T-Interval (assume variances are unequal): \left(\bar{x}_{1} - \bar{x}_{2}\right) \pm t_{\alpha/2} \sqrt{\left(\frac{s_{1}^{2}}{n_{1}} + \frac{s_{2}^{2}}{n_{2}}\right)} \quad df = \dfrac{\left(\frac{s_{1}^{2}}{n_{1}} + \frac{s_{2}^{2}}{n_{2}}\right)^{2}}{\left(\left(\frac{s_{1}^{2}}{n_{1}}\right)^{2}\left(\frac{1}{n_{1}-1}\right) + \left(\frac{s_{2}^{2}}{n_{2}}\right)^{2}\left(\frac{1}{n_{2}-1}\right)\right)} \quad TI-84: 2\text{-SampTInt} \quad T-Interval (assume variances are equal): \left(\bar{x}_{1} - \bar{x}_{2}\right) \pm t_{\alpha/2} \sqrt{\left(\frac{\left(n_{1}-1\right)s_{1}^{2} + \left(n_{2}-1\right)s_{2}^{2}}{n_{1}+n_{2}-2}\right)\left(\frac{1}{n_{1}} + \frac{1}{n_{2}}\right)} \quad df = n_{1} + n_{2} - 2 |
Hypothesis Test for Two Proportions H_{0}: p_{1} = p_{2} H_{1}: p_{1} \neq p_{2} z = \dfrac{\left(\hat{p}_{1} - \hat{p}_{2}\right) - \left(p_{1} - p_{2}\right)}{\sqrt{ \left( \hat{p} \cdot \hat{q} \left(\frac{1}{n_{1}} + \frac{1}{n_{2}}\right) \right) }} \hat{p} = \frac{\left(x_{1} + x_{2}\right)}{\left(n_{1} + n_{2}\right)} = \frac{\left(\hat{p}_{1} \cdot n_{1} + \hat{p}_{2} \cdot n_{2}\right)}{\left(n_{1} + n_{2}\right)} \hat{q} = 1 - \hat{p} \hat{p}_{1} = \frac{x_{1}}{n_{1}}, \quad\quad \hat{p}_{2} = \frac{x_{2}}{n_{2}} TI-84: 2\text{-PropZTest} |
Confidence Interval for Two Proportions: \left(\hat{p}_{1} - \hat{p}_{2}\right) \pm z_{\alpha/2} \sqrt{\left(\frac{\hat{p}_{1}\hat{q}_{1}}{n_{1}} + \frac{\hat{p}_{2}\hat{q}_{2}}{n_{2}}\right)} \quad \hat{p}_{1} = \frac{x_{1}}{n_{1}}, \quad \hat{p}_{2} = \frac{x_{2}}{n_{2}}, \quad \hat{q}_{1} = 1 - \hat{p}_{1}, \quad \hat{q}_{2} = 1 - \hat{p}_{2} \quad TI-84: 2\text{-PropZInt} |
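The pooled-proportion test statistic can be computed directly from the formulas above; the counts are made up:

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical counts: x1 successes out of n1, x2 successes out of n2
x1, n1, x2, n2 = 40, 100, 30, 100
p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)          # pooled p-hat = (x1 + x2)/(n1 + n2)
q_pool = 1 - p_pool                     # q-hat = 1 - p-hat

z = (p1 - p2) / sqrt(p_pool * q_pool * (1 / n1 + 1 / n2))
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-tailed p-value
```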
Hypothesis Test for Two Variances: H_{0}: \sigma_{1}^{2} = \sigma_{2}^{2} \quad H_{1}: \sigma_{1}^{2} \neq \sigma_{2}^{2} \quad F = \frac{s_{1}^{2}}{s_{2}^{2}} \quad df_{\text{N}} = n_{1} - 1, \quad df_{\text{D}} = n_{2} - 1 \quad TI-84: 2\text{-SampFTest} |
Hypothesis Test for Two Standard Deviations: H_{0}: \sigma_{1} = \sigma_{2} \quad H_{1}: \sigma_{1} \neq \sigma_{2} \quad F = \frac{s_{1}^{2}}{s_{2}^{2}} \quad df_{\text{N}} = n_{1} - 1, \quad df_{\text{D}} = n_{2} - 1 \quad TI-84: 2\text{-SampFTest} |
F-Critical Values Excel: Two-tail: F_{\alpha/2} = \text{F.INV}(1 - \alpha/2, df_{\text{N}}, df_{\text{D}}) Right-tail: F_{1-\alpha} = \text{F.INV}(1 - \alpha, df_{\text{N}}, df_{\text{D}}) Left-tail: F_{\alpha} = \text{F.INV}(\alpha, df_{\text{N}}, df_{\text{D}}) |
For z and t-Critical Values refer back to Chapter 8 TI-84: invF program can be downloaded at http://www.MostlyHarmlessStatistics.com. |
Chapter 10 Formulas
Goodness of Fit Test H_{0}: p_{1} = p_{0}, p_{2} = p_{0}, \ldots, p_{k} = p_{0} H_{1}: At least one proportion is different. \chi^{2} = \sum \frac{(O-E)^{2}}{E} df = k-1, p_{0} = 1/k \text{ or given %} TI-84: \chi^{2} \text{ GOF-Test} |
Test for Independence H_{0}: Variable 1 and Variable 2 are independent. H_{1}: Variable 1 and Variable 2 are dependent. \chi^{2} = \sum \frac{(O-E)^{2}}{E} df = (R-1)(C-1) TI-84: \chi^{2} \text{-Test} |
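The χ² statistic in both tests above is a one-line sum; a sketch for a goodness-of-fit test with made-up observed counts and equally likely categories (p_0 = 1/k):

```python
# Hypothetical observed counts across k = 3 equally likely categories
observed = [25, 15, 20]
n = sum(observed)
k = len(observed)
expected = [n / k] * k                 # E = n * p0, with p0 = 1/k

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = k - 1
```

The standard library has no χ² cdf, so compare `chi2` against a critical value from a table (or Excel's CHISQ.INV.RT) with df = k − 1.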
Chapter 11 Formulas
One-Way ANOVA: H_{0}: \mu_{1} = \mu_{2} = \mu_{3} = \ldots = \mu_{k} \quad\quad k = \text{number of groups} H_{1}: At least one mean is different. F = \frac{MSB}{MSW}, \quad MSB = \frac{SSB}{k-1}, \quad MSW = \frac{SSW}{N-k} SSB = \sum n_{i} \left(\bar{x}_{i} - \bar{x}_{GM}\right)^{2}, \quad SSW = \sum \left(n_{i} - 1\right) s_{i}^{2} \bar{x}_{i} = sample mean from the i^{th} group n_{i} = sample size of the i^{th} group s_{i}^{2} = sample variance from the i^{th} group N = n_{1} + n_{2} + \cdots + n_{k} \bar{x}_{GM} = \frac{\sum x_{i}}{N} |
Bonferroni test statistic: t = \dfrac{\bar{x}_{i} - \bar{x}_{j}}{\sqrt{\left( MSW \left(\frac{1}{n_{i}} + \frac{1}{n_{j}}\right) \right)}} H_{0}: \mu_{i} = \mu_{j} H_{1}: \mu_{i} \neq \mu_{j} Multiply p-value by m = {}_{k} C_{2}, divide area for critical value by m = {}_{k} C_{2} |
Two-Way ANOVA: Row Effect (Factor A): H_{0}: The row variable has no effect on the average ______________. H_{1}: The row variable has an effect on the average ______________. Column Effect (Factor B): H_{0}: The column variable has no effect on the average ______________. H_{1}: The column variable has an effect on the average ______________. Interaction Effect (A \times B): H_{0}: There is no interaction effect between the row variable and column variable on the average ______________. H_{1}: There is an interaction effect between the row variable and column variable on the average ______________. |
Chapter 12 Formulas
SS_{xx} = (n-1) s_{x}^{2} SS_{yy} = (n-1) s_{y}^{2} SS_{xy} = \sum (xy) - n \cdot \bar{x} \cdot \bar{y} |
Correlation Coefficient r = \frac{SS_{xy}}{\sqrt{\left( SS_{xx} \cdot SS_{yy} \right)}} |
Slope = b_{1} = \frac{SS_{xy}}{SS_{xx}} y-intercept = b_{0} = \bar{y} - b_{1} \bar{x} Regression Equation (Line of Best Fit): \hat{y} = b_{0} + b_{1} x |
Correlation t-test H_{0}: \rho = 0; \ H_{1}: \rho \neq 0 \quad\quad\quad t = r \sqrt{\left(\frac{n-2}{1-r^{2}}\right)} \quad df = n-2 Slope t-test H_{0}: \beta_{1} = 0; \ H_{1}: \beta_{1} \neq 0 \quad\quad\quad t = \frac{b_{1}}{\sqrt{\left( \frac{MSE}{SS_{xx}} \right)}} \quad df = n - p - 1 = n-2 |
Residual: e_{i} = y_{i} - \hat{y}_{i} (Residual plots should have no patterns.) Standard Error of Estimate: s_{est} = \sqrt{\frac{\sum \left(y_{i} - \hat{y}_{i}\right)^{2}}{n - 2}} = \sqrt{MSE} Prediction Interval: \hat{y} \pm t_{\alpha/2} \cdot s_{est} \sqrt{\left(1 + \frac{1}{n} + \frac{\left(x - \bar{x}\right)^{2}}{SS_{xx}}\right)} |
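The SS formulas feed the correlation and regression coefficients directly; a sketch with a small made-up data set:

```python
from math import sqrt

# Hypothetical paired data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n

ss_xx = sum((x - xbar) ** 2 for x in xs)                # SSxx = (n-1) * s_x^2
ss_yy = sum((y - ybar) ** 2 for y in ys)                # SSyy = (n-1) * s_y^2
ss_xy = sum(x * y for x, y in zip(xs, ys)) - n * xbar * ybar   # SSxy

r = ss_xy / sqrt(ss_xx * ss_yy)   # correlation coefficient
b1 = ss_xy / ss_xx                # slope
b0 = ybar - b1 * xbar             # y-intercept; y-hat = b0 + b1*x
```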
Slope/Model F-test: H_{0}: \beta_{1} = 0; \ H_{1}: \beta_{1} \neq 0 \quad F = \frac{MSR}{MSE} \quad df = (1, n-2) |
Multiple Linear Regression Equation \hat{y} = b_{0} + b_{1} x_{1} + b_{2} x_{2} + \cdots + b_{p} x_{p} |
Coefficient of Determination R^{2} = (r)^{2} = \frac{SSR}{SST} |
Model F-Test for Multiple Regression H_{0}: \beta_{1} = \beta_{2} = \cdots = \beta_{p} = 0 \quad H_{1}: At least one slope is not zero. |
Adjusted Coefficient of Determination R_{adj}^{2} = 1 - \left(\frac{\left(1 - R^{2}\right) (n-1)}{(n - p - 1)}\right) |
Chapter 13 Formulas
Ranking Data: Sort the data from smallest to largest and assign ranks 1, 2, 3, \ldots; tied values each receive the average of the ranks they would occupy.
Sign Test: H_{0}: \text{Median} = MD_{0} \quad H_{1}: \text{Median} \neq MD_{0}
Wilcoxon Signed-Rank Test n is the sample size not including a difference of 0. When n < 30, use test statistic w_{s}, which is the absolute value of the smaller of the sum of ranks. CV uses table below. If critical value is not in table then use an online calculator: http://www.socscistatistics.com/tests/signedranks When n \geq 30, use z-test statistic: z = \frac{\left(w_{s} - \left(\frac{n (n+1)}{4}\right) \right)}{\sqrt{\left( \frac{n(n+1)(2n+1)}{24} \right)}} |
Mann-Whitney U Test When n_{1} \leq 20 and n_{2} \leq 20 U_{1} = R_{1} - \frac{n_{1} \left(n_{1}+1\right)}{2}, \ U_{2} = R_{2} - \frac{n_{2} \left(n_{2}+1\right)}{2}. U = \text{Min} \left(U_{1}, U_{2}\right) CV uses tables below. If critical value is not in tables then use an online calculator: https://www.socscistatistics.com/tests/mannwhitney/default.aspx When n_{1} > 20 and n_{2} > 20, use z-test statistic: z = \frac{\left( U - \left(\frac{n_{1} \cdot n_{2}}{2}\right) \right)}{\sqrt{\left( \frac{n_{1} \cdot n_{2} \left(n_{1} + n_{2} + 1\right)}{12} \right)}} |