7.5: Odds ratio

    Introduction

    We introduced the concept of odds in 7.1: Epidemiology definitions. As a reminder, odds are a way to communicate the chance (likelihood) that a particular event will take place. Odds are calculated as the number of individuals with the event divided by the number of individuals without the event.

    Odds ratio is a measure of effect size for the association between two binary (yes/no) variables. It is the ratio of the odds of an event occurring in one group to the odds of the same event happening in another group. The odds ratio (OR) is a way to quantify the strength of association between one condition and another.

    Note:

    Effect size — the size of the difference between groups — is discussed further in Chapter 9.2 and Chapter 11.4.

    How are odds ratios calculated? The probabilities are conditional; recall finding the conditional probability of some event A, given the occurrence of some other event B.

    Let \(p_{y-y}\) equal the probability that the event occurs in both A and B (y = Yes), \(p_{y-n}\) the probability that it occurs in B but not in A (n = No), \(p_{n-y}\) the probability that it occurs in A but not in B, and \(p_{n-n}\) the probability that it occurs in neither. The first subscript refers to B (rows) and the second to A (columns).

                   A
                   Yes              No
    B      Yes     \(p_{y-y}\)      \(p_{y-n}\)
           No      \(p_{n-y}\)      \(p_{n-n}\)

    These sum to one: \(p_{y-y} + p_{y-n} + p_{n-y} + p_{n-n} = 1\)

    The conditional probabilities of A given B (each cell divided by its row total) are

                   A
                   Yes                                       No
    B      Yes     \(\dfrac{p_{y-y}}{p_{y-y} + p_{y-n}}\)    \(\dfrac{p_{y-n}}{p_{y-y} + p_{y-n}}\)
           No      \(\dfrac{p_{n-y}}{p_{n-y} + p_{n-n}}\)    \(\dfrac{p_{n-n}}{p_{n-y} + p_{n-n}}\)

    and finally then, the odds ratio (OR) is \[OR = \frac{p_{y-y} \cdot p_{n-n}}{p_{y-n} \cdot p_{n-y}} \nonumber\]
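
    The step from conditional probabilities to the odds ratio can be spelled out as follows. Within each row of B, the odds are the ratio of that row's two conditional probabilities, so the shared row total cancels: \[\text{odds}(A \mid B = \text{Yes}) = \frac{p_{y-y} / (p_{y-y} + p_{y-n})}{p_{y-n} / (p_{y-y} + p_{y-n})} = \frac{p_{y-y}}{p_{y-n}}, \qquad \text{odds}(A \mid B = \text{No}) = \frac{p_{n-y}}{p_{n-n}} \nonumber\]

    Dividing the first odds by the second gives \[OR = \frac{p_{y-y} / p_{y-n}}{p_{n-y} / p_{n-n}} = \frac{p_{y-y} \cdot p_{n-n}}{p_{y-n} \cdot p_{n-y}} \nonumber\]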

    If you have the raw numbers you can calculate the odds ratio directly, too.

                   A
                   Yes      No
    B      Yes     \(a\)    \(b\)
           No      \(c\)    \(d\)

    and the odds ratio is then \[OR = \frac{a \div b}{c \div d} \nonumber\]

    or, equivalently, \[OR = \frac{a \cdot d}{b \cdot c} \nonumber\]
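
    As a minimal sketch, the two forms of the formula can be checked against each other in R. The function name odds_ratio and the counts used here are made up for illustration and do not come from the source.

```r
# Hypothetical helper (not from the source): odds ratio from a 2x2 table of
# counts laid out as above, rows = B (Yes, No), columns = A (Yes, No).
odds_ratio <- function(a, b, c, d) {
  odds_row1 <- a / b   # odds of A = Yes when B = Yes
  odds_row2 <- c / d   # odds of A = Yes when B = No
  odds_row1 / odds_row2
}

# Made-up counts, purely for illustration
or_ratio_form <- odds_ratio(a = 20, b = 80, c = 10, d = 90)
or_cross_form <- (20 * 90) / (80 * 10)   # cross-product form, a*d / (b*c)
print(c(or_ratio_form, or_cross_form))
```

    Both values agree (2.25 for these counts), because dividing a/b by c/d is the same as multiplying a by d and dividing by b times c.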

    Example

    Comparing proportions is a frequent need in court. Gray (2002) provided an example from the Title IX of the Education Amendments of 1972 case Cohen v. Brown University. Under Title IX, discrimination based on gender is prohibited. The case concerned participation in collegiate athletics by women. The case data were that of the 5722 undergraduate students, 51% were women, but of the 987 athletes, only 38% were women. A mosaic plot shows these proportions graphically (Fig. \(\PageIndex{1}\), males in red bars, females in yellow bars).

    Mosaic plot of athletes to non-athletes in Brown University, divided by gender.
    Figure \(\PageIndex{1}\): Mosaic plot of athletes to non-athletes in college. Males red, females yellow, data from Gray 2002.

    Alternatively, use a Venn diagram to describe the distribution (Fig. \(\PageIndex{2}\)). Circles that overlap show regions of commonality.

    Venn Diagram showing the 5347 students as a large red oval and the 987 athletes as a smaller yellow oval. The 612 male athletes are represented by the region of the yellow oval not overlapping the red oval, and the 375 female athletes are represented by the orange region created by the overlap of the red and yellow ovals.
    Figure \(\PageIndex{2}\): Venn Diagram of athletes to non-athletes in Brown University. Female athletes \((n = 375)\), male athletes \((n = 612)\), data from Gray 2002.

    where the orange region represents \(\text{Students} \cap \text{Female Athletes}\).

    R code for the Venn diagram was

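    # Venn diagram of undergraduates and athletes (requires the VennDiagram package)
    library(VennDiagram)
    area1 = 5722        # size of the first set: all undergraduate students
    area2 = 987         # size of the second set: athletes
    cross.area = 375    # size of the overlap drawn between the sets: female athletes
    # Draw the two scaled circles, labeling each region as a percent
    draw.pairwise.venn(area1,area2,cross.area,category=c("Students","Athletes"),
    euler.d = TRUE, scaled = TRUE, inverted = FALSE, print.mode = "percent",
    fill=c("Red","Yellow"),cex = 1.5, lty="blank", cat.fontfamily = rep("sans", 2),
    cat.cex = 1.7, cat.pos = c(0, 180), ext.pos=0)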

    The question raised before the court was whether these proportions meet the demand of “substantially proportionate.” What exactly the law means by “substantially proportionate” was left to the courts and the lawyers to work out (Gray 2002). Title IX suggests that “substantially proportionate” is a statistical problem and the two sides of the argument must address the question from that perspective.

    What is the chance that an athlete was female? 38%. And the chance that an athlete was male? 62%. Clearly 38% is not 62%; did the plaintiffs have a case?

    Graphs like Figure \(\PageIndex{1}\) and Figure \(\PageIndex{2}\) help communicate but can’t provide a sense of whether the differences are important. Let’s start by looking at the numbers. Working with the proportions, we have the following breakdown for numbers of students (Table \(\PageIndex{1}\)) or as proportions (Table \(\PageIndex{2}\)).

    Table \(\PageIndex{1}\). Gray’s raw data displayed in a 2×2 format.
                               Athletes
                               Yes      No
    Undergraduates   Male      612      2192
                     Female    375      2543

    Together, the numbers total 5,722.

    The Odds Ratio (OR) would be \[OR = \frac{612 \cdot 2543}{2192 \cdot 375} = 1.89 \nonumber\]
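
    A minimal R sketch of the same calculation, using the counts from Table \(\PageIndex{1}\). The object names (gray, total_students, or_counts) are chosen here for illustration.

```r
# Counts from Table 1 (Gray 2002): rows = sex, columns = athlete status
gray <- matrix(c(612, 2192,
                 375, 2543),
               nrow = 2, byrow = TRUE,
               dimnames = list(Sex = c("Male", "Female"),
                               Athlete = c("Yes", "No")))

total_students <- sum(gray)   # all four cells together: 5722

# Cross-product form of the odds ratio, a*d / (b*c)
or_counts <- (gray["Male", "Yes"] * gray["Female", "No"]) /
             (gray["Male", "No"]  * gray["Female", "Yes"])
print(round(or_counts, 2))    # 1.89
```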

    Or from the proportions (Table \(\PageIndex{2}\)):

    Table \(\PageIndex{2}\). Data from Table \(\PageIndex{1}\) as proportions.
                               Athletes
                               Yes      No
    Undergraduates   Male      0.107    0.383
                     Female    0.066    0.444

    Adding all of these proportions together equals 1. Next, calculate the conditional probabilities within each row (Table \(\PageIndex{3}\)); these are the inputs to the odds.

    Table \(\PageIndex{3}\). Conditional probabilities calculated from Table \(\PageIndex{2}\) inputs.
                               Athletes
                               Yes                                         No
    Undergraduates   Male      0.218 \(=\frac{0.107}{0.107 + 0.383}\)      0.782 \(=\frac{0.383}{0.107 + 0.383}\)
                     Female    0.129 \(=\frac{0.066}{0.066 + 0.444}\)      0.871 \(=\frac{0.444}{0.066 + 0.444}\)

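    Calculate the odds ratio: \[OR = \frac{0.218 \cdot 0.871}{0.129 \cdot 0.782} = 1.88 \nonumber\]

    Thankfully, whether we use the raw number format or the proportion format, we get the same result, apart from a small difference (1.88 vs. 1.89) caused by rounding the proportions to three decimal places.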
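
    The joint proportions and the row-wise conditional probabilities can be reproduced from the raw counts with base R's prop.table(). This is a sketch; the object names are chosen here for illustration. Because no intermediate rounding occurs, both routes return the same odds ratio as the raw counts.

```r
# Counts from Table 1 (Gray 2002): rows = sex, columns = athlete status
gray <- matrix(c(612, 2192,
                 375, 2543),
               nrow = 2, byrow = TRUE,
               dimnames = list(Sex = c("Male", "Female"),
                               Athlete = c("Yes", "No")))

joint <- prop.table(gray)               # each cell / grand total (Table 2)
cond  <- prop.table(gray, margin = 1)   # each cell / its row total (Table 3)

# Odds ratio from either set of proportions, cross-product form
or_joint <- (joint[1, 1] * joint[2, 2]) / (joint[1, 2] * joint[2, 1])
or_cond  <- (cond[1, 1]  * cond[2, 2])  / (cond[1, 2]  * cond[2, 1])

print(round(joint, 3))
print(round(cond, 3))
print(round(c(or_joint, or_cond), 2))
```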

    Interpretation. Because the Odds Ratio (OR) is greater than 1, male students were more likely to be athletes than female students. If there were no difference in the proportion of male and female students who were athletes, the odds ratio would be close to one. Deciding whether an observed odds ratio differs from one is a question of statistical inference (e.g., a contingency table test); for now, note that if the confidence interval for the odds ratio includes one, the data are consistent with no difference between the proportions.
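
    One common way to obtain such a confidence interval is the large-sample (Wald) interval on the log odds ratio scale. The sketch below assumes that method, which is a choice made here and is not named in the source; the variable names are also illustrative.

```r
# Counts from Table 1 (Gray 2002)
male_yes   <- 612; male_no   <- 2192   # male athletes, male non-athletes
female_yes <- 375; female_no <- 2543   # female athletes, female non-athletes

# Log odds ratio and its approximate standard error
log_or    <- log((male_yes * female_no) / (male_no * female_yes))
se_log_or <- sqrt(1 / male_yes + 1 / male_no + 1 / female_yes + 1 / female_no)

# 95% interval on the log scale, back-transformed to the odds ratio scale
z  <- qnorm(0.975)
ci <- exp(log_or + c(-1, 1) * z * se_log_or)
print(round(c(OR = exp(log_or), lower = ci[1], upper = ci[2]), 2))
```

    Under this approximation the interval runs from roughly 1.6 to 2.2 and does not include one.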

    Relative risk v. odds ratio

    We introduced another way to quantify this association as the Relative Risk (RR) and Absolute Risk Reductions in the previous section. Both RR and OR can be used to describe the risk of the treatment (exposed) group relative to the control (nonexposed) group. RR is the ratio of the risk in the treated (exposed) group to the risk in the control (nonexposed) group. OR is the ratio of the odds in the treated (exposed) group to the odds in the control (nonexposed) group. What’s the difference? OR is more general: it can be used in situations in which the researcher chooses the number of affected individuals in the groups and, therefore, the base rate or prevalence of the condition in the population is not known or is not represented in the group makeup, whereas RR is appropriate when prevalence is known (this is a general point, but see Schechtman 2002 for a nice discussion).

    The odds ratio is related to relative risk, but not over the entire range of possible risk. Odds of an event is simply the number of individuals with the event divided by the number without the event. Odds of an event therefore can range from zero (event cannot occur) to infinity (event must occur). For example, an odds ratio of 1.89 (1.89:1) means that the odds of being a student athlete at Brown University were nearly twice as high for male students as for female students.

    In contrast, the risk of an event occurring is the number of individuals with the event divided by the total number of people at risk of having that event. Risk is expressed as a percentage (Davies et al 1998). Thus, for our example, odds of 1.89:1 correspond to a risk of 1.89 divided by (1 + 1.89), which equals 65%.
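
    The conversion between odds and risk can be sketched in R as follows. The helper names odds_to_risk and risk_to_odds are hypothetical and chosen here for illustration.

```r
# Hypothetical helpers (not from the source)
odds_to_risk <- function(odds) odds / (1 + odds)   # odds -> probability
risk_to_odds <- function(risk) risk / (1 - risk)   # probability -> odds

risk <- odds_to_risk(1.89)
print(round(risk, 2))        # 0.65, i.e. 65%

# Converting back recovers the original odds
odds_back <- risk_to_odds(risk)
print(odds_back)
```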

    To get the relative risk we can use \[RR = \frac{\frac{a}{a+b}}{\frac{c}{c+d}} \nonumber\]

    For our example, \(RR = \frac{612/2804}{375/2918} = 1.70\): male students were about 1.7 times as likely as female students to be athletes.
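
    A minimal R sketch of the relative risk for the counts in Table \(\PageIndex{1}\), with a = 612, b = 2192, c = 375, and d = 2543. The variable names are chosen here for illustration.

```r
# Counts from Table 1 (Gray 2002)
male_yes   <- 612; male_no   <- 2192   # a, b
female_yes <- 375; female_no <- 2543   # c, d

risk_male   <- male_yes   / (male_yes   + male_no)     # a / (a + b)
risk_female <- female_yes / (female_yes + female_no)   # c / (c + d)

rr <- risk_male / risk_female
print(round(c(risk_male, risk_female, rr), 3))
```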

    In this example we could use either odds or relative risk; the key distinction is that we knew how many events happened in both groups. If this information is missing for one group (e.g., control group of the case-control design), then only the odds ratio would be appropriate.

    From cumulative wisdom in the literature (e.g., Tamhane et al 2017), if prevalence is less than ten percent, \(OR \approx RR\). We can relate \(RR\) and \(OR\) as \[RR = OR \cdot \frac{1 + \frac{n_{2,1}}{n_{2,2}}}{1 + \frac{n_{1,1}}{n_{1,2}}} \nonumber\]

    where \(n_{1,1}\) and \(n_{2,1}\) are the frequency with the condition for group 1 and group 2, respectively, and \(n_{1,2}\) and \(n_{2,2}\) are the frequency without the condition for group 1 and group 2, respectively. For the examples on this page, group 1 is the treatment group and group 2 is the control group.
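
    The relation can be checked numerically. The sketch below applies it to the counts in Table \(\PageIndex{1}\), taking male students as group 1 and female students as group 2, and to a made-up low-prevalence table (20 of 1000 vs. 10 of 1000) to illustrate \(OR \approx RR\) when the condition is rare. The function name rr_from_or and the low-prevalence counts are assumptions introduced here.

```r
# Hypothetical helper (not from the source): RR obtained from OR via the
# relation above. n11, n21 = counts with the condition in groups 1 and 2;
# n12, n22 = counts without the condition in groups 1 and 2.
rr_from_or <- function(n11, n12, n21, n22) {
  or <- (n11 / n12) / (n21 / n22)
  or * (1 + n21 / n22) / (1 + n11 / n12)
}

# Table 1 counts: about 17% of students are athletes, so OR and RR differ
gray_or        <- (612 / 2192) / (375 / 2543)
gray_rr        <- rr_from_or(612, 2192, 375, 2543)
gray_rr_direct <- (612 / (612 + 2192)) / (375 / (375 + 2543))

# Made-up rare-condition counts (1.5% prevalence overall): OR is close to RR
rare_or <- (20 / 980) / (10 / 990)
rare_rr <- rr_from_or(20, 980, 10, 990)

print(round(c(gray_or, gray_rr, gray_rr_direct), 2))
print(round(c(rare_or, rare_rr), 2))
```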

    Hazard ratio

    The hazard ratio is the ratio of hazard rates. Hazard rates are like the relative risk rates, but are specific to a period of time. Hazard rates come from a technique called Survival Analysis (introduced in Chapter 20.9). Survival analysis can be thought of as following a group of subjects over time until something (the event) happens. By following two groups, perhaps one group exposed to a suspected carcinogen vs. another group matched in other respects except the exposure, at the end of the trial, we’ll have two hazard rates: the rate for the exposed group and the rate for the control group. If there is no difference, then the hazard ratio will be one.

    Hazard ratios are more appropriate for clinical trials; relative risk is more appropriate for observational studies.

    For a hazard ratio, it is often easier to think of it as a probability (between 0 and 1). To translate a hazard ratio to a probability, use the following equation: \[p = \frac{\text{hazard ratio}}{1 + \text{hazard ratio}} \nonumber\]


    Questions

    1. Distinguish between odds ratio, relative risk, and hazard ratio.
    2. Refer to problem 4 introduced in 7.4 – Epidemiology: Relative risk and absolute risk, explained.

    This page titled 7.5: Odds ratio is shared under a CC BY-NC-SA 4.0 license and was authored, remixed, and/or curated by Michael R Dohm via source content that was edited to the style and standards of the LibreTexts platform.