9.5: Percentiles
If you have ever taken a standardized exam, you will be familiar with the concept of a percentile. Standardized exams are designed so that your grade is computed relative to a population of students who have taken the same exam. That is, your grade on a standardized exam reflects how well you did relative to the other people who took the exam. Hence, such an exam is not a reflection of your overall knowledge of a subject but is really a reflection of your knowledge of a subject relative to a specified population. Usually, these exams include two indicators of how well you did. The first is typically some type of raw score which can be difficult to interpret without specific knowledge about how the exam is graded and scored. The second indicator usually reports your percentile, which is what percentage of the population got a score equal to or below your own. Therefore, if you scored at the 80th percentile, then 80% of the population got your score or below on the exam. Knowing your percentile score lets you know how well you did relative to the population of other students who took the exam. For those who look at your scores, for example a prospective university, the percentile is interpreted as a measure of exceptionalism.
A percentile is a number that identifies the point at which a specified percentage of a set of data is less than or equal to that number.
The \(p\)th percentile of a set of data is the smallest point for which at least \(p\)% of the data is less than or equal to that point.
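In symbols, one way to write this definition is shown below. The notation is an assumption introduced here for illustration and is not taken from the text: \(x_1, \ldots, x_n\) denote the \(n\) observations and \(P_p\) denotes the \(p\)th percentile.

\[ P_p = \min\left\{ x_i \;:\; \frac{\#\{\, j : x_j \le x_i \,\}}{n} \ge \frac{p}{100} \right\} \]

For instance, with \(n = 10\) observations and \(p = 25\), the condition asks for at least \(2.5\) observations, and therefore at least three, to be less than or equal to the chosen value.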
The idea of a percentile is a generalization of the concept of the median. The median is a point that divides the data in half, so that 50% of the data should be less than or equal to the median, and hence the median can be thought of as the 50th percentile of a set of data. It should be noted, however, that the 50th percentile computed from the definition above can differ slightly from the median as it is usually computed, due to computing conventions, so we will make a distinction between the median and the 50th percentile.
Consider the following ten observations from the simulated salary data: 20, 44, 21, 23, 27, 30, 24, 110, 28, and 141. Find the 50th and the 25th percentiles.
Solution
To make our calculations easier, we will sort the data from smallest to largest to get: 20, 21, 23, 24, 27, 28, 30, 44, 110, and 141. Suppose we would like to compute the median. Because there is an even number of observations, the median is the mean of the two middle observations. That is, the median is \((27+28)\div 2=27.5\). The 50th percentile, as defined above, is the smallest value in the data for which at least 50% of the data is less than or equal to that value. Note that the value 27 has exactly 50% of the data less than or equal to it, and that any value less than 27 has a smaller percentage of the data less than or equal to it. Hence, the 50th percentile is 27.
Suppose now we wish to find the 25th percentile. We need to find the smallest value for which at least 25% of the data is less than or equal to that value. Note that the value 21 is too small, since only 20% of the data is less than or equal to that point. The same is true for any value that is greater than 21 but less than 23. For the value 23, 30% of the data is less than or equal to that value. This is the smallest value such that at least 25% of the data is less than or equal to that value, and hence 23 is the 25th percentile.
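The calculation in this example can be checked with a short Python sketch. It is only an illustration of the definition given above: the function name `percentile` and the variable `salaries` are hypothetical names chosen here, and statistical software often uses other conventions that may give different values.

```python
from statistics import median


def percentile(data, p):
    """Return the smallest data value with at least p% of the data <= it.

    Illustrative helper (not from the text); assumes 0 < p <= 100.
    """
    if not data:
        raise ValueError("data must contain at least one observation")
    if not 0 < p <= 100:
        raise ValueError("p must be greater than 0 and at most 100")
    ordered = sorted(data)
    n = len(ordered)
    for count, value in enumerate(ordered, start=1):
        # At least `count` observations are <= value; stop at the first
        # position where that share reaches p percent of the data.
        if count * 100 >= p * n:
            return value


salaries = [20, 44, 21, 23, 27, 30, 24, 110, 28, 141]

print(median(salaries))          # 27.5, the mean of the two middle values
print(percentile(salaries, 50))  # 27
print(percentile(salaries, 25))  # 23
```

The loop walks through the sorted data and stops at the first value whose cumulative share reaches \(p\)%, which mirrors the reasoning used in the solution.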
While percentiles are conceptually easy to define, the computation of percentiles for sets of data with few observations can be problematic in that the result is difficult to interpret in terms of the data. In the previous example we found the 25th percentile to be the value 23. But this value also corresponds to the 27th percentile and the 30th percentile. The fact that these percentiles are all equal can cause confusion, and there are many alternative suggestions on how percentiles can be computed, though there is no absolute consensus on how the 25th percentile should be computed in this case. One of the most common suggestions is to use linear interpolation. For example, since 20% of the data is less than or equal to 21 and 30% is less than or equal to 23, it seems like the 25th percentile should be halfway between 21 and 23, which would be 22. The good news is that when there are many data points, these problems quickly go away, and in practice you will only need to be able to interpret what the percentile means in terms of the data and not compute it on your own. Generally, technical problems with how a percentile is computed will rarely influence the overall interpretation of the results of a research study.
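The interpolation idea can be sketched in Python as follows. This is one possible rule, assumed here for illustration: it interpolates linearly between neighbouring sorted observations according to the cumulative percentage of the data at each one. The function name `interpolated_percentile` is hypothetical, and statistical packages implement several different interpolation rules that may return other values for the same data.

```python
def interpolated_percentile(data, p):
    """Linearly interpolate between sorted observations.

    Assumed rule (one of several in use): the k-th smallest of n
    observations is treated as the (100 * k / n)-th percentile, and
    values in between are filled in by linear interpolation.
    """
    if not data:
        raise ValueError("data must contain at least one observation")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    ordered = sorted(data)
    n = len(ordered)
    position = p * n / 100       # fractional 1-based rank
    lower = int(position)        # rank of the observation just below
    fraction = position - lower  # how far to move toward the next one
    if lower < 1:
        return float(ordered[0])
    if lower >= n:
        return float(ordered[-1])
    below, above = ordered[lower - 1], ordered[lower]
    return below + fraction * (above - below)


salaries = [20, 44, 21, 23, 27, 30, 24, 110, 28, 141]
print(interpolated_percentile(salaries, 25))  # 22.0, halfway between 21 and 23
```

Under this rule the 25th, 27th, and 30th percentiles are no longer all equal, which removes the ambiguity described above at the cost of reporting values that may not appear in the data.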

