8.4: Randomness and Observing Data


    The concept of using a random mechanism to aid in obtaining a representative sample dates to the early twentieth century (Stephan 1948). Early examples of random sampling include the work of William Sealy Gosset (Figure \(\PageIndex{1}\)), who worked for the Guinness Brewery in Dublin, Ireland, and was a pioneer in using statistical methods to solve manufacturing problems. In his research, sampling items at random was used to verify important theoretical calculations underlying what is now known as Student’s \(t\) test (Student 1908). Gosset’s technique, published in 1908, was based on drawing cards from a large bin. Another early use of a random mechanism in sampling appears in research published in 1906 by the English statistician and economist Sir Arthur Lyon Bowley (Figure \(\PageIndex{2}\)). He studied bond rates by taking a sample where the random mechanism was based on the final digits in one of the tables in the Nautical Almanac (Bowley 1906).

    Figure \(\PageIndex{1}\): William Sealy Gosset (1876–1937) was an English statistician, chemist, and brewer who worked for the Guinness Brewery and was a pioneer in the development of modern statistical methods (Public domain image).
    Figure \(\PageIndex{2}\): Sir Arthur Lyon Bowley (1869–1957) was an English statistician and economist who pioneered the use of sampling techniques in social surveys (Public domain image).

    How is it possible that choosing a sample randomly will help researchers obtain a sample that is representative of the population? In the example of choosing which exams should be graded, we pointed out that what was really required was a method of choosing the exams to grade in a way that is not related to any possible trend in the exam scores. How can the professor guard against any type of trend when there is no way to know what the trends may be? This is where the idea of randomness is helpful.

    Suppose the professor could randomly choose five exams in such a way that each exam has an equal chance of being selected. If that could be accomplished, then the method of selection would be completely unrelated to the order in which the exams were turned in. In fact, this method of selection is unrelated to any possible trend that could exist in the exam scores.

    The professor could take such a sample by numbering twenty-five index cards from one to twenty-five. They could then shuffle the deck many times to ensure that the ordering of the cards was random and that each ordering was equally likely. The first five cards could then be drawn, and the exams corresponding to the numbers on those cards would form the sample. This type of sample is known as a simple random sample.
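    The index-card procedure can be imitated on a computer. The following is a minimal sketch, assuming Python's standard `random` module stands in for a thorough physical shuffle; the function name `draw_exams` and its parameters are hypothetical and chosen only for illustration.

```python
import random

def draw_exams(num_exams=25, sample_size=5, rng=None):
    """Shuffle numbered 'index cards' and return the top few (illustrative)."""
    rng = rng or random.Random()
    cards = list(range(1, num_exams + 1))  # cards numbered 1 through 25
    rng.shuffle(cards)                     # stands in for shuffling the deck
    return cards[:sample_size]             # the first five cards drawn

selected = draw_exams()
print("Exams to grade:", sorted(selected))
```

    Each run prints a different set of five distinct exam numbers. Passing a seeded generator, for example `draw_exams(rng=random.Random(1))`, makes a particular draw repeatable.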

    Definition: Simple Random Sample

    A simple random sample is a sample chosen from a population by a random method such that every individual or item in the population has an equal chance of being selected.

    Shuffling a deck of playing cards in card games is meant to ensure that every possible selection of cards is equally likely. When you pick a card from the top of a well-shuffled deck, each card has an equal chance of being selected. In the same way, shuffling the index cards with the exam numbers on them ensures that each exam has an equal chance of being selected for the sample.

    In the past, statisticians used physical objects such as decks of cards to implement random sampling in practical situations. Large tables of random sequences were also used before the advent of digital computers. Modern practical applications depend on computer-based algorithms.

    The advantage of simple random sampling is that the method makes it very likely that the resulting sample will be representative. A representative sample may not be obtained every time, but the probability of obtaining a badly unrepresentative sample is small. In many cases, using a simple random sample is the standard method for taking a sample from a population.

    As an example, consider the data given in Table 8.2, which corresponds to a small population of residents in a senior living complex. The table gives the unit number, gender identity, and the average response time in minutes for non-emergency calls to the staff over the past month. The female and nonbinary residents contend that the average response time is longer for residents who do not identify as male. Rather than observing the entire population, the management decides to take a simple random sample of five units to study the situation. Since there are twenty units, the manager uses a 20-sided die to choose the units for the sample. The manager rolls the die five times, getting 18, 4, 17, 5, and 4. Because the roll 4 was repeated, the die is rolled once more and 1 is observed. Hence units 1, 4, 5, 17, and 18 are included in the random sample. If another number had been repeated, the die would have been rolled again until five distinct numbers had been observed. It has been proven mathematically that if the die is fair, this method will result in a simple random sample.
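    The manager's procedure of rolling again whenever a number repeats can be sketched as follows. This is only an illustration: the names `sample_by_die` and `fair_d20` are hypothetical, and `random.randint` is assumed to behave like a fair 20-sided die.

```python
import random

def sample_by_die(roll, sample_size=5):
    """Keep rolling until `sample_size` distinct faces have been seen."""
    chosen = []
    while len(chosen) < sample_size:
        face = roll()
        if face not in chosen:  # a repeated face is ignored and the die is rolled again
            chosen.append(face)
    return sorted(chosen)

def fair_d20():
    """Simulated fair 20-sided die (an assumption of this sketch)."""
    return random.randint(1, 20)

# Replaying the rolls reported in the text: 18, 4, 17, 5, 4, then 1
text_rolls = iter([18, 4, 17, 5, 4, 1])
print(sample_by_die(lambda: next(text_rolls)))  # [1, 4, 5, 17, 18]

# A fresh sample from the simulated die
print(sample_by_die(fair_d20))
```

    The first call reproduces the units chosen in the example; the second gives a new set of five distinct units on each run.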

    Table 8.2 The unit number, gender identity, and the average response time in minutes for non-emergency calls to the staff over the past month at a senior living complex. The gender identities are coded as F (female), M (male), and N (nonbinary).

    Unit   Gender   Response (min)
    1      M        11.03
    2      N        12.15
    3      F        11.21
    4      F        11.45
    5      F        11.61
    6      M        10.83
    7      M        10.65
    8      N        12.62
    9      M        10.53
    10     F        11.88
    11     F        12.00
    12     M        10.34
    13     M        10.37
    14     N        13.26
    15     F        12.01
    16     M        10.16
    17     M        10.65
    18     F        10.90
    19     F        10.83
    20     F        10.49

    For these die rolls, the corresponding simple random sample of data is presented in Table 8.3. How representative is this sample? Let us consider the percentage of each gender identity in the population and in the sample. The population contains eight males, so \(8\div 20\times 100\%=40\%\) of the population identify as male. The simple random sample contains two males, so \(2\div 5\times 100\%=40\%\) of the sample identify as male. From this viewpoint, the sample seems to be representative. On the other hand, no nonbinary people are included in the sample, even though three of the twenty residents (15%) identify as nonbinary.
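    The percentages in this comparison can be checked with a short script. This is a minimal sketch in which the gender codes are transcribed from Table 8.2; the helper `percentages` and the variable names are hypothetical. The final calculation is an addition not made in the text: it counts how many of the equally likely five-unit samples contain no nonbinary resident.

```python
from collections import Counter
from math import comb

# Gender codes for units 1..20, transcribed from Table 8.2
population = dict(zip(range(1, 21), "MNFFFMMNMFFMMNFMMFFF"))
sample_units = [1, 4, 5, 17, 18]

def percentages(genders):
    """Percentage of each gender code (F, M, N) in a collection."""
    counts = Counter(genders)
    total = sum(counts.values())
    return {g: 100 * counts[g] / total for g in "FMN"}

pop_pct = percentages(population.values())
samp_pct = percentages(population[u] for u in sample_units)
print(pop_pct)   # {'F': 45.0, 'M': 40.0, 'N': 15.0}
print(samp_pct)  # {'F': 60.0, 'M': 40.0, 'N': 0.0}

# Chance that a simple random sample of 5 units misses all 3 nonbinary residents:
# choose all 5 from the 17 other units, out of all ways to choose 5 from 20.
p_no_nonbinary = comb(17, 5) / comb(20, 5)
print(round(p_no_nonbinary, 3))  # 0.399
```

    Under these assumptions, roughly four in ten simple random samples of size five would contain no nonbinary resident, so this feature of the observed sample is not unusual.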

    Table 8.3 The unit number, gender identity, and the average response time in minutes for non-emergency calls to the staff over the past month at a senior living complex included in the simple random sample. The gender identities are coded as F (female), M (male), and N (nonbinary).

    Unit   Gender   Response (min)
    1      M        11.03
    4      F        11.45
    5      F        11.61
    17     M        10.65
    18     F        10.90
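    As a rough illustration of how the sample relates to the residents' question, the average response times can be compared for male and non-male residents, both in the sample and in the full population. This is a descriptive sketch only, not a formal statistical test; the data are transcribed from Table 8.2, and the name `mean_by_group` is hypothetical.

```python
from statistics import mean

# unit: (gender code, average response time in minutes), from Table 8.2
table = {
    1: ("M", 11.03), 2: ("N", 12.15), 3: ("F", 11.21), 4: ("F", 11.45),
    5: ("F", 11.61), 6: ("M", 10.83), 7: ("M", 10.65), 8: ("N", 12.62),
    9: ("M", 10.53), 10: ("F", 11.88), 11: ("F", 12.00), 12: ("M", 10.34),
    13: ("M", 10.37), 14: ("N", 13.26), 15: ("F", 12.01), 16: ("M", 10.16),
    17: ("M", 10.65), 18: ("F", 10.90), 19: ("F", 10.83), 20: ("F", 10.49),
}
sample_units = [1, 4, 5, 17, 18]

def mean_by_group(units):
    """Mean response time for male and for non-male residents in `units`."""
    male = [table[u][1] for u in units if table[u][0] == "M"]
    other = [table[u][1] for u in units if table[u][0] != "M"]
    return mean(male), mean(other)

pop_male, pop_other = mean_by_group(table)            # all twenty units
samp_male, samp_other = mean_by_group(sample_units)   # the five sampled units
print(f"Population: male {pop_male:.2f}, non-male {pop_other:.2f}")
print(f"Sample:     male {samp_male:.2f}, non-male {samp_other:.2f}")
```

    For these data, the sample averages are about 10.84 minutes for male residents and 11.32 minutes for non-male residents, compared with about 10.57 and 11.70 minutes in the full population. The sample points in the same direction as the population, but the size of the difference is not reproduced exactly.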

    Few samples will perfectly represent the population from which they came. The hope when taking a sample is that it represents the characteristics of the population well enough for researchers to answer the questions that they have about the population. Without knowing the entire population, it is very difficult to determine whether a sample represents that population. As we shall see later, statistical methodology can help determine the size of the researcher’s potential error, and this information determines whether the sample provides sufficient evidence to support a research hypothesis. In cases where a simple random sample cannot be taken, other sampling methods have been developed.


    This page titled 8.4: Randomness and Observing Data is shared under a CC BY 4.0 license and was authored, remixed, and/or curated by .
