
1.2: From Data to Decisions


    Not so very long ago the most convenient method for purchasing new media, such as books or music, started with a visit to a local bookstore or music store. Upon entering the store, a customer would browse through the different sections of the store, observing different types of books or music in each. Scanning the shelves, customers would observe the inventory of options available to them and would make purchasing decisions based on what they saw. The in-store experience was enhanced by special product placements and attentive staff members who were available to assist the customer should they need help. With the help of the staff, inventory could be searched, and specific items could be ordered. However, the hallmark of the attentive staff member was that they could respond to questions, offer suggestions, and perhaps even offer alternative items that the customer may enjoy even more than what they were originally interested in. In the best situation, the customer would develop a relationship with the staff members at the store, and the staff would be able to anticipate the needs of the customer.

As the retail industry entered the online digital age, stores sought ways to duplicate the in-store experience online. Simple menus allowed customers to browse books or music selections in much the same way they would in a physical store. In fact, browsing online could be easier, as the customer could enter a few keywords to search the inventory, sorting the results by price, popularity, and other factors. It was not long before online retailers found ways not only to replicate many of the advantages of in-store shopping but to enhance the online shopping experience beyond what could be expected in a physical store. Customers soon found that they had access to online reviews from both professional reviewers and other customers. Perhaps the most useful development was the personalization of the shopping experience for each online customer.

Customers soon found that when they logged into their online shopping account, they would be greeted with recommendations, notices of sales, and updates on their orders. What was particularly useful was that the recommendations and sale notices were for items the customer might actually want. That is, these recommendations were different for each customer and were specifically targeted to the interests of that customer. Online retailing had entered the modern age.

    At first it may seem quite remarkable that an online store would be able to tailor recommendations individually for each customer visiting the store. It is perhaps even more impressive when one considers the fact that some online retailers may have thousands or perhaps even millions of customers. Of course, for the most part these recommendations were not being hand-picked by human staff but were being automatically generated by computer algorithms based on what had been observed about the purchasing and browsing history of the customer. But how does one train a computer to look at customer habits and come up with recommendations about items that they might be interested in purchasing?
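One simple way such an algorithm could work, sketched here purely for illustration (the data and the co-occurrence approach are hypothetical, not any retailer's actual method), is to recommend items that frequently appear in the purchase histories of customers who bought the same things you did:

```python
from collections import Counter

# Hypothetical purchase histories: one set of item IDs per customer.
histories = [
    {"mystery novel", "thriller", "coffee"},
    {"mystery novel", "thriller"},
    {"mystery novel", "cookbook"},
    {"thriller", "coffee"},
]

def recommend(customer_items, histories, top_n=2):
    """Suggest items that co-occur most often with the customer's purchases."""
    counts = Counter()
    for other in histories:
        if customer_items & other:              # shares at least one item
            counts.update(other - customer_items)  # count the items they don't own
    return [item for item, _ in counts.most_common(top_n)]

print(recommend({"mystery novel"}, histories))
```

Real recommendation systems are far more sophisticated, but the principle is the same: summarize observed behavior into counts or scores, then rank candidate items by those scores.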

The answer lies in being able to gather data on customer purchasing and browsing habits, summarize that data into a usable form, draw conclusions about the customer based on the data summary, and assess potential errors and how often they may happen. The scientific field that addresses all of these activities is statistics. Statistics also provides the theoretical basis for emerging fields such as data analytics and data science, which combine the theoretical power of statistical analysis with the computational power of modern computing devices.

The study of statistics began in the 17th century, when many European governments found it useful to systematically collect data on the demographics and economic status of their populations. Contributions to the field at this time consisted mainly of descriptions of data. A typical example of this type of analysis is John Graunt's 1662 work on the Bills of Mortality, which contains statistics on mortality due to the plague, a detailed description of the ratio of female to male births, and an early life table that estimates the probability that a person of a given age will live another specified number of years (Fienberg 1992).

By the early 19th century, mathematicians and scientists had begun using the relatively new field of probability theory to look at data in new ways. Carl Friedrich Gauss and Pierre-Simon Laplace pioneered a mathematical approach to statistics that culminated in the early 20th century with the development of formal procedures for statistical inference. The important feature of statistical inference is the ability to make decisions from data in a way that lets one specify the probability of making an error. This is an important distinction from simply describing data. As will be shown throughout this book, understanding the potential size of errors, as well as the rate at which one makes them when making decisions from data, is a crucial aspect of making data analysis useful.

For example, suppose you are sitting in your comfortable chair at home talking with a friend, looking out on a snowy day, and you would like to know the outside temperature. Your friend states that they can estimate the outside temperature by counting the number of chairs in your house, and based on this number your friend estimates the outside temperature to be 17 degrees Fahrenheit. Alternatively, you can look at the trusty thermometer mounted outside your window, which shows the temperature as 27 degrees Fahrenheit. It is important to realize that both methods provide a legitimate estimate of the outside temperature. Though it may seem a little silly, counting chairs does produce an estimate. The important difference between the two methods is the typical size of their error: the difference between the actual outside temperature and the estimated temperature. The question is how to quantify these errors and decide which method to use.
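The idea of comparing estimation methods by their typical error size can be made concrete with a small simulation. This sketch uses invented numbers (a hypothetical true temperature of 25 °F, a thermometer with small random error, and a chair count treated as a value unrelated to the truth) and measures each method by its average absolute error:

```python
import random

random.seed(0)
true_temp = 25.0  # hypothetical true outside temperature, in °F

# "Thermometer" method: estimates cluster near the truth with small random error.
thermometer = [true_temp + random.gauss(0, 2) for _ in range(10_000)]

# "Chair-counting" method: estimates are essentially unrelated to the truth.
chairs = [random.uniform(0, 50) for _ in range(10_000)]

def mean_abs_error(estimates, truth):
    """Typical size of the error |estimate - truth|, averaged over many tries."""
    return sum(abs(e - truth) for e in estimates) / len(estimates)

print(f"thermometer typical error: {mean_abs_error(thermometer, true_temp):.1f} °F")
print(f"chair count typical error: {mean_abs_error(chairs, true_temp):.1f} °F")
```

Both methods produce estimates, but the simulation shows that the thermometer's typical error is far smaller, which is exactly the kind of comparison statistical inference formalizes.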

    Statisticians in the early 20th century used technical mathematical arguments to formalize the process of looking at errors when using data to make decisions, and by the 1950s these methods were being used in agriculture, medicine, psychology, sociology, and many other fields. The advent of the first accessible digital computers revolutionized the use of statistical procedures in analyzing data, and more sophisticated methods were developed and implemented. By this time statistical methodology was seen as an indispensable part of the scientific method and was applied across almost all fields of study.

    By the beginning of the 21st century, the advent of highly advanced computing technology and the development of new algorithms took the theoretical development of statistics to new and exciting places in the field of data science, where new techniques with interesting names like artificial intelligence, machine learning, and deep learning were solving problems that just a few years prior were considered too complex and too large to address.

Statistics and data science are not only useful in the online retail industry. More and more of our lives are directly tied to data, and that data is used to decide everything from available medical treatments and the placement of highway and rail systems to insurance premiums and online gaming and sports odds, to name just a few applications. It is difficult to find an area of society that has not been affected by the use of statistics and data science.

    The purpose of this book is to introduce you to the world of statistics and data science, with an emphasis on how data is collected and used to affect your everyday life. Reading this book will not make you a statistician or a data scientist, but it will help you become a good consumer of data. There is not a great amount of technical mathematical information about statistical methodology or computer algorithms in the coming chapters. The emphasis of this book is on a conceptual understanding of the framework of statistical inference. By the end of this book, you should be able to critically analyze the use of data from a conceptual standpoint.

    The constant collection and analysis of data means that there is always someone, somewhere, who is trying to convince you of something based on that data. Therefore, as a member of the modern data-driven society, it is imperative that you have the knowledge to critically assess the use of data in your life. This book is intended to help you gain that knowledge. The quote from H.G. Wells at the start of this chapter looked forward to the future. But that future is now: statistical thinking is as necessary for efficient citizenship as the ability to read and write.


    This page titled 1.2: From Data to Decisions is shared under a CC BY 4.0 license and was authored, remixed, and/or curated by .
