Statistics LibreTexts

4: Descriptive Statistics


    Any time you get a new set of data, one of the first tasks is to summarize it in a compact, easily understood fashion. This is what descriptive statistics (as opposed to inferential statistics) is all about. In fact, to many people the term “statistics” is synonymous with descriptive statistics. That is the topic of this chapter, but before going into any details, let’s take a moment to get a sense of why we need descriptive statistics. To do this, let’s load the MLB_GL2021.sav file:

    [Image: screenshot of the loaded MLB_GL2021.sav data file]

    There are 17 variables in this data file, but we'll be looking at WinMargin. The WinMargin variable contains the winning margin (number of runs) for all 2,429 home and away regular-season games played during the 2021 Major League Baseball (MLB) season. Let's have a quick look at the first 20 cases of the WinMargin variable:

    [Image: the first 20 cases of the WinMargin variable]
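
    For readers working outside SPSS, the load-and-preview step could be sketched in Python roughly as follows. This is only a sketch under stated assumptions: it assumes the pandas library and its optional pyreadstat dependency are installed and that MLB_GL2021.sav sits in the working directory. The helper names `load_win_margin` and `first_cases` are made up for illustration and are not part of the original text.

```python
from pathlib import Path


def load_win_margin(path="MLB_GL2021.sav", column="WinMargin"):
    """Read one column from an SPSS .sav file and return it as a list.

    Assumption: pandas and pyreadstat are installed. The import happens
    inside the function so the rest of the sketch runs without them.
    """
    import pandas as pd

    data = pd.read_spss(path, usecols=[column])
    return data[column].tolist()


def first_cases(values, n=20):
    """Return the first n cases, mirroring the 20-case preview above."""
    return list(values[:n])


if __name__ == "__main__":
    # Only touch the file if it is actually present.
    if Path("MLB_GL2021.sav").exists():
        margins = load_win_margin()
        print(len(margins))          # number of games in the file
        print(first_cases(margins))  # first 20 winning margins
```

    Returning a plain list keeps the preview step independent of any particular data-frame library.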

     

    As lovely as this output is, it does not make it easy to understand what the data actually tell us. Even if we looked at the list of all 2,429 data points, there would be no way to make real sense of them just by reading through it. To make sense of things, we need to draw some pretty pictures and calculate some descriptive statistics. At this point we will make a single graph; the next chapter deals with many types of graphs and when each is best used.

    Let's make a quick histogram of the data in WinMargin:

    [Figure: simple histogram of counts for the margin-of-victory variable]

    A histogram of the MLB_GL2021 winning margin data (the WinMargin variable). As you might expect, the larger the winning margin, the less frequently you tend to see it.

    The histogram illustrates this pattern quite well: larger margins of victory occur less often than smaller ones. We still do not know a lot about the data, but we now have at least an idea of what it looks like.
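
    A histogram like this one is, at heart, a tally of how many games ended with each winning margin. The following is a minimal Python sketch of that tally using only the standard library. The `sample_margins` list is a small made-up placeholder and not the actual Retrosheet values, and `margin_counts` and `text_histogram` are hypothetical helper names chosen for this illustration.

```python
from collections import Counter


def margin_counts(margins):
    """Count how many games ended with each winning margin."""
    return dict(sorted(Counter(margins).items()))


def text_histogram(counts):
    """Render the counts as one row of '#' marks per margin."""
    return "\n".join(f"{margin:>2} | {'#' * n}" for margin, n in counts.items())


# Made-up placeholder values for illustration only (NOT the real WinMargin data).
sample_margins = [1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 1, 2, 5, 1, 7]

counts = margin_counts(sample_margins)
print(text_histogram(counts))
```

    With the real WinMargin column in place of the placeholder list, the same two functions would give a rough text version of the chart.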

    Note:

         The information used here was obtained free of
         charge from and is copyrighted by Retrosheet.  Interested
         parties may contact Retrosheet at "www.retrosheet.org".
    


    This page titled 4: Descriptive Statistics is shared under a CC BY-SA 4.0 license and was authored, remixed, and/or curated by Danielle Navarro.