By the end of this chapter, the student should be able to:

Display data graphically and interpret graphs: stemplots, histograms, and box plots.

Recognize, describe, and calculate the measures of location of data: quartiles and percentiles.

Recognize, describe, and calculate the measures of the center of data: mean, median, and mode.

Recognize, describe, and calculate the measures of the spread of data: variance, standard deviation, and range.

Once you have collected data, what will you do with it? Data can be described and presented in many different formats. For example, suppose you are interested in buying a house in a particular area. You may have no clue about the house prices, so you might ask your real estate agent to give you a sample data set of prices. Looking at all the prices in the sample is often overwhelming. A better way might be to look at the median price and the variation of prices. The median and variation are just two ways that you will learn to describe data. Your agent might also provide you with a graph of the data.

In this chapter, you will study numerical and graphical ways to describe and display your data. This area of statistics is called "Descriptive Statistics." You will learn how to calculate, and even more importantly, how to interpret these measurements and graphs.

A statistical graph is a tool that helps you learn about the shape or distribution of a sample or a population. A graph can be a more effective way of presenting data than a mass of numbers because we can see where data clusters and where there are only a few data values. Newspapers and the Internet use graphs to show trends and to enable readers to compare facts and figures quickly. Statisticians often graph data first to get a picture of the data. Then, more formal tools may be applied.

Some of the types of graphs that are used to summarize and organize data are the dot plot, the bar graph, the histogram, the stem-and-leaf plot, the frequency polygon (a type of broken line graph), the pie chart, and the box plot. In this chapter, we will briefly look at stem-and-leaf plots, line graphs, and bar graphs, as well as frequency polygons and time series graphs. Our emphasis will be on histograms and box plots.

Qualitative Data Discussion

Below are tables comparing the number of part-time and full-time students at De Anza College and Foothill College enrolled for the spring 2010 quarter. The tables display counts (frequencies) and percentages or proportions (relative frequencies). The percent columns make comparing the same categories in the colleges easier. Displaying percentages along with the numbers is often helpful, but it is particularly important when comparing sets of data that do not have the same totals, such as the total enrollments for both colleges in this example. Notice how much larger the percentage for part-time students at Foothill College is compared to De Anza College.

Table \(\PageIndex{1}\): Fall Term 2007 (Census day)

| De Anza College | Number | Percent | Foothill College | Number | Percent |
|---|---|---|---|---|---|
| Full-time | 9,200 | 40.9% | Full-time | 4,059 | 28.6% |
| Part-time | 13,296 | 59.1% | Part-time | 10,124 | 71.4% |
| Total | 22,496 | 100% | Total | 14,183 | 100% |
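The percent columns in the table can be reproduced from the counts. The following is a minimal sketch, assuming the counts above; the function name `relative_frequencies` is made up for this illustration.

```python
def relative_frequencies(counts):
    """Return each category's share of the total as a percent, rounded to 0.1."""
    total = sum(counts.values())
    return {category: round(100 * n / total, 1) for category, n in counts.items()}

# Counts taken from the table above.
de_anza = {"Full-time": 9200, "Part-time": 13296}
foothill = {"Full-time": 4059, "Part-time": 10124}

de_anza_pct = relative_frequencies(de_anza)
foothill_pct = relative_frequencies(foothill)

for college, pct in (("De Anza", de_anza_pct), ("Foothill", foothill_pct)):
    for category, value in pct.items():
        print(f"{college:9s} {category:10s} {value:5.1f}%")
```

Because the two colleges have different totals (22,496 and 14,183), the percentages, not the raw counts, are what make the part-time comparison meaningful.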

Tables are a good way of organizing and displaying data. But graphs can be even more helpful in understanding the data. There are no strict rules concerning which graphs to use. Two graphs that are used to display qualitative data are pie charts and bar graphs.

In a pie chart, categories of data are represented by wedges in a circle and are proportional in size to the percent of individuals in each category.

In a bar graph, the length of the bar for each category is proportional to the number or percent of individuals in each category. Bars may be vertical or horizontal.

A Pareto chart consists of bars that are sorted by category size (largest to smallest).
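To make "proportional in size" concrete for a pie chart, each wedge's central angle is the category's percent of a full 360 degrees. The sketch below assumes the De Anza College percentages from Table \(\PageIndex{1}\); the helper name `wedge_angles` is illustrative only.

```python
def wedge_angles(percents):
    """Convert category percentages into pie-wedge central angles in degrees."""
    return {category: round(360 * p / 100, 1) for category, p in percents.items()}

# De Anza College percentages from Table 1.
de_anza_percents = {"Full-time": 40.9, "Part-time": 59.1}

angles = wedge_angles(de_anza_percents)
for category, angle in angles.items():
    print(f"{category}: {angle} degrees")
```

The same proportionality applies to a bar graph, except that the percent sets the bar's length rather than an angle.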

Look at Figures \(\PageIndex{3}\) and \(\PageIndex{4}\) and determine which graph (pie or bar) you think displays the comparisons better.

It is a good idea to look at a variety of graphs to see which is the most helpful in displaying the data. We might make different choices of what we think is the “best” graph depending on the data and the context. Our choice also depends on what we are using the data for.

Figure \(\PageIndex{4}\): Bar chart

Percentages That Add to More (or Less) Than 100%

Sometimes percentages add up to be more than 100% (or less than 100%). In the graph, the percentages add to more than 100% because students can be in more than one category. A bar graph is appropriate to compare the relative size of the categories. A pie chart cannot be used. It also could not be used if the percentages added to less than 100%.

Table \(\PageIndex{2}\): De Anza College Spring 2010

| Characteristic/Category | Percent |
|---|---|
| Full-Time Students | 40.9% |
| Students who intend to transfer to a 4-year educational institution | 48.6% |
| Students under age 25 | 61.0% |
| TOTAL | 150.5% |

Figure \(\PageIndex{2}\): Bar chart of data in Table \(\PageIndex{2}\).
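A quick check of whether a set of percentages forms a single whole can guide the choice of graph. This is a sketch only: the function name and the 0.5-point rounding tolerance are assumptions made for illustration, not a rule from the text.

```python
def percentages_form_a_whole(percents, tolerance=0.5):
    """True when the percentages sum to 100% within a rounding tolerance.

    The tolerance is an assumed allowance for rounding in published tables.
    """
    return abs(sum(percents) - 100.0) <= tolerance

# Percentages from Table 2; a student can belong to more than one category.
table_2 = {
    "Full-time students": 40.9,
    "Intend to transfer to a 4-year institution": 48.6,
    "Under age 25": 61.0,
}

total = round(sum(table_2.values()), 1)
if percentages_form_a_whole(table_2.values()):
    chart = "pie chart or bar graph"
else:
    chart = "bar graph"

print(f"Total: {total}% -> use a {chart}")
```

The same check returns False when percentages add to less than 100%, which is the other case in which a pie chart cannot be used.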

Omitting Categories/Missing Data

The table displays Ethnicity of Students but is missing the "Other/Unknown" category. This category contains people who did not feel they fit into any of the ethnicity categories or declined to respond. Notice that the frequencies do not add up to the total number of students. In this situation, create a bar graph and not a pie chart.

Table \(\PageIndex{3}\): Ethnicity of Students at De Anza College Fall Term 2007 (Census Day)

| Ethnicity | Frequency | Percent |
|---|---|---|
| Asian | 8,794 | 36.1% |
| Black | 1,412 | 5.8% |
| Filipino | 1,298 | 5.3% |
| Hispanic | 4,180 | 17.1% |
| Native American | 146 | 0.6% |
| Pacific Islander | 236 | 1.0% |
| White | 5,978 | 24.5% |
| TOTAL | 22,044 out of 24,382 | 90.4% out of 100% |

Figure \(\PageIndex{3}\): Enrollment of De Anza College (Spring 2010)
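The size of the omitted category can be recovered by subtracting the reported frequencies from the total enrollment. The sketch below uses the frequencies and the total of 24,382 from the ethnicity table; the variable names are illustrative, and the "Other/Unknown" count is derived here rather than listed in the table.

```python
# Reported frequencies from the ethnicity table.
ethnicity_counts = {
    "Asian": 8794,
    "Black": 1412,
    "Filipino": 1298,
    "Hispanic": 4180,
    "Native American": 146,
    "Pacific Islander": 236,
    "White": 5978,
}
total_students = 24382  # total enrollment given in the table's TOTAL row

reported = sum(ethnicity_counts.values())
other_unknown = total_students - reported  # derived, not listed in the table
reported_pct = round(100 * reported / total_students, 1)
other_unknown_pct = round(100 * other_unknown / total_students, 1)

print(f"Reported: {reported} ({reported_pct}%)")
print(f"Other/Unknown: {other_unknown} ({other_unknown_pct}%)")
```

Since the reported categories cover only part of the student body, a pie chart of these seven categories alone would misrepresent each group's share.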

The following graph is the same as the previous graph, but the “Other/Unknown” percent (9.6%) has been included. The “Other/Unknown” category is large compared to some of the other categories (Native American, 0.6%; Pacific Islander, 1.0%). This is important to know when we think about what the data are telling us.

This particular bar graph in Figure \(\PageIndex{4}\) can be difficult to understand visually. The graph in Figure \(\PageIndex{5}\) is a Pareto chart. The Pareto chart has the bars sorted from largest to smallest and is easier to read and interpret.

Figure \(\PageIndex{4}\): Bar Graph with Other/Unknown Category

Figure \(\PageIndex{5}\): Pareto Chart With Bars Sorted by Size
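The Pareto ordering amounts to sorting the categories by size before drawing the bars. The following sketch assumes the percentages from the ethnicity table plus the 9.6% "Other/Unknown" value, and prints a rough text bar chart in place of a real plot.

```python
# Percentages from the ethnicity table, with Other/Unknown (9.6%) included.
percents = {
    "Asian": 36.1,
    "Black": 5.8,
    "Filipino": 5.3,
    "Hispanic": 17.1,
    "Native American": 0.6,
    "Other/Unknown": 9.6,
    "Pacific Islander": 1.0,
    "White": 24.5,
}

# Pareto order: largest category first.
pareto = sorted(percents.items(), key=lambda item: item[1], reverse=True)
pareto_order = [category for category, _ in pareto]

for category, pct in pareto:
    bar = "#" * round(pct)  # one '#' per percentage point, rounded
    print(f"{category:17s} {pct:5.1f}% {bar}")
```

Sorting changes only the order of the bars, not their lengths, so the chart carries the same information in a form that is easier to scan.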

Pie Charts: No Missing Data

The following pie charts have the “Other/Unknown” category included (since the percentages must add to 100%). In Figure \(\PageIndex{6}\), the chart organized by the size of each wedge is more visually informative than the unsorted, alphabetical chart.

Figure \(\PageIndex{6}\).

Contributors and Attributions

Barbara Illowsky and Susan Dean (De Anza College) with many other contributing authors. Content produced by OpenStax College is licensed under a Creative Commons Attribution License 4.0. Download for free at http://cnx.org/contents/30189442-699...b91b9de@18.114.