Statistics is the study of the collection, analysis, interpretation, presentation, and organization of data. In applying statistics to, e.g., a scientific, industrial, or societal problem, it is conventional to begin with a statistical population or a statistical model to be studied.
- Statistics naturally divides into two branches, descriptive statistics and inferential statistics. Our main interest is in inferential statistics, which tries to infer from the data what the population might think, or to evaluate the probability that an observed difference between groups is a dependable one rather than one that might have happened by chance in this study. Nevertheless, the starting point for dealing with a collection of data is to organize, display, and summarize it effectively.
- The likelihood that the survey proportion is close to the population proportion determines our confidence in the survey result. For that reason, we would like to be able to compute that likelihood. The task of computing it belongs to the realm of probability, which we study in this chapter.
- It is often the case that a number is naturally associated with the outcome of a random experiment: the number of boys in a three-child family, the number of defective light bulbs in a case of 100 bulbs, the length of time until the next customer arrives at the drive-through window at a bank. Such a number varies from trial to trial of the corresponding experiment, and does so in a way that cannot be predicted with certainty; hence, it is called a random variable.
- A random variable is called continuous if its set of possible values contains a whole interval of decimal numbers. In this chapter we investigate such random variables.
- The probability distribution of a statistic is called its sampling distribution. Typically sample statistics are not ends in themselves, but are computed in order to estimate the corresponding population parameters. This chapter introduces the concepts of the mean, the standard deviation, and the sampling distribution of a sample statistic, with an emphasis on the sample mean.
- In the sampling that we have studied so far the goal has been to estimate a population parameter. But the sampling done by the government agency has a somewhat different objective: not so much to estimate the population mean as to test an assertion (a hypothesis) about it, namely, whether it is as large as 75 or not. The agency is not necessarily interested in the actual value of the mean, just whether it is as claimed. Its sampling is done to perform a test of hypotheses.
- Previously, we treated the questions of estimating and making inferences about a parameter of a single population. In this chapter we consider a comparison of parameters that belong to two different populations. For example, we might wish to compare the average income of all adults in one region of the country with the average income of those in another region, or we might wish to compare the proportion of all men who are vegetarians with the proportion of all women who are vegetarians.
- Our interest in this chapter is in situations in which we can associate with each element of a population or sample two measurements x and y, particularly in the case that it is of interest to use the value of x to predict the value of y. In this chapter we will learn statistical methods for analyzing the relationship between variables x and y in this context.
- Whereas the standardized test statistics that appeared in earlier chapters followed either a normal or a Student's t-distribution, in this chapter the tests will involve two other very common and useful distributions, the chi-square and the F-distributions. The chi-square distribution arises in tests of hypotheses concerning the independence of two random variables and concerning whether a discrete random variable follows a specified distribution.
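
To make the idea of a random variable from the list above concrete, the following minimal sketch tabulates the distribution of the number of boys in a three-child family. It assumes each child is independently a boy or a girl with probability 1/2, a simplification the text does not state. The function name `boys_distribution` is hypothetical and chosen only for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def boys_distribution(children=3):
    """Exact distribution of the number of boys among `children` births.

    Assumption (not from the text): each birth is independently a boy or
    a girl with probability 1/2, so all birth sequences are equally likely.
    """
    # Enumerate every equally likely sequence, e.g. ('B', 'G', 'B').
    outcomes = list(product("BG", repeat=children))
    # The random variable assigns to each outcome its number of boys.
    counts = Counter(outcome.count("B") for outcome in outcomes)
    total = len(outcomes)
    return {k: Fraction(counts[k], total) for k in sorted(counts)}


dist = boys_distribution()
# Mean of the random variable: sum of value times probability.
mean = sum(k * p for k, p in dist.items())

for k, p in dist.items():
    print(f"P(X = {k}) = {p}")
print(f"mean = {mean}")
```

Under the stated assumption, the eight equally likely birth sequences give probabilities 1/8, 3/8, 3/8, and 1/8 for 0, 1, 2, and 3 boys, with a mean of 3/2. Exact fractions are used instead of floating-point numbers so that the probabilities visibly sum to 1.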