
Assignment Hints


    Type

    • Qualitative Data occurs when the outcomes or responses are not numbers, but are words.  Examples are:  favorite baseball team, eye color, etc.
    • Quantitative Data occurs when the outcomes or responses are numbers.
      • For quantitative data, it is discrete if there are a finite number of outcomes or a countable number of outcomes.  Examples are:  number of children, roll of the dice, etc.
      • For quantitative data, it is continuous if it is not discrete.  The responses can include any number in an interval including fractions and decimals.  Examples are:  exact time to run a mile, exact weight of a carrot, etc.

     

     

     

    Descriptive vs. Inferential

     Recall that descriptive statistics involve displaying the sample data using a chart such as a pie chart, histogram, etc. and looking at the statistics such as the mean, median, mode, etc.  Inferential statistics involves making conclusions about the population based on the sample such as "Based on the sample, the average age of all Americans has increased over the past 10 years" or "After looking at the sample data, we can be confident that the mean movie length for all movies is between 118 and 134 minutes."

     

     

     

    Key Terms

    • The population is the collection of all possible respondents. 
    • The sample is the collection of only those respondents who gave a response. 
    • A parameter is a number that describes the population. 
    • A statistic is a number that describes the sample. 
    • The variable is the number that is a response to the survey or experiment. 
    • Data refers to all the responses of the survey or the experiment.

     

     

     

     

     

    Sampling Types:

    • Convenience Sampling:  This is sampling the respondents who are the easiest to collect data from.  It is unscientific and prone to produce biased results.
    • Stratified Sampling:  This is a scientific sampling technique where the researcher first identifies relevant strata such as race, income, or location.  Then the researcher finds the proportion of each type and makes sure that the sample has the same proportions as the population has.
    • Cluster Sampling:  This is a scientific sampling technique where the researcher first identifies a natural partition of the population.  From this partition, the researcher selects several clusters and proceeds to collect data from every individual in each of the selected clusters.
    • Systematic Sampling:  This is a scientific sampling technique where the researcher selects every nth item to sample, such as every tenth item that is produced.
    • Simple Random Sampling:  This is the scientific sampling technique where the researcher first itemizes every member of the population and randomly (usually using a computer) selects the sample from this list.

     

     

     

    Bias

    A sampling technique is not scientific if the design is poor.  That means that the survey question was biased, the choice of respondents was biased, or there just weren't enough respondents.  If the sample data differs significantly from the population data, but the sample size is large and the survey was conducted without bias, then the sampling technique is scientific.  Bad luck can still occur.

     

     

    Replacement

    Think about randomly selecting two Americans.  If the first respondent is a Democrat, does the probability that the second is a Democrat decline significantly since one Democrat American is taken out of the list of possibilities?

     

     

     

     

    Frequency

    • If you add up all the frequencies, you should get the total sample size.
    • To find the relative frequency, divide the frequency by the sample size.
    • To find the cumulative frequency of a value, add up the frequencies at and below that value.
    • If you know the relative frequency of a value, then just multiply by 100% to get the percent that have that value.
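The four rules above can be checked on a small data set. The following is a minimal sketch; the `responses` list is made-up illustration data, not from any assignment.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical responses: number of courses each of 10 students is taking.
responses = [1, 2, 2, 3, 3, 3, 4, 4, 2, 3]

n = len(responses)                                 # sample size
freq = dict(sorted(Counter(responses).items()))    # value -> frequency
rel_freq = {v: f / n for v, f in freq.items()}     # frequency / sample size
cum_freq = dict(zip(freq, accumulate(freq.values())))  # frequencies at and below

for v in freq:
    percent = 100 * rel_freq[v]                    # relative frequency as a percent
    print(v, freq[v], rel_freq[v], cum_freq[v], f"{percent:.0f}%")

# The frequencies add up to the sample size.
print(sum(freq.values()) == n)
```

The last cumulative frequency always equals the sample size, which is a quick way to catch a counting mistake.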

     

     

     

    Frequency of Range Values

    To find the frequency of a range of values, just count how many values are in that range.

     

     

    Relative Frequency

    To find a relative frequency, divide the frequency of the values in the given range by the sample size.

     

     

     

     

    Class Boundary

    The lower class boundary is the lowest value in the adjusted interval.  If the boundaries do not meet because they are integers or other rounded numbers, then you adjust the left boundary by subtracting half the distance between the boundaries and adjust the right boundary by adding half the distance between the boundaries.

    For example, if the original boundaries are 5-9, 10-14, 15-19, and 20-24, then the distance between the boundaries is 10 - 9 = 1.  Half of this is 0.5.  Thus the new adjusted boundaries are:  4.5-9.5, 9.5-14.5, 14.5-19.5, 19.5-24.5.  The lower class boundary will then be 4.5.
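The adjustment in this example can be written out as a short calculation. This is only a sketch of the arithmetic described above; the variable names are illustrative.

```python
# Class limits from the example above.
classes = [(5, 9), (10, 14), (15, 19), (20, 24)]

# Distance between the upper limit of one class and the lower limit of the next.
gap = classes[1][0] - classes[0][1]     # 10 - 9 = 1
half = gap / 2                          # 0.5

# Subtract half the gap on the left and add half the gap on the right.
boundaries = [(low - half, high + half) for low, high in classes]
print(boundaries)
print("lower class boundary of the first class:", boundaries[0][0])
```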

     

     

     

    Cumulative Relative Frequency

      The Cumulative Relative Frequency is defined by the frequency at or below that value divided by the sample size.  First add up the number of students who are taking 2 or fewer courses then divide by the sample size 50.

     

     

     

    Box Plot

    The general box plot is shown below.

    [Figure:  box plot labeled Min, Q1, Median, Q3, and Max]

    • The Min (minimum) corresponds to the lowest data value.
    • Q1 (the first quartile) corresponds to the 25th percentile, or the value at which 25% of the data lies at or below this value.
    • The Median corresponds to the 50th percentile or the middle value, or the value at which 50% of the data lies at or below this value.
    • Q3 (the third quartile) corresponds to the 75th percentile, or the value at which 75% of the data lies at or below this value.
    • The Max (maximum) corresponds to the highest data value.
    • IQR (Inter-Quartile-Range) is the range for the middle 50% of the data:  IQR = Q3 - Q1.
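As a rough illustration, the five numbers of a box plot can be computed with Python's standard library. The data below are hypothetical. Quartile conventions differ between tools, so a calculator may report slightly different values for Q1 and Q3 than the `"inclusive"` method assumed here.

```python
import statistics

data = [2, 4, 5, 7, 8, 10, 12, 15, 18]   # hypothetical sample

minimum, maximum = min(data), max(data)

# quantiles(..., n=4) returns the three cut points Q1, median, Q3.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")

iqr = q3 - q1   # range of the middle 50% of the data
print(minimum, q1, median, q3, maximum, iqr)
```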

     

     

    Coefficient of Variation

    Hint:  You should be able to use a calculator or computer to find the mean and standard deviation of the data.  The definition of the coefficient of variation is the standard deviation divided by the mean:

         \( CV = \frac{\sigma}{\mu} \) 

    To change it to a percent, multiply by 100%.
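A minimal sketch of this calculation follows, using made-up data. It assumes the data are the whole population, so it uses the population standard deviation (`pstdev`); for sample data, `statistics.stdev` would be the analogous choice.

```python
import statistics

data = [4, 8, 6, 5, 7]               # hypothetical data

mu = statistics.mean(data)           # mean
sigma = statistics.pstdev(data)      # population standard deviation

cv = sigma / mu                      # coefficient of variation
cv_percent = 100 * cv                # as a percent

print(round(cv, 4), f"{cv_percent:.2f}%")
```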

     

     

     

    Frequency Table

    View the video to see how to enter the data into the calculator and to find the statistics.

    • To find the percent of respondents who gave an answer at least, less than, more than, or at most a number, use the table to count the number of these respondents.  Then divide by the sample size and multiply by 100%.  Please do not include the symbol "%" in your answer.
    • If you are given a percent of respondents who answered below (above, at least, at most) a number, multiply this percent (as a decimal) by the sample size.  This will give the ranking of that response.  Then find the value that has the ranking. 

    For example, to use the table below to find the value such that 30% of the respondents are at most that value, do the following:

    Value Frequency
    2 15
    3 30
    4 25
    5 10

    The sample size is 80.  We find the ranking by calculating: 

    0.30 x 80  =  24

    Now to find the 24th number, notice that the 1st through 15th numbers have the value of 2.  The 16th through 45th numbers have the value of 3.  The 46th through 70th numbers have the value of 4, and the 71st through 80th numbers have the value of 5.  Therefore the 24th number has the value of 3.  We can conclude that 30% of the respondents gave a value of at most 3.  Note that had the question asked "less than that value", we would have an answer of 4.
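The ranking procedure can also be sketched in code. The helper `value_at_rank` is a hypothetical name introduced only for this illustration; the table is the one from the example.

```python
import math

# Frequency table from the example: value -> frequency.
table = {2: 15, 3: 30, 4: 25, 5: 10}
n = sum(table.values())                      # sample size

def value_at_rank(table, rank):
    """Return the value held by the rank-th respondent in sorted order."""
    running = 0
    for value in sorted(table):
        running += table[value]              # cumulative frequency so far
        if rank <= running:
            return value
    raise ValueError("rank exceeds the sample size")

percent = 30
rank = math.ceil(percent * n / 100)          # 30% of 80 respondents
print(n, rank, value_at_rank(table, rank))

# Going the other way: percent of respondents who answered at most 3.
at_most_3 = sum(f for v, f in table.items() if v <= 3)
print(100 * at_most_3 / n)
```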

     

     

     

     

    Sample Space:

    • The sample space includes all possible outcomes.  How many cards are there?
    • The probability of an event A, P(A), is defined by the number of outcomes in A divided by the number of outcomes in the sample space.  So, count the number of cards that are in the event that you are concerned with and divide by the total number of cards to be drawn from.

     

     

     

    Independent Events

    If events A and B are independent, then P(A)P(B) = P(A and B).
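Both the counting definition of probability and the independence rule can be checked by listing a sample space. The sketch below assumes a standard 52-card deck and uses "heart" and "king" as example events; the actual assignment may involve a different set of cards.

```python
from fractions import Fraction
from itertools import product

# Assumption: a standard 52-card deck (13 ranks x 4 suits).
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
sample_space = list(product(ranks, suits))

def prob(event):
    """P(event) = outcomes in the event / outcomes in the sample space."""
    favorable = [card for card in sample_space if event(card)]
    return Fraction(len(favorable), len(sample_space))

def is_heart(card):
    return card[1] == "hearts"

def is_king(card):
    return card[0] == "K"

p_heart = prob(is_heart)
p_king = prob(is_king)
p_both = prob(lambda card: is_heart(card) and is_king(card))

print(p_heart, p_king, p_both)
# Independent exactly when P(A)P(B) = P(A and B).
print(p_heart * p_king == p_both)
```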

     

     

     

     

     

    Mutually Exclusive

    • To determine the sample space, just count the cards and count how many types of landings of the coin there are (heads and tails).  Then you can multiply to find the size of the sample space.
    • Two events are mutually exclusive if they cannot occur simultaneously.  In other words events U and V are mutually exclusive if P(U and V) = 0.

     

     

     

     

     

    Expected Value

    Find the expected value of each.  Then just look at which is the largest, middle, and smallest.  To find the expected values, multiply each outcome by its corresponding probability.  Finally, add the three products together.

     

     

    Profit Expected Value 

    To write the probability distribution table, put the possible outcomes in the column labeled "x" and the corresponding probabilities in the column labeled P(x).  Notice that the x values represent profit which is revenue minus cost.  For example, if the revenue for a win is $a and the cost is $b, then the profit is a - b.

    • To find the expected value, first multiply each of the "x" values by their corresponding probabilities.  Then add up all these products.

    • The expected value is the number such that if many many trials are done, then the average outcome per trial is likely to be very close to the expected value.  Thus this is the average profit per game played.
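Here is a small sketch of a profit distribution and its expected value. The game, its prizes, its cost, and its probabilities are all invented for illustration.

```python
# Hypothetical game: it costs $3 to play.
# Win $10 with probability 0.25, win $4 with probability 0.25, otherwise win nothing.
cost = 3
revenues = [10, 4, 0]
probabilities = [0.25, 0.25, 0.50]

# x = profit = revenue - cost
profits = [revenue - cost for revenue in revenues]

# Probability distribution table: x and P(x).
for x, p in zip(profits, probabilities):
    print(x, p)

# Multiply each x by its probability, then add up the products.
expected_value = sum(x * p for x, p in zip(profits, probabilities))
print(expected_value)   # average profit per game over many plays
```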

     

     

     

     

    Binomial:

    • To find the probability that a binomial variable is exactly equal to a number x, use:  binompdf(n,p,x) where n is the number of trials (the sample size) and p is the probability of success on each trial.
    • To find the probability that a binomial variable is less than or equal to a number x or at most x, use: binomcdf(n,p,x).
    • To find the probability that a binomial variable is less than a number x, use: binomcdf(n,p,x-1).
    • To find the probability that a binomial variable is greater than a number x, use the rule of complements:  1 - binomcdf(n,p,x).
    • To find the probability that a binomial variable is greater than or equal to a number x or at least x, use the rule of complements: 1 - binomcdf(n,p,x - 1).
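For readers without the calculator at hand, the five cases above can be reproduced in Python. The functions `binompdf` and `binomcdf` below are illustrative re-implementations named after the calculator commands, not the calculator's own code, and the values of n, p, and x are arbitrary.

```python
from math import comb

def binompdf(n, p, x):
    """P(X = x) for a binomial variable with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def binomcdf(n, p, x):
    """P(X <= x): add up the probabilities of 0, 1, ..., x successes."""
    return sum(binompdf(n, p, k) for k in range(x + 1))

n, p, x = 10, 0.5, 3                        # hypothetical values

exactly = binompdf(n, p, x)                 # P(X = 3)
at_most = binomcdf(n, p, x)                 # P(X <= 3)
less_than = binomcdf(n, p, x - 1)           # P(X < 3)
greater_than = 1 - binomcdf(n, p, x)        # P(X > 3)
at_least = 1 - binomcdf(n, p, x - 1)        # P(X >= 3)

print(exactly, at_most, less_than, greater_than, at_least)
```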

     

     

    Only 2 Outcomes

      For a Binomial Distribution, each trial must have only two possible outcomes (think heart or not heart) and each trial's probability of success must be the same as and independent of every other trial.

     

     

     

    Average

    The expected value tells that in the long run for many trials (not just one trial) it is very likely that the average amount for all trials will be close to this number.

     

     

    Loss is Bad

    Do you think the business owner would be wise to bid when on average there will be a loss (negative gain) of about $3000 on the project?

     

     

     

     

    Uniform Distribution Properties 

    • The mean for a uniform distribution is the average of the left and the right endpoints.

    • The standard deviation for a uniform distribution is the square root of (b - a)^2 / 12 where a and b are the left and right endpoints respectively.

    • For a uniform distribution, the probability that an outcome will be exactly a given number is always 0.

    • For a uniform distribution, the probability that an outcome will be between two numbers x and y is (y - x) / (b - a) where a and b are the left and right endpoints respectively.

    • In general for a uniform distribution, we can find a probability by taking the length of the described line segment and dividing by b - a.

    • To find a percentile, p (or a quartile:  25th or 75th percentile) you want to go backwards with the uniform distribution calculations.  Here you know the probability and want to find y, so you set
      p = (y - a) / (b - a) and solve for y.

    • If you have a uniform distribution and want to find a conditional probability P(A|B), then use the given to get the new endpoints.  For example if the distribution is uniform between 5 and 20 and you want to find the probability of an event being between 10 and 17 given that the outcome is less than or equal to 15, you need to find the probability that an event is between 10 and 15 for a uniform distribution with endpoints 5 and 15.
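The properties above can be collected into a few lines of code. This sketch reuses the endpoints 5 and 20 from the conditional probability example; the helper names are hypothetical.

```python
import math

a, b = 5, 20                                  # left and right endpoints

mean = (a + b) / 2                            # average of the endpoints
sd = math.sqrt((b - a) ** 2 / 12)             # square root of (b - a)^2 / 12

def prob_between(x, y, a, b):
    """P(x < X < y): length of the segment inside [a, b] divided by b - a."""
    length = max(0, min(y, b) - max(x, a))
    return length / (b - a)

def percentile(p, a, b):
    """Solve p = (y - a) / (b - a) for y."""
    return a + p * (b - a)

p_10_17 = prob_between(10, 17, a, b)          # 7 / 15
q1 = percentile(0.25, a, b)                   # first quartile

# Conditional: P(10 < X < 17 | X <= 15) uses the new endpoints 5 and 15.
p_conditional = prob_between(10, 15, 5, 15)

print(mean, round(sd, 3), round(p_10_17, 4), q1, p_conditional)
```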

     

     

     

    Discrete is a List Continuous is an Interval

      A discrete random variable has a finite number of outcomes or a countable number of outcomes.  A continuous random variable has an entire interval of outcomes including decimals and fractions.

     

     

     

    Normal

    • We write X ~ N(m,s) to mean that the distribution is Normal (N) with mean m and standard deviation s.  The mean and the standard deviation are given in the problem.

    • For a normal distribution, the mean and the median are the same.

    • To find the z-score, use the formula:  z = (x - m)/s.

    • To find the probability that an event is between two numbers a and b, use your calculator with N(a,b,m,s).

    • To find the probability that an event is less than a number a, use your calculator with N(-99999,a,m,s).  It is recommended that you use at least enough 9's so that the lower bound is at least 10 times larger in magnitude than the maximum magnitude of a, m, and s.

    • To find the probability that an event is greater than a number b, use your calculator with N(b,99999,m,s).  It is recommended that you use at least enough 9's so that the upper bound is at least 10 times larger in magnitude than the maximum of b, m, and s.

    • For normal distribution probabilities, < is the same as ≤ and > is the same as ≥.

    • If you want to find the value such that the proportion of the data that is below that value is p, then use the inverse normal:  invNorm(p,m,s).

    • To find the pth percentile, first convert p to a decimal and then use the inverse normal.  For example, to find the 17th percentile, use invNorm(0.17,m,s).  Note that the first quartile is the 25th percentile and the third quartile is the 75th percentile.

    • To find out the value such that the proportion above that value is p, first subtract from 1 and then use the hint above.   invNorm(1-p,m,s)
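The same calculations can be sketched with Python's `statistics.NormalDist`, which has no need for the -99999 and 99999 stand-ins because its `cdf` already covers everything to the left of a value. The mean 100 and standard deviation 15 are made-up numbers.

```python
from statistics import NormalDist

m, s = 100, 15                         # hypothetical mean and standard deviation
X = NormalDist(mu=m, sigma=s)          # X ~ N(m, s)

z = (130 - m) / s                      # z-score of x = 130

p_between = X.cdf(115) - X.cdf(85)     # P(85 < X < 115)
p_less = X.cdf(85)                     # P(X < 85)
p_greater = 1 - X.cdf(115)             # P(X > 115)

q1 = X.inv_cdf(0.25)                   # 25th percentile, like invNorm(0.25, m, s)
top_10 = X.inv_cdf(1 - 0.10)           # value with 10% of the data above it

print(z, round(p_between, 4), round(p_less, 4), round(p_greater, 4))
print(round(q1, 2), round(top_10, 2))
print(X.median == X.mean)              # the mean and the median are the same
```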

     

     

     

     

     

    Central Limit Theorem 1

     Consider the diagram below and notice that there is a left tail and a right tail that must make a combined 0.05 area.

    • Use the fact that the sampling distribution (\(\bar{x}\) distribution) has mean \(\mu\) and standard deviation \(\sigma\) divided by the square root of n.

    • To find a probability that involves the mean use normalcdf(\(a, b, \mu, \frac{\sigma}{\sqrt{n}}\)).

      • a is the lower bound.  Use -99999 if there is no lower bound (negative infinity).

      • b is the upper bound.  Use 99999 if there is no upper bound (infinity).

      • \(\mu\) is the population mean

      • \(\sigma\) is the population standard deviation

      • n is the sample size.

    • To find a percentile or quartile (Q1 = 25th percentile and Q3 = 75th percentile), use invNorm(\(x, \mu, \frac{\sigma}{\sqrt{n}}\)), where x is the proportion below the value written as a decimal.  Another common wording is "find the value such that 30% (x = 0.3) of the data lies below the value."  If you want to find the value such that a given percentage lies above that value, use the rule of complements.  For example if 10% lies above, then 90% lies below.

    • Use the fact that the distribution of a sum of values has mean \(n\mu\) and standard deviation \(\sigma\) times the square root of n.

    • To find a probability that involves the sum use normalcdf(\(a, b, n\mu, \sigma\sqrt{n}\)).

    • To find the Inter-Quartile-Range (IQR), find the first quartile (25th  percentile) and the third quartile (75th percentile) and subtract.
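A short sketch of these Central Limit Theorem calculations follows. The population mean 50, population standard deviation 12, and sample size 36 are hypothetical, and `NormalDist` from the standard library stands in for the calculator's normalcdf and invNorm.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 50, 12, 36                       # hypothetical values

# Sampling distribution of the mean: mean mu, standard deviation sigma / sqrt(n).
xbar_dist = NormalDist(mu, sigma / sqrt(n))
p_mean = xbar_dist.cdf(54) - xbar_dist.cdf(48)  # P(48 < sample mean < 54)

# Distribution of the sum: mean n * mu, standard deviation sigma * sqrt(n).
sum_dist = NormalDist(n * mu, sigma * sqrt(n))
p_sum = 1 - sum_dist.cdf(1872)                  # P(sum > 1872), no upper bound

# IQR of the sampling distribution: third quartile minus first quartile.
iqr = xbar_dist.inv_cdf(0.75) - xbar_dist.inv_cdf(0.25)

print(round(p_mean, 4), round(p_sum, 4), round(iqr, 3))
```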

     

     

     

     

    CLT, Sample and Population  

    Recall that the Central Limit Theorem tells us that the sampling distribution will be approximately normal when the sample size is large.  Notice that the Central Limit Theorem tells us nothing about the distribution of the sample.  It is important to understand the difference between the sampling distribution and the distribution of the sample.

     

     

     

    Central Limit Theorem Symbols

    Below are some of the symbols that represent parameters and statistics that are used in elementary statistics.

    • \( x \)    The random variable that represents the quantitative outcome.  For example, if a survey is conducted asking 100 people how much they weigh, then \( x \) is a randomly selected respondent's weight.
    • \( \mu \)     The population mean.  For example, for the survey that asks 100 people's weight, \( \mu \) represents the average weight of all people in the world, not just from the survey respondents.
    •  \(\bar{x}\)    The sample mean.  For example, for the survey that asks 100 people's weight, \(\bar{x}\)  represents the average weight of the 100 respondents.
    • \(\mu_{\bar{x}}\)     The population mean of the sampling distribution.  For example, consider every possible group of 100 people.  Each one of these groups will have its own sample mean \(\bar{x}\) , the mean of all of these many many \(\bar{x}\) 's will be the population mean of the sampling distribution \(\mu_{\bar{x}}\).
    • \( \sigma \)    The population standard deviation.  For example, for the survey that asks 100 people's weight, \( \sigma \) represents the standard deviation of all people in the world, not just from the survey respondents.
    • \( s \)    The sample standard deviation.  For example, for the survey that asks 100 people's weight, s represents the standard deviation of the 100 respondents.
    • \(\sigma_{\bar{x}}\)     The population standard deviation of the sampling distribution.  For example, consider every possible group of 100 people.  Each one of these groups will have its own sample mean \(\bar{x}\), and the standard deviation of all of these many many \(\bar{x}\)'s will be the population standard deviation of the sampling distribution \(\sigma_{\bar{x}}\).
    •  \( p \) The population proportion.  For example, if a survey is conducted of 100 randomly selected Americans asking them if they were born in America, then \( p \) is the proportion of all Americans who were born in America not just the 100 Americans who were surveyed.
    • \( \hat{p} \)   The sample proportion.  For example, if a survey is conducted of 100 randomly selected Americans asking them if they were born in America, then \( \hat{p} \) is the proportion of the 100 Americans who were surveyed that were born in America.
    • \( \mu_{\hat{p}} \)  The population mean of the sampling distribution for proportions.  For example, consider every possible group of 100 people.  Each one of these groups will have its own sample proportion \( \hat{p} \) , the mean of all of these many many \( \hat{p} \) 's will be the population mean of the sampling distribution for proportions \( \mu_{\hat{p}} \) .
    • \( \sigma_{\hat{p}} \)    The population standard deviation of the sampling distribution for proportions.  For example, consider every possible group of 100 people.  Each one of these groups will have its own sample proportion \( \hat{p} \) , the standard deviation of all of these many many \( \hat{p} \) 's will be the population standard deviation of the sampling distribution for proportions \( \sigma_{\hat{p}} \).

     

     

     

     

     

    Confidence Interval For the Population Variable

    The confidence interval is for the population mean.  Confidence intervals are never for individual data values.

     

     

     

     

    Confidence Interval For a Proportion

    • First notice that the study involves a "Yes or No" question.  You are interested in finding a confidence interval for a proportion here.
    • In your calculator go to STAT then TESTS.  Then scroll down until you find 1-PropZInt.  x represents the number of successes and n represents the sample size.
    • Here's how to interpret the confidence level.  If many randomly selected groups, each with the given sample size are studied, then the results from each group would correspond to a separate confidence interval.  The confidence level tells us what percent of these confidence intervals will contain the true population proportion. 
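For reference, a confidence interval for a proportion can be sketched by hand. The function below uses the usual large-sample formula \( \hat{p} \pm z \sqrt{\hat{p}(1-\hat{p})/n} \), which is assumed here to match what 1-PropZInt reports; the function name and the numbers 60 and 100 are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_z_interval(x, n, conf_level):
    """Large-sample z interval for a proportion: x successes out of n."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)   # critical value
    margin = z * sqrt(p_hat * (1 - p_hat) / n)           # margin of error
    return p_hat - margin, p_hat + margin

# Hypothetical survey: 60 "yes" answers out of 100 respondents, 95% confidence.
low, high = one_prop_z_interval(60, 100, 0.95)
print(round(low, 4), round(high, 4))
```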

     

     

     

     

    Confidence Interval for a Mean:

    • To determine whether to use a normal (Z) distribution or a Student's T distribution, decide whether the standard deviation for the population is known.  If you know the population standard deviation, use the normal (Z) distribution.  If you do not know the population standard deviation, use the Student's T distribution.  If both the population standard deviation and the sample standard deviation are known, then use the normal (Z) distribution.  Typically if the given standard deviation appears in the sentence that describes the sample, then it is the sample standard deviation.
    • In your calculator go to STAT then TESTS.  If the population standard deviation is known and you want to find a confidence interval for the population mean, use Z-Interval.  If the population standard deviation is unknown and you want to find a confidence interval for the population mean, then use T-Interval.
    • Here's how to interpret the confidence level.  If many randomly selected groups, each with the given sample size are studied, then the data from each group would correspond to a separate confidence interval.  The confidence level tells us what percent of these confidence intervals will contain the true population mean. 
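The Z case can be sketched with the standard library alone. The sample mean, population standard deviation, and sample size below are invented. The T case would need a Student's t critical value in place of `z`, which the standard library does not provide, so it is not shown.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, sigma, n, conf_level):
    """Confidence interval for the population mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)   # critical value
    margin = z * sigma / sqrt(n)                         # margin of error
    return x_bar - margin, x_bar + margin

# Hypothetical: 25 movies with mean length 126 minutes, known sigma = 20 minutes.
low, high = z_interval(126, 20, 25, 0.95)
print(round(low, 2), round(high, 2))
```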

     

     

     

     

    Hints for Hypothesis Testing:  

    • To decide whether the alternative hypothesis' statement is "<", ">", or "Not Equal", read the question for key words such as "less than", "lower", "worse", "greater than", "more", "better", "not the same as", "different", etc.

    • To find the test statistic (z or t) and the p-value, you will need your calculator.  Here are some guidelines for deciding whether to use a Z-Test, T-Test, or 1-PropZTest.

      • If the experiment or survey question is a "Yes or No" question, use a 1-PropZTest:  Test statistic Z.

      • If the experiment or survey is quantitative and if the population standard deviation is known, use a Z-Test:  Test statistic Z.

      • If the experiment or survey is quantitative and if the population standard deviation is unknown, use a T-Test:  Test statistic T.

    • Remember that if you are given data, then you will need to select "Data" once in the Test editor.  Otherwise you will use "Stats".

    • If the p-value is less than the level of significance, then there is sufficient evidence to conclude that the alternative hypothesis is true.  Otherwise there is insufficient evidence to make a conclusion.
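To make the last step concrete, here is a sketch of a Z-Test for a mean with a known population standard deviation. The function `z_test` and all the numbers are hypothetical; the tail argument mirrors the "<", ">", or "Not Equal" choice above.

```python
from math import sqrt
from statistics import NormalDist

def z_test(x_bar, mu_0, sigma, n, tail):
    """Return (test statistic z, p-value) for H0: mu = mu_0."""
    z = (x_bar - mu_0) / (sigma / sqrt(n))
    cdf = NormalDist().cdf
    if tail == "<":                      # left tailed
        p_value = cdf(z)
    elif tail == ">":                    # right tailed
        p_value = 1 - cdf(z)
    else:                                # two tailed ("Not Equal")
        p_value = 2 * (1 - cdf(abs(z)))
    return z, p_value

# Hypothetical: sample mean 52.5 from n = 64, H0: mu = 50, sigma = 10, Ha: mu > 50.
alpha = 0.05                             # level of significance
z, p_value = z_test(52.5, 50, 10, 64, ">")
print(z, round(p_value, 4))

if p_value < alpha:
    print("Sufficient evidence to support the alternative hypothesis.")
else:
    print("Insufficient evidence to make a conclusion.")
```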

     

     

     

     

    Hypothesis Test p-Value

    Recall that the p-value gives the probability that if another sample was taken with the same sample size and if the null hypothesis is true, then the sample mean or proportion will be at least as extreme as the sample mean obtained from the current sample.  Is this what is being stated in the problem?

     

     

     

    Hypothesis Test Inequality and Variable

    When performing the hypothesis test, think carefully about the choice of the inequality to decide if it is "<", ">" or not equal to.  Also if the survey question has a "yes" or "no" answer, you use the letter p and if the survey question has a numerical answer you use \( \mu \).

     

     

     

    Hints for Hypothesis Testing With 2 Samples:  

    • To decide whether the alternative hypothesis' statement is "<", ">", or "Not Equal", read the question for key words such as "less than", "lower", "worse", "greater than", "more", "better", "not the same as", "different", etc.

    • To find the test statistic (z or t) and the p-value, you will need your calculator.  Here are some guidelines for deciding whether to use a 2SampZTest, 2SampTTest, or 2PropZTest.

      • If the experiment or survey question is a "Yes or No" question, use a 2PropZTest:  Test statistic Z.

      • If the experiment or survey is quantitative, the samples are independent, and the population standard deviations are known, use a 2SampZTest:  Test statistic Z.

      • If the experiment or survey is quantitative, the samples are independent, and the population standard deviation is unknown, use a 2SampTTest:  Test statistic T.

      • If the experiment or survey is quantitative, the samples are dependent, and the population standard deviations are known, use a Z-Test with the single variable, d, defined by subtracting the two dependent variables; L1-L2 STO>L3:  Test statistic Z.

      • If the experiment or survey is quantitative, the samples are dependent, and the population standard deviation is unknown, use a T-Test with the single variable, d, defined by subtracting the two dependent variables; L1-L2 STO>L3:  Test statistic T.

      • Examples of dependent samples are:  before and after studies, identical twins studies, husband wife studies.  Dependent samples have a natural pairing; each data value from the first sample is naturally paired to a data value from the second sample.

    • Remember that if you are given data, then you will need to select "Data" once in the Test editor.  Otherwise you will use "Stats".

    • If the p-value is less than the level of significance, then there is sufficient evidence to conclude that the alternative hypothesis is true.  Otherwise there is insufficient evidence to make a conclusion.

    • The p-value represents the probability that if another study was done with the same sample sizes and if the null hypothesis is true, then the results will be at least as extreme as the results obtained.  Here are some examples of what is meant by at least as extreme:

      • For a left tailed test ("<"), if the p-value is 0.38 and the null hypothesis was \( \mu_1 - \mu_2 = 0\) and the sample means are 23 and 27 then there would be a 38% chance that a new study would have the first sample mean at least 4 less than the second sample mean.

      • For a right tailed test (">"), if the p-value is 0.41 and the null hypothesis was \( \mu_1 - \mu_2 = 0\) and the sample means are 36 and 33, then there would be a 41% chance that a new study would have the first sample mean at least 3 more than the second sample mean.

      • For a two tailed test ("Not Equal"), if the p-value is 0.17 and the null hypothesis was \( \mu_1 - \mu_2 = 0\) and the sample means are 18 and 25, then there would be a 17% chance that a new study would have the first sample mean either at least 7 less than the second sample mean or at least 7 greater than the second sample mean.

      • Your calculator will give you the sample means (or sample proportions).

    • You can interpret the level of significance as follows:  If another study is done with the same sample sizes and if the null hypothesis is true, then the level of significance is the probability that this new study will  give results that falsely lead you to reject the null hypothesis.
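As one worked case, the "Yes or No" situation can be sketched with the pooled two-proportion z statistic, which is assumed here to be what 2PropZTest computes. The function name and the survey counts are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def two_prop_z_test(x1, n1, x2, n2, tail):
    """Return (z, p-value) for H0: p1 - p2 = 0 using the pooled proportion."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)
    std_error = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / std_error
    cdf = NormalDist().cdf
    if tail == "<":                      # left tailed
        p_value = cdf(z)
    elif tail == ">":                    # right tailed
        p_value = 1 - cdf(z)
    else:                                # two tailed ("Not Equal")
        p_value = 2 * (1 - cdf(abs(z)))
    return z, p_value

# Hypothetical: 40 of 100 said "yes" in group 1, 50 of 100 in group 2, Ha: p1 < p2.
alpha = 0.05
z, p_value = two_prop_z_test(40, 100, 50, 100, "<")
print(round(z, 3), round(p_value, 4))
print("reject H0" if p_value < alpha else "insufficient evidence")
```

With these invented counts the p-value is larger than 0.05, so the sketch ends with "insufficient evidence" even though the first sample proportion is lower.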

     

     

     

     

     

    Type 1 Error

    A Type 1 error is when the researcher rejects the null hypothesis when the null hypothesis is true.  Did the researcher reject the null hypothesis?

     

     

    Rejecting Ho

    If the p-value is less than the level of significance, then the researcher should reject the null hypothesis and conclude that there is sufficient evidence to support the claim of the alternative hypothesis.  If the p-value is greater than the level of significance, then the researcher has insufficient evidence to support the claim of the alternative hypothesis.
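
    The decision rule above can be written as a short sketch.  The function name decide and the wording of the returned messages are illustrative assumptions, not part of any calculator or library.

```python
def decide(p_value: float, alpha: float) -> str:
    """Apply the decision rule: compare the p-value with the level of significance."""
    if p_value < alpha:
        # Small p-value: reject H0 and support the alternative hypothesis.
        return "reject H0: sufficient evidence to support Ha"
    # Otherwise we fail to reject H0; this is not the same as accepting H0.
    return "fail to reject H0: insufficient evidence to support Ha"


print(decide(0.03, 0.05))
print(decide(0.38, 0.05))
```

    With a level of significance of 0.05, a p-value of 0.03 leads to rejecting the null hypothesis, while a p-value of 0.38 does not.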

     

     

    Interpreting the p-value

    The p-value is the probability that, if the study were done again with the same sample sizes and if the null hypothesis is true, the results obtained from the new study would be at least as extreme as the results from the current study.  This probability refers to the average or proportion of all respondents in the new study, not just one pair of respondents.

     

     

     

     

    Paired vs. Independent

    A hypothesis test involves dependent samples if there is a pairing such as with before and after, husband and wife, and identical twins studies.  Each individual member of the first sample corresponds with a specific individual member of the second sample.  If the individuals from the first sample have no relation to the individuals of the second sample, then the hypothesis test involves independent samples.
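
    One way to see why the pairing matters: with dependent samples, the test is carried out on the differences within each pair.  The sketch below assumes hypothetical before and after measurements on the same four people and computes the usual paired t statistic by hand; the data and variable names are made up for illustration.

```python
import math
import statistics

# Hypothetical paired data: each position is the same person measured twice.
before = [10, 12, 9, 11]
after = [12, 14, 10, 13]

# Dependent samples: work with the difference inside each pair.
differences = [b - a for b, a in zip(before, after)]

n = len(differences)
mean_diff = statistics.mean(differences)
sd_diff = statistics.stdev(differences)  # sample standard deviation

# Paired t statistic for H0: the mean difference is 0.
t_stat = mean_diff / (sd_diff / math.sqrt(n))
df = n - 1

print(differences, t_stat, df)
```

    If the two samples were independent, there would be no meaningful way to subtract one list from the other element by element, which is why a different test is needed in that case.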

     

     

     

    Tailed Tests

    The determination of whether to use a left, right or two tailed test is always done before the data is collected.  Use a left tailed test if you want to show that the first mean or proportion is less than the second, a right tailed test if you want to show that the first mean or proportion is greater than the second, and a two tailed test if you want to show that there is a difference between the two.
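
    The choice of tail changes how the p-value is computed from the test statistic.  As a rough sketch, assuming the test statistic is a z-score (as in the z-test for two proportions), the three cases look like this.  The function name p_value_from_z is an assumption made for the example; a t-test would use the t distribution instead of the normal distribution.

```python
from statistics import NormalDist


def p_value_from_z(z: float, tail: str) -> float:
    """Return the p-value for a z test statistic; tail is 'left', 'right', or 'two'."""
    std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
    if tail == "left":
        return std_normal.cdf(z)  # area to the left of z
    if tail == "right":
        return 1 - std_normal.cdf(z)  # area to the right of z
    if tail == "two":
        return 2 * (1 - std_normal.cdf(abs(z)))  # area in both tails
    raise ValueError("tail must be 'left', 'right', or 'two'")


for tail in ("left", "right", "two"):
    print(tail, round(p_value_from_z(-1.5, tail), 4))
```

    The same test statistic can give very different p-values depending on the tail, which is one reason the tail must be chosen before the data is collected.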

     

     

     

    Number of Samples

    Notice that there is only one sample taken here:  the sample of business owners.  Since the proportion of all American males is known, there is no need to take a sample from this population.

     

     

     

    Level of Significance

    The level of significance is the probability of a Type 1 error.  It is the probability of rejecting the null hypothesis when the null hypothesis is true.

     

     

     

     

    Rejecting the Null Hypothesis

    If the p-value is less than the level of significance, then the researcher should reject the null hypothesis and conclude that there is sufficient evidence to support the claim of the alternative hypothesis.  If the p-value is greater than the level of significance, then the researcher has insufficient evidence to support the claim of the alternative hypothesis.

     

     

     

    Never Accept Ho

    If the p-value is large, the proper course of action is to fail to reject the null hypothesis and only state that no conclusion can be made.  Never accept the null hypothesis.

     

     

     

     

    Sample Size Needed

    Look at the sample sizes and look at whether the variables are quantitative or Yes/No.

     

     

     

     

    Chi-Square Types

    There are three types of Chi-Square Tests that we cover.  Here are the guidelines for which to use.

    • If you have two different categories and you want to see if the two categories are related, associated, or dependent vs unrelated, not associated, or independent, then conduct a Chi-Square Test for Independence.

    • If you have a single sample of categorical data and you want to see whether the possible outcomes are all equally likely (uniform) or whether the distribution of the outcomes is the same as some given distribution (usually given by known percentages), then conduct a Chi-Square Goodness of Fit Test.

    • If you have two different samples and you want to see if the two populations that the samples come from have the same distribution, then conduct a Chi-Square Test for Homogeneity. 

     

     

     

     

    Goodness of Fit

     This is a Chi-Square test for goodness of fit.  We want to see if all proportions are equal or if there is some different (unequal) distribution. 

    To find the expected counts, realize that if all proportions are equal, then each count is expected to be equal to the sample size divided by the number of categories.  For example if there were 200 numbers looked at to see if their last digits were uniformly distributed between 0,1,2,3,4,5,6,7,8, and 9, then the expected counts would all be the same number 200/10 = 20.
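
    The last-digit example can be checked with a few lines of code.  This is only a sketch; the function name uniform_expected_counts is an assumption made for illustration.

```python
def uniform_expected_counts(sample_size: int, num_categories: int) -> list[float]:
    """Expected count for each category when all proportions are equal."""
    return [sample_size / num_categories] * num_categories


# 200 numbers, last digits 0 through 9 give 10 categories.
expected = uniform_expected_counts(200, 10)
print(expected)
```

    Every one of the ten expected counts comes out to 20, matching 200/10 = 20.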

     

     

     

    Chi Square Goodness of Fit:  Uniform

    This is a Chi-Square test for goodness of fit.  We want to see if all proportions are equal or if there is some different (unequal) distribution.  The hypotheses are:

    H0:  The distribution is uniform

    Ha:  The distribution is not uniform

    To find the expected counts, realize that if all the proportions are equal, then each count is expected to be equal to the sample size divided by the number of categories.

    On your calculator put the observed into L1 and the expected into L2.  The degrees of freedom is one less than the number of categories.

    Note that if the p-value is less than the level of significance, then there is sufficient evidence to reject the null hypothesis and accept the alternative hypothesis.  If the p-Value is larger than the level of significance, then the null hypothesis cannot be rejected.  Some interpretations will say that it is suggestive that the distribution is uniform if the p-value is greater than one minus the level of significance.
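
    For readers curious about what the calculator does with L1 and L2, here is a minimal sketch of the chi-square test statistic and the degrees of freedom.  The observed counts are hypothetical (120 rolls of a die), and the sketch stops at the test statistic because the Python standard library has no chi-square distribution for the p-value.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical observed counts for the six faces of a die (like L1).
observed = [18, 22, 25, 15, 20, 20]

# Under H0 (uniform), each expected count is sample size / number of categories (like L2).
expected = [sum(observed) / len(observed)] * len(observed)

chi2 = chi_square_statistic(observed, expected)
df = len(observed) - 1  # one less than the number of categories

print(chi2, df)
```

    The calculator turns this test statistic and the degrees of freedom into the p-value that is compared with the level of significance.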

     

     

     

    Homogeneity

    This is a Chi-Square test for Homogeneity.  We want to see if the distributions for two unknown populations are the same or different.  The hypotheses are:

    H0:  The two distributions are the same.

    Ha:  The two distributions are different.

    On your calculator use the matrix editor to put in the data.  The number of rows will be two.  The number of columns will be equal to the number of categories.  Then go to TESTS -> χ²-Test.  The observed is the matrix that you entered, probably A.  The expected is any matrix.  The default that the calculator puts in is probably fine.

    Note that if the p-value is less than the level of significance, then there is sufficient evidence to reject the null hypothesis and accept the alternative hypothesis.  There is sufficient evidence to conclude that the two distributions are different.  If the p-value is larger than the level of significance, then the null hypothesis cannot be rejected.  You can just state that there is insufficient evidence to make a conclusion.  Some interpretations will say that it is suggestive that the distributions are the same if the p-value is greater than one minus the level of significance, but still no conclusion can be made.

     

     

    Independence

    This is a Chi-Square test for independence.  We want to see whether the two categories are dependent.  The hypotheses are:

    H0:  The two categories are independent

    Ha:  The two categories are dependent

    On your calculator use the matrix editor to put in the data.  Then go to TESTS -> χ²-Test.  The observed is the matrix that you entered, probably A.  The expected is any matrix.  The default that the calculator puts in is probably fine.

    Note that if the p-value is less than the level of significance, then there is sufficient evidence to reject the null hypothesis and accept the alternative hypothesis.  If the p-Value is larger than the level of significance, then the null hypothesis cannot be rejected.  Some interpretations will say that it is suggestive that the categories are independent if the p-value is greater than one minus the level of significance.
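
    The expected matrix can be "any matrix" because the calculator fills it in from the observed matrix.  As a sketch of that computation, each expected count is the row total times the column total divided by the grand total.  The 2 by 2 table below is hypothetical, and the function names are assumptions made for the example.  The same arithmetic applies to the test for homogeneity.

```python
def expected_counts(table):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_from_table(table):
    """Chi-square test statistic summed over every cell of the table."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Hypothetical observed matrix (like matrix A on the calculator).
table = [[30, 20], [20, 30]]

expected = expected_counts(table)
chi2 = chi_square_from_table(table)
df = (len(table) - 1) * (len(table[0]) - 1)  # (rows - 1) * (columns - 1)

print(expected, chi2, df)
```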

     

     

     

    Slope and y-Intercept

    If the equation of the regression line is

    y = a + bx

    then a is the y-intercept and b is the slope.

    • The y-intercept is the value of y when x is 0.  In other words it gives y when there is none of x.
    • The slope is the rise over the run, or the amount that y changes when x increases by 1.
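
    To make the two interpretations concrete, here is a small sketch that finds a and b by least squares and then checks both bullets.  The data is made up so that it lies exactly on y = 1 + 2x, and the function name least_squares_line is an assumption for the example.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the regression line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx          # slope
    a = y_bar - b * x_bar  # y-intercept
    return a, b


# Hypothetical data lying exactly on the line y = 1 + 2x.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

a, b = least_squares_line(xs, ys)


def predict(x):
    return a + b * x


print(a, b)
print(predict(0))               # the y-intercept: y when x is 0
print(predict(4) - predict(3))  # the slope: change in y when x increases by 1
```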

     

     

    Correlation

      The correlation, r, tells us how well the data fits a line and whether that line is negatively or positively sloped.  Here are some guidelines:

    • If all the points lie perfectly on a positively sloped line, then r = 1.

    • If all the points lie perfectly on a negatively sloped line, then r = -1.

    • If there is no linear trend whatsoever, then r is very close to 0 such as 0.1 or -0.05.

    • If the points have a clear linear trend such that as the x values increase the y values also increase then r is close to 1 such as 0.84 or 0.95.

    • If the points have a clear linear trend such that as the x values increase the y values decrease then r is close to -1 such as -0.89 or -0.93.

    • If the points have a positive linear trend, but the trend is more difficult to recognize, then r is between 0 and 1 such as 0.43 or 0.59.

    • If the points have a negative linear trend, but the trend is more difficult to recognize, then r is between -1 and 0 such as -0.38 or -0.52.
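
    The extreme cases in these guidelines can be verified directly.  The sketch below computes r from its definition for three small made-up data sets; the function name correlation is an assumption for the example.

```python
import math


def correlation(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [1, 2, 3, 4]

r_up = correlation(xs, [2, 4, 6, 8])      # points on a positively sloped line
r_down = correlation(xs, [8, 6, 4, 2])    # points on a negatively sloped line
r_none = correlation(xs, [1, -1, -1, 1])  # symmetric U shape, no linear trend

print(r_up, r_down, r_none)
```

    The third data set has a clear pattern, but the pattern is not linear, so r is 0.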

     

     

    Regression Analysis Calculation:  

    • To use your calculator for regression analysis enter your data into L1 and L2, then go to TESTS -> LinRegTTest.

    • To use the regression line to make a prediction for y given x, just plug in the given x value into the linear regression line equation.  You may need your calculator to perform the arithmetic.

    • To test the hypothesis that there is a correlation between the two variables use
      H0:  r = 0
      Ha:  r ≠ 0
      Then use your calculator to find the p-value (see the first hint) and compare the p-value with the level of significance just like any hypothesis test.

     

     

     

    Correlation Facts

    Here are some facts about the correlation.

    • The correlation will be 1 or -1 if and only if all the points lie on a line.

    • If the correlation is 0, then there is no linear relationship between x and y.  There could be a relationship that is not linear.

    • The correlation and the slope of the regression line always have the same sign.

    • If the correlation is positive then we can say that as values of x increase, values of y tend to also increase.  We can also say that we can predict that a value of y will be higher if the value of x is higher.

    • If the correlation is negative then we can say that as values of x increase, values of y tend to decrease.  We can also say that we can predict that a value of y will be lower if the value of x is higher and that on average as x increases y decreases.

    • Correlation does not imply causation.  It is incorrect to say that if x increases then y will also increase.  We can only talk about tendency and predictions when using correlation.

     

     

     

    Hypothesis Test and Correlation:  

    The hypothesis test for correlation investigates whether there is a correlation between the two variables.  The null hypothesis is always H0: r = 0.  The alternative hypothesis can be a left tailed test Ha:  r < 0 (there is a negative correlation), a right tailed test Ha:  r > 0 (there is a positive correlation), or a two tailed test Ha: r ≠ 0 (there is a correlation).  Note that a low p-value tells us that there is a correlation.  It does not tell us whether that correlation is strong or weak.

     

     

     

    R-Squared Interpretation:

      Interpret R-Squared as follows:  There will be variation from one y-value to the next.  For a fixed x-value, there will still be variation amongst the y-values that have that x-value, but that variation may be less.  R-Squared gives the proportion of the variation in all of the y-values that can be explained by the variation in the x-values.  The rest of the variation in the y-values cannot be explained by the variation in the x-values.
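
    This interpretation can be illustrated numerically.  The sketch below, using made-up data, computes R-Squared as one minus the ratio of the unexplained variation (around the regression line) to the total variation (around the mean of y).  The variable names are assumptions for the example.

```python
# Hypothetical data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least squares regression line y = a + b*x.
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
a = y_bar - b * x_bar

# Total variation in the y-values, measured around their mean.
total_variation = sum((y - y_bar) ** 2 for y in ys)

# Variation left over around the regression line (not explained by x).
unexplained_variation = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

r_squared = 1 - unexplained_variation / total_variation
print(r_squared)
```

    For this data about 60% of the variation in the y-values is explained by the variation in the x-values, and the remaining 40% is not.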

     

     

     

     

    Which Test to Use

    t-test for two dependent means is used when two measurements come from the same person, or are otherwise paired, and you want to decide whether the means differ.

    t-test for two independent means is used when there are two populations that you have samples from that are independent of each other and you want to decide whether the means differ.

    Goodness of Fit test is used to decide whether the distribution of a qualitative variable is the same as a known distribution.

    Chi Squared Test for Independence is used to decide if two qualitative variables are dependent.

    Chi Squared Test for Homogeneity is used to decide if two distributions of qualitative variables are different.

    F-Test (ANOVA) is used when there are more than two populations that you have samples from that are independent of each other and you want to decide whether the means are not all the same.

    z-test for two proportions is used when there are two populations that are given Yes or No questions and you want to decide whether the proportion of yes answers for one population differs from the other.

     

     

    ANOVA Hypotheses and Requirements

    Hint:  The 1-Way ANOVA (ANalysis Of VAriance) test is a hypothesis test that is used to test whether all means from several populations are equal to each other.  The null and alternative hypotheses are as follows:

    H0:  All the means are equal
    Ha:  At least two of the means are not equal to each other.

    In order to use a 1-Way ANOVA, the following conditions must be met:

    • The standard deviations of each of the populations must be equal or at least close to equal.
    • The populations must all be approximately normally distributed.
    • The samples must be randomly and independently selected.

    The 2-Way ANOVA is used to simultaneously test whether the means are all the same for one factor such as race and another factor such as income bracket.

     

     

    ANOVA Hypotheses and Calculations  

    The ANOVA (ANalysis Of VAriance) test is a hypothesis test that is used to test whether all means from several populations are equal to each other.  The null and alternative hypotheses are as follows:

    H0:  All the means are equal
    Ha:  At least two of the means are not equal to each other.

    The calculator can calculate the test statistic, F, and the p-value if you put the data into L1, L2, L3, etc. and then use the TESTS -> ANOVA(.  Put the lists inside the parentheses.  For example if there are four samples then use ANOVA(L1,L2,L3,L4).  Interpret the p-value as with any hypothesis test.  If the p-value is less than the level of significance, then there is sufficient evidence to conclude that the means are not all the same.  If the p-value is greater than the level of significance then there is insufficient evidence to conclude that the means are not all the same.
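
    As a rough sketch of what the calculator computes for the test statistic F, the code below compares the variation between the sample means with the variation within the samples.  The three lists are hypothetical stand-ins for L1, L2 and L3, and the function name one_way_anova_f is an assumption.  The p-value is left to the calculator because the Python standard library has no F distribution.

```python
def one_way_anova_f(*samples):
    """Return (F, df_between, df_within) for a 1-Way ANOVA."""
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    grand_mean = sum(sum(s) for s in samples) / n_total
    means = [sum(s) / len(s) for s in samples]

    # Variation of the sample means around the grand mean.
    ss_between = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    # Variation of the data values around their own sample mean.
    ss_within = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical samples, one list per population.
L1 = [1, 2, 3]
L2 = [2, 3, 4]
L3 = [3, 4, 5]

f_stat, df_between, df_within = one_way_anova_f(L1, L2, L3)
print(f_stat, df_between, df_within)
```

    A larger F means the sample means are spread out more than the variation within the samples would suggest, which corresponds to a smaller p-value.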

     

     

     


    Assignment Hints is shared under a not declared license and was authored, remixed, and/or curated by LibreTexts.
