Statistics LibreTexts

Assignment Hints


    Type

    • Qualitative Data occurs when the outcomes or responses are not numbers but words.  Examples are:  favorite baseball team, eye color, etc.
    • Quantitative Data occurs when the outcomes or responses are numbers.
      • For quantitative data, it is discrete if there are a finite number of outcomes or a countable number of outcomes.  Examples are:  number of children, roll of the dice, etc.
      • For quantitative data, it is continuous if it is not discrete.  The responses can include any number in an interval including fractions and decimals.  Examples are:  exact time to run a mile, exact weight of a carrot, etc.

     

     

     

    Descriptive Inferential

     Recall that descriptive statistics involve displaying the sample data using a chart such as a pie chart, histogram, etc. and looking at the statistics such as the mean, median, mode, etc.  Inferential statistics involves making conclusions about the population based on the sample such as "Based on the sample, the average age of all Americans has increased over the past 10 years" or "After looking at the sample data, we can be confident that the mean movie length for all movies is between 118 and 134 minutes."

     

     

     

    Key Terms

    • The population is the collection of all possible respondents. 
    • The sample is the collection of only those respondents who gave a response. 
    • A parameter is a number that describes the population. 
    • A statistic is a number that describes the sample. 
    • The variable is the number that is a response to the survey or experiment. 
    • Data refers to all the responses of the survey or the experiment.

     

     

     

     

     

    Sampling Types:

    • Convenience Sampling:  This is sampling the respondents who are the easiest to collect data from.  It is unscientific and prone to produce biased results.
    • Stratified Sampling:  This is a scientific sampling technique where the researcher first identifies relevant strata such as race, income, or location.  Then the researcher finds the proportion of the population in each stratum and makes sure that the sample has the same proportions as the population.
    • Cluster Sampling:  This is a scientific sampling technique where the researcher first identifies a natural partition of the population.  From this partition, the researcher selects several clusters and proceeds to collect data from every individual in each of the selected clusters.
    • Systematic Sampling:  This is a scientific sampling technique where the researcher selects every nth item to sample, such as every tenth item that is produced.
    • Simple Random Sampling:  This is the scientific sampling technique where the researcher first itemizes every member of the population and randomly (usually using a computer) selects the sample from this list.

     

     

     

    Bias

    A sampling technique is not scientific if the design is poor.  That means that the survey question was biased, the choice of respondents was biased, or there just weren't enough respondents.  If the sample data differs significantly from the population data, but the sample size is large and the survey was conducted without bias, then the sampling technique is scientific.  Bad luck can still occur.

     

     

    Replacement

    Think about randomly selecting two Americans.  If the first respondent is a Democrat, does the probability that the second is a Democrat decline significantly since one Democrat American is taken out of the list of possibilities?
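One way to explore the question above is to compute the second-draw probability directly.  The sketch below is only an illustration: the 30% share and both population sizes are made-up assumptions, not real figures, and `second_draw_prob` is a hypothetical helper name.

```python
from fractions import Fraction

def second_draw_prob(population, in_group):
    """P(second person is in the group | first person was), without replacement."""
    return Fraction(in_group - 1, population - 1)

# Hypothetical populations in which exactly 30% belong to the group.
small = second_draw_prob(10, 3)                 # 2/9
large = second_draw_prob(1_000_000, 300_000)    # 299999/999999

print(float(small))   # noticeably below 0.30
print(float(large))   # essentially 0.30
```

With a very large population, removing one person barely changes the probability, so sampling without replacement behaves almost like sampling with replacement.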

     

     

     

     

    Frequency

    • If you add up all the frequencies, you should get the total sample size.
    • To find the relative frequency, divide the frequency by the sample size.
    • To find the cumulative frequency of a value, add up the frequencies at and below that value.
    • If you know the relative frequency of a value, then just multiply by 100% to get the percent that have that value.
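The rules above can be sketched in a few lines of Python.  The list of responses is hypothetical data made up for illustration.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical responses: number of courses taken by 10 students.
responses = [1, 2, 2, 3, 3, 3, 4, 4, 5, 2]

counts = Counter(responses)
values = sorted(counts)
n = sum(counts.values())                 # frequencies add up to the sample size

freq = [counts[v] for v in values]
rel_freq = [f / n for f in freq]         # frequency divided by sample size
cum_freq = list(accumulate(freq))        # frequencies at and below each value
percent = [100 * r for r in rel_freq]    # relative frequency as a percent

for v, f, r, c, p in zip(values, freq, rel_freq, cum_freq, percent):
    print(f"value {v}: freq={f}, rel={r:.2f}, cum={c}, pct={p:.0f}%")
```

The last cumulative frequency always equals the sample size, which is a quick check that no response was missed.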

     

     

     

    Frequency of Range Values

    To find the frequency of a range of values, just count how many values are in that range.

     

     

    Relative Frequency

    To find a relative frequency, divide the frequency of the values in the given range by the sample size.

     

     

     

     

    Class Boundary

    The lower class boundary is the lowest value in the adjusted interval.  If the boundaries do not meet because they are integers or other rounded numbers, then you adjust the left boundary by subtracting half the distance between the boundaries and adjust the right boundary by adding half the distance between the boundaries.

    For example, if the original boundaries are 5-9, 10-14, 15-19, and 20-24, then the distance between the boundaries is 10 - 9 = 1.  Half of this is 0.5.  Thus the new adjusted boundaries are:  4.5-9.5, 9.5-14.5, 14.5-19.5, 19.5-24.5.  The lower class boundary will then be 4.5.
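The adjustment in the example above can be checked with a short sketch.  The variable names are illustrative only.

```python
# Class limits from the example above.
classes = [(5, 9), (10, 14), (15, 19), (20, 24)]

# Distance between the upper limit of one class and the lower limit of the next.
gap = classes[1][0] - classes[0][1]      # 10 - 9 = 1
half = gap / 2                           # 0.5

# Subtract half the gap on the left and add half the gap on the right.
boundaries = [(lo - half, hi + half) for lo, hi in classes]
print(boundaries)       # [(4.5, 9.5), (9.5, 14.5), (14.5, 19.5), (19.5, 24.5)]
print(boundaries[0][0]) # 4.5, the lower class boundary of the first class
```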

     

     

     

    Cumulative Relative Frequency

      The Cumulative Relative Frequency is defined by the frequency at or below that value divided by the sample size.  First add up the number of students who are taking 2 or fewer courses then divide by the sample size 50.

     

     

     

    Box Plot

    The general box plot is shown below.

    [Figure: box plot labeled Min, Q1, Median, Q3, and Max]

    • The Min (minimum) corresponds to the lowest data value.
    • Q1 (the first quartile) corresponds to the 25th percentile, or the value at which 25% of the data lies at or below this value.
    • The Median corresponds to the 50th percentile or the middle value, or the value at which 50% of the data lies at or below this value.
    • Q3 (the third quartile) corresponds to the 75th percentile, or the value at which 75% of the data lies at or below this value.
    • The Max (maximum) corresponds to the highest data value.
    • IQR (Inter-Quartile-Range) is the range for the middle 50% of the data.  It is found by subtracting:  IQR = Q3 - Q1.
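A minimal sketch of the five numbers behind a box plot, using Python's standard `statistics` module on hypothetical data.  Note that calculators and software use slightly different quartile conventions; the `"inclusive"` method chosen here is an assumption and may not match your calculator on every data set.

```python
from statistics import quantiles

data = sorted([7, 2, 9, 4, 12, 4, 8, 5, 10])    # hypothetical data

# quantiles(..., n=4) returns the three cut points Q1, median, Q3.
q1, median, q3 = quantiles(data, n=4, method="inclusive")

five_number = (data[0], q1, median, q3, data[-1])   # Min, Q1, Median, Q3, Max
iqr = q3 - q1                                       # spread of the middle 50%
print(five_number, iqr)
```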

     

     

    Coefficient of Variation

    Hint:  You should be able to use a calculator or computer to find the mean and standard deviation of the data.  The definition of the coefficient of variation is the standard deviation divided by the mean:

         \( CV = \frac{\sigma}{\mu} \) 

    To change it to a percent, multiply by 100%.
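If a computer is used instead of a calculator, the calculation might look like the sketch below.  The data are hypothetical, and the population standard deviation (`pstdev`) is used to match the \( \sigma \) in the formula; for sample data, `statistics.stdev` would be the analogous choice.

```python
from statistics import mean, pstdev

data = [10, 12, 14, 16, 18]     # hypothetical data

mu = mean(data)                 # 14
sigma = pstdev(data)            # population standard deviation, sqrt(8)
cv = sigma / mu                 # coefficient of variation
cv_percent = 100 * cv           # as a percent

print(round(cv, 4), round(cv_percent, 2))
```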

     

     

     

    Frequency Table

    View the video to see how to enter the data into the calculator and to find the statistics.

    • To find the percent of respondents who gave an answer at least, less than, more than, or at most a number, use the table to count the number of these respondents.  Then divide by the sample size and multiply by 100%.  Please do not include the symbol "%" in your answer.
    • If you are given a percent of respondents who answered below (above, at least, at most) a number, multiply this percent (as a decimal) by the sample size.  This will give the ranking of that response.  Then find the value that has that ranking. 

    For example, to find the value such that 30% of the respondents are at most that value, use the table below and do the following:

    Value Frequency
    2 15
    3 30
    4 25
    5 10

    The sample size is 80.  We find the ranking by calculating: 

    0.30 x 80  =  24

    Now to find the 24th number, notice that the first through 15th numbers have the value of 2.  The 16th through 45th numbers have the value of 3.  The 46th through 70th numbers have the value of 4 and the 71st through 80th numbers have the value of 5.  Therefore the 24th number has the value of 3.  We can conclude that 30% of the respondents gave a value of at most 3.  Note that had the question asked "less than that value" we would have an answer of 4.
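The ranking lookup can also be done by walking through the table while keeping a running total of the frequencies.  This is a sketch only; `value_at_rank` is a hypothetical helper, not a calculator function.

```python
import math

table = {2: 15, 3: 30, 4: 25, 5: 10}    # value -> frequency, from the table above
n = sum(table.values())                 # sample size, 80

def value_at_rank(table, rank):
    """Return the value held by the rank-th smallest response (1-based)."""
    running = 0
    for value in sorted(table):
        running += table[value]         # cumulative frequency so far
        if rank <= running:
            return value
    raise ValueError("rank exceeds the sample size")

percent = 30
rank = math.ceil(percent * n / 100)     # 0.30 x 80 = 24
print(rank, value_at_rank(table, rank)) # 24 3
```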

     

     

     

     

    Sample Space:

    • The sample space includes all possible outcomes.  How many cards are there?
    • The probability of an event A, P(A), is defined by the number of outcomes in A divided by the number of outcomes in the sample space, provided all outcomes are equally likely.  So, count the number of cards that are in the event that you are concerned with and divide by the total number of cards to be drawn from.

     

     

     

    Independent Events

    If events A and B are independent then P(A)P(B) = P(A and B)

     

     

     

     

     

    Mutually Exclusive

    • To determine the sample space, just count the cards and count how many types of landings of the coin there are (heads and tails).  Then you can multiply to find the size of the sample space.
    • Two events are mutually exclusive if they cannot occur simultaneously.  In other words events U and V are mutually exclusive if P(U and V) = 0.
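As an illustration of counting outcomes, the sketch below builds a standard 52-card deck and checks one pair of events for independence and another for mutual exclusivity.  The particular events (heart, king, spade) are chosen as examples and are not taken from any specific assignment question.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))       # sample space: 52 equally likely cards

def prob(event):
    """Number of outcomes in the event divided by the size of the sample space."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

def is_heart(card):
    return card[1] == "hearts"

def is_spade(card):
    return card[1] == "spades"

def is_king(card):
    return card[0] == "K"

# Independence: P(A)P(B) = P(A and B)
p_both = prob(lambda c: is_heart(c) and is_king(c))
independent = prob(is_heart) * prob(is_king) == p_both

# Mutually exclusive: P(U and V) = 0
mutually_exclusive = prob(lambda c: is_heart(c) and is_spade(c)) == 0

# Drawing one card and flipping one coin: multiply the counts.
sample_space_size = len(deck) * 2

print(independent, mutually_exclusive, sample_space_size)
```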

     

     

     

     

     

    Expected Value

    Find the expected value of each.  Then just look at which is the largest, middle, and smallest.  To find the expected values, multiply each outcome by its corresponding probability.  Finally, add the three products together.

     

     

    Profit Expected Value 

    To write the probability distribution table, put the possible outcomes in the column labeled "x" and the corresponding probabilities in the column labeled P(x).  Notice that the x values represent profit which is revenue minus cost.  For example, if the revenue for a win is $a and the cost is $b, then the profit is a - b.

    • To find the expected value, first multiply each of the "x" values by its corresponding probability.  Then add up all these products.

    • The expected value is the number such that if a great many trials are done, then the average outcome per trial is likely to be very close to the expected value.  Thus this is the average profit per game played.
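A small sketch of the calculation described above.  The game itself (a $2 cost and a $10 payout with probability 1/10) is a made-up example, not taken from an assignment.

```python
from fractions import Fraction

# Hypothetical game: it costs $2 to play and pays $10 with probability 1/10.
cost = 2
distribution = [
    (10 - cost, Fraction(1, 10)),   # win:  profit = revenue - cost = 8
    (0 - cost, Fraction(9, 10)),    # lose: profit = -2
]

total_probability = sum(p for _, p in distribution)   # should be 1

# Multiply each x by P(x), then add up the products.
expected = sum(x * p for x, p in distribution)
print(expected)    # -1, an average loss of $1 per game
```

Using `Fraction` keeps the arithmetic exact; ordinary decimals would work as well, up to rounding.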

     

     

     

     

    Binomial:

    • To find the probability that a binomial variable is exactly equal to a number x, use:  binompdf(n,p,x) where n is the number of trials and p is the probability of success on each trial.
    • To find the probability that a binomial variable is less than or equal to a number x or at most x, use: binomcdf(n,p,x).
    • To find the probability that a binomial variable is less than a number x, use: binomcdf(n,p,x-1).
    • To find the probability that a binomial variable is greater than a number x, use the rule of complements:  1 - binomcdf(n,p,x).
    • To find the probability that a binomial variable is greater than or equal to a number x or at least x, use the rule of complements: 1 - binomcdf(n,p,x - 1).
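For readers without a calculator at hand, the five cases above can be reproduced in Python.  The functions below are a sketch that borrows the calculator names `binompdf` and `binomcdf`; they are written from the binomial formula and are not part of any Python library.  The values n = 10, p = 0.5, x = 3 are hypothetical.

```python
from math import comb

def binompdf(n, p, x):
    """P(X = x) for a binomial variable with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def binomcdf(n, p, x):
    """P(X <= x): add up binompdf for 0, 1, ..., x."""
    return sum(binompdf(n, p, k) for k in range(x + 1))

n, p, x = 10, 0.5, 3      # hypothetical values for illustration

exactly = binompdf(n, p, x)               # P(X = 3)
at_most = binomcdf(n, p, x)               # P(X <= 3)
less_than = binomcdf(n, p, x - 1)         # P(X < 3)
greater_than = 1 - binomcdf(n, p, x)      # P(X > 3)
at_least = 1 - binomcdf(n, p, x - 1)      # P(X >= 3)

print(exactly, at_most, less_than, greater_than, at_least)
```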

     

     

    Only 2 Outcomes

      For a Binomial Distribution, each trial must have only two possible outcomes (think heart or not heart) and each trial's probability of success must be the same as and independent of every other trial.

     

     

     

    Average

    The expected value tells us that in the long run, over many trials (not just one trial), it is very likely that the average amount for all trials will be close to this number.

     

     

    Loss is Bad

    Do you think the business owner would be wise to bid when on average there will be a loss (negative gain) of about $3000 on the project?

     

     

     

     

    Uniform Distribution Properties 

    • The mean for a uniform distribution is the average of the left and the right endpoints.

    • The standard deviation for a uniform distribution is \( \sqrt{\frac{(b - a)^2}{12}} \), where a and b are the left and right endpoints respectively.

    • For a uniform distribution, the probability that an outcome will be exactly a given number is always 0.

    • For a uniform distribution, the probability that an outcome will be between two numbers x and y is (y - x) / (b - a) where a and b are the left and right endpoints respectively.

    • In general for a uniform distribution, we can find a probability by taking the length of the described line segment and dividing by b - a.

    • To find a percentile, p (or a quartile:  25th or 75th percentile) you want to go backwards with the uniform distribution calculations.  Here you know the probability and want to find y, so you set
      p = (y - a) / (b - a) and solve for y.

    • If you have a uniform distribution and want to find a conditional probability P(A|B), then use the given to get the new endpoints.  For example if the distribution is uniform between 5 and 20 and you want to find the probability of an event being between 10 and 17 given that the outcome is less than or equal to 15, you need to find the probability that an event is between 10 and 15 for a uniform distribution with endpoints 5 and 15.
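The properties above translate directly into code.  The sketch below uses the endpoints 5 and 20 from the conditional probability example; the helper names `prob_between` and `percentile` are hypothetical.

```python
from math import sqrt

a, b = 5, 20    # endpoints from the conditional example above

mean = (a + b) / 2                 # average of the endpoints
sd = sqrt((b - a) ** 2 / 12)       # square root of (b - a)^2 / 12

def prob_between(x, y, a, b):
    """P(x <= X <= y) for X uniform on [a, b], assuming a <= x <= y <= b."""
    return (y - x) / (b - a)

def percentile(p, a, b):
    """Solve p = (y - a) / (b - a) for y."""
    return a + p * (b - a)

p_10_17 = prob_between(10, 17, a, b)     # 7/15
q1 = percentile(0.25, a, b)              # first quartile, 8.75

# Conditional probability: given X <= 15, the new endpoints are 5 and 15,
# and the event "between 10 and 17" shrinks to "between 10 and 15".
p_given = prob_between(10, 15, a, 15)    # 0.5

print(mean, round(sd, 3), round(p_10_17, 4), q1, p_given)
```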

     

     

     

    Discrete is a List Continuous is an Interval

      A discrete random variable has a finite number of outcomes or a countable number of outcomes.  A continuous random variable has an entire interval of outcomes including decimals and fractions.

     

     

     

    Normal

    • We write X ~ N(m,s) to mean that the distribution is Normal (N) with mean m and standard deviation s.  The mean and the standard deviation are given in the problem.

    • For a normal distribution, the mean and the median are the same.

    • To find the z-score, use the formula:  z = (x - m)/s.

    • To find the probability that an event is between two numbers a and b, use your calculator with normalcdf(a,b,m,s).

    • To find the probability that an event is less than a number a, use your calculator with normalcdf(-99999,a,m,s).  It is recommended that you use at least enough 9's so that the lower bound is at least 10 times larger in magnitude than the maximum magnitude of a, m, and s.

    • To find the probability that an event is greater than a number b, use your calculator with normalcdf(b,99999,m,s).  It is recommended that you use at least enough 9's so that the upper bound is at least 10 times larger in magnitude than the maximum of b, m, and s.

    • For normal distribution probabilities, < is the same as ≤ and > is the same as ≥.

    • If you want to find the value such that the proportion of the data that is below that value is p, then use the inverse normal:  invNorm(p,m,s).

    • To find the pth percentile, first convert p to a decimal and then use the inverse normal.  For example, to find the 17th percentile, use invNorm(0.17,m,s).  Note that the first quartile is the 25th percentile and the third quartile is the 75th percentile.

    • To find the value such that the proportion above that value is p, first subtract p from 1 and then use the hint above:  invNorm(1-p,m,s).
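The same calculations can be sketched with `NormalDist` from Python's standard `statistics` module (Python 3.8 or later).  The mean 100 and standard deviation 15 are hypothetical.  Because `cdf` gives the area to the left of a value, no -99999 or 99999 placeholder is needed here.

```python
from statistics import NormalDist

m, s = 100, 15                 # hypothetical mean and standard deviation
X = NormalDist(mu=m, sigma=s)

z = (115 - m) / s                      # z-score of x = 115
p_between = X.cdf(115) - X.cdf(85)     # P(85 < X < 115)
p_below = X.cdf(85)                    # P(X < 85)
p_above = 1 - X.cdf(115)               # P(X > 115), rule of complements

pct_17 = X.inv_cdf(0.17)               # 17th percentile, like invNorm(0.17,m,s)
top_10 = X.inv_cdf(1 - 0.10)           # value with 10% of the data above it

print(z, round(p_between, 4), round(p_below, 4), round(pct_17, 2), round(top_10, 2))
```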

     

     

     

     

     

    Central Limit Theorem 1

     Consider the diagram below and notice that there is a left tail and a right tail that must make a combined 0.05 area.

    • Use the fact that the sampling distribution (the \(\bar{x}\) distribution) has mean \(\mu\) and standard deviation \(\sigma\) divided by the square root of n.

    • To find a probability that involves the mean use normalcdf(\(a,b,\mu,\frac{\sigma}{\sqrt{n}})\) .

      • a is the lower bound.  Use -99999 if there is no lower bound (negative infinity).

      • b is the upper bound.  Use 99999 if there is no upper bound (infinity).

      • \(\mu\) is the population mean

      • \(\sigma\) is the population standard deviation

      • n is the sample size.

    • To find a percentile or quartile (Q1 = 25th percentile and Q3 = 75th percentile), use invNorm(\(x,\mu,\frac{\sigma}{\sqrt{n}})\) .  Another common wording is "find the value such that 30% (x = 0.3) of the data lies below the value."  If you want to find the value such that a given percentage lies above that value, use the rule of complements.  For example if 10% lies above, then 90% lies below.

    • Use the fact that the distribution of a sum of values has mean \(n\mu\) and standard deviation \(\sigma\) times the square root of n.

    • To find a probability that involves the sum use normalcdf(\(a,b,n* \mu,\sigma * \sqrt{n})\)  

    • To find the Inter-Quartile-Range (IQR), find the first quartile (25th  percentile) and the third quartile (75th percentile) and subtract.
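A sketch of these calculations using Python's `NormalDist`, assuming the sample size is large enough for the normal approximation to apply.  The population mean 70, standard deviation 10, and sample size 25 are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 70, 10, 25      # hypothetical population mean, sd, and sample size

xbar = NormalDist(mu, sigma / sqrt(n))        # sampling distribution of the mean
total = NormalDist(n * mu, sigma * sqrt(n))   # distribution of the sum

p_mean = xbar.cdf(72) - xbar.cdf(68)          # P(68 < sample mean < 72)
p_sum = total.cdf(1800) - total.cdf(1700)     # P(1700 < sum < 1800)

q1 = xbar.inv_cdf(0.25)        # first quartile of the sample mean
q3 = xbar.inv_cdf(0.75)        # third quartile of the sample mean
iqr = q3 - q1

print(round(p_mean, 4), round(p_sum, 4), round(iqr, 3))
```

In this example both intervals reach one standard deviation on either side of the mean of their distribution, so the two probabilities agree.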

     

     

     

     

    CLT, Sample and Population  

    Recall that the Central Limit Theorem tells us that the sampling distribution will be approximately normal when the sample size is large.  Notice that the Central Limit Theorem tells us nothing about the distribution of the sample.  It is important to understand the difference between the sampling distribution and the distribution of the sample.

     

     

     

    Central Limit Theorem Symbols

    Below are some of the symbols that represent parameters and statistics that are used in elementary statistics.

    • \( x \)    The random variable that represents the quantitative outcome.  For example, if a survey is conducted asking 100 people how much they weigh, then \( x \) is a randomly selected respondent's weight.
    • \(\mu\)     The population mean.  For example, for the survey that asks 100 people's weight, \(\mu\) represents the average weight of all people in the world, not just of the survey respondents.
    •  \(\bar{x}\)    The sample mean.  For example, for the survey that asks 100 people's weight, \(\bar{x}\)  represents the average weight of the 100 respondents.
    • \(\mu_{\bar{x}}\)     The population mean of the sampling distribution.  For example, consider every possible group of 100 people.  Each one of these groups will have its own sample mean \(\bar{x}\) , the mean of all of these many many \(\bar{x}\) 's will be the population mean of the sampling distribution \(\mu_{\bar{x}}\).
    • \( \sigma \)    The population standard deviation.  For example, for the survey that asks 100 people's weight, \( \sigma \) represents the standard deviation of the weights of all people in the world, not just of the survey respondents.
    • \( s \)    The sample standard deviation.  For example, for the survey that asks 100 people's weight, \( s \) represents the standard deviation of the weights of the 100 respondents.
    • \(\sigma_{\bar{x}}\)     The population standard deviation of the sampling distribution.  For example, consider every possible group of 100 people.  Each one of these groups will have its own sample mean \(\bar{x}\); the standard deviation of all of these many \(\bar{x}\)'s will be the population standard deviation of the sampling distribution \(\sigma_{\bar{x}}\).
    •  \( p \) The population proportion.  For example, if a survey is conducted of 100 randomly selected Americans asking them if they were born in America, then \( p \) is the proportion of all Americans who were born in America not just the 100 Americans who were surveyed.
    • \( \hat{p} \)   The sample proportion.  For example, if a survey is conducted of 100 randomly selected Americans asking them if they were born in America, then \( \hat{p} \) is the proportion of the 100 Americans surveyed who were born in America.
    • \( \mu_{\hat{p}} \)  The population mean of the sampling distribution for proportions.  For example, consider every possible group of 100 people.  Each one of these groups will have its own sample proportion \( \hat{p} \) , the mean of all of these many many \( \hat{p} \) 's will be the population mean of the sampling distribution for proportions \( \mu_{\hat{p}} \) .
    • \( \sigma_{\hat{p}} \)    The population standard deviation of the sampling distribution for proportions.  For example, consider every possible group of 100 people.  Each one of these groups will have its own sample proportion \( \hat{p} \) , the standard deviation of all of these many many \( \hat{p} \) 's will be the population standard deviation of the sampling distribution for proportions \( \sigma_{\hat{p}} \).

     

     

     

     

     

     


    Assignment Hints is shared under a not declared license and was authored, remixed, and/or curated by LibreTexts.
