15.2: Some Random Selection Problems
In the unit on Random Selection, we develop some general theoretical results and computational procedures using MATLAB. In this unit, we extend the treatment to a variety of problems. We establish some useful theoretical results and in some cases use MATLAB procedures, including those in the unit on random selection.
The Poisson Decomposition
In many problems, the individual demands may be categorized as one of \(m\) types. If the random variable \(T_i\) is the type of the \(i\)th arrival and the class \(\{T_i: 1 \le i\}\) is iid, we have multinomial trials. For \(m = 2\) we have the Bernoulli or binomial case, in which one type is called a success and the other a failure.
Multinomial trials
We analyze such a sequence of trials as follows. Suppose there are m types, which we number 1 through \(m\). Let \(E_{ki}\) be the event that type \(k\) occurs on the \(i\)th component trial. For each \(i\), the class \(\{E_{ki}: 1 \le k \le m\}\) is a partition, since on each component trial exactly one of the types will occur. The type on the \(i\)th trial may be represented by the type random variable
\(T_i = \sum_{k = 1}^{m} kI_{E_{ki}}\)
we assume
\(\{T_i: 1 \le i\}\) is iid, with \(P(T_i = k) = P(E_{ki}) = p_k\), invariant with \(i\)
In a sequence of \(n\) trials, we let \(N_{kn}\) be the number of occurrences of type \(k\). Then
\(N_{kn} = \sum_{i = 1}^{n} I_{E_{ki}}\) with \(\sum_{k = 1}^{m} N_{kn} = n\)
Now each \(N_{kn}\) ~ binomial (\(n, p_k\)). The class \(\{N_{kn}: 1 \le k \le m\}\) cannot be independent, since it sums to \(n\). If the values of \(m - 1\) of them are known, the value of the other is determined. If \(n_1 + n_2 + \cdot\cdot\cdot + n_m = n\), the event
\(\{N_{1n} = n_1, N_{2n} = n_2, \cdot\cdot\cdot, N_{mn} = n_m\}\)
is one of the
\(C(n; n_1, n_2, \cdot\cdot\cdot, n_m) = n!/(n_1! n_2! \cdot\cdot\cdot n_m!)\)
ways of arranging \(n_1\) of the \(E_{1i}\), \(n_2\) of the \(E_{2i}\), \(\cdot\cdot\cdot\), \(n_m\) of the \(E_{mi}\). Each such arrangement has probability \(p_{1}^{n_1} p_{2}^{n_2} \cdot\cdot\cdot p_{m}^{n_m}\), so that
\(P(N_{1n} = n_1, N_{2n} = n_2, \cdot\cdot\cdot, N_{mn} = n_m) = n! \prod_{k = 1}^{m} \dfrac{p_{k}^{n_k}}{n_k !}\)
This set of joint probabilities constitutes the multinomial distribution. For \(m = 2\), and type 1 a success, this is the binomial distribution with parameter \((n, p_1)\).
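As a quick numerical illustration of the multinomial formula (an assumed example, not from the text: \(n = 10\) trials with \(p_1 = 0.4\), \(p_2 = 0.5\), \(p_3 = 0.1\) and counts 4, 5, 1), the joint probability may be computed directly in MATLAB.
p = [0.4 0.5 0.1];                             % type probabilities
nk = [4 5 1];                                  % counts of each type
n = sum(nk);
P = factorial(n)*prod(p.^nk./factorial(nk))    % P(N_1n = 4, N_2n = 5, N_3n = 1), about 0.1008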
A random number of multinomial trials
We consider, in particular, the case of a random number \(N\) of multinomial trials, where \(N\) ~ Poisson \((\mu)\). Let \(N_k\) be the number of results of type \(k\) in a random number \(N\) of multinomial trials.
\(N_k = \sum_{i = 1}^{N} I_{E_{ki}} = \sum_{n = 1}^{\infty} I_{\{N = n\}} N_{kn}\) with \(\sum_{k = 1}^{m} N_k = N\)
Poisson decomposition
Suppose
\(N\) ~ Poisson (\(\mu\))
\(\{T_i: 1 \le i\}\) is iid with \(P(T_i = k) = p_k\), \(1 \le k \le m\)
\(\{N, T_i : 1 \le i\}\) is independent
Then
Each \(N_k\) ~ Poisson (\(\mu p_k\))
\(\{N_k: 1 \le k \le m\}\) is independent.
— □
The usefulness of this remarkable result is enhanced by the fact that the sum of independent Poisson random variables is also Poisson, with \(\mu\) for the sum the sum of the \(\mu_i\) for the variables added. This is readily established with the aid of the generating function. Before verifying the propositions above, we consider some examples.
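A short simulation may also help make the decomposition concrete. The following sketch is not part of the text's procedures; its Poisson sampler by inversion and the parameters \(\mu = 300\), \(p_k = 0.4, 0.5, 0.1\) (which anticipate Example 15.2.1) are assumptions for illustration. The sample means and variances of each \(N_k\) should both be near \(\mu p_k\), and the sample correlations between the counts should be near zero.
mu = 300; p = [0.4 0.5 0.1]; m = length(p); R = 10000;
k = 0:600;
cdf = cumsum(exp(-mu + k*log(mu) - gammaln(k+1)));   % Poisson(mu) cdf, for sampling by inversion
edges = [0 cumsum(p)];
Nk = zeros(R,m);
for r = 1:R
    N = sum(rand > cdf);                             % N ~ Poisson(mu)
    u = rand(1,N);                                   % type of each of the N arrivals
    for j = 1:m
        Nk(r,j) = sum(u > edges(j) & u <= edges(j+1));
    end
end
disp([mean(Nk); var(Nk); mu*p])      % rows: sample means, sample variances, mu*p_k
disp(corrcoef(Nk))                   % off-diagonal entries should be near zero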
Example \(\PageIndex{1}\) A shipping problem
The number \(N\) of orders per day received by a mail order house is Poisson (300). Orders are shipped by next day express, by second day priority, or by regular parcel mail. Suppose 4/10 of the customers want next day express, 5/10 want second day priority, and 1/10 require regular mail. Make the usual assumptions on compound demand. What is the probability that fewer than 150 want next day express? What is the probability that fewer than 300 want one or the other of the two faster deliveries?
Solution
Model as a random number of multinomial trials, with three outcome types: Type 1 is next day express, Type 2 is second day priority, and Type 3 is regular mail, with respective probabilities \(p_1 = 0.4\), \(p_2 = 0.5\), and \(p_3 = 0.1\). Then \(N_1\) ~ Poisson \((0.4 \cdot 300 = 120)\), \(N_2\) ~ Poisson \((0.5 \cdot 300 = 150)\), and \(N_3\) ~ Poisson \((0.1 \cdot 300 = 30)\). Also \(N_1 + N_2\) ~ Poisson (120 + 150 = 270).
P1 = 1 - cpoisson(120,150)
P1 = 0.9954
P12 = 1 - cpoisson(270,300)
P12 = 0.9620
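The procedure cpoisson is from the m-file collection for the text; cpoisson(mu,k) returns \(P(X \ge k)\) for \(X\) ~ Poisson (mu). If it is not available, the same quantities may be obtained from the Poisson pmf in base MATLAB (a minimal stand-in for scalar k; the name cpois is illustrative only):
cpois = @(mu,k) 1 - sum(exp(-mu + (0:k-1)*log(mu) - gammaln(1:k)));  % P(X >= k), X ~ Poisson(mu)
P1 = 1 - cpois(120,150)        % should reproduce 0.9954
P12 = 1 - cpois(270,300)       % should reproduce 0.9620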
Example \(\PageIndex{2}\) Message routing
A junction point in a network has two incoming lines and two outgoing lines. The number of incoming messages \(N_1\) on line one in one hour is Poisson (50); on line 2 the number is \(N_2\) ~ Poisson (45). On incoming line 1 the messages have probability \(p_{1a} = 0.33\) of leaving on outgoing line a and \(1 - p_{1a}\) of leaving on line b. The messages coming in on line 2 have probability \(p_{2a} = 0.47\) of leaving on line a. Under the usual independence assumptions, what is the distribution of outgoing messages on line a? What are the probabilities of at least 30, 35, 40 outgoing messages on line a?
Solution
By the Poisson decomposition, \(N_a\) ~ Poisson \((50 \cdot 0.33 + 45 \cdot 0.47 = 37.65)\).
ma = 50*0.33 + 45*0.47
ma = 37.6500
Pa = cpoisson(ma,30:5:40)
Pa = 0.9119    0.6890    0.3722
VERIFICATION of the Poisson decomposition
\(N_k = \sum_{i = 1}^{N} I_{E_{ki}}\).
This is composite demand with \(Y_k = I_{E_{ki}}\), so that \(g_{Y_k} (s) = q_k + sp_k = 1 + p_k (s - 1)\). Therefore,
\(g_{N_k} (s) = g_N [g_{Y_k} (s)] = e^{\mu [g_{Y_k} (s) - 1]} = e^{\mu p_k (s - 1)}\)
which is the generating function for \(N_k\) ~ Poisson \((\mu p_k)\).
For any \(n_1\), \(n_2\), \(\cdot\cdot\cdot\), \(n_m\), let \(n = n_1 + n_2 + \cdot\cdot\cdot + n_m\), and consider
\(A = \{N_1 = n_1, N_2 = n_2, \cdot\cdot\cdot, N_m = n_m\} = \{N = n\} \cap \{N_{1n} = n_1, N_{2n} = n_2, \cdot\cdot\cdot, N_{mn} = n_m\}\)
Since \(N\) is independent of the class of \(I_{E_{ki}}\), the class
\(\{\{N = n\}, \{N_{1n} = n_1, N_{2n} = n_2, \cdot\cdot\cdot, N_{mn} = n_m\}\}\)
is independent. By the product rule and the multinomial distribution
\(P(A) = e^{-\mu} \dfrac{\mu^n}{n!} \cdot n! \prod_{k = 1}^{m} \dfrac{p_{k}^{n_k}}{n_k !} = \prod_{k = 1}^{m} e^{-\mu p_k} \dfrac{(\mu p_k)^{n_k}}{n_k !} = \prod_{k = 1}^{m} P(N_k = n_k)\)
The second product uses the facts that
\(\mu^n = \prod_{k = 1}^{m} \mu^{n_k}\) and \(e^{-\mu} = e^{-\mu (p_1 + p_2 + \cdot\cdot\cdot + p_m)} = \prod_{k = 1}^{m} e^{-\mu p_k}\)
Thus, the product rule holds for the class \(\{N_k: 1 \le k \le m\}\), so that the class is independent.
Extreme values
Consider an iid class \(\{Y_i: 1 \le i\}\) of nonnegative random variables. For any positive integer \(n\) we let
\(V_n = \text{min } \{Y_1, Y_2, \cdot\cdot\cdot, Y_n\}\) and \(W_n = \text{max } \{Y_1, Y_2, \cdot\cdot\cdot, Y_n\}\)
Then
\(P(V_n > t) = P^n (Y > t)\) and \(P(W_n \le t) = P^n (Y \le t)\)
Now consider a random number \(N\) of the \(Y_i\). The minimum and maximum random variables are
\(V_N = \sum_{n = 0}^{\infty} I_{\{N = n\}} V_n\) and \(W_N = \sum_{n = 0}^{\infty} I_{\{N = n\}} W_n\)
— □
Computational formulas
If we set \(V_0 = W_0 = 0\), then
\(F_V (t) = P(V \le t) = 1 + P(N = 0) - g_N [P(Y > t)]\)
\(F_W (t) = g_N [P(Y \le t)]\)
These results are easily established as follows. \(\{V_N > t\} = \bigvee_{n = 0}^{\infty} \{N = n\} \ \{V_n > t\}\). By additivity and independence of \(\{N, V_n\}\) for each \(n\)
\(P(V_N > t) = \sum_{n = 0}^{\infty} P(N = n) P(V_n > t) = \sum_{n = 1}^{\infty} P(N = n) P^n (Y > t)\), since \(P(V_0 > t) = 0\)
If we add into the last sum the term \(P(N = 0) P^0 (Y > t) = P(N = 0)\) then subtract it, we have
\(P(V_N > t) = \sum_{n = 0}^{\infty} P(N = n) P^n (Y > t) - P(N = 0) = g_N [P(Y > t)] - P(N = 0)\)
A similar argument holds for proposition (b). In this case, we do not have the extra term for \(\{N = 0\}\), since \(P(W_0 \le t) = 1\).
Special case. In some cases, \(N = 0\) does not correspond to an admissible outcome (see Example 15.2.4, below, on the lowest bidder, and Example 15.2.6). In that case
\(F_V (t) = \sum_{n = 1}^{\infty} P(V_n \le t) P(N = n) = \sum_{n = 1}^{\infty} [1 - P^n (Y > t)] P(N = n) = \sum_{n = 1}^{\infty} P(N = n) - \sum_{n = 1}^{\infty} P^n (Y > t) P(N = n)\)
Add \(P(N = 0) = P^0 (Y > t) P(N = 0)\) to each of the sums to get
\(F_V (t) = 1 - \sum_{n = 0}^{\infty} P^n (Y > t) P (N = n) = 1 - g_N [P(Y > t)]\)
— □
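The two computational formulas are easy to evaluate numerically once \(g_N\) and the distribution for \(Y\) are specified. The following sketch is not part of the text's procedures; the choices \(N\) ~ Poisson (20) and \(Y\) ~ exponential (1/3) are assumptions that simply anticipate Example 15.2.3, and the handle names are illustrative only.
gN = @(s) exp(20*(s - 1));          % generating function for N ~ Poisson(20)
SY = @(t) exp(-t/3);                % P(Y > t) for Y ~ exponential(1/3)
PN0 = gN(0);                        % P(N = 0)
FV = @(t) 1 + PN0 - gN(SY(t));      % P(V_N <= t)
FW = @(t) gN(1 - SY(t));            % P(W_N <= t)
disp([FV(6) FW(6)])                 % FW(6) should match the value 0.0668 in Example 15.2.3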
Example \(\PageIndex{3}\) Maximum service time
The number \(N\) of jobs coming into a service center in a week is a random quantity having a Poisson (20) distribution. Suppose the service times (in hours) for individual units are iid, with common distribution exponential (1/3). What is the probability the maximum service time for the units is no greater than 6, 9, 12, 15, 18 hours?
Solution
\(P(W_N \le t) = g_N [P(Y \le t)] = e^{20[F_Y (t) - 1]} = \text{exp} (-20e^{-t/3})\)
t = 6:3:18;
PW = exp(-20*exp(-t/3));
disp([t;PW]')
    6.0000    0.0668
    9.0000    0.3694
   12.0000    0.6933
   15.0000    0.8739
   18.0000    0.9516
Example \(\PageIndex{4}\) Lowest Bidder
A manufacturer seeks bids on a modification of one of his processing units. Twenty contractors are invited to bid. Each bids with probability 0.3, so that the number of bids \(N\) ~ binomial (20, 0.3). Assume the bids \(Y_i\) (in thousands of dollars) form an iid class. The market is such that the bids have a common distribution symmetric triangular on (150, 250). What is the probability of at least one bid no greater than 170, 180, 190, 200, 210? Note that no bid is not a low bid of zero, hence we must use the special case.
Solution
\(P(V \le t) = 1 - g_N [P(Y > t)] = 1 - (0.7 + 0.3p)^{20}\) where \(p = P(Y > t)\)
Solving graphically for \(p = P(Y > t)\), we get
\(p =\) [23/25 41/50 17/25 1/2 8/25] for \(t =\) [170 180 190 200 210]
Now \(g_N (s) = (0.7 + 0.3s)^{20}\). We use MATLAB to obtain
t = [170 180 190 200 210];
p = [23/25 41/50 17/25 1/2 8/25];
PV = 1 - (0.7 + 0.3*p).^20;
disp([t;p;PV]')
  170.0000    0.9200    0.3848
  180.0000    0.8200    0.6705
  190.0000    0.6800    0.8671
  200.0000    0.5000    0.9612
  210.0000    0.3200    0.9896
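The values of \(p\) need not be read graphically. The cdf of the symmetric triangular distribution on (150, 250) has a simple closed form, so \(p = P(Y > t)\) may be computed directly (a sketch in base MATLAB; the variable names are illustrative only):
t = [170 180 190 200 210];
a = 150; b = 250; c = (a + b)/2;                      % endpoints and mode of the triangle
F = zeros(size(t));
lo = t <= c;
F(lo) = (t(lo) - a).^2/((b - a)*(c - a));             % cdf on [a, c]
F(~lo) = 1 - (b - t(~lo)).^2/((b - a)*(b - c));       % cdf on (c, b]
p = 1 - F                                             % reproduces [23/25 41/50 17/25 1/2 8/25]
PV = 1 - (0.7 + 0.3*p).^20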
Example \(\PageIndex{5}\) Example 15.2.4 with a general counting variable
Suppose the number of bids is 1, 2 or 3 with probabilities 0.3, 0.5, 0.2, respectively.
Determine \(P(V \le t)\) in each case.
Solution
The minimum of the selected \(Y\)'s is no greater than \(t\) if and only if there is at least one \(Y\) less than or equal to \(t\). We determine in each case probabilities for the number of bids satisfying \(Y \le t\). For each \(t\), we are interested in the probability of one or more occurrences of the event \(Y \le t\). This is essentially the problem in Example 7 from "Random Selection", with probability \(p = P(Y \le t)\).
t = [170 180 190 200 210];
p = [23/25 41/50 17/25 1/2 8/25];     % Probabilities Y <= t are 1 - p
gN = [0 0.3 0.5 0.2];                 % Zero for missing value
PV = zeros(1,length(t));
for i=1:length(t)
    gY = [p(i),1 - p(i)];
    [d,pd] = gendf(gN,gY);
    PV(i) = (d>0)*pd';                % Selects positions for d > 0 and
end                                   % adds corresponding probabilities
disp([t;PV]')
  170.0000    0.1451
  180.0000    0.3075
  190.0000    0.5019
  200.0000    0.7000
  210.0000    0.8462
Example 15.2.4 may be worked in this manner by using gN = ibinom(20,0.3,0:20). The results, of course, are the same as in the previous solution. The fact that the probabilities in this example are lower for each \(t\) than in Example 15.2.4 reflects the fact that there are probably fewer bids in each case.
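The procedure gendf is part of the m-file collection accompanying the text. If it is not available, the same composite distribution may be built by elementary convolution of coefficients, as in the following minimal stand-in (an illustrative sketch, not the toolbox implementation; the name gendf_sketch is hypothetical):
function [D, PD] = gendf_sketch(gN, gY)
% gN(k+1) = P(N = k) and gY(k+1) = P(Y = k) for k = 0, 1, 2, ...
% Returns values D and probabilities PD for D = Y_1 + ... + Y_N.
    PD = gN(1);                    % N = 0 term puts all its mass at D = 0
    pk = 1;                        % coefficients of gY^0
    for k = 1:length(gN) - 1
        pk = conv(pk, gY);         % coefficients of gY^k
        PD = [PD zeros(1, length(pk) - length(PD))] + gN(k+1)*pk;
    end
    D = 0:length(PD) - 1;
end
For instance, [d,pd] = gendf_sketch([0 0.3 0.5 0.2],[0.92 0.08]) gives the distribution used above for \(t = 170\), with \(P(D > 0) = 0.1451\).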
Example \(\PageIndex{6}\) Batch testing
Electrical units from a production line are first inspected for operability. However, experience indicates that a fraction \(p\) of those passing the initial operability test are defective. All operable units are subsequently tested in a batch under continuous operation (a “burn-in” test). Statistical data indicate the defective units have times to failure \(Y_i\) iid, exponential \((\lambda)\), whereas good units have very long life (infinite from the point of view of the test). A batch of \(n\) units is tested. Let \(V\) be the time of the first failure and \(N\) be the number of defective units in the batch. If the test goes \(t\) units of time with no failure (i.e., \(V > t\)), what is the probability of no defective units?
Solution
Since no defective units implies no failures in any reasonable test time, we have
\(\{N = 0\} \subset \{V > t \}\) so that \(P(N = 0|V > t) = \dfrac{P(N = 0)}{P(V > t)}\)
Since \(N = 0\) does not yield a minimum value, we have \(P(V > t) = g_N [P(Y > t)]\). Now under the condition above, the number of defective units \(N\) ~ binomial (\(n, p\)), so that \(g_N (s) = (q + ps)^n\). If \(N\) is large and \(p\) is reasonably small, \(N\) is approximately Poisson \((np)\) with \(g_N (s) = e^{np (s - 1)}\) and \(P(N = 0) = e^{-np}\). Now \(P(Y > t) = e^{-\lambda t}\); for large \(n\)
\(P(N = 0|V > t) = \dfrac{e^{-np}}{e^{np[P(Y > t) - 1]}} = e^{-np P(Y >t)} = e^{-np e^{-\lambda t}}\)
For \(n = 5000\), \(p = 0.001\), \(\lambda = 2\), and \(t = 1, 2, 3, 4, 5\), MATLAB calculations give
t = 1:5;
n = 5000;
p = 0.001;
lambda = 2;
P = exp(-n*p*exp(-lambda*t));
disp([t;P]')
    1.0000    0.5083
    2.0000    0.9125
    3.0000    0.9877
    4.0000    0.9983
    5.0000    0.9998
It appears that a test of three to five hours should give reliable results. In actually designing the test, one should probably make calculations with a number of different assumptions on the fraction of defective units and the life duration of defective units. These calculations are relatively easy to make with MATLAB.
Bernoulli trials with random execution times or costs
Consider a Bernoulli sequence with probability \(p\) of success on any component trial. Let \(N\) be the number of the trial on which the first success occurs. Let \(Y_i\) be the time (or cost) to execute the \(i\)th trial. Then the total time (or cost) from the beginning to the completion of the first success is
\(T = \sum_{i = 1}^{N} Y_i\) (composite "demand" with \(N - 1\) ~ geometric \((p)\))
We suppose the \(Y_i\) form an iid class, independent of \(N\). Now \(N - 1\) ~ geometric (\(p\)) implies \(g_N (s) = ps/(1 - qs)\), so that
\(M_T (s) = g_N [M_Y (s)] = \dfrac{pM_Y (s)}{1 - qM_Y (s)}\)
There are two useful special cases:
\(Y_i\) ~ exponential \((\lambda)\), so that \(M_Y (s) = \dfrac{\lambda}{\lambda - s}\). Then
\(M_T (s) = \dfrac{p\lambda/(\lambda - s)}{1 - q\lambda/(\lambda - s)} = \dfrac{p\lambda}{p\lambda - s}\)
which implies \(T\) ~ exponential (\(p \lambda\)). (A quick simulation check of this case is sketched below.)
\(Y_i - 1\) ~ geometric \((p_0)\), so that \(g_Y (s) = \dfrac{p_0 s}{1 - q_0 s}\). Then
\(g_T (s) = \dfrac{p p_0 s/(1 - q_0 s)}{1 - q p_0 s/(1 - q_0 s)} = \dfrac{p p_0 s}{1 - (1 - p p_0) s}\)
so that \(T - 1\) ~ geometric \((pp_0)\).
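A Monte Carlo check of the first special case (an illustrative sketch; the values \(p = 0.2\), \(\lambda = 3\) are assumptions chosen to match Example 15.2.7 below):
p = 0.2; lambda = 3; R = 100000;
N = ceil(log(rand(R,1))/log(1 - p));          % N - 1 ~ geometric(p), so N = 1, 2, ...
T = zeros(R,1);
for r = 1:R
    T(r) = sum(-log(rand(N(r),1))/lambda);    % sum of N iid exponential(lambda) times
end
disp([mean(T) 1/(p*lambda)])                  % both should be near 1/0.6 = 1.6667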
Example \(\PageIndex{7}\) Job interviews
Suppose a prospective employer is interviewing candidates for a job from a pool in which twenty percent are qualified. Interview times (in hours) \(Y_i\) are presumed to form an iid class, each exponential (3). Thus, the average interview time is 1/3 hour (twenty minutes). We take the probability for success on any interview to be \(p = 0.2\). What is the probability a satisfactory candidate will be found in four hours or less? What is the probability the maximum interview time will be no greater than 0.5, 0.75, 1, 1.25, 1.5 hours?
Solution
\(T\) ~ exponential (\(0.2 \cdot 3 = 0.6\)), so that \(P(T \le 4) = 1 - e^{-0.6 \cdot 4} = 0.9093\).
\(P(W \le t) = g_N [P(Y \le t)] = \dfrac{0.2 (1 - e^{-3t})}{1 - 0.8 (1 - e^{-3t})} = \dfrac{1 - e^{-3t}}{1 + 4e^{-3t}}\)
MATLAB computations give
t = 0.5:0.25:1.5;
PWt = (1 - exp(-3*t))./(1 + 4*exp(-3*t));
disp([t;PWt]')
    0.5000    0.4105
    0.7500    0.6293
    1.0000    0.7924
    1.2500    0.8925
    1.5000    0.9468
The average interview time is 1/3 hour; with probability 0.63 the maximum is 3/4 hour or less; with probability 0.79 the maximum is one hour or less; etc.
In the general case, solving for the distribution of \(T\) requires transform theory, and may be handled best by a program such as Maple or Mathematica.
For the case of simple \(Y_i\) we may use approximation procedures based on properties of the geometric series. Since \(N - 1\) ~ geometric \((p)\),
\(g_N (s) = \dfrac{ps}{1 - qs} = ps \sum_{k = 0}^{\infty} (qs)^k = ps [\sum_{k = 0}^{n} (qs)^k + \sum_{k = n + 1}^{\infty} (qs)^k] = ps[\sum_{k = 0}^{n} (qs)^k + (qs)^{n + 1} \sum_{k = 0}^{\infty} (qs)^k]\)
\(= ps \sum_{k = 0}^{n} (qs)^k + (qs)^{n + 1} g_N (s) = g_n (s) + (qs)^{n + 1} g_N (s)\)
Note that \(g_n (s)\) has the form of the generating function for a simple approximation \(N_n\) which matches values and probabilities with \(N\) up to \(k = n\). Now
\(g_T (s) = g_n[g_Y (s)] + (qs)^{n + 1} g_N [g_Y (s)]\)
The evaluation involves convolution of coefficients which effectively sets \(s = 1\). Since \(g_N (1) = g_Y (1) = 1\),
\((qs)^{n + 1} g_N [g_Y (s)]\) for \(s = 1\) reduces to \(q^{n + 1} = P(N > n)\)
which is negligible if \(n\) is large enough. Suitable \(n\) may be determined in each case. With such an \(n\), if the \(Y_i\) are nonnegative, integer-valued, we may use the gend procedure on \(g_n [g_Y (s)]\), where
\(g_n (s) = ps + pqs^2 + pq^2s^3 + \cdot\cdot\cdot + pq^n s^{n + 1}\)
For the integer-valued case, as in the general case of simple \(Y_i\), we could use mgd. However, gend is usually faster and more efficient for the integer-valued case. Unless \(q\) is small, the number of terms needed to approximate \(g_n\) is likely to be too great.
Example \(\PageIndex{8}\) Approximating the generating function
Let \(p = 0.3\) and \(Y\) be uniformly distributed on \(\{1, 2, \cdot\cdot\cdot, 10\}\). Determine the distribution for
\(T = \sum_{k = 1}^{N} Y_k\)
Solution
p = 0.3;
q = 1 - p;
a = [30 35 40];                % Check for suitable n
b = q.^a
b = 1.0e-04 *                  % Use n = 40
    0.2254    0.0379    0.0064
n = 40;
k = 1:n;
gY = 0.1*[0 ones(1,10)];
gN = p*[0 q.^(k-1)];           % Probabilities, 0 <= k <= 40
gend
Do not forget zero coefficients for missing powers
Enter gen fn COEFFICIENTS for gN  gN
Enter gen fn COEFFICIENTS for gY  gY
Values are in row matrix D; probabilities are in PD.
To view the distribution, call for gD.
sum(PD)                        % Check sum of probabilities
ans = 1.0000
FD = cumsum(PD);               % Distribution function for D
plot(0:100,FD(1:101))          % See Figure 15.2.1
P50 = (D<=50)*PD'
P50 = 0.9497
P30 = (D<=30)*PD'
P30 = 0.8263
Figure 15.2.1. Execution Time Distribution Function \(F_D\).
The same results may be achieved with mgd, although at the cost of more computing time. In that case, use \(gN\) as in Example 15.2.8, but use the actual distribution for \(Y\).
Arrival times and counting processes
Suppose we have phenomena which take place at discrete instants of time, separated by random waiting or interarrival times. These may be arrivals of customers in a store, of noise pulses on a communications line, vehicles passing a position on a road, the failures of a system, etc. We refer to these occurrences as arrivals and designate the times of occurrence as arrival times. A stream of arrivals may be described in three equivalent ways.
- Arrival times: \(\{S_n: 0 \le n\}\), with \(0 = S_0 < S_1 < \cdot\cdot\cdot\) a.s. (basic sequence)
- Interarrival times: \(\{W_i: 1 \le i\}\), with each \(W_i > 0\) a.s. (incremental sequence)
The strict inequalities imply that with probability one there are no simultaneous arrivals. The relations between the two sequences are simply
\(S_0 = 0\), \(S_n = \sum_{i = 1}^{n} W_i\) and \(W_n = S_n - S_{n - 1}\) for all \(n \ge 1\)
The formulation indicates the essential equivalence of the problem with that of the compound demand. The notation and terminology are changed to correspond to that customarily used in the treatment of arrival and counting processes.
The stream of arrivals may be described in a third way.
- Counting processes: \(N_t = N(t)\) is the number of arrivals in time period \((0, t]\). It should be clear that this is a random quantity for each nonnegative \(t\). For a given \(t, \omega\) the value is \(N (t, \omega)\). Such a family of random variables constitutes a random process. In this case the random process is a counting process.
We thus have three equivalent descriptions for the stream of arrivals.
\(\{S_n: 0 \le n\}\) \(\{W_n: 1 \le n\}\) \(\{N_t: 0 \le t\}\)
Several properties of the counting process \(N\) should be noted:
\(N(t + h) - N(t)\) counts the arrivals in the interval \((t, t + h]\), \(h > 0\), so that \(N(t + h) \ge N(t)\) for \(h > 0\).
\(N_0 = 0\) and for \(t >0\) we have
\(N_t = \sum_{i = 1}^{\infty} I_{(0, t]} (S_i) = \text{max } \{n: S_n \le t\} = \text{min } \{n: S_{n + 1} > t\}\)
For any given \(\omega\), \(N(\cdot, \omega)\) is a nondecreasing, right-continuous, integer-valued function defined on \([0, \infty)\), with \(N(0, \omega) = 0\).
The essential relationships between the three ways of describing the stream of arrivals are displayed in
\(W_n = S_n - S_{n - 1}\), \(\{N_t \ge n\} = \{S_n \le t\}\), \(\{N_t = n\} = \{S_n \le t < S_{n + 1}\}\)
This implies
\(P(N_t = n) = P(S_n \le t) - P(S_{n + 1} \le t) = P(S_{n + 1} > t) - P(S_n > t)\)
Although there are many possibilities for the interarrival time distributions, we assume
\(\{W_i: 1 \le i\}\) is iid, with \(W_i > 0\) a.s.
Under such assumptions, the counting process is often referred to as a renewal process and the interarrival times are called renewal times. In the literature on renewal processes, it is common for the random variable to count an arrival at \(t = 0\). This requires an adjustment of the expressions relating \(N_t\) and the \(S_i\). We use the convention above.
Exponential iid interarrival times
The case of exponential interarrival times is natural in many applications and leads to important mathematical results. We utilize the following propositions about the arrival times \(S_n\), the interarrival times \(W_i\), and the counting process \(N\).
If \(\{W_i: 1 \le i\}\) is iid exponential (\(\lambda\)), then \(S_n\) ~ gamma \((n, \lambda)\) for all \(n \ge 1\). This is worked out in the unit on TRANSFORM METHODS, in the discussion of the connection between the gamma distribution and the exponential distribution.
\(S_n\) ~ gamma \((n, \lambda)\) for all \(n \ge 1\), and \(S_0 = 0\), iff \(N_t\) ~ Poisson \((\lambda t)\) for all \(t > 0\). This follows from the result in the unit DISTRIBUTION APPROXIMATIONS on the relationship between the Poisson and gamma distributions, along with the fact that \(\{N_t \ge n\} = \{S_n \le t\}\).
Remark. The counting process is a Poisson process in the sense that \(N_t\) ~ Poisson (\(\lambda t\)) for all \(t > 0\). More advanced treatments show that the process has independent, stationary increments. That is
\(N(t + h) - N(t)\) is distributed as \(N(h)\) for all \(t, h > 0\), and
For \(t_1 < t_2 \le t_3 < t_4 \le \cdot\cdot\cdot \le t_{m - 1} < t_m\), the class \(\{N(t_2) - N(t_1), N(t_4) - N(t_3), \cdot\cdot\cdot, N(t_m) - N(t_{m -1})\}\) is independent.
In words, the number of arrivals in any time interval depends upon the length of the interval and not its location in time, and the numbers of arrivals in nonoverlapping time intervals are independent.
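The equivalence in the second proposition above is easy to check numerically. The following sketch is an illustration only; the values \(\lambda = 15\), \(t = 8\), \(n = 101\) are assumptions that anticipate Example 15.2.9 below. It compares \(P(S_n \le t)\) computed from the gamma distribution with \(P(N_t \ge n)\) computed from the Poisson distribution.
lambda = 15; t = 8; n = 101;
Pgamma = gammainc(lambda*t, n);       % P(S_n <= t) for S_n ~ gamma(n, lambda)
j = 0:n-1;
Ppois = 1 - sum(exp(-lambda*t + j*log(lambda*t) - gammaln(j+1)));  % P(N_t >= n), N_t ~ Poisson(lambda*t)
disp([Pgamma Ppois])                  % the two agree (about 1 - 0.0347 = 0.9653)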
Example \(\PageIndex{9}\) Emergency calls
Emergency calls arrive at a police switchboard with interarrival times (in hours) exponential (15). Thus, the average interarrival time is 1/15 hour (four minutes). What is the probability the number of calls in an eight hour shift is no more than 100, 120, 140?
p = 1 - cpoisson(8*15,[101 121 141])
p = 0.0347    0.5243    0.9669
We develop next a simple computational result for arrival processes for which \(S_n\) ~ gamma \((n, \lambda)\).
Example \(\PageIndex{10}\) Gamma arrival times
Suppose the arrival times \(S_n\) ~ gamma (\(n, \lambda\)) and \(g\) is such that
\(\int_{0}^{\infty} |g| < \infty\) and \(E[\sum_{n = 1}^{\infty} |g(S_n)|] < \infty\)
Then
\(E[\sum_{n = 1}^{\infty} g(S_n)] = \lambda \int_{0}^{\infty} g\)
VERIFICATION
We use the countable sums property (E8b) for expectation and the corresponding property for integrals to assert
\(E[\sum_{n = 1}^{\infty} g(S_n)] = \sum_{n = 1}^{\infty} E[g(S_n)] = \sum_{n = 1}^{\infty} \int_{0}^{\infty} g(t) f_n (t)\ dt\) where \(f_n (t) = \dfrac{\lambda e^{-\lambda t} (\lambda t)^{n - 1}}{(n - 1)!}\)
We may apply (E8b) to assert
\(\sum_{n = 1}^{\infty} \int_{0}^{\infty} gf_n = \int_{0}^{\infty} g \sum_{n = 1}^{\infty} f_n\)
Since
\(\sum_{n = 1}^{\infty} f_n (t) = \lambda e^{-\lambda t} \sum_{n = 1}^{\infty} \dfrac{(\lambda t)^{n - 1}}{(n - 1)!} = \lambda e^{-\lambda t} e^{\lambda t} = \lambda\)
the proposition is established.
Example \(\PageIndex{11}\) Discounted replacement costs
A critical unit in a production system has life duration exponential \((\lambda)\). Upon failure the unit is replaced immediately by a similar unit. Units fail independently. Cost of replacement of a unit is \(c\) dollars. If money is discounted at a rate \(\alpha\), then a dollar spent \(t\) units of time in the future has a current value \(e^{-\alpha t}\). If \(S_n\) is the time of replacement of the \(n\)th unit, then \(S_n\) ~ gamma \((n, \lambda)\) and the present value of all future replacements is
\(C = \sum_{n = 1}^{\infty} ce^{-\alpha S_n}\)
The expected replacement cost is
\(E[C] = E[\sum_{n =1}^{\infty} g(S_n)]\) where \(g(t) = ce^{-\alpha t}\)
Hence
\(E[C] = \lambda \int_{0}^{\infty} ce^{-\alpha t} \ dt = \dfrac{\lambda c}{\alpha}\)
Suppose unit replacement cost \(c = 1200\), average time (in years) to failure \(1/\lambda = 1/4\), and the discount rate per year \(\alpha = 0.08\) (eight percent). Then
\(E[C] = \dfrac{1200 \cdot 4}{0.08} = 60,000\)
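A brief simulation provides a check on this value (an illustrative sketch, not part of the text; the replication count and the truncation at 1000 replacements are assumptions chosen so that the discounted tail is negligible):
lambda = 4; c = 1200; alpha = 0.08; R = 10000; nmax = 1000;
EC = 0;
for r = 1:R
    S = cumsum(-log(rand(nmax,1))/lambda);   % replacement times S_n ~ gamma(n, lambda)
    EC = EC + c*sum(exp(-alpha*S))/R;
end
disp([EC lambda*c/alpha])                    % both should be near 60000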
Example \(\PageIndex{12}\) Random costs
Suppose the cost of the \(n\)th replacement in Example 15.2.11 is a random quantity \(C_n\), with \(\{C_n, S_n\}\) independent and \(E[C_n] = c\), invariant with \(n\). Then
\(E[C] = E[\sum_{n = 1}^{\infty} C_n e^{-\alpha S_n}] = \sum_{n = 1}^{\infty} E[C_n] E[e^{-\alpha S_n}] = \sum_{n = 1}^{\infty} cE[e^{-\alpha S_n}] = \dfrac{\lambda c}{\alpha}\)
The analysis to this point assumes the process will continue endlessly into the future. Often, it is desirable to plan for a specific, finite period. The result of Example 15.2.10 may be modified easily to account for a finite period, often referred to as a finite horizon.
Example \(\PageIndex{13}\) Finite horizon
Under the conditions assumed in Example 15.2.10, above, let \(N_t\) be the counting random variable for arrivals in the interval \((0, t]\).
If \(Z_t = \sum_{n = 1}^{N_t} g(S_n)\), then \(E[Z_t] = \lambda \int_{0}^{t} g(u)\ du\)
VERIFICATION
Since \(N_t \ge n\) iff \(S_n \le t\), we have \(\sum_{n = 1}^{N_t} g(S_n) = \sum_{n = 0}^{\infty} I_{(0, t]} (S_n) g(S_n)\). In the result of Example 15.2.10, replace \(g\) by \(I_{(0, t]} g\) and note that
\(\int_{0}^{\infty} I_{(0, t]} (u) g(u)\ du = \int_{0}^{t} g(u)\ du\)
Example \(\PageIndex{14}\) Replacement costs, finite horizon
Under the condition of Example 15.2.11, consider the replacement costs over a two-year period.
Solution
\(E[C] = \lambda c\int_{0}^{t} e^{-\alpha u} \ du = \dfrac{\lambda c}{\alpha} (1 - e^{-\alpha t})\)
Thus, the expected cost for the infinite horizon \(\lambda c/ \alpha\) is reduced by the factor \(1 - e^{-\alpha t}\). For \(t = 2\) and the numbers in Example 15.2.11, the reduction factor is \(1 - e^{-0.16} = 0.1479\) to give \(E[C] = 60000 \cdot 0.1479 = 8871.37\).
In the important special case that \(g(u) = ce^{-\alpha u}\), the expression for \(E[\sum_{n = 1}^{\infty} g(S_n)]\) may be put into a form which does not require the interarrival times to be exponential.
Example \(\PageIndex{15}\) General interarrival, exponential g
Suppose \(S_0 = 0\) and \(S_n = \sum_{i = 1}^{n} W_i\), where \(\{W_i: 1 \le i\}\) is iid. Let \(\{V_n: 1 \le n\}\) be a class such that each \(E[V_n] = c\) and each pair \(\{V_n ,S_n\}\) is independent. Then for \(\alpha > 0\)
\(E[C] = E[\sum_{n = 1}^{\infty} V_n e^{-\alpha S_n}] = c \cdot \dfrac{M_W (-\alpha)}{1 - M_W (-\alpha)}\)
where \(M_W\) is the moment generating function for \(W\).
DERIVATION
First we note that
\(E[V_n e^{-\alpha S_n}] = cM_{S_n} (-\alpha) = cM_W^n (-\alpha)\)
Hence, by properties of expectation and the geometric series
\(E[C] = c \sum_{n =1}^{\infty} M_W^n (- \alpha) = c \cdot \dfrac{M_W (-\alpha)}{1 - M_W (-\alpha)}\), provided \(|M_W (-\alpha)| < 1\)
Since \(\alpha > 0\) and \(W > 0\), we have \(0 < e^{-\alpha W} < 1\), so that \(M_W (-\alpha) = E[e^{-\alpha W}] < 1\)
Example \(\PageIndex{16}\) Uniformly distributed interarrival times
Suppose each \(W_i\) ~ uniform \((a, b)\). Then (see Appendix C),
\(M_W (-\alpha) = \dfrac{e^{-a \alpha} - e^{-b \alpha}}{\alpha (b - a)}\) so that \(E[C] = c \cdot \dfrac{e^{-a \alpha} - e^{-b \alpha}}{\alpha (b - a) - [e^{-a \alpha} - e^{-b \alpha}]}\)
Let \(a = 1\), \(b = 5\), \(c = 100\) and \(\alpha = 0.08\). Then,
a = 1;
b = 5;
c = 100;
A = 0.08;
MW = (exp(-a*A) - exp(-b*A))/(A*(b - a))
MW = 0.7900
EC = c*MW/(1 - MW)
EC = 376.1643