4 credits, MTWF 01:15PM - 02:05PM REISS 262 (MWF), Reiss 284 (T)
You can reach me by phone (or voice-mail) at 7-2703 or by e-mail kainen at georgetown.edu or drop by Reiss 258 during my office hours.
**** Last updated: May 4, 2005 ****
There is a new page on the final for our course, including information on the reviews scheduled for May 8, 9, and 10 in Reiss 262, 9:30 to 10:20 pm. Bus surveys will still be accepted up to Friday, May 6, 8 pm.
For info on the projects, please see the projects page. All I am expecting from you is a group presentation on the pills data. The bus survey only asks you to fill out forms and can be done in shorter breaks - not necessarily 2 hours - while you take a break or have lunch. Friday, April 29, 8 pm is the deadline for the bus survey. For the buses, you only need to write the time and direction, e.g., 9:45 U, 9:50 D, 10:03 D, etc., as well as the bus line and date.
The midterm on Wed. April 27 will cover chapters 4 to 8, and 27, as they have been used in class. For instance, we didn't spend time on the notion of ``stem and leaf diagram'' in Chap. 4 so it won't be on the midterm. Chapter 8 is the key chapter, which uses material from the earlier chapters such as z-score, correlation, and scatterplot. Chapter 27 unifies the ideas of chapter 8 with the previously studied concepts of hypothesis testing and interval estimates. We will discuss this in class on Monday; see below for some selected answers.
Prioritizing: Since the midterm is Wed. while we have Fri. and Monday to report on class projects, study for the midterm first. Don't forget that we will have three reviews on the evenings of May 8-10 to prepare for the final.
For info on results to date - which do not show a link between cell phone use and brain cancer, but which also point out that long-term studies of heavy users have not yet been done - see an NIH news report with further pointers to related areas. Just in case you forget how relevant statistics can be!
For Chapter 27, here are some selected answers:

#2 (b) H_0: there is no linear relationship between marijuana use and the use of other drugs (beta_1 = 0) vs. H_A: beta_1 not= 0. (c) The t-value is the ratio of b_1 to SE(b_1) (the book calls this a t-ratio); according to the printout given with the problem, b_1 = .615003 and SE(b_1) = .0784, and the ratio is 7.85, which corresponds to a P-value of .0001 according to the table. Hence, reject H_0: the percentage of teenagers using other drugs is positively correlated with the percentage using marijuana. (d) The R-squared value means that 87.3 percent of the variance of y is accounted for by the variance of x together with linearity. (e) No causation is proved, since lurking variables could exist.

#4 The hypotheses are H_0: beta_1 = 0 vs. H_A: beta_1 > 0. Calculate the t-value with 292 degrees of freedom, which is approximately a z-value. Looking up 1.23 in the z-table, we see the probability that a z-variable exceeds this value is 1 - .8907 = .1093 (with the use of a calculator or computer, the exact value for the t-distribution is actually .1103). In either case, such a large P-value means there is a sizeable chance that such a result could occur by chance alone, so do _not_ reject H_0.

#6 Try the same problem with three data points for ages 1, 5, 11, which is much easier to carry out but gives you some of the same steps to practice - except of course for (b) and (d), which won't tell you much for three data pairs. Do you know why? For the full data set, the linear model turns out to look OK, with predicted advertised price 12,319.59 - 924.00 x age. But the plot of residuals against predicted values shows possible curvature, and the histogram of residuals doesn't look like a normal distribution, so inference may not be valid - but proceed anyway. Based on the data, the 95%-CI for the yearly change in used-car value is (-1099.4, -748.6).

#8 (a) The scatterplot looks linear with positive slope.
(b) The assumptions are OK since (except for one outlier) the plot of residuals against predicted values has no clear pattern, and the histogram of residuals is roughly symmetric and unimodal, so not too far from normal.
When is it appropriate to use linear regression to model some relationship between x and y data? Be ready to use plots similar to those in our text to make reasonable inferences on suitability.
Know how to use the formulas in order to carry out the estimates described above. For instance, b_1 (+/-) (t* X SE(b_1)) gives the 100(1-alpha)%-confidence interval for beta_1, where t* means the t-value with n-2 df satisfying P(t > t*) = alpha/2, and X means multiplication. Remember that beta_1 is the slope of the line of best fit relating the population variables x and y.
This is just like the corresponding one-sample t-interval estimate for mu_y, the mean of a population y, based on SE(y-bar), given on p. 458 of Chapter 23. The general pattern is that the interval estimate for some population parameter is the point estimator plus or minus t* times the standard error. But the t* there has n-1 degrees of freedom, while for the estimate of beta_1 there are n-2 df.
Read chapter 27 for next week. I already put the problems for Chapter 27 on the board on Friday: #2,4,6,8,10,14,26.
For special homework due Wed. April 13 you did the following problem:
Given data (x,y) = (2,2), (3,2), (4,4) - that is, the x-sample is (2,3,4), i.e., x_1 = 2, x_2 = 3, x_3 = 4, and the corresponding y-sample is (2,2,4) - find the value of r and the corresponding line of best fit (i.e., the values b_1 and b_0 given in the book, which are the slope and y-intercept of the line). You should also calculate the value (SS_yy - SSE)/SS_yy and see that it actually is equal to r^2. In class, we checked that x-bar is 3, SS_xx = 2, and s_x = 1. So you can start there and then do the others.
HW for Friday April 15: Chap. 8: #8,10,18,22,30. Also, skim Chap. 6 (already mentioned in class) and Chap. 9 and Chap. 10.
For Friday, Apr. 8, homework to be collected (previously announced in class) Chap 5: #4,8,10,12,14,28,30; Ch. 7: #4,6,14,20,24,32.
The correlation is

          SS_xy
r = -------------------
    sqrt(SS_xx * SS_yy)

where SS_xy = sum_i x_i y_i - (1/n)*(sum_i x_i)(sum_i y_i) and SS_xx = sum_i (x_i - x-bar)^2 = sum_i (x_i)^2 - (1/n)*(sum_i x_i)^2, and similarly for SS_yy. These give the same values as the formulas in our text on p. 121, but they are somewhat faster for computation.

The line of best fit for some given data is the line which minimizes SSE, the sum of the squared errors, where the error of a data point (i.e., a pair (x_i, y_i)) is defined to be the difference between y_i and the value f(x_i) given by the function corresponding to the line. For the line of best fit, we denote this by SSE = sum_i (y_i - y_i-hat)^2. I noted in class that SS_yy is the sum of the squared errors with respect to a horizontal line (with height y-bar) and so, by the minimum property of SSE, SSE is not bigger than SS_yy. Hence,

     SS_yy - SSE
0 <= ----------- <= 1
        SS_yy

and in fact this value is equal to r^2, so r must be between -1 and +1 as we said.

Once you have r, the slope b_1 of the line of best fit is given by

      r * s_y
b_1 = -------
        s_x

and b_0 = y-bar - b_1 * x-bar, where x-bar and y-bar are the sample means and s_x and s_y are the sample standard deviations. You may also want to make use of the similar shortcut formula for the sample variance of the x_i:

s^2 = (1/(n-1)) * SS_xx.
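As a check, these shortcut formulas can be tried out in Python on the special-homework data (2,2), (3,2), (4,4) given above; the function name fit_line is my own choice, not from the text:

```python
from math import sqrt

def fit_line(xs, ys):
    """Least-squares fit using the shortcut sums SS_xy, SS_xx, SS_yy."""
    n = len(xs)
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    ss_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    ss_yy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    r = ss_xy / sqrt(ss_xx * ss_yy)
    b1 = ss_xy / ss_xx                       # slope (equals r * s_y / s_x)
    b0 = sum(ys) / n - b1 * sum(xs) / n      # intercept: y-bar - b1 * x-bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    return r, b0, b1, (ss_yy - sse) / ss_yy  # last value should equal r^2

# Special-homework data: x-sample (2,3,4), y-sample (2,2,4)
r, b0, b1, r2_check = fit_line([2, 3, 4], [2, 2, 4])
```

For this data the fit is y-hat = -1/3 + 1*x, and (SS_yy - SSE)/SS_yy comes out to 3/4, matching r^2.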
We have covered chapters 18,19,20,21,23 (as on the problems below). I'll add some selected answers for a couple of the later problems and we'll go over the enclosed list of review topics on Tuesday.
For the midterm on March 30, be ready to do the following:
The answer depends on what you know about the population. If the pop'n is normal, then n > 1 is the only restriction on sample size; if the SD of the pop'n is known, use x-bar plus or minus ME, where ME = z_(alpha/2) * sigma/sqrt(n). If the SD isn't known, replace sigma by s and z_(alpha/2) by t_n-1,(alpha/2) - the t-distribution with n-1 degrees of freedom (df). If the pop'n isn't known to be normal (or nearly normal), you can use the z-based interval provided that n is large enough (by the CLT). (There are other conditions, too, which you should look up.)
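The known-sigma case above can be sketched in Python using only the standard library; the sample values (x-bar = 100, sigma = 15, n = 36) are made up for illustration:

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, conf=0.95):
    """CI for mu when the population SD sigma is known:
    xbar +/- z_(alpha/2) * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + conf / 2)  # e.g. about 1.96 for conf = 0.95
    me = z * sigma / sqrt(n)                  # the margin of error
    return xbar - me, xbar + me

# Hypothetical sample: x-bar = 100, known sigma = 15, n = 36
lo, hi = z_interval(100, 15, 36)              # roughly (95.1, 104.9)
```

When sigma is unknown, the same pattern applies with s in place of sigma and a t-value (n-1 df) in place of z, as described above.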
Since 1.96 is z_(.025) and the test is 2-tailed, the probability is .05. If you were comparing H_0 against H'_A: mu > mu_0 and x-bar < mu_0, would you ever reject H_0?
No. When H_0 is compared against H'_A as an alternative, H_0 is the hypothesis that mu is less than or equal to mu_0. Hence, even a very small value for x-bar is not evidence against H_0.
You would prefer the test with the least probability of a type-II error.
Increasing alpha decreases beta and vice versa. You can decrease both alpha and beta by increasing the sample size.
Just as above, the center of the CI is x-bar, which is the sample proportion p-hat. The ME is z_(alpha/2) sqrt(p-hat * q-hat)/sqrt(n), where q-hat means 1 - p-hat.
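Here is a minimal Python sketch of this proportion interval; the counts (120 Yes out of 400) are hypothetical numbers, not from any assigned problem:

```python
from math import sqrt
from statistics import NormalDist

def prop_interval(successes, n, conf=0.95):
    """CI for a population proportion: p-hat +/- z_(alpha/2) * sqrt(p-hat*q-hat/n)."""
    phat = successes / n
    qhat = 1 - phat
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    me = z * sqrt(phat * qhat / n)            # margin of error
    return phat - me, phat + me

# Hypothetical survey: 120 Yes responses out of 400
lo, hi = prop_interval(120, 400)              # roughly (.255, .345)
```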
E(X-bar) = mu; E(s) = sigma
For background information on class policies and procedures, please see the 040 background page. This includes information on grading and what I expect for homework and quizzes. Now added: descriptions of project topics and organization. Also now added: some notes on theory in mathematical notation.
Also see my index page for information on office hours and other topics.
The grader for our class is Bob Geng who is a graduate student in economics. You can reach him at bg25@georgetown.edu and you can try the Math Assistance Ctr. (MAC) which meets in Reiss 256 every evening (Sun. - Thurs.) from 6 to 9 pm.
For our first class project, we are doing a survey of the people you come in contact with over the Easter vacation to ask "Have you or have you not taken any prescription medication in the last four weeks?" The survey questionnaire is on the web (in pdf format for easy printing) at prescription meds survey . Note that to save paper the form has three copies of the survey question so you can print out copies and cut the pages yourself. Alternatively, I'll be passing out these copies in class today and later this week.
Please note: When you collect this information, you should just provide the blank forms (1/3rd of a page each) and a pen or pencil, together with a large manila envelope, to your respondents. The idea is to preserve their anonymity completely. No name or other identifying information is required - or permitted! I'll also have some envelopes for you in class if you need them.
For homework due on Friday, 18 March, Chap. 21: #12,14,16,20. (12a one-tailed - we only want to determine whether it is _more_ visible; 12d the probability of detecting that a more visible stop sign works; 14d 10 percent gives more chance to reject H_0; 14e lower - a bigger n gives more power, so a lower type II error probability; 16b there is a 9.23 percent chance of having 133 or more of 600 people remember the ad if the true proportion is 20 percent - assuming a random sample.)
Chap. 23 (note that we're skipping Chap. 22 for a while): #10,12,18,20,24. 10b ($122.20, $129.80); 10e No - the interval is below $130; 12a more chance the interval contains the true mean; 12d 99 days; 18a two-sided: too big means catheters won't fit in veins, too small means they may not work properly; 18c catheters that don't meet specs are allowed to be produced and sold; 20a increase (since alpha decreases - assuming no change in sample size); 20d increase alpha or increase the sample size; 24b 28.98 = sample avg, s = .36; 24c (28.61, 29.36) grams; 24e the company is putting in more than the stated amount of 28.3 grams. (If the null hypothesis were that the amount per bag is at most 28.3 grams, what is the approximate probability of getting the sample average and SD given above?) I'll go over some problems from these chapters tomorrow in class and may also discuss Ch. 21 #10 and Ch. 23 #8,26. Previously, for Wed. the 16th, the problem below labeled "Assignment" was written up and handed in.
The probability problem I described today on the board will be briefly discussed on Wed. and I hope that a few of you will figure it out. The problem, in case you missed it, is to find the probability p that a fair coin which has come up TH will have two consecutive Heads before it has two consecutive Tails. The hint I gave in class is to make a "tree diagram" but you need one more idea to solve the problem. As discussed in class, symmetry tells you that p = (1/2) + (1/2)(1-p) and solving for p you get p=2/3. Now try this for a similar problem with the same initial condition but where you now want to find the probability p' of getting HHH before getting TTT (given you start with TH).
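The two-state reasoning behind p = 2/3 can be written out exactly with Python's fractions module; the only idea needed beyond the tree diagram is that the state is just the most recent flip:

```python
from fractions import Fraction

half = Fraction(1, 2)
# State = the most recent flip. Starting from ...TH, the last flip is H.
#   p_H = 1/2 * 1 + 1/2 * p_T   (H next gives HH and wins; T next moves to state T)
#   p_T = 1/2 * 0 + 1/2 * p_H   (T next gives TT and loses; H next moves to state H)
# Substituting the second equation into the first:
#   p_H = 1/2 + 1/4 * p_H,  so  p_H = (1/2) / (3/4) = 2/3.
p_H = half / (1 - half * half)
p_T = half * p_H
```

The same state idea (now tracking the last two flips) is what you need for the HHH-before-TTT variant.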
Note that in Chap. 23, we'll skip (for now) the section on "the sign test".
The midterm on March 30 will require you to use the tables I gave out in class. You'll need to be able to (1) recognize whether a statistical inference requires the z-distribution or the t-distribution, (2) determine the sample size necessary to answer certain questions with 95 percent confidence, (3) differentiate between one-tail and two-tail tests, (4) describe Type I and Type II errors, and (5) be able to give the P-value for test statistics. This material is covered in Chap. 18,19,20 which we've already covered, Chap. 21, and Chap. 23. Later, we will review the essentially similar methods in Chap. 22,24,25, and 26, which deal with ways to compare two estimates for a population parameter.
I will allow calculators for the midterm - but only the simple graphing calculator type is permitted. Problems will require you to show your work and in some cases you will only need to know what to do rather than to actually do it.
Following the midterm, we will return to some earlier chapters on how to display statistical data. We'll also cover the key notions of covariance, correlation, and linear regression, as well as briefly considering the issues of ``goodness of fit'' and analysis of variance.
In addition, we are going to consider several different projects in this latter third of the course. The small-group meetings will occur on Monday and Tuesday evenings, on April 4,5,11,12,25,26 in the conference room in Reiss 256 from 7:15 to 8:30 or 9 pm. Each of the small groups will have a given time and day. I know that a few of you have time conflicts so I'll find another time that you can attend - should that be necessary.
The first project, "Pills", has to do with the frequency with which children are being prescribed psychoactive medication, such as Ritalin (a stimulant which seems to calm ADHD sufferers), and various anti-depressants, generically called SSRIs due to an effect on a neurotransmitter. The hypothesis H_A to be studied is that the frequency p_1 for children from the same socioeconomic group as the typical Georgetown student differs significantly from the national average, which appears to be approximately 20 percent. All students will be asked to fill out questionnaires for as many children as they can during the Easter holiday.
The questions to be asked are: Q1 Have you taken any prescription medications within the last 30 days to treat a mental, emotional, or behavioral condition? Q2 Have you taken any prescription medication at all during the last 30 days? The second question Q2 is to be asked first (since it seems to be value-neutral), followed by question Q1 regarding psychoactive medication. You should only interview people between the ages of 6 and 20 years, inclusive. Printed sheets will be provided to you, one for each person you interview. While this technique is not random across the entire US population, it may be somewhat random for the socioeconomic group corresponding to Georgetown University students - at least, those taking statistics!
This data will be investigated following the midterm on March 30, but the Easter holiday is an especially opportune time to collect some data. I'll consider the possibility of, in addition, having some teams go to area schools to obtain data from a sample of the students.
Assignment: How many children must be interviewed in total to obtain a margin of error of at most plus or minus 4 percent? Justify your sample size n as completely as you can. Assume that you need to be approximately 95 percent confident that p_1 is within the margin of error of the fraction of children who respond Yes to the question on the questionnaire. This is due on Wed. March 16. On Monday evening, you should read Chap. 12 in our text (on questionnaires).
The second project, "Buses", considers the hypothesis that the degree of goodness of fit of a Poisson model to bus arrival times is positively correlated with the duration of the bus route. This data will be collected during the month of April for three convenient local bus lines. A key problem will be to find suitable ways to test the data for schedule regularity.
Students are encouraged to explore the available web-based resources (see below). One can find many studies which partially overlap both the above two projects and consideration of the methods used and results obtained in other studies is part of the background preparation which should accompany any scientific investigation, whether statistical or not.
However, since we are in Washington, DC, there are a large number of specialized organizations (both governmental and non-governmental) which can provide more detailed information than is publicly available on the web. To ensure diversity and to give each student a chance to learn how to use this information resource effectively, I will work with you (in small groups) to coordinate your inquiries.
A third project, "Psychophysics", involves the determination of human capabilities in the perception of certain figures created by visual display of mathematical relationships between harmonic motions. Lissajous figures, for example, correspond to musical chords, and it is possible to see distortions in the chordal ratios just as one can hear the nuances of correct vs. sharp or flat pitch. This is ongoing work, including the participation of some students in the cognitive science program who are taking a module from me. I'll have more to say about this one later.
To search for info on children taking psychoactive prescription meds, you could use the following Google query:
children Zito medication (NB: Zito is a professor at the U. of Maryland). See also the Washington Post, p. A15, Dec. 3, 2004 - an article by S. Vedantam on a CDC study (A. Bernstein, project director).

For information on Poisson-ness of a distribution, try the queries "measure of poisson-ness" and "poisson route-length buses transport", and see
http://www.math.yorku.ca/SCS/Courses/grcat/grcat.pdf
http://www.iop.org/EJ/article/0305-4470/33/26/102/a026l2.html
For Monday, Feb. 28, read chapter 19 and do 4,8,10,12,14,16,20,22,24,26. We'll discuss some of these in class. Also, look over Ch. 20 and 21. Try Ch. 20: #2,4,6,8,10 for Tues. Chapters 18 -- 22 involve just a couple of ideas which are fundamental. The second midterm on March 30 will cover these chapters as well a few additional chapters.
For collection on Fri. March 4 (or on March 2 if you leave early!): Ch. 19: #12,14,20,24,26; Ch. 20: #10.
Here are a couple of answers from previous homework and practice problems:

Ch. 18: #4b 99.7 percent in (.037, .163); #10b .234; #16 if mu + 2 sigma is "pretty sure" (i.e., about 97.5 percent certain), then should have at least 47 (mu = .20 and sigma = .03); #22b 31.9 inches; #22d .005.

Ch. 19: #4d population = employees at the company, sample = all employees that year, p = proportion of all employees who will have an injury, p-hat (meaning p with circumflex) = proportion of that year's employees with an injury (3.9 percent); can use the methods of the chapter if that year's employees are a random sample of all possible employees. #8 all 4 parts are True (these might make good quiz problems!). #14a (.106, .141); b we are 90 percent confident that the proportion of people contacted who will buy something is between .106 and .141; c about 90 percent of all confidence intervals constructed this way contain the true population proportion; d do the mass mailing, since 5 percent is well below the interval. #22a (.188, .285).

Ch. 20: #8 (a) use p in the hypotheses, not p-hat; (b) H_A should be 2-sided, since the problem asks "not accurate" - i.e., H_A is "p is not equal to 0.9"; (c) (.1)(750) > 10, so (.9)(750) is also > 10; (d) p-hat = 657/750 = .876, SD(p-hat) = .011; (e) z = (.876 - .9)/.011 = -2.18; (f) P = 2 P(Z < -2.18) = .029; (g) there is only a 2.9 percent chance of observing a p-hat this far from .9, so the proportion of adults who drink milk in the area being studied is significantly different from the claimed .9.
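The Ch. 20 #8 calculation above can be redone in Python to full precision (the hand answer rounds SD(p-hat) to .011, so the z and P values differ slightly in the third digit):

```python
from math import sqrt
from statistics import NormalDist

phat = 657 / 750                      # observed sample proportion = .876
p0 = 0.9                              # claimed proportion under H_0
sd = sqrt(p0 * (1 - p0) / 750)        # SD(p-hat) under H_0, about .01095
z = (phat - p0) / sd                  # about -2.19
p_value = 2 * NormalDist().cdf(z)     # two-sided P-value, about .028
```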
The text does have some very interesting and useful remarks scattered about, largely dealing with practical issues. For instance, in Chap. 19, see the top of p. 383. Don't worry about instructions about how to do the problems by software. These are simple enough that you can do them by using the tables and a calculator (hand-held or on the computer). The difference is that you should be able to explain what is going on with the calculations rather than merely invoking some program that does everything for you.
For Tuesday Feb. 22, after reading the Ch. 18 try the following problems: #2,4,6,10,16,18. Also, for Wed. try #22,28,34. For Friday, read Chapter 19. Do problems 4,10,16,22,28 from Ch. 18 for collection on Fri.
Some of these problems will require you to figure out the area underneath a normal curve. In addition to using tables or pushing buttons on your calculator, you can try one of the following links:
The first and last will calculate areas under the normal curve. You need to figure out how to translate - we'll discuss this on Tuesday but you may want to try it yourself first!
First midterm on Wed. Feb. 16. Please note that many answers have been added below to help you check your understanding of the problems.
For the midterm on Wed. we will cover the same material as for the quiz on Friday plus the exponential distrib'n and the simplest facts about the normal distribution, the Law of Large Numbers, and the Central Limit Theorem. See below for a reminder of what we discussed - and check your class notes (and those of your classmates). Also, don't forget that a r.v. X with mean E(X) and nonzero SD is standardized by replacing it with (X - E(X))/SD, which has the same distribution as X except for being recentered (with mean at zero) and rescaled (with SD = 1).
To put things into perspective, the Central Limit Theorem and standardization allow one distribution and hence the table for that distribution to describe a very wide range of random variables. Essentially, we can reduce everything to calculating areas under the standard normal distribution. The CLT tells us that we can replace general r.v. by the normal distribution. Rather than calculating separately for all the possible forms of the normal distribution (one for each choice of mean and variance), it is more efficient to build a table for the standard normal and then to find anything else using that. In fact, a table of a few dozen pages provides enough information for a good approximation.
After the exam, we'll be using such methods but for now just make sure you can do the problems listed here and which we've done in class, on homework, and quizzes.
If X has an exponential distribution, then X can model a lifetime. As t increases, the chance that X exceeds t decreases at an exponential rate, where the "rate constant" h is the reciprocal of the expected value of X (the mean lifespan).
If X has pdf f(x;h) defined for h > 0 by
f(x;h) = h exp(-hx) = h e^(-hx)   for x \geq 0
f(x;h) = 0                        for x < 0
(writing "\geq" for the greater-than-or-equal-to symbol), then you should know that
(1) P(X > t) = exp(-ht) (2) E(X) = 1/h
We used these two facts to calculate P(a < X < b) given E(X). We even showed that the (unique) t such that P(X > t) = 1/2 is t = ln(2)E(X) (using logarithms). Since ln(2) is approx'ly .7, the median lifespan is about .7 times the expected life span, so mean lifespan exceeds the median. Another example is incomes where again mean is larger than median (I said this backward in class). FYI, the SD of f(x;h) is also equal to 1/h.
The normal distribution, recentered at zero, is symmetric about the y-axis so its mean and median are both zero.
You should recall that for a r.v. X, the cdf F(x) is the integral from -oo to x of f(t), where f is the pdf of X.
Here are a couple of parenthetical remarks which you do not have to know for the midterm. I discussed briefly the "Law of Large Numbers" and the "Central Limit Theorem". We will return to this after the exam as it leads to statistics!
The law of large numbers says that, as the sample size grows, the sample average (a number) approaches the mean of the underlying random variable. The CLT, in contrast, says that the cdf of a r.v. which is the sum of a sufficiently large number of i.i.d. r.v.'s is indistinguishable from the cdf of a normal distribution.
By differentiation it follows that if the cdf's are approaching each other, so are the pdf's, and the converse is also true.
An additional refinement uses the fact that the sum of several different independent normal variables is again normal (though the mean and variance change). This implies that even when the variables being summed don't all have the same distribution but have distinct distributions, the CLT still holds as long as there are enough of each type and those of each type are mutually independent.
FYI, the Law of Large Numbers follows from Chebyshev's Theorem which I mentioned in class:
P(|X-EX| > kSD(X)) < 1/k^2
for k > 0. The CLT needs a more elaborate argument and we have not stated all the technical conditions.
End of parenthetical remarks.
For the quiz on Friday and the midterm, be ready to do the following:
If X is an exponentially distributed r.v. with E(X) = 1000, find P(2000 < X < 3000). The answer is exp(-2) - exp(-3).
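Facts (1) and (2) make this a one-liner to check in Python; the helper name p_between is my own:

```python
from math import exp, log

def p_between(a, b, mean):
    """P(a < X < b) for exponential X with E(X) = mean, so h = 1/mean.
    Uses P(X > t) = exp(-h*t):  P(a < X < b) = P(X > a) - P(X > b)."""
    h = 1 / mean
    return exp(-h * a) - exp(-h * b)

p = p_between(2000, 3000, mean=1000)   # exp(-2) - exp(-3), about .0855
median = log(2) * 1000                 # ln(2) * E(X), about 693: median < mean
```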
Independence for random variables is defined in terms of independence of events (random variables are functions on the sample space, while events are subsets of the sample space). We say that two discrete r.v. X and Y are independent r.v. if the events (X=x) and (Y=y) are independent for all pairs of real numbers x, y. (Recall that events A and B are independent if P(A)P(B) = P(A and B); the probability of the intersection is the product of the separate probabilities.)
In fact, it is only necessary to check this for the finite set of pairs which X and Y can achieve in the finite discrete case since the other values have zero probability of being achieved and so the corresponding events are automatically independent. Simple example: Let X be the value showing on the 1st die and Y the value on the 2nd die. These are independent r.v. since the outcome of the first die and that of the second are independent events.
The best way to prepare for the quiz and the midterm is (1) go over your class notes (and the web pages), (2) review your homework and previous quizzes, (3) ask questions about what you don't understand (and discuss these things with your classmates). Do you know the difference between independent events and independent r.v.?
Today (8 Feb.) I described the "Central Limit Theorem" in class. For large n, if X_1, ..., X_n are independent identically distributed (i.i.d.) r.v. (subject to a couple of technical restrictions which we'll ignore), then their sum is approximately a normally distributed r.v. The theorem as given in class recentered and rescaled the sum so that the result was a standard normal distribution, i.e., with mean = 0 and SD = 1. But the real thrust of the theorem is the fact that a normal distribution arises from the sum of _any_ family of i.i.d. r.v.'s once the family is large.
This may seem rather surprising when you think about it! For example, suppose X_1 and X_2 each have a uniform distribution. If you take the discrete case, where X_1 and X_2 represent, e.g., the number showing on a single die, then the pdf of X = X_1 + X_2 looks like a triangle:
P(X=2)=1/36, P(X=3)=2/36, ... , P(X=7)=6/36, P(X=8)=5/36, ..., P(X=12)=1/36
In fact, the same thing happens for the continuous uniform distribution, but we won't prove it. But a triangular distribution doesn't look much like the normal distribution (though it is at least unimodal). Of course, it is only a sum of two r.v. What if we take _three_ distributions?
Since we have the double-value quiz on Friday and the midterm next Wed., I won't give a homework assignment for this Friday. But the following is an Exercise for Monday. Suppose that you have 3 tetrahedral dice (i.e., each die is a triangular pyramid which has four triangular faces). If the faces are numbered 1,2,3,4 for each die, let X_1,X_2,X_3, respectively, denote the number which shows up on the first, second, and third die, resp. In the experiment we are considering, an outcome consists of a sequence of three values chosen from the numbers on the faces of the dice. Let X be the r.v. which is the sum - that is,
X = X_1 + X_2 + X_3. What is the pdf of X? For example, P(X < 3) = 0, P(X=3) = 1/64, P(X=4) = 3/64, ... The resulting pdf (if you calculate it correctly) already looks rather like a normal variable even though it is only the sum of three r.v.
Ans. P(X=5)=6/64, P(X=6)=10/64, P(X=7)=12/64, and symmetrically for the rest, so the nonzero values (in units of 1/64) are 1, 3, 6, 10, 12, 12, 10, 6, 3, 1, with sum = 64. Here is what that looks like (turned on its side):

X
XXX
XXXXXX
XXXXXXXXXX
XXXXXXXXXXXX
XXXXXXXXXXXX
XXXXXXXXXX
XXXXXX
XXX
X

Pretty normal looking! To see that P(X=7) = 12/64, note that 7 can appear as the sum 1+2+4, 2+2+3, and 1+3+3. A sum like 7 = 5+1+1 isn't achievable since the dice here have only four sides. There are three ways 2+2+3 can occur: (2,2,3), (2,3,2), (3,2,2), and similarly for 1+3+3; but 1+2+4 can occur in 6 ways (the number of permutations of 3 distinct values); that is, the sum corresponds to (1,2,4), (2,1,4), (1,4,2), (2,4,1), (4,2,1), (4,1,2) in the sample space of outcomes. Each roll has equal probability of achieving any of the 4 values, so there are a total of 4 x 4 x 4 = 64 possible outcomes for the experiment, and each outcome is equally likely. Since the total probability of the sample space is 1 by assumption, each outcome has a probability of 1/64. To determine the probability of an event, we only need to count the number of elements in the event and multiply by 1/64.
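The counting argument above is exactly what a brute-force enumeration does; a few lines of Python confirm the answer for the three tetrahedral dice:

```python
from itertools import product
from collections import Counter

# Enumerate all 4 * 4 * 4 = 64 equally likely outcomes of three 4-sided dice
# and tally the sums; counts[s] / 64 is then P(X = s) for s = 3, ..., 12.
counts = Counter(sum(roll) for roll in product([1, 2, 3, 4], repeat=3))
pdf = [counts[s] for s in range(3, 13)]   # [1, 3, 6, 10, 12, 12, 10, 6, 3, 1]
```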
For Monday Feb. 7, here is some material to look over, as well as the answer to the quiz.
This week, we'll talk about continuous random variables and I'll put up a web page with some of the theory to supplement the text. There are many beasts in the zoo of random variables and we won't have time to dwell on all of their properties. I expect that you can distinguish the entities involved and work simple problems related to those properties we do discuss.
For the quiz, I asked you: What is the difference between probability and a random variable?
Answer: Probability is a function P defined on the set of all subsets of some set S which satisfies (i) P(A) is nonnegative and at most 1 for all subsets A of S, (ii) P(S) is 1, (iii) P is additive for disjoint subsets. A random variable is a function X: S --> Re from some sample space S to the real numbers Re = (-oo,oo).
We say "probability" but we mean "a probability" since the function can be different under different circumstances. For rolling dice, if the dice are fair, one can say the probability because then all outcomes have equal likelihood of occurring and so the probability of any event (set of outcomes) is uniquely determined as the ratio of the number of elementary outcomes in the event divided by the total number of such outcomes, i.e., |S|. In general, one doesn't know that all outcomes are equally likely so the values of P at each element could be anything - except for our three conditions above.
For |S| = oo, the above arguments no longer apply. Individual elements of S have probability zero, but (some of the) subsets A can have an assigned probability of P(A), still subject to the three laws above.
The connection between r.v. and probability for discrete sample spaces S is given by defining f(x) = P(X=x) and noting that f(x) gives a probability on the set of real numbers: all the f(x) values are nonnegative and their sum is one so no value can exceed 1.
For a continuous r.v. X one can define a function f (the pdf),
f: (-oo,oo) --> [0,oo),
such that for any two real numbers a and b,
P(a < X < b) = \int(from a to b) f(x) dx
that is, the probability that X is between a and b is the integral of f(x) as x goes from a to b. (Don't worry if you don't remember integrals or never saw them before - we won't be explicitly calculating with them except for an example or two.)
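As a concrete illustration (a sketch only, using the uniform pdf f(x) = 1 on (0,1) as the example, which is not in the text above), the integral can be approximated by a Riemann sum, so P(a < X < b) is just the accumulated area under f between a and b:

```python
def integrate(f, a, b, n=10000):
    # Midpoint Riemann sum approximating the integral of f from a to b.
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

# Uniform pdf on (0, 1): f is 1 inside the interval, 0 outside.
f = lambda x: 1.0 if 0 < x < 1 else 0.0

p = integrate(f, 0.2, 0.7)       # P(0.2 < X < 0.7), which should be 0.5
total = integrate(f, -1, 2)      # total probability, which should be (about) 1
```

The total area under any pdf must be 1, just as the probabilities of a discrete r.v. must sum to 1.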
Let X be any continuous r.v. with E(X)=0. Call X symmetric if f(x) = f(-x), where f is the pdf of X. Suppose some kind person gives you a table of the following numbers: P(0 < X < b) = t_b, for b=0, .5, 1.0, 1.5, etc. How could you use the table to find P(-2 < X < 5)? What about P(3 < X < oo)?
P(-2 < X < 5) = P(-2 < X < 0) + P(0 < X < 5), since (-2 < X < 0) and (0 < X < 5) partition (-2 < X < 5) up to the single point X = 0, which has probability zero. But P(-2 < X < 0) = P(0 < X < 2) since X is symmetric. Thus, P(-2 < X < 5) = P(0 < X < 2) + P(0 < X < 5). For the second part above, since X is symmetric, P(X > 0) = 1/2 = P(X < 0) (using P(X=a) = 0 for any a). Hence, P(X > 3) = (1/2) - P(0 < X < 3).
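Here is a quick numerical check of the symmetry argument (a sketch, taking the standard normal as the symmetric r.v. X; the table of values t_b can be built from the error function in Python's math module):

```python
from math import erf, sqrt

def t(b):
    # t_b = P(0 < X < b) for a standard normal X, as in the table above.
    return 0.5 * erf(b / sqrt(2))

def Phi(x):
    # Cumulative distribution function P(X < x) of the standard normal.
    return 0.5 * (1 + erf(x / sqrt(2)))

direct = Phi(5) - Phi(-2)   # P(-2 < X < 5) computed directly
via_table = t(2) + t(5)     # via the symmetry argument: P(0<X<2) + P(0<X<5)
tail = 0.5 - t(3)           # P(X > 3) = 1/2 - P(0 < X < 3)
```

Both routes give the same probability, which is the point of the exercise: symmetry lets a one-sided table answer two-sided questions.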
Suppose X is any r.v. with mean E(X) and standard deviation SD = sigma. How can you form a new r.v. from X by recentering and rescaling so that it has mean = 0 and SD = 1?
We use E(aX+b) = aE(X) + b, which was proved in class. Let Y = X - E(X). Then E(Y) = E(X) - E(X) = 0. Using var(aX+b) = (a^2)*var(X) ("a squared times var(X)"), or equivalently SD(aX+b) = |a|*SD(X), for Y as above, sigma(Y) = sigma(X). If sigma(Y) is not equal to 0 and W = (1/sigma(Y))*Y, then sigma(W) = (1/sigma(Y))*sigma(Y) = 1. Since W is a multiple of Y and Y has expectation 0, so does W. Thus, W is Y ``rescaled'' to have SD = 1. Any r.v. X can be recentered and rescaled this way to obtain a new r.v. W = (X - E(X))/sigma(X) with E(W) = 0, sigma(W) = 1.
Putting these last two paragraphs together gives you some useful aspects of the ``standard normal distribution.'' Summarizing the facts about expectation and variance used: multiplying some r.v. by a scalar also multiplies its expectation and standard deviation by the same scalar. Adding a scalar to some r.v. adds the same scalar to its expectation but has no effect upon the standard deviation (since shifting the distribution must shift its expected value but does not change the way in which the distribution varies from its mean).
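The standardization W = (X - E(X))/sigma(X) can be checked numerically; here is a sketch using the discrete r.v. from homework problem #5 below (values 0, 1, 3 with probabilities .5, .4, .1):

```python
xs = [0, 1, 3]
ps = [0.5, 0.4, 0.1]

# E(X) and var(X) straight from the definitions.
E = sum(x * p for x, p in zip(xs, ps))
var = sum((x - E) ** 2 * p for x, p in zip(xs, ps))
sd = var ** 0.5

# W = (X - E(X)) / sigma(X): recenter, then rescale.
ws = [(x - E) / sd for x in xs]
EW = sum(w * p for w, p in zip(ws, ps))
varW = sum((w - EW) ** 2 * p for w, p in zip(ws, ps))
```

As claimed, E(W) = 0 and var(W) = 1 no matter what X we started from (provided sigma(X) is not 0).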
For Monday Jan. 31, look over the notes from today's class. I will put some additional material on-line later this weekend. For now, read Chapters 16 and 17 but don't worry about the references to continuous r.v. (including the "normal" distribution) which we'll go over soon.
Here is the homework for collection on Friday Feb. 4: The notation "sum_(k=0)^n" means "big sigma (the summation sign) from 0 to n" (as I wrote on the board). Problems marked with an asterisk below are harder or more theoretical and I'll consider them as extra credit problems. When answering the problems, you should leave the answer in the indicated form; there is no need to use calculators to figure out, e.g., the decimal value if the answer to some problem is, say, 1/121. I've included one or two hints and numeric answers for your convenience. Remember that you need to show some work to get credit for answers.
(1) Suppose that S_n is the number of successes in n independent repetitions of a trial where on each trial, the probability of success is p and of failure is q (so p + q = 1). Find P(S_4 = 2), (a) under the assumption that p = 1/3, q = 2/3, (b) under the assumption that p = 3/4, q = 1/4.
*** (a) P(S_4 = 2) = C(4,2) * (1/3)^2 * (2/3)^2 = 6 (1/9) (4/9) = 24/81. This is the binomial pdf when p = 1/3 and q = 2/3. C(4,2) = (4*3)/(2*1) = 6. ***
(2) With the same notation and assumptions as #1, find P(S_n = k) - P(S_n = n-k) when p = q.
*** The answer is zero since C(n,k) = C(n,n-k). ***
(3)* Find sum_{k=0}^n (-1)^k C(n,k), where we write C(n,k) for the binomial coefficient (this is just another notation for "n choose k"):

             n(n-1) ... (n-k+1)
   C(n,k) = --------------------
               k(k-1) ... 1

(there are k factors in both numerator and denominator). Hint: Try it by taking the alternating sum of the rows of Pascal's triangle. Do you see why it must always be true?
*** The k-th row of Pascal's triangle is 1, k, k(k-1)/2, k(k-1)(k-2)/6, k(k-1)(k-2)(k-3)/24, etc. For instance, the 3rd row is 1 3 3 1 and the 5th is 1 5 10 10 5 1. Putting in the alternating signs from (-1)^k, the odd rows sum to zero - e.g., 1 - 3 + 3 - 1 = 0, and 1 - 5 + 10 - 10 + 5 - 1 = 0 - since the terms cancel out in pairs. But each term in an even row is the sum of the two terms above it in the preceding odd row, so in the alternating sum of an even row each term of the preceding row appears twice with opposite signs, and everything cancels to 0. Another way to do this problem is to use the binomial expansion: 0 = (1 - 1)^n = sum_{k=0}^n (-1)^k C(n,k). ***
(As an example for the next problem, C(8,3) = (8*7*6)/(3*2*1) = 56.)
(4) Calculate C(11,3) and C(9,4).
*** C(11,3) = (11*10*9)/(3*2*1) = 11*5*3 = 11*15 = 165; C(9,4) = (9*8*7*6)/(4*3*2*1) = 3*7*6 = 126 ***
(5) Suppose X is a r.v. which can take on the values 0,1,3 with probabilities .5,.4,.1, respectively. Find E(X), E(2X - 1), var(X), and var(2X - 1).
*** E(X) = 1(.4) + 3(.1) = .7, so E(2X-1) = 2(.7) - 1 = .4; also E(X^2) = 1(.4) + 9(.1) = 1.3, so var(X) = E(X^2) - (E(X))^2 = 1.3 - (.7)^2 = .81 and var(2X-1) = 4(.81) = 3.24. ***
(6) Suppose Y is a r.v. which can take on the values 1,2,3 with probabilities .2,.3,.5, resp. Find E(Y), E(3Y + 1), var(Y), and var(3Y + 1). e.g., E(3Y+1) = 7.9; var(Y) = .61
*** E(Y) = 1(.2) + 2(.3) + 3(.5) = 2.3; E(Y^2) = 1(.2) + 4(.3) + 9(.5) = 5.9; var(Y) = 5.9 - (2.3)^2 = 5.9 - 5.29 = .61 ***
(7) Suppose that X and Y (as in #5 and #6) are defined on the same sample space. What is the value of the expectation of X + Y? If X and Y are independent, determine the standard deviation (SD) of X + Y.
*** E(X+Y) = E(X) + E(Y) = .7 + 2.3 = 3.0. If X and Y are independent r.v., var(X+Y) = var(X) + var(Y) = .81 + .61 = 1.42, so SD(X+Y) = sqrt(1.42), which is approx. 1.2. ***
(8) True or false: For any r.v. X, var(X) = var(-X)?
*** True (intuitively, the spread of a distribution is not affected by mirror-reversal). ***
(9) If we are given the following table for two indep. r.v.:

        mean   SD
   X     30     4
   Y     12     2

find the mean and SD for (a) X - 2Y, (b) X/2 + Y. e.g., SD(X-2Y) = sqrt(32)
*** (b) E(X/2) = E(X)/2 = 15, so E(X/2 + Y) = 15 + 12 = 27. Since X and Y are independent, so are X/2 and Y. Hence, var(X/2 + Y) = (1/4)var(X) + var(Y) = (1/4)(4^2) + 2^2 = 8, so SD(X/2 + Y) = sqrt(8) = 2*sqrt(2). ***
(10) Suppose a help desk has a 10% chance of receiving 0 calls, 20% chance of 1 call, 40% chance of 2 calls, 20% chance of 3 calls, and 10% chance of 4 calls during an hour. What is the expected number of calls which the help desk receives in an hour?
*** 1(.2) + 2(.4) + 3(.2) + 4(.1) = 2.0 ***
(11) Suppose 10 percent of the computer chips produced fail their tests and must be rejected. What is the probability that the 3rd chip you test is the first one to be rejected? What is the probability that at least one of the first three chips must be rejected? What is the expected number of chips you must examine before rejecting one?
*** P(1st reject is 3rd chip tested) = (.9)^2 * (.1) = .081; P(at least one of 1st 3 chips is rejected) = 1 - P(none of the first 3 are rejected) = 1 - (.9)^3 = 1 - .729 = .271. This is a geometric probability problem. If "success" is "rejection", the expected number of chips until the first rejection is 1/P(rejection) = 10. ***
(12) Suppose 1/3 of the applicants for a jury are rejected. What is the chance of rejecting the first four candidates? What is the probability that the second candidate is the first one accepted? What is the expected number of candidates before rejecting one?
*** expected number of candidate jurors till 1st reject is 3 ***
(13)** Show that for the Poisson distribution (see p. 339 in our text) sum_(j=0)^(oo) P(X=j) is 1. That is, verify that the function shown there is really a pdf. Do you see why the other condition for a pdf, namely that all the values are nonnegative, is automatically true? Don't worry about this exercise if you haven't seen infinite sums (the symbol "oo" means "infinity") or if you don't know what the exponential function means - we will go over this function in class.
(14) This problem uses the Poisson distribution but you only need to use the definition and basic properties. If the number X of people who call a help desk per hour can be modeled as a random variable with Poisson distribution and if the expected value E(X) of this r.v. is 10, (a) what is P(X=0) (i.e., of receiving 0 calls in an hour)? (b) what is P(X=1) (i.e., of getting exactly 1 call in an hr)? (c) what is var(X)?
*** skip these last two problems for Friday; I'll explain on Monday. ***
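For those who like to check homework numerically, here is a short Python sketch (entirely optional) covering the binomial coefficients of #4 and the geometric probabilities of #11:

```python
from math import comb

# Problem 4: binomial coefficients.
c11_3 = comb(11, 3)   # should be 165
c9_4 = comb(9, 4)     # should be 126

# Binomial pdf P(S_n = k) as in problem 1; a sanity check is that the
# probabilities over k = 0..n sum to 1.
def binom_pmf(n, k, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

total = sum(binom_pmf(4, k, 0.75) for k in range(5))

# Problem 11: geometric probabilities with rejection rate .1 per chip.
p_first_reject_on_3rd = 0.9**2 * 0.1     # .081
p_at_least_one_in_3 = 1 - 0.9**3         # .271
expected_until_reject = 1 / 0.1          # 10
```

The last line is the geometric-distribution fact quoted in the answer: the expected wait until the first "success" is 1/p.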
Recall that a partition of a set A is a family of sets A_1,...,A_k such that A = A_1 U A_2 U ... U A_k where A_i and A_j have no elements in common when i and j are distinct. This is written A = A_1 + ... + A_k.
Short proof that A and B' are independent provided that A and B are independent: We need to show that P(A I B') = P(A)P(B'), where as usual A I B' means the intersection of A and B'. As A = (A I B') + (A I B) is a partition and A, B are independent, we have P(A) = P(A I B') + P(A I B) = P(A I B') + P(A)P(B). Hence, using P(B') = 1 - P(B), P(A)P(B') = P(A)(1 - P(B)) = P(A) - P(A)P(B) = P(A I B').
Here is a short argument for Bayes' Theorem:
Thm. Let S = B_1 + ... + B_k be a partition with P(B_i) > 0 for all i. Then for any i in {1,...,k} and for any event A contained in S,

(i) P(A) = P(A|B_1)P(B_1) + ... + P(A|B_k)P(B_k)

and if in addition P(A) > 0, then

                     P(A|B_i)P(B_i)
(ii) P(B_i|A) = --------------------------------------
                 P(A|B_1)P(B_1) + ... + P(A|B_k)P(B_k)

Proof. Since B_1 + ... + B_k is a partition of S, (A I B_1) + ... + (A I B_k) is a partition of A, so (i) holds by our extension of the basic additive rule for probability. Hence, by definition of conditional probability and using (i),

               P(B_i I A)          P(A|B_i)P(B_i)
   P(B_i|A) = ------------ = ---------------------------
                  P(A)        sum_{j=1}^k P(A|B_j)P(B_j)

and this proves (ii), which is Bayes' Theorem. Now can you work out the "rosebush" problem?
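The formula in (ii) translates directly into a few lines of Python; here is a sketch, checked on the rosebush numbers given elsewhere on this page (P(W') = 2/3, P(D|W) = 1/2, P(D|W') = 3/4):

```python
def posterior(priors, likelihoods, i):
    # Bayes' Theorem: P(B_i | A), where priors[j] = P(B_j)
    # and likelihoods[j] = P(A | B_j).
    total = sum(p * l for p, l in zip(priors, likelihoods))  # this is P(A), by (i)
    return priors[i] * likelihoods[i] / total

# Rosebush problem: B_1 = W' (not watered), B_2 = W (watered), A = D (dies).
p_not_watered_given_dead = posterior([2/3, 1/3], [3/4, 1/2], 0)
```

The denominator is just formula (i), so the function computes (ii) exactly as in the proof.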
The next homework set will be due on Jan. 28 but not all of the following problems will be included. Try them now and we'll go over some of them in class. I'll assign the most interesting ones for you to write up: In Chap. 14, #9,10,11,14,16,17(a,b); Chap. 15, #4,9,10,14(a,c),16(b), 23,24. You can read Chapters 14 and 15 now, but I suggest that you do _not_ read Chap. 16 until we've been over the idea of random variable (r.v.) in class.
Please write up the following for collection on Fri. 28 Jan Chapter 14: #10,14,16; Chapter 15: #10,14(a,c),24.
A random variable (r.v.) is a function X from S, the sample space, to the real numbers. We are assuming that S is finite but we now drop the requirement that each element in S has the same probability and allow outcomes of the experiment to have different probabilities.
The probability that X has a particular value x, P(X=x), is the probability of the event (X=x):= {s : X(s) = x}, where we write ":=" to mean "is defined as". For instance, if we roll a "loaded" die and let X be the number on the face that turns up, we might have probabilities P(X=1) = P(X=3) = P(X=5) = 1/4, P(X=2) = P(X=4) = P(X=6) = 1/12, P(X=anything else) = 0. The expected value of X is (1/4)(1 + 3 + 5) + (1/12)(2 + 4 + 6) = 9/4 + 1 = 13/4.
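The loaded-die expectation above is easy to verify by machine (an optional sketch):

```python
# Loaded die from the example: odd faces have probability 1/4, even 1/12.
probs = {1: 1/4, 2: 1/12, 3: 1/4, 4: 1/12, 5: 1/4, 6: 1/12}

total = sum(probs.values())                 # must be 1 for a probability
EX = sum(x * p for x, p in probs.items())   # E(X) = sum x P(X=x)
```

The computation reproduces E(X) = 9/4 + 1 = 13/4.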
Since X is a function, (X=x) I (X=y) = 0 (meaning empty set) when x and y are not equal. Furthermore, since S is finite, there is a finite set T of possible values for X. Hence, if T = {x_1, ... ,x_k}, then
S = (X=x_1) + ... + (X=x_k) is a partition. For example, if we consider the sample space for tossing a fair coin four times and the random variable X is the number of Tails, then P(X=4) = 1/16, P(X=3) = 4/16 (why?), so P(X \geq 3) = 5/16, where "\geq" means "greater than or equal to". E.g., P(X=4) = 1/16 since TTTT is the only member of the sample space with 4 Ts. It is a nice exercise to calculate the following using the basic properties of probability, without needing to actually write out the corresponding elements of the sample space! What is P(X < 3)? What is P(X=2)? Now let us consider the experiment of tossing a coin N times, where the probability of Heads is p and Tails is q = 1-p.
Define a r.v. X to be the number of H in N tosses, so X can take on any value from 0 to N. We define the expected value E(X) of X to be
E(X) = sum x P(X=x). Note that this automatically makes sense since by our assumption the sample space is finite and so there are only a finite number of x values for which P(X=x) is nonzero. What is E(X) when N = 3 and p = 1/2? *** E(X) = 3(1/2) = 3/2 since X = X_1 + X_2 + X_3 where the X_i are 0/1 r.v. with E(X_i) = 1/2. *** What is E(X) when N = 2 and p = 1/3? *** E(X) = 2/3 ***
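Both the four-toss example and the expected values above can be checked with a short Python sketch (optional; brute-force enumeration for the first part, the definition of E(X) with binomial probabilities for the second):

```python
from itertools import product
from math import comb

# Part 1: four fair-coin tosses, counting Tails over all 16 outcomes.
outcomes = list(product("HT", repeat=4))
tails = [o.count("T") for o in outcomes]
p_four_tails = sum(1 for t in tails if t == 4) / 16    # P(X=4) = 1/16
p_at_least_3 = sum(1 for t in tails if t >= 3) / 16    # P(X >= 3) = 5/16

# Part 2: E(X) for X = number of Heads in N tosses, straight from the
# definition E(X) = sum x P(X=x) with P(X=x) = C(N,x) p^x q^(N-x).
def E_binomial(N, p):
    return sum(x * comb(N, x) * p**x * (1 - p)**(N - x) for x in range(N + 1))
```

E_binomial(3, 1/2) and E_binomial(2, 1/3) reproduce the answers 3/2 and 2/3 given above (and in general the sum works out to N*p).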
For Fri. 21 Jan.: Show that if A and B are independent events, then A and B' are also independent events.
The next problem need not be written up but see if you can find the answer for the "rosebush problem", using Bayes formula (given in class, not yet proved)
Suppose that S = B_1 + B_2 + ... + B_k is a partition of the sample space S into k different subsets, each with positive probability. Then for any subset A of S with P(A) not equal to 0
P(A|B_1)P(B_1) P(B_1|A) = -------------------------------------- P(A|B_1)P(B_1) + ... + P(A|B_k)P(B_k)
The problem I gave in class was to find P(W'|D), where W is the event that the plant is watered, W' the event it is not watered, and D is the event that the plant dies. You are given that
P(W')= 2/3, P(D|W) = 1/2, P(D|W') = 3/4 and to fit the format of Bayes theorem above, take B_1 = W', B_2 = W, A = D.
You would expect that P(W'|D) is bigger than P(W') since not watering increases the chance of death, and this is what the calculation will show.
ans. P(W'|D) = (3/4)(2/3)/[(3/4)(2/3) + (2/4)(1/3)] = (1/2)/(1/2 + 1/6) = 3/4

For Wed., 19 Jan, I didn't assign any specific homework. But I expect that students will (1) take some time to go over the notes from class and (2) compare the class notes with the text (chap. 14 & 15). And you can also try that problem I mentioned in class ...
For Tues. Jan. 18, for homework, please do the problem assigned in class:
1. Find P(at least 1 H) on 2 tosses of a (fair) coin.
*** P(at least 1 H) = 1 - P(0 Hs) = 1 - P(TT) = 1 - 1/4 = 3/4 ***
and also do the following:
2. If A and B represent mutually exclusive events (i.e., the intersection of A and B is empty) and if P(A) = .25 and P(B) = .40, find (i) P(A'), (ii) P(A U B), (iii) P(A' I B'), where A' = S - A (the complement of A), "U" denotes union and "I" intersection.
*** P(A' I B') = 1 - P(A U B) = 1 - .65 = .35 ***
3. Suppose the experiment consists of selecting a person from some group, and A represents the event that (s)he is healthy, while B represents the event that the person selected is wealthy. What do the following mean? (i) 1 - P(A), (ii) P(A U B), (iii) P(A' I B)
*** P(A U B) = P(healthy or wealthy) ***
4. Suppose you have a standard deck of cards and each card has an equal chance of being selected. What is the probability of selecting (i) a red queen, (ii) a red jack or a black king, (iii) an 8, 9, or 10?
*** P(red Q) = 2/52 = 1/26; P(red J or black K) = 4/52 = 1/13; P(8, 9, or 10) = 12/52 = 3/13. ***
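The card answers can be confirmed by counting over an explicit 52-card deck; here is a short Python sketch (optional, just equally-likely counting as in class):

```python
# Build a standard 52-card deck as (rank, suit) pairs.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "spades", "clubs"]
deck = [(r, s) for r in ranks for s in suits]

red = {"hearts", "diamonds"}

# Equally likely outcomes: probability = (favorable count) / 52.
p_red_queen = sum(1 for r, s in deck if r == "Q" and s in red) / 52
p_redJ_or_blackK = sum(1 for r, s in deck
                       if (r == "J" and s in red) or
                          (r == "K" and s not in red)) / 52
p_8_9_10 = sum(1 for r, s in deck if r in {"8", "9", "10"}) / 52
```

Each probability is just the count of favorable cards times 1/52, exactly the counting rule used throughout these notes.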