You can reach me by phone or voice-mail at 7-2703, by e-mail (kainen at georgetown.edu), or drop by Reiss 258 during my office hours.
Last updated: May 8, 2006
Other general information, including information for other courses, is on the index (classroom) page. The following link takes you to course background information.
Office Hours: Thurs. May 11, 3 to 6 pm; Fri. May 12, 1 to 3 pm.
Location: Reiss 258 (or outdoors in front of Reiss on a nice day!)
The Math Assistance Center (Reiss 256, Sun. - Thurs. evenings) will open soon. It usually covers only courses through 036, but I will inquire whether help can be made available for Math 040 as well.
The old stats homework is located here. Remember to use the odd-numbered problems for review. There will be a review session on May 10 from 9 to 10:30 pm in Reiss 283. The final exam is listed on the registrar's site as follows (note that it is in White-Gravenor!):
MATH-040-05 WGR 208 Kainen P Sat, May 13 9:00-11:00AM
In response to a question from Daniel Grasso:

> Professor,
>
> I am drawing a blank as to how to find s for a one-sample t-interval. On page 459 there is a problem about car speeds and it states that s = 4.25 mph. I am having difficulty understanding how they found that number. If you have time I would really appreciate it if you can re-explain this via a quick email.
>
> Thanks,
>
> Daniel

If you look at the histogram on p. 458, it allows you to reconstruct the original dataset: 20, 22.5, 25, 25, 27.5, 27.5, 27.5, etc., ..., 37.5 (a total of n = 1 + 1 + 2 + 5 + 5 + 5 + 3 + 1 = 23 in the sample), so, calculating y-bar, you should get 31, etc., as they claim. Does that help? Your question was good and I'm going to post it on our website to help others.
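If you'd like to check this sort of reconstruction by computer, here is a short Python sketch (Python isn't part of our course - it's just a convenient calculator). The (value, count) pairs below are my reading of the histogram bins, so treat them as hypothetical stand-ins rather than the book's exact data:

```python
import math

# Hypothetical (value, count) pairs read off a histogram -- a stand-in
# for the p. 458 data, whose bins are only partially listed above.
data = [(20.0, 1), (22.5, 1), (25.0, 2), (27.5, 5),
        (30.0, 5), (32.5, 5), (35.0, 3), (37.5, 1)]

n = sum(count for _, count in data)             # sample size
mean = sum(v * c for v, c in data) / n          # y-bar
ss = sum(c * (v - mean) ** 2 for v, c in data)  # sum of squared deviations
s = math.sqrt(ss / (n - 1))                     # sample s.d. (divide by n - 1)

print(n, round(mean, 2), round(s, 2))
```

Dividing by n - 1 (not n) is what gives the sample standard deviation s used in a one-sample t-interval.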
Here is an example of a chi-square problem:
Suppose you have seven different car types and three different groups (executives, middle management, and technical staff). If you count the number of each type of car owned by each of the three groups, you get a table with 7 rows and 3 columns.

To get the expected number of car-type-1 owned by executives, exp(1,exec), use the following equation:

exp(1,exec) = frac(exec) * tot # of car-type-1

where frac(exec) = (# of execs)/N, N = # of execs + # of midmgt + # of tech staff, and tot # of car-type-1 = # of car-type-1 observed over the three groups.

So if the execs constitute 1/3 of N and 45 cars of type 1 are counted, one expects that 15 of them will be owned by execs. If actually 20 are owned by execs, then the corresponding residual for that cell of the table will have the value 5.

To calculate chi-square, you take the residual for each cell, square it, and then divide by the expected number. For the cell corresponding to execs and car-type-1, you would get 25/15 = 1.67. Now add these numbers over all 21 cells (7 * 3) and that's the chi-square value.

Now suppose that the value you get is 30.9. Should you reject the null hypothesis that all three groups have homogeneous choice of cars, in favor of the alternative hypothesis that the three groups don't have homogeneous choice of cars? Suppose you need to keep alpha at most .05. Try it first before you look! Spoiler below ...

First, to decide if the chi-square test is even appropriate, let's assume that the usual blather about randomness is ok. There is one serious condition: the expected counts should all be at least 5. Let's assume that's ok, too. Now all you need to answer this is the number of degrees of freedom (d.f.). For a table with r rows and c columns, d.f. is (r-1) times (c-1), so the d.f. is 6*2 = 12. Since alpha = .05, using the table in our text, the critical chi-square value is 21.03. Hence, if the value you calculate from the table is more than 21.03, then you reject the null hypothesis.

Since 30.9 > 21.03, you reject H_0. Thus, according to our chi-square test, the type of car is not independent of the employment type (exec, midmgt, tech). Note that for a chi-square goodness-of-fit test, the d.f. is just n-1 where n is the # of cells.
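Here is the arithmetic for that one exec / car-type-1 cell as a short Python sketch (again, just a calculator, not course material):

```python
# The exec / car-type-1 cell from the example above: execs are 1/3 of N
# and 45 cars of type 1 were counted in total.
frac_exec = 1 / 3
total_type1 = 45
expected = frac_exec * total_type1        # 15 expected (up to round-off)
observed = 20
residual = observed - expected            # 5
contribution = residual ** 2 / expected   # (obs - exp)^2 / exp = 25/15
print(round(contribution, 2))             # about 1.67, one of the 21 terms in chi-square
```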
For such a hypothesis-testing problem, like the others we have studied, you need to know:

(1) What is the test statistic? (i.e., z, t, or chi-square)

(2) How do I calculate the test statistic? E.g., for chi-square, you take the sum of (obs-exp)^2/exp over all the cells, while for a t-test of a hypothesis about the mean, you take X-bar - mu_0 and divide by SE(X-bar).

(3) How do I calculate the critical value of the test statistic so that rejecting H_0 based on being above (or below) the critical value has at most probability alpha of rejecting H_0 if, in fact, H_0 is true? E.g., if X-bar involves a sample of size n, then look up the t-value with n-1 d.f. corresponding to probability alpha, using 1-tail or 2-tail depending on whether the hypothesis test is 1-tail or 2-tail (for d.f. at least 50, there is very little difference between the z and t distributions and I'll accept either one). Also note that you can find SE(X-bar) by taking either s or sigma and dividing by sqrt(n).

(4) Do large or small values provide evidence to reject H_0? For instance, a small value of X-bar can't give evidence in favor of a 1-tail hypothesis where H_A asserts that the mean is larger than some given number mu_0, since the null hypothesis H_0: mu = mu_0 in this case means mu is at most mu_0.
These questions (1) through (4) would be appropriate for our midterm. Note that they require very little computation but they may use a table. Don't worry if you've lost your tables - I'll give everyone copies to use. It is the concepts and procedures that I want to make sure you understand.
As announced in class on Friday, the midterm is postponed to Mon., May 1. We will also cover part of Chap. 27 - but only what is given in the three problems below. Note the chart at the end of the chapter, which compares the tests for the slope of the regression line with tests for the mean.
For Wed., in Chap. 27 #2,4,8,18,20. We will go over some of these problems in class - I'll call on some of you to present them. For #2, assuming that the conditions have been met, the sampling distribution of the regression slope can be modeled by a t-model with 11-2 = 9 d.f. Your calculation should give t = 7.85 which has a P-value of .0001 (by program) or using the table in the back, the P-value is (much) less than .01.
If H_0: mu = 90 and H_A: mu < 90 and if alpha = .01, decide whether or not to reject H_0 in favor of H_A based on the following data: Sample of size 16 has X-bar = 78 and s = 20. Assume the population is approximately normal.
Since the sample average is in the direction favored by the one-sided alternate hypothesis, it is plausible that it could give sufficient evidence to reject H_0 in favor of H_A - but only if it is far enough below what one could reasonably expect by chance if H_0 were true. Far enough means that the calculated t-statistic given by
(X-bar - mu_0)/SE(X-bar) = (78 - 90)/(20/sqrt(16)) = -12/5 = -2.4
is below the cut-off value of
-t_.01 = -2.602, as given by the t-table, 2nd column from the right (for alpha = .01 in a one-tail situation, with 15 degrees of freedom).
Since -2.4 is _not_ less than -2.602, the evidence is not sufficient to reject H_0 with at most a one-percent chance of making a type I error.
If H_0 is the same but H_A: mu not equal to 90, and alpha is still .01, decide whether or not to reject H_0 in favor of this new H_A given the following data: a sample of size 16 has X-bar = 78 and s = 14. Assume the population is approximately normal.

Now we have a two-tail hypothesis to test, and sample averages that are sufficiently large as well as sufficiently small can both provide enough evidence to reject H_0 in favor of the new H_A. Since the sample is small and we know only the sample s.d. s (not the population sigma), we use the t-test again. We first calculate the test statistic:
... just as above but instead of 20/sqrt(16) = 5, we now have 14/4 = 3.5 in the denominator, so the t-value is -12/3.5 = -3.43, while the critical t-value in the two-tail case with alpha = .01 is -2.947 so the statistic is small enough to reject the null hypothesis.
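As a quick check of both worked examples, here is a short Python sketch (the table values -2.602 and 2.947 are copied from above; Python isn't part of the course):

```python
import math

# Check the t-statistics in the two examples above.
def t_stat(x_bar, mu_0, s, n):
    return (x_bar - mu_0) / (s / math.sqrt(n))

t_one = t_stat(78, 90, 20, 16)   # one-tail example: -12/5 = -2.4
t_two = t_stat(78, 90, 14, 16)   # two-tail example: -12/3.5, about -3.43

reject_one = t_one < -2.602      # -t_.01 with 15 d.f.: False, retain H_0
reject_two = abs(t_two) > 2.947  # t_.005 with 15 d.f.: True, reject H_0
print(round(t_one, 2), reject_one, round(t_two, 2), reject_two)
```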
Determine a 99.7 percent confidence interval for mu given that the s.d. sigma is known to be 14, basing the confidence interval on a sample of size 49 with sample average X-bar = 33.18.
Here you first calculate the margin of error
ME = s.d.(X-bar) * z_.0015, where s.d.(X-bar) = sigma/sqrt(n) = 14/sqrt(49) = 2, and z_.0015 = 3 (by the 68-95-99.7 rule). Hence, ME = 6, so the interval centered at 33.18, namely [27.18, 39.18], is the 99.7-percent confidence interval. Note that it would also be ok to use the t-distribution here, with 48 d.f., but the values will be quite close.
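The same margin-of-error arithmetic as a short Python sketch (just a calculator):

```python
import math

# The 99.7-percent confidence interval worked out above.
sigma, n, x_bar = 14, 49, 33.18
z = 3                            # z_.0015, from the 68-95-99.7 rule
sd_xbar = sigma / math.sqrt(n)   # 14/7 = 2
me = z * sd_xbar                 # margin of error = 6
ci = (x_bar - me, x_bar + me)    # about (27.18, 39.18)
print(ci)
```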
Ok here's a simple "homogeneity" problem which you can compute by hand. I'll include the answer below so you can check it. Then you can try a computer method and see if it gives the same answers!
Suppose on a cruise ship you know that the following counts occurred for the types of breakfast consumed by passengers and crew. Do you believe that there is no evidence of difference between the choices made by crew and passengers (homogeneity)? Or should you reject the null hypothesis? Explain.

               passengers   crew
cold cereal        30         10
hot cereal         20         10
eggs and ham       40         60
continental        10         20

spoiler below ...

The null hypothesis H_0: Passengers and crew have the same distribution of breakfast preferences, vs. H_A: Passengers and crew have different preferences.

               passengers   crew      total
cold cereal      30 (20)    10 (20)     40
hot cereal       20 (15)    10 (15)     30
eggs and ham     40 (50)    60 (50)    100
continental      10 (15)    20 (15)     30
total              100        100      200

Since 100/200 = 1/2, the expected counts are as shown (in parens). Squaring the residuals (obs - exp) and dividing by the expected counts, you should get the table of chi-squared entries (i.e., values to be summed). E.g., for the upper left cell, 30 - 20 = 10, and squaring 10 and dividing by 20 gives 100/20 = 5, etc.

               passengers   crew
cold cereal        5          5
hot cereal       25/15      25/15
eggs and ham       2          2
continental      25/15      25/15

So the chi-squared value is 14 + 100/15 = 20.66... Since r = 4 and c = 2, df = 3, and from the table in the back of the book, the P-value is less than .005 (in fact, very much less). Hence, you should reject the null hypothesis.
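If you want to check the whole table by computer, here is one way in Python (the program is mine, not from the text; it just redoes the arithmetic of the homogeneity problem above):

```python
# The cruise-ship breakfast table: rows are breakfast types,
# columns are (passengers, crew).
observed = {
    "cold cereal":  (30, 10),
    "hot cereal":   (20, 10),
    "eggs and ham": (40, 60),
    "continental":  (10, 20),
}

col_totals = [sum(row[j] for row in observed.values()) for j in (0, 1)]  # 100, 100
n = sum(col_totals)                                                      # 200

chi2 = 0.0
for row in observed.values():
    row_total = sum(row)
    for j, obs in enumerate(row):
        exp = row_total * col_totals[j] / n   # expected count for this cell
        chi2 += (obs - exp) ** 2 / exp        # add this cell's term

df = (len(observed) - 1) * (2 - 1)            # (r-1)(c-1) = 3
print(round(chi2, 2), df)                     # about 20.67, with 3 d.f.
```

A computer package's chi-square routine should give the same statistic and d.f.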
The homework (given out in class last Friday) was collected on Tues. Apr. 11, but the quiz was deferred to Wed. Apr. 19.
For Wed., April 19, we will go over the homework which I gave you over the break (e.g., for #1, the first CI is (.55,.65) - or slightly smaller if you calculate exactly). I asked you to do #2 and #12 of Chap. 26 also.
For the quiz, also on 4/19, I'll ask you:
Here are partial answers for #1 through #5 just below on the page:

1. CI is [.55, .65] approx.
2a. Don't reject H_0 (the data has the wrong direction in terms of the alternative).
2b. Don't reject H_0: you calculate a statistic which comes out to be 1.27, but that isn't large enough (it would have to exceed z_.025 = 1.96).
3. Reject H_0 (since 2 > z_.05 = 1.645).
4. Retain H_0: the test statistic is (42 - 50)/(11.9/sqrt(12)), which is about -2.32, but this is not less than -t_.01 = -2.72; don't worry about the P-value.
5. Reject H_0 (the stat is about 3, which is bigger than 1.96).

For #5, the stat is (p_1-hat - p_2-hat)/SE(p_1-hat - p_2-hat) = (approximately) .12/.04, since the null hypothesis is that p_1 = p_2 and hence the mean of (p_1-hat - p_2-hat) (as a sampling distribution) is zero. One gets SE(p_1-hat - p_2-hat) using sqrt(p_1-hat*q_1-hat/n_1 + p_2-hat*q_2-hat/n_2), and p_1-hat itself is 120/200 = .60, p_2-hat is 240/500 = .48.
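For #5, here is the same arithmetic as a short Python sketch (using the SE formula quoted above; Python is just a calculator here):

```python
import math

# Problem 5: urban 120/200 vs rural 240/500, one-tail, alpha = .025.
p1, n1 = 120 / 200, 200                 # .60
p2, n2 = 240 / 500, 500                 # .48
# SE of the difference, from the formula above
se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
z = (p1 - p2) / se                      # about 2.91 -- the ".12/.04" above
reject = z > 1.96                       # z_.025 = 1.96, so reject H_0
print(round(z, 2), reject)
```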
Here are some practice problems for Tues. 4/18. I won't collect them but I will call on some of you to put the problems on the board.
1. A random sample of 400 voters provides a sample proportion of .60 in favor of some proposition. Give a 95-percent confidence interval for the true proportion of the population which favors the proposition.

2. (a) If the null hypothesis is that 3 percent of new tires are defective vs. the alternative hypothesis that more than 3 percent are defective, what can you conclude from a sample of 625 tires with a sample percentage of 2 percent defective? Assume the sampling is suitably random. (b) If your sample percentage is 4 percent defective and you have a significance level (alpha) of .025, what do you do?

3. A random sample of 100 recorded deaths in the US during the past year showed an average life span of 71.8 years with s.d. of 8.9 years. Does this seem to indicate that the average life span today is greater than 70 years? Use a 0.05 level of significance.

4. The average length of time for a student to register for classes at some college is 50 minutes. Suppose a new procedure is tried on a random sample of 12 students and the average time is 42 minutes with a standard deviation of 11.9 minutes. Test the hypothesis that the new procedure is faster than the old procedure, using a significance level of .01. What is the P-value? Assume that the distribution of times is approximately normal.

5. If in a sample of 200 urban voters, 120 favor a proposition, while in a sample of 500 rural voters, 240 favor the proposition, should we reject the null hypothesis that these two groups have the same degree of support for the proposition, in favor of the alternative that the urban support is greater than the rural support? Use a .025 level of significance.
Please do #10 and #16 of CH. 20. For #16, ALSO do the problem with a different alternative hypothesis H_A: p not equal to p_0 (i.e., the 2-tail version). These will be collected for homework on Friday Apr. 7. Also read chapter 21 up to p. 420.
Homework due Tues. April 4: CH. 19: #4, 12a,b, 14, 20, 22, 24.
For hw due 3/28: pp. 349--351: #4,16; pp. 370--372: #4,10,16,28* (for the last problem you may use a calculator)
For Friday (3/24), try pp. 349--351: #4,16,22. Read Ch. 18, and then try on pp. 370--372: #4,10,16,22,28. Some of these will be selected for homework due Tues. 3/28.
Homework due Tues., Mar. 21, for collection. I did the problems below in class today with the values given in the text (.02 and 4/5, resp.). The homework asks you to do the same problems but with the changed probabilities of .01 and 2/3, resp. If you missed class, you should check with someone who was there to get the notes.
Ch. 17, pp. 343--344: #8 and 10, but change P(chip fails) to .01; #14a,b,d and 16a,b, but with P(hit the target) changed to 2/3.
Chap. 16: Homework for Tues. 3/14: pp. 325--326 #2b, 8, 10b, 22, 24 - a total of 5 problems to be collected.
Also for Wed. 3/15 try pp. 327--328 #32,38 and read Ch. 17 - try: #8,10,14,16,18,22,26,28,36 pp. 343--345.
#1 (a) What is the probability of drawing 2 hearts if you draw 2 cards from a standard deck? (b) What is the probability of drawing 2 cards of the same suit if you draw 2 cards from a standard deck? (c) What is the expected number of hearts if you draw 2 cards from a standard deck?

... spoiler below (try it first and then look)

Answers: We write C(n,k) for the binomial coefficient n choose k = n!/[k!(n-k)!]. Use cancellation when you can to make the calculations easier.

1a. C(13,2)/C(52,2) = (13*12)/(52*51) = 1/17 = .06 approx. Note that P(1 H and 1 nonH) = 13*39/C(52,2) = 13/34 = .38 approx, so P(2 nonH) is 1 - (.06 + .38) = .56 approx, and this agrees with the direct calculation P(2 nonH) = C(39,2)/C(52,2) = (39*38)/(52*51) = 38/68 = .5588...

1b. 4 times larger = .24 approx.

1c. E(# of hearts) = E(X) = Sum_x P(X=x)*x = 2*.06 + 1*(.38) = .50, and in fact this is exact: 2/17 + 13/34 = 17/34 = 1/2. Note that E(# of hearts in drawing one card) is 1/4, so E(# of hearts in drawing two) = 1/2, and E(# of hearts in drawing 12 cards) = 12/4 = 3.
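You can verify these card answers with a short Python sketch (Python's built-in comb function computes the binomial coefficient C(n,k)):

```python
from math import comb

# The card problems above; comb(n, k) is the binomial coefficient C(n,k).
p_two_hearts = comb(13, 2) / comb(52, 2)   # 1a: 78/1326 = 1/17, about .06
p_same_suit = 4 * p_two_hearts             # 1b: 4 suits, about .24
e_hearts = 2 * (13 / 52)                   # 1c: by linearity, 2 * 1/4 = 1/2
print(round(p_two_hearts, 4), round(p_same_suit, 4), e_hearts)
```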
Some answers:

Ch. 14 #10c - legitimate; d - not legitimate, since you can't have negatives.
#18a .027, b .125, c .001, d .729, e .784
#24 .469

Ch. 15 #6a .06, b .50
#10 a,b,c: .62, .867, .194
#20a .444; b No (4 percent of US residents have been to both); c No: 18 percent of residents have been to Canada, but 4/9 = 44.4 percent of those who have been to Mexico have also been to Canada, so P(C) is not equal to P(C|M). Intuitively, this should make sense; the likelihood of having been to Canada is higher given that a person has traveled to Mexico.
#26a Yes, the percent who graduate depends on the school they attend. Part b: (.7)(.75) + (.3)(.9) = .525 + .27 = .795
#36 P(A|D) = P(A and D)/P(D) = (.7)(.01)/[(.7)(.01) + (.2)(.02) + (.1)(.04)] = .007/[.007 + .004 + .004] = 7/15 = .467. The reason is that D = (D and A) or (D and B) or (D and C), so P(D) = P(D and A) + P(D and B) + P(D and C), since these events are mutually exclusive (exactly one can occur). Now P(D and A) is P(A)P(D|A), etc.
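The Bayes-rule arithmetic in #36 is easy to check with a short Python sketch (just redoing the sums above):

```python
# Ch. 15 #36 above: P(A|D) = P(A and D)/P(D), with P(D) summed over
# the three mutually exclusive cases.
p_and = {"A": 0.7 * 0.01, "B": 0.2 * 0.02, "C": 0.1 * 0.04}
p_d = sum(p_and.values())            # .007 + .004 + .004 = .015
p_a_given_d = p_and["A"] / p_d       # 7/15, about .467
print(round(p_a_given_d, 3))
```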
For Tuesday Feb. 21, try Ch. 14 #24; Ch. 15 #6ab, 8ab, 20abc, 24, 26, 28, 30, 32, 36; for homework to be collected on Wed. Feb. 22, Ch. 14: #18abc, Ch. 15: #10abc,36
Here are some selected answers to check your work:

Ch. 14 #14a .04, .51, .55; b: (.4)^4 = .0256, etc.
Ch. 15 #26b 79.5 percent; #28 66 percent

For Fri. 2/17, please read Chap. 14 and on pp. 291--293, please try #10, 14, 16, 18(a,b,c). Then read pp. 294--302 in Chap. 15 (you can read more if you have time) and on pp. 310--311 try #2, 4, 10, 14(a,c).

Note that there is a misprint in the book on p. 302 (right-hand column, last paragraph above "drawing without replacement"). It should read: "Overall 36 percent of the drivers got blood tests, but only _28_ percent of those who got a breath test do." (i.e., also got a blood test).
For Wed., read through chapters 10 -- 13 and bring your questions to class!
For Tuesday 2/14, read chapter 13 (if you haven't already done so) and be prepared to discuss the following problems on pp 267--271: #2,4,10,12,20,22,24. These won't be collected tomorrow but make some notes so you can respond in class.
For Tues. Feb 7: homework to be collected: Chap. 9 (pp. 181--182) #8,12,14 (use #11 and 13 to see what is expected from #12 and 14). We'll talk about #11 and 13 in class on Monday but try them first on your own - then look up the answers in the book and reread the chapter to see why the book's answers are correct.
Here are problems from Chap. 8 for review: pp. 158--164: #4,6,8,10,18,22. Also for Ch. 8: #2 ab and #36 (_not #30 which I mistakenly assigned before; we will do it using the computer later - though you can try now if you like).
For those who want to check my claim that the slope of the line of best fit does not change if the x and y variables are interchanged - provided that the standard deviations are the same - here is the concrete example I mentioned.

Here are three data points for which you can make a scatterplot: (1,4), (3,-1), (-4,-3); that is, x = (1,3,-4) and y = (4,-1,-3). Note that x-bar = 0 = y-bar; i.e., both means are zero. Also, the standard deviations are both sqrt(13). For instance, s_x = sqrt[(1 + 9 + 16)/2].

Let's figure out the slope of the line of best fit. The line must go through (0,0) since both means are zero; let's call its slope m. The residuals are the differences y_i - y-hat_i, where y-hat_i is the value predicted for y_i by using the line; in this case, y-hat_i = m x_i. Calculating the sum of the squared residuals by using the algebraic fact that (a + b)^2 = a^2 + 2ab + b^2, you get

SSR(m) = (4 - m)^2 + (-1 - 3m)^2 + (-3 + 4m)^2
       = (16 - 8m + m^2) + (1 + 6m + 9m^2) + (9 - 24m + 16m^2)
       = 26(m^2 - m + 1).

To find the value of m which minimizes SSR(m), take the derivative, set the result equal to zero, and solve for m. So 2m - 1 = 0 and hence m = 1/2.

Now if you take the same data but reverse x and y, you have x' = (4,-1,-3) and y' = (1,3,-4), so the means are still zero and the standard deviations are still the same. Let k be the slope of the line of best fit for this new ``reversed'' data, and do the same calculation as above to find

SSR(k) = (1 - 4k)^2 + (3 + k)^2 + (-4 + 3k)^2.

Doing the same calculation as above, you should get 26(k^2 - k + 1), so the unique root of the derivative is 1/2 again! That means that the line of best fit for the reversed data has the same slope as the line of best fit for the original data.
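You can also check this numerically with a short Python sketch (the function name slope_through_origin is just one I made up; the formula m = sum(x*y)/sum(x*x) comes from setting the derivative of the sum of squared residuals to zero, as above):

```python
# Numerical check of the claim above. For a best-fit line through the
# origin, minimizing SSR gives slope m = sum(x*y) / sum(x*x).
def slope_through_origin(xs, ys):
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

x = [1, 3, -4]
y = [4, -1, -3]
m = slope_through_origin(x, y)   # slope for the original data: 1/2
k = slope_through_origin(y, x)   # slope for the reversed data: also 1/2
print(m, k)
```

Both slopes come out 1/2, as the calculus argument predicts; the symmetry works because sum(x*y) is unchanged when x and y trade places and here sum(x*x) = sum(y*y) = 26.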
You may wish to try the following online survey.