 # AP Stats Unit 5 Practice FRQ

A researcher in Yellowstone National Park observed the “Old Faithful” geyser for several weeks. For each eruption of the geyser, the duration from start to end, in seconds, was recorded. The histogram below summarizes the results from 421 observations. The mean of the distribution is 210 seconds, with a standard deviation of 68 seconds.

a. Describe the sampling distribution of sample mean eruption length for random samples of 40 eruptions from the researcher’s observations.

b. What is the probability that the sample mean eruption length for a random sample of 40 eruptions is 200 seconds or less?

a) The sampling distribution of sample mean eruption length for random samples of 40 eruptions has a mean of 210 seconds, and it is bimodal with peaks at 100-125 seconds and 250-275 seconds. The shape of the sampling distribution of sample mean eruption length seems to be roughly symmetrical, and the range of the sampling distribution of sample mean eruption length for random samples of 40 eruptions is no more than 250 seconds.

b) x = mean geyser eruption duration for a random sample of 40 eruptions

Conditions : Random - stated that there were random samples of 40 eruptions, 10% Rule for Independence - satisfied since there are at least 421(10) = 4210 observations of geyser eruptions, Normal/Large Sample - satisfied since n =40 >= 30; therefore, the sampling distribution of sample mean eruption duration is approximately normal.

P(x<200) = P(z<-0.14) = .4443 <–from Table A using z-score of -0.14

z = (200-210)/68 = -0.14

[pretend i drew a picture of a normal distribution here with 210 as median, 200 slightly to left of it, and everything shaded below -0.14]

The probability that the sample mean eruption duration for a random sample of 40 eruptions is 200 seconds or less is 0.4443.

Hi Brandon!

Thanks for replying. I’ll preface the feedback with an acknowledgment that we are unlikely to get this style of question with this year’s modified test. But anyway, on to the feedback!

In part (a), you appear to misunderstand what you’re being asked to describe. You describe the distribution provided by the histogram. However, the histogram is really providing the distribution of the “population” in this scenario; we are being asked to describe what it would look like if we took repeated samples of 40 eruptions from the graph shown and created a new graph of their x-bars. Since 40 > 30, the shape of the original distribution (the one you described) doesn’t matter; the Central Limit Theorem applies and we can describe the resulting sampling distribution as approximately normal, with a mean of 210 seconds and a standard error of 68/sqrt(40) seconds [using formulas from our formula sheet].
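As a quick check of the center and spread described above, here is a minimal Python sketch. The variable names are my own; the numbers (210, 68, 40) come straight from the problem.

```python
import math

mu = 210      # population mean in seconds (given in the problem)
sigma = 68    # population standard deviation in seconds (given)
n = 40        # sample size

mean_of_xbar = mu                       # center of the sampling distribution
standard_error = sigma / math.sqrt(n)   # spread of the sampling distribution

print(f"mean = {mean_of_xbar} s, standard error = {standard_error:.3f} s")
```

The standard error comes out to roughly 10.75 seconds, much smaller than the 68-second spread of individual eruptions.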

The misconception in part (a) then extends to part (b) - you use the “original” standard deviation when we should instead use the standard error of 68/sqrt(40) = 10.752 seconds. This impacts the z-score you would get and ultimately your associated probability. In previous rubrics, you would still get partial credit for calculating the probability that you did, because you did all of your work correctly given the mistake you made.
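To see how much the choice of denominator matters, here is a small sketch comparing the two calculations. It uses `NormalDist` from Python's standard library in place of Table A, so the values can differ slightly from table lookups that round z to two decimals. The names `z_wrong` and `z_right` are just labels for this illustration.

```python
import math
from statistics import NormalDist

mu, sigma, n = 210, 68, 40   # values given in the problem
x_bar = 200                  # sample mean of interest

# Mistaken approach: divides by sigma, as if 200 were a single eruption.
z_wrong = (x_bar - mu) / sigma
p_wrong = NormalDist().cdf(z_wrong)

# Approach for a sample mean: divide by the standard error sigma / sqrt(n).
se = sigma / math.sqrt(n)
z_right = (x_bar - mu) / se
p_right = NormalDist().cdf(z_right)

print(f"using sigma: z = {z_wrong:.2f}, P = {p_wrong:.4f}")
print(f"using SE:    z = {z_right:.2f}, P = {p_right:.4f}")
```

With the standard error, the z-score is about -0.93 and the probability is about 0.176, compared with roughly 0.44 when the population standard deviation is used by mistake.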

Another small thing: you had the right idea checking the 10% condition, but used the wrong numbers. We should be comparing 40 to 421 (and since 40(10) = 400 < 421, the condition is still met). I am unsure whether that would be penalized on a typical rubric.
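The 10% check is just one comparison; a tiny sketch (variable names are mine) makes the direction of the inequality explicit:

```python
population_size = 421   # eruptions the researcher observed
n = 40                  # sample size

# 10% condition: the sample should be at most 10% of the population,
# which is the same as checking 10 * n <= population_size.
condition_met = 10 * n <= population_size

print(10 * n, "<=", population_size, "->", condition_met)
```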

Hopefully this explanation helps!
Jerry

a) The population is approximately normal, the value of n (40) is >= 30 by the Central Limit Theorem, and the sample shows no strong skew or outliers. The center of the sample mean is 210 seconds. The variability is 10.752 seconds because 68/sqrt(40) = 10.752.
b) P(x<=200) = ?
Use the z = (x-bar - mu)/(standard deviation/sqrt(n)) equation
(200-210)/10.752 = -0.93
Using Table A, a z-score of -0.93 is a p-value of 0.1762.
The probability that the sample mean eruption length for a random sample of 40 eruptions is 200 seconds or less is 0.1762.

Hi Sophia -

Thanks for submitting. This looks really good.

In part (a), you correctly describe the shape, center, and spread of the sampling distribution, citing the Central Limit Theorem as the reason for the distribution being “approximately normal.” What you should be careful with is that you start by saying “the population is approximately normal”, when it’s the sampling distribution that is approximately normal. Unfortunately, that would sometimes be enough to lower your score by a level (from fully correct to partially correct); you’ve used the wrong statistical term.

In part (b), you do a good job of communicating the probability you are asked to find, then carry out calculations correctly and answer in context. Nice job!

~Jerry

a.) The sampling distribution is approximately normal because according to the Central Limit Theorem, if the sampling size (40) is greater than 30, the shape is approximately normal. The mean of the sampling distribution is 210 seconds. The standard deviation of the sampling distribution is 68/sqrt(40)=10.75.

b.) P(x<200)= P(z<-0.93)= .1762
z= P(200-210)/10.75 (from part a)=-.93
There is a 17.62% chance that the sample mean eruption length for a random sample of 40 eruptions is 200 seconds or less.

Good on both parts! In part (a), you give correct descriptions for shape, center, and spread, and correctly invoke the Central Limit Theorem since n = 40 > 30. In part (b), you calculate the correct probability. Small note on notation: P(X < 200) should be P(x-bar < 200). I know that we can’t format “x-bar” on here, but you were asked about a sample mean so we need to use the appropriate symbol. That actually could be enough to bump you down a scoring level, so watch your symbols/notation carefully.

~Jerry

A. The sample is approximately normal due to the Central Limit Theorem (40 is greater than 30). There seem to be no outliers. For the center, the mean of the distribution is 210. And for the spread, the standard deviation is 10.75 (68/square root of 40).

B.
Require Assumptions:

1. Sampling: There is a random sample of 40 eruptions.
2. Normally Distributed: 40 is greater than 30 therefore it meets the Central Limit Theorem so we can assume approximately normal.
3. Independence: 10(40) is less than all geyser eruptions.

The mean is 210. Standard deviation is 10.75 ( 68/ square root of 40). I then proceeded to find the z-score:
(200-210)/10.75 = -.9302.
The probability statement is P(z is less than or equal to -.9302).
Then using my calculator I did normalcdf(-1000,-.9302,0,1) and found the p-value which is .1761.

To conclude, there’s a 17.61% chance that a random sample of 40 eruptions is 200 seconds or less.

Also I would have added a sketch to show the distribution

Well done!

You’ve correctly invoked the CLT in part (a) to justify your shape being approximately normal, while giving correct measures of center and spread. Be careful - your first two words are “the sample” instead of “the sampling distribution” - there’s a big difference in those two things. There are no issues in part (b) - nice job!

A. The sampling distribution of sample mean eruption length for random samples of 40 eruptions would be approximately normal due to the Central Limit Theorem: because the sample size is greater than 30, the sampling distribution will be approximately normal.
B. The probability that the sample mean eruption length for a random sample of 40 eruptions is 200 seconds or less is about 44.04%. The Z-score would be about -0.15.

Hello again -

For part (a), you correctly identify the shape as “approximately normal” due to the CLT (and give the correct reason, n = 40 > 30). However, a description of a distribution (of any type) should include measures of center and spread to go with shape. (Many teachers use “S.O.C.S.” or “C.U.S.S.” as acronyms to help students remember: Shape/Outliers/Center/Spread or Center/Unusual Features/Shape/Spread.) In this case, you did not mention the mean of the sampling distribution (which would still be 210) or the standard error (which would be 68/sqrt(40) = 10.75). This then impacted your probability calculation in part (b). You correctly used z-scores and correctly computed a z-score from the standard deviation you used (68), but would only earn partial credit because you did not calculate the standard error in part (a).

~Jerry

Thank you for the feedback!

a) The sampling distribution of sample mean eruption length for random samples of 40 eruptions from the researcher’s observations is approximately normal (random samples of 40 eruptions > 30; Central Limit Theorem). The distribution has a mean (center) of 210 sec and a standard deviation (spread) of 10.75 sec (68/sqrt 40).

b) The probability that the sample mean eruption length for a random sample of 40 eruptions is 200 seconds or less is 0.176.
Conditions: Random: Random sample of 40 eruptions was taken.
Independent: Random sample of 40 eruptions is less than 10% of the population; 400 < 421
Normal: 40 samples > 30 ; Central Limit Theorem is satisfied
Calculator: normalcdf[ Lower:0, Upper: 200, u: 210, st. dev. : 10.75] = 0.1761

Perfect all around!

a)
The distribution eruption length is approximately normal as n>30 with a range of 225 seconds, a mean of 210 seconds, and a standard deviation of ~10.7517 seconds.
Do I need to mention outliers and range here?
b)
According to the central limit theorem, a sample of n>=30 so our sample of 40 tells us this is approximately normal. Also, 40*10 is less than the population of 421 and we are told the sample is random.
normCdf(lower=-1e99,upper=200,μ=210,σ=10.751744)=0.176164

I had one question about Sampling distribution. So basically is sampling distribution just taking results from sample(random from a population) distribution and then taking the mean of them(xbar’s)? If so then how would you properly describe this because when I searched online the results were not too clear and if not could you maybe explain it in simpler terms?
Thanks.

Did you mean standard deviation instead of standard error?

a. The sampling distribution of 40 random samples of eruption would be approximately normal. The distribution of 40 random samples would be centered at the mean of 210. The shape of the distribution would be bell-shaped and approximately symmetrical. The sampling distribution would be spread with a standard deviation of 68/sqrt(40) = 10.752. The sampling distribution would not have any unusual features or gaps.

b.
Assumptions:
-We have a random sample of geyser eruptions.
-Population of eruptions is at least 400.
-Since the sample size is large enough (n>30) due to CLT, the sampling distribution is approximately normal
-Sigma_d is known
Calculations
p(x_bar ≤ 200) = normalcdf(-1E99, 200, 210, 10.752) = 0.1762
Conclusion
The probability that the sample mean eruption length for a random sample of 40 eruptions is 200 seconds or less is 0.1762.

Good work! Quick feedback on your answers: part (a) has everything needed to describe a distribution (center, shape, and spread are mentioned, so no need for outliers/range), though you should show where the 10.75 seconds calculation came from. In part (b), you’ve done all appropriate calculations.

As for your question - it’s a little hard to answer in text, but I’ll do my best. A sampling distribution will show us the results from lots of different individual samples taken from a population; each “dot” on a sampling distribution will represent an x-bar (or p-hat) obtained from a single sample. A sample or population distribution shows you individuals. So a sample distribution of heights would show you the individual height of each individual person in the sample. A sampling distribution of heights would show you the mean heights from multiple samples of individuals. The bigger the sample size, the more consistent those mean heights would get, hence the Central Limit Theorem saying that if n > 30, our sampling distribution is approximately normal, and standard error (a fancy way of saying standard deviation, but for samples, not individuals) decreases as n increases (if sample results become more consistent, variability decreases).

In theory, a sampling distribution displays all possible x-bars/p-hats we could get from all the different possible combinations of samples, and provides the basis for calculating margins of error, test statistics, or p-values.
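To make "all possible x-bars" concrete, here is a sketch with a made-up population that is small enough to list every sample. The five values below are hypothetical and chosen only for illustration; they are not the geyser data.

```python
from itertools import combinations
from statistics import mean

# Hypothetical tiny population (illustration only, not the geyser data).
population = [100, 120, 240, 260, 280]
n = 2

# Every possible sample of size n, and the x-bar each one produces.
sample_means = [mean(s) for s in combinations(population, n)]

print(len(sample_means))                       # number of possible samples
print(sorted(sample_means))                    # the full sampling distribution
print(mean(sample_means), mean(population))    # the two centers agree
```

Each entry in `sample_means` is one "dot" on the sampling distribution. With 421 eruptions and n = 40 there are far too many combinations to list, which is why we rely on the formulas instead.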

To bring it all back around, what you calculated in part (a) is what a graph of a whole bunch of x-bars obtained from a whole bunch of separate samples of n = 40 eruptions selected out of the original data set of the 421 eruptions would look like. The graph would look approximately normal, with a mean of 210 and a standard error (deviation, but for samples) of 10.75. That graph would provide the basis for us to make calculations about a single sample if we were going to run a hypothesis test or build a confidence interval.
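If you'd like to see that "whole bunch of x-bars" idea in action, here is a rough simulation sketch. The actual 421 durations aren't available here, so the population below is a hypothetical bimodal stand-in (a cluster of short eruptions and a cluster of long ones), and the cluster centers, spreads, and counts are my assumptions. Samples are drawn with replacement so the simple sigma/sqrt(n) formula applies directly.

```python
import random
from statistics import mean, pstdev

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical bimodal "population" of 421 durations (NOT the real data).
population = ([random.gauss(115, 15) for _ in range(130)]
              + [random.gauss(255, 20) for _ in range(291)])

mu = mean(population)
sigma = pstdev(population)
n = 40

# Draw many samples of n eruptions (with replacement) and record each x-bar.
x_bars = [mean(random.choices(population, k=n)) for _ in range(5000)]

print(f"population: mean = {mu:.1f}, sd = {sigma:.1f}")
print(f"x-bars:     mean = {mean(x_bars):.1f}, sd = {pstdev(x_bars):.2f}")
print(f"formula:    sigma / sqrt(n) = {sigma / n ** 0.5:.2f}")
```

Even though the made-up population has two peaks, the simulated x-bars should center near the population mean with a spread close to sigma/sqrt(n). Plotting a histogram of `x_bars` would show the roughly bell-shaped curve.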

Solid work once again! The only possible issue: in part (a), you mention the shape as “approximately normal”, but don’t give the reason for that until part (b) (since n = 40 > 30, the CLT applies). On some rubrics, we’d be able to give you retroactive credit for part (a) based on that description in part (b), but it’s always safe to show that the CLT applies whenever you’re citing a sampling distribution of x-bar being approximately normal.

So sampling distribution is all combinations of xbars/phats of a single sample from the population?