# AP Stats Unit 6 Practice FRQ #2

After completing a sale, a car company likes to send a follow-up survey where customers can indicate their level of satisfaction with their experience. One of the questions in the survey asks “would you recommend our company to a friend looking to purchase a vehicle?” The company wonders if people would answer the question differently based on whether they bought a new or used vehicle. From a list of all 2018 vehicle sales, the company randomly selects 105 customers who bought a new vehicle and 120 customers who bought a used vehicle. 88 of the customers who bought new vehicles answered “yes,” while 85 of the customers who bought used vehicles answered “yes.”

At the significance level of 0.05, do the data provide convincing statistical evidence that the proportion of customers who would answer “yes” to the survey question is different for new vs used vehicle sales?

P(yes|new) = 88/105 = 0.838
P(yes|used) = 85/120 = 0.708
margin of error = +/- (1.960)sqrt[(0.838(0.162)/105)+(0.708(0.292)/120)]
margin of error = +/- 0.108
confidence interval = 0.05 +/- 0.108 = (-0.058, 0.113)
No, the data do not provide convincing statistical evidence that the proportion of customers who would answer “yes” to the survey question is different for new vs used vehicle sales. Since 0 is captured in the 95% confidence interval of -0.058 and 0.113, the data shows that the true difference in proportions could be 0.

Hello again!

I’ll give feedback on your work below, but I want to start with noticing that you used a 2-sample confidence interval to answer the question. That is a totally valid strategy for a situation like this, but only because the alternative hypothesis was “different”; had the scenario asked “higher” or “lower” the confidence interval would not work in the same way. Typically, when given a significance level, and asked if there is “convincing statistical evidence” of something, we should be running a hypothesis test. That said, you will still be scored for your work with the confidence interval. The scoring for a “convincing statistical evidence…” scenario includes:

1. Stating null/alternative hypotheses
2. Defining the parameters in the null/alternative hypotheses
3. Choosing an appropriate test/interval by name
4. Checking the conditions to run the chosen test/interval
5. Writing the results from the chosen test/interval
6. Correctly interpreting the results from the chosen test/interval in terms of whether we do or don’t have evidence for the alternative hypothesis.

Given that list (some parts are scored together to create a question with 3-4 scoring components), you can likely see that your work doesn’t have enough there to be earning much of the available credit. You calculate the appropriate margin of error, and therefore obtain a confidence interval, but never name the interval, check conditions (random samples, approximately normal sampling distribution [at least 10 successes/at least 10 failures], 10% condition), or write hypotheses. Additionally, you used “0.05” in the interval, instead of using (0.838 - 0.708 = 0.13) as your difference of proportions to add/subtract the margin of error. That would have led you to a different confidence interval where 0 was not included. Given that your interval did include 0 though, your conclusion that we do not have convincing evidence would get scored as correct, because you interpreted the answer you got correctly. Unfortunately, you would not get credit for the other components of the question.
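To make the point about the interval concrete, here is a minimal Python sketch of the two-sample z interval centered on the observed difference (0.838 - 0.708) rather than on 0.05. The function name `two_prop_interval` is just an illustrative choice, and the sketch assumes the usual unpooled standard error for a confidence interval.

```python
from math import sqrt
from statistics import NormalDist

def two_prop_interval(x1, n1, x2, n2, confidence=0.95):
    """Two-sample z interval for p1 - p2 (hypothetical helper name)."""
    p1, p2 = x1 / n1, x2 / n2
    # Critical value z* for the chosen confidence level (about 1.960 for 95%)
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Unpooled standard error, as is standard for an interval
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    margin = z_star * se
    diff = p1 - p2  # center of the interval: the observed difference
    return diff - margin, diff + margin

low, high = two_prop_interval(88, 105, 85, 120)
print(round(low, 3), round(high, 3))
```

With the counts from the prompt, the interval comes out to roughly (0.022, 0.237), which does not contain 0.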

Whew, that was a lot of writing! I hope I was able to be clear in my explanations though - it’s clear that you know many of the concepts, but need some work in how to address the types of questions the AP exam gives. Having said all of the above, I should note that with the formatting changes of this year’s test, you will likely not need to do the level of calculations that you did. It’s much more likely that you will be given an interval (or p-value, or something else) and be asked to just do the interpretation. Or perhaps choose an appropriate procedure and check conditions, but not actually finish building the interval/obtaining the p-value.

Thanks for reading, and I hope this helps!

~Jerry

p_1 = the proportion of customers who bought a new vehicle and answered yes to the survey question
p_2 = the proportion of customers who bought a new vehicle and answered yes to the survey question

2-sample z test for p_1 - p_2

H_0: p_1 - p_2 = 0, H_a: p_1 - p_2 not equal 0

Conditions:
Random - Stated that the company “randomly selects” customers for the survey
10% Condition for Independence - satisfied since it is safe to assume that there are at least 105(10) = 1050 customers who bought a new vehicle at the car company, and at least 120(10) = 1200 customers who bought a used vehicle at the car company.

Large Counts Condition - satisfied since
n_1*p-hat_1 = 105*0.838 = 87.99 >= 10
n_1*(1-p-hat_1) = 105*0.162 = 17.01 >= 10
n_2*p-hat_2 = 120*0.708 = 84.96 >= 10
n_2*(1-p-hat_2) = 120*0.292 = 35.04 >= 10

With Large Counts Condition satisfied, the sampling distribution of p-hat_1 - p-hat_2 is approximately normal.

p-hat_1 = 88/105 = 0.838, n_1=105

p-hat_2 = 85/120 = 0.708, n_2=120

z* = 2.303

P-val = P(z>=2.303 or z<=-2.303) = 0.021247

Since 0.021247 < alpha of 0.05, we reject the null hypothesis, because there is convincing statistical evidence that the proportion of customers who would answer “yes” to the survey question is different for new vs used vehicle sales.

Boys, it’s too bad that we probably won’t get a “straight-up” hypothesis test like this on the exam, because you are READY for them. Strong execution from top to bottom, presented clearly. One thing that you’re going to facepalm about: you defined the parameters p1 and p2 as the exact same thing. “p2” should say “used.”

test type- two sample proportional difference hypothesis z-test
formula

1. introduction-
H0= p1-p2=0
Ha=p1-p2 does not = 0

p1=the proportion of all customers from a list of 2018 vehicle sales who bought a new vehicle and would recommend our company to a friend
p2=the proportion of all customers from a list of 2018 vehicle sales who bought a used vehicle and would recommend our company to a friend

2. conditions-
Since we are dealing with a two-sample proportional difference hypothesis test, we will have to pool/combine our proportions.
pc=88+85/105+120=0.75
qc=1-0.75=0.25
*we have to check for the independence of our pooled data --> .75(225)=168.75 >=10 & .25(225)=56.25 >= 10

conditions for new cars:

• simple random sample: stated in the problem- “the company randomly selects 105 customers…”
• independence: 10(105)=1050 Assume that the population of new cars purchased in 2018 is greater than 1050.
• normal: .84(105)=(88 >= 10), .162(105)=(17 >= 10)
All of the conditions for new cars are met.

conditions for old cars:

• simple random sample: stated in the problem- “the company randomly selects…120 customers…”
• independence: 10(120)=1200 Assume that the population of new cars purchased in 2018 is greater than 1200.
• normal: .708(120)=(85 >= 10), .292(120)=(35 >= 10)
All of the conditions for old cars are met.
3. solve-
z=.84-.708/(sqrt.(.25x.75)/105 + (.25x.75)/120) = 2.28 --> *I looked at table z to find the p-value of -2.28, p-value=.0129(2)=.0258

4. conclusion-
(.0258<.05 our significance level) --> Reject the H0 in favor of the Ha. We have significant evidence that the proportion of all customers from a list of 2018 vehicle sales who bought a new vehicle and said they would recommend the company is different than the proportion of all customers from a list of 2018 vehicle sales who bought a used vehicle and said that they would recommend our company to a friend.

QUESTION: In the Fiveable livestream about hypothesis tests for proportions, there was a similar example about a pharmaceutical company testing a new headache remedy and testing 2 treatments (old vs. new) given to 2 different samples of people.
For the null hypothesis, we did not do a difference of P_new and P_old (p1-p2) and instead we did P_new= P_old and for the alternative: P_new> P_Old.

For this question could the hypothesis have been Ho : p1=P2, Ha = p1 does not equal p2?
p1= Proportion of customers who would answer “yes” to the survey question
p2= Proportion of customers who would answer “no” to the survey question.

Thank you so much for your help.

p1= Proportion of customers who bought a new car and answers “yes” to the survey question.
p2= Proportion of customers who bought a used car and answered “yes” to the survey question.

Ho= p1-p2=0 Ha= p1-p2 does not equal 0

We are interested in conducting a 2 sample z test for a difference in population proportions.

Conditions:
Random- A random sample of 105 customers who bought a new vehicle and 120 customers who bought a used vehicle is taken
Normal- Sample of new cars: np = 105 * 0.838= 88 is greater than or equal to 10.
n(1-p)= 105(0.162)= 17 is greater than or equal to 10.
Sample of used cars: np= 120* 0.708= 85 is greater than or equal to 10.
n(1-p)= 120(0.292)= 35 is greater than or equal to 10.

Calculator: 2-Prop Z Test {x1=88, n1=105, x2=85, n2=120, p1 does not equal p2} = p: 0.0212

Since the p-value of 0.0212 is less than our alpha level of 0.05, we have convincing statistical evidence to reject the null hypothesis. The proportion of customers who would answer “yes” to the survey question is different for new vs. used vehicle sales.

Hypotheses
H_o: p_1 = p_2
H_a: p_1 ≠ p_2
Where p_1 is the true proportion of customers who bought a new vehicle and answered “yes” to the survey question.
Where p_2 is the true proportion of customers who bought a used vehicle and answered “yes” to the survey question.

Assumptions
Independence:
-We have 2 independent random samples of customers from 2018 vehicle sales.
-Population of new vehicle customers is at least 1050 and the population of used vehicle customers is at least 1200.
Normality:
n_1 * p-hat_1 = 105 * 0.8381 = 88 ≥ 10
n_1 * (1-p-hat_1) = 105* (0.1619) = 16.9995 ≥ 10
n_2 * p-hat_2 = 120 * 0.7083 = 84.996 ≥ 10
n_2 * (1-p-hat_2) = 120 * 0.2917 = 35.004 ≥ 10
Since all 4 are greater than 10, the sampling distribution is approximately normal.

Calculations

p_hat_combined = (105(0.8381) + 120(0.7083)) / (105 + 120) = 0.7689
z = ((0.8381 - 0.7083) - 0) / sqrt((0.7689 * (1-0.7689) / 105) + (0.7689 * (1-0.7689) / 120)) = 2.3036
p-value = 2*normalcdf(2.3036, 1E99, 0, 1) = 0.0212
alpha = 0.05
p-value<alpha

Conclusion

Since the p-value<alpha, we reject the H_o. There is sufficient evidence to suggest that the proportion of customers who bought a new vehicle and answered “yes” to the survey question is different from the proportion of customers who bought a used vehicle and answered “yes” to the survey question.

This is about as thorough a response as I’ve seen! Very well done - you’ve nailed all of the components. Be ready for just one or two of those components to be explicitly tested this year.
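For anyone who wants to double-check the numbers in a response like this without a graphing calculator, here is a short Python sketch of the pooled two-sample z test. It is only an illustration: the function name `two_prop_z_test` is made up for this example, and it assumes the two-sided alternative used in this problem.

```python
from math import sqrt
from statistics import NormalDist

def two_prop_z_test(x1, n1, x2, n2):
    """Pooled two-sample z test for H_0: p1 = p2 vs. H_a: p1 != p2."""
    p1, p2 = x1 / n1, x2 / n2
    # Pooled (combined) proportion, used because H_0 assumes p1 = p2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value: area in both tails beyond |z|
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

z, p_value = two_prop_z_test(88, 105, 85, 120)
print(round(z, 4), round(p_value, 4))
```

The result should agree with the calculator output quoted in this thread (z about 2.3036, p-value about 0.0212).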

Nice job! You’ve defined parameters, checked conditions, named the test, obtained appropriate test statistic and p-value, and made an appropriate conclusion. To address your question - we could do p1 = p2 and p1 =/= p2, but not for the parameters you defined. It would need to be p1 = proportion of customers purchasing new cars who would say yes; p2 = proportion of customers purchasing used cars who would say yes

Well done from top to bottom - you’ve got parameters, conditions, appropriate calculations, and appropriate conclusions. You’re ready! Note that it’s likely that you’ll be asked for these components in isolation this year, as opposed to “all at once” like this.

Thank you so much for your help. I had a quick question about the conditions/assumptions of a 2 sample proportion procedure. Is it necessary that we show independence for pooled data? For example, do we have to show that np_hat_c >= 10 and n(1-p_hat_c) >= 10? If so, why do we need to show this?

It sounds like you’re asking about the “normal” condition (at least 10 successes/failures) and not the “independence” condition (sample is no more than 10% of entire population if selection is done without replacement). If that’s the case, you do not need to check the pooled data - you can do it as you did and check the individual samples.
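To keep the two conditions straight, here is a small Python sketch that checks each one separately for the individual samples. The helper names are made up for illustration, and the population sizes passed to the 10% check are hypothetical placeholders, since the problem never states how many new or used vehicles were sold in 2018.

```python
def large_counts_ok(successes, n, minimum=10):
    """'Normal' condition: at least 10 successes and 10 failures."""
    failures = n - successes
    return successes >= minimum and failures >= minimum

def ten_percent_ok(n, population_size):
    """'Independence' condition: sample is at most 10% of the population."""
    return 10 * n <= population_size

# Normal condition, checked on each sample separately
print(large_counts_ok(88, 105))  # new: 88 yes, 17 no
print(large_counts_ok(85, 120))  # used: 85 yes, 35 no

# 10% condition; 5000 and 8000 are ASSUMED population sizes, not given in the problem
print(ten_percent_ok(105, 5000))
print(ten_percent_ok(120, 8000))
```

On the FRQ itself you would still write the assumption in words (for example, "it is reasonable to assume at least 1050 new-vehicle customers"), since the true population sizes are not given.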

Identify)

• pn: proportion of customers who bought a new vehicle who would answer “yes” to the survey question.
• pu: proportion of customers who bought a used vehicle who would answer “yes” to the survey question.
• pn-pu: Difference in the proportion of customers who bought a new and used vehicle who would answer “yes” to the survey question.
• Ho: pn=pu
• Ha: pn≠pu
• 2 proportion Z Test

Conditions)

• Random: We were told that both samples were randomly selected.
• Normal: The sampling distribution is approximately normal because both…
nn×pn>=10 (105×88/105>10) & nn×(1-pn)>=10 (105×17/105>10)
nu×pu>=10 (120×85/120>10) & nu×(1-pu)>=10 (120×35/120>10)
• 10% Condition: There are more than 10×105 new car buyers and 10×120 used car buyers.
• The two samples are Independent

Calculations)

zTest_2Prop(xn=88,nn=105,xu=85,nu=120) p1≠p2=2.30356
z=2.30356
p-value=0.021247

My teacher told me that for this year’s exam the actual formulas don’t need to be written and that the calculator function is enough. Have you heard otherwise?

Interpret)

Because the p-value(0.021247) is below a reasonable alpha(0.05), we reject the null hypothesis(Ho). There is sufficient evidence to conclude that the proportion of customers who would answer “yes” to the survey question is different for new vs used vehicle sales.

Nice work! To answer your question: you actually never need to show the formulas on hypothesis test or confidence interval FRQs… assuming you’ve correctly named the interval (which you have here). This year, because the test is being designed to accommodate people without a calculator, it is assumed that you will not need to run a full hypothesis test like this one. It is likely that it will be “broken up” into parts: perhaps you’ll need to write hypotheses and check conditions in one part; perhaps you’ll be given a p-value in another and asked to make a conclusion.

Your work is all correct, by the way.

2 sample z test for a difference in proportions
Parameters:
p_new = the true proportion of buyers of a new vehicle from the car company who would answer yes to the question in the survey

p_old = the true proportion of buyers of a used vehicle from the car company who would answer yes to the question in the survey
Hypotheses:
Ho: p_new = p_old
Ha: p_new is not equal to p_old
Conditions:

1. Random: Random samples of 105 and 120 customers who bought new and used cars, respectively, from the company stated CHECK
2. Success/ Failure: Because the number of success and failures are greater than 10 - CHECK:
n_new*p-hat_new = 88 >10 and n_new*q-hat_new = 17>10 and n_used*p-hat_used = 85 >10 and n_used * q-hat_used = 35>10
3. 10% condition (10% * pop > n OR 10*n <population size): It is reasonable to assume that there are more than 105 *10 = 1,050 and 120 * 10 = 1,200 customers who bought cars from the company in 2018 that were new and used, respectively. CHECK
4. Independence: It is reasonable to assume that those customers who buy new and used cars from the company in 2018 are independent of each other. CHECK
Mechanics:
p-hat_new = 0.838095 p-hat_used = 0.708333 p-hat_pooled = 0.768889 n_1 =105 n_2 = 120
z = 2.30356
P-Value( z>2.30356) = 0.021247
Conclusion:
We reject the null hypothesis, Ho, in favor of the alternative because the p value of 0.021247 is less than alpha = 0.05. We have sufficient evidence that the true proportion of customers who would answer yes to the survey question stated that bought new and used cars from the car company in 2018 are different.

Quick question: How do I know which test to use? There are so many options, like 2-sample z test, t test etc. I’m very confused.

Nice work! You’ve defined parameters, chosen the test, checked conditions, made appropriate conclusions… the whole nine yards. Be prepared to do some of these steps in isolation on Friday.

Hi Denise -

That’s a good question! There are 5 procedures you should be familiar with:

1. One-sample z-test for a population proportion

2. Two-sample z-test for a difference in population proportions

3. One-sample t-test for a population mean

4. Two-sample t-test for a difference in population means

5. Matched-pairs t-test for a difference in population means

Options #1 and #2 are only if you have a categorical variable, and thus are using proportions. Options #3-5 are for quantitative variables, and thus means. For a larger breakdown (and some practice questions!), I’d check out this previous stream where I review in-depth the different options and when to use them:

https://app.fiveable.me/ap-stats/unit-7/review-inference-procedures/watch/GaUXKaMPLXkI8jTrCtbM
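As a rough study aid, the breakdown of the five procedures can be sketched as a tiny decision function in Python. This is only an illustration of the categorical-vs-quantitative rule described above; the function name and its arguments are invented for this example, and it ignores details such as checking conditions.

```python
def choose_procedure(variable_type, n_samples, paired=False):
    """Pick one of the five procedures from the variable type and design."""
    if variable_type == "categorical":
        # Categorical data -> proportions -> z procedures
        if n_samples == 1:
            return "One-sample z-test for a population proportion"
        return "Two-sample z-test for a difference in population proportions"
    if variable_type == "quantitative":
        # Quantitative data -> means -> t procedures
        if n_samples == 1:
            return "One-sample t-test for a population mean"
        if paired:
            return "Matched-pairs t-test for a difference in population means"
        return "Two-sample t-test for a difference in population means"
    raise ValueError("variable_type must be 'categorical' or 'quantitative'")

# The car survey: a yes/no (categorical) answer from two independent samples
print(choose_procedure("categorical", 2))
```

For the car survey in this thread, the yes/no answer is categorical and there are two independent samples, so the sketch lands on the two-sample z-test for a difference in proportions.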
