thr3ads.net - R help - [R] Setting up hypothesis tests with the infer library? [Mar 2025]

Kevin Zembower

2025-Mar-29 16:09 UTC

[R] Setting up hypothesis tests with the infer library?

Hello, all,

We're now starting to cover hypothesis tests in my Stats 101 course. As
usual in courses using the Lock5 textbook, 3rd ed., the homework
answers are calculated using their StatKey application. In addition
(and for no extra credit), I'm trying to solve the problems using R. In
the case of hypothesis tests, in addition to manually setting up
randomized null hypothesis distributions and graphing them, I'm using
the infer library. I've been really impressed with this library and
enjoy solving this type of problem with it.

One of the first steps in solving a hypothesis test with infer is to
set up the initial sampling dataset. Often, in Lock5 problems, this is
a dataset that can be downloaded with library(Lock5Data). However,
other problems are worded like this:

==========================
In 1980 and again in 2010, a Gallup poll asked a random sample of 1000
US citizens "Are you in favor of the death penalty for a person
convicted of murder?" In 1980, the proportion saying yes was 0.66. In
2010, it was 0.64. Does this data provide evidence that the proportion
of US citizens favoring the death penalty was higher in 1980 than it
was in 2010? Use p1 for the proportion in 1980 and p2 for the
proportion in 2010.
===========================
I've been setting up problems like this with code similar to:
==========================
library(infer)  # specify(), calculate(); infer also re-exports %>%

df <- data.frame(
    survey = c(rep("1980", 1000), rep("2010", 1000)),
    DP = c(rep("Y", 0.66*1000), rep("N", 1000 - (0.66*1000)),
           rep("Y", 0.64*1000), rep("N", 1000 - (0.64*1000))))

(d_hat <- df %>%
     specify(response = DP, explanatory = survey, success = "Y") %>%
     calculate(stat = "diff in props", order = c("1980", "2010")))
===========================
My question is, is this the way I should be setting up datasets for
problems of this type? Is there a more efficient way, that doesn't
require the construction of the whole sample dataset?

It seems like I should be able to do something like this:
================
(df <- data.frame(group1count = 660, #Or, group1prop = 0.66
                 group1samplesize = 1000,
                 group2count = 640, #Or, group2prop = 0.64
                 group2samplesize = 1000))
================
Am I overlooking a way to set up these sample dataframes for infer?

Thanks for your advice and guidance.

-Kevin
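
A minimal sketch of one way to get the row-level data frame that infer expects without writing out each rep() call by hand: keep the four cell counts in a small table and expand it by repeating row indices. The names counts and n are illustrative assumptions, and the counts 660/340 and 640/360 are taken from the stated proportions and sample sizes. The infer pipeline at the end runs only if the package is installed, and it is one possible permutation test of the one-sided question, not the only way to set it up.

```r
# Cell counts from the problem statement: 1000 respondents per survey.
counts <- data.frame(
  survey = c("1980", "1980", "2010", "2010"),
  DP     = c("Y", "N", "Y", "N"),
  n      = c(660, 340, 640, 360)
)

# Repeat each row index n times to recover one row per respondent.
df <- counts[rep(seq_len(nrow(counts)), times = counts$n), c("survey", "DP")]
rownames(df) <- NULL

# Check that the expansion reproduces the original counts.
table(df$survey, df$DP)

# Optional: permutation test of "proportion in 1980 > proportion in 2010".
if (requireNamespace("infer", quietly = TRUE)) {
  library(infer)

  # Observed difference in proportions (1980 minus 2010).
  d_hat <- df %>%
    specify(response = DP, explanatory = survey, success = "Y") %>%
    calculate(stat = "diff in props", order = c("1980", "2010"))

  # Null distribution obtained by shuffling the survey labels.
  null_dist <- df %>%
    specify(response = DP, explanatory = survey, success = "Y") %>%
    hypothesize(null = "independence") %>%
    generate(reps = 1000, type = "permute") %>%
    calculate(stat = "diff in props", order = c("1980", "2010"))

  print(get_p_value(null_dist, obs_stat = d_hat, direction = "greater"))
}
```

Working from integer counts also sidesteps a floating-point pitfall: rep() truncates a non-integer times argument, so a product such as 0.57*100 can silently yield one row too few.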

Michael Dewey

2025-Mar-29 16:34 UTC

Dear Kevin

Unless it is a course requirement that you do it this way, it would be
easier to use the chisq.test function. You can then just use the
frequencies which you have (660, 340, 640, 360). I will not give you
example code, since your learning would be enhanced by having to do it
yourself, but if you get stuck, come back with your code and what went wrong.

There are many other options in packages which you could install but 
what I suggest should work out of the box.

Michael


Rui Barradas

2025-Mar-29 16:42 UTC

Hello,

Base R is perfectly capable of solving the problem, with
something like this:


year <- c(1980, 2010)
p <- c(0.66, 0.64)
n <- c(1000, 1000)
df1 <- data.frame(year, p, n)
df1$yes <- with(df1, p*n)
df1$no <- with(df1, n - yes)

mat <- as.matrix(df1[c("yes", "no")])

prop.test(mat)
#>
#>  2-sample test for equality of proportions with continuity correction
#>
#> data:  mat
#> X-squared = 0.79341, df = 1, p-value = 0.3731
#> alternative hypothesis: two.sided
#> 95 percent confidence interval:
#>  -0.02279827  0.06279827
#> sample estimates:
#> prop 1 prop 2
#>   0.66   0.64

chisq.test(mat)
#>
#>  Pearson's Chi-squared test with Yates' continuity correction
#>
#> data:  mat
#> X-squared = 0.79341, df = 1, p-value = 0.3731


Hope this helps,

Rui Barradas
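
The problem asks a one-sided question (was the proportion higher in 1980?), whereas the output above is for a two-sided alternative. A minimal sketch, assuming the counts 660 and 640 out of 1000 each, that passes the counts straight to prop.test() and requests the one-sided alternative. Setting correct = FALSE is an assumption made here to turn off the continuity correction, so that the result corresponds to the usual pooled two-proportion z-test.

```r
# Number of "yes" answers and sample sizes, in the order 1980, 2010.
yes <- c(660, 640)
n   <- c(1000, 1000)

# alternative = "greater" tests whether the first proportion (1980)
# exceeds the second (2010).
res <- prop.test(x = yes, n = n, alternative = "greater", correct = FALSE)
res
res$p.value
```

Under these assumptions the one-sided p-value is roughly 0.17, so the two samples do not give strong evidence that support was higher in 1980.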



Ebert,Timothy Aaron

2025-Mar-29 19:19 UTC

How about calculating a 95% confidence interval around the estimated proportion
in favor for each year? The PooledInfRate package will do this for you. If the
confidence intervals overlap substantially, that suggests there is no
significant difference, although overlap alone is not a formal test.
