Kevin Zembower
2025-Mar-29 16:59 UTC
[R] Setting up hypothesis tests with the infer library?
Hi, Rui and Michael, thank you both for replying.

Yeah, I'm not supposed to know about Chi-squared yet. So far, all of our work with hypothesis tests has involved creating the sample data, then resampling it to create a null distribution, and finally computing p-values.

The prop.test() would work, obviously. I'll look into that; I didn't know about it.

I'm really struck by the changes in statistics teaching methods compared to my first exposure to statistics more than 20 years ago. I can't remember ever doing a simulation then, probably due to the lack of computing resources. It was all calculations based on the Normal curve. Now, we haven't even been introduced to any calculations involving the Normal curve, and won't be for another two chapters; it's the last chapter we study in this one-semester course. In this course, it's all been simulations, bootstraps, randomization distributions, etc.

Thank you, again, Rui and Michael, for your help.

-Kevin

On Sat, 2025-03-29 at 16:42 +0000, Rui Barradas wrote:
> Às 16:09 de 29/03/2025, Kevin Zembower via R-help escreveu:
> > Hello, all,
> >
> > We're now starting to cover hypothesis tests in my Stats 101 course. As
> > usual in courses using the Lock5 textbook, 3rd ed., the homework
> > answers are calculated using their StatKey application. In addition
> > (and for no extra credit), I'm trying to solve the problems using R. In
> > the case of hypothesis tests, in addition to manually setting up
> > randomized null hypothesis distributions and graphing them, I'm using
> > the infer library. I've been really impressed with this library and
> > enjoy solving this type of problem with it.
> >
> > One of the first steps in solving a hypothesis test with infer is to
> > set up the initial sampling dataset. Often, in Lock5 problems, this is
> > a dataset that can be downloaded with library(Lock5Data). However,
> > other problems are worded like this:
> >
> > ==========================
> > In 1980 and again in 2010, a Gallup poll asked a random sample of 1000
> > US citizens "Are you in favor of the death penalty for a person
> > convicted of murder?" In 1980, the proportion saying yes was 0.66. In
> > 2010, it was 0.64. Does this data provide evidence that the proportion
> > of US citizens favoring the death penalty was higher in 1980 than it
> > was in 2010? Use p1 for the proportion in 1980 and p2 for the
> > proportion in 2010.
> > ==========================
> >
> > I've been setting up problems like this with code similar to:
> > ==========================
> > df <- data.frame(
> >     survey = c(rep("1980", 1000), rep("2010", 1000)),
> >     DP = c(rep("Y", 0.66*1000), rep("N", 1000 - (0.66*1000)),
> >            rep("Y", 0.64*1000), rep("N", 1000 - (0.64*1000))))
> >
> > (d_hat <- df %>%
> >      specify(response = DP, explanatory = survey, success = "Y") %>%
> >      calculate(stat = "diff in props", order = c("1980", "2010")))
> > ==========================
> >
> > My question is, is this the way I should be setting up datasets for
> > problems of this type? Is there a more efficient way, that doesn't
> > require the construction of the whole sample dataset?
> >
> > It seems like I should be able to do something like this:
> > ================
> > (df <- data.frame(group1count = 660,      #Or, group1prop = 0.66
> >                   group1samplesize = 1000,
> >                   group2count = 640,      #Or, group2prop = 0.64
> >                   group2samplesize = 1000))
> > ================
> >
> > Am I overlooking a way to set up these sample dataframes for infer?
> >
> > Thanks for your advice and guidance.
> >
> > -Kevin
> >
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > https://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> Hello,
>
> base R is perfectly capable of solving the problem. Something like this:
>
> year <- c(1980, 2010)
> p <- c(0.66, 0.64)
> n <- c(1000, 1000)
> df1 <- data.frame(year, p, n)
> df1$yes <- with(df1, p*n)
> df1$no <- with(df1, n - yes)
>
> mat <- as.matrix(df1[c("yes", "no")])
>
> prop.test(mat)
> #>
> #>  2-sample test for equality of proportions with continuity correction
> #>
> #> data:  mat
> #> X-squared = 0.79341, df = 1, p-value = 0.3731
> #> alternative hypothesis: two.sided
> #> 95 percent confidence interval:
> #>  -0.02279827  0.06279827
> #> sample estimates:
> #> prop 1 prop 2
> #>   0.66   0.64
>
> chisq.test(mat)
> #>
> #>  Pearson's Chi-squared test with Yates' continuity correction
> #>
> #> data:  mat
> #> X-squared = 0.79341, df = 1, p-value = 0.3731
>
> Hope this helps,
>
> Rui Barradas
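[On Kevin's question about skipping construction of the full sample data frame: infer's specify()/calculate() pipeline does expect one row per observation, but the expansion can at least be done mechanically from a small count table with rep(), rather than writing out each rep() call by hand. A minimal base-R sketch, using the counts from the Gallup problem quoted above (660/1000 and 640/1000); tidyr::uncount() does the same expansion in the tidyverse.]

```r
# Small count table: one row per (survey, answer) combination.
counts <- data.frame(
  survey = c("1980", "1980", "2010", "2010"),
  DP     = c("Y",    "N",    "Y",    "N"),
  n      = c(660,    340,    640,    360)
)

# Expand the count table into 2000 individual rows, one per respondent,
# by repeating each row of `counts` n times.
df <- counts[rep(seq_len(nrow(counts)), counts$n), c("survey", "DP")]
rownames(df) <- NULL

# Sanity check: cross-tabulation recovers the original counts.
table(df$survey, df$DP)
```

The resulting df can be piped into specify()/calculate() exactly as in Kevin's code above.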
Ebert, Timothy Aaron
2025-Mar-29 21:45 UTC
[R] Setting up hypothesis tests with the infer library?
The computer-intensive approaches (randomization, permutation, bootstrap, jackknife) are awesome when you have enough data. In this age we are all about huge data sets. Yet basic agricultural research often does not come close: I have three to ten replicates per treatment.

-----Original Message-----
From: R-help <r-help-bounces at r-project.org> On Behalf Of Kevin Zembower via R-help
Sent: Saturday, March 29, 2025 1:00 PM
To: Rui Barradas <ruipbarradas at sapo.pt>; R-help email list <r-help at r-project.org>
Subject: Re: [R] Setting up hypothesis tests with the infer library?

[External Email]

Hi, Rui and Michael, thank you both for replying.

Yeah, I'm not supposed to know about Chi-squared yet. So far, all of our work with hypothesis tests has involved creating the sample data, then resampling it to create a null distribution, and finally computing p-values.

The prop.test() would work, obviously. I'll look into that; I didn't know about it.

[remainder of quoted text, duplicated from the messages above, elided]
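[Tim's point about small samples is worth a concrete illustration: with only a handful of replicates per treatment, a randomization test does not need to sample relabelings at random; it can enumerate every possible relabeling and give an exact p-value. A minimal base-R sketch with made-up yields (these numbers are illustrative, not data from this thread):]

```r
# Exact permutation test for a difference in group means with very
# small samples (five made-up replicates per treatment).
treated <- c(12.1, 13.4, 11.8, 12.9, 13.0)
control <- c(11.2, 11.9, 12.3, 11.5, 12.0)

obs <- mean(treated) - mean(control)   # observed difference in means

y       <- c(treated, control)
n_treat <- length(treated)

# With 10 observations there are only choose(10, 5) = 252 distinct ways
# to relabel the groups, so enumerate them all instead of sampling.
idx <- combn(length(y), n_treat)
perm_stats <- apply(idx, 2, function(i) mean(y[i]) - mean(y[-i]))

# Two-sided exact p-value: fraction of relabelings at least as extreme.
p_value <- mean(abs(perm_stats) >= abs(obs))
p_value
```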
David Winsemius
2025-Mar-30 03:08 UTC
[R] Setting up hypothesis tests with the infer library?
> On Mar 29, 2025, at 9:59 AM, Kevin Zembower via R-help <r-help at r-project.org> wrote:
>
> Hi, Rui and Michael, thank you both for replying.
>
> Yeah, I'm not supposed to know about Chi-squared yet. So far, all of
> our work with hypothesis tests has involved creating the sample data,
> then resampling it to create a null distribution, and finally computing
> p-values.

You might want to look at the "resample" package. It's been around for a while. One of its authors is Tim Hesterberg, a well-respected member of the R community. I think it was written as support for the original edition of Chihara and Hesterberg, "Mathematical Statistics with Resampling and R" (3rd edition, 2022).

-- David

[remainder of quoted text, duplicated from the messages above, elided]
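[For readers who want the resampling flavor of this thread without any package at all, here is a minimal base-R bootstrap sketch for the Gallup-style problem quoted above. It illustrates the general percentile-bootstrap idea only; it is not the API of the resample package David mentions.]

```r
# Bootstrap percentile CI for p1 - p2, reconstructing individual 0/1
# responses from the summary counts (660 yes of 1000 in 1980, 640 yes
# of 1000 in 2010).
set.seed(42)
g1980 <- c(rep(1, 660), rep(0, 340))
g2010 <- c(rep(1, 640), rep(0, 360))

# Resample each group with replacement and recompute the difference
# in sample proportions 10,000 times.
boot_diffs <- replicate(10000, {
  mean(sample(g1980, replace = TRUE)) - mean(sample(g2010, replace = TRUE))
})

# Percentile 95% confidence interval for p1 - p2.
quantile(boot_diffs, c(0.025, 0.975))
```

The interval should land close to the (-0.023, 0.063) reported by prop.test() in Rui's message, since both are approximations to the same sampling distribution.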