thr3ads.net - R help - [R] Multiple sets of proportion tests [Nov 2017]

Allaisone 1

2017-Nov-24 11:09 UTC

[R] Multiple sets of proportion tests

Hi all,

I have a data frame with 200 columns and 2 rows. The first row in each column
contains the frequency of cases in group I. The second row in each column
contains the frequency of cases in group II. The number of trials is a fixed
value for group I (e.g. 200) and another fixed value for group II
(e.g. 100). The dataset looks like this:

> Mydata

                          variable I    variable II    Variable III   ......... 200
Freq. of cases (gp I)     6493          9375           5524
Freq. of cases (gp II)    509           462            54



The result I need for the first column can be obtained with this code:

MyResultsI <- prop.test(Mydata$`variable I`, c(200, 100))

and for the second column:

MyResultsII <- prop.test(Mydata$`variable II`, c(200, 100))

and so on.


I need to run the analysis for all columns and keep only the columns with a
significant p-value, with that p-value written in a third row under each column,
so the final output should look something like this:


                          variable I    Variable III   .........
Freq. of cases (gp I)     6493          5524
Freq. of cases (gp II)    509           54
p-values                  0.02          0.010

Note, for example, that the 2nd column has been removed because it gave a
non-significant p-value, while columns 1 and 3 were kept because their p-values
are less than 0.05.

I'm not sure how to get only the p-values without the other details. As for the
analysis itself, I believe it can be done with the apply() function, but it's not
clear to me how to specify the 2nd argument (n = sample sizes) of prop.test().

MyResults <- apply(Mydata, 2, function(x) prop.test(Mydata, c(200, 100)))

How can I modify the "n" argument to solve the problem of "x" and "n" having
different lengths? How can I modify this further to return only the significant
p-values? Any help would be much appreciated.

Regards

Thierry Onkelinx

2017-Nov-24 21:06 UTC

[R] Multiple sets of proportion tests

Hi anonymous,

?prop.test states that it returns a list, and one of its elements is
'p.value'. Running str() on the output of prop.test() shows this too. So
prop.test()$p.value or prop.test()["p.value"] should work.
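
A minimal sketch of the extraction described above. One assumption to flag: the counts below are hypothetical stand-ins, because the counts quoted in the question (e.g. 6493 and 509) exceed the stated numbers of trials (200 and 100), and prop.test() requires each count to be no larger than its number of trials. The names `cases`, `trials` and `res` are illustrative only.

```r
# Hypothetical counts of cases for one variable (group I, group II).
cases  <- c(60, 12)
# Numbers of trials per group, as given in the question.
trials <- c(200, 100)

res <- prop.test(cases, trials)

str(res, max.level = 1)  # lists the components of the result, including p.value
res$p.value              # the p-value as a single number
res["p.value"]           # a one-element list holding the same number
```

For filtering on significance, the `$p.value` form (or `res[["p.value"]]`) is probably the more convenient one, since it returns a plain number rather than a list.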

Best regards,

ir. Thierry Onkelinx
Statisticus / Statistician

Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE
AND FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
thierry.onkelinx at inbo.be
Kliniekstraat 25, B-1070 Brussel
www.inbo.be

///////////////////////////////////////////////////////////////////////////////////////////
To call in the statistician after the experiment is done may be no
more than asking him to perform a post-mortem examination: he may be
able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does
not ensure that a reasonable answer can be extracted from a given body
of data. ~ John Tukey
///////////////////////////////////////////////////////////////////////////////////////////


From 14 to 19 December 2017 we are moving from our office in
Brussels to the Herman Teirlinck building on the Thurn & Taxis site.
From then on you are welcome at the new address: Havenlaan 88 bus 73, 1000 Brussel.

///////////////////////////////////////////////////////////////////////////////////////////



Allaisone 1

2017-Nov-24 23:35 UTC

[R] Multiple sets of proportion tests

Thank you for clarifying this point, but my main question was how to modify
my code so that the analysis runs correctly. The code I mentioned:

MyResults <- apply(Mydata, 2, function(x) prop.test(Mydata, c(200, 100)))



results in this error: 'x' and 'n' must have the same length (raised by
prop.test(x, n)).

How can I modify the "x" or "n" arguments so that the analysis gives me the
desired output shown in my previous post?
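
One possible fix, sketched under assumptions rather than taken from a reply in this thread: inside the anonymous function, pass the column `x` to prop.test() instead of the whole `Mydata`, so that the counts (length 2) and the trial totals (length 2) match. The counts below are hypothetical stand-ins, because the counts quoted in the post exceed the stated numbers of trials (200 and 100), which prop.test() does not accept. The names `trials`, `pvals`, `keep` and `Result` are illustrative.

```r
# Hypothetical data: 2 rows (group I, group II), one column per variable.
Mydata <- data.frame(
  `variable I`   = c(60, 12),
  `variable II`  = c(80, 41),
  `variable III` = c(150, 40),
  check.names = FALSE
)
rownames(Mydata) <- c("Freq. of cases (gp I)", "Freq. of cases (gp II)")

# Numbers of trials for group I and group II, as given in the question.
trials <- c(200, 100)

# Pass the column x (length 2), not Mydata, so that x and n have equal length.
# sapply() over a data frame iterates over its columns.
pvals <- sapply(Mydata, function(x) prop.test(x, trials)$p.value)

# Keep only the columns with p < 0.05 and append their p-values as a third row.
keep   <- pvals < 0.05
Result <- rbind(Mydata[, keep, drop = FALSE], pvals[keep])
rownames(Result)[3] <- "p-values"
Result
```

With the real data, only the `Mydata` definition would change; `drop = FALSE` keeps the result a data frame even if a single column passes the threshold.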
