Hi all,

I have a data frame of 200 columns and 2 rows. The first row in each column contains the frequency of cases in group I. The second row in each column contains the frequency of cases in group II. The number of trials is a fixed value for group I (e.g. 200) and another fixed value for group II (e.g. 100). The dataset looks like this:

> Mydata
                         variable I   variable II   Variable III   ......... 200
Freq. of cases (gp I)          6493          9375           5524
Freq. of cases (gp II)          509           462             54

The result I need for the first column can be obtained with this code:

MyResultsI <- prop.test(Mydata$`variable I`, c(200,100))

for the second column:

MyResultsII <- prop.test(Mydata$`variable II`, c(200,100))

and so on.

I need to do the analysis for all columns and have the p-values written in a third row, keeping only the columns with significant p-values, so the final output has to be something like this:

                         variable I   Variable III   .........
Freq. of cases (gp I)          6493           5524
Freq. of cases (gp II)          509             54
p-values                       0.02          0.010

Note, for example, that the 2nd column has been removed because it gave a non-significant p-value, while columns 1 and 3 were kept since their p-values are less than 0.05.

I'm not sure how to get only the p-values without the other details. For the analysis itself, I believe it can be done with the apply() function, but it's not clear to me how to specify the 2nd argument (n = sample sizes) in prop.test:

MyResults <- apply(Mydata, 2, function(x) prop.test(Mydata, c(200,100)))

How can I modify the "n" argument to solve the issue of non-equivalent length between "x" and "n"? How can I modify this further to return only the significant p-values?

Any help would be very much appreciated.

Regards
Hi anonymous,

?prop.test states that it returns a list, and one of its elements is 'p.value'. Calling str() on the output of prop.test() reveals that too. So prop.test()$p.value or prop.test()["p.value"] should work.

Best regards,

ir. Thierry Onkelinx
Statisticus / Statistician

Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
thierry.onkelinx at inbo.be
Kliniekstraat 25, B-1070 Brussel
www.inbo.be

////////////////////////////////////////////////////////////
To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey
////////////////////////////////////////////////////////////

From 14 to 19 December 2017 we are moving from our Brussels office to the Herman Teirlinck building on the Thurn & Taxis site. From then on you are welcome at the new address: Havenlaan 88 bus 73, 1000 Brussel.

////////////////////////////////////////////////////////////

2017-11-24 12:09 GMT+01:00 Allaisone 1 <allaisone1 at hotmail.com>:
> I'm not sure how to get the p-values only without other details
> [...]

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
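To illustrate the reply above, here is a minimal sketch of pulling the p-value out of a single prop.test() result. The counts (60 of 200 and 15 of 100) are hypothetical and chosen only so that the number of cases does not exceed the number of trials; they are not taken from the original post.

```r
# Hypothetical counts: 60 cases in 200 trials (group I),
# 15 cases in 100 trials (group II).
cases  <- c(60, 15)
trials <- c(200, 100)

res <- prop.test(cases, trials)

# Inspect the components of the returned "htest" object
str(res, max.level = 1)

p      <- res$p.value      # `$` returns the bare number
p_list <- res["p.value"]   # `[` returns a one-element list
```

The `$` form is usually the more convenient one when the p-values are to be collected into a numeric vector, since `[` keeps the value wrapped in a list.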
Thank you for clarifying this point, but my main question was about how to modify my code to do the analysis correctly. The code I mentioned:

MyResults <- apply(Mydata, 2, function(x) prop.test(Mydata, c(200,100)))

results in this error:

'x' and 'n' must have the same length

in prop.test(x, n). How can I modify the "x" or "n" arguments so that the analysis gives me the desired output shown in my previous post?

________________________________
From: Thierry Onkelinx <thierry.onkelinx at inbo.be>
Sent: 24 November 2017 21:06:39
To: Allaisone 1
Cc: r-help at r-project.org
Subject: Re: [R] Multiple sets of proportion tests

Hi anonymous,

?prop.test states that it returns a list. And one of the elements is 'p.value'. [...] So prop.test()$p.value or prop.test()["p.value"] should work.
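One possible way to address the error in the follow-up, sketched under stated assumptions: the anonymous function in the apply() call passes the whole `Mydata` to prop.test() instead of the column `x`, so the length of the first argument is the number of columns rather than 2. Passing `x` makes the lengths agree. The data frame below is hypothetical (column names `var1` to `var3`, row names, and counts are invented for illustration), because prop.test() rejects counts larger than the number of trials, as the example counts in the original post would be for n = c(200, 100).

```r
# Hypothetical data: each column holds the case counts for
# group I (200 trials) and group II (100 trials).
Mydata <- data.frame(
  var1 = c(60, 15),
  var2 = c(50, 26),
  var3 = c(120, 30),
  row.names = c("cases_gp1", "cases_gp2")
)
trials <- c(200, 100)

# Pass the column `x` (length 2), not the whole data frame,
# and keep only the p-value of each test.
p_values <- apply(Mydata, 2, function(x) prop.test(x, trials)$p.value)

# Keep the columns with p < 0.05 and append the p-values as a third row.
keep   <- p_values < 0.05
result <- rbind(Mydata[, keep, drop = FALSE],
                p_value = p_values[keep])

print(result)
```

With these made-up counts, `var2` (proportions 0.25 and 0.26) is dropped and the other two columns are kept. The `drop = FALSE` keeps the result a data frame even when only one column is significant; the sketch does not handle the case where no column passes the threshold.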