---------- Forwarded message ----------
From: Dr. Venkatesh <drvenki at liv.ac.uk>
Date: Sun, May 9, 2010 at 4:55 AM
Subject: R apply() help -urgent
To: r-help at r-project.org

I have a file with 4873 rows of 1s and 0s, with the 26 letters (A-Z) as columns. The 27th column also has 1s and 0s but stands for a different variable (pLoss). Columns 1 and 2 are not significant, so let's ignore them for now.

Here is how the file looks:

Cat GL A B C D E F G H I J K L M N O P Q R S T U V W X Y Z pLoss
H   5  0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
E   5  0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
P   6  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1
P   5  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1
F   6  0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
E   4  0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
H   5  0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
J   4  0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
J   4  0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
E   5  0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
S   6  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1
..
..

The letters A-Z stand for different categories of protein families, and pLoss stands for their presence or absence in an animal.

I intend to do Fisher's test on 26 individual 2x2 tables, each constructed from one of these letters vs pLoss.

For example, here is what I did for letter A, and then B, and then C, and so on. (I have attached R-input.csv for your perusal)

> data1 <- read.table("R_input.csv", header = T)
> datatable <- table(data1$A, data1$pLoss) # create a new datatable2 or 3 with table(data1$B.. or (data1$C.. and so on
> datatable

       0    1
  0   31 4821
  1    0   21

Now I run Fisher's test on these datatables one by one for the 26 letters :(

fisher.test(datatable), ... fisher.test(datatable2)...

In this case the task is just for 26 columns, so I can do it manually.
But I would like to automate the extraction and Fisher's test for all the columns. I tried reading the tutorials and working through a few examples, but I can't come up with anything sensible.

How can I use apply() for this? Or is there another way, a loop maybe, to solve this?

Please help.

Thanks a million in advance,

Dr Venkatesh Patel
School of Biological Sciences
University of Liverpool
United Kingdom

--
There can be miracles when you believe
Though hope is frail, it's hard to kill
Who knows what miracles you can achieve when you believe
Somehow you will.. when you believe!!

On May 9, 2010, at 12:30 AM, Venkatesh Patel wrote:

> But I would like to do an automated extraction and fisher's test for
> all the columns.

# dfrm is the data frame read from the file. Columns 3:28 are A-Z;
# column 29 is pLoss itself, so it is left out.
# lapply() over the columns always returns a list, whereas apply()
# collapses the tables into a matrix whenever they all have the same size.
tbl.list <- lapply(dfrm[, 3:28], function(x) table(x, dfrm$pLoss))

# Test only tables that are at least 2 x 2 (a column that is all 0s
# gives a one-row table, which is skipped and returns NULL).
lapply(tbl.list, function(x) if (nrow(x) > 1 && ncol(x) > 1) fisher.test(x))

> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT

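A minimal sketch of the tabulate-then-test idea in the reply above, run on a small made-up data frame (the values and the `fam` helper are assumptions for illustration, not the poster's data). Column B is all zeros, so its table has a single row and the size check skips it, leaving NA in place of a p-value:

```r
# Toy stand-in for the real file (hypothetical values).
dfrm <- data.frame(
  A     = c(1, 0, 0, 1, 0, 1, 0, 0),
  B     = c(0, 0, 0, 0, 0, 0, 0, 0),  # never present: gives a 1 x 2 table
  C     = c(1, 1, 0, 0, 1, 0, 1, 0),
  pLoss = c(1, 0, 1, 1, 0, 1, 0, 0)
)

# Names of the family columns to test (hypothetical helper).
fam <- c("A", "B", "C")

# One contingency table per family column, each against pLoss.
tbl.list <- lapply(dfrm[fam], function(x) table(x, dfrm$pLoss))

# p-value per column; NA where the table is not at least 2 x 2.
p.values <- sapply(tbl.list, function(tb) {
  if (nrow(tb) > 1 && ncol(tb) > 1) fisher.test(tb)$p.value else NA_real_
})

print(p.values)
```

Returning NA rather than NULL keeps one entry per column, so the result stays a plain named numeric vector.
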
Hi,

Instead of accessing the data by name in the data.frame, you can also index it with "[[ ]]". For your case, the loop might look like

for (i in 1:26) {
  # column i + 2 skips the Cat and GL columns
  datatable <- table(data1[[i + 2]], data1$pLoss)
  # results are not auto-printed inside a loop, so print explicitly
  print(fisher.test(datatable))
}

Bests,
Ruihong

On 05/09/2010 06:30 AM, Venkatesh Patel wrote:
> How can I use apply() in this regard? or is there any other way, a loop may
> be? to solve this issue.

--
Ph.D. Student / Chair of Econometrics
Humboldt Universität zu Berlin
http://amor.cms.hu-berlin.de/~huangrui/

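The loop idea above can also be sketched so that it keeps each test result instead of only printing it. This is an illustration on made-up data: the values in `data1` and the `fam` helper are assumptions, and the columns are picked by name with "[[ ]]" rather than by position:

```r
# Toy stand-in for the real file (hypothetical values).
data1 <- data.frame(
  Cat   = c("A", "C", "A", "B", "C", "A", "B", "C"),
  GL    = c(5, 6, 4, 5, 6, 4, 5, 6),
  A     = c(1, 0, 1, 0, 0, 1, 0, 0),
  B     = c(0, 0, 0, 1, 0, 0, 1, 0),
  C     = c(0, 1, 0, 0, 1, 0, 0, 1),
  pLoss = c(1, 0, 1, 1, 0, 0, 1, 0)
)

# data1[["A"]] and data1$A pick out the same column.
stopifnot(identical(data1[["A"]], data1$A))

# Every column except the identifiers and the outcome (hypothetical helper).
fam <- setdiff(names(data1), c("Cat", "GL", "pLoss"))

# Pre-allocate a named list and fill it one column at a time.
results <- vector("list", length(fam))
names(results) <- fam
for (nm in fam) {
  results[[nm]] <- fisher.test(table(data1[[nm]], data1$pLoss))
}

# Each element is a full fisher.test result, e.g. the p-value for A:
print(results[["A"]]$p.value)
```

Looping over names rather than 1:26 means the code does not depend on where the columns sit in the file.
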
Venkatesh

Is this what you are looking for?

# Example data
df = data.frame(A=c(1,0,0,0,1), B=c(1,1,0,0,0), C=c(1,0,0,0,0), Val=c(1,0,1,0,1))

# Variation of code of David Winsemius
tbl = lapply(df[, 1:3], function(x) table(x, df$Val))
fet = lapply(tbl, function(x) fisher.test(x))

# Identify internal objects of fet
names(fet$A)

# get p values (column 1 of the row-bound results is p.value)
p.val = do.call(rbind, fet)[, 1]
p.val.df = data.frame(pval = matrix(unlist(p.val)))

# get conf int (column 2 is conf.int, a lower and an upper limit per test)
ci = do.call(rbind, fet)[, 2]
ci.df = data.frame(matrix(unlist(ci), byrow=TRUE, ncol=2))

# add back id of letter
id = data.frame(idletter = colnames(df[, -4]))

# results
res = data.frame(id, pvalue = p.val.df, confint = ci.df)

There is undoubtedly a more elegant way of getting the p-values and confidence intervals than my attempt.

HTH

Pete

--
View this message in context: http://r.789695.n4.nabble.com/Fwd-R-apply-help-urgent-tp2164281p2164867.html
Sent from the R help mailing list archive at Nabble.com.

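One possibly tidier way to collect the p-values and confidence limits, sketched on the same example data as the message above. The column names ci.lower and ci.upper are made up here; p.value and conf.int are the components that fisher.test() returns:

```r
# Same example data as in the message above.
df <- data.frame(A = c(1, 0, 0, 0, 1), B = c(1, 1, 0, 0, 0),
                 C = c(1, 0, 0, 0, 0), Val = c(1, 0, 1, 0, 1))

# Tabulate and test each of the first three columns against Val.
fet <- lapply(df[, 1:3], function(x) fisher.test(table(x, df$Val)))

# Pull the pieces out by name, one row per tested column.
res <- data.frame(
  idletter = names(fet),
  pvalue   = sapply(fet, function(r) r$p.value),
  ci.lower = sapply(fet, function(r) r$conf.int[1]),
  ci.upper = sapply(fet, function(r) r$conf.int[2]),
  row.names = NULL
)

print(res)
```

Extracting by component name avoids relying on the position of p.value and conf.int within the result object.
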
Set up a function that runs fisher.test on a 2x2 table, then pass it to apply() over the columns, as in the example below. The result is a list with names A to Z.

# set up a dummy data set with 100 rows
Cat <- LETTERS[sample(1:6, 100, replace = TRUE)]
GL <- sample(1:6, 100, replace = TRUE)
dat <- matrix(sample(c(0, 1), 100 * 27, replace = TRUE), nrow = 100)
colnames(dat) <- c(LETTERS[1:26], "pLoss")
data1 <- data.frame(Cat, GL, dat)

# define function for fisher.test
ff <- function(x, y) {
  fisher.test(table(x, y))
}

# apply function to columns A to Z
results <- apply(data1[, LETTERS[1:26]], 2, ff, y = data1[, "pLoss"])

# the results are in the form of a list with names A to Z
results$C

On 19:59, Venkatesh Patel wrote:
> But I would like to do an automated extraction and fisher's test for all the
> columns.