---------- Forwarded message ----------
From: Dr. Venkatesh <drvenki at liv.ac.uk>
Date: Sun, May 9, 2010 at 4:55 AM
Subject: R apply() help -urgent
To: r-help at r-project.org

I have a file with 4873 rows of 1s and 0s, with the 26 letters of the alphabet (A-Z) as columns. The 27th column also holds 1s and 0s but stands for a different variable (pLoss). Columns 1 and 2 are not significant, so let's ignore them for now.

Here is how the file looks:

Cat GL A B C D E F G H I J K L M N O P Q R S T U V W X Y Z pLoss
H   5  0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
E   5  0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
P   6  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1
P   5  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1
F   6  0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
E   4  0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
H   5  0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
J   4  0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
J   4  0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
E   5  0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
S   6  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1
..
..

The letters A-Z stand for different categories of protein families, and pLoss stands for their presence or absence in an animal.

I intend to do Fisher's test on 26 individual 2x2 tables, each constructed from one of these letters vs pLoss.

For example, here is what I did for letter A, then B, then C, and so on. (I have attached R-input.csv for your perusal.)

> data1 <- read.table("R_input.csv", header = T)
> datatable <- table(data1$A, data1$pLoss)  # create a new datatable2 or 3 with table(data1$B.. or table(data1$C.. and so on
> datatable

       0    1
  0   31 4821
  1    0   21

Now I run Fisher's test on these datatables one by one for the 26 letters :(

fisher.test(datatable), ... fisher.test(datatable2)...

In this case the task covers just 26 columns, so I can do it manually. But I would like to automate the extraction and Fisher's test for all the columns.

I tried reading the tutorials and trying a few examples, but can't really come up with anything sensible.

How can I use apply() in this regard? Or is there another way, a loop maybe, to solve this?

Please help.

Thanks a million in advance,

Dr Venkatesh Patel
School of Biological Sciences
University of Liverpool
United Kingdom

-- 
There can be miracles when you believe. Though hope is frail, it's hard to kill. Who knows what miracles you can achieve when you believe? Somehow you will.. when you believe!!
On May 9, 2010, at 12:30 AM, Venkatesh Patel wrote:

> [snip]
>
> in this case, the task is just for 26 columns.. so I can do it
> manually.
>
> But I would like to do an automated extraction and fisher's test for
> all the columns.

# dfrm is the data frame read from the file (data1 in the question).
# Columns 3:28 are A-Z; column 29 is pLoss itself, so it is left out.
# lapply() always returns a list, whereas apply() would collapse
# equal-sized tables into a matrix.
tbl.list <- lapply(dfrm[, 3:28], function(x) table(x, dfrm$pLoss))
# Skip letters whose table is not at least 2x2 (all 0s or all 1s).
lapply(tbl.list, function(x) if (nrow(x) > 1 && ncol(x) > 1) fisher.test(x))

> I tried reading the tutorials and trying a few examples. Cant really
> come up with anything sensible.
>
> How can I use apply() in this regard? or is there any other way, a
> loop may be? to solve this issue.
>
> [snip]
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT
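
The nrow()/ncol() guard above skips any letter whose table is not 2x2, which happens when a column is all 0s or all 1s. One possible alternative, sketched here on simulated data, is to fix the factor levels so that table() always returns a 2x2 table. The data values and the helper name two_by_two are made up for illustration; only table(), factor(), lapply() and fisher.test() are standard R.

```r
# Simulated stand-in for the poster's data (hypothetical values, not the real file)
set.seed(1)
dfrm <- data.frame(
  A     = rbinom(200, 1, 0.1),
  B     = rep(0, 200),             # degenerate column: the letter never occurs
  pLoss = rbinom(200, 1, 0.8)
)

# Fixing the factor levels makes table() keep both rows and both columns
# even when a value never occurs, so the result is always 2x2.
two_by_two <- function(x, y) {
  table(factor(x, levels = 0:1), factor(y, levels = 0:1))
}

# Every column except the outcome is treated as a letter column
letter_cols <- setdiff(names(dfrm), "pLoss")
tbl.list <- lapply(dfrm[letter_cols], two_by_two, y = dfrm$pLoss)
fet.list <- lapply(tbl.list, fisher.test)

tbl.list$B           # still 2x2, with an all-zero second row
fet.list$B$p.value
```

With this approach every letter gets a result, so the output list lines up one-to-one with the columns; a letter that never occurs simply yields an uninformative test rather than a missing entry.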
Hi,
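Instead of accessing the columns by name in the data.frame, you can also
index them by position with "[[ ]]". In your case, the loop may look like this:
results <- vector("list", 26)
for (i in 1:26) {
    # columns 1 and 2 are Cat and GL, so letter i sits in column i + 2
    datatable <- table(data1[[i + 2]], data1$pLoss)
    # store the result: inside a for loop a bare fisher.test() call is not printed
    results[[i]] <- fisher.test(datatable)
}
names(results) <- LETTERS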
Bests,
Ruihong
-- 
Ph.D. Student / Chair of Econometrics
Humboldt Universität zu Berlin
http://amor.cms.hu-berlin.de/~huangrui/
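
The loop above relies on the letters sitting in columns 3 to 28. A small variation, sketched below on a made-up data frame with only two letter columns, loops over column names instead, so it does not depend on column positions. The data values and the names letters_to_test and pvals are assumptions for illustration only.

```r
# Made-up data with the same layout as the question: Cat, GL, letter columns, pLoss
data1 <- data.frame(
  Cat   = c("A", "B", "A", "B", "A", "B", "A", "B"),
  GL    = c(5, 5, 6, 4, 5, 6, 4, 5),
  A     = c(1, 0, 1, 0, 1, 0, 1, 0),
  B     = c(0, 1, 0, 1, 0, 1, 0, 1),
  pLoss = c(1, 1, 1, 0, 1, 0, 0, 1)
)

# Everything except the bookkeeping columns and the outcome is a letter column
letters_to_test <- setdiff(names(data1), c("Cat", "GL", "pLoss"))

# Collect one p-value per letter in a named numeric vector
pvals <- numeric(0)
for (nm in letters_to_test) {
  datatable <- table(data1[[nm]], data1$pLoss)
  pvals[nm] <- fisher.test(datatable)$p.value
}
print(pvals)
```

Indexing by name with data1[[nm]] means the same loop keeps working if columns are added, dropped, or reordered in the input file.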
Venkatesh

Is this what you are looking for?

# Example data
df=data.frame(A=c(1,0,0,0,1),B=c(1,1,0,0,0),C=c(1,0,0,0,0),Val=c(1,0,1,0,1))

# Variation of code of David Winsemius
tbl = lapply(df[, 1:3] , function(x) table(x, df$Val))
fet = lapply(tbl, function(x) fisher.test(x))

# Identify internal objects of fet
names(fet$A)

# get p values
p.val = do.call(rbind,fet)[,1]
p.val.df=data.frame(pval=matrix(unlist(p.val)))

# get conf int
ci = do.call(rbind,fet)[,2]
ci.df=data.frame(matrix(unlist(ci),byrow=TRUE,ncol=2))

# add back id of letter
id = data.frame(idletter=colnames(df[,-4]))

# results
res = data.frame(id,pvalue=p.val.df,confint=ci.df)

There is undoubtedly a more elegant way of getting the p-values and conf int than my attempt.

HTH

Pete
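
One possibly tidier way to collect the p-values and confidence intervals, offered as a sketch rather than the definitive approach, is to pull each component out of the fisher.test() results with vapply(). It reuses the toy data from the message above; the column names ci.lower and ci.upper are chosen here for illustration.

```r
# Same toy data as in the message above
df <- data.frame(A = c(1, 0, 0, 0, 1), B = c(1, 1, 0, 0, 0),
                 C = c(1, 0, 0, 0, 0), Val = c(1, 0, 1, 0, 1))

# One fisher.test() result (an "htest" object) per letter column
fet <- lapply(df[, 1:3], function(x) fisher.test(table(x, df$Val)))

# vapply() extracts one number from every result and checks that it
# really is a single numeric value
res <- data.frame(
  idletter = names(fet),
  pvalue   = vapply(fet, function(f) f$p.value, numeric(1)),
  ci.lower = vapply(fet, function(f) f$conf.int[1], numeric(1)),
  ci.upper = vapply(fet, function(f) f$conf.int[2], numeric(1))
)
res
```

Compared with do.call(rbind, fet), this avoids the intermediate list-matrix and the unlist()/matrix() reshaping, and the result is an ordinary data frame with one row per letter.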
Set up a function that runs fisher.test on a 2x2 table, then pass it to
apply() over the columns, as in the example below. The result is a list
with names A to Z.
# set up a dummy data set with 100 rows
Cat <- LETTERS[sample(1:6, 100, replace = TRUE)]
GL <- sample(1:6, 100, replace = TRUE)
dat <- matrix(sample(c(0, 1), 100 * 27, replace = TRUE), nrow = 100)
colnames(dat) <- c(LETTERS[1:26], "pLoss")
data1 <- data.frame(Cat, GL, dat)
# define function for fisher.test
ff <- function(x, y) {
    fisher.test(table(x, y))
}
# apply function to columns A to Z
results <- apply(data1[, LETTERS[1:26]], 2, ff, y = data1[, "pLoss"])
# the results are in the form of a list with names A to Z
results$C