thr3ads.net - R help - [R] Fwd: R apply() help -urgent [May 2010]


Venkatesh Patel

2010-May-09 04:30 UTC

[R] Fwd: R apply() help -urgent

---------- Forwarded message ----------
From: Dr. Venkatesh <drvenki at liv.ac.uk>
Date: Sun, May 9, 2010 at 4:55 AM
Subject: R apply() help -urgent
To: r-help at r-project.org


I have a file with 4873 rows of 1s and 0s, with the 26 letters A-Z as
columns. A 27th 0/1 column (pLoss) stands for a different variable.
Columns 1 and 2 (Cat and GL) are not significant, so let's ignore them
for now.

here is how the file looks

Cat  GL  A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z  pLoss
H    5   0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1
E    5   0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1
P    6   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  1
P    5   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  1
F    6   0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1
E    4   0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1
H    5   0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1
J    4   0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1
J    4   0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1
E    5   0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1
S    6   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  1
..
..

The letters A-Z stand for different categories of protein families, and
pLoss stands for their presence or absence in an animal.

I intend to do Fisher's test for 26 individual 2X2 tables constructed from
each of these alphabets vs pLoss.

For example, here is what I did for letter A, then B, then C, and so
on. (I have attached R-input.csv for your perusal.)

> data1 <- read.table("R_input.csv", header = T)
> # create a new datatable2 or 3 with table(data1$B.. or table(data1$C.. and so on
> datatable <- table(data1$A, data1$pLoss)
> datatable
       0    1
  0   31 4821
  1    0   21

Then I run Fisher's test on these tables one by one for the 26
letters :(

fisher.test(datatable), ... fisher.test(datatable2)...

In this case there are just 26 columns, so I can do it manually.

But I would like to automate the extraction and Fisher's test for all
the columns.

I tried reading the tutorials and trying a few examples, but can't
really come up with anything sensible.

How can I use apply() in this regard? Or is there another way, a loop
maybe, to solve this?

Please help.

Thanks a million in advance,

Dr Venkatesh Patel
School of Biological Sciences
University of Liverpool
United Kingdom


-- 
? There can be miracles when you believe ?? Though hope is frail, it's hard
to kill ? Who knows what miracles you can achieve when you believe ??
Somehow you will.. when you believe!! ?




David Winsemius

2010-May-09 14:53 UTC

[R] Fwd: R apply() help -urgent

On May 9, 2010, at 12:30 AM, Venkatesh Patel wrote:
> But I would like to do an automated extraction and fisher's test for
> all the columns.
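  # dfrm is the data frame read from the file (data1 above). Columns 3:28
  # are A-Z; column 29 is pLoss itself. lapply() always returns a list,
  # whereas apply() can simplify equal-sized tables into a matrix.
  tbl.list <- lapply(dfrm[, 3:28], function(x) table(x, dfrm$pLoss))
  lapply(tbl.list, function(x) if (nrow(x) > 1 && ncol(x) > 1) fisher.test(x))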
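
A minimal sketch of the guarded lapply() idea, run on a small made-up
data frame (toy, with hypothetical columns A, B, C and pLoss) rather than
the poster's file. The guard skips any column that does not give at least
a 2x2 table, which here is C because it is all zeros.

```r
# Toy data (made up for illustration): three indicator columns and pLoss
toy <- data.frame(
  A     = c(1, 0, 0, 1, 0, 1),
  B     = c(0, 1, 0, 0, 1, 0),
  C     = c(0, 0, 0, 0, 0, 0),   # constant column: gives a 1x2 table
  pLoss = c(1, 1, 0, 1, 0, 1)
)

# One contingency table per indicator column, each against pLoss
tbl.list <- lapply(toy[, c("A", "B", "C")], function(x) table(x, toy$pLoss))

# Run Fisher's exact test only where the table has at least 2 rows and
# 2 columns; other columns get NULL
fet.list <- lapply(tbl.list, function(tb) {
  if (nrow(tb) > 1 && ncol(tb) > 1) fisher.test(tb) else NULL
})

names(fet.list)       # one entry per column, including the skipped one
is.null(fet.list$C)   # TRUE, because C never takes the value 1
```

Keeping the NULL entries means the result still lines up with the column
names, so skipped columns are easy to spot afterwards.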
David Winsemius, MD
West Hartford, CT

Ruihong Huang

2010-May-09 15:32 UTC

[R] Fwd: R apply() help -urgent

Hi,

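Instead of accessing the columns of the data.frame by name, you can
index them with "[[ ]]". For your case, the loop may look like

results <- vector("list", 26)
for (i in 1:26) {
    # columns 3 to 28 hold A-Z; build each 2x2 table against pLoss
    datatable <- table(data1[[i + 2]], data1$pLoss)
    # store the result: nothing is auto-printed inside a for loop
    results[[i]] <- fisher.test(datatable)
}
names(results) <- names(data1)[3:28]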

Bests,
Ruihong

-- 
Ph.D. Student / Chair of Econometrics
Humboldt Universität zu Berlin
http://amor.cms.hu-berlin.de/~huangrui/

Pete B

2010-May-10 01:43 UTC

[R] Fwd: R apply() help -urgent

Venkatesh

Is this what you are looking for?

# Example data
df=data.frame(A=c(1,0,0,0,1),B=c(1,1,0,0,0),C=c(1,0,0,0,0),Val=c(1,0,1,0,1))

# Variation of code of David Winsemius
tbl = lapply(df[, 1:3] , function(x) table(x, df$Val)) 
fet = lapply(tbl, function(x) fisher.test(x))

# Identify internal objects of fet
names(fet$A)

# get p values
p.val =  do.call(rbind,fet)[,1]
p.val.df=data.frame(pval=matrix(unlist(p.val)))

# get conf int
ci = do.call(rbind,fet)[,2]
ci.df=data.frame(matrix(unlist(ci),byrow=TRUE,ncol=2))

# add back id of letter
id = data.frame(idletter=colnames(df[,-4]))

# results
res = data.frame(id,pvalue=p.val.df,confint=ci.df)


There is undoubtedly a more elegant way of getting the p-values and
confidence intervals than my attempt.
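
One possibly tidier way to collect the results, sketched on the same
example data. It relies on fisher.test() returning, for 2x2 tables, a
list with components p.value and conf.int, so vapply() can pull them out
by name. The object names (fet, ci, res, ci.lower, ci.upper) are only
illustrative.

```r
# Same example data as above
df <- data.frame(A = c(1, 0, 0, 0, 1), B = c(1, 1, 0, 0, 0),
                 C = c(1, 0, 0, 0, 0), Val = c(1, 0, 1, 0, 1))

# Fisher's exact test for each of the first three columns against Val
fet <- lapply(df[, 1:3], function(x) fisher.test(table(x, df$Val)))

# Pull out the p-value (length 1) and confidence interval (length 2) by name
p.val <- vapply(fet, function(f) f$p.value, numeric(1))
ci    <- t(vapply(fet, function(f) as.numeric(f$conf.int), numeric(2)))

# One row per letter
res <- data.frame(idletter = names(fet), pvalue = p.val,
                  ci.lower = ci[, 1], ci.upper = ci[, 2],
                  row.names = NULL)
res
```

vapply() also checks that every result has the declared length, so a
column that did not produce a 2x2 test would fail loudly instead of
silently misaligning the output.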

HTH

Pete

Mike White

2010-May-11 08:04 UTC

[R] Fwd: R apply() help -urgent

Set up a function for the fisher.test on a 2x2 table and then include 
this in the apply function for columns as in the example below. The 
result is a list with names A to Z

# set up a dummy data set with 100 rows
Cat<-LETTERS[sample(1:6,100, replace=T)]
GL<-sample(1:6, 100, replace=T)
dat<-matrix(sample(c(0,1),100*27, replace=T), nrow=100)
colnames(dat)<-c(LETTERS[1:26],"pLoss")
data1<-data.frame(Cat, GL, dat)

# define function for fisher.test
ff<-function(x,y){
fisher.test(table(x,y))
}

# apply function to columns A to Z
results<-apply(data1[,LETTERS[1:26]],2, ff, y=data1[,"pLoss"])
# the results are in the form of a list with names A to Z
results$C
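
If only the p-values are needed, a variation along these lines may be
enough. It is a sketch on similar dummy data, with set.seed() added only
so the random data are reproducible; pvals is an illustrative name and
the 0.05 cut-off is just an example threshold.

```r
set.seed(1)  # only so the dummy data are reproducible
dat <- matrix(sample(c(0, 1), 100 * 27, replace = TRUE), nrow = 100)
colnames(dat) <- c(LETTERS[1:26], "pLoss")
data1 <- data.frame(dat)

# Named numeric vector of p-values, one per letter A to Z
pvals <- sapply(data1[LETTERS[1:26]],
                function(x) fisher.test(table(x, data1$pLoss))$p.value)

head(sort(pvals))             # smallest p-values first
names(pvals)[pvals < 0.05]    # letters below the example threshold
```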

