thr3ads.net - R help - [R] Z score [Oct 2012]


Vedant Sharma

2012-Oct-24 06:17 UTC

[R] Z score

Hi,

I need to find the z-scores of the data in a spreadsheet. The values
need to be calculated for each gene across the samples (see the
example). It should be a simple thing, but I am unable to get it right
at the moment!

The structure of the spreadsheet is shown in this example:

# Example:

MyFile <- read.csv(
text="Names,'Sample_1','Sample_2','Sample_3'
Gene_1,87,77,88
Gene_2,98,22,34
Gene_3,33,43,33
Gene_4,78,,81
", header=TRUE, row.names=1, as.is=TRUE, quote="'",
na.strings="" )

I think the formula that can be used for the z-score is:

(x-mean(x))/sd(x)

The apply() function over rows should work, but the bottom line is that I am
unable to do it correctly.

Could you show me how, using apply() or some alternative function?

Thank you.

Cheers,
Ved


Rui Barradas

2012-Oct-24 10:17 UTC


[R] Z score

Hello,

Try the following.

apply(MyFile, 1, scale)

Hope this helps,

Rui Barradas
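
A note on the call above: apply() over rows returns one column per gene, so the result is transposed relative to MyFile, and scale() drops the sample names. The following is only a sketch, assuming the MyFile data frame from the original post (rebuilt here so the snippet stands alone); the name z is introduced for illustration.

```r
# Rebuild the example data from the original post.
MyFile <- read.csv(text = "Names,'Sample_1','Sample_2','Sample_3'
Gene_1,87,77,88
Gene_2,98,22,34
Gene_3,33,43,33
Gene_4,78,,81
", header = TRUE, row.names = 1, as.is = TRUE, quote = "'", na.strings = "")

# scale() centres each row on its mean and divides by its standard
# deviation, ignoring NA values. apply() returns the genes as columns,
# so transpose to get back to a gene-by-sample layout.
z <- t(apply(MyFile, 1, scale))
colnames(z) <- colnames(MyFile)  # restore the sample names
z
```

The missing value in Gene_4 stays NA, and the remaining two values in that row are standardised against each other.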
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

arun

2012-Oct-24 10:36 UTC


[R] Z score

Hi,
Try this:
res<-do.call(rbind,lapply(lapply(apply(MyFile,1,function(x)
x[!is.na(x)]),function(x) (x-mean(x))/sd(x)),function(x)
x[c("Sample_1","Sample_2","Sample_3")]))
res
#         Sample_1   Sample_2   Sample_3
#Gene_1  0.4931970 -1.1507929  0.6575959
#Gene_2  1.1421818 -0.7179429 -0.4242390
#Gene_3 -0.5773503  1.1547005 -0.5773503
#Gene_4 -0.7071068         NA  0.7071068
A.K.






arun

2012-Oct-24 10:53 UTC


[R] Z score

Hi,

With more sample columns, you could also use this, which takes the column
names from MyFile instead of listing them by hand:
res2<-t(sapply(lapply(apply(MyFile,1,function(x) x[!is.na(x)]),function(x)
(x-mean(x))/sd(x)),function(x) x[colnames(MyFile)] ))
res2
#         Sample_1   Sample_2   Sample_3
#Gene_1  0.4931970 -1.1507929  0.6575959
#Gene_2  1.1421818 -0.7179429 -0.4242390
#Gene_3 -0.5773503  1.1547005 -0.5773503
#Gene_4 -0.7071068         NA  0.7071068
A.K.
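
The same numbers can probably be reached without dropping and re-inserting the missing values, since mean() and sd() both accept na.rm = TRUE. This is a sketch under that assumption, using the example data from the original post; zscore_row and res3 are hypothetical names, not something from the thread.

```r
# Rebuild the example data from the original post.
MyFile <- read.csv(text = "Names,'Sample_1','Sample_2','Sample_3'
Gene_1,87,77,88
Gene_2,98,22,34
Gene_3,33,43,33
Gene_4,78,,81
", header = TRUE, row.names = 1, as.is = TRUE, quote = "'", na.strings = "")

# Hypothetical helper: z-score of one row, with NA values ignored when
# computing the mean and standard deviation. NA inputs stay NA.
zscore_row <- function(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)

# Every row returns a vector of the same length, so apply() always gives
# a matrix here; t() puts the genes back in rows.
res3 <- t(apply(MyFile, 1, zscore_row))
res3
```

Because each row keeps its full length, the result does not depend on how many NAs a given row contains.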




arun

2012-Oct-25 01:06 UTC


[R] Z score

Hi Ved,

Sorry, I didn't test it well enough at that time.

In your example file,
#there were NAs
MyFile1 <- read.csv(
text="Names,'Sample_1','Sample_2','Sample_3'
Gene_1,87,77,88
Gene_2,98,22,34
Gene_3,33,43,33
Gene_4,78,,81
", header=TRUE, row.names=1, as.is=TRUE, quote="'",
na.strings="" )


#Here, the apply() function outputs a list when I remove the NA from the last row.
apply(MyFile1,1,function(x) x[!is.na(x)]) #outputs a list
#$Gene_1
#Sample_1 Sample_2 Sample_3 
#      87       77       88 

#$Gene_2
#Sample_1 Sample_2 Sample_3 
#      98       22       34 

#$Gene_3
#Sample_1 Sample_2 Sample_3 
#      33       43       33 

#$Gene_4
#Sample_1 Sample_3 
#      78       81 

# Without NAs
MyFile2 <- read.csv(
text="Names,'Sample_1','Sample_2','Sample_3'
Gene_1,87,77,88
Gene_2,98,22,34
Gene_3,33,43,33
Gene_4,78,48,81
", header=TRUE, row.names=1, as.is=TRUE, quote="'",
na.strings="" )

apply(MyFile2,1,function(x) x[!is.na(x)]) # the output is a matrix
#         Gene_1 Gene_2 Gene_3 Gene_4
#Sample_1     87     98     33     78
#Sample_2     77     22     43     48
#Sample_3     88     34     33     81
is.matrix(apply(MyFile2,1,function(x) x[!is.na(x)]) )
#[1] TRUE

#Consider another case
MyFile3 <- read.csv(
text="Names,'Sample_1','Sample_2','Sample_3'
Gene_1,87,77,88
Gene_2,,22,34
Gene_3,33,43,33
Gene_4,78,,81
", header=TRUE, row.names=1, as.is=TRUE, quote="'",
na.strings="" )

t(sapply(lapply(apply(MyFile3,1,function(x) x[!is.na(x)]),function(x)
(x-mean(x))/sd(x)),function(x) x[colnames(MyFile3)] )) #works because the apply() output is a list
#         Sample_1   Sample_2   Sample_3
#Gene_1  0.4931970 -1.1507929  0.6575959
#Gene_2         NA -0.7071068  0.7071068
#Gene_3 -0.5773503  1.1547005 -0.5773503
#Gene_4 -0.7071068         NA  0.7071068


#Yet another case:
MyFile4 <- read.csv(
text="Names,'Sample_1','Sample_2','Sample_3'
Gene_1,87,77
Gene_2,,22,34
Gene_3,33,,33
Gene_4,78,,81
", header=TRUE, row.names=1, as.is=TRUE, quote="'",
na.strings="" )
apply(MyFile4,1,function(x) x[!is.na(x)]) #output is a matrix because an equal number of NAs is present in each row
#     Gene_1 Gene_2 Gene_3 Gene_4
#[1,]     87     22     33     78
#[2,]     77     34     33     81
t(sapply(lapply(apply(MyFile4,1,function(x) x[!is.na(x)]),function(x)
(x-mean(x))/sd(x)),function(x) x[colnames(MyFile4)] )) #doesn't work: lapply() now runs over the single elements of the matrix
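
One way around the list-versus-matrix behaviour described above is to build the per-row list explicitly, so the later steps never depend on how apply() simplifies its result. This is only a sketch, reusing the MyFile4 example; the names m, rows, z and res4 are introduced here for illustration.

```r
# Rebuild the MyFile4 example (two values left in every row).
MyFile4 <- read.csv(text = "Names,'Sample_1','Sample_2','Sample_3'
Gene_1,87,77
Gene_2,,22,34
Gene_3,33,,33
Gene_4,78,,81
", header = TRUE, row.names = 1, as.is = TRUE, quote = "'", na.strings = "")

m <- as.matrix(MyFile4)

# lapply() over the row indices always returns a list, one named vector
# of non-missing values per gene, whatever the NA pattern is.
rows <- lapply(seq_len(nrow(m)), function(i) m[i, !is.na(m[i, ])])

# z-score each gene, then index by sample name so that dropped
# samples come back as NA in the right position.
z <- lapply(rows, function(x) (x - mean(x)) / sd(x))
res4 <- t(sapply(z, function(x) x[colnames(m)]))
dimnames(res4) <- dimnames(m)
res4
```

Note that Gene_3 has two identical values, so its standard deviation is zero and its z-scores come out as NaN rather than numbers.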



#In your dataset, there were no NAs
dat1<-read.csv("Bcl2_With_expressions.csv",sep="\t",row.names=1)
MyFile<-dat1[,-1]

str(apply(MyFile,1,function(x) x[!is.na(x)])) # a matrix
# num [1:29, 1:18] 10.48 10.96 9.28 11.1 10.95 ...
# - attr(*, "dimnames")=List of 2
#  ..$ : chr [1:29] "ALL2" "MLL8" "ALL42" "MLL5" ...
#  ..$ : chr [1:18] "BAX" "BCL2L15" "BCL2" "BMF" ...

#In this case, either
res2<-apply(MyFile,1,function(x) (x-mean(x))/sd(x))

#or

res1<-apply(apply(MyFile,1,function(x) x[!is.na(x)]),2,function(x)
(x-mean(x))/sd(x)) #works

identical(res1,res2)
#[1] TRUE

head(res1,2)
#           BAX   BCL2L15     BCL2        BMF        BAD      MCL1     BCL2L1
#ALL2 0.1216373 -0.215256 1.040758 -0.4078606 -0.2427741 0.6967070 -0.1054749
#MLL8 0.6565878 -1.446252 1.052566 -0.1825442 -0.2312166 0.9882503 -0.9687260
#            BOK     BCL2A1    BCL2L14       BAK1      BBC3    BCL2L11
#ALL2 -0.1465807  0.5353133 -0.1772439 -0.3751981 0.6341806 -1.2432273
#MLL8  0.2918296 -0.8466821  0.3088331 -1.4025846 0.7056799  0.9944288
#            BID     NOXA1        BIK          HRK    BCL2L2
#ALL2 -2.2961643 0.2105960 -0.9195998 -0.001731806 1.6691590
#MLL8 -0.5103087 0.3433778  1.2352986 -0.568548518 0.3674839
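
When there are no NAs, a vectorised route may also be worth trying: scale() standardises the columns of a matrix, so transposing first gives the per-gene z-scores in the same sample-by-gene layout as the apply() call. The sketch below uses the small MyFile2 example from earlier in this message, since the attached CSV is not reproduced here; res3 is a name introduced for illustration.

```r
# Rebuild the MyFile2 example (no missing values).
MyFile2 <- read.csv(text = "Names,'Sample_1','Sample_2','Sample_3'
Gene_1,87,77,88
Gene_2,98,22,34
Gene_3,33,43,33
Gene_4,78,48,81
", header = TRUE, row.names = 1, as.is = TRUE, quote = "'", na.strings = "")

# Row-wise z-scores with apply(); the result has samples in rows
# and genes in columns.
res2 <- apply(MyFile2, 1, function(x) (x - mean(x)) / sd(x))

# scale() works column-wise, so transpose to put the genes in columns.
res3 <- scale(t(as.matrix(MyFile2)))

# scale() attaches centring and scaling attributes, so compare the
# values only rather than using identical().
all.equal(res2, res3, check.attributes = FALSE)
```

The values agree; the two objects differ only in the extra "scaled:center" and "scaled:scale" attributes that scale() adds.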


Hope it helps
A.K.







________________________________
From: Vedant Sharma <vedantgeet at gmail.com>
To: arun <smartpink111 at yahoo.com> 
Sent: Wednesday, October 24, 2012 7:56 PM
Subject: Re: [R] Z score


Hello Arun,

Thank you. I managed to get the answer.

However, this particular code doesn't seem to work when I try to
read from a .csv file (as attached), and I am curious to find out the
reason!

MyFile <- read.csv (file.choose(), header=T, row.names=1)
MyFile <- MyFile [,-1]
res2<-t(sapply(lapply(apply(MyFile,1,function(x) x[!is.na(x)]),function(x)
(x-mean(x))/sd(x)),function(x) x[colnames(MyFile)] ))

Thanks again !! 

Cheers,
Ved

