Hi,

I want to be able to create a vector of z-scores from a vector of continuous data, conditional on a group membership vector.

Say you have 20 numbers distributed normally with a mean of 50 and an sd of 10:

    x <- rnorm(20, 50, 10)

Then you have a vector that delineates 2 groups within x:

    group <- sort(rep(c("A", "B"), 10))

    test.data <- data.frame(cbind(x, group))

I know that if you break up the x vector into 2 different vectors, it becomes easy to calculate the z-scores for each vector; you then stack them and append them to the original data frame. Is there any way to apply this sort of calculation without splitting the original vector up? I tried a really complex ifelse statement, but it didn't seem to work.

Thanks in advance,
Matthew Dubins
Hello,

First, I doubt you really want to cbind() those two vectors within the data.frame() function call.

    test.data <- data.frame(x, group)

is probably what you want. That may be the source of your trouble.

If you really want a vector returned, the following should work, given that your test.data is constructed without the cbind():

    unlist(by(test.data$x, test.data$group, function(x) (x - mean(x)) / sd(x)), use.names = FALSE)

Is that what you're after?

Erik
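To illustrate the two points in the reply above, here is a minimal sketch. It assumes the groups may not be sorted (the `sample()` call and the seed are illustrative choices, not part of the original post). It shows that `cbind()` on a numeric and a character vector leaves `x` non-numeric, and it uses base R's `ave()` as one possible way to get z-scores that stay aligned with the original rows. The `by()`/`unlist()` approach returns values in group order, which matches row order only when the data are already sorted by group.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
x <- rnorm(20, 50, 10)
group <- sample(rep(c("A", "B"), 10))  # deliberately unsorted (assumption)

# cbind() coerces both vectors to one character matrix,
# so x is no longer numeric inside the data frame.
bad <- data.frame(cbind(x, group))
is.numeric(bad$x)

# Built without cbind(), x keeps its numeric type.
test.data <- data.frame(x, group)

# ave() applies FUN within each group and returns a vector
# in the same order as the input rows.
test.data$z <- ave(test.data$x, test.data$group,
                   FUN = function(v) (v - mean(v)) / sd(v))
```

Within each group, `test.data$z` should then have mean 0 and standard deviation 1, up to floating-point error.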
Wayne.W.Jones at shell.com
2007-Sep-27 06:49 UTC
[R] Getting group-wise standard scores of a vector
tapply is also very useful:

    my.df <- data.frame(x = rnorm(20, 50, 10), group = factor(sort(rep(c("A", "B"), 10))))
    tapply(my.df$x, my.df$group, function(x) {(x - mean(x)) / sd(x)})

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Matthew Dubins
Sent: 26 September 2007 21:57
To: r-help at r-project.org
Subject: [R] Getting group-wise standard scores of a vector
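As a possible follow-up to the tapply() call in the reply above: when the function returns more than one value per group, tapply() gives back a list-like array with one numeric vector per group, not a single vector. The sketch below assumes you want the scores as a column of the data frame, and uses base R's `unsplit()` to put the per-group results back in the original row order. The seed, the shuffled groups, and the names `by.group` and `z` are illustrative only.

```r
set.seed(2)  # arbitrary seed, only so the sketch is reproducible
my.df <- data.frame(
  x     = rnorm(20, 50, 10),
  group = factor(sample(rep(c("A", "B"), 10)))  # shuffled on purpose
)

# One vector of z-scores per group, held in a list-like array.
by.group <- tapply(my.df$x, my.df$group,
                   function(x) (x - mean(x)) / sd(x))

# unsplit() reverses the grouping, placing each score back
# at the row it came from.
my.df$z <- unsplit(by.group, my.df$group)
```

If the data are already sorted by group, `unlist(by.group, use.names = FALSE)` would give the same vector; `unsplit()` is simply the safer choice when the order is not guaranteed.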