Hi Noah,
I am not sure whether the 0s themselves should be standardized. Since
you want them excluded from the calculation of the mean and SD, I am
assuming you do not want (0 - M) / sigma for them. If that is the case,
here is an example:
## read in your data
## FYI: providing via dput() would be easier next time
d <- read.table(textConnection("
code v1 v2
G1 1.2 2.3
G1 0 2.4
G1 1.4 3.4
G2 2.9 2.3
G2 4.3 4.4"), header = TRUE)
closeAllConnections()
## temporary data as a matrix
tmp <- as.matrix(d[-1])
## index 0s and set to missing
tmp[index.0 <- which(tmp == 0, arr.ind = TRUE)] <- NA
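## scale by column within each level of d$code and pull back to a matrix
## (rbind() stacks the groups in level order, so this assumes d is sorted by code)
tmp <- do.call("rbind", by(tmp, d$code, scale))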
## NAs back to 0s
tmp[index.0] <- 0
d[, 2:3] <- tmp
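One caveat with the by()/rbind() route is that the result comes back
grouped by level of code, so it only lines up with d when d is already
sorted by code. As a rough sketch of an order-independent alternative,
ave() can apply a function within each group and hand the values back in
the original row order. The helper name scale.nonzero and the shuffled
copy d2 are made up for illustration, and the sketch assumes there are no
NAs and at least two non-zero values per group and column:

```r
## same toy data as above, but with the rows deliberately shuffled
d2 <- data.frame(
  code = c("G2", "G1", "G1", "G2", "G1"),
  v1 = c(2.9, 1.2, 0, 4.3, 1.4),
  v2 = c(2.3, 2.3, 2.4, 4.4, 3.4))

## hypothetical helper: z-score the non-zero values, leave the 0s as 0
## (assumes no NAs and at least two non-zero values in x)
scale.nonzero <- function(x) {
  nz <- x != 0
  x[nz] <- (x[nz] - mean(x[nz])) / sd(x[nz])
  x
}

## ave() runs the helper within each level of code and returns the
## results in the original row order, so no sorting is needed
d2[c("v1", "v2")] <- lapply(d2[c("v1", "v2")], function(x)
  ave(x, d2$code, FUN = scale.nonzero))
```

The trade-off is that this works one column at a time instead of passing
a whole matrix to scale().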
If you want the zeros standardized, it will take a slightly different
approach. The other issue that could come up here is speed, but that
can be very dataset dependent (e.g., what is most efficient for a few
levels of code may not be what is most efficient for many columns).
That said, it would not take much work to create a parallelized version
of what by() is doing, and scale() is already vectorized, so it works
pretty fast as long as you pass it a matrix.
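For the case where the zeros should be standardized as well, here is a
minimal sketch of one way it could look: the mean and SD are still
computed from the non-zero values only, but they are then applied to
every value, so a 0 becomes (0 - M) / sigma. The helper name scale.all
and the copy d3 are made up for illustration, and the sketch assumes no
NAs and at least two non-zero values per group and column:

```r
## toy data from the example above
d3 <- data.frame(
  code = c("G1", "G1", "G1", "G2", "G2"),
  v1 = c(1.2, 0, 1.4, 2.9, 4.3),
  v2 = c(2.3, 2.4, 3.4, 2.3, 4.4))

## hypothetical helper: mean and SD come from the non-zero values only,
## but every value (0s included) is centered and scaled with them
scale.all <- function(x) {
  nz <- x != 0
  (x - mean(x[nz])) / sd(x[nz])
}

## apply within each level of code, one column at a time
d3[c("v1", "v2")] <- lapply(d3[c("v1", "v2")], function(x)
  ave(x, d3$code, FUN = scale.all))
```

Because the zeros do not contribute to M or sigma, they can end up with
very large standardized values when the non-zero values in a group are
close together.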
Cheers,
Josh
On Sat, Dec 10, 2011 at 1:44 PM, Noah Silverman <noahsilverman at ucla.edu> wrote:
> Hi,
>
> I'm having difficulty coming up with a good way to subset some data to
> generate statistics.
>
> My data frame has multiple observations by group.
>
> Here is an overly-simplified toy example of the data
> =========================
> code    v1    v2
> G1      1.2   2.3
> G1      0     2.4
> G1      1.4   3.4
> G2      2.9   2.3
> G2      4.3   4.4
> etc..
> =========================
>
> I want to normalize the data *by group* for certain variables. But, I want
> to ignore 0 values when calculating the mean and standard deviation.
>
> What I *want* to do is something like this:
> ======================
> for (code in unique (d$code) ){
>     mu <- mean( d[which(d[d$code==code,v1] !=0 ), v1] )
>     sig <- sd( d[which(d[d$code==code,v1] !=0 ), v1] )
>     d[which(d[d$code==code,v1] !=0 ), cname] <-
>         (d[which(d[d$code==code,v1] !=0 ), v1] - mu) / sig
> }
> ======================
>
> My goal, if it isn't apparent, is to replace values with their
> normalized value. (But, the statistics used for normalization are calculated
> skipping zero values.)
>
> This doesn't work as the indexing from the which command is relative
> (1,2,3, etc.)
>
>
> Suggestions?
>
>
>
> --
> Noah Silverman
> UCLA Department of Statistics
> 8208 Math Sciences Building
> Los Angeles, CA 90095
>
--
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/