Hi,

I need a bit of guidance with the sapply function. I've read the help page, but am still a bit unsure how to use it.

I have a large data frame with about 100 columns and 30,000 rows. One of the columns is "group", of which there are about 2,000 distinct "groups".

I want to normalize (sum to 1) one of my variables per-group.

Normally, I would just write a huge "for each" loop, but have read that is hugely inefficient with R.

The old way would be (just an example, syntax might not be perfect):

for (group in data$group){
    for (score in data[data$group == group]){
        new_score <- score / sum(data$score[data$group==group])
    }
}

How would I simplify this with sapply?

Thanks!

--
Noah

Try this:

data$score <- ave(data$score, data$group, FUN = prop.table)

On Sun, Aug 30, 2009 at 6:08 PM, Noah Silverman <noah at smartmediacorp.com> wrote:
> Hi,
>
> I need a bit of guidance with the sapply function. I've read the help page,
> but am still a bit unsure how to use it.
>
> I have a large data frame with about 100 columns and 30,000 rows. One of
> the columns is "group" of which there are about 2,000 distinct "groups".
>
> I want to normalize (sum to 1) one of my variables per-group.
>
> Normally, I would just write a huge "for each" loop, but have read that is
> hugely inefficient with R.
>
> The old way would be (just an example, syntax might not be perfect):
>
> for (group in data$group){
>     for (score in data[data$group == group]){
>         new_score <- score / sum(data$score[data$group==group])
>     }
> }
>
> How would I simplify this with sapply?
>
> Thanks!
>
> --
> Noah
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

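To see what the ave() call does, here is a minimal sketch on made-up toy data. The column names follow the thread; the values and the norm_score column are only illustrative assumptions, so the original scores stay visible next to the result.

```r
# Toy data; the column names follow the thread, the values are made up.
data <- data.frame(
  group = c("a", "a", "b", "b", "b"),
  score = c(1, 3, 2, 2, 4)
)

# ave() applies FUN to the scores within each group and returns the
# results in the original row order; for a plain vector x,
# prop.table(x) is x / sum(x).
data$norm_score <- ave(data$score, data$group, FUN = prop.table)

# Each group's normalized scores should now sum to 1.
group_sums <- tapply(data$norm_score, data$group, sum)
print(data)
print(group_sums)
```

Writing to a new column rather than overwriting data$score is just a choice for the example; the one-liner above replaces the column in place.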
On 30/08/2009 6:08 PM, Noah Silverman wrote:
> Hi,
>
> I need a bit of guidance with the sapply function. I've read the help
> page, but am still a bit unsure how to use it.
>
> I have a large data frame with about 100 columns and 30,000 rows. One
> of the columns is "group" of which there are about 2,000 distinct "groups".
>
> I want to normalize (sum to 1) one of my variables per-group.
>
> Normally, I would just write a huge "for each" loop, but have read that
> is hugely inefficient with R.

Don't believe what you read, try it. If the for loop takes 100 times longer than the fastest method, but it still only takes 10 seconds, is it worth optimizing?

Duncan Murdoch

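One way to "try it" is to time both approaches on simulated data of roughly the size described (30,000 rows, about 2,000 groups). This is only a sketch: the data are randomly generated, the loop is one possible working version of the idea in the question (not the poster's exact code), and the timings will differ from machine to machine.

```r
set.seed(1)
n_rows <- 30000
n_groups <- 2000

# Simulated data: random group labels and random scores (assumed values).
data <- data.frame(
  group = sample(n_groups, n_rows, replace = TRUE),
  score = runif(n_rows)
)

# Explicit loop: one pass per distinct group.
loop_time <- system.time({
  loop_score <- numeric(n_rows)
  for (g in unique(data$group)) {
    idx <- data$group == g
    loop_score[idx] <- data$score[idx] / sum(data$score[idx])
  }
})

# Vectorized alternative using ave().
ave_time <- system.time(
  ave_score <- ave(data$score, data$group, FUN = prop.table)
)

# Compare elapsed times and confirm both give the same answer.
print(loop_time["elapsed"])
print(ave_time["elapsed"])
stopifnot(isTRUE(all.equal(loop_score, ave_score)))
```

If both timings turn out to be small on your machine, the choice can be made on readability rather than speed.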
On Sun, Aug 30, 2009 at 5:08 PM, Noah Silverman <noah at smartmediacorp.com> wrote:
> Hi,
>
> I need a bit of guidance with the sapply function. I've read the help page,
> but am still a bit unsure how to use it.
>
> I have a large data frame with about 100 columns and 30,000 rows. One of
> the columns is "group" of which there are about 2,000 distinct "groups".
>
> I want to normalize (sum to 1) one of my variables per-group.
>
> Normally, I would just write a huge "for each" loop, but have read that is
> hugely inefficient with R.
>
> The old way would be (just an example, syntax might not be perfect):
>
> for (group in data$group){
>     for (score in data[data$group == group]){
>         new_score <- score / sum(data$score[data$group==group])
>     }
> }

It might be easier to use ddply from the plyr package. The command you want would be:

data <- ddply(data, "group", transform, score = score / sum(score))

More information at http://had.co.nz/plyr.

Hadley

--
http://had.co.nz/
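
For anyone who wants to check the ddply call on a small case, here is a hedged sketch on made-up toy data. It assumes the plyr package may or may not be installed, so the ddply part is guarded; the base R ave() result is used only as a reference to compare against.

```r
# Toy data; the column names follow the thread, the values are made up.
data <- data.frame(
  group = c("a", "a", "b", "b", "b"),
  score = c(1, 3, 2, 2, 4)
)

# Base R reference result, used to check the plyr version.
expected <- ave(data$score, data$group, FUN = prop.table)

if (requireNamespace("plyr", quietly = TRUE)) {
  # Split the data frame by "group", apply transform() to each piece,
  # and bind the pieces back together.
  normalized <- plyr::ddply(data, "group", transform,
                            score = score / sum(score))
  print(normalized)
  # The toy data are already ordered by group, so rows line up directly.
  stopifnot(isTRUE(all.equal(normalized$score, expected)))
} else {
  message("plyr is not installed; skipping the ddply example")
}
```

On real data that are not already ordered by group, the rows returned by ddply may come back in a different order than the input, so compare per-group sums rather than row by row.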