Hi,

I have a very large data set (approx. 100,000 rows).

The data come from around 10,000 "groups", with about 10 entries per group.

The values are in one column; the group ID is an integer in the second column.

I want to normalize the values by group:

    for (g in unique(group)) {
      x[group == g] <- x[group == g] / sum(x[group == g])
    }

This works fine in a loop, but it is slow. Is there a faster way to do this?

Thanks!
Not tested, but this should work:

    sums <- tapply(x, group, sum)
    sums.ext <- sums[match(group, names(sums))]
    normalized <- x / sums.ext

The tapply() call may turn out to be just as slow as your loop, though; I'm not sure.

HTH,

Peter

On Thu, Nov 29, 2012 at 10:55 AM, Noah Silverman <noahsilverman at ucla.edu> wrote:
> [...]
> This works fine in a loop, but is slow. Is there a faster way to do this?
>
> Thanks!
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
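For anyone who wants to try the tapply()/match() suggestion before running it on real data, here is a minimal sketch on a small made-up data set. The vectors x and group below are illustrative assumptions, not the original poster's data; the result is checked against an explicit loop.

```r
# Toy data (hypothetical): three groups with a few values each
x     <- c(1, 3, 2, 2, 4, 5, 5)
group <- c(10L, 10L, 20L, 20L, 20L, 30L, 30L)

# One sum per group; the result is named by the group IDs (as strings)
sums <- tapply(x, group, sum)

# Expand the per-group sums back to one entry per element of x
sums.ext <- sums[match(group, names(sums))]

# Divide each value by the sum of its own group; as.vector() drops
# the array attributes that tapply() leaves on the result
normalized <- as.vector(x / sums.ext)

# Reference result from an explicit loop over the groups
by_loop <- x
for (g in unique(group)) {
  idx <- group == g
  by_loop[idx] <- x[idx] / sum(x[idx])
}

print(normalized)
print(all.equal(normalized, by_loop))
```

The match() step works because the integer group IDs are compared with the names of sums as strings, so each row picks up the sum of its own group.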
Yes, type in:

    ?by

For example:

    data <- data.frame(fac = factor(c("A", "A", "B", "B")), vec = 1:4)
    by(data$vec, data$fac, FUN = sum)

Best,

Mikołaj Hnatiuk
Hello,

If you want one result per group, use tapply(); if you want one value per value of x, use ave():

    tapply(x, group, FUN = function(.x) .x / sum(.x))
    ave(x, group, FUN = function(.x) .x / sum(.x))

Hope this helps,

Rui Barradas
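The difference between the two calls is easiest to see from the shape of what they return. A small sketch on made-up data (x, group and the helper name normalize are illustrative assumptions):

```r
# Toy data (hypothetical): two groups
x     <- c(1, 3, 2, 2, 4)
group <- c(1L, 1L, 2L, 2L, 2L)

# Helper used by both calls: divide a vector by its own sum
normalize <- function(.x) .x / sum(.x)

# tapply(): one element per group; because the function returns a vector,
# each element holds that group's normalized values (a list-like array)
per_group <- tapply(x, group, FUN = normalize)
str(per_group)

# ave(): a plain numeric vector with the same length and order as x
per_row <- ave(x, group, FUN = normalize)
print(per_row)
```

For the original question (a normalized value for every row), the ave() form is the one that can be assigned straight back into a column.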
Try the 'data.table' package. Takes about 0.1 seconds to normalize the data.

> x <- data.frame(id = sample(10000, 100000, TRUE), value = runif(100000))
> require(data.table)
Loading required package: data.table
data.table 1.8.2  For help type: help("data.table")
> system.time({
+     x <- data.table(x)
+     newX <- x[
+         , list(value = value  # keep original value
+             , normValue = value / sum(value)
+             )
+         , by = id
+         ]
+ })
   user  system elapsed
   0.03    0.01    0.11
> head(newX, 20)
      id     value   normValue
 1: 8094 0.6805425 0.101140797
 2: 8094 0.3154233 0.046877543
 3: 8094 0.8998646 0.133735993
 4: 8094 0.8858863 0.131658564
 5: 8094 0.1859526 0.027635892
 6: 8094 0.4694456 0.069768023
 7: 8094 0.9302886 0.138257544
 8: 8094 0.7482040 0.111196505
 9: 8094 0.9052426 0.134535255
10: 8094 0.4650028 0.069107739
11: 8094 0.2428116 0.036086145
12: 6287 0.1979209 0.037505820
13: 6287 0.5117723 0.096980353
14: 6287 0.6425769 0.121767688
15: 6287 0.0397795 0.007538177
16: 6287 0.1255722 0.023795811
17: 6287 0.5606742 0.106247214
18: 6287 0.4818579 0.091311594
19: 6287 0.3913614 0.074162596
20: 6287 0.4622984 0.087605098

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
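For comparison with the base-R suggestions in this thread, here is a rough timing sketch that needs no extra packages. It simulates data in the same style as above, but the sizes (n_rows, n_groups) are scaled down as an assumption so that the explicit loop finishes quickly; increase them to match the real data. No timings are claimed here, since they depend on the machine and the R version.

```r
set.seed(1)          # arbitrary seed, only for reproducibility
n_rows   <- 20000    # assumption: scaled down from 100,000
n_groups <- 2000     # assumption: scaled down from 10,000
id    <- sample(n_groups, n_rows, TRUE)
value <- runif(n_rows)

# 1. Explicit loop over the groups
t_loop <- system.time({
  res_loop <- value
  for (g in unique(id)) {
    idx <- id == g
    res_loop[idx] <- value[idx] / sum(value[idx])
  }
})

# 2. tapply() for the per-group sums, then match() to expand them
t_tapply <- system.time({
  sums <- tapply(value, id, sum)
  res_tapply <- as.vector(value / sums[match(id, names(sums))])
})

# 3. ave() with a normalizing function
t_ave <- system.time({
  res_ave <- ave(value, id, FUN = function(v) v / sum(v))
})

# Elapsed seconds for each approach
print(c(loop   = t_loop[["elapsed"]],
        tapply = t_tapply[["elapsed"]],
        ave    = t_ave[["elapsed"]]))

# All three approaches should give the same normalized values
print(all.equal(res_loop, res_tapply))
print(all.equal(res_loop, res_ave))
```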
On 29-11-2012, at 19:55, Noah Silverman wrote:
> [...]
> This works fine in a loop, but is slow. Is there a faster way to do this?

Toy example:

    gx <- data.frame(group = rep(1:4, each = 3), x = 1:12)
    gx
    gx$x <- ave(gx$x, gx$group, FUN = function(x) x / sum(x))
    gx

Berend
Hi all,

I am very new to R. Can someone please suggest some tutorial links for understanding SVM using R?

Regards,

Vivek
On 30.11.2012 05:08, vivek kumar singh wrote:
> Hi all,
>
> I am very new to R. Can someone please suggest some tutorial
> links for understanding SVM using R?

After reading a textbook about SVMs, go ahead and look at ?svm in package e1071.

Best,

Uwe Ligges
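As a starting point once e1071 is installed, a minimal sketch along the lines of the examples on the ?svm help page might look like this. The choice of the built-in iris data and of the default settings is an assumption made purely for illustration; it is not a recommendation for any particular problem.

```r
library(e1071)   # provides svm(); install.packages("e1071") if it is missing

# iris ships with R: 150 flowers, 4 numeric measurements, 3 species
model <- svm(Species ~ ., data = iris)   # default settings, see ?svm

# Predict on the training data (predictors only) and
# cross-tabulate the predictions against the true labels
predictors <- subset(iris, select = -Species)
pred <- predict(model, predictors)
print(table(predicted = pred, actual = iris$Species))
```

Predicting on the training data only shows that the fitted model runs; a held-out test set or cross-validation is needed to judge how well it generalizes.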