thr3ads.net - R help - [R] Fast Normalize by Group [Nov 2012]


Noah Silverman

2012-Nov-29 18:55 UTC

[R] Fast Normalize by Group

Hi,

I have a very large data set (approx. 100,000 rows).

The data comes from around 10,000 "groups" with about 10 entries per
group.

The values are in one column, the group ID is an integer in the second column.

I want to normalize the values by group:

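normalized <- numeric(length(x))
for (g in unique(group)) {
	# divide each value by the sum of its own group
	normalized[group == g] <- x[group == g] / sum(x[group == g])
}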

This works fine in a loop, but is slow.  Is there a faster way to do this?

Thanks!

Peter Langfelder

2012-Nov-29 19:02 UTC


Not tested but should work:

sums = tapply(x, group, sum);
sums.ext = sums[ match(group, names(sums))]
normalized = x/sums.ext

It may be that the tapply is just as slow as your loop though, I'm not sure.
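
For readers who want to check the sum-and-index idea above, here is a minimal sketch on made-up toy data (the values and group IDs are assumptions, not from the original post). It relies on match() comparing the integer group IDs with the character names of the tapply() result, which works because both sides are coerced to character.

```r
# Toy data (made up for illustration): values and integer group IDs
x     <- c(1, 3, 2, 2, 5, 5)
group <- c(10L, 10L, 20L, 20L, 20L, 30L)

# One sum per group; names(sums) holds the group IDs as character strings
sums <- tapply(x, group, sum)

# Expand the group sums back to one entry per row, then divide
sums.ext   <- sums[match(group, names(sums))]
normalized <- as.vector(x / sums.ext)

# Within each group the normalized values should add up to 1
group.totals <- tapply(normalized, group, sum)
```

The as.vector() call only strips the array attributes that tapply() leaves on the result; the arithmetic is unchanged.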

HTH,

Peter



Mikołaj Hnatiuk

2012-Nov-29 19:05 UTC


Yes, type in:
?by

for example:
data <- data.frame(fac = factor(c("A", "A", "B", "B")), vec = 1:4)
by(data$vec,data$fac, FUN=sum)

Best,
Mikołaj Hnatiuk


Rui Barradas

2012-Nov-29 19:12 UTC


Hello,

If you want one value per group, use tapply(); if you want one value per
element of x, use ave().

tapply(x, group, FUN = function(.x) .x/sum(.x))
ave(x, group, FUN = function(.x) .x/sum(.x))
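
To make the difference in return shape concrete, here is a small sketch on made-up data (the vectors are assumptions for illustration). With a function that returns a whole vector, tapply() gives back one list element per group, whereas ave() returns a plain vector aligned with x.

```r
x     <- c(1, 3, 2, 2, 4)
group <- c(1L, 1L, 2L, 2L, 2L)

# tapply() with a scalar-valued function: one number per group
group.sums <- tapply(x, group, sum)

# tapply() with a vector-valued function: a list-mode array, one vector per group
per.group <- tapply(x, group, function(.x) .x / sum(.x))

# ave(): a plain numeric vector, row for row in the order of x
per.row <- ave(x, group, FUN = function(.x) .x / sum(.x))
```

For normalizing a column in place, the ave() form is usually the convenient one, since its result can be assigned straight back into the data frame.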


Hope this helps,

Rui Barradas

jim holtman

2012-Nov-29 19:13 UTC


Try the 'data.table' package.  It takes about 0.1 seconds to normalize the
data.
> x <- data.frame(id = sample(10000, 100000, TRUE), value = runif(100000))
> require(data.table)
Loading required package: data.table
data.table 1.8.2  For help type: help("data.table")
> system.time({
+     x <- data.table(x)
+     newX <- x[
+         , list(value = value  # keep original value
+             , normValue = value / sum(value)
+             )
+         , by = id
+         ]
+ })
   user  system elapsed
   0.03    0.01    0.11
> head(newX, 20)
      id     value   normValue
 1: 8094 0.6805425 0.101140797
 2: 8094 0.3154233 0.046877543
 3: 8094 0.8998646 0.133735993
 4: 8094 0.8858863 0.131658564
 5: 8094 0.1859526 0.027635892
 6: 8094 0.4694456 0.069768023
 7: 8094 0.9302886 0.138257544
 8: 8094 0.7482040 0.111196505
 9: 8094 0.9052426 0.134535255
10: 8094 0.4650028 0.069107739
11: 8094 0.2428116 0.036086145
12: 6287 0.1979209 0.037505820
13: 6287 0.5117723 0.096980353
14: 6287 0.6425769 0.121767688
15: 6287 0.0397795 0.007538177
16: 6287 0.1255722 0.023795811
17: 6287 0.5606742 0.106247214
18: 6287 0.4818579 0.091311594
19: 6287 0.3913614 0.074162596
20: 6287 0.4622984 0.087605098
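
A possible variant of the transcript above, sketched under the assumption that an installed data.table supports grouped assignment with `:=`: the normalized column is added to the existing table by reference instead of building a new table. The seed and the name `dt` are illustrative choices, and the original row order is preserved.

```r
library(data.table)

set.seed(1)  # illustrative seed, only to make the sketch reproducible
dt <- data.table(id = sample(10000, 100000, TRUE), value = runif(100000))

# Add normValue in place; sum(value) is evaluated separately within each id
dt[, normValue := value / sum(value), by = id]

# Sanity check: the normalized values should total 1 within every group
check <- dt[, list(total = sum(normValue)), by = id]
```

No timing is claimed here; wrapping the assignment in system.time() on your own machine is the way to compare it with the other answers.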



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

Berend Hasselman

2012-Nov-29 19:19 UTC


Toy example:

gx <- data.frame(group=rep(1:4,each=3), x=1:12)
gx
gx$x <- ave(gx$x, gx$group, FUN=function(x) x/sum(x))
gx
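
As a cross-check at roughly the size described in the question, the sketch below compares ave() with an alternative that is not from the thread: base R's rowsum(), which computes all group sums in one call. The simulated data and variable names are assumptions for illustration.

```r
set.seed(42)  # illustrative seed
group <- sample(10000, 100000, TRUE)
x     <- runif(100000)

# Approach from the thread: ave() applies the function within each group
via.ave <- ave(x, group, FUN = function(v) v / sum(v))

# Alternative: rowsum() returns a one-column matrix whose row names are the group IDs
sums       <- rowsum(x, group)
via.rowsum <- x / unname(sums[as.character(group), 1])

# Both approaches should agree up to floating-point tolerance
same <- isTRUE(all.equal(via.ave, via.rowsum))
```

Which of the two is faster depends on the data and the R version, so it is worth timing both with system.time() before settling on one.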


Berend

vivek kumar singh

2012-Nov-30 04:08 UTC


[R] SVM using R

Hi all,

I am very new to R. Can someone please suggest some tutorial
links for understanding SVMs in R?

Regards,
Vivek

Uwe Ligges

2012-Dec-01 18:34 UTC


After reading some textbook about the SVM, go ahead and look for ?svm in 
package e1071.
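
As a first thing to try once e1071 is installed, here is a minimal sketch that fits an SVM classifier to R's built-in iris data. The choice of data set and the use of default settings are assumptions for illustration, not a recommendation from the thread.

```r
library(e1071)

# Fit a classifier predicting Species from the four measurements,
# leaving kernel and cost at the svm() defaults
model <- svm(Species ~ ., data = iris)

# Predict on the training data and tabulate predictions against the truth
pred      <- predict(model, iris)
confusion <- table(predicted = pred, actual = iris$Species)
```

Accuracy measured on the training data is optimistic; a held-out test set or cross-validation gives a fairer picture.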

Best,
Uwe Ligges
