Hi all,
I wonder if anyone can help me find a faster algorithm to
get the values for each unique ID (most IDs have 2-3 values,
but the number varies).
My data looks like:
ID Values
1 250
2 300
1 251
3 5000
4 600
10 521
3 5500
I would like the output to look like:
ID, avg(values), stdev(values), val1, val2, val3, ...
I used two nested for loops to collect the values:
for (i in 1:n) {
  value <- NULL
  for (j in 1:m) {
    if (data[j, 1] == uniqueid[i]) {
      value <- c(value, data[j, 2])
      ....
    }
  }
}
Since both n and m are about 10000, the algorithm is
really slow. I believe there is some function out
there that can do better than this in R.
Thanks
-Jindan
tst.df <- data.frame(ID=rep(1:2, 2), Values=1:4)
tapply(tst.df$Values, tst.df$ID, mean)
tapply(tst.df$Values, tst.df$ID, sd)

Is this what you want? (Time it using "start.time <- proc.time()" before and "elapsed.time <- proc.time()-start.time" after.)

Spencer Graves

Jane Yu wrote:
> I wonder anyone can help me find a faster algorithm to
> get the values of unique ID (most ID has 2-3 values,
> varies).
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
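
For reference, here is a minimal sketch of the tapply() suggestion applied to the sample data from the original post, with the timing wrapped around it as described. The names dat, avg, sdv and res are made up for illustration, and the sketch only covers the ID, avg and stdev columns, not the individual values.

```r
# Sample data from the original post
dat <- data.frame(ID     = c(1, 2, 1, 3, 4, 10, 3),
                  Values = c(250, 300, 251, 5000, 600, 521, 5500))

start.time <- proc.time()

# tapply() groups Values by ID and applies the function once per group
avg <- tapply(dat$Values, dat$ID, mean)
sdv <- tapply(dat$Values, dat$ID, sd)   # NA for IDs with only one value

elapsed.time <- proc.time() - start.time

# Combine into one row per ID; the IDs come back as the names of the result
res <- data.frame(ID    = as.numeric(names(avg)),
                  avg   = as.vector(avg),
                  stdev = as.vector(sdv))
print(res)
print(elapsed.time)
```

Note that sd() returns NA for an ID that has a single value, so those rows need separate handling if a number is required there.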
You might want to try working with split(). For example:

df <- data.frame(ID = c(1,1,1,2,2,2), Values = c(40,50,40,60,70,80))
sdf <- split(df, df$ID)
lapply(sdf, function(x) with(x, c(ID[1],mean(Values),sd(Values),Values)))

Since each ID could take different numbers of values (I assume), you'll still end up with a "ragged array".

-roger
_______________________________
UCLA Department of Statistics
rpeng at stat.ucla.edu
http://www.stat.ucla.edu/~rpeng

On Thu, 27 Mar 2003, Jane Yu wrote:
> I wonder anyone can help me find a faster algorithm to
> get the values of unique ID (most ID has 2-3 values,
> varies).
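
If a rectangular table is preferred over the ragged list, one possible follow-up to the split() approach is to pad each ID's values with NA up to the longest group. This is only a sketch on the sample data from the original post; the names dat, vals, padded and out, and the val1, val2, ... column names, are assumptions made for illustration.

```r
# Sample data from the original post
dat <- data.frame(ID     = c(1, 2, 1, 3, 4, 10, 3),
                  Values = c(250, 300, 251, 5000, 600, 521, 5500))

# One numeric vector per ID, in the order of the sorted IDs
vals <- split(dat$Values, dat$ID)
k <- max(lengths(vals))                 # size of the largest group

# Pad every group with NA to length k, then stack the groups as rows
padded <- do.call(rbind,
                  lapply(unname(vals),
                         function(v) c(v, rep(NA, k - length(v)))))
colnames(padded) <- paste0("val", seq_len(k))

out <- data.frame(ID    = as.numeric(names(vals)),
                  avg   = as.vector(sapply(vals, mean)),
                  stdev = as.vector(sapply(vals, sd)),  # NA for single values
                  padded)
print(out)
```

The padding trades the ragged structure for NA entries, which is convenient for writing the result out as a table but means later code has to skip the NAs.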