Hi, All,

I wonder if anyone can help me find a faster algorithm to get the values for each unique ID (most IDs have 2-3 values, but it varies).

My data looks like:

ID    Values
1     250
2     300
1     251
3     5000
4     600
10    521
3     5500

I would like the output to look like:

ID, avg(values), stdev(values), value 1, val 2, val 3, ...

I used 2 for loops to try to get the values:

for (i in 1:n){
    value <- NULL
    for(j in 1:m){
        if(data[j,1] == uniqueid[i]){
            value <- c(value, data[j,2])
            ....
        }
    }
}

Since both n and m are about 10000, the algorithm is really slow. I believe there is some function in R that can do better than this.

Thanks

-Jindan

tst.df <- data.frame(ID=rep(1:2, 2), Values=1:4)
tapply(tst.df$Values, tst.df$ID, mean)
tapply(tst.df$Values, tst.df$ID, sd)

Is this what you want? (Time it using "start.time <- proc.time()" before and "elapsed.time <- proc.time()-start.time" after.)

Spencer Graves

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
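
For illustration, the two tapply() calls and the timing hint above might be combined as in the sketch below. It uses the sample data from the question; the names dat, avg, stdev and summary.df are only placeholders chosen here, not anything from the original posts.

```r
# Sample data copied from the question (the object name 'dat' is an assumption).
dat <- data.frame(ID     = c(1, 2, 1, 3, 4, 10, 3),
                  Values = c(250, 300, 251, 5000, 600, 521, 5500))

start.time <- proc.time()

# One pass per statistic; tapply() groups Values by ID internally.
avg   <- tapply(dat$Values, dat$ID, mean)
stdev <- tapply(dat$Values, dat$ID, sd)    # NA for an ID with a single value

elapsed.time <- proc.time() - start.time

# Both results come back in the same (sorted) ID order,
# so they can be placed side by side.
summary.df <- data.frame(ID    = as.numeric(names(avg)),
                         avg   = as.vector(avg),
                         stdev = as.vector(stdev))
print(summary.df)
print(elapsed.time)
```

This gives one row per ID with its mean and standard deviation, but not the individual values; those differ in number per ID and need separate handling.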

You might want to try working with split(). For example:

df <- data.frame(ID = c(1,1,1,2,2,2), Values = c(40,50,40,60,70,80))
sdf <- split(df, df$ID)
lapply(sdf, function(x) with(x, c(ID[1],mean(Values),sd(Values),Values)))

Since each ID could take different numbers of values (I assume), you'll still end up with a "ragged array".

-roger
_______________________________
UCLA Department of Statistics
rpeng at stat.ucla.edu
http://www.stat.ucla.edu/~rpeng

On Thu, 27 Mar 2003, Jane Yu wrote:
> Hi, All,
> I wonder anyone can help me find a faster algorithm to
> get the values of unique ID (most ID has 2-3 values,
> varies).
> [...]
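
If a rectangular table is wanted rather than a ragged list, one possible approach (an assumption here, not something proposed in the thread) is to pad each ID's values with NA up to the length of the longest group. The sketch below uses the sample data from the question; vals, max.n, padded and result are placeholder names.

```r
# Sample data copied from the question.
df <- data.frame(ID     = c(1, 2, 1, 3, 4, 10, 3),
                 Values = c(250, 300, 251, 5000, 600, 521, 5500))

vals  <- split(df$Values, df$ID)        # list of value vectors, one per ID
max.n <- max(sapply(vals, length))      # size of the largest group

# Pad every group with NA up to max.n, then stack the rows into a matrix.
padded <- do.call(rbind,
                  lapply(vals, function(v) c(v, rep(NA, max.n - length(v)))))
colnames(padded) <- paste0("val", seq_len(max.n))

# One row per ID: ID, avg, stdev, val1, val2, ...
result <- data.frame(ID    = as.numeric(names(vals)),
                     avg   = sapply(vals, mean),
                     stdev = sapply(vals, sd),  # NA when an ID has one value
                     padded,
                     row.names = NULL)
print(result)
```

The NA padding is only a convenience for printing or writing the table out; the list returned by split() remains the more natural structure if the values are processed further per ID.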