Hello,

Is there a simple and fast function that returns a vector of the number of replications for each object of a vector?

For example, I have a vector of IDs:

ids <- c( "ID1", "ID2", "ID2", "ID3", "ID3","ID3", "ID5")

I want the function to return the following vector, where each term is the number of replicates for the given ID:

c( 1, 2, 2, 3,3,3,1 )

Of course, I have a vector of more than 40,000 IDs, and the function I wrote is really slow (30 minutes for 44,290 terms). It orders my data and checks, on the ID:Name of the data, whether the next term is the same as the previous one (see below). I don't have time right now to write a C function.

Thanks a lot for your help,

Laetitia.

Here is the function I have written; maybe I have done something that is not optimized:

repVector <- function(obj){

  # order IDName
  ord <- gif.indexByIDName(obj)
  ordobj <- obj[ord,]

  nspots <- nrow(obj)
  # vector of spot replicates number
  spotrep <- rep(NA, nspots )

  # function to get ID:Name for a given spot
  spotidname <- function(ind){
    paste(ordobj$genes[ind, c("ID","Name") ], collapse=":")
  }

  spot <- 1

  while( spot < nspots ){
    i<-1
    while( spotidname(spot) == spotidname(spot + i) ){
      i <- i + 1
    }

    spotrep[spot : (spot + i-1)] <- i
    spot <- spot + i
    #cat("spot : ",spot,"\n")
  }

  obj$genes$spotrep <- spotrep[order(ord)]

  obj
}
Laetitia Marisa wrote:
> Is there a simple and fast function that returns a vector of the number
> of replications for each object of a vector ?
> For example :
> I have a vector of IDs :
> ids <- c( "ID1", "ID2", "ID2", "ID3", "ID3","ID3", "ID5")
>
> I want the function returns the following vector where each term is the
> number of replicates for the given id :
> c( 1, 2, 2, 3,3,3,1 )

One-liner:

> table(ids)[ids]
ids
ID1 ID2 ID2 ID3 ID3 ID3 ID5
  1   2   2   3   3   3   1

'table(ids)' computes the counts, then the subscripting [ids] looks it all up.

Now try it on your 40,000-long vector!

Barry
Laetitia Marisa <Laetitia.Marisa at cgm.cnrs-gif.fr> writes:
> Is there a simple and fast function that returns a vector of the number
> of replications for each object of a vector ?
> For example :
> I have a vector of IDs :
> ids <- c( "ID1", "ID2", "ID2", "ID3", "ID3","ID3", "ID5")
>
> I want the function returns the following vector where each term is the
> number of replicates for the given id :
> c( 1, 2, 2, 3,3,3,1 )
>
> Of course I have a vector of more than 40 000 ID and the function I
> wrote (it orders my data and checks on ID:Name of the data if the next
> term is the same as the previous one (see below) ) is really slow
> (30minutes for 44290 terms). But I don't have time by now to write a C
> function.

Will this do it?

> table(ids)[ids]
ids
ID1 ID2 ID2 ID3 ID3 ID3 ID5
  1   2   2   3   3   3   1

Or (could be faster):

> f <- factor(ids,levels=unique(ids))
> as.vector(table(f))[f]
[1] 1 2 2 3 3 3 1

--
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907
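A note on why the second form works: subscripting with a factor uses the factor's integer codes, so the counts are looked up by position rather than by name. The sketch below spells that out with base R's tabulate(). The unsorted example vector and the variable names are made up for illustration and are not from the thread.

```r
# Hypothetical, deliberately unsorted IDs (not from the original post)
ids <- c("b", "a", "b", "c", "a", "b")

# Levels in order of first appearance: "b", "a", "c"
f <- factor(ids, levels = unique(ids))

# Integer code of each element: 1 2 1 3 2 1
codes <- as.integer(f)

# Count how often each code occurs, one bin per level
counts <- tabulate(codes, nbins = nlevels(f))

# Look the counts up by position, one per original element
per_element <- counts[codes]
print(per_element)  # 3 2 3 1 2 3
```

Because the lookup is positional, this variant does not depend on the IDs being sorted.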
Laetitia Marisa wrote:
> Is there a simple and fast function that returns a vector of the number
> of replications for each object of a vector ?
> For example :
> I have a vector of IDs :
> ids <- c( "ID1", "ID2", "ID2", "ID3", "ID3","ID3", "ID5")
>
> I want the function returns the following vector where each term is the
> number of replicates for the given id :
> c( 1, 2, 2, 3,3,3,1 )

> rep(as.vector(table(ids)), as.vector(table(ids)))
[1] 1 2 2 3 3 3 1

--
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 452-1424 (M, W, F)
fax: (917) 438-0894
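One caveat worth checking before using the rep() form on real data: it lays the counts out in the sorted order that table() uses, so it matches the input only when the IDs are already sorted. The sketch below, using a made-up unsorted vector, shows the mismatch and one possible way to map the counts back with order(), much as the original repVector() does with order(ord).

```r
# Hypothetical unsorted IDs (not from the original post)
ids <- c("ID3", "ID1", "ID3")

tab <- as.vector(table(ids))   # counts in sorted ID order: ID1 = 1, ID3 = 2

# Counts laid out in sorted order; this does NOT line up with ids here
sorted_layout <- rep(tab, tab)
print(sorted_layout)           # 1 2 2

# Put each count back at the position its ID had in the input
ord <- order(ids)
counts <- integer(length(ids))
counts[ord] <- sorted_layout
print(counts)                  # 2 1 2
```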
This should work:

> ids <- c( "ID1", "ID2", "ID2", "ID3", "ID3","ID3", "ID5")
> table(ids)[ids]
ids
ID1 ID2 ID2 ID3 ID3 ID3 ID5
  1   2   2   3   3   3   1

Andy

From: Laetitia Marisa
> Is there a simple and fast function that returns a vector of
> the number of replications for each object of a vector ?
> For example :
> I have a vector of IDs :
> ids <- c( "ID1", "ID2", "ID2", "ID3", "ID3","ID3", "ID5")
>
> I want the function returns the following vector where each
> term is the number of replicates for the given id :
> c( 1, 2, 2, 3,3,3,1 )
Try this:

ave(as.numeric(factor(ids)), ids, FUN = length)

See ?ave for more info.

On 1/24/06, Laetitia Marisa <Laetitia.Marisa at cgm.cnrs-gif.fr> wrote:
> Is there a simple and fast function that returns a vector of the number
> of replications for each object of a vector ?
> For example :
> I have a vector of IDs :
> ids <- c( "ID1", "ID2", "ID2", "ID3", "ID3","ID3", "ID5")
>
> I want the function returns the following vector where each term is the
> number of replicates for the given id :
> c( 1, 2, 2, 3,3,3,1 )
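Since the original function builds its key from two columns (ID and Name), it may help that ave() accepts several grouping variables and groups by their combination. The sketch below assumes a small made-up data frame called genes with ID and Name columns; the real object in the question is not shown in the thread, so this is only an illustration.

```r
# Hypothetical stand-in for obj$genes (structure assumed, data invented)
genes <- data.frame(
  ID   = c("A", "A", "B", "B", "B"),
  Name = c("x", "x", "y", "z", "y")
)

# Count rows per ID/Name combination, returned in the original row order.
# seq_len() only supplies a vector of the right length for ave() to fill.
genes$spotrep <- ave(seq_len(nrow(genes)), genes$ID, genes$Name, FUN = length)

print(genes$spotrep)  # 2 2 2 1 2
```

This avoids pasting ID and Name together for every row, and it does not need the rows to be sorted first.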
table()

	-thomas

On Tue, 24 Jan 2006, Laetitia Marisa wrote:
> Is there a simple and fast function that returns a vector of the number
> of replications for each object of a vector ?
> For example :
> I have a vector of IDs :
> ids <- c( "ID1", "ID2", "ID2", "ID3", "ID3","ID3", "ID5")
>
> I want the function returns the following vector where each term is the
> number of replicates for the given id :
> c( 1, 2, 2, 3,3,3,1 )

Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle
?table

> ids <- c( "ID1", "ID2", "ID2", "ID3", "ID3","ID3", "ID5")
>
> x <- table(ids)
> x
ids
ID1 ID2 ID3 ID5
  1   2   3   1
> count <- x[ids]  # index using the names in the string
> count
ids
ID1 ID2 ID2 ID3 ID3 ID3 ID5
  1   2   2   3   3   3   1
>

On 1/24/06, Laetitia Marisa <Laetitia.Marisa@cgm.cnrs-gif.fr> wrote:
> Is there a simple and fast function that returns a vector of the number
> of replications for each object of a vector ?
> For example :
> I have a vector of IDs :
> ids <- c( "ID1", "ID2", "ID2", "ID3", "ID3","ID3", "ID5")
>
> I want the function returns the following vector where each term is the
> number of replicates for the given id :
> c( 1, 2, 2, 3,3,3,1 )

--
Jim Holtman
Cincinnati, OH
+1 513 247 0281

What is the problem you are trying to solve?
Ah. It's a bit more complicated than just table(), because you want the result to be the same length.

tt <- table(id)
tt[match(id,names(tt))]

	-thomas

On Tue, 24 Jan 2006, Laetitia Marisa wrote:
> Is there a simple and fast function that returns a vector of the number
> of replications for each object of a vector ?
> For example :
> I have a vector of IDs :
> ids <- c( "ID1", "ID2", "ID2", "ID3", "ID3","ID3", "ID5")
>
> I want the function returns the following vector where each term is the
> number of replicates for the given id :
> c( 1, 2, 2, 3,3,3,1 )

Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle
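The explicit match() may matter when the IDs are not character strings. With numeric IDs, tt[id] would treat the values as positions in the table rather than as names, whereas match() compares each value with the table's names. A small sketch with invented numeric IDs:

```r
# Hypothetical numeric IDs (not from the original post)
id <- c(10, 10, 20)

tt <- table(id)            # names are "10" and "20", counts 2 and 1

# tt[id] would index positions 10, 10 and 20 here, not the names "10"/"20".
# match() finds the position of each ID among the table's names instead.
pos <- match(id, names(tt))
counts <- as.vector(tt[pos])

print(pos)     # 1 1 2
print(counts)  # 2 2 1
```

For character IDs, as in the original question, tt[ids] and tt[match(ids, names(tt))] should give the same counts.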
There's an even faster one, which nobody seems to have mentioned yet:

rep(l <- rle(ids)$lengths, l)

Timing on my 2.8GHz NetBSD system shows:

> length(ids)
[1] 45150
> # Gabor:
> system.time(for (i in 1:100) ave(as.numeric(factor(ids)), ids, FUN = length))
[1] 3.45 0.06 3.54 0.00 0.00
> # Barry (and others I think):
> system.time(for (i in 1:100) table(ids)[ids])
[1] 2.13 0.05 2.20 0.00 0.00
> # Me:
> system.time(for (i in 1:100) rep(l <- rle(ids)$lengths, l))
[1] 1.60 0.00 1.62 0.00 0.00

Of course the difference between 21 milliseconds and 16 milliseconds per call is not great, unless you are doing this a lot.

Ray Brownrigg

> From: Gabor Grothendieck <ggrothendieck at gmail.com>
>
> Nice. I timed it and it's much faster than mine too.
>
> On 1/24/06, Barry Rowlingson <B.Rowlingson at lancaster.ac.uk> wrote:
> > One-liner:
> >
> > > table(ids)[ids]
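To repeat this kind of comparison on your own machine, something like the sketch below could be used. The simulated IDs, the helper names (by_table, by_rle, by_ave) and the iteration count are all assumptions made for illustration, and timings will differ from the figures above. Note that rle() counts runs of adjacent equal values, so the rle() form gives per-ID counts only when equal IDs sit next to each other, which is why the simulated vector is sorted.

```r
set.seed(1)

# Simulated, sorted IDs of roughly the size mentioned in the thread
ids <- sort(sample(sprintf("ID%05d", 1:15000), 45150, replace = TRUE))

# Three of the approaches from the thread, wrapped as hypothetical helpers
by_table <- function(ids) as.vector(table(ids)[ids])
by_rle   <- function(ids) { l <- rle(ids)$lengths; rep(l, l) }
by_ave   <- function(ids) ave(seq_along(ids), ids, FUN = length)

# On sorted input all three should agree
stopifnot(all(by_table(ids) == by_rle(ids)),
          all(by_ave(ids) == by_rle(ids)))

# Time a few repetitions of each (results depend on the machine)
print(system.time(for (i in 1:10) by_table(ids)))
print(system.time(for (i in 1:10) by_rle(ids)))
print(system.time(for (i in 1:10) by_ave(ids)))

# On unsorted input the rle() form counts runs, not IDs
unsorted <- c("a", "b", "a")
print(by_rle(unsorted))    # 1 1 1
print(by_table(unsorted))  # 2 1 2
```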