I have a dataframe with a column, say "x", consisting of values, each value appearing a different number of times, e.g.
x: 1,1,1,1,2,2,4,4,4,9,10,10,10,10,10 ...
and a vector containing, e.g.:
y: 2,9,10,...
I need a subset of the dataframe: all rows where x is equal to one of the values in y. Currently I use a loop for this, but because x and y are large this is very slow. Is there a faster way to solve this problem?

Thank you,
Bernhard
On 2/8/2006 9:21 AM, Bernhard Baumgartner wrote:
> I have a dataframe with a column, say "x" consisting of values, each
> value appearing different times, e.g.
> x: 1,1,1,1,2,2,4,4,4,9,10,10,10,10,10 ...
> and a vector, including e.g.:
> y: 2,9,10,...
> I need a subset of the dataframe: all rows where x is equal to one of
> the values in y. Currently I use a loop for this, but because x and y
> are large this is very slow.
> Is there any idea how to solve this problem faster?

It's actually very easy. Assume your dataframe is df, then

    subset(df, x %in% y)

will give you what you want (assuming there is no column y in the dataframe).

Duncan Murdoch

> Thank you,
> Bernhard
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
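The caveat about a column named y can be illustrated with a small sketch. The toy values come from the question; the second data frame, df2, and the names res, masked and explicit are hypothetical and only serve to show what may happen when the data frame has its own y column.

```r
# Toy data built from the example values in the question
df <- data.frame(x = c(1, 1, 1, 1, 2, 2, 4, 4, 4, 9, 10, 10, 10, 10, 10))
y <- c(2, 9, 10)

# df has no column y, so subset() finds y in the calling environment
res <- subset(df, x %in% y)

# Hypothetical case: the data frame carries its own column named y.
# subset() evaluates the condition inside the data frame first,
# so the column y hides the external vector y.
df2 <- data.frame(x = c(1, 2, 9), y = c(9, 9, 9))
masked <- subset(df2, x %in% y)                   # compares x with the column y
explicit <- df2[df2$x %in% y, , drop = FALSE]     # compares x with the vector y
```

Indexing with `df2$x %in% y` is one way to sidestep the name clash, since `y` is then looked up outside the data frame.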
Bernhard Baumgartner wrote:
> I need a subset of the dataframe: all rows where x is equal to one of
> the values in y. Currently I use a loop for this, but because x and y
> are large this is very slow.
> Is there any idea how to solve this problem faster?

    mydata <- data.frame(X = sample(1:10, 10000, replace=TRUE),
                         Y = sample(c(2,9,10), 10000, replace=TRUE))

    newdata <- mydata[mydata$X %in% unique(mydata$Y),]

    ?"%in%"

-- 
Chuck Cleland, Ph.D.
NDRI, Inc.
Sounds like you may need to use match().

On Wed, 2006-02-08 at 15:21 +0100, Bernhard Baumgartner wrote:
> I need a subset of the dataframe: all rows where x is equal to one of
> the values in y. Currently I use a loop for this, but because x and y
> are large this is very slow.
> Is there any idea how to solve this problem faster?
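The match() suggestion comes without code, so here is a minimal sketch of how it might be applied, assuming the toy values from the question. The names df, pos, keep, res and same are illustrative and not part of the original post.

```r
# Toy data built from the example values in the question
df <- data.frame(x = c(1, 1, 1, 1, 2, 2, 4, 4, 4, 9, 10, 10, 10, 10, 10))
y <- c(2, 9, 10)

# match() returns, for each element of df$x, its position in y,
# or NA when the value does not occur in y
pos <- match(df$x, y)

# Keep the rows that were matched
keep <- !is.na(pos)
res <- df[keep, , drop = FALSE]

# %in% is built on match(), so both give the same logical vector
same <- identical(keep, df$x %in% y)
```

match() is mainly worth the extra step when the position in y is needed as well, for example to look up a second vector aligned with y.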
Hi

something like

    xx <- data.frame(x = sample(1:10, 100, replace = TRUE))
    y <- c(2, 5, 8)
    # drop = FALSE keeps the result a data frame although xx has only one column
    xx[xx$x %in% y, , drop = FALSE]

HTH
Petr

On 8 Feb 2006 at 15:21, Bernhard Baumgartner wrote:
> I need a subset of the dataframe: all rows where x is equal to one of
> the values in y. Currently I use a loop for this, but because x and y
> are large this is very slow. Is there any idea how to solve this
> problem faster?

Petr Pikal
petr.pikal at precheza.cz
Thanks to all, the %in% function solved my problem! Bernhard
Here's one way,

    x <- data.frame(V=c(1,1,1,1,2,2,4,4,4,9,10,10,10,10,10))
    y <- data.frame(V=c(2,9,10))
    xy <- merge(x,y,all=FALSE)

Pay close attention to what happens if you have duplicate values in y, say

    y <- data.frame(V=c(2,9,10,10))

> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of
> Bernhard Baumgartner
> Sent: Wednesday, February 08, 2006 9:22 AM
> To: r-help at stat.math.ethz.ch
> Subject: [R] dataframe subset
>
> I need a subset of the dataframe: all rows where x is equal to one of
> the values in y. Currently I use a loop for this, but because x and y
> are large this is very slow.
> Is there any idea how to solve this problem faster?
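The warning about duplicate values in y can be made concrete with a short sketch that reuses the data from the message above. The names y_dup, n_unique, n_dup, n_fixed and n_in are illustrative additions, and removing duplicates with unique() is only one possible remedy.

```r
# Data as in the merge() example above
x <- data.frame(V = c(1, 1, 1, 1, 2, 2, 4, 4, 4, 9, 10, 10, 10, 10, 10))
y <- data.frame(V = c(2, 9, 10))
y_dup <- data.frame(V = c(2, 9, 10, 10))   # 10 appears twice

# With unique keys in y, merge() returns each matching row of x once
n_unique <- nrow(merge(x, y, all = FALSE))

# With a duplicated key, every row of x with V == 10 is paired with
# both copies of 10 in y_dup, so those rows appear twice
n_dup <- nrow(merge(x, y_dup, all = FALSE))

# Dropping duplicates from y first restores the expected subset
n_fixed <- nrow(merge(x, unique(y_dup), all = FALSE))

# Indexing with %in% is not affected by duplicates in y
n_in <- nrow(x[x$V %in% y_dup$V, , drop = FALSE])
```

Here n_unique, n_fixed and n_in are all 8, while n_dup is 13, because the five rows with V equal to 10 are each returned twice.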