thr3ads.net - R help - [R] dataframe subset [Feb 2006]

Bernhard Baumgartner

2006-Feb-08 14:21 UTC

[R] dataframe subset

I have a dataframe with a column, say "x", consisting of values, each 
value appearing a varying number of times, e.g.
x: 1,1,1,1,2,2,4,4,4,9,10,10,10,10,10 ...
and a vector containing, e.g.:
y: 2,9,10,...
I need a subset of the dataframe: all rows where x is equal to one of 
the values in y. Currently I use a loop for this, but because x and y 
are large this is very slow. 
Is there a faster way to solve this problem?
Thank you,
Bernhard

Duncan Murdoch

2006-Feb-08 14:47 UTC


It's actually very easy.  Assume your dataframe is df, then

subset(df, x %in% y)

will give you what you want (assuming there is no column y in the 
dataframe).
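
A minimal sketch of the caveat above, using made-up data: the extra column id and the modified copy df2 are assumptions added for illustration, not part of the original question. subset() evaluates its condition inside the data frame first, so a column named y would be used in place of the vector y.

```r
# Hypothetical example data; df, x and y follow the names used in the thread.
df <- data.frame(x = c(1, 1, 1, 1, 2, 2, 4, 4, 4, 9, 10, 10, 10, 10, 10),
                 id = 1:15)
y <- c(2, 9, 10)

# df has no column y, so subset() finds the vector y in the calling environment.
res <- subset(df, x %in% y)

# Now give a copy of the data frame its own column named y.
df2 <- df
df2$y <- 0

# Inside subset(), y now refers to the column (all zeros), so nothing matches.
res_clash <- subset(df2, x %in% y)

# Plain indexing evaluates y outside the data frame and avoids the name clash.
res_safe <- df2[df2$x %in% y, ]
```

Here res and res_safe keep the rows where x is 2, 9 or 10, while res_clash is empty.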

Duncan Murdoch

Chuck Cleland

2006-Feb-08 14:48 UTC


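# Example data: X holds the values to filter, Y holds the values to keep
mydata <- data.frame(X = sample(1:10, 10000, replace=TRUE),
                     Y = sample(c(2,9,10), 10000, replace=TRUE))

# Keep only the rows whose X appears among the values in Y
newdata <- mydata[mydata$X %in% unique(mydata$Y),]

# Help page for the operator
?"%in%"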

-- 
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 452-1424 (M, W, F)
fax: (917) 438-0894

Adaikalavan Ramasamy

2006-Feb-08 15:06 UTC


Sounds like you may need to use match().
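
A possible sketch of the match() route, using the sample values from the question (the names df, pos and keep are chosen here for illustration). match() returns, for each x, its position in y, or NA when x does not occur in y; %in% is documented in base R as a thin wrapper around match(), so both give the same rows.

```r
# Sample values from the question, wrapped in a one-column data frame.
df <- data.frame(x = c(1, 1, 1, 1, 2, 2, 4, 4, 4, 9, 10, 10, 10, 10, 10))
y <- c(2, 9, 10)

# Position of each x value in y; NA where the value is not in y.
pos <- match(df$x, y)

# Rows with a non-NA position are the ones to keep.
keep <- !is.na(pos)

# drop = FALSE keeps the one-column result a data frame.
res <- df[keep, , drop = FALSE]

# The logical vector is the same one %in% produces.
same_as_in <- identical(keep, df$x %in% y)
```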


Petr Pikal

2006-Feb-08 15:07 UTC


Hi

something like

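xx <- data.frame(x = sample(1:10, 100, replace = TRUE))
y <- c(2, 5, 8)
# drop = FALSE keeps the one-column result a data frame instead of a vector
xx[xx$x %in% y, , drop = FALSE]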

HTH
Petr



Petr Pikal
petr.pikal at precheza.cz

Bernhard Baumgartner

2006-Feb-08 15:17 UTC


Thanks to all,

the %in% function solved my problem!

Bernhard

bogdan romocea

2006-Feb-08 15:19 UTC


Here's one way, using merge():
  x <- data.frame(V=c(1,1,1,1,2,2,4,4,4,9,10,10,10,10,10))
  y <- data.frame(V=c(2,9,10))
  xy <- merge(x,y,all=FALSE)
Pay close attention to what happens if you have duplicate values in y, say
  y <- data.frame(V=c(2,9,10,10))
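
To make the warning about duplicates concrete, here is a small sketch built on the same values (the names y_dup, xy_dup, xy_fixed and x_in are introduced here for illustration). merge() pairs every matching row of x with every matching row of y, so a value that appears twice in y doubles the matching rows of x.

```r
x <- data.frame(V = c(1, 1, 1, 1, 2, 2, 4, 4, 4, 9, 10, 10, 10, 10, 10))
y <- data.frame(V = c(2, 9, 10))

# Without duplicates in y, each matching row of x appears once.
xy <- merge(x, y, all = FALSE)
nrow(xy)      # 8

# With 10 listed twice in y, each of the five 10s in x is returned twice.
y_dup <- data.frame(V = c(2, 9, 10, 10))
xy_dup <- merge(x, y_dup, all = FALSE)
nrow(xy_dup)  # 13

# Two ways to avoid the inflation: de-duplicate y first, or filter with %in%.
xy_fixed <- merge(x, unique(y_dup), all = FALSE)
x_in <- x[x$V %in% y_dup$V, , drop = FALSE]
```

Both xy_fixed and x_in contain the same 8 rows as xy.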

