thr3ads.net - R help - [R] Subsetting multiple rows of a data frame at once [Jul 2013]


arun

2013-Jul-03 11:37 UTC

[R] Subsetting multiple rows of a data frame at once

Hi,
Try this:

set.seed(24)
df<- data.frame(x=sample(seq(0.25,4.25,by=.05),1e5,replace=TRUE),y=
sample(seq(0.10,1.05,by=.05),1e5,replace=TRUE),z=rnorm(1e5))

#Used a shorter vector 
x1<- c(1.05,2.85,3.40,4.25,0.25)
y1<- c(0.25,0.10,0.90,0.25,1.05)

res<-do.call(rbind,lapply(seq_along(x1),function(i)
subset(df,x==x1[i]&y==y1[i])))
head(res,2)
#        x    y          z
#466  1.05 0.25  0.7865224
#4119 1.05 0.25 -1.5679096
tail(res,2)
#         x    y          z
#98120 0.25 1.05 -2.1239596
#98178 0.25 1.05  0.3321464


A.K.
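
The loop above builds one subset per (x, y) pair and stacks them with rbind. As a rough alternative sketch, the same exact-equality match can be collected into a single logical index, which returns the rows in their original order rather than grouped by pair. The names keep and res2 are assumptions introduced here, not something from the thread, and the comparison is still an exact ==, so it inherits any floating-point mismatches in the data.

```r
set.seed(24)
df <- data.frame(x = sample(seq(0.25, 4.25, by = 0.05), 1e5, replace = TRUE),
                 y = sample(seq(0.10, 1.05, by = 0.05), 1e5, replace = TRUE),
                 z = rnorm(1e5))

x1 <- c(1.05, 2.85, 3.40, 4.25, 0.25)
y1 <- c(0.25, 0.10, 0.90, 0.25, 1.05)

# Loop version from the reply: one subset per pair, stacked with rbind.
res <- do.call(rbind, lapply(seq_along(x1), function(i)
  subset(df, x == x1[i] & y == y1[i])))

# Sketch: one logical vector per wanted pair, combined with OR.
# 'keep' and 'res2' are hypothetical names, not from the thread.
keep <- Reduce(`|`, Map(function(a, b) df$x == a & df$y == b, x1, y1))
res2 <- df[keep, ]
```

Both versions should select the same rows; only the row order of the result differs.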

Hi Everyone, 

First-time poster, so if there are any posting rules I should know about,
feel free to advise.

I've got a data frame of 250,000 rows with columns x, y and z. 

I need to extract 20-30 rows from the data frame with specific x
and y values, so that I can find the corresponding z value. There 
is no repeated data. (It's actually 250,000 squares in a 5x5m grid.) 

To find them individually I can use subset successfully: 

result<-subset(df,x==1.05 & y==0.25) 

This gives me the row in the data frame with that x and y value. 

So if I have 

x = 1.05 2.85 3.40 4.25 0.25 3.05 3.70 0.20 0.30 0.70 1.05 1.20 
1.40 1.90 2.70 3.25 3.55 4.60 2.05 2.15 3.70 4.85 4.90 1.60 2.45 3.20 
3.90 4.45 

and 

y= 0.25 0.10 0.90 0.25 1.05 1.70 2.05 2.90 2.35 2.60 2.55 2.15 
2.75 2.05 2.70 2.25 2.55 2.05 3.65 3.05 3.00 3.50 3.75 4.85 4.50 4.50 
3.35 4.90 

then how can I retrieve the rows for all those values at once? 

If I name the x vector xt and the y vector yt, and then run 

result<-subset(df,x==xt & y==yt) 

then I get 

result 
[1] x      y      Height 
<0 rows> (or 0-length row.names) 

I don't understand why zero rows are selected. Obviously I'm 
applying the vectors inappropriately, but I can't seem to find anything 
on this method of subsetting online. 

Thanks for any replies!
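
A likely explanation for the empty result, sketched on toy data: == compares element by element and recycles the shorter vector, so x==xt tests row 1 against xt[1], row 2 against xt[2], and so on, rather than testing every row against every wanted value. The data frame d and all values below are made up for illustration and are not from the thread.

```r
# Toy data (hypothetical): four rows, two wanted (x, y) pairs.
d  <- data.frame(x = c(1, 2, 3, 4), y = c(10, 20, 30, 40), z = c(0.1, 0.2, 0.3, 0.4))
xt <- c(3, 1)
yt <- c(30, 10)

# xt is recycled to c(3, 1, 3, 1) and compared position by position.
d$x == xt
# FALSE FALSE  TRUE FALSE

# Only row 3 survives, and only because it happens to line up with xt[1];
# the wanted pair (1, 10) in row 1 is missed.
res_bad <- subset(d, x == xt & y == yt)

# Testing each wanted pair against every row finds both rows.
hits <- (d$x == xt[1] & d$y == yt[1]) | (d$x == xt[2] & d$y == yt[2])
res_ok <- d[hits, ]
```

With a large data frame and a short vector of wanted values, a positional coincidence like row 3 here becomes unlikely, which would be consistent with getting zero rows.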

arun

2013-Jul-04 04:22 UTC


[R] Subsetting multiple rows of a data frame at once

Hi,
Possibly R FAQ 7.31 ("Why doesn't R think these numbers are equal?"), i.e. a floating-point comparison issue.
Using the same example:
set.seed(24)
df<- data.frame(x=sample(seq(0.25,4.25,by=.05),1e5,replace=TRUE),y=
sample(seq(0.10,1.05,by=.05),1e5,replace=TRUE),z=rnorm(1e5))
dfOld<- df
df[,1:2]<- lapply(df[,1:2],function(x) sprintf("%.2f",x))
x1<- c(1.05,2.85,3.40,4.25,0.25)
y1<- c(0.25,0.10,0.90,0.25,1.05) 
x1New<-sprintf("%.2f",x1)
y1New<- sprintf("%.2f",y1)
res1<-do.call(rbind,lapply(seq_along(x1New),function(i)
subset(df,x==x1New[i]&y==y1New[i])))

res<-do.call(rbind,lapply(seq_along(x1),function(i)
subset(dfOld,x==x1[i]&y==y1[i])))
dim(res1)
#[1] 318   3
dim(res)
#[1] 250   3
res1[,1:2]<- lapply(res1[,1:2],as.numeric)
str(res1)
#'data.frame':    318 obs. of  3 variables:
# $ x: num  1.05 1.05 1.05 1.05 1.05 1.05 1.05 1.05 1.05 1.05 ...
# $ y: num  0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 ...
# $ z: num  0.787 -1.568 -1.626 -0.221 -0.7 ...
A.K.
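
The sprintf() workaround above matches on formatted strings. As a sketch of the same idea without the round trip through character, the coordinates can be compared as integer hundredths. The helper to_key and the names kx, ky, keep and exact are assumptions introduced here; the factor 100 assumes the grid values have at most two decimals, as in the example data.

```r
# Grid values produced by seq() can differ from typed literals in the last bits.
s <- seq(0, 1, by = 0.1)
s[4] == 0.3                    # FALSE on typical IEEE-754 platforms
isTRUE(all.equal(s[4], 0.3))   # TRUE: equal within a small tolerance

set.seed(24)
df <- data.frame(x = sample(seq(0.25, 4.25, by = 0.05), 1e5, replace = TRUE),
                 y = sample(seq(0.10, 1.05, by = 0.05), 1e5, replace = TRUE),
                 z = rnorm(1e5))
x1 <- c(1.05, 2.85, 3.40, 4.25, 0.25)
y1 <- c(0.25, 0.10, 0.90, 0.25, 1.05)

# Hypothetical helper: map two-decimal values to integer hundredths.
to_key <- function(v) as.integer(round(v * 100))

kx <- to_key(df$x)
ky <- to_key(df$y)

# Match each wanted pair on the integer keys instead of the raw doubles.
keep <- Reduce(`|`, Map(function(a, b) kx == to_key(a) & ky == to_key(b), x1, y1))
res_key <- df[keep, ]

# Exact == on the raw doubles, for comparison; it can only find fewer or equal rows.
exact <- Reduce(`|`, Map(function(a, b) df$x == a & df$y == b, x1, y1))
c(key_match = sum(keep), exact_match = sum(exact))
```

Keeping the columns numeric avoids the as.numeric() conversion step afterwards; a tolerance test such as abs(df$x - a) < 1e-8 would be another option.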


Never mind, that was an error on my part; I got it going. 

I have another issue: it leaves some values out. I've separately 
searched the df and they're definitely in there, so is there some sort 
of exclusion rule? About 8 of the 28 are missing; the first missing 
row is (3.05, 1.70). I looked up the documentation for subset but I 
can't see why it would skip some. 

thanks 


