Simply put, I want to subset the data frame 'a' where 'y=0'.

> a <- as.data.frame(cbind(x=1:10, y=c(1,0,NA,1,0,NA,NA,1,1,0)))
> a
    x  y
1   1  1
2   2  0
3   3 NA
4   4  1
5   5  0
6   6 NA
7   7 NA
8   8  1
9   9  1
10 10  0
> names(a)
[1] "x" "y"
> table(a$y)

0 1
3 4
> table(a$y, useNA="always")

   0    1 <NA>
   3    4    3
> b <- a[a$y==0,]
> b
      x  y
2     2  0
NA   NA NA
5     5  0
NA.1 NA NA
NA.2 NA NA
10   10  0
> is(a$y)
[1] "numeric" "vector"

Instead of only pulling the rows where a$y==0, I'm getting the rows where it's 0 OR NA. Again, I feel like either something was changed when I wasn't looking, or I'm reaaaaaaly forgetting something important.

Thanks,
Robin Jeffries
MS, DrPH Candidate
Department of Biostatistics, UCLA
530-633-STAT(7828)
rjeffries@ucla.edu
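To see where the extra rows come from, here is a minimal base-R sketch (the names idx, n_true and n_na are only illustrative). It rebuilds 'a', splits the index a$y == 0 into its TRUE and NA entries, and checks that b gets one all-NA row for every missing y.

```r
# Rebuild the example data frame from the question
a <- as.data.frame(cbind(x = 1:10, y = c(1, 0, NA, 1, 0, NA, NA, 1, 1, 0)))

# The index used in a[a$y == 0, ]: comparing NA with 0 gives NA, not FALSE
idx <- a$y == 0

n_true <- sum(idx, na.rm = TRUE)  # positions where y is 0
n_na   <- sum(is.na(idx))         # positions where y is missing

b <- a[idx, ]

# Each NA in the index contributes one all-NA row, named "NA", "NA.1", ...
cat("TRUE entries:", n_true, "| NA entries:", n_na, "| rows in b:", nrow(b), "\n")
print(rownames(b))
```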
Hello,
From the help page for
?`==`
Note
Do not use == and != for tests, such as in if expressions, where
you must get a single TRUE or FALSE. Unless you are absolutely sure
that nothing unusual can happen, you should use the identical
function instead.
inx <- sapply(a$y, identical, 0)
inx
[1] FALSE TRUE FALSE FALSE TRUE FALSE FALSE FALSE FALSE TRUE
a[inx, ]
    x y
2   2 0
5   5 0
10 10 0
Or, I think better because it's simpler:
not.na <- !is.na(a$y)
a[not.na & a$y == 0, ]
    x y
2   2 0
5   5 0
10 10 0
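A sketch that checks, on the same example data, that the two indexes select the same rows. One caveat: identical() compares storage type as well as value, so identical(0L, 0) is FALSE. Here y is double, because cbind() coerced both columns into one numeric matrix. If y were an integer column, the sapply() version would match nothing unless the target were written 0L. The y_int vector below is a made-up illustration of that case, not data from the question.

```r
a <- as.data.frame(cbind(x = 1:10, y = c(1, 0, NA, 1, 0, NA, NA, 1, 1, 0)))

# Approach 1: element-wise identical(), which always returns TRUE or FALSE
inx <- sapply(a$y, identical, 0)

# Approach 2: mask missing values first; FALSE & NA is FALSE, so NAs drop out
not.na <- !is.na(a$y)
inx2 <- not.na & a$y == 0

# Neither index contains NA, and both pick rows 2, 5 and 10
print(which(inx))
print(identical(inx, inx2))

# Hypothetical integer column: identical() is type-strict
y_int <- c(1L, 0L, NA)
print(sapply(y_int, identical, 0))   # double 0 never matches an integer element
print(sapply(y_int, identical, 0L))  # integer target matches the second element
```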
Hope this helps,
Rui Barradas
On 15-08-2012 21:06, Robin Jeffries wrote:
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
It makes sense if you think it through. Your index vector is
a$y==0
[1] FALSE TRUE NA FALSE TRUE NA NA FALSE FALSE TRUE
and ?"[" says
NAs in indexing:
When extracting, a numerical, logical or character 'NA' index
picks an unknown element and so returns 'NA' in the corresponding
element of a logical, integer, numeric, complex or character
result, and 'NULL' for a list. (It returns '00' for a raw
result.)
so this is what one has to expect. Here are a couple of alternatives for
getting what you want.
a[which(a$y==0),]
a[a$y %in% 0,]
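A minimal sketch of why both alternatives avoid the NA rows, using the same example data: which() returns the positions of the TRUE entries and silently skips NA, while x %in% table is defined as match(x, table, nomatch = 0) > 0, so a missing y simply counts as no match.

```r
a <- as.data.frame(cbind(x = 1:10, y = c(1, 0, NA, 1, 0, NA, NA, 1, 1, 0)))

# which() keeps only the positions where the comparison is TRUE
pos <- which(a$y == 0)   # integer positions; NA entries are dropped

# %in% returns FALSE (never NA) for a missing y
hit <- a$y %in% 0

r1 <- a[pos, ]
r2 <- a[hit, ]
print(r1)
print(r2)
```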
Best,
Ista
On Wed, Aug 15, 2012 at 4:06 PM, Robin Jeffries <rjeffries at ucla.edu> wrote:
Hi,
Try this:
subset(a,y==0)
#    x y
#2   2 0
#5   5 0
#10 10 0
#or
subset(a,y%in%0)
#    x y
#2   2 0
#5   5 0
#10 10 0
A.K.

----- Original Message -----
From: Robin Jeffries <rjeffries at ucla.edu>
To: r-help at r-project.org
Sent: Wednesday, August 15, 2012 4:06 PM
Subject: [R] Subsetting with missing data
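A short sketch of why the subset() calls return just the three rows. The help page for subset() says that missing values in the row condition are taken as false, so the NA comparisons are dropped rather than turned into all-NA rows. The same page describes subset() as a convenience function intended for interactive use; inside functions, the bracket forms with which() or %in% are generally the safer choice.

```r
a <- as.data.frame(cbind(x = 1:10, y = c(1, 0, NA, 1, 0, NA, NA, 1, 1, 0)))

# subset() evaluates y == 0 inside a and treats NA results as FALSE
s1 <- subset(a, y == 0)

# %in% never yields NA, so this gives the same rows
s2 <- subset(a, y %in% 0)

print(s1)
```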