thr3ads.net - R help - [R] Inexplicably different results using subset vs bracket notation on logical variable [Aug 2012]


Mauricio Cornejo

2012-Aug-27 22:08 UTC

[R] Inexplicably different results using subset vs bracket notation on logical variable

Hi,

Would anyone have any idea as to why I would obtain completely different results
when subsetting using the subset function vs bracket notation?

I have a data frame with 65 variables and 4382 rows. When I execute the
following subset command I get the correct results (125 rows):

> subset(df, Renewal==TRUE, 1:2)

However, I tried to obtain the same results with bracket notation as follows.
The output gave me all the rows in the data frame and not just the subset of 125
I was looking for.

> df[df$Renewal==TRUE, 1:2]

The 'Renewal' variable is of logical type and is the last (65th)
variable in the data frame.  However, values are either TRUE or NA (there are no
'FALSE' values).

My attempts at replicating this with a small dummy data set, for including here,
have not worked (i.e. I don't get an error when I use synthetic data).  Any
ideas on what could be going on?

Many thanks for any insights anyone may have,
Mauricio


William Dunlap

2012-Aug-28 03:02 UTC


subset(dataFrame, subset) does the equivalent of
dataFrame[!is.na(subset) & subset,].
I.e., it treats the NA's in the subset argument the same as FALSEs.
Doesn't help(subset) mention this?
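
A minimal sketch of that equivalence on made-up data (the data frame `d` and its `id` column are illustrative assumptions, not from the original post):

```r
# Hypothetical toy data: Renewal holds TRUE, FALSE and NA
d <- data.frame(id = 1:6,
                Renewal = c(TRUE, NA, FALSE, TRUE, NA, TRUE))

by_subset  <- subset(d, Renewal)
by_bracket <- d[!is.na(d$Renewal) & d$Renewal, ]

# Both keep only the rows where Renewal is TRUE (ids 1, 4 and 6)
by_subset$id
by_bracket$id
```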

By the way, if Renewal is a logical vector, it will be identical to
Renewal==TRUE, so you may as well leave off the "==TRUE".
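
This is easy to check on a small logical vector (the vector `x` is a made-up example):

```r
# Comparing a logical vector with TRUE changes nothing, NAs included,
# because NA == TRUE is itself NA
x <- c(TRUE, NA, FALSE, TRUE)
same <- identical(x, x == TRUE)
same
```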

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


David Winsemius

2012-Aug-28 03:11 UTC


On Aug 27, 2012, at 5:08 PM, Mauricio Cornejo wrote:
> Hi,
>
> Would anyone have any idea as to why I would obtain completely  
> different results when subsetting using the subset function vs  
> bracket notation?
>
> I have a data frame with 65 variables and 4382 rows. When I use  
> execute the following subset command I get the correct results (125  
> rows)
>> subset(df, Renewal==TRUE, 1:2)
>
>
> However, I tried to obtain the same results with bracket notation as  
> follows.  The output gave me all the rows in the data frame and not  
> just the subset of 125 I was looking for.
>> df[df$Renewal==TRUE, 1:2]
>
> The 'Renewal' variable is of logical type and is the last (65th)  
> variable in the data frame.  However, values are either TRUE or NA  
> (there are no 'FALSE' values).
That's exactly it. If a logical index is NA, "[" extraction does not drop
that position: it returns a row of NAs in its place. You can correct what
I consider a failing and others consider a feature with:

df[df$Renewal==TRUE & !is.na(df$Renewal), 1:2]
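
Two other common spellings should drop the NA positions as well. A sketch on made-up data (the data frame `d` is hypothetical): which() returns only the positions where the condition is TRUE, and %in% never returns NA.

```r
# Hypothetical toy data with only TRUE and NA in Renewal
d <- data.frame(id = 1:5, Renewal = c(TRUE, NA, TRUE, NA, NA))

explicit  <- d[d$Renewal == TRUE & !is.na(d$Renewal), ]
via_which <- d[which(d$Renewal), ]
via_in    <- d[d$Renewal %in% TRUE, ]

# All three keep the same rows (ids 1 and 3)
explicit$id
via_which$id
via_in$id
```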
>
> My attempts at replicating this with a small dummy data set, for  
> including here, have not worked (i.e. I don't get an error when I  
> use synthetic data).  Any ideas on what could be going on?
You _should_ get the predicted behavior. Perhaps your test case was  
flawed?

 > dat <- data.frame(test1=1, Renewal=as.logical(sample(c(0,1,NA), 20, replace=TRUE)))
 > dat[dat$Renewal==TRUE, ]
      test1 Renewal
NA      NA      NA
NA.1    NA      NA
3        1    TRUE
NA.2    NA      NA
NA.3    NA      NA
6        1    TRUE
7        1    TRUE
8        1    TRUE
NA.4    NA      NA
12       1    TRUE
NA.5    NA      NA
NA.6    NA      NA
16       1    TRUE
17       1    TRUE
NA.7    NA      NA
NA.8    NA      NA

This is all described in ?"["
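
When the column holds only TRUE and NA, as in the original question, the same rule would explain why the bracket result has as many rows as the whole data frame. A small sketch with made-up values (the data frame `d` and its `id` column are assumptions for illustration):

```r
# Hypothetical column with only TRUE and NA, no FALSE
d <- data.frame(id = 1:8,
                Renewal = c(TRUE, NA, NA, TRUE, NA, NA, NA, TRUE))

res <- d[d$Renewal == TRUE, ]

nrow(res)                 # same as nrow(d): no position is FALSE, so none is dropped
sum(is.na(res$id))        # the NA-indexed positions come back as all-NA rows
nrow(subset(d, Renewal))  # subset() keeps only the TRUE rows
```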

-- 

David Winsemius, MD
Alameda, CA, USA
