Mauricio Cornejo
2012-Aug-27 22:08 UTC
[R] Inexplicably different results using subset vs bracket notation on logical variable
Hi,

Would anyone have any idea why I would obtain completely different results when subsetting with the subset function vs. bracket notation?

I have a data frame with 65 variables and 4382 rows. When I execute the following subset command I get the correct results (125 rows):

> subset(df, Renewal==TRUE, 1:2)

However, I tried to obtain the same results with bracket notation as follows. The output gave me all the rows in the data frame and not just the subset of 125 I was looking for.

> df[df$Renewal==TRUE, 1:2]

The 'Renewal' variable is of logical type and is the last (65th) variable in the data frame. However, values are either TRUE or NA (there are no FALSE values).

My attempts at replicating this with a small dummy data set, for including here, have not worked (i.e. I don't get an error when I use synthetic data). Any ideas on what could be going on?

Many thanks for any insights anyone may have,
Mauricio
William Dunlap
2012-Aug-28 03:02 UTC
[R] Inexplicably different results using subset vs bracket notation on logical variable
subset(dataFrame, subset) does the equivalent of dataFrame[!is.na(subset) & subset,]. I.e., it treats the NAs in the subset argument the same as FALSEs. Doesn't help(subset) mention this?

By the way, if Renewal is a logical vector, it will be identical to Renewal==TRUE, so you may as well leave off the "==TRUE".

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
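The equivalence described above can be checked on a small example. This is only a sketch: the data frame toy and its columns are made up for illustration and are not the poster's data.

```r
# Hypothetical toy data: Renewal is TRUE or NA, as in the original question.
toy <- data.frame(id = 1:6,
                  Renewal = c(TRUE, NA, NA, TRUE, NA, TRUE))

# subset() treats NA in the condition as FALSE, so only the TRUE rows remain.
by_subset <- subset(toy, Renewal)

# Bracket indexing with the NAs masked out explicitly selects the same rows.
by_bracket <- toy[!is.na(toy$Renewal) & toy$Renewal, ]

# Bracket indexing without the mask returns one all-NA row per NA in the index.
unmasked <- toy[toy$Renewal, ]

nrow(by_subset)   # 3
nrow(by_bracket)  # 3
nrow(unmasked)    # 6
```

The unmasked result has as many rows as the original data frame only because every value here is TRUE or NA; with FALSE values present it would be shorter.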
David Winsemius
2012-Aug-28 03:11 UTC
[R] Inexplicably different results using subset vs bracket notation on logical variable
On Aug 27, 2012, at 5:08 PM, Mauricio Cornejo wrote:

> The 'Renewal' variable is of logical type and is the last (65th)
> variable in the data frame. However, values are either TRUE or NA
> (there are no 'FALSE' values).

That's exactly it. If a logical index is NA, "[" extraction does not drop that row; it returns a row filled with NAs in its place, so the output has one row for every TRUE and every NA. You can correct what I consider a failing and others consider a feature with:

df[df$Renewal==TRUE & !is.na(df$Renewal), 1:2]

> My attempts at replicating this with a small dummy data set, for
> including here, have not worked (i.e. I don't get an error when I
> use synthetic data). Any ideas on what could be going on?

You _should_ get the predicted behavior. Perhaps your test case was flawed?

> dat <- data.frame(test1=1, Renewal=as.logical( sample(c(0,1,NA), 20, repl=TRUE)))
> dat[dat$Renewal==TRUE, ]
     test1 Renewal
NA      NA      NA
NA.1    NA      NA
3        1    TRUE
NA.2    NA      NA
NA.3    NA      NA
6        1    TRUE
7        1    TRUE
8        1    TRUE
NA.4    NA      NA
12       1    TRUE
NA.5    NA      NA
NA.6    NA      NA
16       1    TRUE
17       1    TRUE
NA.7    NA      NA
NA.8    NA      NA

This is all described in ?"["

--
David Winsemius, MD
Alameda, CA, USA
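Another common way to avoid the all-NA rows, not mentioned in the replies above, is to convert the logical index to positions with which(). A minimal sketch, again on a made-up data frame (toy, id, and keep are illustrative names):

```r
# Hypothetical toy data, not the poster's data frame.
toy <- data.frame(id = 1:6,
                  Renewal = c(TRUE, NA, NA, TRUE, NA, TRUE))

# which() returns the positions of the TRUE values; NA and FALSE are dropped.
keep <- which(toy$Renewal)
keep                      # 1 4 6

# Indexing by position cannot produce all-NA rows.
renewed <- toy[keep, ]
nrow(renewed)             # 3

# Counting the NAs up front shows how many extra rows plain "[" would return.
sum(is.na(toy$Renewal))   # 3
```

Checking sum(is.na(...)) on the real column before subsetting is a quick way to tell whether a surprising row count comes from NAs in the index.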