To the R help list,

When using a data frame, there is no warning or error message
when I refer to a non-existent variable inside the data frame.

Example:

##----------------------------------------------

a <- c(1,2,3)
b <- c(11,22,33)
df <- data.frame(a,b)
df

## correct: there is a column in df named 'a'
## the sum is correctly performed
sum(df$a==2)

## incorrect: there is no column in df named 'aaa',
## but the sum is performed anyway without either warning or error
sum(df$aaa==2)

##----------------------------------------------

Is there some way to make R issue either a warning or an error
message in such a situation?

I am using R version 2.15.1 64-bit on Windows 7 Professional.

Thank you very much.

Paulo Barata

---------------------------------------------------------------------
Paulo Barata
ENSP - Fundação Oswaldo Cruz
Rua Leopoldo Bulhões 1480 - 8A
21041-210 Rio de Janeiro - RJ
Brazil
E-mail: paulo.barata at ensp.fiocruz.br

This seems more or less correct to me.

  > sum(df$a==1)
  [1] 1
  > sum(df$a==2)
  [1] 1
  > sum(df$aaa==2)
  [1] 0

There is no df$aaa, so df$aaa is NULL, the comparison df$aaa==2 has
length 0, and the sum of an empty vector is 0, which is what I think
you are asking about. What am I missing?

John Kane
Kingston ON Canada

> -----Original Message-----
> From: paulo.barata at ensp.fiocruz.br
> Sent: Sun, 15 Jul 2012 11:30:37 -0300
> To: r-help at r-project.org
> Subject: [R] variable (column) in a data frame
>
> When using a data frame, there is no warning or error message
> when I refer to a non-existent variable inside the data frame.
>
> Is there some way to make R issue either a warning or an error
> message in such a situation?
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

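
The result of 0 reported above can be traced one step at a time. The following is only a small sketch that rebuilds the data frame from the original post; the names step1, step2 and step3 are made up for illustration.

```r
# Rebuild the data frame from the original post.
df <- data.frame(a = c(1, 2, 3), b = c(11, 22, 33))

step1 <- df$aaa        # no column 'aaa': `$` returns NULL without complaint
step2 <- step1 == 2    # comparing NULL with a number gives a zero-length logical
step3 <- sum(step2)    # the sum of a zero-length vector is 0

print(is.null(step1))
print(length(step2))
print(step3)
```

Nothing in this chain raises a warning or an error, which is why the misspelled name goes unnoticed.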
Dr. Dalgaard,

Thank you. But pre-checking with is.null() or using with()
doesn't solve the problem of catching spelling mistakes
in the name of a variable inside a data frame, when using
the df$var notation often in a program.

Is there some way for R to behave, in relation to a variable
inside a data frame, the same way it behaves for a variable
not in a data frame? For example:

##----------------------------------------
a <- c(1,2,3)

## the variable exists, we get a correct answer
a==1

## the variable does not exist, R rightly points this out
aaa==1
##----------------------------------------

My point is, if we make a spelling mistake in a program when referring
to a variable inside a data frame, using the df$var notation,
there seems to be no way of getting warned about that.

Thank you once again.

Paulo Barata

---------------------------------------------------------------------

---------- Original Message -----------
From: peter dalgaard <pdalgd at gmail.com>
To: "Paulo Barata" <paulo.barata at ensp.fiocruz.br>
Sent: Sun, 15 Jul 2012 16:47:35 +0200
Subject: Re: [R] variable (column) in a data frame

> On Jul 15, 2012, at 16:30 , Paulo Barata wrote:
>
> > Is there some way to make R issue either a warning or an error
> > message in such a situation?
>
> You can pre-check for is.null(df$aaa) or use with(df, sum(aaa==2)).
>
> --
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
------- End of Original Message -------

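
One possible way to get the behaviour asked for here, without changing R itself, is to give selected data frames their own S3 method for `$`. The sketch below is an assumption-laden illustration, not something proposed in the thread: the class name strict_df and the helper as_strict() are invented for this example, and only reading with `$` is affected (assignment with `$<-` is untouched).

```r
# Hypothetical helper: tag a data frame so that `$` refuses unknown names.
as_strict <- function(df) {
  class(df) <- c("strict_df", class(df))
  df
}

# S3 method for `$` on the hypothetical class; `name` arrives as a string.
`$.strict_df` <- function(x, name) {
  if (!(name %in% names(x))) {
    stop("no column named '", name, "' in data frame", call. = FALSE)
  }
  x[[name]]  # exact match only, handled by the data frame method for `[[`
}

df <- as_strict(data.frame(a = c(1, 2, 3), b = c(11, 22, 33)))

# An existing column behaves as before.
ok <- sum(df$a == 2)

# A misspelled column now stops with an error instead of returning NULL.
bad <- tryCatch(sum(df$aaa == 2), error = function(e) conditionMessage(e))

print(ok)
print(bad)
```

The object is still a data frame for every other purpose, so the check can be switched on only in the scripts where misspelled names are a real risk.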
Hi,

I guess you can try this:

# You will get the same result here:
df$aaa==2
logical(0)
!df$aaa==2
logical(0)

# But it is different for a variable present in the data frame:
df$a==4
[1] FALSE FALSE FALSE
!df$a==4
[1] TRUE TRUE TRUE

identical(df$aaa==2,!df$aaa==2)
[1] TRUE
identical(df$a==4,!df$a==4)
[1] FALSE

A.K.

----- Original Message -----
From: Paulo Barata <paulo.barata at ensp.fiocruz.br>
To: r-help at r-project.org
Cc:
Sent: Sunday, July 15, 2012 10:30 AM
Subject: [R] variable (column) in a data frame

When using a data frame, there is no warning or error message
when I refer to a non-existent variable inside the data frame.

Is there some way to make R issue either a warning or an error
message in such a situation?

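
The observation above, that a missing column always yields a zero-length result, can be turned into an explicit guard. This is a minimal sketch under the assumption that every per-row computation should return exactly one value per row; check_rows() is a hypothetical helper, not an existing R function.

```r
df <- data.frame(a = c(1, 2, 3), b = c(11, 22, 33))

# Hypothetical helper: fail unless x has one value per row of df.
check_rows <- function(x, df) {
  if (length(x) != nrow(df)) {
    stop("expected ", nrow(df), " values, got ", length(x), call. = FALSE)
  }
  x
}

# Existing column: the comparison has 3 values, so the check passes.
ok <- sum(check_rows(df$a == 2, df))

# Missing column: the comparison has 0 values, so the check stops.
bad <- tryCatch(sum(check_rows(df$aaa == 2, df)),
                error = function(e) conditionMessage(e))

print(ok)
print(bad)
```

The guard would not catch a misspelling in a data frame with zero rows, since both lengths are then 0.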
On 2012-07-15 10:01, Paulo Barata wrote:
>
> Dear Peter,
>
> Thank you. I will try to modify my programming habits.
> But it seems there is a flaw in R, when it accepts a reference
> to a non-existent variable inside a data frame with the df$var
> notation. This should be corrected somehow.
>
> Paulo Barata
>

Paulo,

I understand your concerns and I do think that the "best" thing
would be to excise the $ shortcut from the language or, at least,
make y$x equivalent to y[["x", exact = TRUE]]. But, as has been
pointed out before, that might not be easy.

Nevertheless, even y[["x"]] may not be the ultimate panacea.
Consider your own example:

  df <- data.frame(a = 1:3, b=11:13)
  sum(df[["aaa"]] == 2)
  #[1] 0

which results from

  df[["aaa"]] == 2
  #logical(0)

The safest extraction is y[ , "x"]:

  sum(df[ , "aaa"] == 2)
  #Error in `[.data.frame`(df, , "aaa") : undefined columns selected

But then, this comes down to whether one thinks that addressing
a nonexistent variable should result in an error or should return
NULL.

The bottom line probably is that the $ behaviour will not change
in the near future and one would simply be well advised to be
aware of its behaviour. Every language has its quirks. Just be
thankful that the R language isn't as big a mess as the English
language (which I do love dearly).

Peter Ehlers

> ---------- Original Message -----------
> From: Peter Ehlers <ehlers at ucalgary.ca>
> To: Paulo Barata <paulo.barata at ensp.fiocruz.br>
> Cc: "r-help at r-project.org" <r-help at r-project.org>, peter dalgaard
> <pdalgd at gmail.com>
> Sent: Sun, 15 Jul 2012 09:29:11 -0700
> Subject: Re: [R] variable (column) in a data frame
>
>> On 2012-07-15 08:41, Paulo Barata wrote:
>>>
>>> My point is, if we make a spelling mistake in a program when referring
>>> to a variable inside a data frame, using the df$var notation,
>>> there seems to be no way of getting warned about that.
>>
>> You could wean yourself from the $-habit. It's convenient but can
>> lead to the problems you're experiencing (and this has been
>> discussed before). For programming, if you're prone to make
>> spelling errors, you should prefer df[, "aaa"]. See ?Extract.
>>
>> Peter Ehlers
> ------- End of Original Message -------

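
The access forms discussed in this message, together with the with() form suggested earlier in the thread, can be compared side by side. The sketch below is only an illustration; try_it() is a hypothetical helper that returns either the value or the error text, and the with() line assumes that no other object named aaa is visible in the workspace.

```r
df <- data.frame(a = 1:3, b = 11:13)

# Hypothetical helper: evaluate an expression, return its value or the error text.
try_it <- function(expr) {
  tryCatch(expr, error = function(e) paste("Error:", conditionMessage(e)))
}

r_dollar  <- try_it(sum(df$aaa == 2))         # silent: NULL, then logical(0), then 0
r_double  <- try_it(sum(df[["aaa"]] == 2))    # silent as well
r_bracket <- try_it(sum(df[, "aaa"] == 2))    # stops: the column is undefined
r_with    <- try_it(with(df, sum(aaa == 2)))  # stops, assuming no global 'aaa' exists

print(r_dollar)
print(r_double)
print(r_bracket)
print(r_with)
```

Only the last two forms report the misspelling, and with() does so only as long as the name is not found somewhere else on the search path.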
Hi Paulo,

There is a difference between two ways of accessing columns in a data frame:

  > df$aaa
  NULL
  > df["AAA"]
  Error in `[.data.frame`(df, "AAA") : undefined columns selected

So df["AAA"] or df[,"AAA"] gives the error message you expect.

-------------------
Frans

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Paulo Barata
Sent: Sunday, 15 July 2012 16:31
To: r-help at r-project.org
Subject: [R] variable (column) in a data frame

When using a data frame, there is no warning or error message
when I refer to a non-existent variable inside the data frame.

Is there some way to make R issue either a warning or an error
message in such a situation?