Dear r-devel list members,

I stumbled across the following behaviour of read.table() recently: Suppose
that I have the data

a " " ""
"" "" ""

in a file or copied to the clipboard, and issue the command

> DF <- read.table("clipboard")
> DF
  V1 V2 V3
1  a NA NA
2    NA NA

> is.na(DF)
        V1   V2   V3
[1,] FALSE TRUE TRUE
[2,] FALSE TRUE TRUE

I was surprised by the NAs. Note that they occur only when a column consists
entirely of empty strings or strings composed of blanks.

On the other hand

> data.frame(A=c("", "", ""))
  A
1
2
3

works as I would have expected.

A work-around for me was

> DF[is.na(DF)] <- ""
> DF
  V1 V2 V3
1  a
2

But, as I said, I found the behaviour of read.table() puzzling.

All this is with R 2.5.0 on a Windows XP Pro SP 2 system.

Comments?

Thanks,
 John

--------------------------------
John Fox, Professor
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox

Perhaps this has to do with the fact that there is not enough information
available to establish the class of those columns. For example, try this:

read.table("clipboard", colClasses = "character")

On 5/9/07, John Fox <jfox at mcmaster.ca> wrote:
> Dear r-devel list members,
>
> I stumbled across the following behaviour of read.table() recently:
> [...]

On Wed, 9 May 2007, John Fox wrote:

> I was surprised by the NAs. Note that they occur only when a column consists
> entirely of empty strings or strings composed of blanks.
>
> On the other hand
>
>> data.frame(A=c("", "", ""))
>   A
> 1
> 2
> 3
>
> works as I would have expected.

How did you expect R to know that "" meant a character column? You are
allowed to quote any type of column, so as far as read.table is concerned
the column is entirely empty and so its type is unknown. It defaults to
the simplest possible type, logical.

The answer, I think, is to use colClasses="character".

It is probably slightly more accurate to say that if colClasses is not
given, all columns are read as character columns, and then converted to
the simplest possible type. In earlier versions of R you could get NULL
columns (if there were no rows at all), but now the simplest is logical.

Brian

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

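The behaviour described in this message can be checked with a short sketch
along the following lines. It is not from the thread itself: the `text =`
argument is used as a stand-in for the file or clipboard (an assumption, so
the example runs anywhere on a recent R), and the object names `input`,
`df_default`, `df_char` and `converted` are purely illustrative.

```r
# Assumption: `text =` replaces the file/clipboard from the original post.
# Each element of `input` is one line of the data shown there.
input <- c('a " " ""', '"" "" ""')

# Default: fields are read as character, then each column is converted to
# the simplest type that fits. Columns that are entirely blank end up as
# logical NA.
df_default <- read.table(text = input)
print(sapply(df_default, class))

# Forcing character columns keeps the strings as read, so no NA is produced.
df_char <- read.table(text = input, colClasses = "character")
print(sapply(df_char, class))
print(any(is.na(df_char)))

# The conversion step on its own: an all-blank character vector is turned
# into a logical vector of NAs.
converted <- type.convert(c("", ""), as.is = TRUE)
print(class(converted))
```

A single value for colClasses is recycled across all columns; a vector with
one entry per column can be given instead if only some columns should be
forced to character.
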
Dear Brian (and Gabor),

Thanks -- that makes sense.

John

> -----Original Message-----
> From: r-devel-bounces at r-project.org
> [mailto:r-devel-bounces at r-project.org] On Behalf Of Prof Brian Ripley
> Sent: Wednesday, May 09, 2007 12:05 PM
> To: John Fox
> Cc: r-devel at r-project.org
> Subject: Re: [Rd] Behaviour of read.table with empty columns

______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel