Suppose I have a data file (possibly with a huge number of columns), where the columns with factors are coded as "1", "2", "3", etc. The default behavior of read.table is to convert these columns to integer vectors.

Is there a way to get read.table to recognize that columns of quoted numbers represent factors (while unquoted numbers are interpreted as integers), without explicitly setting them with colClasses?

On Mon, 2010-10-04 at 09:39 -0700, james hirschorn wrote:
> Suppose I have a data file (possibly with a huge number of columns), where the
> columns with factors are coded as "1", "2", "3", etc ... The default behavior of
> read.table is to convert these columns to integer vectors.
>
> Is there a way to get read.table to recognize that columns of quoted numbers
> represent factors (while unquoted numbers are interpreted as integers), without
> explicitly setting them with colClasses ?
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Hi James,

I think you can solve your problem with the colClasses argument of read.table, something like this:

    read.table('name.of.table', colClasses = c(rep('integer', 30), rep('numeric', 5), etc))

-- 
Bernardo Rangel Tura, M.D,MPH,Ph.D
National Institute of Cardiology
Brazil
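
For reference, here is a minimal sketch of the explicit colClasses approach suggested above. The inline data and the column names a, b and c are made up for illustration. Note that rep() takes the value first and the repeat count second.

```r
# Hypothetical three-column sample; a real file would be passed by name instead.
txt <- c('a b c',
         '"1" 2 3.5',
         '"2" 4 4.5')

# One class per column, in column order. rep(value, times) builds runs
# of identical classes for wide files.
cc <- c(rep("factor", 1), rep("integer", 1), rep("numeric", 1))

DF <- read.table(text = txt, header = TRUE, colClasses = cc)
str(DF)
```

This still requires knowing the column layout in advance, which is what the original question hoped to avoid.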

On Oct 4, 2010, at 18:39 , james hirschorn wrote:
> Suppose I have a data file (possibly with a huge number of columns), where the
> columns with factors are coded as "1", "2", "3", etc ... The default behavior of
> read.table is to convert these columns to integer vectors.
>
> Is there a way to get read.table to recognize that columns of quoted numbers
> represent factors (while unquoted numbers are interpreted as integers), without
> explicitly setting them with colClasses ?

I don't think there's a simple way, because the modus operandi of read.table is to read everything as character and then see whether it can be converted to numeric, and at that point any quotes will have been lost.

One possibility, somewhat dependent on the exact file format, would be to temporarily set quote="", see which columns contain quote characters, and, on a second pass, read those columns as factors, using a computed colClasses argument. It will break down if you have space-separated columns with quoted multi-word strings, though.

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
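
The two-pass idea described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the inline sample data and the column names g, n and s are hypothetical, the file is whitespace-separated, and a column is treated as a factor only when every entry is a quoted run of digits.

```r
# Hypothetical sample: "g" holds quoted digits, "n" unquoted integers,
# "s" unquoted strings. A real file name could be used instead of text=.
txt <- c('g n s',
         '"1" 10 a',
         '"2" 20 b',
         '"1" 30 c')

# Pass 1: switch off quote processing so the quote characters survive,
# and read every field as character.
raw <- read.table(text = txt, header = TRUE, quote = "",
                  colClasses = "character")

# Flag columns in which every entry looks like "123" (quotes included).
quoted <- vapply(raw, function(x) all(grepl('^"[0-9]+"$', x)), logical(1))

# Pass 2: default quote handling, forcing "factor" on the flagged columns.
# NA leaves the class of the other columns for read.table to decide.
cc <- unname(ifelse(quoted, "factor", NA_character_))
DF <- read.table(text = txt, header = TRUE, colClasses = cc)
str(DF)
```

As noted in the reply, the first pass splits on whitespace without honoring quotes, so quoted multi-word strings in a space-separated file would shift the columns and break the detection.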

On Mon, Oct 4, 2010 at 12:39 PM, james hirschorn <j_hirschorn at yahoo.com> wrote:
> Suppose I have a data file (possibly with a huge number of columns), where the
> columns with factors are coded as "1", "2", "3", etc ... The default behavior of
> read.table is to convert these columns to integer vectors.
>
> Is there a way to get read.table to recognize that columns of quoted numbers
> represent factors (while unquoted numbers are interpreted as integers), without
> explicitly setting them with colClasses ?

Although it's a bit messy, it's nevertheless only a few lines of code to transform the quote-and-digit columns to non-numeric, read them in and transform back. For example, if ! does not appear in the file we could insert ! characters into the quote-and-digit columns and remove them afterwards:

    L <- readLines("myfile.dat")
    L2 <- gsub('"(\\d+)"', "!\\1", L)  # insert !
    # stringsAsFactors = TRUE is needed in R >= 4.0.0, where the default is FALSE
    DF <- read.table(textConnection(L2), header = TRUE, stringsAsFactors = TRUE)
    # remove !
    ix <- sapply(DF, is.factor)
    DF[ix] <- lapply(DF[ix], function(x) factor(gsub("!", "", x)))
    str(DF)

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
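
To try the marker trick without a file on disk, a self-contained variant might look like the sketch below. The sample lines and the explicit check that the marker is absent are additions for illustration, not part of the original suggestion.

```r
# Hypothetical sample lines standing in for readLines("myfile.dat").
L <- c('g n',
       '"1" 10',
       '"2" 20',
       '"10" 30')

marker <- "!"
# Guard: the trick is only safe if the marker never occurs in the data.
stopifnot(!any(grepl(marker, L, fixed = TRUE)))

# Turn "123" into !123 so the field can no longer be parsed as a number.
L2 <- gsub('"([0-9]+)"', paste0(marker, "\\1"), L)

# Non-numeric columns become factors (requested explicitly for R >= 4.0.0).
DF <- read.table(text = L2, header = TRUE, stringsAsFactors = TRUE)

# Strip the marker from the factor columns and rebuild the factors.
ix <- vapply(DF, is.factor, logical(1))
DF[ix] <- lapply(DF[ix], function(x) factor(sub(marker, "", x, fixed = TRUE)))
str(DF)
```

Any genuine string columns are also read as factors here and pass through the same cleanup, which leaves them unchanged as long as they do not contain the marker.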