Oliver Kullmann
2011-Aug-28 14:13 UTC
[R] read.table: deciding automatically between two colClasses values
Hello,

I have a function for reading a data-frame from a file, which contains

    E = read.table(file = filename,
            header = T,
            colClasses = c(rep("integer",6),"numeric","integer",rep("numeric",8)),
            ...)

Now a small variation arose, where

    colClasses = c(rep("integer",4),"numeric","integer",rep("numeric",8))

needs to be used (so just a small change). I want to make this convenient for the user, so no user intervention should be needed; the function should choose between the two values "4" and "6" according to the header line.

Now this seems to be a problem: I found only count.fields, which however cannot read just the first line. Reading the whole file (just to read the first line) is awkward, and these files typically have millions of lines. The only way to influence count.fields seems to be via skip, but I could only use this to skip to the last line, which reads the file nevertheless, and I also don't know the number of lines in the file.

Perhaps one could catch an error when the first invocation of read.table fails, and try the second one. However, tryCatch doesn't seem to make it simple to write something like

    E = try(expr1 otherwise expr2)

(if expr1 fails, evaluate expr2 instead)?

Oliver
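Editorial aside on the "expr1 otherwise expr2" idiom asked about above: the following is a minimal sketch of how tryCatch can express it, relying on R's lazy evaluation of function arguments. The helper name `otherwise` is hypothetical and not part of base R.

```r
# Hypothetical helper: evaluate `first`; if it signals an error,
# evaluate and return `fallback` instead.
otherwise <- function(first, fallback) {
  # `first` is a promise, so it is only evaluated here, inside tryCatch,
  # where any error it raises is caught by the handler.
  tryCatch(first, error = function(e) fallback)
}

# The first expression fails, so the second one is evaluated.
result_a <- otherwise(stop("first attempt failed"), "fallback value")

# The first expression succeeds, so the fallback is never evaluated.
result_b <- otherwise(sqrt(16), stop("never reached"))
```

Whether read.table actually signals an error when given the wrong colClasses depends on the data (a numeric column that happens to hold only whole numbers may be read as integer without complaint), so a fallback of this kind is probably less robust than inspecting the header line first.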
Joshua Wiley
2011-Aug-28 14:23 UTC
[R] read.table: deciding automatically between two colClasses values
Hi Oliver,

Look at ?readLines

I imagine something like:

    tmp <- readLines(filename, n = 1L)
    # (do stuff with the first line to decide)
    IntN <- 6  # (or 4)
    NumN <- 8  # (or whatever)
    E <- read.table(file = filename, header = TRUE,
            colClasses = c(rep("integer", IntN), "numeric", "integer", rep("numeric", NumN)),
            ...)

Cheers,

Josh

--
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, ATS Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/
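The readLines suggestion above could be filled in roughly as follows. This is only a sketch under stated assumptions: the function name `read_experiment` is hypothetical, the header is assumed to hold one whitespace-separated name per column, and a header with 16 fields is taken to mean the 6-integer layout while 14 fields means the 4-integer layout (6 + 1 + 1 + 8 and 4 + 1 + 1 + 8 columns respectively).

```r
# Hypothetical helper: choose colClasses from the number of header fields.
read_experiment <- function(filename, ...) {
  # Read only the first line; the remaining lines are not read here.
  header_line <- readLines(filename, n = 1L)

  # Split on whitespace, mirroring read.table's default sep = "".
  # Assumption: column names contain no quoted embedded spaces.
  fields <- strsplit(trimws(header_line), "[[:space:]]+")[[1]]
  n_cols <- length(fields)

  # Assumption: 16 header fields -> 6 leading integer columns,
  #             14 header fields -> 4 leading integer columns.
  int_n <- switch(as.character(n_cols),
                  "16" = 6L,
                  "14" = 4L,
                  stop("unexpected number of header fields: ", n_cols))

  read.table(file = filename, header = TRUE,
             colClasses = c(rep("integer", int_n), "numeric", "integer",
                            rep("numeric", 8)),
             ...)
}
```

A call such as `E <- read_experiment(filename)` would then need no user intervention, and an unrecognised header stops with an explicit error instead of guessing a layout.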