Kurt Van Dijck
2019-Mar-22 17:16 UTC
[Rd] [PATCH 1/2] readtable: add hook for type conversions per column
This commit adds a function parameter, colConvert, to read.table. The function is called for every column. The goal is to allow specific (non-standard) type conversions depending on the input. When the parameter is not given, or the function returns NULL, the legacy default conversion applies. The colClasses parameter still takes precedence, i.e. colConvert only replaces the default conversions.

This makes it possible to properly load a .csv file with timestamps in the (quite common) %d/%m/%y %H:%M format. That was impossible before, since defining your own as.POSIXlt only creates a copy in the user's workspace, and read.table still calls the base version of as.POSIXlt.

Rather than fixing only my specific requirement, this hook makes it possible to probe for any custom format and do smart things with little syntax.

Signed-off-by: Kurt Van Dijck <dev.kurt at vandijck-laurijssen.be>
---
 src/library/utils/R/readtable.R | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/src/library/utils/R/readtable.R b/src/library/utils/R/readtable.R
index 238542e..076a707 100644
--- a/src/library/utils/R/readtable.R
+++ b/src/library/utils/R/readtable.R
@@ -65,6 +65,7 @@ function(file, header = FALSE, sep = "", quote = "\"'", dec = ".",
          strip.white = FALSE, blank.lines.skip = TRUE,
          comment.char = "#", allowEscapes = FALSE, flush = FALSE,
          stringsAsFactors = default.stringsAsFactors(),
+         colConvert = NULL,
          fileEncoding = "", encoding = "unknown", text, skipNul = FALSE)
 {
     if (missing(file) && !missing(text)) {
@@ -226,9 +227,18 @@ function(file, header = FALSE, sep = "", quote = "\"'", dec = ".",
     if(rlabp) do[1L] <- FALSE # don't convert "row.names"
     for (i in (1L:cols)[do]) {
         data[[i]] <-
-            if (is.na(colClasses[i]))
+            if (is.na(colClasses[i])) {
+                tmp <- NULL
+                if (!is.null(colConvert))
+                    # attempt to convert from user provided hook
+                    tmp <- colConvert(data[[i]])
+                if (!is.null(tmp))
+                    (tmp)
+                else
+                    # fallback, default
                 type.convert(data[[i]], as.is = as.is[i], dec=dec,
                              numerals=numerals, na.strings = character(0L))
+            }
         ## as na.strings have already been converted to <NA>
             else if (colClasses[i] == "factor") as.factor(data[[i]])
             else if (colClasses[i] == "Date") as.Date(data[[i]])
-- 
1.8.5.rc3
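The precedence described in the commit message (colClasses first, then the hook, then type.convert()) can be tried outside read.table with a small standalone function. This is only a sketch: convert_column() and yes_no() are hypothetical names, not part of R or of the patch, and the colClasses branch is simplified to a single methods::as() call instead of read.table's special cases for classes such as factor and Date.

```r
## Hypothetical helper (not part of R or of the patch) that mirrors the
## per-column dispatch the patch adds inside read.table():
##   1. an explicit colClasses entry wins,
##   2. otherwise the user hook colConvert is tried,
##   3. if the hook is absent or returns NULL, type.convert() is used.
convert_column <- function(x, colClass = NA_character_, colConvert = NULL,
                           as.is = TRUE) {
    if (!is.na(colClass))
        ## simplified: read.table special-cases some classes (e.g. factor,
        ## Date) before falling back to methods::as()
        return(methods::as(x, colClass))
    if (!is.null(colConvert)) {
        converted <- colConvert(x)
        if (!is.null(converted))
            return(converted)
    }
    type.convert(x, as.is = as.is)
}

## Example hook (hypothetical): turn yes/no columns into logicals and
## decline (return NULL) for everything else.
yes_no <- function(col)
    if (all(col %in% c("yes", "no"))) col == "yes" else NULL

convert_column(c("yes", "no"), colConvert = yes_no)   # TRUE FALSE (hook)
convert_column(c("1", "2"), colConvert = yes_no)      # 1L 2L (type.convert)
convert_column(c("yes", "no"), colClass = "character",
               colConvert = yes_no)                   # "yes" "no" (colClasses wins)
```

Returning NULL is what lets a single hook handle only the columns it recognises and leave the rest to the legacy conversion.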
Kurt Van Dijck
2019-Mar-22 17:16 UTC
[Rd] [PATCH 2/2] readtable: add test for type conversion hook 'colConvert'
Signed-off-by: Kurt Van Dijck <dev.kurt at vandijck-laurijssen.be>
---
 tests/reg-tests-2.R         | 21 +++++++++++++++++++++
 tests/reg-tests-2.Rout.save | 27 +++++++++++++++++++++++++++
 2 files changed, 48 insertions(+)

diff --git a/tests/reg-tests-2.R b/tests/reg-tests-2.R
index 9fd5242..5026fe7 100644
--- a/tests/reg-tests-2.R
+++ b/tests/reg-tests-2.R
@@ -1329,6 +1329,27 @@ unlink(foo)
 ## added in 2.0.0
 
 
+## colConvert in read.table
+probecol <- function(col) {
+    tmp <- as.POSIXlt(col, optional=TRUE, tryFormats=c("%d/%m/%Y %H:%M"));
+    if (all(!is.na(tmp)))
+        return (tmp)
+    tmp <- as.POSIXlt(col, optional=TRUE, tryFormats=c("%d/%m/%Y"));
+    if (all(!is.na(tmp)))
+        return (tmp)
+}
+
+Mat <- matrix(c(1:3, letters[1:3], 1:3, LETTERS[1:3],
+                c("22/4/1969", "8/4/1971", "23/9/1973"),
+                c("22/4/1969 6:01", " 8/4/1971 7:23", "23/9/1973 8:45")),
+              3, 6)
+foo <- tempfile()
+write.table(Mat, foo, sep = ",", col.names = FALSE, row.names = FALSE)
+read.table(foo, sep = ",", colConvert=probecol)
+unlist(sapply(.Last.value, class))
+unlink(foo)
+
+
 ## write.table with complex columns (PR#7260, in part)
 write.table(data.frame(x = 0.5+1:4, y = 1:4 + 1.5i), file = "")
 # printed all as complex in 2.0.0.
diff --git a/tests/reg-tests-2.Rout.save b/tests/reg-tests-2.Rout.save
index 598dd71..668898e 100644
--- a/tests/reg-tests-2.Rout.save
+++ b/tests/reg-tests-2.Rout.save
@@ -4206,6 +4206,33 @@ Warning message:
 > ## added in 2.0.0
 > 
 > 
+> ## colConvert in read.table
+> probecol <- function(col) {
++     tmp <- as.POSIXlt(col, optional=TRUE, tryFormats=c("%d/%m/%Y %H:%M"));
++     if (all(!is.na(tmp)))
++         return (tmp)
++     tmp <- as.POSIXlt(col, optional=TRUE, tryFormats=c("%d/%m/%Y"));
++     if (all(!is.na(tmp)))
++         return (tmp)
++ }
+> 
+> Mat <- matrix(c(1:3, letters[1:3], 1:3, LETTERS[1:3],
++                 c("22/4/1969", "8/4/1971", "23/9/1973"),
++                 c("22/4/1969 6:01", " 8/4/1971 7:23", "23/9/1973 8:45")),
++               3, 6)
+> foo <- tempfile()
+> write.table(Mat, foo, sep = ",", col.names = FALSE, row.names = FALSE)
+> read.table(foo, sep = ",", colConvert=probecol)
+  V1 V2 V3 V4         V5                  V6
+1  1  a  1  A 1969-04-22 1969-04-22 06:01:00
+2  2  b  2  B 1971-04-08 1971-04-08 07:23:00
+3  3  c  3  C 1973-09-23 1973-09-23 08:45:00
+> unlist(sapply(.Last.value, class))
+       V1        V2        V3        V4       V51       V52       V61       V62
+"integer"  "factor" "integer"  "factor" "POSIXlt"  "POSIXt" "POSIXlt"  "POSIXt"
+> unlink(foo)
+> 
+> 
 > ## write.table with complex columns (PR#7260, in part)
 > write.table(data.frame(x = 0.5+1:4, y = 1:4 + 1.5i), file = "")
 "x" "y"
-- 
1.8.5.rc3
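The probecol() function in the test hard-codes two formats. A hook of the same shape can be built from a list of formats. The sketch below assumes the hook interface proposed in this patch: a function that takes one character column and returns a converted vector or NULL. make_probe() is a hypothetical name. It calls base strptime() directly rather than as.POSIXlt(..., tryFormats =), and unlike probecol() it ignores cells that are already NA (from na.strings) when deciding whether a format matches.

```r
## Hypothetical factory (not part of R or of the patch): build a
## colConvert-style hook that tries each strptime() format in order and
## returns the first parse that succeeds on every non-missing cell.
## Order matters: strptime() ignores trailing input, so "%d/%m/%Y" would
## also accept "22/4/1969 6:01"; list the more specific formats first.
make_probe <- function(formats, tz = "") {
    force(formats)
    force(tz)
    function(col) {
        present <- !is.na(col)
        if (!any(present))
            return(NULL)                 # all-NA column: keep the default
        for (fmt in formats) {
            parsed <- strptime(col, fmt, tz = tz)
            if (all(!is.na(parsed[present])))
                return(parsed)           # POSIXlt, NA where col was NA
        }
        NULL                             # no format matched: keep the default
    }
}

probe <- make_probe(c("%d/%m/%Y %H:%M", "%d/%m/%Y"), tz = "UTC")
probe(c("1", "2", "3"))                  # NULL, so type.convert() would apply
probe(c("22/4/1969 6:01", " 8/4/1971 7:23"))
probe(c("22/4/1969", NA))
```

Under the proposed patch, such a hook would be passed as read.table(foo, sep = ",", colConvert = probe).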
Kurt Van Dijck
2019-Mar-26 18:52 UTC
[Rd] [PATCH 1/2] readtable: add hook for type conversions per column
Hello,

I would like to know whether this patch is OK, and if not, what should change.

Kind regards,
Kurt
Michael Lawrence
2019-Mar-26 19:48 UTC
[Rd] [PATCH 1/2] readtable: add hook for type conversions per column
Please file a bug on bugzilla so we can discuss this further.