Hi,

Thanks for this solution! Very slick!

I see what you mean about the two calls to as_tibble(). I suppose I could do the following, but I doubt it is a gain...

mm <- lapply(colnames(m), function(nm, m) type.convert(m[,nm], as.is = TRUE), m=m)
names(mm) <- colnames(m)
as_tibble(mm)

# # A tibble: 4 × 5
#       A     B     C     D     E
#   <chr> <chr> <chr> <int> <dbl>
# 1     a     e     i     1  11.2
# 2     b     f     j     2  12.2
# 3     c     g     k     3  13.2
# 4     d     h     l     4  14.2

I'll benchmark these against writing to a temporary file and pasting together a string.

Cheers and thanks,
Ben

On Apr 6, 2017, at 11:15 AM, Ulrik Stervbo <ulrik.stervbo at gmail.com> wrote:
>
> Hi Ben,
>
> type.convert should do the trick:
>
> m %>%
>   as_tibble() %>%
>   lapply(type.convert) %>%
>   as_tibble()
>
> I am not too happy about the double 'as_tibble', but it gets the job done.
>
> HTH
> Ulrik
>
> On Thu, 6 Apr 2017 at 16:41 Ben Tupper <btupper at bigelow.org> wrote:
> Hello,
>
> I have a workflow that yields a character matrix that I convert to a tibble. Here is a simple example.
>
> library(tibble)
> library(readr)
>
> m <- matrix(c(letters[1:12], 1:4, (11:14 + 0.2)), ncol = 5)
> colnames(m) <- LETTERS[1:5]
>
> x <- as_tibble(m)
>
> # # A tibble: 4 × 5
> #       A     B     C     D     E
> #   <chr> <chr> <chr> <chr> <chr>
> # 1     a     e     i     1  11.2
> # 2     b     f     j     2  12.2
> # 3     c     g     k     3  13.2
> # 4     d     h     l     4  14.2
>
> The workflow output columns can be a mix of a known set of column outputs. Some of the columns really should be converted to non-character types before I proceed. Right now I explicitly set the column classes with something like this...
>
> mode(x[['D']]) <- 'integer'
> mode(x[['E']]) <- 'numeric'
>
> # # A tibble: 4 × 5
> #       A     B     C     D     E
> #   <chr> <chr> <chr> <int> <dbl>
> # 1     a     e     i     1  11.2
> # 2     b     f     j     2  12.2
> # 3     c     g     k     3  13.2
> # 4     d     h     l     4  14.2
>
> I wonder if there is a way to use the read_* functions in the readr package to read the character matrix into a tibble directly, which would leverage readr's excellent column class guessing. I can see in the vignette ( https://cran.r-project.org/web/packages/readr/vignettes/readr.html ) that I'm not too far off in thinking this could be done (step 1 tantalizingly says 'The flat file is parsed into a rectangular matrix of strings.')
>
> I know that I could either write the matrix to a file or paste it all into a character vector and then use read_* functions, but I confess I am looking for a straighter path by simply passing the matrix to a function like readr::read_matrix() or the like.
>
> Thanks!
> Ben
>
> Ben Tupper
> Bigelow Laboratory for Ocean Sciences
> 60 Bigelow Drive, P.O. Box 380
> East Boothbay, Maine 04544
> http://www.bigelow.org
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
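One possibly more direct route, not mentioned in the thread itself: readr exports type_convert(), which re-applies readr's column guessing to the character columns of an existing data frame. The sketch below reuses the example matrix from the question and assumes a readr version that provides type_convert(). Whether a whole-number column such as D is guessed as integer or double depends on the readr version, so it is worth checking the result on your own installation.

```r
library(tibble)
library(readr)

# Same example matrix as in the original question
m <- matrix(c(letters[1:12], 1:4, (11:14 + 0.2)), ncol = 5)
colnames(m) <- LETTERS[1:5]

# Convert the character matrix to a tibble (every column is <chr>),
# then let readr guess a type for each character column.
x <- type_convert(as_tibble(m))

# Inspect the guessed column classes
sapply(x, class)
```

This keeps the conversion in memory, with no temporary file and no pasted string, but it still needs one as_tibble() call to get from the matrix to a data frame first.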
David L Carlson
2017-Apr-06 19:34 UTC
[R] readr to generate tibble from a character matrix
Ulrik's solution gives you factors. To get them as characters, add as.is=TRUE:

> m %>%
+   as_tibble() %>%
+   lapply(type.convert, as.is=TRUE) %>%
+   as_tibble()
# A tibble: 4 × 5
      A     B     C     D     E
  <chr> <chr> <chr> <int> <dbl>
1     a     e     i     1  11.2
2     b     f     j     2  12.2
3     c     g     k     3  13.2
4     d     h     l     4  14.2

Other possibilities:

> mm <- lapply(data.frame(m, stringsAsFactors=FALSE), type.convert, as.is=TRUE)
> as_tibble(mm)
# Your solution simplified by converting to a data.frame

> as_tibble(lapply(as_tibble(m), type.convert, as.is=TRUE))
# Ulrik's solution but without the pipes. Shows why you need 2 as_tibbles()

-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352
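The two points made above (as.is decides between factor and character, and lapply() hands back a plain list) can be checked with base R alone. This is a small sketch using the example matrix from the thread; the last step, assigning into df[], is a common base-R idiom and not something proposed in the thread.

```r
# Same example matrix as in the original question
m <- matrix(c(letters[1:12], 1:4, (11:14 + 0.2)), ncol = 5)
colnames(m) <- LETTERS[1:5]

# as.is controls whether strings become factors or stay character
class(type.convert(m[, "A"], as.is = FALSE))  # "factor"
class(type.convert(m[, "A"], as.is = TRUE))   # "character"

# lapply() over a data frame returns a plain list, not a data frame,
# which is why the result has to be converted back afterwards
df <- as.data.frame(m, stringsAsFactors = FALSE)
converted <- lapply(df, type.convert, as.is = TRUE)
is.data.frame(converted)  # FALSE

# Assigning into df[] keeps the data frame container, so no second
# conversion step is needed
df[] <- lapply(df, type.convert, as.is = TRUE)
sapply(df, class)
```

Passing as.is explicitly in both calls avoids relying on the default, which is the source of the factor columns in the version without it.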
Thanks!

I made up a little test for converting from a character matrix to a tibble: dumping to a file and reading back, pasting up a big string, using pipes, using as.data.frame and using a pipeless version.

Ulrik's or your solution is far and away faster than dumping the matrix to a file and then reading it back OR pasting the matrix into one honking big string. There is a difference in how the various methods interpret date-time inputs, but otherwise the results are all identical.

Cheers,
Ben

#### START
library(nycflights13)
library(tibble)
library(magrittr)
library(readr)
library(microbenchmark)

m <- as.matrix(flights)

# write the matrix to a temporary CSV file and read it back with readr
via_file <- function(m){
   filename = tempfile(fileext = '.csv')
   write.csv(m, file = filename, row.names = FALSE, quote = FALSE)
   readr::read_csv(filename)
}

# paste the matrix into one CSV-formatted string and parse that with readr
via_paste <- function(m){
   s <- paste(
      c(paste(colnames(m), collapse = ","), apply(m, 1, paste, collapse = ",")),
      collapse = "\n")
   readr::read_csv(s)
}

# Ulrik's piped solution, with as.is = TRUE
via_pipes <- function(m){
   m %>%
      tibble::as_tibble() %>%
      lapply(type.convert, as.is = TRUE) %>%
      tibble::as_tibble()
}

# David's data.frame variant
via_dataframe <- function(m){
   mm <- lapply(data.frame(m, stringsAsFactors=FALSE), type.convert, as.is=TRUE)
   tibble::as_tibble(mm)
}

# David's pipeless variant
via_pipeless <- function(m){
   tibble::as_tibble(lapply(tibble::as_tibble(m), type.convert, as.is=TRUE))
}

X <- list(
   file=via_file(m),
   paste=via_paste(m),
   pipes=via_pipes(m),
   dataframe=via_dataframe(m),
   pipeless=via_pipeless(m))

# compare every result against the via_file() result
sapply(names(X), function(n) all.equal(X[[n]], X[[1]]))
# $file
# [1] TRUE
#
# $paste
# [1] TRUE
#
# $pipes
# [1] "Incompatible type for column time_hour1: x character, y POSIXct" "Incompatible type for column time_hour2: x character, y POSIXt"
#
# $dataframe
# [1] "Incompatible type for column time_hour1: x character, y POSIXct" "Incompatible type for column time_hour2: x character, y POSIXt"
#
# $pipeless
# [1] "Incompatible type for column time_hour1: x character, y POSIXct" "Incompatible type for column time_hour2: x character, y POSIXt"

microbenchmark(
   via_file(m),
   via_paste(m),
   via_pipes(m),
   via_dataframe(m),
   via_pipeless(m),
   times = 5
)
# Unit: milliseconds
#              expr       min        lq      mean    median        uq       max neval
#       via_file(m) 2362.7778 2396.2277 2415.9207 2413.0772 2439.5752 2467.9457     5
#      via_paste(m) 5287.8176 5305.6228 5622.1432 5666.0165 5919.3568 5931.9023     5
#      via_pipes(m)  461.4782  464.5656  506.4157  509.5532  542.1091  554.3726     5
#  via_dataframe(m)  507.4674  514.2550  553.1791  515.9132  518.0807  710.1794     5
#   via_pipeless(m)  448.9529  470.1074  499.4392  470.6874  500.6027  606.8459     5

sessionInfo()
# R version 3.3.1 (2016-06-21)
# Platform: x86_64-apple-darwin13.4.0 (64-bit)
# Running under: OS X 10.11.6 (El Capitan)
#
# locale:
# [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#
# attached base packages:
# [1] stats graphics grDevices utils datasets methods base
#
# other attached packages:
# [1] microbenchmark_1.4-2.1 readr_1.0.0 magrittr_1.5 tibble_1.2 nycflights13_0.2.0
#
# loaded via a namespace (and not attached):
# [1] colorspace_1.2-6 scales_0.4.1 plyr_1.8.4 assertthat_0.1 tools_3.3.1 gtable_0.2.0 Rcpp_0.12.9 ggplot2_2.1.0 grid_3.3.1 munsell_0.4.3
### END
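The date-time difference reported above arises because type.convert() only recognises logical, integer, numeric and complex values, so a date-time string stays character, whereas readr parses it to POSIXct. If the type.convert() route is used, such a column can be parsed explicitly afterwards. The following is a minimal sketch on a made-up two-row stand-in for the flights data: the values are illustrative, the string format "YYYY-MM-DD HH:MM:SS" is an assumption about what the matrix holds, and the UTC time zone is likewise an assumption.

```r
# Small illustrative stand-in for the converted flights data
# (the column name time_hour comes from the thread; the values are examples)
x <- data.frame(
  flight    = c("1545", "1714"),
  time_hour = c("2013-01-01 05:00:00", "2013-01-01 06:00:00"),
  stringsAsFactors = FALSE
)

# type.convert() turns the numeric strings into integers but leaves the
# date-time strings as character
x[] <- lapply(x, type.convert, as.is = TRUE)

# Parse the date-time column explicitly; format and time zone are assumptions
x$time_hour <- as.POSIXct(x$time_hour, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# First class of each column after conversion
sapply(x, function(col) class(col)[1])
```

With a known set of date-time columns this adds one line per column, which keeps the speed advantage of the in-memory approaches seen in the benchmark.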