TELLERIA RUIZ DE AGUIRRE, JUAN
2017-Sep-14 07:48 UTC
[R] Print All Warnings that Occurr in All Parallel Nodes
Dear R Users,

I have developed the following code for importing a series of zipped CSV files using parallel computing.

My problems are:

A) Some ZIP files (which contain the CSVs) are corrupted and cannot be opened.
B) After executing parRapply I can only inspect the last.warning variable to find out which CSV failed in each node. It shows only one warning at a time, not all of them.

So:

* To show a list of all warnings from all nodes, I was thinking of using the following call in the code:

    warnings(DISPOIN_CSV_List <- parRapply(c1, DISPOIN_DIR_REL, parRaplly_Function))

  Would this work?

* Also, how could I check that a CSV can be opened before applying the function, and create an empty data.frame for those CSVs that cannot?

Thank you,
Juan

CODE

################################################################################
## DISPOIN Data Import Into MariaDB
################################################################################

## -----------------------------------------------------------------------------
## Packages
## -----------------------------------------------------------------------------

# update.packages("RODBC")
# update.packages("tidyverse")

## -----------------------------------------------------------------------------
## Libraries
## -----------------------------------------------------------------------------

suppressMessages(require(RODBC))
suppressMessages(require(tidyverse))
suppressMessages(require(parallel))

## -----------------------------------------------------------------------------
## CMD: Command for DISPOIN's Directory Acquisition
## -----------------------------------------------------------------------------

# shell(cmd = 'pushd "\\srvdiscsv\data" && dir *AL*.zip /b /s > D:\DISPOIN_Data_Directories.csv && popd')

## -----------------------------------------------------------------------------
## RODBC
## -----------------------------------------------------------------------------

## A) MariaDB Connection String

con <- odbcConnect("MariaDB_Tornado24")

invisible(sqlQuery(con, "USE dispoin;"))

# B) Import R Data Directories from MariaDB

DISPOIN_DIR_REL <- as_tibble(sqlFetch(con, "dispoin.t_DISPOIN_DIR_REL"))

odbcClose(con)

# C) Import Zipped CSV data into List of Dataframes, which later on are
#    compiled as a single dataframe by means of rbind

# C.1) parRapply Function Initialization:

parRaplly_Function <- function (DISPOIN_CSV_Row)
{
  return(read_csv2(
    file = DISPOIN_CSV_Row,
    col_names = c(
      "SCADA",
      "TAG",
      "ID_del_AEG",
      "Descripcion",
      "Time_ON",
      "Time_OFF",
      "Delta_Time",
      "Comentario",
      "Es_Alarma",
      "Es_Ultima",
      "Comentarios"),
    col_types = cols(
      "SCADA" = "c",
      "TAG" = "c",
      "ID_del_AEG" = "c",
      "Descripcion" = "c",
      "Time_ON" = "c",
      "Time_OFF" = "c",
      "Delta_Time" = "c",
      "Comentario" = "c",
      "Es_Alarma" = "c",
      "Es_Ultima" = "c",
      "Comentarios" = "c"),
    locale = default_locale(),
    na = c("", " "),
    quoted_na = TRUE,
    quote = "\"",
    comment = "",
    trim_ws = TRUE,
    skip = 0,
    n_max = Inf,
    guess_max = min(1000, n_max),
    progress = FALSE))
}

# C.2) parallel Package: Environment Settings

no_cores <- detectCores()

c1 <- makeCluster(no_cores)

invisible(clusterEvalQ(c1, library(readr)))

setDefaultCluster(c1)

# C.3) parRapply Function Application:

DISPOIN_CSV_List <- parRapply(c1, DISPOIN_DIR_REL, parRaplly_Function)

suppressWarnings(stopCluster(c1))

# D) List's Tibbles Compilation into a single Tibble:

DISPOIN_CSV <- do.call(rbind, DISPOIN_CSV_List)

# E) Write Compiled Table into CSV:

write_csv(
  DISPOIN_CSV,
  path = file.path("D:/MySQL/R", "DISPOIN_CSV.csv"),
  na = "\\N",
  append = FALSE,
  col_names = TRUE)

# F) Data Cleaning: Environment Variable Removal

rm(list=ls())
William Dunlap
2017-Sep-14 19:00 UTC
[R] Print All Warnings that Occurr in All Parallel Nodes
> How could I check that a CSV can be opened before applying the function,
> and create an empty data.frame for those CSV.

Use tryCatch(). E.g., instead of

    result <- read_csv2(file)

use

    result <- tryCatch(read_csv2(file), error=function(e) makeEmptyDataFrame(conditionMessage(e)))

where makeEmptyDataFrame(msg=NULL) is a function (which you write) that returns a data.frame with no rows but with the proper column names and types. I show it with a msg (message) argument, as you might want to attach the error message to it as an attribute so you can see what went wrong.

Bill Dunlap
TIBCO Software
wdunlap tibco.com
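
The tryCatch() suggestion above could be sketched roughly as follows, and extended to the other question in the thread (seeing every warning from every node). This is a minimal sketch under several assumptions, not a definitive implementation: base R's read.csv2 stands in for readr's read_csv2 so the example runs without extra packages, parLapply over a character vector of paths stands in for parRapply over the tibble, and the helper safe_read, the columns argument, and the "error_message" attribute name are hypothetical names introduced only for illustration. As far as I know, warnings raised on worker processes are not relayed to the master session, so the sketch records them on the worker with withCallingHandlers() and returns them alongside the data.

```r
library(parallel)

# Zero-row data frame with the expected column names (all character).
# The error message is attached as an attribute; the attribute name
# "error_message" is an assumption made for this sketch.
makeEmptyDataFrame <- function(msg = NULL, columns = c("SCADA", "TAG")) {
  empty <- as.data.frame(
    setNames(replicate(length(columns), character(0), simplify = FALSE), columns),
    stringsAsFactors = FALSE
  )
  attr(empty, "error_message") <- msg
  empty
}

# Hypothetical wrapper: reads one file, records every warning raised while
# reading, and falls back to an empty data frame if reading fails.
safe_read <- function(path, reader = utils::read.csv2) {
  warnings_seen <- character(0)
  result <- tryCatch(
    withCallingHandlers(
      reader(path),
      warning = function(w) {
        # Keep the message, then resume instead of printing the warning.
        warnings_seen <<- c(warnings_seen, conditionMessage(w))
        invokeRestart("muffleWarning")
      }
    ),
    error = function(e) makeEmptyDataFrame(conditionMessage(e))
  )
  list(path = path, data = result, warnings = warnings_seen)
}

# Demo input: one readable file and one path that does not exist.
good <- tempfile(fileext = ".csv")
writeLines(c("SCADA;TAG", "a;b"), good)
missing_file <- tempfile(fileext = ".csv")  # never created, so reading fails
paths <- c(good, missing_file)

cl <- makeCluster(2)
# The workers need the helper that safe_read calls.
clusterExport(cl, "makeEmptyDataFrame", envir = environment())
results <- parLapply(cl, paths, safe_read)
stopCluster(cl)

# Keep only the files that produced at least one warning.
all_warnings <- Filter(length, setNames(lapply(results, `[[`, "warnings"), paths))
print(all_warnings)
```

Returning the warnings as part of each result, rather than relying on last.warning, keeps them tied to the file that produced them, and the zero-row placeholders let a later do.call(rbind, ...) proceed even when some files are unreadable.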