Dear R-Help, I have 30 of year-based excel files and each file contain month sheets. I have some problem here. My data is daily rainfall but there is extra 1 day (first date of next month) for several sheets. My main goal is to get the minimum value for every month. First, how to extract those data to list of data frame based on year and delete every overlapping date? Second, how to sort it based on date with ascending order (old to new)? Third, how to get the maximum together with the date? I did this one, ... file.list <- list.files(pattern='*.xlsx') file.list<-mixedsort(file.list) # https://stackoverflow.com/questions/12945687/read-all-worksheets-in-an-excel-workbook-into-an-r-list-with-data-frames read_excel_allsheets <- function(filename, tibble = FALSE) { sheets <- readxl::excel_sheets(filename) x <- lapply(sheets, function(X) read.xlsx(filename, sheet=X, rows=9:40, cols=1:2)) if(!tibble) x <- lapply(x, as.data.frame) names(x) <- sheets x } pon<-lapply(file.list, function(i) read_excel_allsheets(i)) pon1<-do.call("rbind",pon) names(pon1) <- paste0("M.", 1:360) pon1 <-lapply(pon1,function(x){x$RR[x$RR==8888] <- NA; x}) pon1 <-lapply(pon1,function(x){x$RR[x$RR==""] <- NA; x}) maxi<-lapply(pon1, function(x) max(x$RR,na.rm=T)) maxi<-data.frame(Reduce(rbind, maxi)) names(maxi)<-"maxi" .... but the list start from January for every year, and move to February and so on. And there is no date in "maxi". Here some sample what I get from my simple code.> pon1[256:258]$M.256Tanggal RR 1 01-09-2001 5.2 2 02-09-2001 0.3 3 03-09-2001 29.0 4 04-09-2001 0.7 5 05-09-2001 9.6 6 06-09-2001 0.7 7 07-09-2001 NA 8 08-09-2001 13.2 9 09-09-2001 NA 10 10-09-2001 NA 11 11-09-2001 0.0 12 12-09-2001 66.0 13 13-09-2001 0.0 14 14-09-2001 57.6 15 15-09-2001 18.0 16 16-09-2001 29.2 17 17-09-2001 52.2 18 18-09-2001 7.0 19 19-09-2001 NA 20 20-09-2001 74.5 21 21-09-2001 20.3 22 22-09-2001 49.6 23 23-09-2001 0.0 24 24-09-2001 1.3 25 25-09-2001 0.0 26 26-09-2001 1.0 27 27-09-2001 0.1 28 28-09-2001 1.9 29 29-09-2001 9.5 30 30-09-2001 3.3 31 01-10-2001 0.0 $M.257 Tanggal RR 1 01-09-2002 0.0 2 02-09-2002 0.0 3 03-09-2002 0.0 4 04-09-2002 12.8 5 05-09-2002 1.0 6 06-09-2002 0.0 7 07-09-2002 NA 8 08-09-2002 22.2 9 09-09-2002 NA 10 10-09-2002 NA 11 11-09-2002 0.0 12 12-09-2002 0.0 13 13-09-2002 0.0 14 14-09-2002 NA 15 15-09-2002 0.0 16 16-09-2002 0.0 17 17-09-2002 0.0 18 18-09-2002 13.3 19 19-09-2002 0.0 20 20-09-2002 0.0 21 21-09-2002 0.0 22 22-09-2002 0.0 23 23-09-2002 0.0 24 24-09-2002 0.0 25 25-09-2002 0.0 26 26-09-2002 0.5 27 27-09-2002 2.1 28 28-09-2002 NA 29 29-09-2002 18.5 30 30-09-2002 0.0 31 01-10-2002 NA $M.258 Tanggal RR 1 01-09-2003 0.0 2 02-09-2003 0.0 3 03-09-2003 0.0 4 04-09-2003 4.0 5 05-09-2003 0.3 6 06-09-2003 0.0 7 07-09-2003 NA 8 08-09-2003 0.0 9 09-09-2003 0.0 10 10-09-2003 0.0 11 11-09-2003 NA 12 12-09-2003 1.0 13 13-09-2003 0.0 14 14-09-2003 60.0 15 15-09-2003 4.5 16 16-09-2003 0.1 17 17-09-2003 2.1 18 18-09-2003 NA 19 19-09-2003 0.0 20 20-09-2003 NA 21 21-09-2003 NA 22 22-09-2003 31.5 23 23-09-2003 42.0 24 24-09-2003 43.3 25 25-09-2003 2.8 26 26-09-2003 21.4 27 27-09-2003 0.8 28 28-09-2003 42.3 29 29-09-2003 5.3 30 30-09-2003 17.3 31 01-10-2003 0.0 Any lead or help is very appreciate. Best, Ani [[alternative HTML version deleted]]
Hello, Maybe the following will get you close to what you want. # remove the last row from every df pon1 <- lapply(pon1, function(DF){ DF[[1]] <- as.Date(DF[["Tanggal"]], "%d-%m-%Y") DF[-nrow(DF), ] }) # order the list by year-month inx_ym <- sapply(pon1, function(DF){ format(DF[["Tanggal"]][1], "%Y-%m") }) pon1 <- pon1[order(inx_ym)] # get the minimum and maximum of every "RR" min.RR <- sapply(pon1, function(DF) min(DF[["RR"]], na.rm = TRUE)) max.RR <- sapply(pon1, function(DF) max(DF[["RR"]], na.rm = TRUE)) Hope this helps, Rui Barradas ?s 07:50 de 13/11/19, ani jaya escreveu:> Dear R-Help, > > I have 30 of year-based excel files and each file contain month sheets. I > have some problem here. My data is daily rainfall but there is extra 1 day > (first date of next month) for several sheets. My main goal is to get the > minimum value for every month. > > First, how to extract those data to list of data frame based on year and > delete every overlapping date? > Second, how to sort it based on date with ascending order (old to new)? > Third, how to get the maximum together with the date? > > I did this one, > > ... > file.list <- list.files(pattern='*.xlsx') > file.list<-mixedsort(file.list) > > # > https://stackoverflow.com/questions/12945687/read-all-worksheets-in-an-excel-workbook-into-an-r-list-with-data-frames > > read_excel_allsheets <- function(filename, tibble = FALSE) { > sheets <- readxl::excel_sheets(filename) > x <- lapply(sheets, function(X) read.xlsx(filename, sheet=X, rows=9:40, > cols=1:2)) > if(!tibble) x <- lapply(x, as.data.frame) > names(x) <- sheets > x > } > > pon<-lapply(file.list, function(i) read_excel_allsheets(i)) > pon1<-do.call("rbind",pon) > names(pon1) <- paste0("M.", 1:360) > pon1 <-lapply(pon1,function(x){x$RR[x$RR==8888] <- NA; x}) > pon1 <-lapply(pon1,function(x){x$RR[x$RR==""] <- NA; x}) > maxi<-lapply(pon1, function(x) max(x$RR,na.rm=T)) > maxi<-data.frame(Reduce(rbind, maxi)) > names(maxi)<-"maxi" > .... > > but the list start from January for every year, and move to February and so > on. And there is no date in "maxi". Here some sample what I get from my > simple code. > >> pon1[256:258]$M.256 > Tanggal RR > 1 01-09-2001 5.2 > 2 02-09-2001 0.3 > 3 03-09-2001 29.0 > 4 04-09-2001 0.7 > 5 05-09-2001 9.6 > 6 06-09-2001 0.7 > 7 07-09-2001 NA > 8 08-09-2001 13.2 > 9 09-09-2001 NA > 10 10-09-2001 NA > 11 11-09-2001 0.0 > 12 12-09-2001 66.0 > 13 13-09-2001 0.0 > 14 14-09-2001 57.6 > 15 15-09-2001 18.0 > 16 16-09-2001 29.2 > 17 17-09-2001 52.2 > 18 18-09-2001 7.0 > 19 19-09-2001 NA > 20 20-09-2001 74.5 > 21 21-09-2001 20.3 > 22 22-09-2001 49.6 > 23 23-09-2001 0.0 > 24 24-09-2001 1.3 > 25 25-09-2001 0.0 > 26 26-09-2001 1.0 > 27 27-09-2001 0.1 > 28 28-09-2001 1.9 > 29 29-09-2001 9.5 > 30 30-09-2001 3.3 > 31 01-10-2001 0.0 > > $M.257 > Tanggal RR > 1 01-09-2002 0.0 > 2 02-09-2002 0.0 > 3 03-09-2002 0.0 > 4 04-09-2002 12.8 > 5 05-09-2002 1.0 > 6 06-09-2002 0.0 > 7 07-09-2002 NA > 8 08-09-2002 22.2 > 9 09-09-2002 NA > 10 10-09-2002 NA > 11 11-09-2002 0.0 > 12 12-09-2002 0.0 > 13 13-09-2002 0.0 > 14 14-09-2002 NA > 15 15-09-2002 0.0 > 16 16-09-2002 0.0 > 17 17-09-2002 0.0 > 18 18-09-2002 13.3 > 19 19-09-2002 0.0 > 20 20-09-2002 0.0 > 21 21-09-2002 0.0 > 22 22-09-2002 0.0 > 23 23-09-2002 0.0 > 24 24-09-2002 0.0 > 25 25-09-2002 0.0 > 26 26-09-2002 0.5 > 27 27-09-2002 2.1 > 28 28-09-2002 NA > 29 29-09-2002 18.5 > 30 30-09-2002 0.0 > 31 01-10-2002 NA > > $M.258 > Tanggal RR > 1 01-09-2003 0.0 > 2 02-09-2003 0.0 > 3 03-09-2003 0.0 > 4 04-09-2003 4.0 > 5 05-09-2003 0.3 > 6 06-09-2003 0.0 > 7 07-09-2003 NA > 8 08-09-2003 0.0 > 9 09-09-2003 0.0 > 10 10-09-2003 0.0 > 11 11-09-2003 NA > 12 12-09-2003 1.0 > 13 13-09-2003 0.0 > 14 14-09-2003 60.0 > 15 15-09-2003 4.5 > 16 16-09-2003 0.1 > 17 17-09-2003 2.1 > 18 18-09-2003 NA > 19 19-09-2003 0.0 > 20 20-09-2003 NA > 21 21-09-2003 NA > 22 22-09-2003 31.5 > 23 23-09-2003 42.0 > 24 24-09-2003 43.3 > 25 25-09-2003 2.8 > 26 26-09-2003 21.4 > 27 27-09-2003 0.8 > 28 28-09-2003 42.3 > 29 29-09-2003 5.3 > 30 30-09-2003 17.3 > 31 01-10-2003 0.0 > > > Any lead or help is very appreciate. > > Best, > > Ani > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >
Thank you very much Mr. Rui, but for delete the duplicated row I use: ... library(tidyverse) alldata<-data.frame(Reduce(rbind, pon1)) c<-(which(duplicated(alldata$Tanggal))) #duplicate alldata<-alldata[-c,] attach(alldata) .... because not every last row from every df is bad one. Another problem is I want to know when the max value is occurred. So basically I have maximum value every month (maxi, n=360, from 1986 to 2015) and I want to find annual_maxima. ... maxi<-lapply(pon1, function(x) max(x$RR,na.rm=T)) maxi<-data.frame(Reduce(rbind, maxi)) names(maxi)<-"maxi" annual_maxima <- rep(NA,30) date <- rep(NA,30) for(i in 1:30){ annual_maxima[i] <- max(maxi$maxi[(i*12-11):(i*12)]) date[i]<-Tanggal[which(RR==annual_maxima[i])] } .... Here "alldata" contain "Tanggal" in this case is date and rainfall ("RR"). What I get is error stated that: In date[i] <- Tanggal[which(RR == annual_maxima[i])] : number of items to replace is not a multiple of replacement length Maybe you have some idea where the problem is, I would be thankful. Best, Ani On Wed, Nov 13, 2019 at 5:49 PM Rui Barradas <ruipbarradas at sapo.pt> wrote:> Hello, > > Maybe the following will get you close to what you want. > > > # remove the last row from every df > pon1 <- lapply(pon1, function(DF){ > DF[[1]] <- as.Date(DF[["Tanggal"]], "%d-%m-%Y") > DF[-nrow(DF), ] > }) > > > # order the list by year-month > inx_ym <- sapply(pon1, function(DF){ > format(DF[["Tanggal"]][1], "%Y-%m") > }) > pon1 <- pon1[order(inx_ym)] > > > # get the minimum and maximum of every "RR" > min.RR <- sapply(pon1, function(DF) min(DF[["RR"]], na.rm = TRUE)) > max.RR <- sapply(pon1, function(DF) max(DF[["RR"]], na.rm = TRUE)) > > > Hope this helps, > > Rui Barradas > > > > ?s 07:50 de 13/11/19, ani jaya escreveu: > > Dear R-Help, > > > > I have 30 of year-based excel files and each file contain month sheets. I > > have some problem here. My data is daily rainfall but there is extra 1 > day > > (first date of next month) for several sheets. My main goal is to get the > > minimum value for every month. > > > > First, how to extract those data to list of data frame based on year and > > delete every overlapping date? > > Second, how to sort it based on date with ascending order (old to new)? > > Third, how to get the maximum together with the date? > > > > I did this one, > > > > ... > > file.list <- list.files(pattern='*.xlsx') > > file.list<-mixedsort(file.list) > > > > # > > > https://stackoverflow.com/questions/12945687/read-all-worksheets-in-an-excel-workbook-into-an-r-list-with-data-frames > > > > read_excel_allsheets <- function(filename, tibble = FALSE) { > > sheets <- readxl::excel_sheets(filename) > > x <- lapply(sheets, function(X) read.xlsx(filename, sheet=X, > rows=9:40, > > cols=1:2)) > > if(!tibble) x <- lapply(x, as.data.frame) > > names(x) <- sheets > > x > > } > > > > pon<-lapply(file.list, function(i) read_excel_allsheets(i)) > > pon1<-do.call("rbind",pon) > > names(pon1) <- paste0("M.", 1:360) > > pon1 <-lapply(pon1,function(x){x$RR[x$RR==8888] <- NA; x}) > > pon1 <-lapply(pon1,function(x){x$RR[x$RR==""] <- NA; x}) > > maxi<-lapply(pon1, function(x) max(x$RR,na.rm=T)) > > maxi<-data.frame(Reduce(rbind, maxi)) > > names(maxi)<-"maxi" > > .... > > > > but the list start from January for every year, and move to February and > so > > on. And there is no date in "maxi". Here some sample what I get from my > > simple code. > > > >> pon1[256:258]$M.256 > > Tanggal RR > > 1 01-09-2001 5.2 > > 2 02-09-2001 0.3 > > 3 03-09-2001 29.0 > > 4 04-09-2001 0.7 > > 5 05-09-2001 9.6 > > 6 06-09-2001 0.7 > > 7 07-09-2001 NA > > 8 08-09-2001 13.2 > > 9 09-09-2001 NA > > 10 10-09-2001 NA > > 11 11-09-2001 0.0 > > 12 12-09-2001 66.0 > > 13 13-09-2001 0.0 > > 14 14-09-2001 57.6 > > 15 15-09-2001 18.0 > > 16 16-09-2001 29.2 > > 17 17-09-2001 52.2 > > 18 18-09-2001 7.0 > > 19 19-09-2001 NA > > 20 20-09-2001 74.5 > > 21 21-09-2001 20.3 > > 22 22-09-2001 49.6 > > 23 23-09-2001 0.0 > > 24 24-09-2001 1.3 > > 25 25-09-2001 0.0 > > 26 26-09-2001 1.0 > > 27 27-09-2001 0.1 > > 28 28-09-2001 1.9 > > 29 29-09-2001 9.5 > > 30 30-09-2001 3.3 > > 31 01-10-2001 0.0 > > > > $M.257 > > Tanggal RR > > 1 01-09-2002 0.0 > > 2 02-09-2002 0.0 > > 3 03-09-2002 0.0 > > 4 04-09-2002 12.8 > > 5 05-09-2002 1.0 > > 6 06-09-2002 0.0 > > 7 07-09-2002 NA > > 8 08-09-2002 22.2 > > 9 09-09-2002 NA > > 10 10-09-2002 NA > > 11 11-09-2002 0.0 > > 12 12-09-2002 0.0 > > 13 13-09-2002 0.0 > > 14 14-09-2002 NA > > 15 15-09-2002 0.0 > > 16 16-09-2002 0.0 > > 17 17-09-2002 0.0 > > 18 18-09-2002 13.3 > > 19 19-09-2002 0.0 > > 20 20-09-2002 0.0 > > 21 21-09-2002 0.0 > > 22 22-09-2002 0.0 > > 23 23-09-2002 0.0 > > 24 24-09-2002 0.0 > > 25 25-09-2002 0.0 > > 26 26-09-2002 0.5 > > 27 27-09-2002 2.1 > > 28 28-09-2002 NA > > 29 29-09-2002 18.5 > > 30 30-09-2002 0.0 > > 31 01-10-2002 NA > > > > $M.258 > > Tanggal RR > > 1 01-09-2003 0.0 > > 2 02-09-2003 0.0 > > 3 03-09-2003 0.0 > > 4 04-09-2003 4.0 > > 5 05-09-2003 0.3 > > 6 06-09-2003 0.0 > > 7 07-09-2003 NA > > 8 08-09-2003 0.0 > > 9 09-09-2003 0.0 > > 10 10-09-2003 0.0 > > 11 11-09-2003 NA > > 12 12-09-2003 1.0 > > 13 13-09-2003 0.0 > > 14 14-09-2003 60.0 > > 15 15-09-2003 4.5 > > 16 16-09-2003 0.1 > > 17 17-09-2003 2.1 > > 18 18-09-2003 NA > > 19 19-09-2003 0.0 > > 20 20-09-2003 NA > > 21 21-09-2003 NA > > 22 22-09-2003 31.5 > > 23 23-09-2003 42.0 > > 24 24-09-2003 43.3 > > 25 25-09-2003 2.8 > > 26 26-09-2003 21.4 > > 27 27-09-2003 0.8 > > 28 28-09-2003 42.3 > > 29 29-09-2003 5.3 > > 30 30-09-2003 17.3 > > 31 01-10-2003 0.0 > > > > > > Any lead or help is very appreciate. > > > > Best, > > > > Ani > > > > [[alternative HTML version deleted]] > > > > ______________________________________________ > > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > >[[alternative HTML version deleted]]