Hi, I have 100 files (4 attached for your reference) with different file names, different start and end dates. Years and months occupy 1st and 2nd columns while days occupy the rest of the 33 columns in each file. If date starts before 1961 and ends after 2005, extract all rows between 1961 to 2005 in all 100 files, else, if date starts after 1961 and does not go up till 2005,?retain?the values as they are, then generate a date vector "%Y-%m" from 1961 to 2005 and fill spaces without values using 'NA'. For example, in one file I have data from 1970 to 2000. I would like to generate dates from 1961 to 2005, fill 1961-1966, and 2001-2005 with 'NA'. Do same for all 100 files. After doing the extracting and replacing, all files will have a common date window (1961-2005). Now, delete year and month from each file (i.e. first two columns in each file) and convert each file to as.vector (column vector. i.e take column 4 and place under column 3 etc). My expected output would then be 100 files each having a column vector. Finally, I would like to use the original file names as the resulting column names for each file. Then combine all 100 files in a data.frame Using 4 files, final output should be 'equal rows * 4 columns', e.g 16354 rows * 4 columns, say. Thanks so much. ? Atem -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: dt3031093.txt URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20130604/b1496dc3/attachment.txt> -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: dt3031400.txt URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20130604/b1496dc3/attachment-0001.txt> -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: dt3032800.txt URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20130604/b1496dc3/attachment-0002.txt> -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: dt3033880.txt URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20130604/b1496dc3/attachment-0003.txt>
Hi, May be this helps: I duplicated your dataset (only one was attached) and changed the dates. lstf1<- list.files(pattern=".txt") lstf1 #[1] "dt3031093-1.txt" "dt3031093-2.txt" "dt3031093-3.txt" #3rd one has less number of observations. fun1<- function(lstf){ ??? ?lst1<-lapply(lstf,function(x) read.table(x,sep="",header=TRUE,stringsAsFactors=FALSE)) ??? ?lst2<- lapply(lst1,function(x) x[x$V1>=1961 & x$V1<=2005,]) ??? ?lst3<- lapply(lst2,function(x) { ??? ??? ??? ??? ???? if((min(x$V1)>1961)|(max(x$V1)<2005)){ ??? ??? ??? ??? ??? ??? ?n1<- (min(x$V1)-1961)*12 ??? ??? ??? ??? ??? ??? ?x1<- as.data.frame(matrix(NA,ncol=ncol(x),nrow=n1)) ??? ??? ??? ??? ??? ??? ?n2<- (2005-max(x$V1))*12 ??? ??? ??? ??? ??? ??? ?x2<- as.data.frame(matrix(NA,ncol=ncol(x),nrow=n2)) ??? ??? ??? ??? ??? ??? ?x3<- rbind(x1,x,x2) ??? ??? ??? ??? ??? ??? } ??? ??? ??? ????????????? else { ??? ??? ??? ??? ??? x ??? ??? ??? ??? ??? } }) ??? ?lst4<- lapply(lst3,function(x) data.frame(col1=unlist(x[,-c(1:2)]))) ??? ?lst5<- lapply(seq_along(lst4),function(i){ ??? ??? ??? ??? ??? ??? x<- lst4[[i]] ??? ??? ??? ??? ??? ??? colnames(x)<- lstf[i] ??? ??? ??? ??? ??? ??? row.names(x)<- 1:nrow(x) ??? ??? ??? ??? ??? ??? x ??? ??? ??? ??? ??? ??? }) ??????? lst5} res<-fun1(lstf1) ?lapply(res,head,3) #[[1]] #? dt3031093-1.txt #1??????????? 0.21 #2??????????? 0.00 #3??????????? 0.21 # #[[2]] #? dt3031093-2.txt #1??????????? 0.21 #2??????????? 0.00 #3??????????? 0.21 # #[[3]] #? dt3031093-3.txt #1????????????? NA #2????????????? NA #3????????????? NA sapply(res,nrow) #[1] 16740 16740 16740 A.K. ________________________________ From: Zilefac Elvis <zilefacelvis at yahoo.com> To: "r-help at r-project.org" <r-help at r-project.org> Cc: "smartpink111 at yahoo.com" <smartpink111 at yahoo.com> Sent: Tuesday, June 4, 2013 6:23 PM Subject: dates and time series management Hi, I have 100 files (4 attached for your reference) with different file names, different start and end dates. Years and months occupy 1st and 2nd columns while days occupy the rest of the 33 columns in each file. If date starts before 1961 and ends after 2005, extract all rows between 1961 to 2005 in all 100 files, else, if date starts after 1961 and does not go up till 2005,?retain?the values as they are, then generate a date vector "%Y-%m" from 1961 to 2005 and fill spaces without values using 'NA'. For example, in one file I have data from 1970 to 2000. I would like to generate dates from 1961 to 2005, fill 1961-1966, and 2001-2005 with 'NA'. Do same for all 100 files. After doing the extracting and replacing, all files will have a common date window (1961-2005). Now, delete year and month from each file (i.e. first two columns in each file) and convert each file to as.vector (column vector. i.e take column 4 and place under column 3 etc). My expected output would then be 100 files each having a column vector. Finally, I would like to use the original file names as the resulting column names for each file. Then combine all 100 files in a data.frame Using 4 files, final output should be 'equal rows * 4 columns', e.g 16354 rows * 4 columns, say. Thanks so much. Atem?
Hi, Try this: lstf1<- list.files(pattern=".txt") length(lstf1) #[1] 119 fun2<- function(lstf){ ?lst1<-lapply(lstf,function(x) readLines(x)) ?lst2<-lapply(lst1,function(x) {gsub("(\\d+)(-9999.99)","\\1 \\2",x)}) ?lst3<-lapply(lst2,function(x) {x<-gsub("(\\d+)(-9999.99)","\\1 \\2",x)}) ?lst4<- lapply(lst3,function(x) read.table(text=x,header=TRUE,stringsAsFactors=FALSE,sep="",fill=TRUE)) ?lst5<- lapply(lst4,function(x) x[x$V1>=1961 & x$V1<=2005,]) ?lst6<- lapply(lst5,function(x) x[!is.na(x$V1),]) ?lst7<- lapply(lst6,function(x) { ???????????????????? if((min(x$V1)>1961)|(max(x$V1)<2005)){ ???????????????????????? n1<- (min(x$V1)-1961)*12 ???????????????????????? x1<- as.data.frame(matrix(NA,ncol=ncol(x),nrow=n1)) ???????????????????????? n2<- (2005-max(x$V1))*12 ???????????????????????? x2<- as.data.frame(matrix(NA,ncol=ncol(x),nrow=n2)) ???????????????????????? x3<- rbind(x1,x,x2) ??????????????????????? } ????????????????????????? else { ??????????????????? x ??????????????????? } }) lst8<- lapply(lst7,function(x) data.frame(col1=unlist(x[,-c(1:2)]))) ???? lst9<- lapply(seq_along(lst8),function(i){ ??????????????????????? x<- lst8[[i]] ??????????????????????? colnames(x)<- lstf1[i] ??????????????????????? row.names(x)<- 1:nrow(x) ??????????????????????? x ??????????????????????? }) ?do.call(cbind,lst9)} res<-fun2(lstf1) dim(res) #[1] 16740?? 119 res[1:5,1:3] ?# dt3011120.txt dt3011240.txt dt3011887.txt #1????????? 1.67??????????? NA????????? 0.17 #2????????? 0.00??????????? NA????????? 0.28 #3????????? 0.00??????????? NA????????? 0.00 #4????????? 0.00??????????? NA????????? 0.30 #5????????? 0.00??????????? NA????????? 0.00 ######################################## There are some formatting issues in your files: For eg. If I run the function line by line: ?lst1<-lapply(lstf1,function(x) readLines(x)) sapply(lst1,function(x) any(grepl("\\d+-9999.99",x))) ? [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ?[13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ?[25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ?[37]? TRUE FALSE? TRUE FALSE FALSE? TRUE FALSE FALSE FALSE FALSE FALSE FALSE ?[49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ?[61] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ?[73] FALSE FALSE FALSE FALSE FALSE? TRUE FALSE FALSE FALSE FALSE? TRUE FALSE ?[85] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE? TRUE ?[97] FALSE FALSE FALSE FALSE FALSE FALSE? TRUE? TRUE FALSE FALSE FALSE FALSE [109] FALSE FALSE FALSE FALSE? TRUE FALSE FALSE FALSE FALSE FALSE? TRUE ###means some rows in the a few files have: #-9999.99 0 0 0 0.00-9999.99 0 0.00-9999.99 0 0 0 0.00-9999.99 (no space before -9999.99) ?lst2<-lapply(lst1,function(x) {gsub("(\\d+)(-9999.99)","\\1 \\2",x)}) sapply(lst2,function(x) any(grepl("\\d+-9999.99",x))) #still a few files had the problem ? [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ?[13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ?[25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ?[37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ?[49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ?[61] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ?[73] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE? TRUE FALSE ?[85] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE? TRUE ?[97] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE [109] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE lst3<-lapply(lst2,function(x) {x<-gsub("(\\d+)(-9999.99)","\\1 \\2",x)}) any(sapply(lst3,function(x) any(grepl("\\d+-9999.99",x)))) #[1] FALSE lst4<- lapply(lst3,function(x) read.table(text=x,header=TRUE,stringsAsFactors=FALSE,sep="",fill=TRUE)) any(sapply(lst4,function(x) any(sapply(x,is.character)))) #[1] FALSE ?lst5<- lapply(lst4,function(x) x[x$V1>=1961 & x$V1<=2005,]) lst6<- lapply(lst5,function(x) x[!is.na(x$V1),]) sapply(lst6,nrow) ?# [1] 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 # [19] 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 # [37] 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 # [55] 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 # [73] 540 540 540 540 528 492 528 540 348 540 540 480 540 540 540 540 540 540 # [91] 540 540 540 540 540 540 540 540 540 540 540 540 540 540 528 540 540 540 #[109] 540 540 540 540 540 540 540 540 540 468 540 ???? lst7<- lapply(lst6,function(x) { ???????????????????? if((min(x$V1)>1961)|(max(x$V1)<2005)){ ???????????????????????? n1<- (min(x$V1)-1961)*12 ???????????????????????? x1<- as.data.frame(matrix(NA,ncol=ncol(x),nrow=n1)) ???????????????????????? n2<- (2005-max(x$V1))*12 ???????????????????????? x2<- as.data.frame(matrix(NA,ncol=ncol(x),nrow=n2)) ???????????????????????? x3<- rbind(x1,x,x2) ??????????????????????? } ????????????????????????? else { ??????????????????? x ??????????????????? } }) ?sapply(lst7,nrow) #? [1] 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 # [19] 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 # [37] 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 # [55] 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 # [73] 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 # [91] 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 #[109] 540 540 540 540 540 540 540 540 540 540 540 Hope this helps. A.K. ________________________________ From: Zilefac Elvis <zilefacelvis at yahoo.com> To: arun <smartpink111 at yahoo.com> Sent: Wednesday, June 5, 2013 2:05 AM Subject: Re: dates and time series management Hi A.K, Sorry my internet connection was so bad last evening. I have attached all the files as .zip. Below is the output you requested. As I explained, the start date in 'res' should be 1961 and end date should be 2005 in all 119 files. Thanks A.K> lapply(lst1,head,3)[[1]] ? V1.V2.V3.V4.V5.V6.V7.V8.V9.V10.V11.V12.V13.V14.V15.V16.V17.V18.V19.V20.V21.V22.V23.V24.V25.V26.V27.V28.V29.V30.V31.V32.V33 1 ? ? ? ? ? ? ? ? ? ? ? ?1915 1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 2 ? ? ? ? ? ? ? ? ? ? ? ?1915 2 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 3 ? ? ? ? ? ? ? ? ? ? ? ?1915 3 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA [
Hi Atem, No problem. ?which(res==-9999.99) # [1]?? 18246? 397379? 420059? 426569? 427109? 603659? 604199? 662518? 664678 #[10]? 698982? 699522? 700062? 701142? 754745 1289823 1500490 1589487 1716011 #[19] 1837083 ?which(res==-9999.99,arr.ind=TRUE) #??????? row col #1506?? 1506?? 2 #12359 12359? 24 #1559?? 1559? 26 #8069?? 8069? 26 #---------------------- res[ which(res==-9999.99,arr.ind=TRUE)]<-NA #or res[res==-9999.99]<-NA ?which(res==-9999.99) #integer(0) A.K. ________________________________ From: Zilefac Elvis <zilefacelvis at yahoo.com> To: arun <smartpink111 at yahoo.com> Sent: Wednesday, June 5, 2013 10:56 AM Subject: Re: dates and time series management Hi A.K, It works as expected. You are too smart. Can you find all -9999.99 and replace with NA, if only it exists? lst2<-lapply(lst1,function(x) {gsub("(\\d+)(-9999.99)","\\1 \\2",x)}) ?lst3<-lapply(lst2,function(x) {x<-gsub("(\\d+)(-9999.99)","\\1 \\2",x)})? Thanks so much A.K. ________________________________ From: arun <smartpink111 at yahoo.com> To: Zilefac Elvis <zilefacelvis at yahoo.com> Cc: R help <r-help at r-project.org> Sent: Wednesday, June 5, 2013 7:44 AM Subject: Re: dates and time series management Hi, Try this: lstf1<- list.files(pattern=".txt") length(lstf1) #[1] 119 fun2<- function(lstf){ ?lst1<-lapply(lstf,function(x) readLines(x)) ?lst2<-lapply(lst1,function(x) {gsub("(\\d+)(-9999.99)","\\1 \\2",x)}) ?lst3<-lapply(lst2,function(x) {x<-gsub("(\\d+)(-9999.99)","\\1 \\2",x)}) ?lst4<- lapply(lst3,function(x) read.table(text=x,header=TRUE,stringsAsFactors=FALSE,sep="",fill=TRUE)) ?lst5<- lapply(lst4,function(x) x[x$V1>=1961 & x$V1<=2005,]) ?lst6<- lapply(lst5,function(x) x[!is.na(x$V1),]) ?lst7<- lapply(lst6,function(x) { ???????????????????? if((min(x$V1)>1961)|(max(x$V1)<2005)){ ???????????????????????? n1<- (min(x$V1)-1961)*12 ???????????????????????? x1<- as.data.frame(matrix(NA,ncol=ncol(x),nrow=n1)) ???????????????????????? n2<- (2005-max(x$V1))*12 ???????????????????????? x2<- as.data.frame(matrix(NA,ncol=ncol(x),nrow=n2)) ???????????????????????? x3<- rbind(x1,x,x2) ??????????????????????? } ????????????????????????? else { ??????????????????? x ??????????????????? } }) lst8<- lapply(lst7,function(x) data.frame(col1=unlist(x[,-c(1:2)]))) ???? lst9<- lapply(seq_along(lst8),function(i){ ??????????????????????? x<- lst8[[i]] ??????????????????????? colnames(x)<- lstf1[i] ??????????????????????? row.names(x)<- 1:nrow(x) ??????????????????????? x ??????????????????????? }) ?do.call(cbind,lst9)} res<-fun2(lstf1) dim(res) #[1] 16740?? 119 res[1:5,1:3] ?# dt3011120.txt dt3011240.txt dt3011887.txt #1????????? 1.67??????????? NA????????? 0.17 #2????????? 0.00??????????? NA????????? 0.28 #3????????? 0.00??????????? NA????????? 0.00 #4????????? 0.00??????????? NA????????? 0.30 #5????????? 0.00??????????? NA????????? 0.00 ######################################## There are some formatting issues in your files: For eg. If I run the function line by line: ?lst1<-lapply(lstf1,function(x) readLines(x)) sapply(lst1,function(x) any(grepl("\\d+-9999.99",x))) ? [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ?[13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ?[25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ?[37]? TRUE FALSE? TRUE FALSE FALSE? TRUE FALSE FALSE FALSE FALSE FALSE FALSE ?[49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ?[61] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ?[73] FALSE FALSE FALSE FALSE FALSE? TRUE FALSE FALSE FALSE FALSE? TRUE FALSE ?[85] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE? TRUE ?[97] FALSE FALSE FALSE FALSE FALSE FALSE? TRUE? TRUE FALSE FALSE FALSE FALSE [109] FALSE FALSE FALSE FALSE? TRUE FALSE FALSE FALSE FALSE FALSE? TRUE ###means some rows in the a few files have: #-9999.99 0 0 0 0.00-9999.99 0 0.00-9999.99 0 0 0 0.00-9999.99 (no space before -9999.99) ?lst2<-lapply(lst1,function(x) {gsub("(\\d+)(-9999.99)","\\1 \\2",x)}) sapply(lst2,function(x) any(grepl("\\d+-9999.99",x))) #still a few files had the problem ? [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ?[13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ?[25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ?[37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ?[49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ?[61] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ?[73] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE? TRUE FALSE ?[85] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE? TRUE ?[97] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE [109] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE lst3<-lapply(lst2,function(x) {x<-gsub("(\\d+)(-9999.99)","\\1 \\2",x)}) any(sapply(lst3,function(x) any(grepl("\\d+-9999.99",x)))) #[1] FALSE lst4<- lapply(lst3,function(x) read.table(text=x,header=TRUE,stringsAsFactors=FALSE,sep="",fill=TRUE)) any(sapply(lst4,function(x) any(sapply(x,is.character)))) #[1] FALSE ?lst5<- lapply(lst4,function(x) x[x$V1>=1961 & x$V1<=2005,]) lst6<- lapply(lst5,function(x) x[!is.na(x$V1),]) sapply(lst6,nrow) ?# [1] 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 # [19] 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 # [37] 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 # [55] 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 # [73] 540 540 540 540 528 492 528 540 348 540 540 480 540 540 540 540 540 540 # [91] 540 540 540 540 540 540 540 540 540 540 540 540 540 540 528 540 540 540 #[109] 540 540 540 540 540 540 540 540 540 468 540 ???? lst7<- lapply(lst6,function(x) { ???????????????????? if((min(x$V1)>1961)|(max(x$V1)<2005)){ ???????????????????????? n1<- (min(x$V1)-1961)*12 ???????????????????????? x1<- as.data.frame(matrix(NA,ncol=ncol(x),nrow=n1)) ???????????????????????? n2<- (2005-max(x$V1))*12 ???????????????????????? x2<- as.data.frame(matrix(NA,ncol=ncol(x),nrow=n2)) ???????????????????????? x3<- rbind(x1,x,x2) ??????????????????????? } ????????????????????????? else { ??????????????????? x ??????????????????? } }) ?sapply(lst7,nrow) #? [1] 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 # [19] 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 # [37] 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 # [55] 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 # [73] 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 # [91] 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 #[109] 540 540 540 540 540 540 540 540 540 540 540 Hope this helps. A.K. ________________________________ From: Zilefac Elvis <zilefacelvis at yahoo.com> To: arun <smartpink111 at yahoo.com> Sent: Wednesday, June 5, 2013 2:05 AM Subject: Re: dates and time series management Hi A.K, Sorry my internet connection was so bad last evening. I have attached all the files as .zip. Below is the output you requested. As I explained, the start date in 'res' should be 1961 and end date should be 2005 in all 119 files. Thanks A.K> lapply(lst1,head,3)[[1]] ? V1.V2.V3.V4.V5.V6.V7.V8.V9.V10.V11.V12.V13.V14.V15.V16.V17.V18.V19.V20.V21.V22.V23.V24.V25.V26.V27.V28.V29.V30.V31.V32.V33 1 ? ? ? ? ? ? ? ? ? ? ? ?1915 1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 2 ? ? ? ? ? ? ? ? ? ? ? ?1915 2 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 3 ? ? ? ? ? ? ? ? ? ? ? ?1915 3 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA [
Hi, Try this: lstf1<- list.files(pattern=".txt") length(lstf1) #[1] 119 #I changed the function a little bit to unlist by rows to match the dates column I created. fun2<- function(lstf){ ?lst1<-lapply(lstf,function(x) readLines(x)) ?lst2<-lapply(lst1,function(x) {gsub("(\\d+)(-9999.99)","\\1 \\2",x)}) ?lst3<-lapply(lst2,function(x) {x<-gsub("(\\d+)(-9999.99)","\\1 \\2",x)}) ?lst4<- lapply(lst3,function(x) read.table(text=x,header=TRUE,stringsAsFactors=FALSE,sep="",fill=TRUE)) ?lst5<- lapply(lst4,function(x) x[x$V1>=1961 & x$V1<=2005,]) ?lst6<- lapply(lst5,function(x) x[!is.na(x$V1),]) ?lst7<- lapply(lst6,function(x) { ???????????????????? if((min(x$V1)>1961)|(max(x$V1)<2005)){ ???????????????????????? n1<- (min(x$V1)-1961)*12 ???????????????????????? x1<- as.data.frame(matrix(NA,ncol=ncol(x),nrow=n1)) ???????????????????????? n2<- (2005-max(x$V1))*12 ???????????????????????? x2<- as.data.frame(matrix(NA,ncol=ncol(x),nrow=n2)) ???????????????????????? x3<- rbind(x1,x,x2) ??????????????????????? } ????????????????????????? else { ??????????????????? x ??????????????????? } }) ??? lst8<-lapply(lst7,function(x) data.frame(col1=unlist(data.frame(t(x)[-c(1:2),]),use.names=FALSE))) ####changed ???? lst9<- lapply(seq_along(lst8),function(i){ ??????????????????????? x<- lst8[[i]] ??????????????????????? colnames(x)<- lstf1[i] ??????????????????????? row.names(x)<- 1:nrow(x) ??????????????????????? x ??????????????????????? }) ?do.call(cbind,lst9)} res<-fun2(lstf1) dim(res) #[1] 16740?? 119 ?res[res==-9999.99]<-NA which(res==-9999.99) #integer(0) dates1<-seq.Date(as.Date('1Jan1961',format="%d%b%Y"),as.Date('31Dec2005',format="%d%b%Y"),by="day") dates2<- as.character(dates1) sldat<- split(dates2,list(gsub("-.*","",dates2))) ?lst11<-lapply(sldat,function(x) lapply(split(x,gsub(".*-(.*)-.*","\\1",x)), function(y){x1<-as.numeric(gsub(".*-.*-(.*)","\\1",y));if((31-max(x1))>0) {x2<-seq(max(x1)+1,31,1);x3<-paste0(unique(gsub("(.*-.*-).*","\\1",y)),x2);c(y,x3)} else y} )) any(sapply(lst1,function(x) any(lapply(x,length)!=31))) #[1] FALSE lst22<-lapply(lst11,function(x) unlist(x,use.names=FALSE)) sapply(lst22,length) #1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 # 372? 372? 372? 372? 372? 372? 372? 372? 372? 372? 372? 372? 372? 372? 372? 372 #1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 # 372? 372? 372? 372? 372? 372? 372? 372? 372? 372? 372? 372? 372? 372? 372? 372 #1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 # 372? 372? 372? 372? 372? 372? 372? 372? 372? 372? 372? 372? 372 ?dates3<-unlist(lst22,use.names=FALSE) ?length(dates3) #[1] 16740 res1<- data.frame(dates=dates3,res,stringsAsFactors=FALSE) str(res1) 'data.frame':??? 16740 obs. of? 120 variables: ?$ dates??????? : chr? "1961-01-01" "1961-01-02" "1961-01-03" "1961-01-04" ... ?$ dt3011120.txt: num? 1.67 0 0 0 0 0 4.17 0 0 0 ... ?$ dt3011240.txt: num? NA NA NA NA NA NA NA NA NA NA ... ?$ dt3011887.txt: num? 0.17 0.28 0 0.3 0 0 1.78 0 0.3 0 ... ?$ dt3012205.txt: num? 0.34 0.21 0 0.51 0 0 2.82 0 0.3 0 ... ----------------------------------------------------------- res1$dates<-as.Date(res1$dates) ?res2<-res1[!is.na(res1$dates),] res2[1:3,1:3] #?????? dates dt3011120.txt dt3011240.txt #1 1961-01-01????????? 1.67??????????? NA #2 1961-01-02????????? 0.00??????????? NA #3 1961-01-03????????? 0.00??????????? NA ?dim(res2) #[1] 16436?? 120 Now, you can try the reshape() and the zoo(). Hope it helps. A.K. ________________________________ From: Zilefac Elvis <zilefacelvis at yahoo.com> To: arun <smartpink111 at yahoo.com> Sent: Wednesday, June 5, 2013 5:17 PM Subject: Re: dates and time series management Hi A.K, I am gradually improving my R skills thanks to your support. I have this code for the attached data. ******************************************************************************************************** library(hydroTSM) # Reading the data with 21 daily simulations, from 1961-01-01 up to 2005-12-31 x <- read.csv("data2.csv") # Creating a single variable with all the dates dates <- as.Date(paste0(x$year, "-", x$month, "-", x$day), format="%Y-%m-%d") # Creating a data.frame with 3 columns: simulation number, Date, Rainfall values x.new <- data.frame(s.num=x[,1], Date=dates, Rainfall=x[,5]) # Creating a data.frame with 22 columns: Dates + Rainfall values for21 simulations x.wide <- reshape(x.new, idvar = "Date", timevar = "s.num", direction = "wide") # Creating a zoo variable z <- zoo(x.wide[,-1], order.by=x.wide[,1]) # 5-day total rainfall for each one of the simulations z.5tot <- rollapply(data=z, width=5, FUN=sum, fill=NA, partial= TRUE, ? ? ? ? ? ? ? ? ? ? align="center")# to get the total of 5-day precipitation # Maximum value per year of 5-day total rainfall for each one of the simulations z.5max.annual <- daily2annual(z.5max, dates=1, FUN=max) ********************************************************************************************************* Problem: I am trying to do a similar thing with 'res' from our previous problem (see below). However, instead of width=5, I need something like? Max.Daily<-rollapply(data=z, width=372, FUN=max, by.column = TRUE, partial= TRUE, align="center") # width=1961 to 2005=45years, 16740/45=372 To do this, I need a date column vector just as I did above. Can you show me how to generate daily dates with??format="%Y-%m-%d"?? Days range from 1 to 31 for all months since we filled for example February having 28/29 days with NA.? Months from 1 to 12 and years from 1961 to 2005. If column 1 of 'res' contains dates, then we can use parts of the code above to extract the Maximum value for each year and for each column. So, my final output will be 45 * 119.? Thanks so much A.K. I keep learning hard though slowly. ? ________________________________ From: arun : R help <r-help at r-project.org> Sent: Wednesday, June 5, 2013 9:44 AM Subject: Re: dates and time series management Hi Atem, No problem. ?which(res==-9999.99) # [1]?? 18246? 397379? 420059? 426569? 427109? 603659? 604199? 662518? 664678 #[10]? 698982? 699522? 700062? 701142? 754745 1289823 1500490 1589487 1716011 #[19] 1837083 ?which(res==-9999.99,arr.ind=TRUE) #??????? row col #1506?? 1506?? 2 #12359 12359? 24 #1559?? 1559? 26 #8069?? 8069? 26 #---------------------- res[ which(res==-9999.99,arr.ind=TRUE)]<-NA #or res[res==-9999.99]<-NA ?which(res==-9999.99) #integer(0) A.K. ________________________________ From: Zilefac Elvis <zilefacelv 5, 2013 10:56 AM Subject: Re: dates and time series management Hi A.K, It works as expected. You are too smart. Can you find all -9999.99 and replace with NA, if only it exists? lst2<-lapply(lst1,function(x) {gsub("(\\d+)(-9999.99)","\\1 \\2",x)}) ?lst3<-lapply(lst2,function(x) {x<-gsub("(\\d+)(-9999.99)","\\1 \\2",x)})? Thanks so much A.K. ________________________________ From: arun <smartpin <r-help at r-project.org> Sent: Wednesday, June 5, 2013 7:44 AM Subject: Re: dates and time series management Hi, Try this: lstf1<- list.files(pattern=".txt") length(lstf1) #[1] 119 fun2<- function(lstf){ ?lst1<-lapply(lstf,function(x) readLines(x)) ?lst2<-lapply(lst1,function(x) {gsub("(\\d+)(-9999.99)","\\1 \\2",x)}) ?lst3<-lapply(lst2,function(x) {x<-gsub("(\\d+)(-9999.99)","\\1 \\2",x)}) ?lst4<- lapply(lst3,function(x) read.table(text=x,header=TRUE,stringsAsFactors=FALSE,sep="",fill=TRUE)) ?lst5<- lapply(lst4,function(x) x[x$V1>=1961 & x$V1<=2005,]) ?lst6<- lapply(lst5,function(x) x[!is.na(x$V1),]) ?lst7<- lapply(lst6,function(x) { if((min(x$V1)>1961)|(max(x$V1)<2005)){ ???????????????????????? n1<- (min(x$V1)-1961)*12 ???????????????????????? x1<- as.data.frame(matrix(NA,ncol=ncol(x),nrow=n1)) ???????????????????????? n2<- (2005-max(x$V1))*12 ???????????????????????? x2<- as.data.frame(matrix(NA,ncol=ncol(x),nrow=n2)) ???????????????????????? x3<- rbind(x1,x,x2) ??????????????????????? } ????????????????????????? else { ??????????????????? x ??????????????????? } }) lst8<- lapply(lst7,function(x) data.frame(col1=unlist(x[,-c(1:2)]))) ???? lst9<- lapply(seq_along(lst8),function(i){ ??????????????????????? x<- lst8[[i]] ??????????????????????? colnames(x)<- lstf1[i] ??????????????????????? row.names(x)<- 1:nrow(x) ??????????????????????? x ??????????????????????? }) ?do.call(cbind,lst9)} res<-fun2(lstf1) dim(res) #[1] 16740?? 119 res[1:5,1:3] ?# dt3011120.txt dt3011240.txt dt3011887.txt #1????????? 1.67??????????? NA????????? 0.17 #2????????? 0.00??????????? NA????????? 0.28 #3????????? 0.00??????????? NA????????? 0.00 #4????????? 0.00??????????? NA????????? 0.30 #5????????? 0.00??????????? NA????????? 0.00 ######################################## There are some formatting issues in your files: For eg. If I run the function line by line: ?lst1<-lapply(lstf1,function(x) readLines(x)) sapply(lst1,function(x) any(grepl("\\d+-9999.99",x))) ? [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ?[13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ?[25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ?[37]? TRUE FALSE? TRUE FALSE FALSE? TRUE FALSE FALSE FALSE FALSE FALSE FALSE ?[49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ?[61] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ?[73] FALSE FALSE FALSE FALSE FALSE? TRUE FALSE FALSE FALSE FALSE? TRUE FALSE ?[85] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE? TRUE ?[97] FALSE FALSE FALSE FALSE FALSE FALSE? TRUE? TRUE FALSE FALSE FALSE FALSE [109] FALSE FALSE FALSE FALSE? TRUE FALSE FALSE FALSE FALSE FALSE? TRUE ###means some rows in the a few files have: #-9999.99 0 0 0 0.00-9999.99 0 0.00-9999.99 0 0 0 0.00-9999.99 (no space before -9999.99) ?lst2<-lapply(lst1,function(x) {gsub("(\\d+)(-9999.99)","\\1 \\2",x)}) sapply(lst2,function(x) any(grepl("\\d+-9999.99",x))) #still a few files had the problem ? [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ?[13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ?[25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ?[37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ?[49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ?[61] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ?[73] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE? TRUE FALSE ?[85] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE? TRUE ?[97] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE [109] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE lst3<-lapply(lst2,function(x) {x<-gsub("(\\d+)(-9999.99)","\\1 \\2",x)}) any(sapply(lst3,function(x) any(grepl("\\d+-9999.99",x)))) #[1] FALSE lst4<- lapply(lst3,function(x) read.table(text=x,header=TRUE,stringsAsFactors=FALSE,sep="",fill=TRUE)) any(sapply(lst4,function(x) any(sapply(x,is.character)))) #[1] FALSE ?lst5<- lapply(lst4,function(x) x[x$V1>=1961 & x$V1<=2005,]) lst6<- lapply(lst5,function(x) x[!is.na(x$V1),]) sapply(lst6,nrow) ?# [1] 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 # [19] 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 # [37] 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 # [55] 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 # [73] 540 540 540 540 528 492 528 540 348 540 540 480 540 540 540 540 540 540 # [91] 540 540 540 540 540 540 540 540 540 540 540 540 540 540 528 540 540 540 #[109] 540 540 540 540 540 540 540 540 540 468 540 ???? lst7<- lapply(lst6,function(x) { ???????????????????? if((min(x$V1)>1961)|(max(x$V1)<2005)){ ???????????????????????? n1<- (min(x$V1)-1961)*12 ???????????????????????? x1<- as.data.frame(matrix(NA,ncol=ncol(x),nrow=n1)) ???????????????????????? n2<- (2005-max(x$V1))*12 ???????????????????????? x2<- as.data.frame(matrix(NA,ncol=ncol(x),nrow=n2)) ???????????????????????? x3<- rbind(x1,x,x2) } ????????????????????????? else { ??????????????????? x ??????????????????? } }) ?sapply(lst7,nrow) #? [1] 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 # [19] 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 # [37] 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 # [55] 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 # [73] 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 # [91] 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 #[109] 540 540 540 540 540 540 540 540 540 540 540 Hope this helps. A.K. ________________________________ Fr om> Sent: Wednesday, June 5, 2013 2:05 AM Subject: Re: dates and time series management Hi A.K, Sorry my internet connection was so bad last evening. I have attached all the files as .zip. Below is the output you requested. As I explained, the start date in 'res' should be 1961 and end date should be 2005 in all 119 files. Thanks A.K> lapply(lst1,head,3)[[1]] ? V1.V2.V3.V4.V5.V6.V7.V8.V9.V10.V11.V12.V13.V14.V15.V16.V17.V18.V19.V20.V21.V22.V23.V24.V25.V26.V27.V28.V29.V30.V31.V32.V33 1 ? ? ? ? ? ? ? ? ? ? ? ?1915 1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 2 ? ? ? ? ? ? ? ? ? ? ? ?1915 2 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 3 ? ? ? ? ? ? ? ? ? ? ? ?1915 3 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA [???????????????????????????????????????????
Hi, I think it is due to the missing values: I get warnings() ?z.5.annualMax<- daily2annual(z, FUN=max, na.rm=TRUE,dates=1) #There were 50 or more warnings (use warnings() to see the first 50) write.csv(z.5max.annual, file = "Stations.csv") Just to validate? the result: I tried this: res3<- lapply(seq_len(ncol(res1[,-1])),function(i) {x<-data.frame(res1[,1],res1[,i+1]); names(x)<- c(names(res1)[1],names(res1)[i+1]);x}) res4<-lapply(res3,function(x) {x1<-x[!is.na(x$dates),]; na.omit(x1)}) library(zoo) zl<- lapply(res4,function(x) zoo(x[,-1],order.by=x[,1])) zl.max.annual<- lapply(zl,function(x) daily2annual(x,FUN=max,na.rm=TRUE)) #no warnings() #na.rm=TRUE inside the daily2annual() didn't show any effect. ?sapply(zl.max.annual,length) #? [1] 45 42 44 45 37 45 44 44 41 45 25 45 45 45 45 45 45 45 45 45 45 41 40 41 45 ?#[26] 44 45 45 45 45 44 42 45 38 45 45 45 38 44 31 45 45 45 42 45 36 42 45 42 45 ?#[51] 45 45 44 45 40 41 45 45 45 45 34 45 34 45 45 41 41 45 45 45 45 45 45 45 45 ?#[76] 45 43 40 29 44 29 42 45 40 44 33 45 45 43 40 45 45 45 45 43 45 34 45 44 45 #[101] 45 44 30 44 44 42 45 45 43 42 44 45 45 45 45 42 45 39 39 library(xts) zl.max.annual1<-lapply(zl.max.annual, as.xts) ?zl.merge<- Reduce(function(...) merge(...),zl.max.annual1)) ?zl.merge[1:3,1:8] #???????????? ..1?? ..2? ..2.1 ..2.2 ..2.3 ..2.4? ..2.5 ..2.6 #1961-01-01 35.37??? NA? 13.43 40.88 17.69 38.44? 50.56 36.93 #1962-01-01 34.54 34.85 102.97 39.84 73.43 68.88? 63.88 22.89 #1963-01-01 18.32??? NA? 64.18 51.49 14.61 40.79 127.74 25.07 ?z.5.annualMax[1:3,1:8] ?# ???????? dt3011120.txt dt3011240.txt dt3011887.txt dt3012205.txt #1961-01-01???????? 35.37??????????? NA???????? 13.43???????? 40.88 #1962-01-01???????? 34.54???????? 34.85??????? 102.97???????? 39.84 #1963-01-01???????? 18.32??????????? NA???????? 64.18???????? 51.49 ?# ???????? dt3012280.txt dt3015405.txt dt3015523.txt dt3015960.txt #1961-01-01???????? 17.69???????? 38.44???????? 50.56???????? 36.93 #1962-01-01???????? 73.43???????? 68.88???????? 63.88???????? 22.89 #1963-01-01???????? 14.61???????? 40.79??????? 127.74???????? 25.07 Looks like this is the same result as above. A.K. ________________________________ From: Zilefac Elvis <zilefacelvis at yahoo.com> To: arun <smartpink111 at yahoo.com> Sent: Wednesday, June 5, 2013 11:04 PM Subject: Re: dates and time series management Hi A.K, Here is the final code: ******************************************************************* lstf1<- list.files(pattern=".txt") length(lstf1) #[1] 119 fun2<- function(lstf){ ? lst1<-lapply(lstf,function(x) readLines(x)) ? lst2<-lapply(lst1,function(x) {gsub("(\\d+)(-9999.99)","\\1 \\2",x)}) ? lst3<-lapply(lst2,function(x) {x<-gsub("(\\d+)(-9999.99)","\\1 \\2",x)}) ? lst4<- lapply(lst3,function(x) read.table(text=x,header=TRUE,stringsAsFactors=FALSE,sep="",fill=TRUE)) ? lst5<- lapply(lst4,function(x) x[x$V1>=1961 & x$V1<=2005,]) ? lst6<- lapply(lst5,function(x) x[!is.na(x$V1),]) ? lst7<- lapply(lst6,function(x) { ? ? if((min(x$V1)>1961)|(max(x$V1)<2005)){ ? ? ? n1<- (min(x$V1)-1961)*12 ? ? ? x1<- as.data.frame(matrix(NA,ncol=ncol(x),nrow=n1)) ? ? ? n2<- (2005-max(x$V1))*12 ? ? ? x2<- as.data.frame(matrix(NA,ncol=ncol(x),nrow=n2)) ? ? ? x3<- rbind(x1,x,x2) ? ? } ? ? else { ? ? ? x ? ? } }) ? lst8<-lapply(lst7,function(x) data.frame(col1=unlist(data.frame(t(x)[-c(1:2),]),use.names=FALSE))) ####changed ? lst9<- lapply(seq_along(lst8),function(i){ ? ? x<- lst8[[i]] ? ? colnames(x)<- lstf1[i] ? ? row.names(x)<- 1:nrow(x) ? ? x ? }) ? do.call(cbind,lst9)} res<-fun2(lstf1) dim(res) res[res==-9999.99]<-NA which(res==-9999.99) dates1<-seq.Date(as.Date('1Jan1961',format="%d%b%Y"),as.Date('31Dec2005',format="%d%b%Y"),by="day") dates2<- as.character(dates1) sldat<- split(dates2,list(gsub("-.*","",dates2))) lst11<-lapply(sldat,function(x) lapply(split(x,gsub(".*-(.*)-.*","\\1",x)), function(y){x1<-as.numeric(gsub(".*-.*-(.*)","\\1",y));if((31-max(x1))>0) {x2<-seq(max(x1)+1,31,1);x3<-paste0(unique(gsub("(.*-.*-).*","\\1",y)),x2);c(y,x3)} else y} )) any(sapply(lst1,function(x) any(lapply(x,length)!=31))) lst22<-lapply(lst11,function(x) unlist(x,use.names=FALSE)) sapply(lst22,length) dates3<-unlist(lst22,use.names=FALSE) length(dates3) res1<- data.frame(dates=dates3,res,stringsAsFactors=FALSE) str(res1) res1$dates<-as.Date(res1$dates) res2<-res1[!is.na(res1$dates),] res2[1:3,1:3] dim(res2) z <- zoo(res2[,-1], order.by=res2[,1]) library(hydroTSM) z.5max.annual <- daily2annual(z, dates=1, FUN=max) # dates=1 refers to year-month-day format write.table(z.5max.annual, file = "Stations.csv", sep = ",")# write results file to current directory At the second before the last line of the code, I transform from daily to annual values and keep only the maximum value in each year. i.e max value in 365 days. MINOR PROBLEM: My output does contains some 'NA'. Does it mean that for that station, for that year, all data was NA or missing?? Thanks so much. Atem. ________________________________ From: arun <smartpink111 at yahoo.com> To: Zilefac Elvis <zilefacelvis at yahoo.com> Cc: R help <r-help at r-project.org> Sent: Wednesday, June 5, 2013 5:16 PM Subject: Re: dates and time series management Hi, Try this: lstf1<- list.files(pattern=".txt") length(lstf1) #[1] 119 #I changed the function a little bit to unlist by rows to match the dates column I created. fun2<- function(lstf){ ?lst1<-lapply(lstf,function(x) readLines(x)) ?lst2<-lapply(lst1,function(x) {gsub("(\\d+)(-9999.99)","\\1 \\2",x)}) ?lst3<-lapply(lst2,function(x) {x<-gsub("(\\d+)(-9999.99)","\\1 \\2",x)}) ?lst4<- lapply(lst3,function(x) read.table(text=x,header=TRUE,stringsAsFactors=FALSE,sep="",fill=TRUE)) ?lst5<- lapply(lst4,function(x) x[x$V1>=1961 & x$V1<=2005,]) ?lst6<- lapply(lst5,function(x) x[!is.na(x$V1),]) ?lst7<- lapply(lst6,function(x) { ???????????????????? if((min(x$V1)>1961)|(max(x$V1)<2005)){ ???????????????????????? n1<- (min(x$V1)-1961)*12 ???????????????????????? x1<- as.data.frame(matrix(NA,ncol=ncol(x),nrow=n1)) ???????????????????????? n2<- (2005-max(x$V1))*12 ???????????????????????? x2<- as.data.frame(matrix(NA,ncol=ncol(x),nrow=n2)) ???????????????????????? x3<- rbind(x1,x,x2) ??????????????????????? } ????????????????????????? else { ??????????????????? x ??????????????????? } }) ??? lst8<-lapply(lst7,function(x) data.frame(col1=unlist(data.frame(t(x)[-c(1:2),]),use.names=FALSE))) ####changed ???? lst9<- lapply(seq_along(lst8),function(i){ ??????????????????????? x<- lst8[[i]] ??????????????????????? colnames(x)<- lstf1[i] ??????????????????????? row.names(x)<- 1:nrow(x) ??????????????????????? x ??????????????????????? }) ?do.call(cbind,lst9)} res<-fun2(lstf1) dim(res) #[1] 16740?? 119 ?res[res==-9999.99]<-NA which(res==-9999.99) #integer(0) dates1<-seq.Date(as.Date('1Jan1961',format="%d%b%Y"),as.Date('31Dec2005',format="%d%b%Y"),by="day") dates2<- as.character(dates1) sldat<- split(dates2,list(gsub("-.*","",dates2))) ?lst11<-lapply(sldat,function(x) lapply(split(x,gsub(".*-(.*)-.*","\\1",x)), function(y){x1<-as.numeric(gsub(".*-.*-(.*)","\\1",y));if((31-max(x1))>0) {x2<-seq(max(x1)+1,31,1);x3<-paste0(unique(gsub("(.*-.*-).*","\\1",y)),x2);c(y,x3)} else y} )) any(sapply(lst1,function(x) any(lapply(x,length)!=31))) #[1] FALSE lst22<-lapply(lst11,function(x) unlist(x,use.names=FALSE)) sapply(lst22,length) #1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 # 372? 372? 372? 372? 372? 372? 372? 372? 372? 372? 372? 372? 372? 372? 372? 372 #1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 # 372? 372? 372? 372? 372? 372? 372? 372? 372? 372? 372? 372? 372? 372? 372? 372 #1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 # 372? 372? 372? 372? 372? 372? 372? 372? 372? 372? 372? 372? 372 ?dates3<-unlist(lst22,use.names=FALSE) ?length(dates3) #[1] 16740 res1<- data.frame(dates=dates3,res,stringsAsFactors=FALSE) str(res1) 'data.frame':??? 16740 obs. of? 120 variables: ?$ dates??????? : chr? "1961-01-01" "1961-01-02" "1961-01-03" "1961-01-04" ... ?$ dt3011120.txt: num? 1.67 0 0 0 0 0 4.17 0 0 0 ... ?$ dt3011240.txt: num? NA NA NA NA NA NA NA NA NA NA ... ?$ dt3011887.txt: num? 0.17 0.28 0 0.3 0 0 1.78 0 0.3 0 ... ?$ dt3012205.txt: num? 0.34 0.21 0 0.51 0 0 2.82 0 0.3 0 ... ----------------------------------------------------------- res1$dates<-as.Date(res1$dates) ?res2<-res1[!is.na(res1$dates),] res2[1:3,1:3] #?????? dates dt3011120.txt dt3011240.txt #1 1961-01-01????????? 1.67??????????? NA #2 1961-01-02????????? 0.00??????????? NA #3 1961-01-03????????? 0.00??????????? NA ?dim(res2) #[1] 16436?? 120 Now, you can try the reshape() and the zoo(). Hope it helps. A.K. ________________________________ From: Zilefac Elvis <zilefacelvis at yahoo.com> To: arun <smartpink111 at yahoo.com> Sent: Wednesday, June 5, 2013 5:17 PM Subject: Re: dates and time series management Hi A.K, I am gradually improving my R skills thanks to your support. I have this code for the attached data. ******************************************************************************************************** library(hydroTSM) # Reading the data with 21 daily simulations, from 1961-01-01 up to 2005-12-31 x <- read.csv("data2.csv") # Creating a single variable with all the dates dates <- as.Date(paste0(x$year, "-", x$month, "-", x$day), format="%Y-%m-%d") # Creating a data.frame with 3 columns: simulation number, Date, Rainfall values x.new <- data.frame(s.num=x[,1], Date=dates, Rainfall=x[,5]) # Creating a data.frame with 22 columns: Dates + Rainfall values for21 simulations x.wide <- reshape(x.new, idvar = "Date", timevar = "s.num", direction = "wide") # Creating a zoo variable z <- zoo(x.wide[,-1], order.by=x.wide[,1]) # 5-day total rainfall for each one of the simulations z.5tot <- rollapply(data=z, width=5, FUN=sum, fill=NA, partial= TRUE, ? ? ? ? ? ? ? ? ? ? align="center")# to get the total of 5-day precipitation # Maximum value per year of 5-day total rainfall for each one of the simulations z.5max.annual <- daily2annual(z.5max, dates=1, FUN=max) ********************************************************************************************************* Problem: I am trying to do a similar thing with 'res' from our previous problem (see below). However, instead of width=5, I need something like? Max.Daily<-rollapply(data=z, width=372, FUN=max, by.column = TRUE, partial= TRUE, align="center") # width=1961 to 2005=45years, 16740/45=372 To do this, I need a date column vector just as I did above. Can you show me how to generate daily dates with??format="%Y-%m-%d"?? Days range from 1 to 31 for all months since we filled for example February having 28/29 days with NA.? Months from 1 to 12 and years from 1961 to 2005. If column 1 of 'res' contains dates, then we can use parts of the code above to extract the Maximum value for each year and for each column. So, my final output will be 45 * 119.? Thanks so much A.K. I keep learning hard though slowly. ? ________________________________ From: arun <smartpink111 at yahoo.com> To: Zilefac Elvis <zilefacelvis at yahoo.com> Cc: R help <r-help at r-project.org> Sent: Wednesday, June 5, 2013 9:44 AM Subject: Re: dates and time series management Hi Atem, No problem. ?which(res==-9999.99) # [1]?? 18246? 397379? 420059? 426569? 427109? 603659? 604199? 662518? 664678 #[10]? 698982? 699522? 700062? 701142? 754745 1289823 1500490 1589487 1716011 #[19] 1837083 ?which(res==-9999.99,arr.ind=TRUE) #??????? row col #1506?? 1506?? 2 #12359 12359? 24 #1559?? 1559? 26 #8069?? 8069? 26 #---------------------- res[ which(res==-9999.99,arr.ind=TRUE)]<-NA #or res[res==-9999.99]<-NA ?which(res==-9999.99) #integer(0) A.K. ________________________________ From: Zilefac Elvis <zilefacelvis at yahoo.com> To: arun <smartpink111 at yahoo.com> Sent: Wednesday, June 5, 2013 10:56 AM Subject: Re: dates and time series management Hi A.K, It works as expected. You are too smart. Can you find all -9999.99 and replace with NA, if only it exists? lst2<-lapply(lst1,function(x) {gsub("(\\d+)(-9999.99)","\\1 \\2",x)}) ?lst3<-lapply(lst2,function(x) {x<-gsub("(\\d+)(-9999.99)","\\1 \\2",x)})? Thanks so much A.K. ________________________________ From: arun <smartpink111 at yahoo.com> To: Zilefac Elvis <zilefacelvis at yahoo.com> Cc: R help <r-help at r-project.org> Sent: Wednesday, June 5, 2013 7:44 AM Subject: Re: dates and time series management Hi, Try this: lstf1<- list.files(pattern=".txt") length(lstf1) #[1] 119 fun2<- function(lstf){ ?lst1<-lapply(lstf,function(x) readLines(x)) ?lst2<-lapply(lst1,function(x) {gsub("(\\d+)(-9999.99)","\\1 \\2",x)}) ?lst3<-lapply(lst2,function(x) {x<-gsub("(\\d+)(-9999.99)","\\1 \\2",x)}) ?lst4<- lapply(lst3,function(x) read.table(text=x,header=TRUE,stringsAsFactors=FALSE,sep="",fill=TRUE)) ?lst5<- lapply(lst4,function(x) x[x$V1>=1961 & x$V1<=2005,]) ?lst6<- lapply(lst5,function(x) x[!is.na(x$V1),]) ?lst7<- lapply(lst6,function(x) { if((min(x$V1)>1961)|(max(x$V1)<2005)){ ???????????????????????? n1<- (min(x$V1)-1961)*12 ???????????????????????? x1<- as.data.frame(matrix(NA,ncol=ncol(x),nrow=n1)) ???????????????????????? n2<- (2005-max(x$V1))*12 ???????????????????????? x2<- as.data.frame(matrix(NA,ncol=ncol(x),nrow=n2)) ???????????????????????? x3<- rbind(x1,x,x2) ??????????????????????? } ????????????????????????? else { ??????????????????? x ??????????????????? } }) lst8<- lapply(lst7,function(x) data.frame(col1=unlist(x[,-c(1:2)]))) ???? lst9<- lapply(seq_along(lst8),function(i){ ??????????????????????? x<- lst8[[i]] ??????????????????????? colnames(x)<- lstf1[i] ??????????????????????? row.names(x)<- 1:nrow(x) ??????????????????????? x }) ?do.call(cbind,lst9)} res<-fun2(lstf1) dim(res) #[1] 16740?? 119 res[1:5,1:3] ?# dt3011120.txt dt3011240.txt dt3011887.txt #1????????? 1.67??????????? NA????????? 0.17 #2????????? 0.00??????????? NA????????? 0.28 #3????????? 0.00??????????? NA????????? 0.00 #4????????? 0.00??????????? NA????????? 0.30 #5????????? 0.00??????????? NA????????? 0.00 ######################################## There are some formatting issues in your files: For eg. If I run the function line by line: ?lst1<-lapply(lstf1,function(x) readLines(x)) sapply(lst1,function(x) any(grepl("\\d+-9999.99",x))) ? [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ?[13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ?[25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ?[37]? TRUE FALSE? TRUE FALSE FALSE? TRUE FALSE FALSE FALSE FALSE FALSE FALSE ?[49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ?[61] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ?[73] FALSE FALSE FALSE FALSE FALSE? TRUE FALSE FALSE FALSE FALSE? TRUE FALSE ?[85] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE? TRUE ?[97] FALSE FALSE FALSE FALSE FALSE FALSE? TRUE? TRUE FALSE FALSE FALSE FALSE [109] FALSE FALSE FALSE FALSE? TRUE FALSE FALSE FALSE FALSE FALSE? TRUE ###means some rows in the a few files have: #-9999.99 0 0 0 0.00-9999.99 0 0.00-9999.99 0 0 0 0.00-9999.99 (no space before -9999.99) ?lst2<-lapply(lst1,function(x) {gsub("(\\d+)(-9999.99)","\\1 \\2",x)}) sapply(lst2,function(x) any(grepl("\\d+-9999.99",x))) #still a few files had the problem ? [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ?[13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ?[25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ?[37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ?[49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ?[61] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ?[73] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE? TRUE FALSE ?[85] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE? TRUE ?[97] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE [109] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE lst3<-lapply(lst2,function(x) {x<-gsub("(\\d+)(-9999.99)","\\1 \\2",x)}) any(sapply(lst3,function(x) any(grepl("\\d+-9999.99",x)))) #[1] FALSE lst4<- lapply(lst3,function(x) read.table(text=x,header=TRUE,stringsAsFactors=FALSE,sep="",fill=TRUE)) any(sapply(lst4,function(x) any(sapply(x,is.character)))) #[1] FALSE ?lst5<- lapply(lst4,function(x) x[x$V1>=1961 & x$V1<=2005,]) lst6<- lapply(lst5,function(x) x[!is.na(x$V1),]) sapply(lst6,nrow) ?# [1] 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 # [19] 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 # [37] 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 # [55] 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 # [73] 540 540 540 540 528 492 528 540 348 540 540 480 540 540 540 540 540 540 # [91] 540 540 540 540 540 540 540 540 540 540 540 540 540 540 528 540 540 540 #[109] 540 540 540 540 540 540 540 540 540 468 540 ???? lst7<- lapply(lst6,function(x) { if((min(x$V1)>1961)|(max(x$V1)<2005)){ ???????????????????????? n1<- (min(x$V1)-1961)*12 ???????????????????????? x1<- as.data.frame(matrix(NA,ncol=ncol(x),nrow=n1)) ???????????????????????? n2<- (2005-max(x$V1))*12 ???????????????????????? x2<- as.data.frame(matrix(NA,ncol=ncol(x),nrow=n2)) ???????????????????????? x3<- rbind(x1,x,x2) } ????????????????????????? else { ??????????????????? x ??????????????????? } }) ?sapply(lst7,nrow) #? [1] 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 # [19] 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 # [37] 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 # [55] 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 # [73] 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 # [91] 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 540 #[109] 540 540 540 540 540 540 540 540 540 540 540 Hope this helps. A.K. ________________________________ From: Zilefac Elvis <zilefacelvis at yahoo.com> To: arun <smartpink111 at yahoo.com> Sent: Wednesday, June 5, 2013 2:05 AM Subject: Re: dates and time series management Hi A.K, Sorry my internet connection was so bad last evening. I have attached all the files as .zip. Below is the output you requested. As I explained, the start date in 'res' should be 1961 and end date should be 2005 in all 119 files. Thanks A.K> lapply(lst1,head,3)[[1]] ? V1.V2.V3.V4.V5.V6.V7.V8.V9.V10.V11.V12.V13.V14.V15.V16.V17.V18.V19.V20.V21.V22.V23.V24.V25.V26.V27.V28.V29.V30.V31.V32.V33 1 ? ? ? ? ? ? ?1915 1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 2 ? ? ? ? ? ? ? ? ? ? ? ?1915 2 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 3 ? ? ? ? ? ? ? ? ? ? ? ?1915 3 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA [?????????????????????????????????????????????????????????????????????????????????????? ? ? ? ? ?