Hi,
You may try:

list.files()   # check which files are in the working directory
nm1 <- list.files(pattern = "\\.txt$")   # regex: names ending in ".txt"
res <- lapply(nm1, function(x) {
    ln1 <- readLines(x)
    # Position of the DATE PROCESSED header and of every line with capitals
    indx1 <- grep("DATE PROCESSED", ln1)
    indx2 <- grep("[A-Z]", ln1)
    # Keep everything up to the value(s) under DATE PROCESSED
    ln2 <- if (max(indx2) == indx1) ln1 else ln1[1:(indx2[match(indx1, indx2) + 1] - 1)]
    ln2 <- ln2[ln2 != ""]                  # drop blank lines
    indx3 <- grepl("[A-Z]", ln2)           # TRUE for header (CAPS) lines
    # Number each run of consecutive value lines: one run per header
    indx4 <- cumsum(c(TRUE, diff(which(!indx3)) > 1))
    mat1 <- do.call(cbind, split(ln2[!indx3], indx4))
    # The first capitalised line is the title; the rest are column names
    colnames(mat1) <- ln2[indx3][-1]
    write.table(mat1, paste0(ln2[indx3][1], ".txt"), row.names = FALSE, quote = FALSE, sep = "\t")
})

A.K.

I have a number of .txt files (1,200 of them) from which I need to parse several pieces of information. The files are read into R as such:

TITLE
EXAMPLE
example 1
example 2
RELATED TITLE
related title 1
DATE PROCESSED
06/12/2011

Some of the files have examples 1-4, others 1-12 and beyond.

How can I create a script that will grab the information from the different .txt files, put it in a matrix, and write it out to a .csv file with appropriately named columns? (The column titles are in CAPS above; the information that will go in each column is in lower case.)

Thanks in advance.
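The question asks for a single .csv covering all files, whereas the reply above writes one tab-delimited file per title. A minimal sketch of the single-CSV variant, under the same layout assumptions as the reply: the first capitalised line is the record's title, each later capitalised line is a column header, and the lower-case lines below a header are its values. Collapsing several values (example 1, example 2, ...) into one cell with "; " is an assumption, as are the helper names parse_record() and combine_records() and the "TITLE" and "file" column names; all are hypothetical.

```r
# Hypothetical helper: turn the lines of one file into a named character
# vector (one element per column), assuming the layout shown in the question.
parse_record <- function(lines) {
  lines <- trimws(lines)
  lines <- lines[lines != ""]              # drop blank lines
  is_header <- grepl("[A-Z]", lines)       # headers contain capital letters
  headers <- lines[is_header]
  group <- cumsum(is_header)               # index of the header above each line
  keep <- !is_header & group > 1           # values under the title are ignored
  vals <- split(lines[keep],
                factor(group[keep], levels = seq_along(headers)[-1]))
  # Assumption: several values under one header go into one cell,
  # separated by "; "; a header with no values becomes NA.
  cells <- vapply(vals, function(v) {
    if (length(v)) paste(v, collapse = "; ") else NA_character_
  }, character(1))
  names(cells) <- headers[-1]
  c(TITLE = headers[1], cells)
}

# Hypothetical helper: parse every file into one row of a data frame.
# Columns are the union of all headers; headers missing from a file give NA.
combine_records <- function(files) {
  stopifnot(length(files) > 0)
  rows <- lapply(files, function(f) parse_record(readLines(f, warn = FALSE)))
  cols <- unique(unlist(lapply(rows, names)))   # first-seen order
  out <- matrix(unlist(lapply(rows, function(r) unname(r[cols]))),
                ncol = length(cols), byrow = TRUE,
                dimnames = list(NULL, cols))
  data.frame(file = basename(files), out,
             check.names = FALSE, stringsAsFactors = FALSE)
}
```

To use it on the real files, something like write.csv(combine_records(list.files(pattern = "\\.txt$")), "combined.csv", row.names = FALSE) should produce one row per input file; writing to a .csv name means a second run will not pick the output up as input.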