thr3ads.net - R help - [R] Help parsing from .txt [Oct 2013]

arun

2013-Oct-23 04:50 UTC

[R] Help parsing from .txt

Hi,
You may try:
?list.files()
# Escape the dot and anchor the pattern so only names ending in .txt match
nm1 <- list.files(pattern="\\.txt$")

res <- lapply(nm1, function(x) {
    ln1 <- readLines(x)
    indx1 <- grep("DATE PROCESSED", ln1)
    # Lines containing a capital letter are treated as column titles
    indx2 <- grep("[A-Z]", ln1)
    # Keep everything up to the end of the DATE PROCESSED block
    ln2 <- if(max(indx2)==indx1) ln1[1:length(ln1)] else ln1[1:(indx2[match(indx1,indx2)+1]-1)]
    ln2 <- ln2[ln2!=""]
    indx3 <- grepl("[A-Z]", ln2)
    # Consecutive value lines share a group; a gap starts a new column
    indx4 <- cumsum(c(TRUE, diff(which(!indx3))>1))
    # cbind() recycles the shorter columns when the groups differ in length
    mat1 <- do.call(cbind, split(ln2[!indx3], indx4))
    colnames(mat1) <- ln2[indx3][-1]
    # The first title line is used as the output file name
    write.table(mat1, paste0(ln2[indx3][1], ".txt"), row.names=FALSE, quote=FALSE, sep="\t")
})



A.K.
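
To see how the grouping step in the code above works, here is a minimal sketch run on the sample lines from the question instead of a file. The names is_header, value_pos, group_id and groups are only illustrative and do not appear in the original code.

```r
# Sample lines from the question, with blanks and trailing spaces already removed
ln2 <- c("TITLE", "EXAMPLE", "example 1", "example 2",
         "RELATED TITLE", "related title 1",
         "DATE PROCESSED", "06/12/2011")

# A line containing a capital letter is treated as a column title
is_header <- grepl("[A-Z]", ln2)

# Positions of the value lines: 3 4 6 8
value_pos <- which(!is_header)

# A jump of more than 1 between value positions means a title sat in between,
# so a new group (column) starts there
group_id <- cumsum(c(TRUE, diff(value_pos) > 1))

# One character vector per column
groups <- split(ln2[!is_header], group_id)

# The first title (TITLE) has no values under it, so it is dropped from the names
names(groups) <- ln2[is_header][-1]
```

Note that this relies on every value line being free of capital letters; a value such as "Example 1" would be taken for a title.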


I have a number of .txt files (1,200) from which I need to parse a 
number of pieces of information. The files are read into R as such: 

TITLE 
EXAMPLE 
example 1 
example 2 
RELATED TITLE 
related title 1 
DATE PROCESSED 
06/12/2011 

Some of the files have examples 1-4, others 1-12 and beyond. 

How can I create a script that will grab the information from 
the different .txt files, put it in a matrix, and spit it out in a .csv 
file with appropriately named columns? (The column titles are in CAPS 
above, and the information that goes in each column is in lower case.) 

Thanks in advance.
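
One possible way to get from the layout described above to a .csv file is sketched below. It is only a sketch under stated assumptions: parse_record is a hypothetical helper (not a base R function), titles are assumed to be the only lines containing capital letters, and columns with fewer entries are padded with NA so that the matrix is rectangular. It also assumes at least one value line is present.

```r
# Hypothetical helper: turn the lines of one file into a character matrix
# with one column per title
parse_record <- function(lines) {
  lines <- trimws(lines)            # the sample lines carry trailing spaces
  lines <- lines[lines != ""]
  is_header <- grepl("[A-Z]", lines)
  # Each value line belongs to the most recent title above it
  header_id <- cumsum(is_header)
  values <- split(lines[!is_header], header_id[!is_header])
  # Titles with no values under them (such as TITLE) drop out here
  names(values) <- lines[is_header][as.integer(names(values))]
  # Pad the shorter columns with NA instead of letting cbind() recycle them
  n <- max(lengths(values))
  padded <- lapply(values, function(v) c(v, rep(NA_character_, n - length(v))))
  do.call(cbind, padded)
}

sample_lines <- c("TITLE ", "EXAMPLE ", "example 1 ", "example 2 ",
                  "RELATED TITLE ", "related title 1 ",
                  "DATE PROCESSED ", "06/12/2011 ", "")

mat <- parse_record(sample_lines)

# Write the matrix as a .csv; a temporary file is used here for illustration
out_file <- tempfile(fileext = ".csv")
write.csv(mat, out_file, row.names = FALSE)
```

For the real files, the same helper could be applied to readLines(x) for each name returned by list.files(), with an output file name chosen per input file.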
