thr3ads.net - R help - [R] Index alternative to nasty FOR loop? [Aug 2008]

zack holden

2008-Aug-06 17:42 UTC

[R] Index alternative to nasty FOR loop?

Dear R wizards,
 
I have a folder containing 1000 files. For each file, I need to extract the
first row of each file, paste it to a new file, then write out that file. Then I
need to repeat this operation for each additional row (row 2, then row 3, etc)
for 23 rows in each file.
 
I can do this with a for loop (as below). 
 
Is there a way to use some of the indexing power of R to get around this nasty
loop?
 
Thank you in advance for any suggestions
 
###################
newoutfile <- data.frame()
list <- list.files("c:/data")
 
file = 1
for(file in list) {
   row <- file[1, ]
   newoutfile <- rbind(row, newoutfile)
   file = file + 1
write.csv(outfile, file = "output.csv")
}
####################
 
 

Charles C. Berry

2008-Aug-06 18:10 UTC

[R] Index alternative to nasty FOR loop?

On Wed, 6 Aug 2008, zack holden wrote:
>
> Dear R wizards,
>
> I have a folder containing 1000 files. For each file, I need to extract the
> first row of each file, paste it to a new file, then write out that file. Then I
> need to repeat this operation for each additional row (row 2, then row 3, etc)
> for 23 rows in each file.
>
> I can do this with a for loop (as below).

This is surprising!

Can you give us an example where this actually works???
>
> Is there a way to use some of the indexing power of R to get around this
> nasty loop?
>
> Thank you in advance for any suggestions
>
> ###################
> newoutfile <- data.frame()
> list <- list.files("c:/data")
Bad practice to use 'list' as a variable name!

>
> file = 1
Above seems superfluous in view of the next line
> for(file in list) {
>   row <- file[1, ]
This doesn't really work, does it?

Where was dim(file) assigned?
>   newoutfile <- rbind(row, newoutfile)
Since 'newoutfile' was not initialized the above should have thrown an
error.
>   file = file + 1
Ought to have thrown an error like

 	"non-numeric argument to binary operator"

when trying to execute the line above.
> write.csv(outfile, file = "output.csv")
> }
> ####################
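
For anyone following along, the two failures flagged above can be reproduced in a short sketch. The vector 'files' is a hypothetical stand-in for the result of list.files(), which returns file names as character strings, not file contents:

```r
# Hypothetical stand-in for list.files("c:/data"): names only, no contents
files <- c("a.csv", "b.csv")

# The loop variable takes each file NAME in turn, so it is a single string
f <- files[1]
is.character(f)

# Matrix-style indexing on a plain string raises an error
res1 <- tryCatch(f[1, ], error = function(e) e)

# Arithmetic on a string raises an error as well
res2 <- tryCatch(f + 1, error = function(e) e)

# Both results are error condition objects rather than values
inherits(res1, "error")
inherits(res2, "error")
conditionMessage(res2)
```

The files have to be read first (with read.table, readLines or similar) before any row can be extracted.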

You were asked to:

 	PLEASE do read the posting guide
 	http://www.R-project.org/posting-guide.html
 	and provide commented, minimal, self-contained, reproducible code.


Chuck

p.s. As you describe the problem, something like this should do

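res <- sapply( list.files("c:/data", full.names=TRUE), readLines )
## 'res' is a character matrix: one row per line, one column per file
## (this assumes every file has the same number of lines)

for ( i in seq_len(nrow(res)) ){
 	## write line i of every file to row.i.csv
 	write.csv( res[i,],
 		file=paste("row",i,"csv",sep='.'))
}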
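
To see what the sapply() call in the p.s. returns, here is a minimal sketch on a throwaway folder. The folder, file names and file contents are made up for illustration. Note that sapply() only simplifies to a matrix when every file has the same number of lines; otherwise it returns a list and row indexing will not work.

```r
# Hypothetical demo folder with three small files of two lines each
demo_dir <- tempfile("berry_demo_")
dir.create(demo_dir)
for (k in 1:3) {
  writeLines(paste0("file", k, "_line", 1:2),
             file.path(demo_dir, paste0("f", k, ".txt")))
}

# One column per file, one row per line
res <- sapply(list.files(demo_dir, full.names = TRUE), readLines)
dim(res)

# Row 1 holds the first line of every file
first_lines <- unname(res[1, ])
first_lines
```

Each row of 'res' is then exactly the set of lines that belongs in one output file.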

>
>
>
> 	[[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
Charles C. Berry                            (858) 534-2098
                                             Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu	            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

Dan Davison

2008-Aug-07 10:21 UTC

[R] Index alternative to nasty FOR loop?

On Wed, Aug 06, 2008 at 05:42:21PM +0000, zack holden wrote:
>
> Dear R wizards,
>
> I have a folder containing 1000 files. For each file, I need to extract the
> first row of each file, paste it to a new file, then write out that file. Then I
> need to repeat this operation for each additional row (row 2, then row 3, etc)
> for 23 rows in each file.
>  
> I can do this with a for loop (as below). 
Hi Zack,

There are a few problems with your sketched-out for loop (see below),
but if I've understood your problem, then here are a couple of
solutions that use for loops in the way you were intending. They both
take line i from file 1, line i from file 2, ..., and write them to a
file called lines_i, for i in 1:23. The first one is for the case when
you have tabular data, so it uses read.table, and write.table. You
might want to mess about with the arguments to read.table and
write.table, specifying whether you have a header, and whether you
want the row.names printed out, etc. The second one is similar but
just works line by line, regardless of what the line looks like
(i.e. doesn't assume you have tabular data in the files).

collate.lines.1 <- function(folder, nrows=23) {
    files <- list.files(folder, full.names=TRUE)
    for(file in files) {
        file.as.data.frame <- read.table(file)
        for(row in 1:nrows) {
            ## row i of each input file is appended to lines_i.csv;
            ## because of append=TRUE, delete old output before re-running
            outfile <- paste("lines_", row, ".csv", sep="")
            write.table(file.as.data.frame[row,], file=outfile, append=TRUE,
                        row.names=FALSE, col.names=FALSE, sep=",")
        }
    }
}
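
As an illustration of the read.table arguments mentioned above, this small sketch shows how 'header' and 'sep' change what gets read. The file and its contents are invented for the example:

```r
# Hypothetical comma-separated file with a header line
csv_file <- tempfile(fileext = ".csv")
writeLines(c("id,value", "1,3.5", "2,4.25"), csv_file)

# header=TRUE treats the first line as column names
with_header <- read.table(csv_file, header = TRUE, sep = ",")
names(with_header)
nrow(with_header)

# header=FALSE treats the first line as data, so there is one extra row
no_header <- read.table(csv_file, header = FALSE, sep = ",")
nrow(no_header)
```

If the input files do have a header, row 1 of the data frame is the first data row, not the header line, so it is worth checking which one is wanted.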

collate.lines.2 <- function(folder, nrows=23) {
    files <- list.files(folder, full.names=TRUE)
    for(file in files) {
        ## one element per line (scan skips blank lines by default)
        file.as.character.vector <- scan(file, what="", sep="\n")
        for(row in 1:nrows) {
            outfile <- paste("lines", row, sep="_")
            ## sep="" avoids a trailing space before the newline
            cat(file.as.character.vector[row], "\n", sep="", file=outfile,
                append=TRUE)
        }
    }
}
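
A quick way to try the line-by-line approach without touching real data is sketched below. It is a trimmed variant, not the function above: it uses readLines in place of scan, and the 'outdir' argument and all file names are assumptions added so that the output lands in a temporary folder.

```r
# Trimmed variant of the line-by-line idea; 'outdir' is a hypothetical addition
collate_lines_demo <- function(folder, outdir, nrows) {
  files <- list.files(folder, full.names = TRUE)
  for (f in files) {
    lines <- readLines(f)
    for (i in seq_len(nrows)) {
      outfile <- file.path(outdir, paste("lines", i, sep = "_"))
      # append line i of this file to the i-th output file
      cat(lines[i], "\n", sep = "", file = outfile, append = TRUE)
    }
  }
}

# Made-up input: three files with two lines each
in_dir <- tempfile("in_");  dir.create(in_dir)
out_dir <- tempfile("out_"); dir.create(out_dir)
for (k in 1:3) {
  writeLines(paste0("file", k, "_row", 1:2),
             file.path(in_dir, paste0("f", k, ".txt")))
}

collate_lines_demo(in_dir, out_dir, nrows = 2)
second_rows <- readLines(file.path(out_dir, "lines_2"))
second_rows   # row 2 of f1, f2 and f3, in that order
```

Each output file ends up with one line per input file, in the order returned by list.files.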
>  
> Is there a way to use some of the indexing power of R to get around this
> nasty loop?
If you really mean that you want a solution without explicit for loops
in R, then that is possible. But I would recommend that you stick to
a straightforward solution until you're completely comfortable with
programming in that style. It's conceivable that the no-for-loop
versions might be faster if you have lots of files / rows, but don't
worry about speed until it's a problem. Here's my effort at doing it
without for loops; it's a bit of a stretch and wasn't as easy to write
down as the first two. I've probably missed a cleaner solution.

collate.lines.1.fancy <- function(folder, nrows=23) {
    ## assumes every file has exactly 'nrows' rows
    outfiles <- paste("lines_", 1:nrows, ".csv", sep="")
    files <- list.files(folder, full.names=TRUE)
    files.as.data.frames <- lapply(files, read.table)
    ## split all rows apart
    x <- lapply(files.as.data.frames, function(df) split(df, f=factor(1:nrow(df))))
    ## collate rows from different data frames
    x <- do.call(mapply, c(x, list(FUN=function(...) rbind(...), SIMPLIFY=FALSE)))
    write.function <- function(dataframe, outfile)
        write.table(dataframe, file=outfile, row.names=FALSE, col.names=FALSE, sep=",")
    invisible(mapply(write.function, x, outfiles))
}
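
One possibly simpler way to avoid explicit for loops, offered only as a sketch and not as the approach above: read every file once, then for each row index pull that row out of every data frame and bind the pieces together. The function name, the 'outdir' argument and the test data are all hypothetical.

```r
# Sketch: lapply over row indices instead of splitting every data frame
collate_rows_sketch <- function(folder, outdir, nrows) {
  files <- list.files(folder, full.names = TRUE)
  dfs <- lapply(files, read.table, nrows = nrows)  # read at most 'nrows' rows
  invisible(lapply(seq_len(nrows), function(i) {
    # row i from every data frame, stacked in file order
    row_i <- do.call(rbind, lapply(dfs, function(d) d[i, , drop = FALSE]))
    write.table(row_i, file = file.path(outdir, paste0("lines_", i, ".csv")),
                row.names = FALSE, col.names = FALSE, sep = ",")
  }))
}

# Made-up input: three whitespace-separated tables, two rows by two columns
in_dir <- tempfile("in_");  dir.create(in_dir)
out_dir <- tempfile("out_"); dir.create(out_dir)
for (k in 1:3) {
  write.table(data.frame(a = k * 10 + 1:2, b = k * 100 + 1:2),
              file.path(in_dir, paste0("f", k, ".txt")),
              row.names = FALSE, col.names = FALSE)
}

collate_rows_sketch(in_dir, out_dir, nrows = 2)
first_rows <- readLines(file.path(out_dir, "lines_1.csv"))
first_rows   # one comma-separated line per input file
```

This still holds all the files in memory at once, so it trades the explicit loops for memory in the same way the lapply/mapply version does.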
>  
> Thank you in advance for any suggestions
>  
> ###################
> newoutfile <- data.frame()
> list <- list.files("c:/data") ## 'list' is not such a good name as it's a built-in function
>  
> file = 1 ## you don't need this
> for(file in list) {
>    row <- file[1, ] ## that's not going to work; 'list' is a character vector, you haven't got the files as data.frames yet
>    newoutfile <- rbind(row, newoutfile)
>    file = file + 1
> write.csv(outfile, file = "output.csv")
> }
> ####################
>  
>  
