One can perform a for loop without indices over the columns of a dataframe like this:

  for( v in df ) ... some statements involving v ...

Is there some way to do this for rows other than using indices:

  for( i in 1:nrow(df) ) ... some statements involving df[i,] ...

If the dataframe had only numeric entries I could transpose it and then do it over columns but what about the general case?
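For readers who want to run the two loop forms from the question, here is a minimal sketch. It assumes the first three rows of the built-in iris data as a stand-in for df (my choice, not the poster's), and it uses seq_len(nrow(df)) rather than 1:nrow(df), because 1:0 counts down and would run the loop body twice on an empty data frame.

```r
# Assumption: the first three rows of the built-in iris data stand in for df.
df <- iris[1:3, ]

# A data frame is a list of columns, so for() visits one column at a time.
for (v in df) print(class(v))

# Row-wise version with indices: each df[i, ] is a one-row data frame
# whose columns keep their original types.
for (i in seq_len(nrow(df))) print(df[i, ])

# seq_len() is used because 1:nrow(df) would give c(1, 0) for zero rows.
empty <- df[0, ]
length(seq_len(nrow(empty)))  # 0 iterations
length(1:nrow(empty))         # 2 iterations
```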
Based on an off-list email conversation I had, I am concerned that my original email was not sufficiently clear.

Recall that I wanted to use a for loop to iterate over the rows of a dataframe without using indices. It's easy to do this over the columns (for(v in df) ...) but not for rows.

What I wanted to do might be something like this. Define a function, rows, which takes a dataframe, df, as input and converts it to the structure:

  list(df[1,], df[2,], ..., df[n,])

where there are n rows:

  rows <- function( df ) {
    ll <- NULL
    for( i in 1:nrow(df) )
      ll <- append( ll, list(df[i,]) )
    ll
  }

This allows us to iterate over the rows of df without indices like this:

  data( iris )
  df <- iris[1:3,] # use 1st 3 rows of iris data set as df
  for( v in rows(df) ) print(v)

Of course, this involves iterating over the rows of df twice -- once within rows() and once in the for loop. Perhaps this is the price one must pay for being able to eliminate index computations from a for loop, or is it? Have I answered my own question, or is there a better way to use a for loop over the rows of a dataframe without indices?

---
Date: Thu, 18 Dec 2003 19:20:04 -0500
From: Gabor Grothendieck <ggrothendieck at myway.com>
To: <R-help at stat.math.ethz.ch>
Subject: for loop over dataframe without indices
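As a possible variation on the rows() helper above, the sketch below builds the list with lapply() instead of growing it with append(). This is not from the thread; it is an assumed alternative that returns the same kind of structure (a plain list of one-row data frames). The rows are still visited once to build the list and once in the for loop, so it does not remove the double pass the post mentions; it only avoids reallocating the list on every step.

```r
# Hypothetical alternative to the rows() helper from the post:
# lapply() allocates the result list once and fills it in a single pass.
rows <- function(df) {
  lapply(seq_len(nrow(df)), function(i) df[i, , drop = FALSE])
}

df <- iris[1:3, ]  # same three rows as in the post

# Each v is a one-row data frame, so column types are preserved.
for (v in rows(df)) print(v)
```

Using seq_len() here also means a data frame with zero rows gives an empty list rather than two spurious entries.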
Try:

  > data(iris); df<-as.data.frame(t(iris[1:3,]))
  > for(i in df) print(i)
  [1] 5.1 3.5 1.4 0.2 setosa
  Levels: 0.2 1.4 3.5 5.1 setosa
  [1] 4.9 3.0 1.4 0.2 setosa
  Levels: 0.2 1.4 3.0 4.9 setosa
  [1] 4.7 3.2 1.3 0.2 setosa
  Levels: 0.2 1.3 3.2 4.7 setosa

... however, this is not very nice.

Peter Wolf
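The "Levels:" lines in the output above show that the numbers did not survive the transpose as numbers. A small sketch of why, assuming a current R version: t() on a data frame goes through as.matrix(), and a matrix can hold only one type, so the factor column forces everything to character. The variable names m and tdf are mine.

```r
df <- iris[1:3, ]

# t() goes through as.matrix(); with a factor column present the whole
# matrix becomes character.
m <- t(df)
print(typeof(m))      # "character"

# Converting back gives one column per original row, but the numbers
# are now text (factors in old R versions, character since R 4.0.0).
tdf <- as.data.frame(m)
print(sapply(tdf, class))

# Arithmetic on the recovered values needs an explicit conversion;
# as.character() first keeps this correct if the column is a factor.
as.numeric(as.character(tdf[["1"]][1:4]))
```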
Regarding my problem of how to use a for loop over the rows of a dataframe without using indices, several people mentioned using transpose and then iterating over the columns (which were the rows), and one person suggested apply(df,1,list); however, both of these solutions coerce the data to different types.

What I now realize is that the thing that is oddly missing in R is that you can't do an apply over the rows of a dataframe (at least not without having it coerced to an array and the elements coerced to possibly different types). The documentation does point this out. It's not a bug, but it's an omission that seems deserving of being addressed.

Thus I propose that apply be extended to handle data frames directly. Any comments on this before I send a message to r-devel?

In terms of my previous posting, with such an apply one could do:

  rows <- function(df) apply( df, 1, function(x)x )
  for( v in rows(df) ) ... some statements involving v ...

There is still the limitation, of course, that one can only _access_ rows of df like this. One still needs indices to change them.

As an aside, should id <- function(x)x and rows, as defined above, be predefined in R? id certainly plays a special role in mathematics, and it seems natural to want to iterate over rows and not just columns of dataframes.
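The rows() one-liner in this message is written against the proposed, extended apply. To make the coercion concrete, here is a sketch of what the existing apply() does with the same call, again assuming the first three iris rows as df. The counter n is only there to show how many items the for loop would visit.

```r
df <- iris[1:3, ]

# apply() converts the data frame with as.matrix() first, so every row
# reaches the function as a character vector.
print(apply(df, 1, class))

# With the identity function the equal-length results are bound into a
# matrix, one column per original row.
res <- apply(df, 1, function(x) x)
print(dim(res))     # 5 3

# for() over a matrix visits single cells, not rows.
n <- 0
for (v in res) n <- n + 1
print(n)            # 15
```

So with today's apply() the loop sees 15 character strings instead of 3 rows, which is the gap the proposal is aimed at.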
Thomas,

Thanks for your response. It is quite nifty.

Pursuing your solutions, I think the objective should be to reproduce the output from t.data.frame defined as below (note that I posted a proposal to change t.data.frame to r-devel before I received your reply):

  t.data.frame <- function( df ) {
    ll <- NULL
    for( i in 1:nrow(df) )
      ll <- append( ll, list(df[i,]) )
    ll
  }

Using the first 3 rows from the iris data set as our data frame, run the following, which shows that your "by" solution works provided we nullify the attributes afterwards. The do.call solution does not appear to work as required, since it turns the data frame into a matrix.

  data(iris)
  df <- iris[1:3,]

  # Consider:

  id <- function(x)x

  # t.data.frame solution
  zt <- t(df)

  # by solution is good but it adds some junk attributes
  zby <- by( df, row.names(df), id )
  identical(zt,zby) # FALSE

  # nullifying these attributes seems to do it
  zby2 <- zby
  attributes(zby2) <- NULL
  identical(zt,zby2) # TRUE

  # do.call doesn't work right since it appears to turn the result into a matrix
  str( do.call("mapply", list(id,df) ) ) # note matrix output

Here is the result of pasting the above into R 1.8.1 on Windows 2000:

  > data(iris)
  > df <- iris[1:3,]
  >
  > # Consider:
  >
  > id <- function(x)x
  >
  > # t.data.frame solution
  > zt <- t(df)
  >
  > # by solution is good but it adds some junk attributes
  > zby <- by( df, row.names(df), id )
  > identical(zt,zby)
  [1] FALSE
  >
  > # nullifying these attributes seems to do it
  > zby2 <- zby
  > attributes(zby2) <- NULL
  > identical(zt,zby2)
  [1] TRUE
  >
  > # do.call doesn't work right since it appears to turn the result into a matrix
  > str( do.call("mapply", list(id,df) ) )
   num [1:3, 1:5] 5.1 4.9 4.7 3.5 3 3.2 1.4 1.4 1.3 0.2 ...
   - attr(*, "dimnames")=List of 2
    ..$ : NULL
    ..$ : NULL
  >

Based on your solution I think the proposal should be changed to:

  t.data.frame <- function(df) {
    z <- by( df, row.names(df), function(x)x )
    attributes(z) <- NULL
    z
  }

---
Date: Fri, 19 Dec 2003 10:03:55 -0800 (PST)
From: Thomas Lumley <tlumley at u.washington.edu>
To: Gabor Grothendieck <ggrothendieck at myway.com>
Cc: <R-help at stat.math.ethz.ch>
Subject: Re: [R] for loop over dataframe without indices

On Fri, 19 Dec 2003, Gabor Grothendieck wrote:

> What I now realize is that the thing that is oddly
> missing in R is that you can't do an apply over
> the rows of a dataframe (at least not without having
> it coerced to an array and the elements coerced to
> possibly different types). The documentation does
> point this out. Its not a bug but its an omission
> that seems deserving of being addressed.

Since mapply() applies a function to each 'row' of a list of vectors, you can achieve this effect with

  do.call("mapply", list(FUN,data.frame))

and also as a degenerate case of by():

  by(data.frame, row.names(data.frame), FUN)

These should probably be documented under apply()

  -thomas
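One possible reading of the mapply() suggestion, offered as an assumption and not confirmed anywhere in the thread, is that the columns of the data frame are meant to be passed as separate arguments. The call do.call("mapply", list(FUN, df)) hands df over as a single list argument, so FUN runs once per column and the results are simplified to a matrix, which matches the transcript above. The sketch below splices the columns in with c() and turns simplification off; row_list is a name I made up.

```r
df <- iris[1:3, ]

# mapply(FUN, df) treats df as ONE list argument, so FUN is called once
# per column; that is why the transcript shows a matrix.
str(do.call("mapply", list(function(x) x, df)))

# Splicing the columns in as separate arguments makes FUN see one value
# from every column per call, i.e. one row. SIMPLIFY = FALSE keeps the
# results as a list instead of collapsing them.
row_list <- do.call(
  "mapply",
  c(list(FUN = function(...) list(...), SIMPLIFY = FALSE), df)
)

# Each element is a named list holding one row; numeric columns stay
# numeric and the factor column is not merged with them.
str(row_list[[1]])
```

Note that each element here is a plain named list, not a one-row data frame, so it is close to but not identical with the list(df[1,], df[2,], ...) structure discussed earlier. A column whose name matches one of mapply()'s own arguments would also need special handling.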
I think I've found a problem with the by approach. Compare:

  data(iris)
  by( iris, row.names(iris), function(x)x )[1:5,]

to

  iris[1:5,]

It seems by has reordered the rows.

---
Date: Fri, 19 Dec 2003 21:31:50 -0500 (EST)
From: Gabor Grothendieck <ggrothendieck at myway.com>
To: <tlumley at u.washington.edu>
Cc: <R-help at stat.math.ethz.ch>
Subject: Re: [R] for loop over dataframe without indices

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
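A likely explanation for the reordering, sketched below under the assumption of a current R version: by() converts the character row names to a factor, and factor levels are sorted as strings, so "10" sorts ahead of "2". The workaround shown (building the factor with explicit levels in the original order) is my own suggestion and was not proposed in the thread; the names g, z_fixed and rows_list are illustrative.

```r
df <- iris[1:12, ]  # row names "1" ... "12"

# by() turns the character row names into a factor; the levels are
# sorted as strings, so "10" comes before "2".
z <- by(df, row.names(df), function(x) x)
print(dimnames(z)[[1]])

# Assumed workaround: give the factor its levels in the original order.
g <- factor(row.names(df), levels = row.names(df))
z_fixed <- by(df, g, function(x) x)
print(dimnames(z_fixed)[[1]])

# Drop by()'s extra attributes, as in the post, to leave a plain list.
rows_list <- z_fixed
attributes(rows_list) <- NULL
print(rows_list[[2]])  # the original second row
```

With only three rows, as in the earlier test, the string order and the numeric order happen to agree, which is presumably why the problem did not show up there.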