thr3ads.net - R help - [R] for loop over dataframe without indices [Dec 2003]

Gabor Grothendieck

2003-Dec-19 00:20 UTC

[R] for loop over dataframe without indices

One can perform a for loop without indices over the columns
of a dataframe like this:

   for( v in df ) ... some statements involving v ...

Is there some way to do this for rows other than using indices:

   for( i in 1:nrow(df) ) ... some statements involving df[i,] ...

If the dataframe had only numeric entries I could transpose it
and then do it over columns but what about the general case?

Gabor Grothendieck

2003-Dec-19 04:48 UTC

[R] for loop over dataframe without indices

Based on an off-list email conversation I had, I am concerned that
my original email was not sufficiently clear.

Recall that I wanted to use a for loop to iterate over the rows of 
a dataframe without using indices.  It's easy to do this over
the columns (for(v in df) ...) but not for rows.

What I wanted to do might be something like this.
Define a function, rows, which takes a dataframe, df, as input 
and converts it to the structure: 
list(df[1,], df[2,], ..., df[n,]) where there are n rows:

     rows <- function( df ) { 
          ll <- NULL
          for( i in 1:nrow(df) ) 
               ll <- append( ll, list(df[i,]) )
          ll 
     }

This allows us to iterate over the rows of df without indices like this:

     data( iris )
     df <- iris[1:3,] # use 1st 3 rows of iris data set as df
     for( v in rows(df) ) print(v)

Of course, this involves iterating over the rows of df twice --
once within rows() and once in the for loop. Perhaps this is
the price one must pay for being able to eliminate index
computations from a for loop, or is it?  Have I answered my
own question, or is there a better way to use a for loop
over the rows of a dataframe without indices?
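
The same list of one-row data frames can probably be built without an
explicit index loop by using split(). This is only a minimal sketch: it
assumes a more recent R than the one used in this thread (seq_len() is a
later addition), and rows2 is a hypothetical name, not a base R function.

```r
# Hypothetical helper: split a data frame into a list of one-row data frames.
# Column types are kept because each piece is still a data frame.
rows2 <- function(df) {
  # Integer groups 1..n sort numerically, so the original row order is kept;
  # seq_len() also copes with a data frame that has zero rows
  unname(split(df, seq_len(nrow(df))))
}

df <- iris[1:3, ]
for (v in rows2(df)) print(v)
```

The rows are still visited twice, once inside split() and once in the
for loop, so this changes the notation rather than the cost.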

Peter Wolf

2003-Dec-19 08:22 UTC

[R] for loop over dataframe without indices

Try:

 > data(iris); df<-as.data.frame(t(iris[1:3,]))
 > for(i in df) print(i)
[1] 5.1    3.5    1.4    0.2    setosa
Levels: 0.2 1.4 3.5 5.1 setosa
[1] 4.9    3.0    1.4    0.2    setosa
Levels: 0.2 1.4 3.0 4.9 setosa
[1] 4.7    3.2    1.3    0.2    setosa
Levels: 0.2 1.3 3.2 4.7 setosa

... however, not very nice

Peter Wolf
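
The output above looks odd because t() first converts the data frame to
a matrix, and a matrix holds a single type, so the factor column forces
every value to character. A small check of that, using only base R:

```r
df <- iris[1:3, ]
m <- t(df)               # t() on a data frame goes through as.matrix()

is.character(m)          # TRUE: the numbers are now strings
m["Sepal.Length", 1]     # the text "5.1", not the number 5.1

# Transposing is lossless only when all columns share one type
num_only <- t(df[, 1:4])
is.numeric(num_only)     # TRUE
```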

Gabor Grothendieck

2003-Dec-19 16:48 UTC

[R] for loop over dataframe without indices

Regarding my problem of how to use a for loop over
the rows of a dataframe without using indices,
several people mentioned using transpose and then
iterating over the columns (which were the rows)
and one person suggested apply(df,1,list);
however, both these solutions coerce the data to
different types.

What I now realize is that the thing that is oddly
missing in R is that you can't do an apply over
the rows of a dataframe (at least not without having
it coerced to an array and the elements coerced to
possibly different types).  The documentation does
point this out.  It's not a bug, but it's an omission
that seems deserving of being addressed.
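
The coercion is easy to see by asking each row for its class. A short
illustration with a mixed-type data frame, using only base R:

```r
df <- iris[1:3, ]

# apply() converts the data frame with as.matrix() before looping,
# so every row arrives in the function as a character vector
row_classes <- apply(df, 1, class)
print(row_classes)

# Indexing a row directly keeps it as a data frame with typed columns
col_classes <- sapply(df[1, ], class)
print(col_classes)
```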

Thus I propose that apply be extended to handle
data frames directly.   Any comments on this 
before I send a message to r-devel?


In terms of my previous posting, with such an apply
one could do:

rows <- function(df) apply( df, 1, function(x)x )
for( v in rows(df) ) ... some statements involving v ...

There is still the limitation, of course, that one can
only _access_ rows of df like this.  One still needs
indices to change them.  

As an aside, should id <- function(x)x and rows, as defined
above, be predefined in R?  id certainly plays a special 
role in mathematics and it seems natural to want to iterate
over rows and not just columns of dataframes.

Gabor Grothendieck

2003-Dec-20 02:31 UTC

[R] for loop over dataframe without indices

Thomas, thanks for your response.  It is quite nifty.

Pursuing your solutions,
I think the objective should be to reproduce the output from 
t.data.frame defined as below (note that I posted a proposal
to change t.data.frame to r-devel before I received your reply):

t.data.frame <- function( df ) { 
          ll <- NULL
          for( i in 1:nrow(df) ) ll <- append( ll, list(df[i,]) )
          ll 
}

Using the first 3 rows from the iris data set as our data frame,
run the following which shows that your "by" solution works provided
we nullify out the attributes afterwards.  The do.call solution
does not appear to work as required, since it turns the data 
frame into a matrix.

data(iris)
df <- iris[1:3,]

# Consider:

id <- function(x)x

# t.data.frame solution
zt <- t(df)

# by solution is good but it adds some junk attributes 
zby <- by( df, row.names(df), id )
identical(zt,zby) # FALSE

# nullifying these attributes seems to do it
zby2 <- zby
attributes(zby2) <- NULL
identical(zt,zby2) # TRUE

# do.call doesn't work right since it appears to turn the result into a matrix
str( do.call("mapply", list(id,df) ) ) # note matrix output


Here is the result of pasting the above into R 1.8.1 on Windows 2000:
> data(iris)
> df <- iris[1:3,]
> 
> # Consider:
> 
> id <- function(x)x
> 
> # t.data.frame solution
> zt <- t(df)
> 
> # by solution is good but it adds some junk attributes 
> zby <- by( df, row.names(df), id )
> identical(zt,zby)
[1] FALSE
> 
> # nullifying these attributes seems to do it
> zby2 <- zby
> attributes(zby2) <- NULL
> identical(zt,zby2)
[1] TRUE
> 
> # do.call doesn't work right since it appears to turn the result into a matrix
> str( do.call("mapply", list(id,df) ) )
 num [1:3, 1:5] 5.1 4.9 4.7 3.5 3 3.2 1.4 1.4 1.3 0.2 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : NULL
> 

Based on your solution I think the proposal should be changed
to:

t.data.frame <- function(df) {
  z <- by( df, row.names(df), function(x)x )
  attributes(z) <- NULL
  z
}


---

Date: Fri, 19 Dec 2003 10:03:55 -0800 (PST) 
From: Thomas Lumley <tlumley at u.washington.edu>
To: Gabor Grothendieck <ggrothendieck at myway.com> 
Cc: <R-help at stat.math.ethz.ch> 
Subject: Re: [R] for loop over dataframe without indices 

 
 
On Fri, 19 Dec 2003, Gabor Grothendieck wrote:
>
> What I now realize is that the thing that is oddly
> missing in R is that you can't do an apply over
> the rows of a dataframe (at least not without having
> it coerced to an array and the elements coerced to
> possibly different types). The documentation does
> point this out. It's not a bug, but it's an omission
> that seems deserving of being addressed.
>
Since mapply() applies a function to each 'row' of a list of vectors, you
can achieve this effect with
     do.call("mapply", list(FUN,data.frame))
and also as a degenerate case of by():
     by(data.frame, row.names(data.frame), FUN)

These should probably be documented under apply()


     -thomas
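
One possible reading of the mapply() suggestion is sketched below. It
assumes that each column is handed to mapply() as a separate argument
and that simplification is switched off; the helper name row_lists and
the SIMPLIFY = FALSE and USE.NAMES = FALSE settings are assumptions of
this sketch, not part of the message above.

```r
# Hypothetical helper: turn each row into a named list of its cell values.
row_lists <- function(df) {
  # c(list(...), df) splices the columns in as separate arguments to mapply(),
  # so FUN receives one value per column for each row.
  # Caveat: a column named like an mapply() argument (e.g. "FUN") would clash.
  args <- c(list(FUN = function(...) list(...),
                 SIMPLIFY = FALSE, USE.NAMES = FALSE), df)
  do.call(mapply, args)
}

df <- data.frame(x = c(1.5, 2.5), label = c("a", "b"),
                 stringsAsFactors = FALSE)
for (v in row_lists(df)) str(v)
```

With SIMPLIFY left at its default, mapply() would try to collapse the
results into a vector or matrix, which brings the coercion back.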

Gabor Grothendieck

2003-Dec-20 16:02 UTC

[R] for loop over dataframe without indices

I think I've found a problem with the by approach.  Compare:

data(iris)
by( iris, row.names(iris), function(x)x )[1:5,]

to

iris[1:5,]

It seems by has reordered the rows.
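
A likely explanation, offered here as an assumption rather than
something stated in the thread: by() groups on the row names as a
factor, and factor levels built from character strings are sorted as
text, so "10" comes before "2". The sketch below shows that ordering,
together with grouping on integer row positions (which sort
numerically) as one possible workaround.

```r
rn <- row.names(iris)[1:12]       # "1", "2", ..., "12" as character strings

# Character row names become factor levels in text order, not numeric order
first_levels <- head(levels(factor(rn)), 4)
print(first_levels)

# Grouping on integer positions instead keeps the rows in their original order
pieces <- split(iris, seq_len(nrow(iris)))
print(pieces[[10]])               # the tenth piece is row 10 of iris
```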

 
______________________________________________
R-help at stat.math.ethz.ch mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
