thr3ads.net - R help - [R] proper way to process dataframe by rows [Nov 2004]

Jack Tanner

2004-Nov-29 02:25 UTC

[R] proper way to process dataframe by rows

This is a best practices / style question.

The way I use RODBC is something like this:

 > foo <- sqlQuery(db, "select * from foo")
 > apply(foo, 1, function(row) {...})

That is, I use apply to iterate over each result -- row -- in the 
RODBC-produced dataframe. Is this how one generally wants to do this?

My concern is that when apply iterates over the rows, it uses 
as.matrix() to convert the dataframe to a character representation of 
itself. Thus my database's carefully planned data types (that RODBC 
carefully preserved when returning query results) get completely lost as 
I process the data. I've taken to judiciously sprinkling as.numeric() 
and friends here and there, but this is just begging for bugs.
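
The coercion described above can be reproduced without a database. This is only a minimal sketch: the data frame `foo` and its columns are made up to stand in for an RODBC query result, and the character conversion shown happens because at least one column is non-numeric.

```r
# Hypothetical data frame standing in for an RODBC query result
foo <- data.frame(id = 1:3,
                  price = c(9.5, 12.25, 3),
                  name = c("a", "b", "c"),
                  stringsAsFactors = FALSE)

# apply() converts the data frame with as.matrix() first; because one
# column is character, every value in each row arrives as character
row_classes <- apply(foo, 1, function(row) class(row))
print(row_classes)

# The column types are still intact in the data frame itself
col_classes <- sapply(foo, class)
print(col_classes)
```

If every column were numeric, as.matrix() would give a numeric matrix instead, so the problem only shows up with mixed column types.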

In other words, what is the smart way to process a dataframe by rows? Or 
is there, by chance, a specific technique or practice that is available 
for RODBC results but not for dataframes in general?

Thank you for your thoughts.

Gabor Grothendieck

2004-Nov-29 02:44 UTC

Jack Tanner <ihok <at> hotmail.com> writes:

: 
: This is a best practices / style question.
: 
: The way I use RODBC is something like this:
: 
:  > foo <- sqlQuery(db, "select * from foo")
:  > apply(foo, 1, function(row) {...})
: 
: That is, I use apply to iterate over each result -- row -- in the 
: RODBC-produced dataframe. Is this how one generally wants to do this?
: 
: My concern is that when apply iterates over the rows, it uses 
: as.matrix() to convert the dataframe to a character representation of 
: itself. Thus my database's carefully planned data types (that RODBC 
: carefully preserved when returning query results) get completely lost as 
: I process the data. I've taken to judiciously sprinkling as.numeric() 
: and friends here and there, but this is just begging for bugs.
: 
: In other words, what is the smart way to process a dataframe by rows? Or 
: is there, by chance, a specific technique or practice that is available 
: for RODBC results but not for dataframes in general?
: 

I don't know about the best way, but here is one way that does not 
convert to character:

R> data(iris)
R> irish <- head(iris)
R> irish
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
R> f <- function(i) with(irish[i,], Sepal.Length + Sepal.Width)
R> sapply(1:nrow(irish), f)
[1] 8.6 7.9 7.9 7.7 8.6 9.3
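
The reason this keeps the types is that each `irish[i, ]` is a one-row data frame rather than a row of a matrix. As a rough sketch of that point (the function name `g` and the pasted label are only illustrative), a row function can mix a factor column with numeric columns without any as.numeric() calls:

```r
irish <- head(iris)

# A single row is still a data frame, so each column keeps its own class
row_types <- sapply(irish[1, ], class)
print(row_types)

# Hypothetical row function mixing a factor column and numeric columns
g <- function(i) with(irish[i, ], paste(Species, Sepal.Length + Sepal.Width))
labels <- sapply(seq_len(nrow(irish)), g)
print(labels)
```

Here seq_len(nrow(irish)) is used in place of 1:nrow(irish) so that a data frame with zero rows yields an empty result rather than iterating over 1 and 0.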

Duncan Murdoch

2004-Nov-29 03:38 UTC

On Sun, 28 Nov 2004 21:25:24 -0500, Jack Tanner <ihok at hotmail.com>
wrote:
>This is a best practices / style question.
>
>The way I use RODBC is something like this:
>
> > foo <- sqlQuery(db, "select * from foo")
> > apply(foo, 1, function(row) {...})
>
>That is, I use apply to iterate over each result -- row -- in the 
>RODBC-produced dataframe. Is this how one generally wants to do this?
>
>My concern is that when apply iterates over the rows, it uses 
>as.matrix() to convert the dataframe to a character representation of 
>itself. Thus my database's carefully planned data types (that RODBC 
>carefully preserved when returning query results) get completely lost as 
>I process the data. I've taken to judiciously sprinkling as.numeric() 
>and friends here and there, but this is just begging for bugs.
>
>In other words, what is the smart way to process a dataframe by rows? Or 
>is there, by chance, a specific technique or practice that is available 
>for RODBC results but not for dataframes in general?

I would just use a for() loop if I didn't care about the speed too
much.  If I did, I'd avoid dealing with rows of dataframes:  access
using dataframe indexing is slow.  Depending what your function is,
you're probably better off extracting the columns of the dataframe as
vectors, and working with those.
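
Both suggestions might look roughly like the following. This is a sketch on the built-in iris data used earlier in the thread, not on an actual RODBC result, and the variable names `loop_sums` and `vec_sums` are made up for illustration.

```r
irish <- head(iris)

# for() loop over row indices, with the result vector preallocated
loop_sums <- numeric(nrow(irish))
for (i in seq_len(nrow(irish))) {
  loop_sums[i] <- irish$Sepal.Length[i] + irish$Sepal.Width[i]
}

# Column-wise version: extract the columns as vectors and add them directly
vec_sums <- irish$Sepal.Length + irish$Sepal.Width

print(vec_sums)
print(all.equal(loop_sums, vec_sums))
```

Neither version goes through as.matrix(), so the column types are left alone in both.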

Duncan Murdoch
