thr3ads.net - R help - [R] How do I combine lists of data.frames into a single data frame? [Jul 2010]

Ted Byers

2010-Jul-15 19:18 UTC

[R] How do I combine lists of data.frames into a single data frame?

The data.frame is constructed by one of the following functions:

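library(fitdistrplus)  # provides fitdist()

# Fit an exponential distribution to one week's elapsed times.
# Returns NULL (implicitly) when the group has 5 or fewer observations.
funweek <- function(df) {
  if (length(df$elapsed_time) > 5) {
    rv <- fitdist(df$elapsed_time, "exp")
    rv$year <- df$sale_year[1]
    rv$sample <- df$sale_week[1]
    rv$granularity <- "week"
    rv
  }
}

# Same as funweek, but for one month's elapsed times.
funmonth <- function(df) {
  if (length(df$elapsed_time) > 5) {
    rv <- fitdist(df$elapsed_time, "exp")
    rv$year <- df$sale_year[1]
    rv$sample <- df$sale_month[1]
    rv$granularity <- "month"
    rv
  }
}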

It is basically the data.frame created by fitdist extended to include the
variables used to distinguish one sample from another.
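
One caveat, offered tentatively: as far as I know, fitdist() in the fitdistrplus package returns an object of class "fitdist" (a list) rather than a data frame. If that holds for the version in use, one way to get a row per sample is to copy the pieces of interest into a one-row data.frame. The sketch below is illustrative only. The names fit_summary, sample_col and toy are hypothetical, and the rate is computed in base R as 1 / mean(x), the closed-form maximum-likelihood estimate for an exponential distribution, as a stand-in for the fitdist() call.

```r
# Hypothetical helper: summarise one group as a one-row data.frame.
# The rate is the closed-form exponential MLE (1 / mean), used here as a
# base-R stand-in for the fitdist() call in the post.
fit_summary <- function(df, sample_col, granularity) {
  if (nrow(df) <= 5) return(NULL)          # same size guard as the post
  data.frame(
    year        = df$sale_year[1],
    sample      = df[[sample_col]][1],
    granularity = granularity,
    n           = nrow(df),
    rate        = 1 / mean(df$elapsed_time),
    stringsAsFactors = FALSE
  )
}

# Made-up data for a single (year, week) group.
toy <- data.frame(
  sale_year    = 2010,
  sale_week    = 27,
  elapsed_time = c(1, 2, 3, 4, 5, 6)
)
row <- fit_summary(toy, "sale_week", "week")
```

Returning a plain one-row data.frame from every group means the pieces all share the same columns, which is what row-binding them later relies on.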

I have the following statement that gets me a set of IDs from my db:

ids <- dbGetQuery(con, "SELECT DISTINCT m_id FROM risk_input")

And then I have a loop that allows me to analyze one dataset after another:

for (i in seq_len(nrow(ids))) {
  print(i)
  print(ids[i,1])

Then, after a set of statements that give me information about the dataset
(such as its size), within a conditional block that ensures I apply the
analysis only on sufficiently large samples, I have the following:

z <- lapply(split(moreinfo, list(moreinfo$sale_year, moreinfo$sale_week),
                  drop = TRUE), funweek)

or

z <- lapply(split(moreinfo, list(moreinfo$sale_year, moreinfo$sale_month),
                  drop = TRUE), funmonth)
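
To make the shape of z concrete, here is a small sketch of what split() followed by lapply() produces. Everything in it is an assumption made for illustration: the toy moreinfo data are made up, and count_rows is a hypothetical stand-in for funweek that keeps the same size guard but returns a small data.frame.

```r
# Toy stand-in for 'moreinfo'; the real data come from the database.
moreinfo <- data.frame(
  sale_year    = c(rep(2009, 6), rep(2010, 8)),
  sale_week    = c(rep(51, 6), rep(1, 6), rep(2, 2)),
  elapsed_time = c(3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7)
)

# One piece per (year, week) combination that actually occurs;
# drop = TRUE discards the combinations with no rows.
pieces <- split(moreinfo,
                list(moreinfo$sale_year, moreinfo$sale_week),
                drop = TRUE)

# Hypothetical per-group function with the same size guard as funweek:
# it returns NULL when a group has 5 or fewer rows.
count_rows <- function(df) {
  if (nrow(df) > 5) {
    data.frame(year = df$sale_year[1], week = df$sale_week[1], n = nrow(df))
  }
}

z <- lapply(pieces, count_rows)
str(z)   # a named list; groups that were too small appear as NULL
```

The list is named by the year and week joined with a dot, and its length varies from one ID to the next depending on how many year/week combinations are present.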

followed by:

str(z)

Of course, I close the loop and disconnect from my db.

NB: I don't see any way to get rid of the loop by adding ID as a factor to
split because I have to query the DB for several key bits of data in order
to determine whether or not there is sufficient data to work on.

I have everything working except the final step of storing the results back
into the db.  Storing data in the db is easy enough, but I am at a loss as
to how to combine the lists placed in z in most of the iterations through
the ID loop into a single data.frame.

Now, I did take a look at rbind and cbind, but it isn't clear to me if
either is appropriate.  All the data frames have the same structure, but the
lists are of variable length, and I am not certain how either might be used
inside the IDs loop.

So, what is the best way to combine all lists assigned to z into a single
data.frame?

Thanks

Ted

Marc Schwartz

2010-Jul-15 19:27 UTC

On Jul 15, 2010, at 2:18 PM, Ted Byers wrote:
> So, what is the best way to combine all lists assigned to z into a single
> data.frame?
> 
> Thanks
> 
> Ted

Ted,

If each of the data frames in the list 'z' has the same column
structure, you can use:

  do.call(rbind, z)

The result will be a single data frame containing all of the rows from
each of the data frames in the list.
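
As a minimal sketch of how this might look in practice, assuming each non-NULL element of 'z' is a data frame with identical columns: the toy values, the ids vector and the m_id column added below are made up for illustration. Entries that are NULL (groups skipped by the size check) are dropped explicitly first, and results from several IDs are collected in a list and bound once at the end.

```r
# Toy list shaped like 'z': same columns, plus one NULL entry for a
# group that was too small to analyse.
z <- list(
  "2009.51" = data.frame(year = 2009, sample = 51, rate = 0.26),
  "2010.1"  = data.frame(year = 2010, sample = 1,  rate = 0.21),
  "2010.2"  = NULL
)

z <- Filter(Negate(is.null), z)       # drop the skipped groups explicitly
combined <- do.call(rbind, z)         # one row per remaining group

# Across the ID loop: keep one data frame per ID in a list, then bind once.
ids <- c(101, 102)                    # hypothetical IDs
per_id <- vector("list", length(ids))
for (i in seq_along(ids)) {
  one <- do.call(rbind, z)            # stands in for that ID's own 'z'
  one$m_id <- ids[i]                  # tag rows with the ID they belong to
  per_id[[i]] <- one
}
all_results <- do.call(rbind, per_id)
rownames(all_results) <- NULL         # discard the group-derived row names
```

Collecting the pieces in a list and calling rbind once avoids growing a data frame row by row inside the loop, which gets slow as the number of IDs increases.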

HTH,

Marc Schwartz
