Ted Byers
2010-Jul-15 19:18 UTC
[R] How do I combine lists of data.frames into a single data frame?
The data.frame is constructed by one of the following functions:

funweek <- function(df)
  if (length(df$elapsed_time) > 5) {
    rv = fitdist(df$elapsed_time,"exp")
    rv$year = df$sale_year[1]
    rv$sample = df$sale_week[1]
    rv$granularity = "week"
    rv
  }
funmonth <- function(df)
  if (length(df$elapsed_time) > 5) {
    rv = fitdist(df$elapsed_time,"exp")
    rv$year = df$sale_year[1]
    rv$sample = df$sale_month[1]
    rv$granularity = "month"
    rv
  }

It is basically the data.frame created by fitdist, extended to include the variables used to distinguish one sample from another.

I have the following statement that gets me a set of IDs from my db:

ids <- dbGetQuery(con, "SELECT DISTINCT m_id FROM risk_input")

And then I have a loop that allows me to analyze one dataset after another:

for (i in 1:length(ids[,1])) {
  print(i)
  print(ids[i,1])

Then, after a set of statements that give me information about the dataset (such as its size), within a conditional block that ensures I apply the analysis only to sufficiently large samples, I have the following:

z <- lapply(split(moreinfo,list(moreinfo$sale_year,moreinfo$sale_week),drop = TRUE), funweek)

or

z <- lapply(split(moreinfo,list(moreinfo$sale_year,moreinfo$sale_month),drop = TRUE), funmonth)

followed by:

str(z)

Of course, I close the loop and disconnect from my db.

NB: I don't see any way to get rid of the loop by adding ID as a factor to split, because I have to query the DB for several key bits of data in order to determine whether or not there is sufficient data to work on.

I have everything working except the final step of storing the results back into the db. Storing data in the DB is easy enough. But I am at a loss as to how to combine the lists placed in z in most of the iterations through the ID loop into a single data.frame.

Now, I did take a look at rbind and cbind, but it isn't clear to me whether either is appropriate. All the data frames have the same structure, but the lists are of variable length, and I am not certain how either might be used inside the IDs loop.

So, what is the best way to combine all lists assigned to z into a single data.frame?

Thanks

Ted
Marc Schwartz
2010-Jul-15 19:27 UTC
[R] How do I combine lists of data.frames into a single data frame?
On Jul 15, 2010, at 2:18 PM, Ted Byers wrote:

> The data.frame is constructed by one of the following functions:
>
> funweek <- function(df)
>   if (length(df$elapsed_time) > 5) {
>     rv = fitdist(df$elapsed_time,"exp")
>     rv$year = df$sale_year[1]
>     rv$sample = df$sale_week[1]
>     rv$granularity = "week"
>     rv
>   }
> funmonth <- function(df)
>   if (length(df$elapsed_time) > 5) {
>     rv = fitdist(df$elapsed_time,"exp")
>     rv$year = df$sale_year[1]
>     rv$sample = df$sale_month[1]
>     rv$granularity = "month"
>     rv
>   }
>
> It is basically the data.frame created by fitdist extended to include the
> variables used to distinguish one sample from another.
>
> I have the following statement that gets me a set of IDs from my db:
>
> ids <- dbGetQuery(con, "SELECT DISTINCT m_id FROM risk_input")
>
> And then I have a loop that allows me to analyze one dataset after another:
>
> for (i in 1:length(ids[,1])) {
>   print(i)
>   print(ids[i,1])
>
> Then, after a set of statements that give me information about the dataset
> (such as its size), within a conditional block that ensures I apply the
> analysis only on sufficiently large samples, I have the following:
>
> z <- lapply(split(moreinfo,list(moreinfo$sale_year,moreinfo$sale_week),drop = TRUE), funweek)
>
> or
>
> z <- lapply(split(moreinfo,list(moreinfo$sale_year,moreinfo$sale_month),drop = TRUE), funmonth)
>
> followed by:
>
> str(z)
>
> Of course, I close the loop and disconnect from my db.
>
> NB: I don't see any way to get rid of the loop by adding ID as a factor to
> split because I have to query the DB for several key bits of data in order
> to determine whether or not there is sufficient data to work on.
>
> I have everything working, except the final step of storing the results back
> into the db. Storing data in the Db is easy enough. But I am at a loss as
> to how to combine the lists placed in z in most of the iterations through
> the ID loop into a single data.frame.
>
> Now, I did take a look at rbind and cbind, but it isn't clear to me if
> either is appropriate. All the data frames have the same structure, but the
> lists are of variable length, and I am not certain how either might be used
> inside the IDs loop.
>
> So, what is the best way to combine all lists assigned to z into a single
> data.frame?
>
> Thanks
>
> Ted

Ted,

If each of the data frames in the list 'z' has the same column structure, you can use:

  do.call(rbind, z)

The result will be a single data frame containing all of the rows from each of the data frames in the list.

HTH,

Marc Schwartz
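
As a rough illustration of how do.call(rbind, z) might be used inside the ID loop, here is a minimal sketch. It assumes each non-NULL element of z really is a data frame with identical columns, as the answer requires. The helper make_fit, the results list, the toy IDs, and the added m_id column are all hypothetical stand-ins, not part of the original code; make_fit only mimics funweek/funmonth by returning NULL for samples that are too small.

```r
# Hypothetical stand-in for funweek/funmonth: returns a one-row data frame
# for large enough samples and NULL otherwise (an if without else yields NULL).
make_fit <- function(year, sample, n) {
  if (n > 5) {
    data.frame(estimate = 1 / n, year = year, sample = sample,
               granularity = "week", stringsAsFactors = FALSE)
  }
}

ids <- c(101, 102)   # toy IDs standing in for the dbGetQuery() result
results <- list()    # collects one combined data frame per ID

for (id in ids) {
  # Stand-in for z <- lapply(split(...), funweek); the middle sample is too small
  z <- list(make_fit(2010, 1, 10), make_fit(2010, 2, 3), make_fit(2010, 3, 8))

  # Drop the NULL entries left by samples that were skipped
  z <- Filter(Negate(is.null), z)

  if (length(z) > 0) {
    per_id <- do.call(rbind, z)   # stack this ID's data frames row-wise
    per_id$m_id <- id             # record which ID the rows belong to
    results[[length(results) + 1]] <- per_id
  }
}

# One final rbind over the per-ID results gives a single data frame
combined <- do.call(rbind, results)
str(combined)
```

Collecting the per-ID data frames in a list and binding once at the end avoids repeatedly growing a data frame inside the loop; the single combined data frame can then be written back to the database in one step.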