Ted Byers
2010-Jul-16 15:54 UTC
[R] I need help making a data.frame comprised of selected columns of an original data frame.
I must have missed something simple, but I still don't know what.

I obtained my basic data as follows:

    x <- sprintf("SELECT m_id,sale_date,YEAR(sale_date) AS sale_year,
                  WEEK(sale_date) AS sale_week,return_type,
                  0.0001 + DATEDIFF(return_date,sale_date) AS elapsed_time
                  FROM `merchants2`.`risk_input`
                  WHERE DATEDIFF(return_date,sale_date) IS NOT NULL")
    moreinfo <- dbGetQuery(con, x)

I then made the data frame I want to use as follows:

    fun_m_id <- function(df) if (length(df$elapsed_time) > 5) {
      rv = fitdist(df$elapsed_time,"exp")
      rv$mid = df$m_id[1]
      rv
    }
    aaa <- lapply(split(moreinfo,list(moreinfo$m_id),drop = TRUE), fun_m_id)
    m_id_default_res <- do.call(rbind, aaa)

At this point, each row in m_id_default_res corresponds to one data.frame produced by fitdist. When I print it, I get the output I expected. However, I need to store only some of it in my DB.

Because fitdist produces a data frame that includes a lot of info I don't need to store in the DB, I tried making a new data.frame containing only the info I need, as follows:

    ndf = data.frame()
    for (i in 1:length(m_id_default_res[,1])) {
      ndf$mid[i] = m_id_default_res$mid[i]
      ndf$estimate[i] = m_id_default_res$estimate[i]
      ndf$sd[i] = m_id_default_res$sd[i]
      ndf$n[i] = m_id_default_res[i]
      ndf$loglik[i] = m_id_default_res$loglik[i]
      ndf$aic[i] = m_id_default_res$aic[i]
      ndf$bic[i] = m_id_default_res$bic[i]
      ndf$chisq[i] = m_id_default_res$chisq[i]
      ndf$chisqpvalue[i] = m_id_default_res$chisqpvalue[i]
      ndf$chisqdf[i] = m_id_default_res$chisqdf[i]
    }
    ndf

And I get the following error:

    Error in `$<-.data.frame`(`*tmp*`, "n", value = list(0.114752782316094)) :
      replacement has 1 rows, data has 0

I need to either get rid of the columns in m_id_default_res that I don't need, or copy only the columns I need to a new data.frame. How do I do this? Obviously, doing an element-wise copy, at least the way I tried it, doesn't work.

Thanks,

Ted
Steve Lianoglou
2010-Jul-16 16:04 UTC
[R] I need help making a data.frame comprised of selected columns of an original data frame.
Hi,

First: it's kind of hard to play along without some reproducible data. To that end, you can paste into an email the output of:

    dput(moreinfo)

If there are lots of rows in `moreinfo`, just give us the first ~10-20:

    dput(head(moreinfo, 20))

Anyway:

<snip>
> At this point, each row in m_id_default_res corresponds to one data.frame
> produced by fitdist. When I print it, I get the output I expected.
> However, I need to store only some of it into my DB.
>
> And then, because fitdist produces a data frame that includes a lot of info
> I don't need to store in the DB, I tried making a new data.frame containing
> only the info I need as follows:
> ndf = data.frame()
> for (i in 1:length(m_id_default_res[,1])) {
>   ndf$mid[i] = m_id_default_res$mid[i]
>   ndf$estimate[i] = m_id_default_res$estimate[i]
>   ndf$sd[i] = m_id_default_res$sd[i]
>   ndf$n[i] = m_id_default_res[i]
>   ndf$loglik[i] = m_id_default_res$loglik[i]
>   ndf$aic[i] = m_id_default_res$aic[i]
>   ndf$bic[i] = m_id_default_res$bic[i]
>   ndf$chisq[i] = m_id_default_res$chisq[i]
>   ndf$chisqpvalue[i] = m_id_default_res$chisqpvalue[i]
>   ndf$chisqdf[i] = m_id_default_res$chisqdf[i]
> }

Forget the for loop. How about:

    ndf <- m_id_default_res[, c('mid', 'estimate', 'sd', 'loglik', 'aic', 'bic', 'chisq', 'chisqpvalue', 'chisqdf')]

Having just written that, I see something strange in your for loop. Specifically this line:

>   ndf$n[i] = m_id_default_res[i]

m_id_default_res is a data.frame, right? Why don't you try to see what `m_id_default_res[1]` returns. I'm not sure that that's where your error message is coming from, but I foresee this being a problem anyway, if I follow your "build up" code correctly.

Hope that helps,

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
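For readers following along without the original data, the two points in this reply can be sketched in base R. This is only an illustration under stated assumptions: `res` is a hypothetical stand-in for m_id_default_res built as a plain data frame with made-up values, and the column `extra` is invented to play the role of the unwanted fitdist output. It is not the poster's actual object.

```r
# Hypothetical stand-in for m_id_default_res (made-up values, plain data frame).
res <- data.frame(
  mid      = c(101, 102, 103),
  estimate = c(0.11, 0.25, 0.08),
  sd       = c(0.02, 0.05, 0.01),
  n        = c(40, 25, 60),
  loglik   = c(-120.5, -60.2, -210.9),
  extra    = c("a", "b", "c")   # invented column we do not want to store
)

# 1. Select the wanted columns by name in one step, no loop needed.
keep <- c("mid", "estimate", "sd", "n", "loglik")
ndf  <- res[, keep]

# 2. Single-bracket indexing with one index returns a one-column
#    data frame (a list), not a single value.
one_col <- res[1]      # data frame holding only the 'mid' column
one_val <- res$n[1]    # a single number from the 'n' column

# 3. Growing a zero-row data frame one element at a time fails,
#    because the replacement has more rows than the data frame.
empty <- data.frame()
msg <- tryCatch({
  empty$n[1] <- one_val
  "no error"
}, error = function(e) conditionMessage(e))
```

Printing `msg` shows the error raised by the element-wise assignment, while `ndf` holds all three rows restricted to the columns named in `keep`.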