Hi,

I am a beginner in R and have only read a few chapters in the R book, I was not able to find a solution for this simple problem.

I have an empty data frame:

    a=data.frame(name="test")

which I would like to extend in a for-loop (with data extracted from a database). Ideally I would like to extend the data frame like this:

    a["new_1"] = 1:10
    a["new_1"] = 1:12
    a["new_1"] = 1:14

R now obviously complains about the changing length of the new columns. However, I would like to have missing values being added whenever columns are shorter than a newer (longer) column. How can I do that?

Thanks,
Ralf
Ralf B wrote:
> Hi,
>
> I am a beginner in R and have only read a few chapters in the R book,
> I was not able to find a solution for this simple problem.
>
> I have an empty data frame:
>
> a=data.frame(name="test")
>
> which I would like to extend in a for-loop (with data extracted from a
> database). Ideally I would like to extend the data frame like this:
>
> a["new_1"] = 1:10
> a["new_1"] = 1:12
> a["new_1"] = 1:14
>

I would first read all the data into a list (maybe using lapply), where each element of the list is one column. Then you can find out which one is longest, add NAs at the end of the other columns, and then use do.call("cbind", list_of_columns) to get the resulting data.frame. Note that I use apply-type constructs a lot; they are a sort of one-line for-loop.

    # Create a mockup list for this particular example
    column_list = lapply(round(runif(5, 1, 10)), function(len_column) {
        rep(len_column, times = len_column)
    })
    # Find the length of the columns in the list
    len_columns = sapply(column_list, length)
    # Add the NAs
    dum = lapply(column_list, function(col) {
        c(col, rep(NA, max(len_columns) - length(col)))
    })
    # Make the data frame
    dat = data.frame(do.call("cbind", dum))

It is quite a manual way of doing it; maybe someone else knows of an available R function to do it. But this is how I would do it.

cheers,
Paul

> R now obviously complains about the changing length of the new
> columns. However, I would like to have missing values being added
> whenever columns are shorter than a newer (longer) column. How can I
> do that?
>
> Thanks,
> Ralf
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Drs. Paul Hiemstra
Department of Physical Geography
Faculty of Geosciences
University of Utrecht
Heidelberglaan 2
P.O. Box 80.115
3508 TC Utrecht
Phone: +3130 274 3113 Mon-Tue
Phone: +3130 253 5773 Wed-Fri
http://intamap.geo.uu.nl/~paul
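
The padding steps described in this reply can be wrapped in one small function. This is only a sketch under assumptions: pad_columns() is a made-up helper name (not a base R function), the input is assumed to be a named list of atomic vectors, and lengths() needs R 3.2.0 or later.

```r
# pad_columns() is a hypothetical helper, not part of base R.
# It takes a named list of vectors and returns a data frame in which
# every column has been padded with NA to the length of the longest one.
pad_columns <- function(column_list) {
  # Length of the longest column
  n_max <- max(lengths(column_list))
  # Append NAs to every vector that is shorter than n_max
  padded <- lapply(column_list, function(col) {
    c(col, rep(NA, n_max - length(col)))
  })
  # A named list of equal-length vectors converts directly to a data frame
  as.data.frame(padded)
}

# Columns of the lengths used in the original question
cols <- list(new_1 = 1:10, new_2 = 1:12, new_3 = 1:14)
dat <- pad_columns(cols)
dim(dat)
```

Using as.data.frame() on the named list, instead of cbind() followed by data.frame(), keeps the column names and lets columns of different types stay separate; cbind() would first coerce everything into one matrix type.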
On Feb 17, 2010, at 4:00 AM, Ralf B wrote:

> Hi,
>
> I am a beginner in R and have only read a few chapters in the R book,

Which "R book"?

> I was not able to find a solution for this simple problem.
>
> I have an empty data frame:
>
> a=data.frame(name="test")

No, you have a dataframe with one column of type "character" and one row:

    > a
      name
    1 test

>
> which I would like to extend in a for-loop (with data extracted from a
> database). Ideally I would like to extend the data frame like this:
>
> a["new_1"] = 1:10
> a["new_1"] = 1:12
> a["new_1"] = 1:14

You cannot use indexing to "extend" dataframes. Dataframes are not appropriate for arbitrarily long data objects. Lists are better for that purpose.

>
> R now obviously complains about the changing length of the new
> columns. However, I would like to have missing values being added
> whenever columns are shorter than a newer (longer) column.

You can wish for anything, but this is not the default behavior of dataframes. Under normal circumstances adding a shorter vector will result in "argument recycling" (or in an error if the number of rows is not a multiple of the vector length), so to circumvent that you will need to add the appropriate number of NAs to the end of your vectors:

    dtfrm$new <- c(vec, rep(NA, nrow(dtfrm) - length(vec)))
    # can't use length(dtfrm) since that is the number of columns
    # is this like telomeres on chromosomes?

And adding a vector longer than the number of rows with "$<-" assignment will cause an error. You could pre-dimension your dataframe so it is ready to hold your data:

    > needed_rows <- 1000
    > dtfrm <- data.frame(dummy=rep(NA, needed_rows) )
    > str(dtfrm)
    'data.frame':   1000 obs. of  1 variable:
     $ dummy: logi  NA NA NA NA NA NA ...

If you then need to add complete rows of new data, you can also use rbind.

You probably ought to look at the mechanisms for accessing databases directly. Possible search terms: sqlite, sqldf, RODBC

> How can I
> do that?

Read the help pages:

    ?"["
    ?rbind
    ?cbind   # may be the same

>
> Thanks,
> Ralf

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
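
If the data frame really has to grow inside a loop, the NA padding described in this reply can be applied in both directions: extend the existing rows when the new vector is longer, and pad the new vector when it is shorter. A minimal sketch, assuming a hypothetical helper called add_padded_column() (the name and the column names new_1 to new_3 are made up for illustration):

```r
# add_padded_column() is a hypothetical helper, not part of base R.
add_padded_column <- function(dtfrm, name, vec) {
  n <- max(nrow(dtfrm), length(vec))
  if (n > nrow(dtfrm)) {
    # Indexing rows past the last one yields rows filled with NA
    dtfrm <- dtfrm[seq_len(n), , drop = FALSE]
    # Reset the row names that the out-of-range indexing generated
    rownames(dtfrm) <- NULL
  }
  # `length<-` pads a shorter vector with NA up to n
  length(vec) <- n
  dtfrm[[name]] <- vec
  dtfrm
}

a <- data.frame(name = "test")
a <- add_padded_column(a, "new_1", 1:10)
a <- add_padded_column(a, "new_2", 1:12)
a <- add_padded_column(a, "new_3", 1:14)
str(a)
```

Each call copies the data frame, so for many columns it is probably cheaper to collect the vectors in a list first and build the data frame once at the end.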
If your data frame only has numeric entries you could represent it as a multivariate ts time series, in which case this works:

    > xx <- ts(cbind(a = 1:2, b = 3:4, c = 5:6)); xx
    Time Series:
    Start = 1
    End = 2
    Frequency = 1
      a b c
    1 1 3 5
    2 2 4 6
    > cbind(a = ts(1:4), b = xx[, "b"], c = xx[, "c"])
    Time Series:
    Start = 1
    End = 4
    Frequency = 1
      a  b  c
    1 1  3  5
    2 2  4  6
    3 3 NA NA
    4 4 NA NA

It can be done with the sqldf package like this:

    > library(sqldf)
    > DF <- data.frame(a = 1:5, b = 6:10)
    > DFnew <- data.frame(a = 1:7)
    > sqldf("select DFnew.a, DF.b from DFnew left join DF on DF.rowid = DFnew.rowid")
      a  b
    1 1  6
    2 2  7
    3 3  8
    4 4  9
    5 5 10
    6 6 NA
    7 7 NA

On Wed, Feb 17, 2010 at 4:00 AM, Ralf B <ralf.bierig at gmail.com> wrote:
> Hi,
>
> I am a beginner in R and have only read a few chapters in the R book,
> I was not able to find a solution for this simple problem.
>
> I have an empty data frame:
>
> a=data.frame(name="test")
>
> which I would like to extend in a for-loop (with data extracted from a
> database). Ideally I would like to extend the data frame like this:
>
> a["new_1"] = 1:10
> a["new_1"] = 1:12
> a["new_1"] = 1:14
>
> R now obviously complains about the changing length of the new
> columns. However, I would like to have missing values being added
> whenever columns are shorter than a newer (longer) column. How can I
> do that?
>
> Thanks,
> Ralf
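
The sqldf left join on row ids from this thread can also be sketched in base R with merge(), which avoids the extra package. This is a swapped-in technique, not what the reply itself proposed; the explicit rowid column is an assumption added here to mirror the rowid used in the SQL.

```r
DF <- data.frame(a = 1:5, b = 6:10)
DFnew <- data.frame(a = 1:7)

# Add an explicit row id to both sides
# (the column name "rowid" is an assumption, chosen to mirror the SQL)
DF$rowid <- seq_len(nrow(DF))
DFnew$rowid <- seq_len(nrow(DFnew))

# all.x = TRUE keeps every row of DFnew (a left join);
# b is filled with NA where DF has no matching row id
res <- merge(DFnew, DF[c("rowid", "b")], by = "rowid", all.x = TRUE)

# Restore the original row order and drop the helper column
res <- res[order(res$rowid), c("a", "b")]
res
```

The longer column has to be on the left-hand side of the join, so this fits best when the longest column is known before the others are attached.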