karinlag at ifi.uio.no
2009-Aug-12 15:49 UTC
[R] inserting into data frame gives "invalid factor level, NAs generated"
I am calculating some values that I am inserting into a data frame. From what I have read, creating the data frame ahead of time is more efficient, since rbind (so far the only solution I have found for appending to a data frame) is not very fast.

What I am doing is the following:

# create data frame
goframe = data.frame(goA = character(10), goB = character(10), value = numeric(10))
goframe[1,] = c("AA", "BB", 0.4)

Result is:

> goframe[1,] = c("AA", "BB", 0.4)
Warning messages:
1: In `[<-.factor`(`*tmp*`, iseq, value = "AA") :
  invalid factor level, NAs generated
2: In `[<-.factor`(`*tmp*`, iseq, value = "BB") :
  invalid factor level, NAs generated
>

Is there another/better/more recommended way of doing this? If not, how do I do this without getting all the warnings?

Thanks!

Best,

Karin Lagesen
Scott Sherrill-Mix
2009-Aug-12 16:02 UTC
[R] inserting into data frame gives "invalid factor level, NAs generated"
You're running into the fairly common factor vs. character problem in R. By default, data.frame turns character vectors into factor vectors (sort of like ENUM in MySQL). Since each character column in your starting data frame has only one factor level (the empty string ''), R sees a value that is not an existing level when you insert new data, and complains.

You'd probably be pretty safe using character columns instead of factors for now, by adding stringsAsFactors=FALSE to data.frame(), e.g.:

goframe <- data.frame(goA = character(10), goB = character(10), value = numeric(10), stringsAsFactors = FALSE)

Scott

Scott Sherrill-Mix
Department of Microbiology
University of Pennsylvania
402B Johnson Pavilion
3610 Hamilton Walk
Philadelphia, PA 19104-6076
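
A minimal sketch of the suggestion above, for anyone who wants to try it end to end. Two things in it go beyond the original posts and should be read as assumptions: the row is assigned with list() rather than c(), because c("AA", "BB", 0.4) coerces everything to character and would turn 0.4 into the string "0.4"; and stringsAsFactors = FALSE is spelled out even though it has been the default since R 4.0.0, so the code behaves the same on older versions.

```r
# Preallocate ten rows with character (not factor) columns.
# stringsAsFactors = FALSE is explicit so old and new R versions agree.
goframe <- data.frame(goA = character(10),
                      goB = character(10),
                      value = numeric(10),
                      stringsAsFactors = FALSE)

# Assign the row as a list so each element keeps its own type.
# With c("AA", "BB", 0.4) the number would be coerced to the string "0.4".
goframe[1, ] <- list("AA", "BB", 0.4)

# Inspect the column types of the filled row.
str(goframe[1, ])
```

If the value column ends up as character after an insert, that is usually a sign that the row was built with c() instead of list().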