# Just when I thought I had the basic stuff mastered....
# This has been quite perplexing, thanks for any help

## Here's the example:

db1=data.frame(
    olditems=c('soup','','','','nuts'),
    prices=c(4.45, 3.25, 4.42, 2.25, 3.98))
db2=data.frame(
    newitems=c('stew','crackers','tofu','goatsmilk','peanuts'))

str(db1)    #factors and prices
str(db2)    #new names, but I want *only* the updates

is.na(db1$olditems)  #a little surprising that '' is not equal to NA
db1$olditems==''     #oh good, at least I can get to the blanks this way
db1$olditems[db1$olditems=='']  #wait, only one item is returned?
db1[db1$olditems=='',]  #somehow this works!

#how would I get the new item names into the old items column of db1??
# I was expecting that this would work:
#    db1$olditems[db1$olditems=='']=
#        db2$newitems[db1$olditems=='']
On Tue, Jul 21, 2009 at 7:39 PM, Gene Leynes<gleynes+r at gmail.com> wrote:
> # Just when I thought I had the basic stuff mastered....
> # This has been quite perplexing, thanks for any help
>
> ## Here's the example:
>
> db1=data.frame(
>     olditems=c('soup','','','','nuts'),
>     prices=c(4.45, 3.25, 4.42, 2.25, 3.98))
> db2=data.frame(
>     newitems=c('stew','crackers','tofu','goatsmilk','peanuts'))
>
> str(db1)    #factors and prices
> str(db2)    #new names, but I want *only* the updates
>
> is.na(db1$olditems)  #a little surprising that '' is not equal to NA

Why?

> db1$olditems==''     #oh good, at least I can get to the blanks this way
> db1$olditems[db1$olditems=='']  #wait, only one item is returned?

length(db1$olditems[db1$olditems==''])

> db1[db1$olditems=='',]  #somehow this works!
>
> #how would I get the new item names into the old items column of db1??
> # I was expecting that this would work:
> #    db1$olditems[db1$olditems=='']=
> #        db2$newitems[db1$olditems=='']

Try working with characters instead of factors.

db1$olditems <- as.character(db1$olditems)
db2$newitems <- as.character(db2$newitems)

db1$olditems[db1$olditems==''] <- db2$newitems[db1$olditems=='']

Hadley

--
http://had.co.nz/
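To make the difference between the two routes concrete, here is a minimal sketch rather than anything posted in the thread. It assumes the columns are factors, which was the data.frame() default in 2009; stringsAsFactors = TRUE is spelled out because R 4.0.0 and later keep strings as character by default. The names blank, as_factor and as_char are made up for illustration.

```r
# Assumption: the columns are factors, as in the original 2009 session.
db1 <- data.frame(
  olditems = c('soup', '', '', '', 'nuts'),
  prices   = c(4.45, 3.25, 4.42, 2.25, 3.98),
  stringsAsFactors = TRUE)
db2 <- data.frame(
  newitems = c('stew', 'crackers', 'tofu', 'goatsmilk', 'peanuts'),
  stringsAsFactors = TRUE)

blank <- db1$olditems == ''   # TRUE for rows 2, 3 and 4

# Factor route: 'crackers', 'tofu' and 'goatsmilk' are not levels of
# olditems, so the assignment would warn about invalid factor levels
# and store NA. suppressWarnings() only hides that warning here.
as_factor <- db1$olditems
suppressWarnings(as_factor[blank] <- db2$newitems[blank])
print(is.na(as_factor))

# Character route (the suggestion above): plain strings accept any value.
as_char <- as.character(db1$olditems)
as_char[blank] <- as.character(db2$newitems)[blank]
print(as_char)   # "soup" "crackers" "tofu" "goatsmilk" "nuts"
```

The factor route fails only because the incoming names are not already levels of olditems; converting both columns to character sidesteps the level check entirely.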
Notice that three items are returned where you thought one was:

> db1$olditems==''
[1] FALSE  TRUE  TRUE  TRUE FALSE
> db1$olditems[db1$olditems==''] #wait, only one item is returned?
[1]
Levels:  nuts soup
> db1[db1$olditems=='',] #somehow this works!
  olditems prices
2            3.25
3            4.42
4            2.25
> paste('[',db1$olditems[db1$olditems==''],']') # put some characters around return values
[1] "[  ]" "[  ]" "[  ]"

The '[1]' is just an indication that the line starts with the first value returned; all three values are blanks, so you did not see them. Also, "" is just that: a blank, and not NA.

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
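Besides wrapping the values in brackets with paste(), a few base R calls make invisible blanks countable. This is a small sketch under the same assumption that olditems is a factor (hence the explicit stringsAsFactors = TRUE); the name blanks is made up for illustration.

```r
# Assumption: olditems is a factor, as in the original session.
db1 <- data.frame(
  olditems = c('soup', '', '', '', 'nuts'),
  prices   = c(4.45, 3.25, 4.42, 2.25, 3.98),
  stringsAsFactors = TRUE)

blanks <- db1$olditems[db1$olditems == '']

length(blanks)                # 3: three values, even though they print as nothing
nchar(as.character(blanks))   # 0 0 0: each one is a zero-character string
is.na(blanks)                 # FALSE FALSE FALSE: blank is not missing
dput(as.character(blanks))    # c("", "", "")
```

dput() is handy here because it prints the quotes, so empty strings cannot be mistaken for missing output.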
Bill.Venables at csiro.au
2009-Jul-22 01:22 UTC
[R] How to replace NAs in a vector of factors?
Couple of points:

1. If you are going to be replacing entries in factors with updated levels, it's probably easier if you start with your strings remaining as strings as they go into the data frames. So here is how I would start your example:

db1 <- data.frame(
    olditems = c('soup','','','','nuts'),
    prices = c(4.45, 3.25, 4.42, 2.25, 3.98),
    stringsAsFactors = FALSE)
db2 <- data.frame(
    newitems = c('stew','crackers','tofu','goatsmilk','peanuts'),
    stringsAsFactors = FALSE)

2. Strings with zero characters are still strings (like zero is still a number). They are not missing. If you want them to be made missing you can do so afterwards with:

#### zero length strings become NA
is.na(db1$olditems[db1$olditems == '']) <- TRUE

3. Now to replace the missing values with the corresponding ones from the second data frame:

k <- is.na(db1$olditems)
db1[k, "olditems"] <- db2[k, "newitems"]

4. Check

> db1
   olditems prices
1      soup   4.45
2  crackers   3.25
3      tofu   4.42
4 goatsmilk   2.25
5      nuts   3.98

5. If you really do want factors rather than character strings, you can now change back:

db1 <- within(db1, olditems <- factor(olditems))  ## use <- here!

6. Check the difference

> str(db1)
'data.frame':   5 obs. of  2 variables:
 $ olditems: Factor w/ 5 levels "crackers","goatsmilk",..: 4 1 5 2 3
 $ prices  : num  4.45 3.25 4.42 2.25 3.98

Bill Venables
http://www.cmis.csiro.au/bill.venables/
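The steps above can also be folded into one small function. This is only a sketch: fill_blanks is a hypothetical helper, not something from the thread or from any package, and it assumes the two vectors line up position by position, as they do in the example.

```r
# Hypothetical helper (not from the thread): replace blank or NA entries
# of `old` with the entries of `new` at the same positions.
fill_blanks <- function(old, new) {
  old <- as.character(old)   # works for factor or character input
  new <- as.character(new)
  stopifnot(length(old) == length(new))
  k <- is.na(old) | old == ''   # TRUE where an update is needed
  old[k] <- new[k]
  old
}

db1 <- data.frame(
  olditems = c('soup', '', '', '', 'nuts'),
  prices   = c(4.45, 3.25, 4.42, 2.25, 3.98),
  stringsAsFactors = FALSE)
db2 <- data.frame(
  newitems = c('stew', 'crackers', 'tofu', 'goatsmilk', 'peanuts'),
  stringsAsFactors = FALSE)

db1$olditems <- fill_blanks(db1$olditems, db2$newitems)
db1$olditems <- factor(db1$olditems)   # optional: convert back to a factor
levels(db1$olditems)   # "crackers" "goatsmilk" "nuts" "soup" "tofu"
```

Testing is.na(old) before old == '' keeps the index free of NA values, so the same helper can be used whether or not the blanks were first turned into NA.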