GRAHAM LEASK
2013-Apr-06 09:24 UTC
[R] Replace missing values within a group with the non-missing value
I have a large dataset organised in choice groups, see below. Each choice represents a separate occasion with 1 product chosen out of the 6 offered.

     +---------------------------------------------------------------------------------------+
     | dn  obs  choice  acid     br  date                cdate      situat~n  mth  year  set |
     |---------------------------------------------------------------------------------------|
  1. | 4   1    0       LOSEC    1   .                   .                    .    .     1   |
  2. | 4   1    0       NEXIUM   2   .                   .                    .    .     1   |
  3. | 4   1    0       PARIET   3   .                   .                    .    .     1   |
  4. | 4   1    0       PROTIUM  4   .                   .                    .    .     1   |
  5. | 4   1    0       ZANTAC   5   .                   .                    .    .     1   |
     |---------------------------------------------------------------------------------------|
  6. | 4   1    1       ZOTON    6   23aug2000 01:00:00  23aug2000  NS        487  2000  1   |
  7. | 4   2    0       LOSEC    1   .                   .                    .    .     2   |
  8. | 4   2    0       NEXIUM   2   .                   .                    .    .     2   |
  9. | 4   2    1       PARIET   3   25sep2000 01:00:00  25sep2000  L         488  2000  2   |
 10. | 4   2    0       PROTIUM  4   .                   .                    .    .     2   |
     |---------------------------------------------------------------------------------------|
 11. | 4   2    0       ZANTAC   5   .                   .                    .    .     2   |
 12. | 4   2    0       ZOTON    6   .                   .                    .    .     2   |
 13. | 4   3    0       LOSEC    1   .                   .                    .    .     3   |
 14. | 4   3    0       NEXIUM   2   .                   .                    .    .     3   |
 15. | 4   3    0       PARIET   3   .                   .                    .    .     3   |
     |---------------------------------------------------------------------------------------|
 16. | 4   3    0       PROTIUM  4   .                   .                    .    .     3   |
 17. | 4   3    0       ZANTAC   5   .                   .                    .    .     3   |
 18. | 4   3    1       ZOTON    6   20sep2000 00:00:00  20sep2000  R         488  2000  3   |
 19. | 4   4    0       LOSEC    1   .                   .                    .    .     4   |

I wish to fill in the missing values in each choice set, delineated by dn (Doctor), obs (Observation number) and choices (1 to 6). In each choice set exactly one choice is chosen, and that row contains the full time information for the set, i.e. in set 1 choice 6 was chosen and shows the month 487. The other 5 choices show mth as missing. I want to fill these with the correct mth.

Clearly on different occasions the date will differ, but each choice set has only one date.

Is there a simple, elegant way to do this in R?

Kind regards

Graham
Adams, Jean
2013-Apr-08 14:30 UTC
[R] Replace missing values within a group with the non-missing value
Graham,

# use dput() to share your data in a way that's easy for R-help readers to use
mydf <- structure(list(
    dn = c(4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
        4L, 4L, 4L),
    obs = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L,
        3L, 3L, 4L),
    choice = c(0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L,
        0L, 0L, 1L, 0L),
    acid = c("LOSEC", "NEXIUM", "PARIET", "PROTIUM", "ZANTAC", "ZOTON",
        "LOSEC", "NEXIUM", "PARIET", "PROTIUM", "ZANTAC", "ZOTON",
        "LOSEC", "NEXIUM", "PARIET", "PROTIUM", "ZANTAC", "ZOTON", "LOSEC"),
    br = c(1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L,
        5L, 6L, 1L),
    cdate = c(NA, NA, NA, NA, NA, "23-Aug-00", NA, NA, "25-Sep-00", NA, NA,
        NA, NA, NA, NA, NA, NA, "20-Sep-00", NA),
    situation = c(NA, NA, NA, NA, NA, "NS", NA, NA, "L", NA, NA, NA, NA, NA,
        NA, NA, NA, "R", NA),
    mth = c(NA, NA, NA, NA, NA, 487L, NA, NA, 488L, NA, NA, NA, NA, NA, NA,
        NA, NA, 488L, NA),
    year = c(NA, NA, NA, NA, NA, 2000L, NA, NA, 2000L, NA, NA, NA, NA, NA,
        NA, NA, NA, 2000L, NA),
    set = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L,
        3L, 3L, 4L)),
    .Names = c("dn", "obs", "choice", "acid", "br", "cdate", "situation",
        "mth", "year", "set"),
    class = "data.frame",
    row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11",
        "12", "13", "14", "15", "16", "17", "18", "19"))

# use aggregate() to calculate the mean of mth for each combination of dn and obs
mean.mth <- aggregate(mth ~ dn + obs, data=mydf, mean, na.rm=TRUE)

# merge the original data with means
mydf2 <- merge(mydf, mean.mth, by=c("dn", "obs"), suffixes=c("",".filled"))

Jean
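
One point about the aggregate()/merge() approach above: the formula interface of aggregate() drops rows where mth is NA, and merge() defaults to all = FALSE, so a choice set with no observed mth at all (dn 4, obs 4 in the posted excerpt) will not appear in the merged result. The mean also only works for numeric columns, not for situation or cdate. The following is a minimal sketch of an alternative that uses ave() to fill several columns in place. It is not from the thread; the helper name fill_group and the cut-down data frame are made up for illustration, and it assumes each choice set has at most one distinct non-missing value per column.

```r
# Cut-down stand-in for the posted data (illustrative values only):
# two complete choice sets plus one set (obs 3) with no observed value.
mydf <- data.frame(
  dn        = c(4L, 4L, 4L, 4L, 4L, 4L, 4L),
  obs       = c(1L, 1L, 1L, 2L, 2L, 2L, 3L),
  situation = c(NA, NA, "NS", NA, "L", NA, NA),
  mth       = c(NA, NA, 487L, NA, 488L, NA, NA),
  year      = c(NA, NA, 2000L, NA, 2000L, NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: repeat the first non-missing value of x across the
# whole group; if the group has no non-missing value, return x unchanged.
fill_group <- function(x) {
  known <- x[!is.na(x)]
  if (length(known) == 0L) return(x)
  rep(known[1L], length(x))
}

# Apply the helper within each dn/obs combination, one column at a time.
# ave() returns a vector in the original row order, so no rows are lost.
filled <- mydf
for (v in c("situation", "mth", "year")) {
  filled[[v]] <- ave(mydf[[v]], mydf$dn, mydf$obs, FUN = fill_group)
}

print(filled)
```

Because ave() keeps the original row order and length, the incomplete set stays in the data with its values still NA, and character columns such as situation are filled the same way as the numeric ones.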