Frank S.
2016-Nov-15 18:44 UTC
[R] Conditional row removing and replacing in small data.table
Dear R list members,

I have a data table, of which here is an example:

  dt <- data.table(id = rep(1:3, c(5, 1, 2)),
                   date = as.Date(rep(c("2005-07-25", "2006-09-17", "1998-11-06", "2001-04-19"),
                                      c(3, 2, 1, 2))),
                   fam = factor(c(1, 1, 3, 1, 1, 5, 4, 2)),
                   code = factor(c(90, 91, 300, 75, 91, 500, 400, 90)))

I would like to perform 3 operations:

A) Remove rows whose fam is not 1, 2 or 3, except where this would make a subject disappear (the case of id = 2); in that case keep the row but assign fam=0 and code=0.
B) If, within the same id and date, there are 2 rows with code=90 and code=91 (regardless of the order of appearance), remove the one with code=91.
C) If, within the same id and date, there is only 1 row with code=91, keep this row but change its value to code=90.

The correct result would be:

  id  date        fam  code
  1   25/07/2005  1    90
  1   25/07/2005  3    300
  1   17/09/2006  1    75
  1   17/09/2006  1    90
  2   06/11/1998  0    0
  3   19/04/2001  2    90

I have tried to implement step A, but I get an error message when I run it. Moreover, I'm aware that the code below may not be the best way to do this (it takes too many lines):

  dtcount <- dt[, count1 := .N, by = id][, count2 := .N, by = list(id, date)]  # add two counts
  dtA <- dtcount[, {
    if (!(fam %in% 1:3) && count1 == 1) {
      result <- list(date = date, fam = factor(0), code = factor(0))
    } else {
      if (fam %in% 1:3) {
        result <- list(date = date, fam = fam, code = code)
      }
    }
    result
  }, by = id]

Any help would be appreciated!

Frank S.
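
For reference, one possible way to carry out steps A to C is sketched below. It is only a sketch under several assumptions, not a definitive solution: the factor labels are assumed to be numeric, so fam and code are converted to integers first; when an id has no row with fam in 1:3, the first of its rows is the one kept; and step C is read as "any code=91 still present after step B becomes 90". The helper columns ok, any_ok and has90 are names introduced here for illustration, and rowid() assumes a reasonably recent data.table.

```r
library(data.table)

dt <- data.table(
  id   = rep(1:3, c(5, 1, 2)),
  date = as.Date(rep(c("2005-07-25", "2006-09-17", "1998-11-06", "2001-04-19"),
                     c(3, 2, 1, 2))),
  fam  = factor(c(1, 1, 3, 1, 1, 5, 4, 2)),
  code = factor(c(90, 91, 300, 75, 91, 500, 400, 90))
)

# Work on a copy with integer columns (assumption: factor labels are numeric).
res <- copy(dt)
res[, `:=`(fam  = as.integer(as.character(fam)),
           code = as.integer(as.character(code)))]

# Step A: flag valid rows, and ids that have at least one valid row.
res[, ok := fam %in% 1:3]
res[, any_ok := any(ok), by = id]
# Keep valid rows; for ids without any valid row keep their first row only.
res <- res[ok | (!any_ok & rowid(id) == 1L)]
# The invalid rows that survive are the placeholders: set fam and code to 0.
res[!ok, `:=`(fam = 0L, code = 0L)]

# Step B: within id and date, drop code 91 when a code 90 is also present.
res[, has90 := any(code == 90L), by = .(id, date)]
res <- res[!(code == 91L & has90)]

# Step C: any remaining 91 has no 90 in its id/date group, so recode it.
res[code == 91L, code := 90L]

# Drop the helper columns.
res[, c("ok", "any_ok", "has90") := NULL]
print(res)
```

On the example data this leaves six rows, with the same id, fam and code values as the desired result above. fam and code end up as integers; they can be turned back into factors with factor() if needed.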