Luigi Marongiu
2015-Aug-31 20:49 UTC
[R] Conditional replacement and removal of data frame values
Dear all,

I have a data frame and I would like to do the following:
a) replace the value of one variable "a" according to the value of another one "b"
b) remove all the instances of the variable "b"

For the sake of argument, let's say I have the following data frame:

test <- rep(c("Adenovirus", "Rotavirus", "Norovirus", "Rotarix", "Sapovirus"), 3)
res <- c(0, 1, 0, 0, 1,
         1, 0, 1, 1, 0,
         0, 1, 0, 1, 0)
samp <- c(rep(1, 5), rep(2, 5), rep(3, 5))
df <- data.frame(test, res, samp, stringsAsFactors = FALSE)

The task I need is to coerce the results of the "Rotavirus" to negative (0) if and only if "Rotarix" is positive (1). In this example, the results show that for "samp" 3 "Rotavirus" should be 0:

        test res samp
2  Rotavirus   1    1
4    Rotarix   0    1
7  Rotavirus   0    2
9    Rotarix   1    2
12 Rotavirus   1    3
14   Rotarix   1    3

I can't use the subset function because then I would work on a separate object and I don't know how to implement the conditions for the replacements.
Finally, all the "Rotarix" entries should be removed from the data frame.

Thank you.
Best regards,
Luigi
David Winsemius
2015-Aug-31 23:48 UTC
[R] Conditional replacement and removal of data frame values
On Aug 31, 2015, at 1:49 PM, Luigi Marongiu wrote:

> Dear all,
> I have a data frame and I would like to do the following:
> a) replace value of one variable "a" according to the value of another one "b"
> b) remove all the instances of the variable "b"
>
> For the sake of argument, let's say I have the following data frame:
> test <- rep(c("Adenovirus", "Rotavirus", "Norovirus", "Rotarix",
> "Sapovirus"), 3)
> res <- c(0, 1, 0, 0, 1,
>          1, 0, 1, 1, 0,
>          0, 1, 0, 1, 0)
> samp <- c(rep(1, 5), rep(2, 5), rep(3, 5))
> df <- data.frame(test, res, samp, stringsAsFactors = FALSE)
>
> The task I need is to coerce the results of the "Rotavirus" to
> negative (0) if and only if "Rotarix" is positive (1). In this
> example, the results shows that for "samp" 3 "Rotavirus" should be 0:
>         test res samp
> 2  Rotavirus   1    1
> 4    Rotarix   0    1
> 7  Rotavirus   0    2
> 9    Rotarix   1    2
> 12 Rotavirus   1    3
> 14   Rotarix   1    3
>
> I can't use the subset function because then I would work on a
> separate object and I don't know how to implement the conditions for
> the replacements.
> Finally, all the "Rotarix" entries should be removed from the data frame.

From context it appears you want to do this testing within groups determined by 'samp', so you might choose to use an lapply-split approach:

lapply( split(df, df$samp),
        FUN = function(d) if ( d[d$test == "Rotarix", "res"] ) {
                  d$res[d$test == "Rotavirus"] <- 0
                  return( d[!d$test == "Rotarix", ] )
              } else {
                  d[!d$test == "Rotarix", ]
              } )

$`1`
        test res samp
1 Adenovirus   0    1
2  Rotavirus   1    1
3  Norovirus   0    1
5  Sapovirus   1    1

$`2`
         test res samp
6  Adenovirus   1    2
7   Rotavirus   0    2
8   Norovirus   1    2
10  Sapovirus   0    2

$`3`
         test res samp
11 Adenovirus   0    3
12  Rotavirus   0    3
13  Norovirus   0    3
15  Sapovirus   0    3

It's pretty easy to rbind.data.frame those together:

do.call( rbind.data.frame,
         lapply( split(df, df$samp),
                 FUN = function(d) if ( d[d$test == "Rotarix", "res"] ) {
                           d$res[d$test == "Rotavirus"] <- 0
                           return( d[!d$test == "Rotarix", ] )
                       } else {
                           d[!d$test == "Rotarix", ]
                       } ) )

           test res samp
1.1  Adenovirus   0    1
1.2   Rotavirus   1    1
1.3   Norovirus   0    1
1.5   Sapovirus   1    1
2.6  Adenovirus   1    2
2.7   Rotavirus   0    2
2.8   Norovirus   1    2
2.10  Sapovirus   0    2
3.11 Adenovirus   0    3
3.12  Rotavirus   0    3
3.13  Norovirus   0    3
3.15  Sapovirus   0    3

> Thank you.
> Best regards,
> Luigi

David Winsemius
Alameda, CA, USA
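
The split-and-recombine approach above relies on each 'samp' group having exactly one "Rotarix" row, since the `if ()` condition expects a single value. As a possible alternative, and not something proposed in the thread, the same result might be reached with plain logical indexing on the original data frame. The sketch below assumes that a sample counts as Rotarix-positive when any of its "Rotarix" rows has res equal to 1; the names `rotarix_pos` and `out` are made up for illustration.

```r
# Rebuild the example data from the original post.
test <- rep(c("Adenovirus", "Rotavirus", "Norovirus", "Rotarix", "Sapovirus"), 3)
res  <- c(0, 1, 0, 0, 1,
          1, 0, 1, 1, 0,
          0, 1, 0, 1, 0)
samp <- c(rep(1, 5), rep(2, 5), rep(3, 5))
df   <- data.frame(test, res, samp, stringsAsFactors = FALSE)

# Samples with at least one positive Rotarix row
# (assumption: "positive" means res == 1).
rotarix_pos <- unique(df$samp[df$test == "Rotarix" & df$res == 1])

# Work on a copy; 'out' is a hypothetical name, not from the thread.
out <- df

# Set Rotavirus to 0 only in the Rotarix-positive samples.
out$res[out$test == "Rotavirus" & out$samp %in% rotarix_pos] <- 0

# Drop every Rotarix row.
out <- out[out$test != "Rotarix", ]

print(out)
```

This variant keeps the original row names (1, 2, 3, 5, ...) rather than the "1.1", "1.2" names produced by rbind, and it does not depend on how many "Rotarix" rows a sample contains, including none.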