Hello all, I would like to take a data frame such as the following one:> df <-data.frame(id=c("A","A","B","B"),first=c("BX",NA,NA,"LF"),second=c(NA,"TD","BZ",NA),third=c(NA,NA,"RB","BT"),fourth=c("LG","QR",NA,NA))> dfid first second third fourth 1 A BX <NA> <NA> LG 2 A <NA> TD <NA> QR 3 B <NA> BZ RB <NA> 4 B LF <NA> BT <NA> and merge rows based on the id, such that the value in the column will be one of four possibilities: if both values in the original df are <NA>, the new value should also be <NA>. If there are two non-NA values, then the new value should read "clash". Otherwise, the new value should be whichever value was not <NA>. An example output from the command would read in df and read out: id first second third fourth 1 A BX TD <NA> clash 2 B LF BZ clash <NA> I'd be grateful if someone could point me in the right direction. Thanks, Jonathan [[alternative HTML version deleted]]
Here's one way in base R: df <- data.frame(id=c("A","A","B","B"), first=c("BX",NA,NA,"LF"), second=c(NA,"TD","BZ",NA), third=c(NA,NA,"RB","BT"), fourth=c("LG","QR",NA,NA)) new_df <- data.frame(do.call(rbind, by(df, df$id, function(x) { sapply(x[,-1], function(y) { if (all(is.na(y))) return(NA) if (all(!is.na(y))) return("clash") return(as.character(y[which(!is.na(y))])) }) }))) new_df$id <- rownames(new_df) rownames(new_df) <- NULL new_df ## first second third fourth id ## 1 BX TD <NA> clash A ## 2 LF BZ clash <NA> B On Tue, Aug 18, 2015 at 3:06 PM, Jon BR <jonsleepy at gmail.com> wrote:> df <- > data.frame(id=c("A","A","B","B"),first=c("BX",NA,NA,"LF"),second=c(NA,"TD","BZ",NA),third=c(NA,NA,"RB","BT"),fourth=c("LG","QR",NA,NA)) >> df