Hello all,
I would like to take a data frame such as the following one:
> df <-
data.frame(id=c("A","A","B","B"),first=c("BX",NA,NA,"LF"),second=c(NA,"TD","BZ",NA),third=c(NA,NA,"RB","BT"),fourth=c("LG","QR",NA,NA))> df
id first second third fourth
1 A BX <NA> <NA> LG
2 A <NA> TD <NA> QR
3 B <NA> BZ RB <NA>
4 B LF <NA> BT <NA>
and merge rows based on the id, such that the value in the column will be
one of four possibilities: if both values in the original df are <NA>, the
new value should also be <NA>. If there are two non-NA values, then the
new value should read "clash". Otherwise, the new value should be
whichever value was not <NA>.
An example output from the command would read in df and read out:
id first second third fourth
1 A BX TD <NA> clash
2 B LF BZ clash <NA>
I'd be grateful if someone could point me in the right direction.
Thanks,
Jonathan
[[alternative HTML version deleted]]
Here's one way in base R:
df <-
data.frame(id=c("A","A","B","B"),
first=c("BX",NA,NA,"LF"),
second=c(NA,"TD","BZ",NA),
third=c(NA,NA,"RB","BT"),
fourth=c("LG","QR",NA,NA))
new_df <- data.frame(do.call(rbind, by(df, df$id, function(x) {
sapply(x[,-1], function(y) {
if (all(is.na(y))) return(NA)
if (all(!is.na(y))) return("clash")
return(as.character(y[which(!is.na(y))]))
})
})))
new_df$id <- rownames(new_df)
rownames(new_df) <- NULL
new_df
## first second third fourth id
## 1 BX TD <NA> clash A
## 2 LF BZ clash <NA> B
On Tue, Aug 18, 2015 at 3:06 PM, Jon BR <jonsleepy at gmail.com>
wrote:> df <-
>
data.frame(id=c("A","A","B","B"),first=c("BX",NA,NA,"LF"),second=c(NA,"TD","BZ",NA),third=c(NA,NA,"RB","BT"),fourth=c("LG","QR",NA,NA))
>> df