thr3ads.net - R help - [R] data frame formatting [Aug 2015]

If this information is useful, please help other people find it:
Share via:

Jon BR

2015-Aug-18 19:06 UTC

[R] data frame formatting

Hello all,
    I would like to take a data frame such as the following one:
> df <-
data.frame(id=c("A","A","B","B"),first=c("BX",NA,NA,"LF"),second=c(NA,"TD","BZ",NA),third=c(NA,NA,"RB","BT"),fourth=c("LG","QR",NA,NA))> df  id first second third fourth
1  A    BX   <NA>  <NA>     LG
2  A  <NA>     TD  <NA>     QR
3  B  <NA>     BZ    RB   <NA>
4  B    LF   <NA>    BT   <NA>

and merge rows based on the id, such that the value in the column will be
one of four possibilities: if both values in the original df are <NA>, the
new value should also be <NA>.  If there are two non-NA values, then the
new value should read "clash".  Otherwise, the new value should be
whichever value was not <NA>.

An example output from the command would read in df and read out:


  id first second third fourth
1  A    BX   TD  <NA>     clash
2  B    LF   BZ    clash   <NA>


I'd be grateful if someone could point me in the right direction.

Thanks,
Jonathan

	[[alternative HTML version deleted]]

boB Rudis

2015-Aug-18 19:27 UTC

head link

[R] data frame formatting

Here's one way in base R:

df <-
data.frame(id=c("A","A","B","B"),
                 first=c("BX",NA,NA,"LF"),
                 second=c(NA,"TD","BZ",NA),
                 third=c(NA,NA,"RB","BT"),
                 fourth=c("LG","QR",NA,NA))


new_df <- data.frame(do.call(rbind, by(df, df$id, function(x) {
  sapply(x[,-1], function(y) {
    if (all(is.na(y))) return(NA)
    if (all(!is.na(y))) return("clash")
    return(as.character(y[which(!is.na(y))]))
  })
})))

new_df$id <- rownames(new_df)
rownames(new_df) <- NULL

new_df

##   first second third fourth id
## 1    BX     TD  <NA>  clash  A
## 2    LF     BZ clash   <NA>  B


On Tue, Aug 18, 2015 at 3:06 PM, Jon BR <jonsleepy at gmail.com>
wrote:> df <-
>
data.frame(id=c("A","A","B","B"),first=c("BX",NA,NA,"LF"),second=c(NA,"TD","BZ",NA),third=c(NA,NA,"RB","BT"),fourth=c("LG","QR",NA,NA))
>> df

R help - Aug 2015 - data frame formatting

[R] data frame formatting

[R] data frame formatting