Daniel Malter
2012-Apr-28 14:46 UTC
[R] Consolidate column contents of equally "named" columns
Hi,

I have a data frame whose first row (not the header) contains the true column names. The same column name can occur multiple times in the dataset. Columns with equal names are not adjacent, and for each observation only one of the equally named columns contains the actual data (see the example below). I am looking for an easy method to consolidate these columns into one column for each unique column name. Say,

x1<-c("x",1,NA,NA)
x2<-c("x",NA,2,NA)
x3<-c("x",NA,NA,3)
y1<-c("y",3,NA,NA)
y2<-c("y",NA,1,NA)
y3<-c("y",NA,NA,2)
d<-data.frame(x1,y1,x2,y2,x3,y3)
d

# d looks like:

    x1   y1   x2   y2   x3   y3
1    x    y    x    y    x    y
2    1    3 <NA> <NA> <NA> <NA>
3 <NA> <NA>    2    1 <NA> <NA>
4 <NA> <NA> <NA> <NA>    3    2

From this, I want to create the table or data frame

x y
1 3
2 1
3 2

I would appreciate your help.

Daniel
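Every answer in this thread leans on the property Daniel describes: within each group of equally named columns, exactly one column is non-NA in every data row. A small sketch that checks this assumption on the example data; it builds d with character columns, and the names true_names, vals and counts are illustrative, not from the thread.

```r
# Rebuild the example data from the question, keeping the columns as character
x1 <- c("x", 1, NA, NA); x2 <- c("x", NA, 2, NA); x3 <- c("x", NA, NA, 3)
y1 <- c("y", 3, NA, NA); y2 <- c("y", NA, 1, NA); y3 <- c("y", NA, NA, 2)
d <- data.frame(x1, y1, x2, y2, x3, y3, stringsAsFactors = FALSE)

# The true names live in the first row, the data in the remaining rows
true_names <- unlist(d[1, ], use.names = FALSE)
vals <- d[-1, , drop = FALSE]

# For each unique true name, count the non-NA entries in every data row
counts <- sapply(unique(true_names), function(nm) {
  rowSums(!is.na(vals[, true_names == nm, drop = FALSE]))
})
counts

# TRUE only if each name group has exactly one value per row
all(counts == 1)
```

If this returns FALSE for real data, the one-value-per-row shortcuts used below would need revisiting.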
Rui Barradas
2012-Apr-28 16:33 UTC
Hello,

This solution is not very pretty, but it works.

# assumes d has character columns (stringsAsFactors = FALSE);
# with factor columns, rbind() would bind the internal factor codes
# the true column names are stored in the first row
nms <- unlist(d[1, ])
nm <- unique(nms)
# for each name, interleave its columns row by row, then drop rows with NAs
dd <- na.exclude(sapply(nm, function(jj){
        inx <- nms %in% jj
        do.call(rbind, as.list(d[, inx]))
    }))
# drop the rows that still hold the names, then convert to integers
dd <- dd[ dd[ , nm[1]] != nm[1], ]
dd <- data.frame(apply(dd, 2, as.integer))
dd

Hope this helps,

Rui Barradas
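Why the column type matters for the rbind() step above: ?cbind documents that factors are replaced by their internal codes, and in 2012 data.frame() created factor columns by default (R 4.0.0 later changed stringsAsFactors to FALSE). A minimal illustration with two hypothetical factors f1 and f2 standing in for columns of d:

```r
# Two factor vectors, like the columns data.frame() created by default before R 4.0.0
f1 <- factor(c("x", "1"))
f2 <- factor(c("x", "2"))

# rbind() drops the factor class and binds the internal integer codes:
# in each factor the digit is level 1 and "x" is level 2
m_factor <- rbind(f1, f2)
m_factor

# Converting to character first keeps the printed values
m_char <- rbind(as.character(f1), as.character(f2))
m_char
```

Both rows of m_factor come out as the codes 2 and 1, so the real values 1 and 2 are lost, while m_char keeps "x", "1" and "x", "2".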
David Winsemius
2012-Apr-29 13:05 UTC
On Apr 28, 2012, at 10:46 AM, Daniel Malter wrote:

It would avoid problems with manipulating factors if the columns of d were created as (or converted to) character columns. Choose one of:

d=data.frame(x1,y1,x2,y2,x3,y3, stringsAsFactors=FALSE)
d[]<-lapply(d, as.character)

Then, for the desired x/y table:

na.omit( data.frame( X=stack(d[-1,grep("x", names(d))]),
                     Y=stack(d[-1,grep("y", names(d))]),
                     stringsAsFactors=FALSE)[ c(1,3) ])

  X.values Y.values
1        1        3
5        2        1
9        3        2

If it were less regular, you might need to merge with the "source" columns that stack generates.

--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
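The "source" column David mentions is the ind factor that stack() returns alongside values. A small sketch on a hypothetical two-column piece of d (called part here), showing that each stacked value keeps a record of the column it came from:

```r
# A tiny character data frame shaped like d[-1, c("x1", "x2")]
part <- data.frame(x1 = c("1", NA), x2 = c(NA, "2"), stringsAsFactors = FALSE)

s <- stack(part)
s
# values holds the column contents one column after another;
# ind is a factor naming the source column of each value

# Dropping the NA rows keeps each value paired with its source column
s[!is.na(s$values), ]
```

In a less regular layout, ind is what one could merge or match on to line the stacked x and y values back up.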
Bert Gunter
2012-Apr-29 14:48 UTC
I believe the regularity of the problem allows a (to me, anyway) simpler procedure.

# each column reduces to its name and its single non-NA value; t() gives a 6 x 2 matrix
td <- t(apply(d,2, na.omit))
# group the values (column 2) by name (column 1), one column per name
data.frame(split(as.numeric(td[,-1]),td[,1]))

--
Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
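Bert's version relies on every column holding exactly one non-NA value below the name row; if columns held different numbers of values, apply() would return a list and the t() step would no longer yield the two-column matrix the rest of the code expects. A more general base-R sketch (the helper consolidate() is hypothetical, not from the thread) reads the group names from the first row, as in the question, and coalesces row by row within each group:

```r
# Rebuild the example data with character columns
x1 <- c("x", 1, NA, NA); x2 <- c("x", NA, 2, NA); x3 <- c("x", NA, NA, 3)
y1 <- c("y", 3, NA, NA); y2 <- c("y", NA, 1, NA); y3 <- c("y", NA, NA, 2)
d <- data.frame(x1, y1, x2, y2, x3, y3, stringsAsFactors = FALSE)

# Hypothetical helper: for each unique first-row name, take in every data row
# the first non-NA value among the columns carrying that name (NA if none)
consolidate <- function(d) {
  true_names <- unlist(d[1, ], use.names = FALSE)
  vals <- as.matrix(d[-1, , drop = FALSE])
  groups <- unique(true_names)
  out <- lapply(groups, function(nm) {
    group <- vals[, true_names == nm, drop = FALSE]
    apply(group, 1, function(r) r[!is.na(r)][1])
  })
  names(out) <- groups
  # Convert the character values to numbers and drop the old row names
  data.frame(lapply(out, as.numeric), row.names = NULL)
}

res <- consolidate(d)
res
```

Working row by row keeps observations aligned even when a row has no value for some name, at the cost of an apply() call per group.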