I know one of R's advantages is its ability to index, eliminating the need for control loops to select relevant data, so I thought this problem would be easy. I can't crack it. I have looked through past postings, but nothing seems to match this problem.

I have a data set with one column of actors and one column of acts. I need a list that will give me a pair of actors in each row, provided they both participated in the act.

Example:

The data looks like this:
Jim     A
Bob     A
Bob     C
Larry   D
Alice   C
Tom     F
Tom     D
Tom     A
Alice   B
Nancy   B

I would like this:
Jim     Bob
Jim     Tom
Bob     Alice
Larry   Tom
Alice   Nancy

The order doesn't matter (Jim-Bob vs. Bob-Jim), but each pairing should be counted only once.
Thanks!

--
View this message in context: http://n4.nabble.com/Using-indexing-to-manipulate-data-tp1597547p1597547.html
Sent from the R help mailing list archive at Nabble.com.

One approach is the following:

Dat <- read.table(textConnection(
"Jim A
Bob A
Bob C
Larry D
Alice C
Tom F
Tom D
Tom A
Alice B
Nancy B"))
closeAllConnections()
names(Dat) <- c("name", "act")

# For each act, build all pairs of actors; an act with a single
# actor is padded with "" so that rbind() still gets two columns.
out <- tapply(as.character(Dat$name), Dat$act, function (x) {
    if (length(x) < 2) c(x, "") else t(combn(x, 2))
})
unique(do.call(rbind, out))

I hope it helps.

Best,
Dimitris

On 3/18/2010 6:05 AM, duncandonutz wrote:
> I have a data set with one column of actors and one column of acts. I need
> a list that will give me a pair of actors in each row, provided they both
> participated in the act.
> [...]

--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
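
As a variation on the tapply/combn idea above, here is a minimal sketch that drops acts with only one actor instead of padding them with "", and sorts names within each act so that Jim-Bob and Bob-Jim end up as the same row. The names `by_act` and `pairs` are made up for this illustration, and `read.table(text = ...)` is swapped in for `textConnection()` only for brevity.

```r
# Same toy data as in the thread.
Dat <- read.table(text = "
Jim A
Bob A
Bob C
Larry D
Alice C
Tom F
Tom D
Tom A
Alice B
Nancy B", col.names = c("name", "act"), stringsAsFactors = FALSE)

# Split actor names by act and keep only acts with at least two
# distinct actors (hypothetical helper name: by_act).
by_act <- lapply(split(Dat$name, Dat$act), unique)
by_act <- by_act[lengths(by_act) >= 2]

# combn() on a sorted vector gives each pair in alphabetical order,
# so the same two actors always produce an identical row.
pairs <- do.call(rbind, lapply(by_act, function(x) t(combn(sort(x), 2))))
pairs <- unique(pairs)
dimnames(pairs) <- NULL
pairs
```

On the sample data this gives six rows rather than the five listed in the question, because Bob and Tom also share act A.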

On 03/18/2010 04:05 PM, duncandonutz wrote:
> I have a data set with one column of actors and one column of acts. I need
> a list that will give me a pair of actors in each row, provided they both
> participated in the act.
> [...]
> The order doesn't matter (Jim-Bob vs. Bob-Jim), but each pairing should be
> counted only once.

Hi duncandonutz,
Try this:

actnames<-read.table("junkfunc/names.dat",stringsAsFactors=FALSE)
actorpairs<-NULL
for(act in unique(actnames$V2)) {
 actors<-actnames$V1[actnames$V2 == act]
 nactors<-length(actors)
 if(nactors > 1) {
  indices<-combn(nactors,2)
  for(i in 1:dim(indices)[2])
   actorpairs<-
    rbind(actorpairs,c(actors[indices[1,i]],actors[indices[2,i]]))
 }
}
actorpairs

Jim
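
The loop above grows `actorpairs` with one `rbind()` call per pair. A possible sketch of the same idea, assuming the data is built inline rather than read from `junkfunc/names.dat`, collects one matrix per act in a list and binds once at the end. The name `pairlist` is invented for this example.

```r
# Inline copy of the thread's data, with the V1/V2 column names that
# read.table() would assign by default.
actnames <- data.frame(
  V1 = c("Jim", "Bob", "Bob", "Larry", "Alice",
         "Tom", "Tom", "Tom", "Alice", "Nancy"),
  V2 = c("A", "A", "C", "D", "C", "F", "D", "A", "B", "B"),
  stringsAsFactors = FALSE
)

acts <- unique(actnames$V2)
pairlist <- vector("list", length(acts))  # one slot per act

for (k in seq_along(acts)) {
  actors <- actnames$V1[actnames$V2 == acts[k]]
  # Acts with a single actor leave their slot as NULL.
  if (length(actors) > 1) pairlist[[k]] <- t(combn(actors, 2))
}

# rbind() ignores the NULL slots, so only real pairs remain.
actorpairs <- do.call(rbind, pairlist)
actorpairs
```

Like the original loop, this keeps actors in the order they appear in the data and does not remove a pair that shows up under more than one act; wrapping the result in `unique()` after sorting each row would handle that case.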

Here are two solutions. The first uses merge and the second uses sqldf. Both do a self join on Act and keep only the rows where the first actor sorts before the second, so each pair appears once. The sqldf solution also sorts the result:

# input
DF <- structure(list(Actor = c("Jim", "Bob", "Bob", "Larry", "Alice",
"Tom", "Tom", "Tom", "Alice", "Nancy"), Act = c("A", "A", "C",
"D", "C", "F", "D", "A", "B", "B")), .Names = c("Actor", "Act"
), class = "data.frame", row.names = c(NA, -10L))

subset(unique(merge(DF, DF, by = 2)), Actor.x < Actor.y)

library(sqldf) # see http://sqldf.googlecode.com
sqldf("select A.Actor, A.Act, B.Actor from DF A join DF B
 where A.Act = B.Act and A.Actor < B.Actor
 order by A.Act, A.Actor")

On Thu, Mar 18, 2010 at 1:05 AM, duncandonutz <dwadswor at unm.edu> wrote:
> I have a data set with one column of actors and one column of acts. I need
> a list that will give me a pair of actors in each row, provided they both
> participated in the act.
> [...]
> The order doesn't matter (Jim-Bob vs. Bob-Jim), but each pairing should be
> counted only once.
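
For the merge-based self join above, a small base-R sketch that returns only the two actor columns might look like the following. The variable names `joined` and `pairs` are illustrative, and dropping the Act column before `unique()` is a choice made here so that two actors who share several acts are still listed once.

```r
DF <- data.frame(
  Actor = c("Jim", "Bob", "Bob", "Larry", "Alice",
            "Tom", "Tom", "Tom", "Alice", "Nancy"),
  Act   = c("A", "A", "C", "D", "C", "F", "D", "A", "B", "B"),
  stringsAsFactors = FALSE
)

# Self join on Act: every actor is paired with every actor in the same act.
joined <- merge(DF, DF, by = "Act", suffixes = c(".x", ".y"))

# Keep one ordering of each pair (this also drops self-pairs), then
# discard Act so duplicates across acts collapse.
joined <- joined[joined$Actor.x < joined$Actor.y, c("Actor.x", "Actor.y")]
pairs <- unique(joined)
rownames(pairs) <- NULL
pairs
```

The `Actor.x < Actor.y` filter relies on string comparison, which is why each pair comes out in alphabetical order.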