I know one of R's advantages is its ability to index, eliminating the need for control loops to select relevant data, so I thought this problem would be easy. I can't crack it. I have looked through past postings, but nothing seems to match this problem.

I have a data set with one column of actors and one column of acts. I need a list that will give me a pair of actors in each row, provided they both participated in the same act.

Example:

The data looks like this:
Jim     A
Bob     A
Bob     C
Larry   D
Alice   C
Tom     F
Tom     D
Tom     A
Alice   B
Nancy   B

I would like this:
Jim     Bob
Jim     Tom
Bob     Alice
Larry   Tom
Alice   Nancy

The order doesn't matter (Jim-Bob vs. Bob-Jim), but each pairing should be counted only once.
Thanks!

--
View this message in context: http://n4.nabble.com/Using-indexing-to-manipulate-data-tp1597547p1597547.html
Sent from the R help mailing list archive at Nabble.com.

One approach is the following:
Dat <- read.table(textConnection(
"Jim A
Bob A
Bob C
Larry D
Alice C
Tom F
Tom D
Tom A
Alice B
Nancy B"))
closeAllConnections()
names(Dat) <- c("name", "act")
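# For each act, list every pair of actors who took part in it.
# An act with a single actor yields that actor paired with "".
out <- tapply(as.character(Dat$name), Dat$act, function (x) {
    if (length(x) < 2) c(x, "") else t(combn(x, 2))
})
# Stack the per-act matrices and drop repeated rows.
unique(do.call(rbind, out))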
I hope it helps.
Best,
Dimitris
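
As a small illustration of the combn() step used above, here is what it produces for a single act. This is only a sketch: the variable names actors_in_A and pairs_A are made up for the example, and the three names are the actors listed against act A in the question's data.

```r
# Actors who took part in act A in the example data (in data order)
actors_in_A <- c("Jim", "Bob", "Tom")

# combn(x, 2) returns a matrix with 2 rows and choose(n, 2) columns;
# t() transposes it so that each row holds one pair
pairs_A <- t(combn(actors_in_A, 2))
print(pairs_A)
# rows: Jim-Bob, Jim-Tom, Bob-Tom
```

Note that act A gives three pairs, including Bob-Tom, which the expected output in the question does not list.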
--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center
Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
Hi duncandonutz,
Try this:

actnames<-read.table("junkfunc/names.dat",stringsAsFactors=FALSE)
actorpairs<-NULL
for(act in unique(actnames$V2)) {
 actors<-actnames$V1[actnames$V2 == act]
 nactors<-length(actors)
 if(nactors > 1) {
  indices<-combn(nactors,2)
  for(i in 1:dim(indices)[2])
   actorpairs<-rbind(actorpairs,c(actors[indices[1,i]],actors[indices[2,i]]))
 }
}
actorpairs

Jim

Here are two solutions. The first uses merge and the second uses
sqldf. They both do a self join picking off the unique pairs. The
sqldf solution also sorts the result:
# input
DF <- structure(list(Actor = c("Jim", "Bob",
"Bob", "Larry", "Alice",
"Tom", "Tom", "Tom", "Alice",
"Nancy"), Act = c("A", "A", "C",
"D", "C", "F", "D", "A",
"B", "B")), .Names = c("Actor", "Act"
), class = "data.frame", row.names = c(NA, -10L))
subset(unique(merge(DF, DF, by = 2)), Actor.x < Actor.y)
library(sqldf) # see http://sqldf.googlecode.com
sqldf("select A.Actor, A.Act, B.Actor
from DF A join DF B
where A.Act = B.Act and A.Actor < B.Actor
order by A.Act, A.Actor")
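
The self join above keeps the Act column, so two actors who share more than one act would appear once per shared act. If each pairing should be counted only once regardless of act, one possible variation (a sketch only, using base R; the names joined and pairs are illustrative) is to drop Act before calling unique():

```r
# Example data from the question
DF <- data.frame(
  Actor = c("Jim", "Bob", "Bob", "Larry", "Alice",
            "Tom", "Tom", "Tom", "Alice", "Nancy"),
  Act   = c("A", "A", "C", "D", "C", "F", "D", "A", "B", "B"),
  stringsAsFactors = FALSE
)

# Self join on Act: every combination of two actors within the same act
joined <- merge(DF, DF, by = "Act")

# Keep one ordering per pair; this also drops actors paired with themselves
joined <- joined[joined$Actor.x < joined$Actor.y, ]

# Drop Act before unique() so a pair sharing several acts appears once
pairs <- unique(joined[, c("Actor.x", "Actor.y")])
rownames(pairs) <- NULL
pairs
```

With the example data no pair shares more than one act, so this returns the same pairs as the merge solution, just without the Act column.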