Dear List: I have a data manipulation problem that I was unable to solve in R. I did it in SQL, and it may be that the solution in R is to do it in SQL, but I wondered if people could imagine a vector-based solution. Imagine a list A[i] of observers who observe some set of events B[j]. Each observer i may observe one or more events, and each event j may have been observed by one or more observers. Thus the data are a lower-triangular array AxB where each cell [i,j] has a zero or one indicating whether observer i saw event j. I am interested in how observers cluster in circuits whereby observer _a_ sees events _1,2,3_, observer _b_ sees events _2,4,5_, observer _c_ sees event _4_, and observer _d_ sees _4,6,7_. Observers a, b, c, d comprise a circuit linked by the events they jointly observed. Given AxB, how can we use R to articulate the circuits? Pseudocode for my SQL solution is below. For each observation i: - get all the events [j1,j2,...jk] observed by i - get all the observations [i_m] which observe one or more events in [j1,j2,...jk] - count [i_m] - assign events [i_m] to i's circuit if i_m > i - end What would an R solution look like? The "get all the" and the "assign events" are pure SQL reasoning (select, update). Thanks in advance. Best. - PB Patrick Ball, PhD. Deputy Director AAAS Science and Human Rights Program http://shr.aaas.org __________________________________________________ Do You Yahoo!? Get email at your own domain with Yahoo! Mail. http://personal.mail.yahoo.com/ -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
I may not have properly understood your problem, but if what you want is to arrive at is a grouping of observers into non-overlapping groups such that two observers are in the same group exactly when they have observed a common event, then it boils down to a basic graph problem: find the connected components of a graph. There are efficient algorithms for this problem but in my opinion they are more suited to C than R due to the (data-dependent) way they trace through the data structure (i.e., I can't see a way to vectorize it). Perhaps other algorithms exist which might be more suitable for R but I am not aware of them. What I do is just have a simple stand-alone program to do this, and communicate with it via flat files or pipes. If the overhead started to become a problem it might be worthwhile to explore a better "interface" but I'm not there yet. I don't know of an existing R package for this or for other common graph problems, but there are fairly comprehensive C or C++ libraries available. S. Skiena, The Algorithm Design Manual, is useful and has a lot of references, see http://www.algorist.com/ for links. Assumming I've correctly understood the problem, the algorithm you've sketched seems not to take proper account of observers (nodes) separated by more than one other observer. After the first two steps, you have for each i the set N(i) of its neighbors (nodes connected to i by an edge) and from this data it is essentially the same problem as before to decide which of these sets overlap etc. In other words you need to iterate these steps as many times as the length of the longest distance in the graph... This could be made to work but isn't efficient. Hope that helps, Reid Huntsinger -----Original Message----- From: Patrick Ball [mailto:lookout_20005 at yahoo.com] Sent: Sunday, April 15, 2001 5:02 PM To: r-help at lists.R-project.org Subject: [R] data manipulation in R Dear List: I have a data manipulation problem that I was unable to solve in R. I did it in SQL, and it may be that the solution in R is to do it in SQL, but I wondered if people could imagine a vector-based solution. Imagine a list A[i] of observers who observe some set of events B[j]. Each observer i may observe one or more events, and each event j may have been observed by one or more observers. Thus the data are a lower-triangular array AxB where each cell [i,j] has a zero or one indicating whether observer i saw event j. I am interested in how observers cluster in circuits whereby observer _a_ sees events _1,2,3_, observer _b_ sees events _2,4,5_, observer _c_ sees event _4_, and observer _d_ sees _4,6,7_. Observers a, b, c, d comprise a circuit linked by the events they jointly observed. Given AxB, how can we use R to articulate the circuits? Pseudocode for my SQL solution is below. For each observation i: - get all the events [j1,j2,...jk] observed by i - get all the observations [i_m] which observe one or more events in [j1,j2,...jk] - count [i_m] - assign events [i_m] to i's circuit if i_m > i - end What would an R solution look like? The "get all the" and the "assign events" are pure SQL reasoning (select, update). Thanks in advance. Best. - PB Patrick Ball, PhD. Deputy Director AAAS Science and Human Rights Program http://shr.aaas.org __________________________________________________ Do You Yahoo!? Get email at your own domain with Yahoo! Mail. http://personal.mail.yahoo.com/ -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-. -.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._. _._ -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
On Sun, 15 Apr 2001, Patrick Ball wrote:> Dear List: > > I have a data manipulation problem that I was unable > to solve in R. I did it in SQL, and it may be that > the solution in R is to do it in SQL, but I wondered > if people could imagine a vector-based solution. > > Imagine a list A[i] of observers who observe some set > of events B[j]. Each observer i may observe one or > more events, and each event j may have been observed > by one or more observers. Thus the data are a > lower-triangular array AxB where each cell [i,j] has a > zero or one indicating whether observer i saw event j. > > > I am interested in how observers cluster in circuits > whereby observer _a_ sees events _1,2,3_, observer _b_ > sees events _2,4,5_, observer _c_ sees event _4_, and > observer _d_ sees _4,6,7_. Observers a, b, c, d > comprise a circuit linked by the events they jointly > observed.I don't see why this is a lower-triangular matrix. If every observer saw every event wouldn't it be a rectangular matrix of 1s? You can solve this in R, but I shouldn't think it would be very efficient. I think it's going to involve iteration over either the events or the observers. One solution: Suppose A is the matrix of zeros & ones, in your case> A[,1] [,2] [,3] [,4] [,5] [,6] [,7] [1,] 1 1 1 0 0 0 0 [2,] 0 1 0 1 1 0 0 [3,] 0 0 0 1 0 0 0 [4,] 0 0 0 1 0 1 1 We can make an incidence matrix B for the graph nevent<-NCOL(A) nobs<-NROW(A) B<-matrix(rowsum(apply(A,2,function(x) outer(x,x)),rep(1,nobs*nobs)),ncol=nobs) And now find the connected components by powering up B D<-(B>0) for(i in 1:ceiling(log(nobs,2))){ Dnew<-(D%*%D)>0 if (all(Dnew==D)) break D<-Dnew } I think we now have D[i,j]==TRUE if i,j are in the same circuit. -thomas Thomas Lumley Asst. Professor, Biostatistics tlumley at u.washington.edu University of Washington, Seattle -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._