Donald Macnaughton
2009-Mar-21 16:01 UTC
[R] How Can I Concatenate Every Row in a Data Frame with Every Other Row?
I have a data frame with roughly 500 rows and 120 variables. I would like to generate a new data frame that will include one row for each PAIR of rows in the original data frame and will include all 120 + 120 = 240 variables from the two rows. I need only one row for each pair, not two rows. Thus the new data frame will contain 500 x 499 / 2 = 124,750 rows.

Is there an easy way to do this with R?

Thanks in advance,

Don Macnaughton
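The row count in the question can be checked directly in R. The following is a quick sketch using base R's choose(). The memory estimate assumes that all 240 columns are stored as doubles, which is an assumption and not something stated in the post.

```r
# Number of unordered pairs among n rows: n * (n - 1) / 2
n <- 500
n_pairs <- choose(n, 2)
n_pairs                                  # 124750
stopifnot(n_pairs == n * (n - 1) / 2)

# Rough size of the paired data frame, ASSUMING every one of the
# 240 columns is a double (8 bytes per value)
approx_mb <- n_pairs * 240 * 8 / 2^20
approx_mb                                # roughly 228 MB under that assumption
```

At that size the full paired data frame should fit in memory on most machines, so building it in one step is plausible.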
jim holtman
2009-Mar-21 16:09 UTC
[R] How Can I Concatenate Every Row in a Data Frame with Every Other Row?
Try this:

> x <- data.frame(a=1:100, b=100:1, c=sample(100))
> # assume even number of rows: bind the even/odd together
> even <- seq(nrow(x)) %% 2
> new.x <- cbind(x[even==1,], x[even==0,])
> head(new.x)
    a   b  c a.1 b.1 c.1
1   1 100 69   2  99  60
3   3  98 24   4  97  26
5   5  96 71   6  95  43
7   7  94 17   8  93  70
9   9  92 10  10  91  79
11 11  90 56  12  89  50

--
Jim Holtman
Cincinnati, OH

What is the problem that you are trying to solve?
Duncan Murdoch
2009-Mar-21 16:23 UTC
[R] How Can I Concatenate Every Row in a Data Frame with Every Other Row?
On 21/03/2009 12:01 PM, Donald Macnaughton wrote:
> Is there an easy way to do this with R?

Probably the easiest is to generate row indices for each pair, e.g.

n <- nrow(mydata)
row1 <- rep(1:n, n)
row2 <- rep(1:n, each=n)
keep <- row1 < row2
big <- cbind(mydata[row1[keep],], mydata[row2[keep],])

With a simple example

> mydata <- data.frame(a=1:3, b=letters[1:3])
> mydata
  a b
1 1 a
2 2 b
3 3 c

this produces

> big
    a b a b
1   1 a 2 b
1.1 1 a 3 c
2   2 b 3 c
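The index-based approach above leaves duplicated column names (a, b, a, b), which can make later column lookups ambiguous. Here is a minimal sketch of the same idea wrapped in a function that gives the two halves distinct names. The helper name all_row_pairs and the ".1"/".2" suffixes are illustrative choices, not part of the original reply.

```r
# Hypothetical helper (not from the thread): one output row per
# unordered pair of input rows, built from rep()-based index vectors
all_row_pairs <- function(d) {
  n <- nrow(d)
  row1 <- rep(seq_len(n), times = n)   # 1, 2, ..., n repeated n times
  row2 <- rep(seq_len(n), each = n)    # 1 n times, then 2 n times, ...
  keep <- row1 < row2                  # each pair once, no self-pairs
  first  <- d[row1[keep], , drop = FALSE]
  second <- d[row2[keep], , drop = FALSE]
  names(first)  <- paste0(names(d), ".1")   # columns from the first row
  names(second) <- paste0(names(d), ".2")   # columns from the second row
  out <- cbind(first, second)
  rownames(out) <- NULL                # replace 1, 1.1, 2, ... with 1, 2, 3, ...
  out
}

mydata <- data.frame(a = 1:3, b = letters[1:3])
big <- all_row_pairs(mydata)
big
```

For the 3-row example, big has 3 rows and the columns a.1, b.1, a.2, b.2, in the same pair order as the original output.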
David Winsemius
2009-Mar-21 16:41 UTC
[R] How Can I Concatenate Every Row in a Data Frame with Every Other Row?
I hacked at it a bit differently than Duncan. See if these help pages and this example point another way:

?combn
?"["

> df <- data.frame(a = 1:4, b=LETTERS[1:4])
> n <- nrow(df)
> cbind(df[combn(1:n,2)[1,],], df[combn(1:n,2)[2,],] )
    a b a b
1   1 A 2 B
1.1 1 A 3 C
1.2 1 A 4 D
2   2 B 3 C
2.1 2 B 4 D
3   3 C 4 D

--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
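The combn() example above computes the pair matrix twice. A small variation, offered as a sketch and not as the poster's code, stores the matrix once and makes the column names unique with base R's make.unique(). The variable names idx and pairs are illustrative.

```r
df <- data.frame(a = 1:4, b = LETTERS[1:4])

# combn(n, 2) returns a 2-row matrix with one column per unordered pair
idx <- combn(nrow(df), 2)
stopifnot(ncol(idx) == choose(nrow(df), 2))

# Row 1 of idx selects the first member of each pair, row 2 the second
pairs <- cbind(df[idx[1, ], ], df[idx[2, ], ])

# Turn the duplicated names a, b, a, b into a, b, a.1, b.1
names(pairs) <- make.unique(names(pairs))
rownames(pairs) <- NULL
pairs
```

Computing idx once matters little for 4 rows, but for 500 rows it avoids building the 2 x 124,750 index matrix a second time.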
Donald Macnaughton
2009-Mar-21 19:46 UTC
[R] How Can I Concatenate Every Row in a Data Frame with Every Other Row?
I thank David Winsemius, Duncan Murdoch, and Jim Holtman for their helpful replies.

Jim wrote

> What is the problem that you are trying to solve?

This work is for a client whose son was accused of cheating on a multiple choice exam. One can investigate this matter statistically by computing the number of matching answers to questions on the exam between all pairs of students. Of course under the null hypothesis of no cheating the number of matching answers has a certain distribution, which allows one to reject the null hypothesis if the number of matching answers is unduly large for a particular pair. (The distribution is generally taken with respect to the average number of correct answers in a given pair because the more correct answers, the more matches can be expected under the null hypothesis.) Wesolowsky (2000) discusses some of the statistical and ethical aspects of this exercise.

Don Macnaughton

REFERENCE

Wesolowsky, G. O. 2000. Detecting excessive similarity in answers on multiple choice exams. _Journal of Applied Statistics_, 27, 909-921.
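For the pairwise comparison described above, the raw match counts can be computed without first building the 240-column data frame. The following is a minimal sketch on simulated data. The matrix layout (one row per student, one column per question), the sizes, and all variable names are assumptions for illustration. It only counts matching answers and does not implement the test discussed by Wesolowsky (2000).

```r
set.seed(1)

# Simulated answers (an ASSUMPTION): 6 students, 10 questions, options A-D
answers <- matrix(sample(LETTERS[1:4], 60, replace = TRUE), nrow = 6,
                  dimnames = list(paste0("s", 1:6), paste0("q", 1:10)))

# One column of idx per unordered pair of students
idx <- combn(nrow(answers), 2)

# Compare the two students of each pair question by question,
# then count the agreements in each row of the logical matrix
same <- answers[idx[1, ], , drop = FALSE] == answers[idx[2, ], , drop = FALSE]
matches <- data.frame(
  student1 = rownames(answers)[idx[1, ]],
  student2 = rownames(answers)[idx[2, ]],
  n_match  = unname(rowSums(same))
)

# Pairs with the most matching answers first
head(matches[order(-matches$n_match), ])
```

With real data, answers would be replaced by the observed response matrix, and the counts would then be judged against a null distribution that accounts for how many answers each pair got right, as the post notes.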