Mark Na
2009-Jun-11 17:13 UTC
[R] Expand a contingency table based on the value in one column
Hi R-helpers, I have the following (dummy) dataframe:> testDATE LOCATION KIND CLASS COUNT 1 1 1 CAR A 2 2 1 1 TRUCK D 3 3 1 1 BUS E 4 4 1 2 CAR E 2 5 1 2 TRUCK A 7 6 1 2 BUS F 1 That I would like to turn into this:> test2DATE LOCATION KIND CLASS 1 1 1 CAR A 2 1 1 CAR A 3 1 1 TRUCK D 4 1 1 TRUCK D 5 1 1 TRUCK D 6 1 1 BUS E 7 1 1 BUS E 8 1 1 BUS E 9 1 1 BUS E 10 1 2 CAR E 11 1 2 CAR E 12 1 2 TRUCK A 13 1 2 TRUCK A 14 1 2 TRUCK A 15 1 2 TRUCK A 16 1 2 TRUCK A 17 1 2 TRUCK A 18 1 2 TRUCK A 19 1 2 BUS F So, basically it's a case of expanding (adding rows to) the first dataframe by the value in the COUNT column. I have solved this problem with the following code: test2<-with(test, data.frame(DATE=rep(DATE,COUNT), LOCATION=rep(LOCATION,COUNT), KIND=rep(KIND,COUNT), CLASS=rep(CLASS,COUNT))) but I'm unsatisfied with that solution because it's verbose and I think there must a more elegant way. If I had more variables than 4 (which I do in my real data) it would be a nuisance to repeat each column within the rep function. I would prefer to do this with Base R or package(reshape) than relying on another package. Any ideas? Thanks! Mark Na [[alternative HTML version deleted]]
David Winsemius
2009-Jun-11 17:27 UTC
[R] Expand a contingency table based on the value in one column
On Jun 11, 2009, at 1:13 PM, Mark Na wrote:> Hi R-helpers, > > I have the following (dummy) dataframe: > >> test > DATE LOCATION KIND CLASS COUNT > 1 1 1 CAR A 2 > 2 1 1 TRUCK D 3 > 3 1 1 BUS E 4 > 4 1 2 CAR E 2 > 5 1 2 TRUCK A 7 > 6 1 2 BUS F 1 > > That I would like to turn into this: > >> test2 > DATE LOCATION KIND CLASS > 1 1 1 CAR A > 2 1 1 CAR A > 3 1 1 TRUCK D > 4 1 1 TRUCK D > 5 1 1 TRUCK D > 6 1 1 BUS E > 7 1 1 BUS E > 8 1 1 BUS E > 9 1 1 BUS E > 10 1 2 CAR E > 11 1 2 CAR E > 12 1 2 TRUCK A > 13 1 2 TRUCK A > 14 1 2 TRUCK A > 15 1 2 TRUCK A > 16 1 2 TRUCK A > 17 1 2 TRUCK A > 18 1 2 TRUCK A > 19 1 2 BUS F >test2 <- test[rep(1:nrow(test), test$COUNT),] rownames(test2) <- 1:nrow(test2) test2> So, basically it's a case of expanding (adding rows to) the first > dataframe > by the value in the COUNT column. > > I have solved this problem with the following code: > > test2<-with(test, data.frame(DATE=rep(DATE,COUNT), > LOCATION=rep(LOCATION,COUNT), KIND=rep(KIND,COUNT), > CLASS=rep(CLASS,COUNT))) > > but I'm unsatisfied with that solution because it's verbose and I > think > there must a more elegant way. If I had more variables than 4 (which > I do in > my real data) it would be a nuisance to repeat each column within > the rep > function. > > I would prefer to do this with Base R or package(reshape) than > relying on > another package. > > Any ideas? Thanks! > > Mark Na > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.David Winsemius, MD Heritage Laboratories West Hartford, CT
Marc Schwartz
2009-Jun-11 17:33 UTC
[R] Expand a contingency table based on the value in one column
On Jun 11, 2009, at 12:13 PM, Mark Na wrote:> Hi R-helpers, > > I have the following (dummy) dataframe: > >> test > DATE LOCATION KIND CLASS COUNT > 1 1 1 CAR A 2 > 2 1 1 TRUCK D 3 > 3 1 1 BUS E 4 > 4 1 2 CAR E 2 > 5 1 2 TRUCK A 7 > 6 1 2 BUS F 1 > > That I would like to turn into this: > >> test2 > DATE LOCATION KIND CLASS > 1 1 1 CAR A > 2 1 1 CAR A > 3 1 1 TRUCK D > 4 1 1 TRUCK D > 5 1 1 TRUCK D > 6 1 1 BUS E > 7 1 1 BUS E > 8 1 1 BUS E > 9 1 1 BUS E > 10 1 2 CAR E > 11 1 2 CAR E > 12 1 2 TRUCK A > 13 1 2 TRUCK A > 14 1 2 TRUCK A > 15 1 2 TRUCK A > 16 1 2 TRUCK A > 17 1 2 TRUCK A > 18 1 2 TRUCK A > 19 1 2 BUS F > > So, basically it's a case of expanding (adding rows to) the first > dataframe > by the value in the COUNT column. > > I have solved this problem with the following code: > > test2<-with(test, data.frame(DATE=rep(DATE,COUNT), > LOCATION=rep(LOCATION,COUNT), KIND=rep(KIND,COUNT), > CLASS=rep(CLASS,COUNT))) > > but I'm unsatisfied with that solution because it's verbose and I > think > there must a more elegant way. If I had more variables than 4 (which > I do in > my real data) it would be a nuisance to repeat each column within > the rep > function. > > I would prefer to do this with Base R or package(reshape) than > relying on > another package. > > Any ideas? Thanks! > > Mark NaMark, A quick and dirty solution: > test[rep(1:nrow(test), test$COUNT), -ncol(test)] DATE LOCATION KIND CLASS 1 1 1 CAR A 1.1 1 1 CAR A 2 1 1 TRUCK D 2.1 1 1 TRUCK D 2.2 1 1 TRUCK D 3 1 1 BUS E 3.1 1 1 BUS E 3.2 1 1 BUS E 3.3 1 1 BUS E 4 1 2 CAR E 4.1 1 2 CAR E 5 1 2 TRUCK A 5.1 1 2 TRUCK A 5.2 1 2 TRUCK A 5.3 1 2 TRUCK A 5.4 1 2 TRUCK A 5.5 1 2 TRUCK A 5.6 1 2 TRUCK A 6 1 2 BUS F For a more general solution to taking a tabulated data frame and converting it back to the raw data see my expand.dft() function: https://stat.ethz.ch/pipermail/r-help/2009-January/185561.html For example: > expand.dft(test, freq = "COUNT") DATE LOCATION KIND CLASS 1 1 1 CAR A 2 1 1 CAR A 3 1 1 TRUCK D 4 1 1 TRUCK D 5 1 1 TRUCK D 6 1 1 BUS E 7 1 1 BUS E 8 1 1 BUS E 9 1 1 BUS E 10 1 2 CAR E 11 1 2 CAR E 12 1 2 TRUCK A 13 1 2 TRUCK A 14 1 2 TRUCK A 15 1 2 TRUCK A 16 1 2 TRUCK A 17 1 2 TRUCK A 18 1 2 TRUCK A 19 1 2 BUS F HTH, Marc Schwartz