Christofer Bogaso
2012-Dec-06 19:35 UTC
[R] Can somebody help me with following data manipulation?
Dear all,

let's say I have the following data:

dat <- structure(list(V1 = structure(c(1L, 4L, 5L, 3L, 3L, 5L, 6L, 6L,
4L, 3L, 5L, 6L, 5L, 5L, 4L, 4L, 6L, 2L, 3L, 4L, 3L, 3L, 2L, 5L,
3L, 6L, 3L, 3L, 6L, 3L, 6L, 1L, 6L, 5L, 2L, 2L), .Label = c("C",
"G", "I", "O", "R", "T"), class = "factor"), V2 = c(0L, 0L, 0L,
1L, 1L, 1L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 1L,
1L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 0L,
0L), V3 = c(1L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 1L, 0L,
0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 1L, 1L,
0L, 1L, 0L, 1L, 0L, 1L, 1L)), .Names = c("V1", "V2", "V3"),
class = "data.frame", row.names = c(NA, -36L))

Now I want to get the following kind of data frame out of that:

dat1 <- structure(list(V1 = structure(c(3L, 3L, 1L, 1L, 2L, 2L),
.Label = c("C", "G", "I"), class = "factor"),
V2 = c(0L, 1L, 0L, 1L, 0L, 1L),
V3 = c(0.333333333, 0.428571429, 0.5, NA, 1, NA)),
.Names = c("V1", "V2", "V3"), class = "data.frame",
row.names = c(NA, -6L))

Basically, the 3rd column of 'dat1' gives, for each combination of V1 and V2 (for example 'V1 = I' & 'V2 = 0'), the proportion of '1' values in 'V3'. A combination that does not occur in 'dat' gets NA.

Is there any R function to achieve that directly?

Thanks and regards,
Sarah Goslee
2012-Dec-06 20:03 UTC
[R] Can somebody help me with following data manipulation?
If I understand what you want correctly, aggregate() should do it.

> aggregate(V3 ~ V1 + V2, "mean", data=dat)
   V1 V2        V3
1   C  0 0.5000000
2   G  0 1.0000000
3   I  0 0.3333333
4   O  0 1.0000000
5   R  0 0.0000000
6   T  0 0.8333333
7   I  1 0.4285714
8   O  1 0.0000000
9   R  1 0.6666667
10  T  1 0.5000000

That returns the combinations that actually exist. If you convert V1 and V2 to factors, thus setting the possible levels, all combinations will be returned:

> dat$V1 <- factor(dat$V1)
> dat$V2 <- factor(dat$V2)
> aggregate(V3 ~ V1 + V2, "mean", data=dat)
   V1 V2        V3
1   C  0 0.5000000
2   G  0 1.0000000
3   I  0 0.3333333
4   O  0 1.0000000
5   R  0 0.0000000
6   T  0 0.8333333
7   I  1 0.4285714
8   O  1 0.0000000
9   R  1 0.6666667
10  T  1 0.5000000

Sarah
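A possible follow-up sketch, not part of the original reply: the aggregate() output above lists only the ten V1/V2 combinations that occur in the data, even after the factor conversion, while the requested 'dat1' has NA for pairs such as C/1 and G/1. One way to fill those in, assuming nothing beyond base R, is to build the full grid of combinations with expand.grid() and left-join the group means onto it with merge(). The object names 'means', 'grid' and 'full' are illustrative only.

```r
# Rebuild the data from the original post (same values, written compactly).
lev <- c("C", "G", "I", "O", "R", "T")
dat <- data.frame(
  V1 = factor(lev[c(1, 4, 5, 3, 3, 5, 6, 6, 4, 3, 5, 6, 5, 5, 4, 4, 6, 2,
                    3, 4, 3, 3, 2, 5, 3, 6, 3, 3, 6, 3, 6, 1, 6, 5, 2, 2)],
              levels = lev),
  V2 = c(0L, 0L, 0L, 1L, 1L, 1L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 1L, 1L, 0L, 0L,
         1L, 1L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L),
  V3 = c(1L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 1L, 1L,
         1L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 1L)
)

# Proportion of 1s in V3 for each V1/V2 combination present in the data.
means <- aggregate(V3 ~ V1 + V2, data = dat, FUN = mean)

# Every V1 x V2 pairing, whether or not it occurs in 'dat'.
grid <- expand.grid(V1 = levels(dat$V1), V2 = sort(unique(dat$V2)))

# Left join on V1 and V2: pairings absent from 'means' get NA in V3.
full <- merge(grid, means, all.x = TRUE)
full <- full[order(full$V1, full$V2), ]
rownames(full) <- NULL

print(full)
```

To restrict the result to the three groups shown in 'dat1', something like subset(full, V1 %in% c("I", "C", "G")) should work; the row order would still differ from 'dat1' unless it is reordered explicitly.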