Hello list, I am wondering if a joining "one-to-many" can be done a little bit easier. I tried merge function but I was not able to do it, so I end up using for and if. Suppose you have a table with locations, each location repeated several times, and some attributes at that location. The second table has the same locations, but only once with a different set of attributes. I would like to add the second set of attributes to the first table. Example: set.seed <- 123 loc <- c(rep("L1", 3), rep("L2", 5), rep("L3", 2)) val1 <- round(rnorm(10),2) val2 <- c("a", "b", "c", "a", "b", "d", "f", "e", "b", "e") t1 <- data.frame(loc, val1, val2) t2 <- data.frame(loc=c("L1","L2","L3"), val3 = c("m", "n", "p"), val4 = c(25, 67, 48)) # join one-to-many n <- nrow(t1) m <- nrow(t2) t1$val3 <- rep(1, n) t1$val4 <- rep(1, n) for (i in 1:n) { for (j in 1:m){ if (t1$loc[i]==t2$loc[j]) { t1$val3[i] <- as.character(t2$val3[j]) t1$val4[i] <- t2$val4[j] } } } Desired result: t1 loc val1 val2 val3 val4 1 L1 -0.41 a m 25 2 L1 -0.69 b m 25 3 L1 0.36 c m 25 4 L2 1.11 a n 67 5 L2 0.15 b n 67 6 L2 -0.80 d n 67 7 L2 -0.08 f n 67 8 L2 -1.01 e n 67 9 L3 -1.01 b p 48 10 L3 -2.50 e p 48 This code works OK but it is slow if the data frames are actually bigger than my little example. I hope somebody knows of a better way of doing these type of things. Thanks, Monica _________________________________________________________________ 22009
On Tue, Feb 17, 2009 at 8:33 AM, Monica Pisica <pisicandru at hotmail.com> wrote:> > Hello list, > > I am wondering if a joining "one-to-many" can be done a little bit easier. I tried merge function but I was not able to do it, so I end up using for and if. > > Suppose you have a table with locations, each location repeated several times, and some attributes at that location. The second table has the same locations, but only once with a different set of attributes. I would like to add the second set of attributes to the first table. > > Example: > > set.seed <- 123 > loc <- c(rep("L1", 3), rep("L2", 5), rep("L3", 2)) > val1 <- round(rnorm(10),2) > val2 <- c("a", "b", "c", "a", "b", "d", "f", "e", "b", "e") > t1 <- data.frame(loc, val1, val2) > t2 <- data.frame(loc=c("L1","L2","L3"), val3 = c("m", "n", "p"), val4 = c(25, 67, 48)) > > # join one-to-many > > n <- nrow(t1) > m <- nrow(t2) > t1$val3 <- rep(1, n) > t1$val4 <- rep(1, n) > > for (i in 1:n) { > for (j in 1:m){ > if (t1$loc[i]==t2$loc[j]) { > t1$val3[i] <- as.character(t2$val3[j]) > t1$val4[i] <- t2$val4[j] > } > } > } > > Desired result: > > t1 > loc val1 val2 val3 val4 > 1 L1 -0.41 a m 25 > 2 L1 -0.69 b m 25 > 3 L1 0.36 c m 25 > 4 L2 1.11 a n 67 > 5 L2 0.15 b n 67 > 6 L2 -0.80 d n 67 > 7 L2 -0.08 f n 67 > 8 L2 -1.01 e n 67 > 9 L3 -1.01 b p 48 > 10 L3 -2.50 e p 48 > > > This code works OK but it is slow if the data frames are actually bigger than my little example. I hope somebody knows of a better way of doing these type of things.merge(t1, t2) Hadley -- http://had.co.nz/
on 02/17/2009 08:33 AM Monica Pisica wrote:> Hello list, > > I am wondering if a joining "one-to-many" can be done a little biteasier. I tried merge function but I was not able to do it, so I end up using for and if.> > Suppose you have a table with locations, each location repeatedseveral times, and some attributes at that location. The second table has the same locations, but only once with a different set of attributes. I would like to add the second set of attributes to the first table.> Example: > > set.seed <- 123This needs to be set.seed(123) See ?set.seed :-)> loc <- c(rep("L1", 3), rep("L2", 5), rep("L3", 2)) > val1 <- round(rnorm(10),2) > val2 <- c("a", "b", "c", "a", "b", "d", "f", "e", "b", "e") > t1 <- data.frame(loc, val1, val2) > t2 <- data.frame(loc=c("L1","L2","L3"), val3 = c("m", "n", "p"), val4 = c(25, 67, 48)) > > # join one-to-many > > n <- nrow(t1) > m <- nrow(t2) > t1$val3 <- rep(1, n) > t1$val4 <- rep(1, n) > > for (i in 1:n) { > for (j in 1:m){ > if (t1$loc[i]==t2$loc[j]) { > t1$val3[i] <- as.character(t2$val3[j]) > t1$val4[i] <- t2$val4[j] > } > } > } > > Desired result: > > t1 > loc val1 val2 val3 val4 > 1 L1 -0.41 a m 25 > 2 L1 -0.69 b m 25 > 3 L1 0.36 c m 25 > 4 L2 1.11 a n 67 > 5 L2 0.15 b n 67 > 6 L2 -0.80 d n 67 > 7 L2 -0.08 f n 67 > 8 L2 -1.01 e n 67 > 9 L3 -1.01 b p 48 > 10 L3 -2.50 e p 48 > > > This code works OK but it is slow if the data frames are actuallybigger than my little example. I hope somebody knows of a better way of doing these type of things.> merge(t1, t2, by = "loc")loc val1 val2 val3 val4 1 L1 -0.32 a m 25 2 L1 -1.50 b m 25 3 L1 -0.31 c m 25 4 L2 1.42 a n 67 5 L2 0.32 b n 67 6 L2 -0.12 d n 67 7 L2 0.33 f n 67 8 L2 -1.74 e n 67 9 L3 0.88 b p 48 10 L3 1.88 e p 48> system.time(merge(t1, t2, by = "loc"))user system elapsed 0.004 0.000 0.019 HTH, Marc Schwartz
Try merge(t1, t2) On Tue, Feb 17, 2009 at 9:33 AM, Monica Pisica <pisicandru at hotmail.com> wrote:> > Hello list, > > I am wondering if a joining "one-to-many" can be done a little bit easier. I tried merge function but I was not able to do it, so I end up using for and if. > > Suppose you have a table with locations, each location repeated several times, and some attributes at that location. The second table has the same locations, but only once with a different set of attributes. I would like to add the second set of attributes to the first table. > > Example: > > set.seed <- 123 > loc <- c(rep("L1", 3), rep("L2", 5), rep("L3", 2)) > val1 <- round(rnorm(10),2) > val2 <- c("a", "b", "c", "a", "b", "d", "f", "e", "b", "e") > t1 <- data.frame(loc, val1, val2) > t2 <- data.frame(loc=c("L1","L2","L3"), val3 = c("m", "n", "p"), val4 = c(25, 67, 48)) > > # join one-to-many > > n <- nrow(t1) > m <- nrow(t2) > t1$val3 <- rep(1, n) > t1$val4 <- rep(1, n) > > for (i in 1:n) { > for (j in 1:m){ > if (t1$loc[i]==t2$loc[j]) { > t1$val3[i] <- as.character(t2$val3[j]) > t1$val4[i] <- t2$val4[j] > } > } > } > > Desired result: > > t1 > loc val1 val2 val3 val4 > 1 L1 -0.41 a m 25 > 2 L1 -0.69 b m 25 > 3 L1 0.36 c m 25 > 4 L2 1.11 a n 67 > 5 L2 0.15 b n 67 > 6 L2 -0.80 d n 67 > 7 L2 -0.08 f n 67 > 8 L2 -1.01 e n 67 > 9 L3 -1.01 b p 48 > 10 L3 -2.50 e p 48 > > > This code works OK but it is slow if the data frames are actually bigger than my little example. I hope somebody knows of a better way of doing these type of things. > > Thanks, > > Monica > _________________________________________________________________ > > > 22009 > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >
Hi Monica, merge(t1, t2) works on your example. So why don't you use merge? HTH, Thierry ------------------------------------------------------------------------ ---- ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest Cel biometrie, methodologie en kwaliteitszorg / Section biometrics, methodology and quality assurance Gaverstraat 4 9500 Geraardsbergen Belgium tel. + 32 54/436 185 Thierry.Onkelinx at inbo.be www.inbo.be To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey -----Oorspronkelijk bericht----- Van: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] Namens Monica Pisica Verzonden: dinsdag 17 februari 2009 15:33 Aan: R help project Onderwerp: [R] joining "one-to-many" Hello list, I am wondering if a joining "one-to-many" can be done a little bit easier. I tried merge function but I was not able to do it, so I end up using for and if. Suppose you have a table with locations, each location repeated several times, and some attributes at that location. The second table has the same locations, but only once with a different set of attributes. I would like to add the second set of attributes to the first table. Example: set.seed <- 123 loc <- c(rep("L1", 3), rep("L2", 5), rep("L3", 2)) val1 <- round(rnorm(10),2) val2 <- c("a", "b", "c", "a", "b", "d", "f", "e", "b", "e") t1 <- data.frame(loc, val1, val2) t2 <- data.frame(loc=c("L1","L2","L3"), val3 = c("m", "n", "p"), val4 c(25, 67, 48)) # join one-to-many n <- nrow(t1) m <- nrow(t2) t1$val3 <- rep(1, n) t1$val4 <- rep(1, n) for (i in 1:n) { for (j in 1:m){ if (t1$loc[i]==t2$loc[j]) { t1$val3[i] <- as.character(t2$val3[j]) t1$val4[i] <- t2$val4[j] } } } Desired result: t1 loc val1 val2 val3 val4 1 L1 -0.41 a m 25 2 L1 -0.69 b m 25 3 L1 0.36 c m 25 4 L2 1.11 a n 67 5 L2 0.15 b n 67 6 L2 -0.80 d n 67 7 L2 -0.08 f n 67 8 L2 -1.01 e n 67 9 L3 -1.01 b p 48 10 L3 -2.50 e p 48 This code works OK but it is slow if the data frames are actually bigger than my little example. I hope somebody knows of a better way of doing these type of things. Thanks, Monica _________________________________________________________________ 22009 ______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Dit bericht en eventuele bijlagen geven enkel de visie van de schrijver weer en binden het INBO onder geen enkel beding, zolang dit bericht niet bevestigd is door een geldig ondertekend document. The views expressed in this message and any annex are purely those of the writer and may not be regarded as stating an official position of INBO, as long as the message is not confirmed by a duly signed document.