Hello list,
I am wondering if a joining "one-to-many" can be done a little bit
easier. I tried merge function but I was not able to do it, so I end up using
for and if.
Suppose you have a table with locations, each location repeated several times,
and some attributes at that location. The second table has the same locations,
but only once with a different set of attributes. I would like to add the second
set of attributes to the first table.
Example:
set.seed <- 123
loc <- c(rep("L1", 3), rep("L2", 5), rep("L3",
2))
val1 <- round(rnorm(10),2)
val2 <- c("a", "b", "c", "a",
"b", "d", "f", "e", "b",
"e")
t1 <- data.frame(loc, val1, val2)
t2 <- data.frame(loc=c("L1","L2","L3"), val3 =
c("m", "n", "p"), val4 = c(25, 67, 48))
# join one-to-many
n <- nrow(t1)
m <- nrow(t2)
t1$val3 <- rep(1, n)
t1$val4 <- rep(1, n)
for (i in 1:n) {
for (j in 1:m){
if (t1$loc[i]==t2$loc[j]) {
t1$val3[i] <- as.character(t2$val3[j])
t1$val4[i] <- t2$val4[j]
}
}
}
Desired result:
t1
loc val1 val2 val3 val4
1 L1 -0.41 a m 25
2 L1 -0.69 b m 25
3 L1 0.36 c m 25
4 L2 1.11 a n 67
5 L2 0.15 b n 67
6 L2 -0.80 d n 67
7 L2 -0.08 f n 67
8 L2 -1.01 e n 67
9 L3 -1.01 b p 48
10 L3 -2.50 e p 48
This code works OK but it is slow if the data frames are actually bigger than my
little example. I hope somebody knows of a better way of doing these type of
things.
Thanks,
Monica
_________________________________________________________________
22009
On Tue, Feb 17, 2009 at 8:33 AM, Monica Pisica <pisicandru at hotmail.com> wrote:> > Hello list, > > I am wondering if a joining "one-to-many" can be done a little bit easier. I tried merge function but I was not able to do it, so I end up using for and if. > > Suppose you have a table with locations, each location repeated several times, and some attributes at that location. The second table has the same locations, but only once with a different set of attributes. I would like to add the second set of attributes to the first table. > > Example: > > set.seed <- 123 > loc <- c(rep("L1", 3), rep("L2", 5), rep("L3", 2)) > val1 <- round(rnorm(10),2) > val2 <- c("a", "b", "c", "a", "b", "d", "f", "e", "b", "e") > t1 <- data.frame(loc, val1, val2) > t2 <- data.frame(loc=c("L1","L2","L3"), val3 = c("m", "n", "p"), val4 = c(25, 67, 48)) > > # join one-to-many > > n <- nrow(t1) > m <- nrow(t2) > t1$val3 <- rep(1, n) > t1$val4 <- rep(1, n) > > for (i in 1:n) { > for (j in 1:m){ > if (t1$loc[i]==t2$loc[j]) { > t1$val3[i] <- as.character(t2$val3[j]) > t1$val4[i] <- t2$val4[j] > } > } > } > > Desired result: > > t1 > loc val1 val2 val3 val4 > 1 L1 -0.41 a m 25 > 2 L1 -0.69 b m 25 > 3 L1 0.36 c m 25 > 4 L2 1.11 a n 67 > 5 L2 0.15 b n 67 > 6 L2 -0.80 d n 67 > 7 L2 -0.08 f n 67 > 8 L2 -1.01 e n 67 > 9 L3 -1.01 b p 48 > 10 L3 -2.50 e p 48 > > > This code works OK but it is slow if the data frames are actually bigger than my little example. I hope somebody knows of a better way of doing these type of things.merge(t1, t2) Hadley -- http://had.co.nz/
on 02/17/2009 08:33 AM Monica Pisica wrote:> Hello list, > > I am wondering if a joining "one-to-many" can be done a little biteasier. I tried merge function but I was not able to do it, so I end up using for and if.> > Suppose you have a table with locations, each location repeatedseveral times, and some attributes at that location. The second table has the same locations, but only once with a different set of attributes. I would like to add the second set of attributes to the first table.> Example: > > set.seed <- 123This needs to be set.seed(123) See ?set.seed :-)> loc <- c(rep("L1", 3), rep("L2", 5), rep("L3", 2)) > val1 <- round(rnorm(10),2) > val2 <- c("a", "b", "c", "a", "b", "d", "f", "e", "b", "e") > t1 <- data.frame(loc, val1, val2) > t2 <- data.frame(loc=c("L1","L2","L3"), val3 = c("m", "n", "p"), val4 = c(25, 67, 48)) > > # join one-to-many > > n <- nrow(t1) > m <- nrow(t2) > t1$val3 <- rep(1, n) > t1$val4 <- rep(1, n) > > for (i in 1:n) { > for (j in 1:m){ > if (t1$loc[i]==t2$loc[j]) { > t1$val3[i] <- as.character(t2$val3[j]) > t1$val4[i] <- t2$val4[j] > } > } > } > > Desired result: > > t1 > loc val1 val2 val3 val4 > 1 L1 -0.41 a m 25 > 2 L1 -0.69 b m 25 > 3 L1 0.36 c m 25 > 4 L2 1.11 a n 67 > 5 L2 0.15 b n 67 > 6 L2 -0.80 d n 67 > 7 L2 -0.08 f n 67 > 8 L2 -1.01 e n 67 > 9 L3 -1.01 b p 48 > 10 L3 -2.50 e p 48 > > > This code works OK but it is slow if the data frames are actuallybigger than my little example. I hope somebody knows of a better way of doing these type of things.> merge(t1, t2, by = "loc")loc val1 val2 val3 val4 1 L1 -0.32 a m 25 2 L1 -1.50 b m 25 3 L1 -0.31 c m 25 4 L2 1.42 a n 67 5 L2 0.32 b n 67 6 L2 -0.12 d n 67 7 L2 0.33 f n 67 8 L2 -1.74 e n 67 9 L3 0.88 b p 48 10 L3 1.88 e p 48> system.time(merge(t1, t2, by = "loc"))user system elapsed 0.004 0.000 0.019 HTH, Marc Schwartz
Try merge(t1, t2) On Tue, Feb 17, 2009 at 9:33 AM, Monica Pisica <pisicandru at hotmail.com> wrote:> > Hello list, > > I am wondering if a joining "one-to-many" can be done a little bit easier. I tried merge function but I was not able to do it, so I end up using for and if. > > Suppose you have a table with locations, each location repeated several times, and some attributes at that location. The second table has the same locations, but only once with a different set of attributes. I would like to add the second set of attributes to the first table. > > Example: > > set.seed <- 123 > loc <- c(rep("L1", 3), rep("L2", 5), rep("L3", 2)) > val1 <- round(rnorm(10),2) > val2 <- c("a", "b", "c", "a", "b", "d", "f", "e", "b", "e") > t1 <- data.frame(loc, val1, val2) > t2 <- data.frame(loc=c("L1","L2","L3"), val3 = c("m", "n", "p"), val4 = c(25, 67, 48)) > > # join one-to-many > > n <- nrow(t1) > m <- nrow(t2) > t1$val3 <- rep(1, n) > t1$val4 <- rep(1, n) > > for (i in 1:n) { > for (j in 1:m){ > if (t1$loc[i]==t2$loc[j]) { > t1$val3[i] <- as.character(t2$val3[j]) > t1$val4[i] <- t2$val4[j] > } > } > } > > Desired result: > > t1 > loc val1 val2 val3 val4 > 1 L1 -0.41 a m 25 > 2 L1 -0.69 b m 25 > 3 L1 0.36 c m 25 > 4 L2 1.11 a n 67 > 5 L2 0.15 b n 67 > 6 L2 -0.80 d n 67 > 7 L2 -0.08 f n 67 > 8 L2 -1.01 e n 67 > 9 L3 -1.01 b p 48 > 10 L3 -2.50 e p 48 > > > This code works OK but it is slow if the data frames are actually bigger than my little example. I hope somebody knows of a better way of doing these type of things. > > Thanks, > > Monica > _________________________________________________________________ > > > 22009 > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >
Hi Monica,
merge(t1, t2) works on your example. So why don't you use merge?
HTH,
Thierry
------------------------------------------------------------------------
----
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature
and Forest
Cel biometrie, methodologie en kwaliteitszorg / Section biometrics,
methodology and quality assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium
tel. + 32 54/436 185
Thierry.Onkelinx at inbo.be
www.inbo.be
To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to
say what the experiment died of.
~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data.
~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of
data.
~ John Tukey
-----Oorspronkelijk bericht-----
Van: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
Namens Monica Pisica
Verzonden: dinsdag 17 februari 2009 15:33
Aan: R help project
Onderwerp: [R] joining "one-to-many"
Hello list,
I am wondering if a joining "one-to-many" can be done a little bit
easier. I tried merge function but I was not able to do it, so I end up
using for and if.
Suppose you have a table with locations, each location repeated several
times, and some attributes at that location. The second table has the
same locations, but only once with a different set of attributes. I
would like to add the second set of attributes to the first table.
Example:
set.seed <- 123
loc <- c(rep("L1", 3), rep("L2", 5), rep("L3",
2))
val1 <- round(rnorm(10),2)
val2 <- c("a", "b", "c", "a",
"b", "d", "f", "e", "b",
"e")
t1 <- data.frame(loc, val1, val2)
t2 <- data.frame(loc=c("L1","L2","L3"), val3 =
c("m", "n", "p"), val4 c(25, 67, 48))
# join one-to-many
n <- nrow(t1)
m <- nrow(t2)
t1$val3 <- rep(1, n)
t1$val4 <- rep(1, n)
for (i in 1:n) {
for (j in 1:m){
if (t1$loc[i]==t2$loc[j]) {
t1$val3[i] <- as.character(t2$val3[j])
t1$val4[i] <- t2$val4[j]
}
}
}
Desired result:
t1
loc val1 val2 val3 val4
1 L1 -0.41 a m 25
2 L1 -0.69 b m 25
3 L1 0.36 c m 25
4 L2 1.11 a n 67
5 L2 0.15 b n 67
6 L2 -0.80 d n 67
7 L2 -0.08 f n 67
8 L2 -1.01 e n 67
9 L3 -1.01 b p 48
10 L3 -2.50 e p 48
This code works OK but it is slow if the data frames are actually bigger
than my little example. I hope somebody knows of a better way of doing
these type of things.
Thanks,
Monica
_________________________________________________________________
22009
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Dit bericht en eventuele bijlagen geven enkel de visie van de schrijver weer
en binden het INBO onder geen enkel beding, zolang dit bericht niet bevestigd is
door een geldig ondertekend document. The views expressed in this message
and any annex are purely those of the writer and may not be regarded as stating
an official position of INBO, as long as the message is not confirmed by a duly
signed document.