Hi all,

Suppose I have the following data.frame, with an id column and two
variable columns:

id    X     Y
0001  NA    21
0002  NA    13
0003  0001  45
0004  NA    71
0005  0003  20

What I would like to do is create a new variable Z whose values are
the Y value of the row whose id equals X, that is:

id    X     Y   Z
0001  NA    21  NA
0002  NA    13  NA
0003  0001  45  21
0004  NA    71  NA
0005  0003  20  45

Do you have an idea of how to obtain that without using a for loop?

Thanks in advance for any help,

Julien

Here is the R code to reproduce the first data.frame:

id <- c("0001","0002","0003","0004","0005")
x <- c(NA, NA, "0001", NA, "0003")
y <- c(21,13,45,71,20)
d <- data.frame(id,x,y)

-- 
Julien Barnier
Groupe de recherche sur la socialisation
ENS-LSH - Lyon, France
Hi

r-help-bounces at r-project.org wrote on 10.10.2007 12:10:29:

> What I would like to do is to create a new variable Z whose values are
> the Y value for the id value in X [...]
>
> Do you have an idea on how to obtain that without using a for loop ?

d$z <- NA
d$z[d$x %in% d$id] <- d$y[d$id %in% d$x]

works in this particular case, but it assumes that you do not have
repeated values in id and X.

Regards
Petr

> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
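A note on the %in% approach above: the two logical masks are computed independently, so values are paired by position rather than by id. The sketch below uses a hypothetical data frame d2 (not from the thread) in which the referenced ids appear in a different order than the rows that refer to them. The names d2, z_in and z_match are illustrative, and base R's match() is used only as a point of comparison.

```r
# Hypothetical data: row 3 refers to "0004" and row 5 refers to "0001",
# so the referenced ids occur in a different order than the referring rows.
d2 <- data.frame(
  id = c("0001", "0002", "0003", "0004", "0005"),
  x  = c(NA, NA, "0004", NA, "0001"),
  y  = c(21, 13, 45, 71, 20),
  stringsAsFactors = FALSE
)

# Positional approach: fills the referring rows with the y values of the
# referenced rows, in the order those referenced rows appear in the data.
z_in <- rep(NA_real_, nrow(d2))
z_in[d2$x %in% d2$id] <- d2$y[d2$id %in% d2$x]

# Per-row lookup: finds, for each x, the row whose id equals it.
z_match <- d2$y[match(d2$x, d2$id)]

print(cbind(d2, z_in, z_match))
```

Here z_in puts 21 in row 3 and 71 in row 5, while the per-row lookup gives 71 and 21, which is what the question asks for.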
Hi Petr,

> d$z<-NA
> d$z[d$x %in% d$id] <- d$y[d$id %in% d$x]
>
> works in this particular case but it means you do not have multiple same
> ids and X

Thanks for the idea. But the problem is that I can have multiple
ids...

In fact, in the meantime I found a solution using row names:

R> d
    id    x  y
1 0001 <NA> 21
2 0002 <NA> 13
3 0003 0001 45
4 0004 <NA> 71
5 0005 0003 20

R> rownames(d) <- d$id
R> d$z <- NA
R> d$z <- d[d$x,"y"]
R> d
       id    x  y  z
0001 0001 <NA> 21 NA
0002 0002 <NA> 13 NA
0003 0003 0001 45 21
0004 0004 <NA> 71 NA
0005 0005 0003 20 13

Thanks for your help,

Julien

-- 
Julien Barnier
Groupe de recherche sur la socialisation
ENS-LSH - Lyon, France
Try this:

transform(d, z = y[match(x, id)])

On 10/10/07, Julien Barnier <jbarnier at ens-lsh.fr> wrote:
> What I would like to do is to create a new variable Z whose values are
> the Y value for the id value in X [...]
>
> Do you have an idea on how to obtain that without using a for loop ?
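For readers unfamiliar with match(): match(x, id) returns, for each element of x, the position of its first occurrence in id, or NA when there is none. Indexing y with those positions performs the lookup row by row. A minimal sketch on the thread's data follows; idx, x_rep and z_rep are illustrative names that do not appear in the thread.

```r
# Data from the original question.
id <- c("0001", "0002", "0003", "0004", "0005")
x  <- c(NA, NA, "0001", NA, "0003")
y  <- c(21, 13, 45, 71, 20)
d  <- data.frame(id, x, y)

# Position in id of each value of x (NA where x is NA or not found).
idx <- match(d$x, d$id)

# Look up y at those positions; an NA position yields an NA value.
d$z <- d$y[idx]
print(d)

# Repeated references are handled too: each element is looked up on its own.
x_rep <- c("0003", "0003", NA, "0001")
z_rep <- d$y[match(x_rep, d$id)]
print(z_rep)
```

If id itself contained duplicates, match() would return the first matching row only, so which row should win would have to be decided separately.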
r-help-bounces at r-project.org wrote on 10.10.2007 13:55:32:

> Thanks for the idea. But the problem is that I can have multiple
> ids...
>
> In fact in the meantime I found a solution by using row names :

Are you sure?

> R> rownames(d) <- d$id
> R> d$z <- NA
> R> d$z <- d[d$x,"y"]
> R> d
>        id    x  y  z
> 0001 0001 <NA> 21 NA
> 0002 0002 <NA> 13 NA
> 0003 0003 0001 45 21
> 0004 0004 <NA> 71 NA
> 0005 0005 0003 20 13

Why 13 in row 5? And using your code, my result is:

> d
    id    x  y
1 0001 <NA> 21
2 0002 <NA> 13
3 0003 0001 45
4 0004 <NA> 71
5 0005 0003 20
> d$z <- NA
> rownames(d) <- d$id
> d$z <- d[d$x,"y"]
> d
       id    x  y  z
0001 0001 <NA> 21 21
0002 0002 <NA> 13 21
0003 0003 0001 45 13
0004 0004 <NA> 71 21
0005 0005 0003 20 45

Regards
Petr
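The thread leaves this discrepancy open. One likely contributor, offered here as an assumption rather than something the posters confirmed: before R 4.0.0, data.frame() converted character columns to factors by default, and indexing with a factor uses its integer codes, not its labels. The sketch below forces that behaviour with stringsAsFactors = TRUE; z_factor and z_char are illustrative names.

```r
# Data from the original question.
id <- c("0001", "0002", "0003", "0004", "0005")
x  <- c(NA, NA, "0001", NA, "0003")
y  <- c(21, 13, 45, 71, 20)

# stringsAsFactors = TRUE mimics the default of R versions before 4.0.0.
d <- data.frame(id, x, y, stringsAsFactors = TRUE)
rownames(d) <- d$id

# x is a factor with levels "0001" and "0003", i.e. codes NA NA 1 NA 2.
# Indexing rows with it selects rows 1 and 2 by position.
z_factor <- d[d$x, "y"]

# Converting to character first selects rows by row name instead.
z_char <- d[as.character(d$x), "y"]

print(data.frame(id = d$id, x = d$x, z_factor, z_char))
```

With the factor index, row 5 picks up y[2] = 13 because "0003" is the second level of x, which agrees with the 13 in Julien's output. This does not account for the result Petr reports, which presumably depends on how his copy of d was created.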