Malcolm Fairbrother
2010-Jun-09 14:03 UTC
[R] creating a new variable, conditional on the value of an existing variable, selected conditionally
Dear all,

I have a data frame f, with four variables:

f <- data.frame(A=c(0,0,1,1), B=c(0,1,0,1), C=c(1,1,0,1), D=c(3,1,2,3))
f
  A B C D
1 0 0 1 3
2 0 1 1 1
3 1 0 0 2
4 1 1 1 3

I want to create a new variable (f$E), such that each of its elements is drawn from either f$A, f$B, or f$C, according to the value (for each row) of f$D (values of which range from 1 to 3).

In the first row, D is 3, so I want the value from the third variable (C), which for the first row is 1. In the second row, D is 1, so I want the value from the first variable (A), which for the second row is 0. And so forth, such that in the end my new data frame looks like:

  A B C D E
1 0 0 1 3 1
2 0 1 1 1 0
3 1 0 0 2 0
4 1 1 1 3 1

My question is: How do I do this for a much larger dataset, where my "index variable" (f$D in this example) actually indexes a much larger number of variables (not just three)?

I know that in principle I could do this with a long series of nested ifelse statements (as below), but I assume there is some less cumbersome option, and I'd like to know what it is. Any help would be much appreciated. Apologies if I'm missing something obvious.

f$E <- ifelse(f$D==3, f$C, ifelse(f$D==2, f$B, f$A))

Thanks,
Malcolm
Erik Iverson
2010-Jun-09 16:55 UTC
[R] creating a new variable, conditional on the value of an existing variable, selected conditionally
Can your data.frame be properly coerced to a matrix like your example? If so,

apply(f, 1, function(x) x[eval(x)["D"]])

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
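For readers following along, the row-wise idea in this reply can be spelled out a little more explicitly. The following is only a sketch under stated assumptions: the candidate columns are all numeric (so coercion to a matrix is safe), f$D holds column positions within those candidates, and the names cols and m are illustrative, not from the thread. It uses vapply() in place of apply() so the return type is fixed.

```r
# Example data from the original post
f <- data.frame(A = c(0, 0, 1, 1), B = c(0, 1, 0, 1),
                C = c(1, 1, 0, 1), D = c(3, 1, 2, 3))

# Illustrative names (assumptions): 'cols' lists the candidate columns,
# 'm' is their matrix form, which is valid here because all are numeric
cols <- c("A", "B", "C")
m <- as.matrix(f[cols])

# For each row i, take the element in the column position given by f$D[i]
f$E <- vapply(seq_len(nrow(f)), function(i) m[i, f$D[i]], numeric(1))
f
```

Keeping D out of the matrix means the lookup cannot accidentally return the index column itself.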
Doran, Harold
2010-Jun-09 17:15 UTC
[R] creating a new variable, conditional on the value of an existing variable, selected conditionally
How about this:

f <- data.frame(A=c(0,0,1,1), B=c(0,1,0,1), C=c(1,1,0,1), D=c(3,1,2,3))
N <- nrow(f)
mat <- cbind(1:N,f$D)
f$E <- f[mat]
f
  A B C D E
1 0 0 1 3 1
2 0 1 1 1 0
3 1 0 0 2 0
4 1 1 1 3 1
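The matrix-indexing approach works because a two-column numeric matrix used as a single index is read as (row, column) pairs. Below is a minimal sketch of how it might be adapted when the candidate columns are not the first columns of the data frame. The extra id column and the names cols, m and idx are assumptions added for illustration only.

```r
# Same example data, plus a hypothetical leading 'id' column so that
# the candidate columns no longer start at position 1
f <- data.frame(id = 101:104,
                A = c(0, 0, 1, 1), B = c(0, 1, 0, 1), C = c(1, 1, 0, 1),
                D = c(3, 1, 2, 3))

cols <- c("A", "B", "C")          # columns that D is meant to index
m <- as.matrix(f[cols])           # restrict the lookup to those columns

# Guard: every index must point at one of the candidate columns
stopifnot(all(f$D %in% seq_along(cols)))

idx <- cbind(seq_len(nrow(m)), f$D)   # one (row, column) pair per row
f$E <- m[idx]
f
```

Subsetting to the candidate columns first keeps the positions in D aligned with the intended variables, however many other columns the data frame carries.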
Henrique Dallazuanna
2010-Jun-09 20:49 UTC
[R] creating a new variable, conditional on the value of an existing variable, selected conditionally
Try this:

f$E <- diag(as.matrix(f[f$D]))

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
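It may help to see what the diag() one-liner builds on the way. As far as I can tell, f[f$D] selects one whole column for every row of f, so the intermediate matrix is n by n and only its diagonal is kept. A small sketch on the example data (the name wide is just illustrative):

```r
f <- data.frame(A = c(0, 0, 1, 1), B = c(0, 1, 0, 1),
                C = c(1, 1, 0, 1), D = c(3, 1, 2, 3))

# One full column per row of f: columns C, A, B, C in that order
wide <- as.matrix(f[f$D])
dim(wide)          # n rows by n columns

# Element [i, i] is the value of the column chosen for row i
E <- diag(wide)
E
```

Since the intermediate object grows with the square of the number of rows, this form is probably best kept to small data frames.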
Dennis Murphy
2010-Jun-10 11:02 UTC
[R] creating a new variable, conditional on the value of an existing variable, selected conditionally
Hi:

I had Harold's idea (matrix indexing), but I was curious to see which of these ran fastest. I simulated 1000 rows and three columns of binary data, along with a fourth column that sampled the values 1:3 1000 times. Here are the timings:

> f <- as.data.frame(matrix(rbinom(3000, 1, 0.4), nrow = 1000))
> names(f) <- LETTERS[1:3]
> f$D <- sample(1:3, 1000, replace = TRUE)
> system.time(E1 <- f[cbind(1:nrow(f), f$D)])
   user  system elapsed
      0       0       0
> system.time(E2 <- apply(f, 1, function(x) x[eval(x)["D"]]))
   user  system elapsed
   0.03    0.00    0.03
> system.time(E3 <- diag(as.matrix(f[f$D])))
   user  system elapsed
   0.26    0.03    0.30
> identical(E1, E2)
[1] TRUE
> identical(E2, E3)
[1] TRUE

HTH,
Dennis
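Anyone wanting to rerun this comparison could use something like the sketch below. The seed and the size n are arbitrary choices of mine, not values from the thread, and the timings will differ by machine, so none are shown. It only checks that the three approaches agree.

```r
set.seed(1)   # arbitrary seed, only so the run is repeatable
n <- 2000     # arbitrary size; the diag() version allocates n x n cells

f <- as.data.frame(matrix(rbinom(3 * n, 1, 0.4), nrow = n))
names(f) <- LETTERS[1:3]
f$D <- sample(1:3, n, replace = TRUE)

t1 <- system.time(E1 <- f[cbind(seq_len(n), f$D)])           # matrix indexing
t2 <- system.time(E2 <- apply(f, 1, function(x) x[x["D"]]))  # row-wise apply
t3 <- system.time(E3 <- diag(as.matrix(f[f$D])))             # n x n, then diagonal

# Elapsed seconds for each approach
c(matrix_index = t1[["elapsed"]],
  apply        = t2[["elapsed"]],
  diag         = t3[["elapsed"]])

# All three should pick the same values
stopifnot(all(E1 == E2), all(E2 == E3))
```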