Hi everyone,

I've got a dataset with 12,000 observations. One of the variables (cleary$D1) is for an individual's country, coded 1 - 15. I'd like to create a dummy variable for the Baltic states, which are coded 4, 6, and 7. In other words, as a dummy variable Baltic states would be coded 1, else 0. I've attempted the following for loop:

dummy <- matrix(NA, nrow=nrow(cleary), ncol=1)
for (i in 1:length(cleary$D1)){
  if (cleary$D1 == 4){dummy[i] = 1}
  else {dummy[i] = 0}
}

Unfortunately it generates the following error:

1: In if (cleary$D1 == 4) { ... :
  the condition has length > 1 and only the first element will be used

Another option I've tried is the following:

binary <- vector(length=length(cleary$D1))
for (i in 1:length(cleary$D1)) {
  if (cleary$D1 == 4 | cleary$D1 == 6 | cleary$D1 == 7 ) {binary[i] = 1}
  else {binary[i] = 0}
}

Unfortunately it simply responds with "syntax error".

Any thoughts would be greatly appreciated!

--
View this message in context: http://r.789695.n4.nabble.com/For-loop-dummy-variables-tp3001396p3001396.html
Sent from the R help mailing list archive at Nabble.com.

I should have noted that the first attempt listed above was obviously just practice for the case cleary$D1 == 4. To reiterate, this still didn't work.

You might try

dummy <- with(cleary, cbind(B4 = as.numeric(D1 == 4),
                            B6 = as.numeric(D1 == 6),
                            B7 = as.numeric(D1 == 7)))

and do it all in one go.
___

To fix up your approach you need to use

if(cleary$D1[i] == 4) dummy[i] <- 1 else dummy[i] <- 0

but this is a very clumsy and slow way of going about it.

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of gravityflyer
Sent: Tuesday, 19 October 2010 1:24 PM
To: r-help at r-project.org
Subject: [R] For-loop dummy variables?

[...]

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
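
For what it's worth, here is a small sketch of what the cbind() suggestion produces and how it relates to the single Baltic dummy that was asked for. The cleary data frame below is a made-up stand-in (the D1 values are invented), and the name baltic is only an illustration, not something from the original post. Since each person has exactly one country code, the three indicator columns never overlap, so their row sums should give the combined 0/1 variable.

```r
# Toy stand-in for the real data; the D1 values are invented.
cleary <- data.frame(D1 = c(1, 4, 6, 7, 15, 4))

# One indicator column per Baltic country code, as in the reply above.
dummy <- with(cleary, cbind(B4 = as.numeric(D1 == 4),
                            B6 = as.numeric(D1 == 6),
                            B7 = as.numeric(D1 == 7)))

# The codes are mutually exclusive, so summing across the columns
# collapses the three indicators into a single Baltic dummy.
baltic <- rowSums(dummy)
```

Keeping the three columns separate is handy if the countries are to enter a model individually; the row sum is only needed when one pooled indicator is wanted.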

gravityflyer <gravityflyer <at> yahoo.com> writes:
> [...]

Be aware that R is a vectorised programming language, therefore your for loop is completely unnecessary.

This is what I'd do:

dummy <- rep(0, nrow(cleary))
dummy[cleary$D1 %in% c(4,6,7)] <- 1

This is your dummy variable. Below is your working (though VERY inefficient) version of the for loop:

binary <- vector(length=length(cleary$D1))
for (i in 1:length(cleary$D1)) {
  if (cleary$D1[i] == 4 | cleary$D1[i] == 6 | cleary$D1[i] == 7 ) {
    binary[i] = 1
  } else {
    binary[i] = 0
  }
}

Now try to figure out:
- what is the difference between your for() loop and mine?
- which code is simpler (and better), the vectorised or the for() loop?

I hope it helps,
Adrian
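
As a rough check of the two versions above, the sketch below runs both on simulated data and compares the results. The data are an assumption: 12,000 country codes drawn at random from 1 to 15, since the real cleary data set is not available here. The loop uses numeric() and seq_along() instead of vector() and 1:length(), which is a small stylistic swap and not part of the original reply.

```r
# Hypothetical data: 12,000 country codes drawn at random from 1..15.
set.seed(1)
cleary <- data.frame(D1 = sample(1:15, 12000, replace = TRUE))

# Vectorised version: %in% tests every element of D1 at once.
dummy <- rep(0, nrow(cleary))
dummy[cleary$D1 %in% c(4, 6, 7)] <- 1

# Loop version: the [i] index makes each condition a single value.
binary <- numeric(nrow(cleary))
for (i in seq_along(cleary$D1)) {
  if (cleary$D1[i] == 4 | cleary$D1[i] == 6 | cleary$D1[i] == 7) {
    binary[i] <- 1
  } else {
    binary[i] <- 0
  }
}

# Both approaches should produce the same 0/1 vector.
identical(dummy, binary)
```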

I always find R useful to solve problems like this:

dummy = as.numeric(cleary$D1 %in% c(4,6,7))

If, for some reason, you want to use a loop, try

dummy <- matrix(NA, nrow=nrow(cleary), ncol=1)
for (i in 1:length(cleary$D1)){
  if (cleary$D1[i] %in% c(4,6,7)){dummy[i] = 1}
  else {dummy[i] = 0}
}

When you write a loop, you need to use the loop index to select the individual value you're working with.

- Phil Spector
  Statistical Computing Facility
  Department of Statistics
  UC Berkeley
  spector at stat.berkeley.edu

On Mon, 18 Oct 2010, gravityflyer wrote:
> [...]
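
To illustrate the point about the loop index, here is a minimal sketch on an invented five-row cleary data frame. Without [i], the comparison returns one TRUE/FALSE per row, which is what triggers the "condition has length > 1" message (a warning in the R version used in this thread; from R 4.2.0 onwards it is an error). The ifelse() line is an extra alternative not mentioned in the replies above.

```r
# Invented values, only for illustration.
cleary <- data.frame(D1 = c(1, 4, 6, 7, 15))

# Without the index, the comparison yields one result per row,
# which is not a valid condition for if().
length(cleary$D1 == 4)      # 5, one result per row

# With the index, it yields exactly one TRUE/FALSE.
i <- 2
length(cleary$D1[i] == 4)   # 1

# ifelse() is the vectorised counterpart of if/else and
# needs no loop and no index.
dummy <- ifelse(cleary$D1 %in% c(4, 6, 7), 1, 0)
```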

On Tuesday 19 October 2010, Phil Spector wrote:
> I always find R useful to solve problems like this:
>
> dummy = as.numeric(cleary$D1 %in% c(4,6,7))

Indeed, and this works too:

dummy <- 1*(cleary$D1 %in% c(4,6,7))

Adrian

--
Adrian Dusa
Romanian Social Data Archive
1, Schitu Magureanu Bd.
050025 Bucharest sector 5
Romania
Tel.:+40 21 3126618 \
     +40 21 3120210 / int.101
Fax: +40 21 3158391
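
Both one-liners rely on the same fact: %in% returns a logical vector, and R converts TRUE/FALSE to 1/0 whenever a number is required. A short sketch on invented data follows; the names is_baltic, d1, d2, d3 and the baltic column are hypothetical and only used for illustration.

```r
# Invented values, only for illustration.
cleary <- data.frame(D1 = c(2, 4, 6, 7, 9))

# %in% gives TRUE for rows whose code is 4, 6 or 7.
is_baltic <- cleary$D1 %in% c(4, 6, 7)

d1 <- as.numeric(is_baltic)   # explicit conversion to 1/0
d2 <- 1 * is_baltic           # arithmetic coerces TRUE/FALSE to 1/0
d3 <- as.integer(is_baltic)   # same values, integer storage

# Store the dummy next to the original variable.
cleary$baltic <- d1
```

Which form to use is mostly a matter of taste; as.numeric() states the intent most explicitly.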