Dear All!

My data is on pairs of countries, i and j, e.g.:

y,i,j
1,AUT,BEL
2,AUT,GER
3,BEL,GER

I would like to create a dummy (indicator) variable for use in regression (using factor?), such that it takes the value of 1 if the country is in the pair (i.e. EITHER an i-country OR a j-country).

Thank you for your help,
Serguei

________________________________________
Austrian Institute of Economic Research (WIFO)
P.O.Box 91, 1103 Vienna, Austria
Tel.: +43-1-7982601-231
Fax: +43-1-7989386
Mail: Serguei.Kaniovski@wifo.ac.at
http://www.wifo.ac.at/Serguei.Kaniovski
Bernardo Rangel Tura
2009-Apr-18 09:31 UTC
[R] Dummy (factor) based on a pair of variables
On Sat, 2009-04-18 at 08:55 +0200, Serguei Kaniovski wrote:
> Dear All!
>
> my data is on pairs of countries, i and j, e.g.:
>
> y,i,j
> 1,AUT,BEL
> 2,AUT,GER
> 3,BEL,GER
>
> I would like to create a dummy (indicator) variable for use in regression
> (using factor?), such that it takes the value of 1 if the country is in the
> pair (i.e. EITHER an i-country OR a j-country).
>
> Thank you for your help,
> Serguei

Hi Serguei,

If I understand your question, the solution is something like this for a pair where the i-country is AUT or the j-country is BEL:

y ~ I(i == "AUT" | j == "BEL")

--
Bernardo Rangel Tura, M.D,MPH,Ph.D
National Institute of Cardiology
Brazil
Bernardo: this is not quite what I am looking for.

Let the data be:

y,i,j
1,AUT,BEL
2,AUT,GER
3,BEL,GER

then the dummies should look like:

y,i,j,d_AUT,d_BEL,d_GER
1,AUT,BEL,1,1,0
2,AUT,GER,1,0,1
3,BEL,GER,0,1,1

I can generate the above dummies, but can this design be entered in a regression model directly?

Serguei
On 2009.04.18 13:52:35, Serguei Kaniovski wrote:
> Bernardo: this is not quite what I am looking for,
>
> Let the data be:
> y,i,j
> 1,AUT,BEL
> 2,AUT,GER
> 3,BEL,GER
>
> then the dummies should look like:
>
> y,i,j,d_AUT,d_BEL,d_GER
> 1,AUT,BEL,1,1,0
> 2,AUT,GER,1,0,1
> 3,BEL,GER,0,1,1
>
> I can generate the above dummies but can this design be entered in a
> regression model directly?
>
> Serguei

Hello Serguei,

I am sure there is a better way to do this, but the following seems to work:

# Create sample data.frame()
i <- c("AUT", "AUT", "BEL")
j <- c("BEL", "GER", "GER")
df <- data.frame(i=i, j=j)

# Create dummy vectors
df$d.aut <- ifelse(df$i=="AUT"|df$j=="AUT", 1, 0)
df$d.bel <- ifelse(df$i=="BEL"|df$j=="BEL", 1, 0)
df$d.ger <- ifelse(df$i=="GER"|df$j=="GER", 1, 0)

# Print results
df

HTH,

~Jason

--
Jason W. Morgan
Graduate Student, Political Science
*The Ohio State University*
On 2009.04.18 15:58:30, Jason Morgan wrote:
> On 2009.04.18 13:52:35, Serguei Kaniovski wrote:
> > I can generate the above dummies but can this design be entered in a
> > regression model directly?

Oops, I apologize for not reading the whole question. Can you do the following:

lm(y ~ I(ifelse(df$i=="AUT"|df$j=="AUT", 1, 0)) +
       I(ifelse(df$i=="BEL"|df$j=="BEL", 1, 0)) +
       I(ifelse(df$i=="GER"|df$j=="GER", 1, 0)), data=df)

If you exclude the ifelse(), you will get a vector of TRUE/FALSE, which may or may not work.

~Jason

--
Jason W. Morgan
Graduate Student, Political Science
*The Ohio State University*
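On the TRUE/FALSE point: lm() codes a logical regressor as a two-level factor, so the fit should match the 0/1 version and only the coefficient label changes. A minimal sketch, assuming made-up data (the response y, the seed, the number of rows, and the column names aut_num and aut_lgl are illustrative and not from the thread):

```r
# Illustrative data only: y, the seed and the row count are assumptions.
set.seed(1)
pairs <- t(combn(c("AUT", "BEL", "GER"), 2))         # AUT-BEL, AUT-GER, BEL-GER
idx   <- rep(seq_len(nrow(pairs)), length.out = 30)  # each pair occurs 10 times
df    <- data.frame(y = rnorm(30), i = pairs[idx, 1], j = pairs[idx, 2],
                    stringsAsFactors = FALSE)

# The same AUT membership dummy, once as 0/1 and once as TRUE/FALSE
df$aut_num <- ifelse(df$i == "AUT" | df$j == "AUT", 1, 0)
df$aut_lgl <- df$i == "AUT" | df$j == "AUT"

fit_num <- lm(y ~ aut_num, data = df)
fit_lgl <- lm(y ~ aut_lgl, data = df)

# Estimates agree; the logical version is labelled with a TRUE suffix
all.equal(unname(coef(fit_num)), unname(coef(fit_lgl)))
names(coef(fit_lgl))
```

The 0/1 coding is still the safer choice when the dummies are later used as plain numbers, for example in a matrix.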
> df <- read.table(textConnection("y,i,j
+ 1,AUT,BEL
+ 2,AUT,GER
+ 3,BEL,GER"), header=T,sep=",", as.is=T)
> df
  y   i   j
1 1 AUT BEL
2 2 AUT GER
3 3 BEL GER
> countries <- unique(c(df$i,df$j))
> countries
[1] "AUT" "BEL" "GER"
> df[countries] <- sapply(countries, function(x) df[x] <<- df$i == x | df$j == x)
> df
  y   i   j   AUT   BEL   GER
1 1 AUT BEL  TRUE  TRUE FALSE
2 2 AUT GER  TRUE FALSE  TRUE
3 3 BEL GER FALSE  TRUE  TRUE

Obviously it would not be possible to test this arrangement with lm. So I tried scaling it up and testing on:

dft <- data.frame(y=rnorm(100), i = sample(countries, 100, replace=T), j= sample(countries, 100, replace=T))
# Removed all the duplicates with:
dft <- dft[dft$i != dft$j, ]
# and it did not give proper answers. This seems to give correct answers:
dft[countries] <- sapply(countries, function(y) apply(dft, 1, function(x) x[2] == y | x[3] == y))

And application of those variables is handled in a reasonable manner by the R formula parser:

> lm(y ~ AUT + BEL + GER, data=dft)

Call:
lm(formula = y ~ AUT + BEL + GER, data = dft)

Coefficients:
(Intercept)      AUTTRUE      BELTRUE      GERTRUE
    0.09192      0.15130     -0.29274           NA

- David Winsemius

On Apr 18, 2009, at 4:09 PM, Jason Morgan wrote:
> Oops, I apologize for not reading the whole question. Can you do the
> following:
>
> lm(y ~ I(ifelse(df$i=="AUT"|df$j=="AUT", 1, 0)) +
>        I(ifelse(df$i=="BEL"|df$j=="BEL", 1, 0)) +
>        I(ifelse(df$i=="GER"|df$j=="GER", 1, 0)), data=df)
>
> If you exclude the ifelse(), you will get a vector of TRUE/FALSE,
> which may or may not work.
>
> ~Jason

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
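The NA reported for GERTRUE in the lm output above follows from the design itself: once rows with i == j are removed, every row belongs to exactly two countries, so the three dummies sum to 2 in each row and are collinear with the intercept. A sketch under assumed data (y, the seed and the row count are illustrative; the matrix name D is hypothetical), showing the rank deficiency and one possible remedy, fitting numeric dummies without an intercept:

```r
# Illustrative data only: y, the seed and the row count are assumptions.
set.seed(42)
countries <- c("AUT", "BEL", "GER")
pairs <- t(combn(countries, 2))                      # AUT-BEL, AUT-GER, BEL-GER
idx   <- rep(seq_len(nrow(pairs)), length.out = 30)  # each pair occurs 10 times
dft   <- data.frame(y = rnorm(30), i = pairs[idx, 1], j = pairs[idx, 2],
                    stringsAsFactors = FALSE)

# Numeric 0/1 membership matrix, one column per country
D <- sapply(countries, function(k) as.numeric(dft$i == k | dft$j == k))

all(rowSums(D) == 2)    # TRUE: each row is a member of exactly two countries
qr(cbind(1, D))$rank    # 3, not 4: intercept plus three dummies is redundant

fit_int   <- lm(y ~ D, data = dft)      # last dummy coefficient comes back NA
fit_noint <- lm(y ~ 0 + D, data = dft)  # all three coefficients are estimated
coef(fit_int)
coef(fit_noint)
```

The columns are numeric here on purpose: with TRUE/FALSE columns R would treat the first one as a two-level factor, and dropping the intercept would not remove the redundancy. Dropping one country as the reference category is the other common option; which one suits depends on how the coefficients are to be interpreted.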