thr3ads.net - R help - [R] Dummy (factor) based on a pair of variables [Apr 2009]

Serguei Kaniovski

2009-Apr-18 06:55 UTC

[R] Dummy (factor) based on a pair of variables

Dear All!

My data are on pairs of countries, i and j, e.g.:

y,i,j
1,AUT,BEL
2,AUT,GER
3,BEL,GER

I would like to create a dummy (indicator) variable for use in regression
(using factor?) such that it takes the value 1 if a given country is in the
pair (i.e. if it is EITHER the i-country OR the j-country).

Thank you for your help,
Serguei
________________________________________
Austrian Institute of Economic Research (WIFO)

P.O.Box 91                          Tel.: +43-1-7982601-231
1103 Vienna, Austria        Fax: +43-1-7989386

Mail: Serguei.Kaniovski@wifo.ac.at
http://www.wifo.ac.at/Serguei.Kaniovski

Bernardo Rangel Tura

2009-Apr-18 09:31 UTC

Hi Serguei,

If I understand your question correctly, for a pair in which the i-country
is AUT or the j-country is BEL the solution is something like this:

y ~ I(i == "AUT" | j == "BEL")
-- 
Bernardo Rangel Tura, M.D,MPH,Ph.D
National Institute of Cardiology
Brazil

Serguei Kaniovski

2009-Apr-18 11:52 UTC

Bernardo, this is not quite what I am looking for.

Let the data be:
y,i,j
1,AUT,BEL
2,AUT,GER
3,BEL,GER

then the dummies should look like:

y,i,j,d_AUT,d_BEL,d_GER
1,AUT,BEL,1,1,0
2,AUT,GER,1,0,1
3,BEL,GER,0,1,1

I can generate the above dummies, but can this design be specified in a
regression model directly?

Serguei
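Since the original question mentions factors, here is a minimal sketch (not from the thread) that builds the d_AUT/d_BEL/d_GER table above with model.matrix(): i and j are recoded as factors over one common set of levels, and the two indicator matrices are added. It assumes the two countries in a row always differ; a row with i == j would get a 2 instead of a 1.

```r
# Example data from the thread
df <- data.frame(y = 1:3,
                 i = c("AUT", "AUT", "BEL"),
                 j = c("BEL", "GER", "GER"),
                 stringsAsFactors = FALSE)

# One common set of levels, so both factors produce the same columns
countries <- sort(unique(c(df$i, df$j)))

# Indicator matrices (no intercept column) for the i- and the j-position
Di <- model.matrix(~ factor(i, levels = countries) - 1, data = df)
Dj <- model.matrix(~ factor(j, levels = countries) - 1, data = df)

# A country gets 1 if it appears in either position (assumes i != j in each row);
# matrix() drops the model.matrix attributes and sets clean column names
D <- matrix(Di + Dj, nrow = nrow(df),
            dimnames = list(NULL, paste0("d_", countries)))

out <- cbind(df, D)
out
```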

Jason Morgan

2009-Apr-18 19:58 UTC

Hello Serguei,

I am sure there is a better way to do this, but the following seems to
work:

# Create sample data.frame() with the response y
i  <- c("AUT", "AUT", "BEL")
j  <- c("BEL", "GER", "GER")
df <- data.frame(y=1:3, i=i, j=j)

# Create dummy vectors
df$d.aut <- ifelse(df$i=="AUT"|df$j=="AUT", 1, 0)
df$d.bel <- ifelse(df$i=="BEL"|df$j=="BEL", 1, 0)
df$d.ger <- ifelse(df$i=="GER"|df$j=="GER", 1, 0)

# Print results
df

HTH,

~Jason


--
Jason W. Morgan
Graduate Student, Political Science
*The Ohio State University*
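The three ifelse() lines above hard-code one line per country. A sketch of the same idea as a loop over every country found in i or j (the lower-cased d.<country> names simply follow the reply above; the y column is added only so the data can be used in lm()):

```r
# Sample data as in the reply above, plus a response y
i  <- c("AUT", "AUT", "BEL")
j  <- c("BEL", "GER", "GER")
df <- data.frame(y = 1:3, i = i, j = j, stringsAsFactors = FALSE)

# Every country that appears in either position
countries <- sort(unique(c(df$i, df$j)))

# One 0/1 column per country (d.aut, d.bel, ...); as.numeric() on the
# logical comparison gives the same values as ifelse(..., 1, 0)
for (cc in countries) {
  df[[paste0("d.", tolower(cc))]] <- as.numeric(df$i == cc | df$j == cc)
}

df
```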

Jason Morgan

2009-Apr-18 20:09 UTC

On 2009.04.18 15:58:30, Jason Morgan wrote:
> On 2009.04.18 13:52:35, Serguei Kaniovski wrote:
> > I can generate the above dummies but can this design be imputed in a 
> > reg. model directly?
Oops, I apologize for not reading the whole question. Can you do the
following:

lm(y ~ I(ifelse(i=="AUT"|j=="AUT", 1, 0)) +
       I(ifelse(i=="BEL"|j=="BEL", 1, 0)) +
       I(ifelse(i=="GER"|j=="GER", 1, 0)), data=df)

If you exclude the ifelse(), you will get a vector of TRUE/FALSE,
which may or may not work.

~Jason
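To enter the whole design in the model call, without first adding columns to the data frame, one option is a small function that returns the dummy matrix: a matrix-valued term in a formula contributes one coefficient per column. The helper pair_dummies() below is hypothetical (not from the thread or any package). The intercept is dropped with - 1 because, when i and j always differ, each row's dummies sum to 2, so an intercept would be collinear with them.

```r
# Example data with a response column
df <- data.frame(y = c(1, 2, 3),
                 i = c("AUT", "AUT", "BEL"),
                 j = c("BEL", "GER", "GER"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: n x k 0/1 matrix, one column per country,
# 1 if that country is the i- or the j-country of the row
pair_dummies <- function(i, j,
                         countries = sort(unique(c(as.character(i), as.character(j))))) {
  i <- as.character(i)
  j <- as.character(j)
  # TRUE where the row's i- or j-country equals the column's country
  m <- outer(i, countries, "==") | outer(j, countries, "==")
  m <- m * 1  # logical -> numeric 0/1, dimensions kept
  dimnames(m) <- list(NULL, countries)
  m
}

# The whole matrix enters the formula as one term; "- 1" drops the intercept
fit <- lm(y ~ pair_dummies(i, j) - 1, data = df)
coef(fit)
```

Passing countries explicitly, e.g. pair_dummies(i, j, countries = c("AUT", "BEL", "GER")), keeps the columns fixed if the term is re-evaluated on new data (for instance in predict()). On the three-row example the fit is exact (three equations, three unknowns), which gives 0, 1 and 2 for AUT, BEL and GER up to rounding.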

David Winsemius

2009-Apr-18 22:06 UTC

> df <- read.table(textConnection("y,i,j
+ 1,AUT,BEL
+ 2,AUT,GER
+ 3,BEL,GER"), header=T, sep=",", as.is=T)
> df
   y   i   j
1 1 AUT BEL
2 2 AUT GER
3 3 BEL GER
> countries <- unique(c(df$i, df$j))
> countries
[1] "AUT" "BEL" "GER"

> df[countries] <- sapply(countries, function(x) df[x] <<- df$i == x | df$j == x)
> df
   y   i   j   AUT   BEL   GER
1 1 AUT BEL  TRUE  TRUE FALSE
2 2 AUT GER  TRUE FALSE  TRUE
3 3 BEL GER FALSE  TRUE  TRUE

With only three rows, this arrangement obviously cannot be tested with lm.

So I tried scaling it up and testing on:
dft <- data.frame(y=rnorm(100), i=sample(countries, 100, replace=TRUE),
                  j=sample(countries, 100, replace=TRUE))
# Removed the self-pairs (rows where i == j) with:
dft <- dft[dft$i != dft$j, ]
# and it did not give proper answers.

This seems to give correct answers:
dft[countries] <- sapply(countries, function(y) apply(dft, 1,
                         function(x) x[2] == y | x[3] == y))

And those variables are handled in a reasonable manner by the R formula
parser:
> lm(y ~ AUT + BEL + GER, data=dft)

Call:
lm(formula = y ~ AUT + BEL + GER, data = dft)

Coefficients:
(Intercept)      AUTTRUE      BELTRUE      GERTRUE
     0.09192      0.15130     -0.29274           NA

--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
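The NA for GERTRUE above is a consequence of the design rather than a bug: after the rows with i == j are removed, every row has exactly two countries set, so AUT + BEL + GER equals 2 in every row, twice the intercept column, and lm() reports the last aliased coefficient as NA. The sketch below uses its own simulated data (set.seed(1), not David's draws) to check this and to show one way around it: numeric 0/1 dummies without an intercept. The numeric coding matters: with logical columns and no intercept, lm() codes the first logical term as a two-level factor (both FALSE and TRUE columns), which acts as an intercept again.

```r
set.seed(1)  # simulated data, not the draws from the thread
countries <- c("AUT", "BEL", "GER")
dft <- data.frame(y = rnorm(100),
                  i = sample(countries, 100, replace = TRUE),
                  j = sample(countries, 100, replace = TRUE),
                  stringsAsFactors = FALSE)
dft <- dft[dft$i != dft$j, ]  # keep only pairs of two different countries

# Numeric 0/1 dummies, one column per country
for (cc in countries) dft[[cc]] <- as.numeric(dft$i == cc | dft$j == cc)

# Every row has exactly two countries set ...
all(rowSums(dft[countries]) == 2)

# ... so intercept + three dummies has rank 3 with 4 columns
X <- model.matrix(~ AUT + BEL + GER, data = dft)
c(columns = ncol(X), rank = qr(X)$rank)

# Without the intercept there is one estimate per country, none of them NA
fit0 <- lm(y ~ 0 + AUT + BEL + GER, data = dft)
coef(fit0)
```

Keeping the intercept and leaving one country out as the reference is the other common choice; each remaining country coefficient is then the difference from the omitted country's coefficient in the no-intercept fit.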
