thr3ads.net - R help - [R] creating a derived variable in a data frame [Oct 2005]

Avram Aelony

2005-Oct-20 00:09 UTC

[R] creating a derived variable in a data frame

Hello,

I have read through the manuals and can't seem to find an answer.

I have a categorical, character variable that has hundreds of values.  I want to
group the existing values of this variable into a new, derived (categorical)
variable by applying conditions to the values in the data.

For example, suppose I have a data frame with variables: date, country, x, y,
and z.

x,y,z are numeric and country is a 2-digit character string.  I want to create a
new derived variable named "continent" that would also exist in the
data frame. The Continent variable would have values of "Asia",
"Europe", "North America", etc...

How would this best be done for a large dataset (>10MB) ?  
I have tried many variations on the following without success (note in a real
example I would have a longer list of countries and continent values):
> mydata$continent <- mydata[ mydata$country==list('US','CA','MX'), ] -> "North America"
I have read about factors, but I am not sure how they apply here.  

Can anyone help me with the syntax?  I am sure it is trivial and a common thing
to do.
The ultimate goal is to compute percentages of x by continent.

Thanks for any help in advance.

-Avram
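
For the stated end goal (percentages of x by continent), here is a minimal sketch in base R, assuming a continent column already exists. The toy values are invented for illustration, and "percentage" is read here as each continent's share of the total of x, which may not be the intended definition.

```r
# Toy data; column names follow the question, values are made up.
mydata <- data.frame(
  country   = c("US", "CA", "FR", "DE", "MX"),
  continent = c("North America", "North America", "Europe", "Europe",
                "North America"),
  x         = c(10, 20, 30, 25, 15),
  stringsAsFactors = FALSE
)

# Sum x within each continent.
x_by_continent <- tapply(mydata$x, mydata$continent, sum)

# Express each continent's sum as a percentage of the overall total.
pct_by_continent <- 100 * x_by_continent / sum(x_by_continent)
pct_by_continent
```

tapply() works the same way whether continent is a character vector or a factor, so the choice of type for the derived variable does not matter for this step.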

ronggui

2005-Oct-20 01:00 UTC

I suggest you use the recode() function in the car package to do this.

2005-10-20

------
Department of Sociology
Fudan University

My new mail address is ronggui.huang at gmail.com
Blog:http://sociology.yculblog.com

Martin Henry H. Stevens

2005-Oct-20 14:47 UTC

Hi Avram-
How many countries do you have?
I would do it the following way because it is simple and I don't know
any better, even if it is absurdly painstaking.

#Step 1
mydata$continent <- factor(NA, levels=c("NoAm","Euro"))

#Steps 2 a-z
mydata$continent[mydata$country=="US" |
                 mydata$country=="CA" |
                 mydata$country=="MX"] <- "NoAm"

#Repeat for all countries and continents.
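
One possible way to avoid repeating step 2 by hand is to keep the groupings in a named list and loop over it, using %in% to test membership. This is only a sketch on invented toy data; the list name `groups` and the country codes in it are assumptions for illustration.

```r
# Toy data; "ZZ" is a made-up code that belongs to no group.
mydata <- data.frame(
  country = c("US", "FR", "CA", "DE", "MX", "ZZ"),
  x       = c(1, 2, 3, 4, 5, 6),
  stringsAsFactors = FALSE
)

# Hypothetical mapping: continent label -> country codes (extend as needed).
groups <- list(
  NoAm = c("US", "CA", "MX"),
  Euro = c("FR", "DE")
)

# Step 1: start with an all-NA factor that already knows its levels.
mydata$continent <- factor(NA, levels = names(groups))

# Step 2: fill in one continent at a time.
for (label in names(groups)) {
  mydata$continent[mydata$country %in% groups[[label]]] <- label
}

mydata
```

Countries that appear in no group keep NA, which makes unmapped codes easy to spot with table(mydata$continent, useNA = "ifany").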

Hank


Dr. Martin Henry H. Stevens, Assistant Professor
338 Pearson Hall
Botany Department
Miami University
Oxford, OH 45056

Office: (513) 529-4206
Lab: (513) 529-4262
FAX: (513) 529-4243
http://www.cas.muohio.edu/~stevenmh/
http://www.muohio.edu/ecology/
http://www.muohio.edu/botany/
"E Pluribus Unum"

Greg Snow

2005-Oct-20 15:37 UTC

>>>> "Martin Henry H. Stevens" <HStevens at MUOhio.edu> 10/20/05 08:47AM >>>
>Hi Avram-
>How many countries do you have?
>I would do it the following way because it is simple and I don't know
>any better, even if it is absurdly painstaking.
>
>#Step 1
>mydata$continent <- factor(NA, levels=c("NoAm","Euro"))
>
>#Steps 2 a-z
>mydata$continent[mydata$country=="US" |
>                 mydata$country=="CA" |
>                 mydata$country=="MX"] <- "NoAm"

A shorter alternative to the above is to use %in% like:

mydata$continent[ mydata$country %in% c("US","CA","MX") ] <- "NoAm"

You could also create a new data frame with 2 columns for the country and
corresponding continent, then merge this with your data (see ?merge).
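
The lookup-table idea might look roughly like this. It is a sketch on invented toy data, and the data frame name `lookup` is an assumption. A match()-based variant is included as well, since merge() sorts the result by the key column and so does not preserve the original row order.

```r
# Toy data standing in for the real data frame.
mydata <- data.frame(
  country = c("US", "FR", "CA", "DE", "MX"),
  x       = c(1, 2, 3, 4, 5),
  stringsAsFactors = FALSE
)

# Hypothetical two-column lookup: one row per country code.
lookup <- data.frame(
  country   = c("US", "CA", "MX", "FR", "DE"),
  continent = c("North America", "North America", "North America",
                "Europe", "Europe"),
  stringsAsFactors = FALSE
)

# all.x = TRUE keeps rows whose country is missing from the lookup
# (their continent becomes NA). Rows come back sorted by country.
merged <- merge(mydata, lookup, by = "country", all.x = TRUE)

# Alternative that keeps the original row order: index the lookup with match().
mydata$continent <- lookup$continent[match(mydata$country, lookup$country)]
```

With hundreds of country codes, the lookup could be kept in a separate file and read in, so the mapping is maintained in one place rather than spread across assignment statements.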

Greg Snow, Ph.D.
Statistical Data Center, LDS Hospital
Intermountain Health Care
greg.snow at ihc.com
(801) 408-8111

Johnson, Andrea

2005-Oct-20 16:21 UTC

Check out this website for a couple examples of how to use transform()
and replace() - (look under recode):
http://www.ku.edu/~pauljohn/R/Rtips.html
 
-Andrea 
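
As a rough illustration of how transform() and replace() can be combined for this kind of recoding (this sketch is not taken from the linked page, and the toy data are invented):

```r
# Toy data standing in for the real data frame.
mydata <- data.frame(
  country = c("US", "FR", "CA"),
  x       = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# Start from an all-NA character vector, then overwrite the positions
# whose country code falls in each group. replace(x, index, value)
# returns a copy of x with x[index] set to value.
mydata <- transform(
  mydata,
  continent = replace(
    replace(rep(NA_character_, length(country)),
            country %in% c("US", "CA", "MX"), "North America"),
    country %in% c("FR", "DE"), "Europe"
  )
)

mydata
```

Depending on the R version and its stringsAsFactors default, transform() may store the new column as a factor rather than a character vector; as.character(mydata$continent) gives the labels either way.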
 