thr3ads.net - R help - [R] Grouping data [Jan 2008]


K. Elo

2008-Jan-16 20:06 UTC

[R] Grouping data

Hi,

I am quite new to R (but like it very much!), so please excuse me if 
this is too simple a question.

I have a large data frame containing data from a survey. Among other 
things, it records each respondent's age and education (a numeric code 
from 1 to 9). Now I would like to count how many respondents have each 
education level within different age groups (e.g. from 18 to 25, from 
25 to 35, etc.). How could I achieve this? (I have been thinking about 
using 'subset', but better ideas are welcome :) )

An example might clarify my point. Let's assume the following data:
#	age	edu
1	25	2
2	33	5
3	22	3
4	19	1
5	21	3
6	30	4
7	32	4
8	31	1

What I want to have is:

edu	18-25	25-35 ...
1	1	1
2	1	0
3	2	0
4	0	2
5	0	1
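To make the example reproducible, the sample data above could be entered as a data frame. This is a sketch added for convenience, not part of the original post; the name DF is an assumption.

```r
# Sample survey data from the question (8 respondents).
# The data frame name DF is an assumption chosen for illustration.
DF <- data.frame(
  age = c(25, 33, 22, 19, 21, 30, 32, 31),  # age in whole years
  edu = c(2, 5, 3, 1, 3, 4, 4, 1)           # education code (1-9)
)
str(DF)
```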

Thanks in advance & kind regards,
Kimmo

Andrew Robinson

2008-Jan-16 20:13 UTC

Hi Kimmo,

Try cut() to turn age into a factor whose levels are age ranges, and
then (among other options) table() to make the table.
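A minimal sketch of this suggestion, assuming the question's sample data sits in a data frame named DF (a hypothetical name). The labels argument of cut() is used here to get column headings like the ones in the question; include.lowest = TRUE and the "26-35" label assume whole-year ages.

```r
# Hypothetical data frame holding the question's sample data.
DF <- data.frame(
  age = c(25, 33, 22, 19, 21, 30, 32, 31),
  edu = c(2, 5, 3, 1, 3, 4, 4, 1)
)

# cut() converts the numeric age into a factor of age groups.
# include.lowest = TRUE makes the first interval [18,25], so an 18-year-old
# is counted; (25,35] is labelled "26-35", assuming whole-year ages.
DF$agegrp <- cut(DF$age, breaks = c(18, 25, 35),
                 include.lowest = TRUE, labels = c("18-25", "26-35"))

# table() cross-tabulates education against age group.
tab <- table(edu = DF$edu, age = DF$agegrp)
print(tab)
```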

Cheers

Andrew.

-- 
Andrew Robinson  
Department of Mathematics and Statistics            Tel: +61-3-8344-9763
University of Melbourne, VIC 3010 Australia         Fax: +61-3-8344-4599
http://www.ms.unimelb.edu.au/~andrewpr
http://blogs.mbs.edu/fishing-in-the-bay/

Marc Schwartz

2008-Jan-16 20:24 UTC

See ?cut, which will enable you to take a continuous vector and convert 
it into a factor based upon breakpoints. Combine this with ?table, which 
will give you a cross-tabulation. Something along the lines of the 
following, presuming that your data is in a data frame called 'DF':

 > with(DF, table(edu, cut(age, breaks = c(18, 25, 35))))

edu (18,25] (25,35]
   1       1       1
   2       1       0
   3       2       0
   4       0       2
   5       0       1

Note the default notation of the returned labels, which indicates whether 
each interval is open or closed at its breakpoints: "(" means the breakpoint 
is excluded and "]" means it is included, so (18,25] covers ages above 18 up 
to and including 25. This is covered in the help for cut(). Pay attention to 
the 'include.lowest' and 'right' arguments.
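To see what 'include.lowest' and 'right' change, here is a small sketch using hypothetical ages that sit exactly on the breakpoints 18, 25 and 35:

```r
# Illustrative ages placed exactly on the breakpoints.
ages <- c(18, 25, 35)

# Default right = TRUE: intervals (18,25] and (25,35];
# 18 falls outside every interval and becomes NA.
default_grp <- cut(ages, breaks = c(18, 25, 35))
print(default_grp)

# include.lowest = TRUE closes the first interval: [18,25] and (25,35].
lowest_grp <- cut(ages, breaks = c(18, 25, 35), include.lowest = TRUE)
print(lowest_grp)

# right = FALSE closes intervals on the left: [18,25) and [25,35);
# 25 moves to the second group and 35 becomes NA.
left_grp <- cut(ages, breaks = c(18, 25, 35), right = FALSE)
print(left_grp)
```

With the defaults, an 18-year-old would be silently dropped from the table, which is worth checking when the youngest group starts exactly at a breakpoint.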

Note also the 'trick' of using with() here, so that the column names are
evaluated within the *environment* of the data frame. See ?with for more 
information there.
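A sketch of what with() buys you, again assuming the sample data is in a data frame called DF: both calls below count the same thing, but with() lets you write bare column names, and table() can then reuse the symbol edu as a dimension name.

```r
DF <- data.frame(
  age = c(25, 33, 22, 19, 21, 30, 32, 31),
  edu = c(2, 5, 3, 1, 3, 4, 4, 1)
)

# Inside with(), 'edu' and 'age' are looked up as columns of DF.
t_with <- with(DF, table(edu, cut(age, breaks = c(18, 25, 35))))

# Equivalent call that refers to the columns explicitly with $.
t_dollar <- table(DF$edu, cut(DF$age, breaks = c(18, 25, 35)))

# Same counts; only the dimension names differ ("edu" vs. unnamed).
identical(as.vector(t_with), as.vector(t_dollar))
names(dimnames(t_with))
```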

HTH,

Marc Schwartz

John Kane

2008-Jan-17 14:44 UTC

You might want to have a look at the recode function
in the car package. By the way, I think you meant
26-35, not 25-35.
Example:
==================================================
xx <- data.frame(age = c(25, 33, 22, 19, 21, 30, 32, 31),
                 edu = c(2, 5, 3, 1, 3, 4, 4, 1))

library(car)

# recode() maps each age range to a group label
aa <- recode(xx$age, "18:25='A'; 26:35='B'") ; aa

# cross-tabulate education by age group
table(xx$edu, aa)
==================================================
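For comparison, a base-R sketch of the same grouping without the car package, using cut() with breaks chosen so that whole-year ages fall into 18-25 and 26-35. The extra 40-year-old respondent is hypothetical, added only to show that ages outside the breaks become NA, which table() drops unless useNA is set.

```r
# Sample data from the question plus one hypothetical out-of-range respondent (age 40).
xx <- data.frame(age = c(25, 33, 22, 19, 21, 30, 32, 31, 40),
                 edu = c(2, 5, 3, 1, 3, 4, 4, 1, 2))

# With the default right = TRUE the intervals are (17,25] and (25,35],
# i.e. ages 18-25 and 26-35 when ages are whole years.
aa <- cut(xx$age, breaks = c(17, 25, 35), labels = c("A", "B"))
aa

# The age-40 row is NA in 'aa', so the default table() leaves it out...
table(xx$edu, aa)

# ...while useNA = "ifany" adds an <NA> column so it is not lost silently.
table(xx$edu, aa, useNA = "ifany")
```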