Francesco Sarracino
2012-Dec-10 14:33 UTC
[R] equivalent of group command of the egen function in Stata
Dear R listers,
I am trying to create a new variable that uniquely identifies groups of
observations in a dataset. So far I couldn't figure out how to do this in
R. In Stata I would simply type:
egen newvar = group(dim1, dim2, dim3)
Please find below a quick example to show what I am dealing with:
I have a dataset with 4 variables:
var <- runif(50) ## a variable that I want to group
## 3 variables that should form the groups
dim1 <- factor(rep(1:3, length.out= 50), labels = c("x","y","z") )
dim2 <- rep(1:2, length.out= 50)
dim3 <- rep(1:5, length.out= 50)
data <- data.frame(var, dim1, dim2, dim3)
I am trying to build a fifth one (let's say: group_id) to uniquely identify
groups of observations as defined by dim1, dim2 and dim3, i.e. 30 groups.
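The count of 30 groups can be checked directly. This is a minimal sketch that rebuilds the example data (the `set.seed(1)` call is an addition, only so the random `var` column is reproducible) and counts the distinct dim1/dim2/dim3 rows with `unique()`. Because the three sequences repeat every 3, 2 and 5 rows, all 3 × 2 × 5 = 30 combinations already occur within the first 30 rows.

```r
# Rebuild the example data from the question (set.seed() added for reproducibility)
set.seed(1)
var  <- runif(50)
dim1 <- factor(rep(1:3, length.out = 50), labels = c("x", "y", "z"))
dim2 <- rep(1:2, length.out = 50)
dim3 <- rep(1:5, length.out = 50)
data <- data.frame(var, dim1, dim2, dim3)

# Count the distinct combinations of the three grouping variables
n_groups <- nrow(unique(data[c("dim1", "dim2", "dim3")]))
print(n_groups)  # 30
```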
Can you please help me figure out how to do it?
Thanks in advance,
f.
--
Francesco Sarracino, Ph.D.
https://sites.google.com/site/fsarracino/
Ista Zahn
2012-Dec-10 14:58 UTC
[R] equivalent of group command of the egen function in Stata
Hi,
On Mon, Dec 10, 2012 at 9:33 AM, Francesco Sarracino <f.sarracino at gmail.com> wrote:
> I am trying to create a new variable that uniquely identifies groups of
> observations in a dataset. So far I couldn't figure out how to do this in
> R. In Stata I would simply type:
> egen newvar = group(dim1, dim2, dim3)
A rough equivalent is
dat$group <- with(dat, interaction(dim1, dim2, dim3))
The differences between this and the Stata command are that the result in R is a factor rather than numeric, and the default ordering is different.
Best,
Ista
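Both differences can be narrowed with arguments that `interaction()` already has. This is a sketch, assuming Stata's `group()` numbers the combinations that occur in the data as 1, 2, ... in the sort order of dim1, then dim2, then dim3. `as.integer()` turns the factor into numbers, `lex.order = TRUE` makes the first variable vary slowest, and `drop = TRUE` gives no number to combinations absent from the data. The data frame is rebuilt as `dat` with an added `set.seed(1)`.

```r
set.seed(1)
dat <- data.frame(
  var  = runif(50),
  dim1 = factor(rep(1:3, length.out = 50), labels = c("x", "y", "z")),
  dim2 = rep(1:2, length.out = 50),
  dim3 = rep(1:5, length.out = 50)
)

# lex.order = TRUE: dim1 varies slowest, then dim2, then dim3 (sort order)
# drop = TRUE: only combinations present in the data receive a number
dat$group <- with(dat, as.integer(interaction(dim1, dim2, dim3,
                                              drop = TRUE, lex.order = TRUE)))

head(dat[c("dim1", "dim2", "dim3", "group")])
```

With these settings the numbers agree with the `as.numeric(factor(paste0(...)))` approach later in the thread, since both sort the combinations with dim1 first.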
Hi,
Try this:
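dat1 <- data  #changed data to dat1
# split the rows into one data frame per dim1/dim2/dim3 combination
list1<-split(dat1,list(dat1$dim1,dat1$dim2,dat1$dim3))
# number the combinations 1, 2, ..., 30
names(list1)<-1:length(list1)
# stack the pieces again, tagging each row with the number of its piece
res<-do.call(rbind,lapply(seq_along(list1),function(i)
 data.frame(list1[[i]],group=names(list1)[i])))
row.names(res)<-1:nrow(res)
head(res)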
#         var dim1 dim2 dim3 group
#1 0.06896418    x    1    1     1
#2 0.44958942    x    1    1     1
#3 0.08163725    y    1    1     2
#4 0.21945238    y    1    1     2
#5 0.05695142    z    1    1     3
#6 0.36656387    x    2    1     4
A.K.
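The `split()`/`rbind()` route returns the rows sorted by group. If the original row order matters, a small variation (a sketch, not part of the reply above) splits the row positions instead of the data frame and writes each group's number back in place. The numbering is the same as in the output above, because `split()` orders the pieces identically in both cases. The data are rebuilt with an added `set.seed(1)`.

```r
set.seed(1)
dat1 <- data.frame(
  var  = runif(50),
  dim1 = factor(rep(1:3, length.out = 50), labels = c("x", "y", "z")),
  dim2 = rep(1:2, length.out = 50),
  dim3 = rep(1:5, length.out = 50)
)

# One integer vector of row positions per dim1/dim2/dim3 combination
idx <- split(seq_len(nrow(dat1)), list(dat1$dim1, dat1$dim2, dat1$dim3))

# Write the group number back into the original rows; row order is unchanged
dat1$group <- NA_integer_
for (g in seq_along(idx)) {
  dat1$group[idx[[g]]] <- g
}

head(dat1)
```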
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Hi,
Maybe this also helps:
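# paste the three dimensions into one key per row; factor() sorts the keys and as.numeric() numbers them
dat2<-within(dat1,{group<-as.numeric(factor(paste0(dim1,dim2,dim3)))})
head(dat2)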
#        var dim1 dim2 dim3 group
#1 0.5366483    x    1    1     1
#2 0.3081562    y    2    2    17
#3 0.1493687    z    1    3    23
#4 0.3202687    x    2    4     9
#5 0.1177976    y    1    5    15
#6 0.7709756    z    2    1    26
A.K.
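One caveat with `paste0()`: it joins the values with no separator. That is safe in this example because every dim1, dim2 and dim3 value is a single character. Once values have several characters, however, two different combinations can produce the same key. A sketch with made-up values (hypothetical, for illustration only) shows the collision and how `paste(..., sep = "_")` avoids it. The `"_"` separator is an arbitrary choice and must not itself occur in the data.

```r
# Hypothetical values: the pairs ("1", "12") and ("11", "2") are different groups
a <- c("1", "11")
b <- c("12", "2")

key_nosep <- paste0(a, b)           # both keys become "112"
key_sep   <- paste(a, b, sep = "_") # "1_12" and "11_2" stay distinct

grp_nosep <- as.numeric(factor(key_nosep))
grp_sep   <- as.numeric(factor(key_sep))

print(length(unique(grp_nosep)))  # 1: the two groups were merged
print(length(unique(grp_sep)))    # 2: the two groups are kept apart
```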