On Jun 1, 2009, at 1:14 PM, Joseph Magagnoli wrote:
> Dear All,
> I am practicing data manipulation and I would like to generate a count
> variable. My data looks like this:
>
>
> Country MID
> 1 NA
> 1 0
> 1 0
> 1 1
> 1 0
> 2 0
> 2 1
> 2 0
> 2 0
> 2 0
>
> I would like to generate a variable that counts the periods of zeros in
> the MID variable for each country, for example:
> Country MID Count
> 1 NA # ya' gotta put something there
> 1 0 1
> 1 0 2
> 1 1 0
> 1 0 1
> 2 0 1
> 2 1 0
> 2 0 1
> 2 0 2
> 2 0 3
> I am used to doing my data manipulation in Stata but I want to try to
> learn to do it in R.
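The rle (run-length encoding) function is generally useful for such
problems. Pasting Country and MID together makes a new run start whenever
either one changes; the code then counts 1, 2, ... within each run and
subtracts MID, which zeroes the single MID == 1 rows. Having created a
data.frame, dd, with those elements: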
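dd <- data.frame(Country = rep(1:2, each = 5),
                 MID = c(NA, 0, 0, 1, 0, 0, 1, 0, 0, 0))
# runs of identical Country.MID labels, e.g. "1.0" "1.0" "1.1" ...
rledd <- rle(paste(dd$Country, dd$MID, sep = "."))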
as.vector(unlist(sapply(rledd$lengths, FUN=function(x) seq(1,x)))) - dd$MID
[1] NA 1 2 0 1 1 0 1 2 3
> dd$Count <- as.vector(unlist(sapply(rledd$lengths, FUN=function(x) seq(1,x)))) - dd$MID
> dd
Country MID Count
1 1 NA NA
2 1 0 1
3 1 0 2
4 1 1 0
5 1 0 1
6 2 0 1
7 2 1 0
8 2 0 1
9 2 0 2
10 2 0 3
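One caveat: subtracting MID only gives 0 when every run of MID == 1 has
length 1, as in the example data; two consecutive 1s would give 0 and then 1.
A sketch that does not rely on that, assuming MID is coded 0/1/NA and that
rows are already in time order within each country (count_zero_runs is a
hypothetical helper name, not an existing function):

```r
# Example data from the question
dd <- data.frame(Country = rep(1:2, each = 5),
                 MID = c(NA, 0, 0, 1, 0, 0, 1, 0, 0, 0))

# Hypothetical helper: count consecutive zeros in one country's MID vector.
# Assumes MID is 0/1/NA; the counter resets to 0 on any non-zero value,
# and NA rows stay NA.
count_zero_runs <- function(m) {
  is_zero <- !is.na(m) & m == 0
  r <- rle(is_zero)
  # 1, 2, ... within each run, multiplied by FALSE/TRUE to zero non-zero runs
  out <- sequence(r$lengths) * rep(r$values, r$lengths)
  out[is.na(m)] <- NA
  out
}

# ave() applies the helper separately within each Country
dd$Count <- ave(dd$MID, dd$Country, FUN = count_zero_runs)
dd
```

On the example data this reproduces the Count column from the question, and
count_zero_runs(c(0, 1, 1, 0)) gives 1 0 0 1 rather than 1 0 1 1.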
--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT