thr3ads.net - R help - [R] aggregation question [Apr 2005]

Liaw, Andy

2005-Apr-15 14:38 UTC

[R] aggregation question

Is length(unique()) what you are looking for?

Andy
> From: Christoph Lehmann
> 
> Hi, I have a question concerning aggregation
> 
> (simple demo code: see below)
> 
> I have the data.frame
> 
>     id        meas date
> 1   a 0.637513747    1
> 2   a 0.187710063    2
> 3   a 0.247098459    2
> 4   a 0.306447690    3
> 5   b 0.407573577    2
> 6   b 0.783255085    2
> 7   b 0.344265082    3
> 8   b 0.103893068    3
> 9   c 0.738649586    1
> 10  c 0.614154037    2
> 11  c 0.949924371    3
> 12  c 0.008187858    4
> 
> When I want the sum of meas for each id, I do:
> 
>     aggregate(data$meas, list(id = data$id), sum)
> 
> If I want to know the number of meas(ures) for each id, I do, e.g.:
> 
>     aggregate(data$meas, list(id = data$id), length)
> 
> NOW: Is there a way to compute, for each id, the number of meas(ures)
> with distinct dates (e.g. using diff()),
> so that I get, e.g.:
> 
>    id x
> 1  a 3
> 2  b 2
> 3  c 4
> 
> 
> I am sure it must be possible
> 
> thanks for any (even short) hint
> 
> cheers
> Christoph
> 
> 
> 
> --------------
> data <- data.frame(c(rep("a", 4), rep("b", 4), rep("c", 4)),
>                     runif(12), c(1, 2, 2, 3, 2, 2, 3, 3, 1, 2, 3, 4))
> names(data) <- c("id", "meas", "date")
> 
> m <- aggregate(data$meas, list(id = data$id), sum)
> names(m) <- c("id", "cum.meas")
> 

Sundar Dorai-Raj

2005-Apr-15 14:48 UTC

[R] aggregation question

Christoph Lehmann wrote on 4/15/2005 9:51 AM:
> Hi I have a question concerning aggregation

How about:

m <- aggregate(data["date"], data["id"],
                function(x) length(unique(x)))

--sundar
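
A minimal, reproducible sketch of the length(unique()) suggestion above. It is not part of the original post. It assumes a fixed copy of the example data (the meas values are copied from the printed data frame rather than drawn with runif()); the names n_dates and n_dates_vec are hypothetical, chosen only for illustration.

```r
# Fixed copy of the thread's example data, so the result is reproducible.
data <- data.frame(
  id   = rep(c("a", "b", "c"), each = 4),
  meas = c(0.637513747, 0.187710063, 0.247098459, 0.306447690,
           0.407573577, 0.783255085, 0.344265082, 0.103893068,
           0.738649586, 0.614154037, 0.949924371, 0.008187858),
  date = c(1, 2, 2, 3, 2, 2, 3, 3, 1, 2, 3, 4)
)

# Count the distinct dates within each id.
n_dates <- aggregate(data["date"], data["id"],
                     function(x) length(unique(x)))
names(n_dates) <- c("id", "x")   # match the column names asked for
print(n_dates)

# The same counts as a named vector, using tapply() instead of aggregate().
n_dates_vec <- tapply(data$date, data$id, function(x) length(unique(x)))
print(n_dates_vec)
```

For this data the counts are 3, 2 and 4 for ids a, b and c, which matches the output requested in the question.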

Christoph Lehmann

2005-Apr-15 14:51 UTC

[R] aggregation question

Hi, I have a question concerning aggregation

(simple demo code: see below)

I have the data.frame

    id        meas date
1   a 0.637513747    1
2   a 0.187710063    2
3   a 0.247098459    2
4   a 0.306447690    3
5   b 0.407573577    2
6   b 0.783255085    2
7   b 0.344265082    3
8   b 0.103893068    3
9   c 0.738649586    1
10  c 0.614154037    2
11  c 0.949924371    3
12  c 0.008187858    4

When I want the sum of meas for each id, I do:

    aggregate(data$meas, list(id = data$id), sum)

If I want to know the number of meas(ures) for each id, I do, e.g.:

    aggregate(data$meas, list(id = data$id), length)

NOW: Is there a way to compute, for each id, the number of meas(ures)
with distinct dates (e.g. using diff()),
so that I get, e.g.:

   id x
1  a 3
2  b 2
3  c 4


I am sure it must be possible

thanks for any (even short) hint

cheers
Christoph



--------------
data <- data.frame(c(rep("a", 4), rep("b", 4), rep("c", 4)),
                    runif(12), c(1, 2, 2, 3, 2, 2, 3, 3, 1, 2, 3, 4))
names(data) <- c("id", "meas", "date")

m <- aggregate(data$meas, list(id = data$id), sum)
names(m) <- c("id", "cum.meas")
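
The question mentions diff() as a possible tool. The following is only a sketch of how that idea could work, not code from the thread: sort the dates within each id, then count the positions where consecutive values change. The helper name n_distinct_diff and the object res are hypothetical.

```r
# Count distinct values with diff(): after sorting, every non-zero
# difference marks a new value; add one for the first value.
n_distinct_diff <- function(x) {
  if (length(x) == 0) return(0L)
  sum(diff(sort(x)) != 0) + 1L
}

# id and date columns of the example data (meas is not needed here).
id   <- rep(c("a", "b", "c"), each = 4)
date <- c(1, 2, 2, 3, 2, 2, 3, 3, 1, 2, 3, 4)

res <- aggregate(date, list(id = id), n_distinct_diff)
print(res)
```

This gives 3, 2 and 4 for ids a, b and c. The sort is what makes it robust: without it, diff() would only detect changes between neighbouring rows, so a date that recurs later in the same id would be counted twice.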

Liaw, Andy

2005-Apr-15 19:42 UTC

[R] aggregation question

If I understood you correctly, here's one way:
> sumWO2 <- sapply(split(dat, dat$id), function(d) sum(d$meas[d$date != 2]))
> sumWO2
        a         b         c 
0.9439614 0.4481582 1.6967618 

Andy
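
A self-contained sketch of the conditional sum above, added for illustration and not taken from the thread. It assumes dat is a fixed copy of the example data frame; sum_wo2 and sum_wo2_agg are hypothetical names. The second variant, which filters the rows first and then uses the formula interface of aggregate(), is an alternative and not something proposed in the reply.

```r
# Fixed copy of the thread's example data.
dat <- data.frame(
  id   = rep(c("a", "b", "c"), each = 4),
  meas = c(0.637513747, 0.187710063, 0.247098459, 0.306447690,
           0.407573577, 0.783255085, 0.344265082, 0.103893068,
           0.738649586, 0.614154037, 0.949924371, 0.008187858),
  date = c(1, 2, 2, 3, 2, 2, 3, 3, 1, 2, 3, 4)
)

# split() + sapply(): each piece d is a data frame with all columns,
# so the sum over meas can depend on date.
sum_wo2 <- sapply(split(dat, dat$id),
                  function(d) sum(d$meas[d$date != 2]))
print(sum_wo2)

# Alternative: drop the unwanted rows first, then aggregate as usual.
sum_wo2_agg <- aggregate(meas ~ id, data = dat[dat$date != 2, ], FUN = sum)
print(sum_wo2_agg)
```

The two variants differ for an id whose rows all have date 2: the split() version still returns that id with a sum of 0, while the filter-first version drops it from the result.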

> From: Christoph Lehmann
> 
> Dear Sundar, dear Andy,
> 
> many thanks for the length(unique(x)) hint. It solves my problem in a
> very elegant way. Just out of curiosity (or for potential future
> problems): how could I solve it in a conceptually different way, namely
> so that the computation on 'meas' depends on the variable 'date'? That
> is, the computation on a variable x in the function passed to aggregate
> is conditional on the value of another variable y. I hope you understand
> what I mean; let's think of an example:
> 
> E.g. for the example data.frame from the original post, the sum shall be
> taken over the variable meas only for entries with a corresponding
> 'date' != 2.
> 
> For this, do I have to nest two aggregate statements, or is there a way
> using sapply or similar apply-based commands?
> 
> Thanks a lot for your kind help.
> 
> Cheers!
> 
> Christoph
> 

Liaw, Andy

2005-Apr-16 02:15 UTC

[R] aggregation question

> From: Christoph Lehmann
> 
> Great, Andy! Thanks a lot. I didn't know split.
> So 'split' can be used as an alternative to 'aggregate', with the
> advantage that the self-defined function passed in can consider more
> than one variable of the to-be-aggregated data.frame?

split() only splits the data frame into a list of data frames, according to
the variable supplied as the second argument.  You can then use
sapply()/lapply() to apply the same operation to each piece, where each
piece contains all the variables.

Andy
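
To make the description of split() concrete, here is a small sketch that inspects what it returns. It is an illustration under assumptions, not code from the thread: dat is a fixed copy of the example data, and pieces and latest_meas are hypothetical names. The per-piece computation (mean of meas on each id's latest date) is an invented example of using two columns at once.

```r
# Fixed copy of the thread's example data.
dat <- data.frame(
  id   = rep(c("a", "b", "c"), each = 4),
  meas = c(0.637513747, 0.187710063, 0.247098459, 0.306447690,
           0.407573577, 0.783255085, 0.344265082, 0.103893068,
           0.738649586, 0.614154037, 0.949924371, 0.008187858),
  date = c(1, 2, 2, 3, 2, 2, 3, 3, 1, 2, 3, 4)
)

# split() returns a named list with one data frame per id.
pieces <- split(dat, dat$id)
print(names(pieces))          # one list element per level of id
print(sapply(pieces, nrow))   # rows in each piece
print(pieces$b)               # a piece keeps all columns: id, meas, date

# Because each piece has every column, the applied function can combine
# them, e.g. the mean of meas on the latest date of each id.
latest_meas <- sapply(pieces,
                      function(d) mean(d$meas[d$date == max(d$date)]))
print(latest_meas)
```

aggregate() called with a single column passes only that column to the function, which is why the split()/sapply() pattern is the more convenient one when the computation needs several variables.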

