thr3ads.net - R help - [R] group bunch of lines in a data.frame, an additional requirement [Sep 2006]

If this information is useful, please help other people find it:
Share via:

Emmanuel Levy

2006-Sep-13 16:38 UTC

[R] group bunch of lines in a data.frame, an additional requirement

Thanks for pointing me out "aggregate", that works fine!

There is one complication though: I have mixed types (numerical and character),

So the matrix is of the form:

A 1.0 200 ID1
A 3.0 800 ID1
A 2.0 200 ID1
B 0.5 20   ID2
B 0.9 50   ID2
C 5.0 70   ID1

One letter always has the same ID but one ID can be shared by many
letters (like ID1)

I just want to keep track of the ID, and get a matrix like:

A 2.0 400 ID1
B 0.7 35 ID2
C 5.0 70 ID1

Any idea on how to do that without a loop?

  Many thanks,

     Emmanuel

On 9/12/06, Emmanuel Levy <emmanuel.levy at gmail.com>
wrote:> Hello,
>
> I'd like to group the lines of a matrix so that:
> A 1.0 200
> A 3.0 800
> A 2.0 200
> B 0.5 20
> B 0.9 50
> C 5.0 70
>
> Would give:
> A 2.0 400
> B 0.7 35
> C 5.0 70
>
> So all lines corresponding to a letter (level), become a single line
> where all the values of each column are averaged.
>
> I've done that with a loop but it doesn't sound right (it is very
> slow). I imagine there is a
> sort of "apply" shortcut but I can't figure it out.
>
> Please note that it is not exactly a matrix I'm using, the function
> "typeof" tells me it's a list, however I access to it like it
was a
> matrix.
>
> Could someone help me with the right function to use, a help topic or
> a piece of code?
>
> Thanks,
>
>   Emmanuel
>

Marc Schwartz (via MN)

2006-Sep-13 17:33 UTC

head link

[R] group bunch of lines in a data.frame, an additional requirement

Try something like this:

# Initial data frame> DF  V1  V2  V3  V4
1  A 1.0 200 ID1
2  A 3.0 800 ID1
3  A 2.0 200 ID1
4  B 0.5  20 ID2
5  B 0.9  50 ID2
6  C 5.0  70 ID1


# Now do the aggregation to get the means
DF.1 <- aggregate(DF[, 2:3], list(V1 = DF$V1), mean)

> DF.1  V1  V2  V3
1  A 2.0 400
2  B 0.7  35
3  C 5.0  70


# Now get the unique combinations of letters and IDs in DF
DF.U <- unique(DF[, c("V1", "V4")])
> DF.U  V1  V4
1  A ID1
4  B ID2
6  C ID1


# Now merge the two data frames together, matching the letters
DF.NEW <- merge(DF.1, DF.U, by = "V1")
> DF.NEW  V1  V2  V3  V4
1  A 2.0 400 ID1
2  B 0.7  35 ID2
3  C 5.0  70 ID1


See ?unique and ?merge for more information.

Also, for the sake of clarification, these are not matrices, but data
frames. A matrix may contain only one data type, whereas data frames are
specifically designed to contain multiple data types as you have here.

HTH,

Marc Schwartz

On Wed, 2006-09-13 at 17:38 +0100, Emmanuel Levy wrote:> Thanks for pointing me out "aggregate", that works fine!
> 
> There is one complication though: I have mixed types (numerical and
character),
> 
> So the matrix is of the form:
> 
> A 1.0 200 ID1
> A 3.0 800 ID1
> A 2.0 200 ID1
> B 0.5 20   ID2
> B 0.9 50   ID2
> C 5.0 70   ID1
> 
> One letter always has the same ID but one ID can be shared by many
> letters (like ID1)
> 
> I just want to keep track of the ID, and get a matrix like:
> 
> A 2.0 400 ID1
> B 0.7 35 ID2
> C 5.0 70 ID1
> 
> Any idea on how to do that without a loop?
> 
>   Many thanks,
> 
>      Emmanuel
> 
> On 9/12/06, Emmanuel Levy <emmanuel.levy at gmail.com> wrote:
> > Hello,
> >
> > I'd like to group the lines of a matrix so that:
> > A 1.0 200
> > A 3.0 800
> > A 2.0 200
> > B 0.5 20
> > B 0.9 50
> > C 5.0 70
> >
> > Would give:
> > A 2.0 400
> > B 0.7 35
> > C 5.0 70
> >
> > So all lines corresponding to a letter (level), become a single line
> > where all the values of each column are averaged.
> >
> > I've done that with a loop but it doesn't sound right (it is
very
> > slow). I imagine there is a
> > sort of "apply" shortcut but I can't figure it out.
> >
> > Please note that it is not exactly a matrix I'm using, the
function
> > "typeof" tells me it's a list, however I access to it
like it was a
> > matrix.
> >
> > Could someone help me with the right function to use, a help topic or
> > a piece of code?
> >
> > Thanks,
> >
> >   Emmanuel
> >

Gabor Grothendieck

2006-Sep-13 17:44 UTC

head link

[R] group bunch of lines in a data.frame, an additional requirement

See below.

On 9/13/06, Emmanuel Levy <emmanuel.levy at gmail.com>
wrote:> Thanks for pointing me out "aggregate", that works fine!
>
> There is one complication though: I have mixed types (numerical and
character),
>
> So the matrix is of the form:
>
> A 1.0 200 ID1
> A 3.0 800 ID1
> A 2.0 200 ID1
> B 0.5 20   ID2
> B 0.9 50   ID2
> C 5.0 70   ID1
>
> One letter always has the same ID but one ID can be shared by many
> letters (like ID1)
>
> I just want to keep track of the ID, and get a matrix like:
>
> A 2.0 400 ID1
> B 0.7 35 ID2
> C 5.0 70 ID1
>
> Any idea on how to do that without a loop?
If V4 is a function of V1 then you can aggregate by it too and it will
appear but have no effect on the classification:
> aggregate(DF[2:3], DF[c(1,4)], mean)  V1  V4  V2  V3
1  A ID1 2.0 400
2  C ID1 5.0  70
3  B ID2 0.7  35

>
>  Many thanks,
>
>     Emmanuel
>
> On 9/12/06, Emmanuel Levy <emmanuel.levy at gmail.com> wrote:
> > Hello,
> >
> > I'd like to group the lines of a matrix so that:
> > A 1.0 200
> > A 3.0 800
> > A 2.0 200
> > B 0.5 20
> > B 0.9 50
> > C 5.0 70
> >
> > Would give:
> > A 2.0 400
> > B 0.7 35
> > C 5.0 70
> >
> > So all lines corresponding to a letter (level), become a single line
> > where all the values of each column are averaged.
> >
> > I've done that with a loop but it doesn't sound right (it is
very
> > slow). I imagine there is a
> > sort of "apply" shortcut but I can't figure it out.
> >
> > Please note that it is not exactly a matrix I'm using, the
function
> > "typeof" tells me it's a list, however I access to it
like it was a
> > matrix.
> >
> > Could someone help me with the right function to use, a help topic or
> > a piece of code?
> >
> > Thanks,
> >
> >   Emmanuel
> >
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

Emmanuel Levy

2006-Sep-15 14:32 UTC

head link

[R] group bunch of lines in a data.frame, an additional requirement

(re)-Hello
I actually thought about another possibility with a "1" column, a sum
(instead of a mean), and a division of the columns for which I want
the mean:
> DF = data.frame(
V1=c("A","A","A","B","B","C")
, V2=c(1,3,2,0.5,0.9,5.0), V3=c(200,800,200,20,50,70),
V4=c("ID1","ID1","ID1","ID2","ID2","ID3"))
> DF2 = cbind(DF,o=rep(1,length(DF[,1])))
> DF3 = aggregate(DF2[,c(2,3,5)], data.frame(code=DF2[,1],id=DF2[,4]), sum,
na.rm=T)
> DF3[,3:4]=DF3[,3:4]/DF3[,5]
> DF3  code  id  V2  V3 o
1    A ID1 2.0 400 3
2    B ID2 0.7  35 2
3    C ID3 5.0  70 1

Thanks again,

  Best,

   Emmanuel

On 9/15/06, Emmanuel Levy <emmanuel.levy at gmail.com>
wrote:> Dear Mark, dear Gabor,
>
> Thanks again for your help! It is great to be able to get answers when
> no-one (can you believe this?) uses R in your lab.
>
> The package doBy looks really convenient for many puposes.
>
> Best,
>
>    Emmanuel
>
> On 9/15/06, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
> > Here are three different ways to do it:
> >
> > # base R
> > fb <- function(x)
> >    c(V1 = x$V1[1], V4 = x$V4[1], V2.mean = mean(x$V2),
> >      V3.mean = mean(x$V3), n = length(x$V1))
> > do.call(rbind, by(DF, DF[c(1,4)], fb))
> >
> > # package doBy
> > library(doBy)
> > summaryBy(V2 + V3 ~ V1 + V4, DF, FUN = c(mean, length))[,-5]
> >
> > # package reshape
> > library(reshape)
> > f <- function(x) c(mean = mean(x), n = length(x))
> > cast(melt(DF, id = c(1,4)), V1 + V4 ~ variable, fun.aggregate =
f)[,-6]
> >
> > ----
> >
> > > # base R
> > > fb <- function(x)
> > +    c(V1 = x$V1[1], V4 = x$V4[1], V2.mean = mean(x$V2),
> > +      V3.mean = mean(x$V3), n = length(x$V1))
> > > do.call(rbind, by(DF, DF[c(1,4)], fb))
> >      V1 V4 V2.mean V3.mean n
> > [1,]  1  1     2.0     400 3
> > [2,]  3  1     5.0      70 1
> > [3,]  2  2     0.7      35 2
> > >
> > > # package doBy
> > > library(doBy)
> > > summaryBy(V2 + V3 ~ V1 + V4, DF, FUN = c(mean, length))[,-5]
> >   V1  V4 mean.V2 mean.V3 length.V3
> > 1  A ID1     2.0     400         3
> > 2  C ID1     5.0      70         1
> > 3  B ID2     0.7      35         2
> > >
> > > # package reshape
> > > library(reshape)
> > > f <- function(x) c(mean = mean(x), n = length(x))
> > > cast(melt(DF, id = c(1,4)), V1 + V4 ~ variable, fun.aggregate =
f)[,-6]
> >   V1  V4 V2_mean V2_n V3_mean
> > 1  A ID1     2.0    3     400
> > 2  B ID2     0.7    2      35
> > 3  C ID1     5.0    1      70
> >
> >
> >
> >
> >
> >
> > ---
> >
> > > library(doBy)
> > > summaryBy(V2 + V3 ~ V1 + V4, DF, FUN = c(mean, length))[,-5]
> >   V1  V4 mean.V2 mean.V3 length.V3
> > 1  A ID1     2.0     400         3
> > 2  C ID1     5.0      70         1
> > 3  B ID2     0.7      35         2
> > >
> > > library(reshape)
> > > f <- function(x) c(mean = mean(x), n = length(x))
> > > cast(melt(DF, id = c(1,4)), V1 + V4 ~ variable, fun.aggregate =
f)[,-6]
> >   V1  V4 V2_mean V2_n V3_mean
> > 1  A ID1     2.0    3     400
> > 2  B ID2     0.7    2      35
> > 3  C ID1     5.0    1      70
> >
> >
> >
> > On 9/14/06, Emmanuel Levy <emmanuel.levy at gmail.com> wrote:
> > > Thanks Gabor, that is much faster than using a loop!
> > >
> > > I've got a last question:
> > >
> > > Can you think of a fast way of keeping track of the number of
> > > observations collapsed for each entry?
> > >
> > > i.e. I'd like to end up with:
> > >
> > > A 2.0 400 ID1 3 (3obs in the first matrix)
> > > B 0.7 35 ID2 2 (2obs in the first matrix)
> > > C 5.0 70 ID1 1 (1obs in the first matrix)
> > >
> > > Or is it required to use an temporary matrix that is merged
later? (As
> > > examplified by Mark in a previous email?)
> > >
> > > Thanks a lot for your help,
> > >
> > >  Emmanuel
> > >
> > > On 9/13/06, Gabor Grothendieck <ggrothendieck at gmail.com>
wrote:
> > > > See below.
> > > >
> > > > On 9/13/06, Emmanuel Levy <emmanuel.levy at gmail.com>
wrote:
> > > > > Thanks for pointing me out "aggregate", that
works fine!
> > > > >
> > > > > There is one complication though: I have mixed types
(numerical and character),
> > > > >
> > > > > So the matrix is of the form:
> > > > >
> > > > > A 1.0 200 ID1
> > > > > A 3.0 800 ID1
> > > > > A 2.0 200 ID1
> > > > > B 0.5 20   ID2
> > > > > B 0.9 50   ID2
> > > > > C 5.0 70   ID1
> > > > >
> > > > > One letter always has the same ID but one ID can be
shared by many
> > > > > letters (like ID1)
> > > > >
> > > > > I just want to keep track of the ID, and get a matrix
like:
> > > > >
> > > > > A 2.0 400 ID1
> > > > > B 0.7 35 ID2
> > > > > C 5.0 70 ID1
> > > > >
> > > > > Any idea on how to do that without a loop?
> > > >
> > > > If V4 is a function of V1 then you can aggregate by it too
and it will
> > > > appear but have no effect on the classification:
> > > >
> > > > > aggregate(DF[2:3], DF[c(1,4)], mean)
> > > >   V1  V4  V2  V3
> > > > 1  A ID1 2.0 400
> > > > 2  C ID1 5.0  70
> > > > 3  B ID2 0.7  35
> > > >
> > > >
> > > > >
> > > > >  Many thanks,
> > > > >
> > > > >     Emmanuel
> > > > >
> > > > > On 9/12/06, Emmanuel Levy <emmanuel.levy at
gmail.com> wrote:
> > > > > > Hello,
> > > > > >
> > > > > > I'd like to group the lines of a matrix so
that:
> > > > > > A 1.0 200
> > > > > > A 3.0 800
> > > > > > A 2.0 200
> > > > > > B 0.5 20
> > > > > > B 0.9 50
> > > > > > C 5.0 70
> > > > > >
> > > > > > Would give:
> > > > > > A 2.0 400
> > > > > > B 0.7 35
> > > > > > C 5.0 70
> > > > > >
> > > > > > So all lines corresponding to a letter (level),
become a single line
> > > > > > where all the values of each column are averaged.
> > > > > >
> > > > > > I've done that with a loop but it doesn't
sound right (it is very
> > > > > > slow). I imagine there is a
> > > > > > sort of "apply" shortcut but I can't
figure it out.
> > > > > >
> > > > > > Please note that it is not exactly a matrix
I'm using, the function
> > > > > > "typeof" tells me it's a list,
however I access to it like it was a
> > > > > > matrix.
> > > > > >
> > > > > > Could someone help me with the right function to
use, a help topic or
> > > > > > a piece of code?
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > >   Emmanuel
> > > > > >
> > > > >
> > > > > ______________________________________________
> > > > > R-help at stat.math.ethz.ch mailing list
> > > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > > PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> > > > > and provide commented, minimal, self-contained,
reproducible code.
> > > > >
> > > >
> > >
> >
>

Maybe Matching Threads

Search for more reasonably related threads

R help - Sep 2006 - group bunch of lines in a data.frame, an additional requirement

[R] group bunch of lines in a data.frame, an additional requirement

[R] group bunch of lines in a data.frame, an additional requirement

[R] group bunch of lines in a data.frame, an additional requirement

[R] group bunch of lines in a data.frame, an additional requirement

Maybe Matching Threads