thr3ads.net - R help - [R] Aggregate and select mode [Mar 2010]

If this information is useful, please help other people find it:
Share via:

Gabriel Yospin

2010-Mar-15 17:43 UTC

[R] Aggregate and select mode

Greetings Everyone -

I have a data frame "x" that looks like this:

v1   v2
1     A
1     B
1     B
2     B
2     W
2     W
3     D
3     D
3     Z

What I would like to do is create a new data frame, "y", that has one
row
for each unique value of v1, and returns the corresponding mode of v2.  If I
were to run it on the above data frame, it should therefore return:

v1   v2
1     B
2     W
3     D

I've been using the following code:

x <- data.frame(v1 = c(1,1,1,2,2,2,3,3,3), v2
c("A","B","B","B","W","W","D","D","Z"))
y <- aggregate.data.frame(x, by = list(x$var1), FUN = "Mode")

which relies on the Mode function from package prettyR.  The above code
works for me.

My problem comes when I use my real database.  Running this produces many
warnings, because there are multiple modes of v2 for many values of v1.  My
database is also rather large (~700,000 rows), and I'm wondering if there is
a faster way to get R to process these data.

Thank you for your help and consideration,

Gabriel Yospin
Center for Ecology and Evolutionary Biology
University of Oregon

	[[alternative HTML version deleted]]

Gabor Grothendieck

2010-Mar-15 21:32 UTC

head link

[R] Aggregate and select mode

This gives the first mode in each group:

# test data
xx <- structure(list(v1 = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 1),
  v2 = structure(c(1L, 2L, 2L, 2L, 4L, 4L, 3L, 3L, 5L, 1L),
    .Label = c("A", "B", "D", "W",
"Z"), class = "factor")),
  .Names = c("v1", "v2"), row.names = c(NA, 10L), class =
"data.frame")

# 1. first mode only in each group

Mode2 <- function(x) { tab <- table(x); names(tab)[which.max(tab)] }
aggregate(xx["v2"], xx["v1"], Mode2)

# 2. All modes in each group:

Mode3 <- function(x) { tab <- table(x); names(tab)[tab == max(tab)] }
do.call(rbind, by(xx, xx$v1, function(x) data.frame(v1 = x$v1[1], v2
Mode3(x$v2))))

# 3. Here is an SQL solution giving all modes in each group:

library(sqldf)
sqldf("select v1, v2 from xx a group by v1, v2 having count(*) >  
(select max(k) from
      (select v1, v2, count(*) as k from xx b where b.v1 = a.v1 group
by v1, v2) s)")

The inner select gets the counts for all groups having the same v1 as
the current v1 and the select surrounding it takes the largest of
those.  You could also check if replacing the >= with = makes any
difference to the time.

You could also try the above query with PostgreSQL. Just issue the
library statement below and then repeat the sqldf statement just
given. sqldf checks whether RpgSQL is loaded and if it is then it
automatically uses PostgreSQL instead of sqlite. Note that the first
time you execute sqldf with PostgreSQL in a session it will load java
so disregard the timing of the first run.

# 4. PostgreSQL
library(RpgSQL)
# repeat sqldf statement above



On Mon, Mar 15, 2010 at 1:43 PM, Gabriel Yospin <yosping at gmail.com>
wrote:> Greetings Everyone -
>
> I have a data frame "x" that looks like this:
>
> v1 ? v2
> 1 ? ? A
> 1 ? ? B
> 1 ? ? B
> 2 ? ? B
> 2 ? ? W
> 2 ? ? W
> 3 ? ? D
> 3 ? ? D
> 3 ? ? Z
>
> What I would like to do is create a new data frame, "y", that has
one row
> for each unique value of v1, and returns the corresponding mode of v2. ?If
I
> were to run it on the above data frame, it should therefore return:
>
> v1 ? v2
> 1 ? ? B
> 2 ? ? W
> 3 ? ? D
>
> I've been using the following code:
>
> x <- data.frame(v1 = c(1,1,1,2,2,2,3,3,3), v2 >
c("A","B","B","B","W","W","D","D","Z"))
> y <- aggregate.data.frame(x, by = list(x$var1), FUN = "Mode")
>
> which relies on the Mode function from package prettyR. ?The above code
> works for me.
>
> My problem comes when I use my real database. ?Running this produces many
> warnings, because there are multiple modes of v2 for many values of v1. ?My
> database is also rather large (~700,000 rows), and I'm wondering if
there is
> a faster way to get R to process these data.
>
> Thank you for your help and consideration,
>
> Gabriel Yospin
> Center for Ecology and Evolutionary Biology
> University of Oregon
>
> ? ? ? ?[[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

Seemingly Similar Threads

Search for more reasonably related threads

R help - Mar 2010 - Aggregate and select mode

[R] Aggregate and select mode

[R] Aggregate and select mode

Seemingly Similar Threads