thr3ads.net - R help - [R] data frames; matching/merging [Feb 2010]

Jonathan

2010-Feb-08 16:39 UTC

[R] data frames; matching/merging

Hi all,
    I feel a little guilty asking this question, since I've already
written a solution using a rather clunky for loop that gets the job
done.  But I'm convinced there must be a faster (and probably more
elegant) way to accomplish what I'm looking to do (perhaps using the
"merge" function?).  I figured somebody out there might have already
worked this out:

I have a dataframe with two columns (let's call them V1 and V2).  All
rows are unique, although column V1 has several redundant entries.

Ex:

  V1 V2
1  a  3
2  a  2
3  b  9
4  c  4
5  a  7
6  b 11


What I'd like is to return a dataframe cut down to have only unique
entries in V1.  For each V1, V2 should contain the minimum of all the
V2 values that share that V1.

Example output:

  V1 V2
1  a  2
2  b  9
3  c  4


If somebody could (relatively easily) figure out how to get closer to
a solution, I'd appreciate hearing how.  Also, I'd be interested to
hear how you came upon the answer (so I can get better at searching
the R resources myself).

Regards,
Jonathan

jim holtman

2010-Feb-08 16:49 UTC

> x <- read.table(textConnection("    V1     V2
+ 1    a        3
+ 2    a        2
+ 3    b        9
+ 4    c        4
+ 5    a        7
+ 6    b        11"), header=TRUE)
> closeAllConnections()
> # close; matrix with rownames - easy enough to change into a dataframe if you want
> cbind(tapply(x$V2, x$V1, min))
  [,1]
a    2
b    9
c    4

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

Ivan Calandra

2010-Feb-08 16:51 UTC

Hi!

I'm definitely not an expert in R (and it's my first reply!), but if I 
understand right, I think the aggregate function might do what you're 
looking for.
Try ?aggregate to get more info. You might find what you need!

HTH
Ivan
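
For anyone who wants to try the aggregate suggestion directly, here is a minimal sketch. It rebuilds the example data from the original post with data.frame(); the names DF and result are only illustrative and are not from the post above.

```r
# Rebuild the example data from the original post.
DF <- data.frame(V1 = c("a", "a", "b", "c", "a", "b"),
                 V2 = c(3, 2, 9, 4, 7, 11))

# Group the V2 column by V1 and keep the minimum of each group.
# Passing one-column data frames keeps the column names V1 and V2.
result <- aggregate(DF["V2"], by = DF["V1"], FUN = min)
print(result)
```

The result has one row per distinct V1, which should match the example output requested in the original post.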




David Winsemius

2010-Feb-08 16:51 UTC

> rd.txt
function(txt, header=TRUE, ...) {
    rd <- read.table(textConnection(txt), header=header, ...)
    closeAllConnections()
    rd
}
> DF <- rd.txt("    V1     V2
+ 1    a        3
+ 2    a        2
+ 3    b        9
+ 4    c        4
+ 5    a        7
+ 6    b        11
+ ")
> tapply(DF$V2, DF$V1, min)
a b c
2 9 4
> as.data.frame.table(tapply(DF$V2, DF$V1, min))
  Var1 Freq
1    a    2
2    b    9
3    c    4
> DF2 <- as.data.frame.table(tapply(DF$V2, DF$V1, min))
> names(DF2) <- names(DF)
> DF2
  V1 V2
1  a  2
2  b  9
3  c  4

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
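
A note for readers on current versions of R: read.table() accepts a text= argument, so a helper like rd.txt is probably not needed any more. The following is only a sketch of the same tapply approach under that assumption; the names mins and DF2 are illustrative.

```r
# Read the example table straight from a string. The `text =` argument
# stands in for the textConnection()/closeAllConnections() pair.
DF <- read.table(text = "
  V1 V2
1  a  3
2  a  2
3  b  9
4  c  4
5  a  7
6  b 11
", header = TRUE)

# Minimum of V2 within each level of V1 (a named one-dimensional array).
mins <- tapply(DF$V2, DF$V1, min)

# Turn the named result back into a two-column data frame.
DF2 <- data.frame(V1 = names(mins), V2 = as.vector(mins))
print(DF2)
```

Building the data frame from names(mins) avoids the Var1/Freq column names that as.data.frame.table() produces, so no renaming step is needed.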

hadley wickham

2010-Feb-08 16:53 UTC

With the plyr package:

library(plyr)
ddply(mydf, "V1", summarise, V2 = min(V2))

Hadley


-- 
http://had.co.nz/

S Ellison

2010-Feb-08 16:59 UTC

You could try aggregate:

If we call your data frame df:

aggregate(df[2], by=df[1], FUN=min)

will get you what you asked for (if not necessarily what you need ;-) )

Switching the columns around is easy enough if you need to; proceeding
stepwise:
df.new<-aggregate(df[2], by=df[1], FUN=min)
df.new[,c(2,1)]

As to how I found aggregate: watching R-help daily for years
occasionally pops up fundamental gems like aggregate...

 Steve Ellison
LGC
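
On the question of how to find such functions without years of list reading, R's built-in search tools may help. This is a small sketch, not something taken from the replies above; the search strings are only examples.

```r
# List attached objects whose names contain "aggregate".
hits <- apropos("aggregate")
print(hits)

# Interactive follow-ups (these open the help system or a web browser):
#   ?aggregate                       # help page for a function you know by name
#   help.search("aggregate")         # search installed help pages (??aggregate is shorthand)
#   RSiteSearch("minimum by group")  # online search of R documentation
```

apropos() only matches object names, so it helps most once you can guess part of the name; help.search() also looks at help page titles and keywords.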

Gabor Grothendieck

2010-Feb-08 17:11 UTC

Here are 4 solutions assuming DF contains the data frame:
> # 1. aggregate
> aggregate(DF[2], DF[1], min)
  V1 V2
1  a  2
2  b  9
3  c  4
> # 2. aggregate.formula - requires R 2.11.x
> aggregate(V2 ~ V1, DF, min)
  V1 V2
1  a  2
2  b  9
3  c  4
> # 3. SQL using sqldf
> library(sqldf)
> sqldf("select V1, min(V2) V2 from DF group by V1")
  V1 V2
1  a  2
2  b  9
3  c  4
> # 4. summaryBy in the doBy package
> library(doBy)
> summaryBy(V2 ~., DF, FUN = min, keep.names = TRUE)
  V1 V2
1  a  2
2  b  9
3  c  4
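
One more base R option, not among the four above and offered only as a sketch: sort by V1 and then V2, and keep the first row of each V1 group. The names sorted and res are illustrative. This variant may be handy when the data frame has further columns that should travel with the minimum row.

```r
# Example data from the original post.
DF <- data.frame(V1 = c("a", "a", "b", "c", "a", "b"),
                 V2 = c(3, 2, 9, 4, 7, 11))

# Sort so that the smallest V2 comes first within each V1 group.
sorted <- DF[order(DF$V1, DF$V2), ]

# duplicated() flags every repeat of a V1 value after its first
# occurrence, so negating it keeps exactly the minimum row per group.
res <- sorted[!duplicated(sorted$V1), ]
rownames(res) <- NULL
print(res)
```

Unlike aggregate, this returns whole rows of the original data frame rather than a per-column summary.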
