Hi all,

I feel a little guilty asking this question, since I've already written a solution using a rather clunky for loop that gets the job done. But I'm convinced there must be a faster (and probably more elegant) way to accomplish what I'm looking to do (perhaps using the "merge" function?). I figured somebody out there might've already figured this out:

I have a dataframe with two columns (let's call them V1 and V2). All rows are unique, although column V1 has several redundant entries.

Ex:

  V1 V2
1  a  3
2  a  2
3  b  9
4  c  4
5  a  7
6  b 11

What I'd like is to return a dataframe cut down to have only unique entries in V1. V2 should contain, for each V1, the minimum of all the possible choices from the set of redundant V1's.

Example output:

  V1 V2
1  a  2
2  b  9
3  c  4

If somebody could (relatively easily) figure out how to get closer to a solution, I'd appreciate hearing how. Also, I'd be interested to hear how you came upon the answer (so I can get better at searching the R resources myself).

Regards,
Jonathan
> x <- read.table(textConnection(" V1 V2
+ 1 a 3
+ 2 a 2
+ 3 b 9
+ 4 c 4
+ 5 a 7
+ 6 b 11"), header=TRUE)
> closeAllConnections()
> # close; matrix with rownames - easy enough to change into a dataframe if you want
> cbind(tapply(x$V2, x$V1, min))
  [,1]
a    2
b    9
c    4

On Mon, Feb 8, 2010 at 11:39 AM, Jonathan <jonsleepy at gmail.com> wrote:
> What I'd like is to return a dataframe cut down to have only unique
> entries in V1. V2 should contain a vector, for each V1, that is the
> minimum of all the possible choices from the set of redundant V1's.

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
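The reply above notes that the matrix is easy to turn into a data frame. A minimal sketch of one way to do that, assuming the sample data from the question; the names `mins` and `result` are illustrative and not from the thread:

```r
# Sample data from the original question
x <- data.frame(V1 = c("a", "a", "b", "c", "a", "b"),
                V2 = c(3, 2, 9, 4, 7, 11))

# Per-group minima; the result carries the group labels as names
mins <- tapply(x$V2, x$V1, min)

# Rebuild a two-column data frame from the labels and the values
result <- data.frame(V1 = names(mins), V2 = as.vector(mins),
                     stringsAsFactors = FALSE)
print(result)
```

Using as.vector() drops the array attributes that tapply() attaches, so V2 ends up as a plain numeric column.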
Hi!

I'm definitely not an expert in R (and it's my first reply!), but if I understand right, I think the aggregate function might do what you're looking for.

Try ?aggregate to get more info. You might find what you need!

HTH
Ivan

On 2/8/2010 17:39, Jonathan wrote:
> What I'd like is to return a dataframe cut down to have only unique
> entries in V1. V2 should contain a vector, for each V1, that is the
> minimum of all the possible choices from the set of redundant V1's.
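On the question of how to search the R resources: a small sketch of looking a function up from the console, assuming only a standard R installation with the stats package attached. The variable names `hits` and `arg_names` are illustrative:

```r
# List objects on the search path whose names start with "aggregate"
hits <- apropos("^aggregate")
print(hits)

# Inspect the argument names of the data frame method
arg_names <- names(formals(getS3method("aggregate", "data.frame")))
print(arg_names)

# In an interactive session, the help pages can then be opened with:
#   ?aggregate
#   help.search("aggregate")   # equivalently: ??aggregate
```

apropos() only sees packages that are already attached, while help.search() also covers installed packages that have not been loaded.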
On Feb 8, 2010, at 11:39 AM, Jonathan wrote:

> What I'd like is to return a dataframe cut down to have only unique
> entries in V1. V2 should contain a vector, for each V1, that is the
> minimum of all the possible choices from the set of redundant V1's.

> rd.txt <- function(txt, header=TRUE, ...) {
+     rd <- read.table(textConnection(txt), header=header, ...)
+     closeAllConnections()
+     rd
+ }
> DF <- rd.txt(" V1 V2
+ 1 a 3
+ 2 a 2
+ 3 b 9
+ 4 c 4
+ 5 a 7
+ 6 b 11
+ ")
> tapply(DF$V2, DF$V1, min)
a b c
2 9 4
> as.data.frame.table(tapply(DF$V2, DF$V1, min))
  Var1 Freq
1    a    2
2    b    9
3    c    4
> DF2 <- as.data.frame.table(tapply(DF$V2, DF$V1, min))
> names(DF2) <- names(DF)
> DF2
  V1 V2
1  a  2
2  b  9
3  c  4

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
On Mon, Feb 8, 2010 at 10:39 AM, Jonathan <jonsleepy at gmail.com> wrote:
> What I'd like is to return a dataframe cut down to have only unique
> entries in V1. V2 should contain a vector, for each V1, that is the
> minimum of all the possible choices from the set of redundant V1's.

With the plyr package:

  library(plyr)
  ddply(mydf, "V1", summarise, V2 = min(V2))

Hadley

--
http://had.co.nz/
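The plyr call above assumes the data already sits in a data frame named mydf. A self-contained sketch that builds mydf from the sample data and runs the same call; the fallback to base aggregate() when plyr is not installed is an addition for illustration and not part of the reply:

```r
# Sample data from the original question, stored under the name used above
mydf <- data.frame(V1 = c("a", "a", "b", "c", "a", "b"),
                   V2 = c(3, 2, 9, 4, 7, 11))

if (requireNamespace("plyr", quietly = TRUE)) {
  # Split by V1, compute the minimum of V2 in each piece, recombine
  out <- plyr::ddply(mydf, "V1", plyr::summarise, V2 = min(V2))
} else {
  # Assumed fallback so the sketch still runs without plyr
  out <- aggregate(V2 ~ V1, data = mydf, FUN = min)
}
print(out)
```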
You could try aggregate. If we call your data frame df:

  aggregate(df[2], by=df[1], FUN=min)

will get you what you asked for (if not necessarily what you need ;-) )

Switching the columns around is easy enough if you need to; proceeding stepwise:

  df.new <- aggregate(df[2], by=df[1], FUN=min)
  df.new[,c(2,1)]

As to how I found aggregate: watching R-help daily for years occasionally pops up fundamental gems like aggregate...

Steve Ellison
LGC

>>> Jonathan <jonsleepy at gmail.com> 08/02/2010 16:39:11 >>>
What I'd like is to return a dataframe cut down to have only unique entries in V1. V2 should contain a vector, for each V1, that is the minimum of all the possible choices from the set of redundant V1's.

Example output:

  V1 V2
1  a  2
2  b  9
3  c  4
Here are 4 solutions, assuming DF contains the data frame:

> # 1. aggregate
> aggregate(DF[2], DF[1], min)
  V1 V2
1  a  2
2  b  9
3  c  4
> # 2. aggregate.formula - requires R 2.11.0 or later
> aggregate(V2 ~ V1, DF, min)
  V1 V2
1  a  2
2  b  9
3  c  4
> # 3. SQL using sqldf
> library(sqldf)
> sqldf("select V1, min(V2) V2 from DF group by V1")
  V1 V2
1  a  2
2  b  9
3  c  4
> # 4. summaryBy in the doBy package
> library(doBy)
> summaryBy(V2 ~., DF, FUN = min, keep.names = TRUE)
  V1 V2
1  a  2
2  b  9
3  c  4

On Mon, Feb 8, 2010 at 11:39 AM, Jonathan <jonsleepy at gmail.com> wrote:
> What I'd like is to return a dataframe cut down to have only unique
> entries in V1. V2 should contain a vector, for each V1, that is the
> minimum of all the possible choices from the set of redundant V1's.
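The two base R variants in the list above need no extra packages, so they are easy to try side by side. A small sketch, assuming the sample data from the question; the names `by_list` and `by_formula` are illustrative only:

```r
# Sample data from the original question
DF <- data.frame(V1 = c("a", "a", "b", "c", "a", "b"),
                 V2 = c(3, 2, 9, 4, 7, 11))

# Solution 1: data frame method, grouping column passed as a list
by_list <- aggregate(DF[2], DF[1], min)

# Solution 2: formula method (response ~ grouping variable)
by_formula <- aggregate(V2 ~ V1, DF, min)

print(by_list)
print(by_formula)

# Both give one row per distinct V1 with the same minima
same <- all(by_list$V2 == by_formula$V2) &&
  all(as.character(by_list$V1) == as.character(by_formula$V1))
print(same)
```

The formula form is often easier to read when several grouping columns are involved, for example V2 ~ V1 + V3 with a hypothetical third column V3.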