I have two datasets:

A with columns Open and Name (and many others, irrelevant to the merge)
B with columns Time and Name (and many others, irrelevant to the merge)

I want the dataset AB with all these columns:

Open from A - a difftime (time of day)
Time from B - a difftime (time of day)
Name (same in A & B) - a factor, does NOT index rows, i.e., there are
  _many_ rows in both A & B with the same Name.
all the other columns from A & B.

Each row in AB must come from exactly one row in A
(i.e., dim(AB)[1] == dim(A)[1]).

For each row in AB, Open >= Time, and "as small as possible".

The above conditions uniquely define AB.

The "obvious algorithm" is: for each row in A search B for a row
with the same Name and the largest Time <= Open.

However, I don't see an easy way to do it in R.
The obvious intermediary step is

  AB1 <- merge(A, B, all.x = TRUE, all.y = FALSE, by = 'Name')

Now, AB1 has many rows with the same Name and Open.
I need to drop all of them except for the one with the largest Time <= Open.
I can do

  AB2 <- AB1[which(AB1$Time <= AB1$Open),]

Now I need to keep just _one_ row with the same Name & Open - and the
largest Time.

How do I do that?

unique() seems to have the right name, but I don't see how it can help me...

tia.

-- 
Sam Steingold (http://sds.podval.org/) on CentOS release 5.6 (Final) X 11.0.60900031
http://jihadwatch.org http://honestreporting.com
http://ffii.org http://camera.org http://thereligionofpeace.com
UNIX is a way of thinking. Windows is a way of not thinking.
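For readers who want to follow along, here is a minimal reproducible sketch of the two intermediate steps described above. The data values are invented for illustration, and storing the times as difftime in minutes is an assumption, since the post does not show the actual data.

```r
# Toy data (hypothetical values); times are difftime in minutes since midnight
A <- data.frame(
  Name = factor(c("x", "x", "y")),
  Open = as.difftime(c(600, 720, 615), units = "mins")
)
B <- data.frame(
  Name = factor(c("x", "x", "x", "y", "y")),
  Time = as.difftime(c(570, 590, 700, 610, 630), units = "mins")
)

# Left join on Name: each A row is paired with every B row of the same Name
AB1 <- merge(A, B, all.x = TRUE, all.y = FALSE, by = "Name")

# Keep only the pairs whose B time does not exceed the A time;
# which() also drops comparisons that evaluate to NA
AB2 <- AB1[which(AB1$Time <= AB1$Open), ]
```

With these toy values AB1 has 8 rows and AB2 has 6, so AB2 still holds several candidate rows per row of A. That is the situation the question asks how to resolve.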
Hi Tia,

On Tue, Aug 16, 2011 at 6:00 PM, Sam Steingold <sds at gnu.org> wrote:
> The obvious intermediary step is
>
> AB1 <- merge(A, B, all.x = TRUE, all.y = FALSE, by = 'Name')
>
> Now, AB1 has many rows with the same Name and Open.
> I need to drop all of them except for the one with the largest Time <= Open.
> I can do
>
> AB2 <- AB1[which(AB1$Time <= AB1$Open),]
>
> Now I need to keep just _one_ row with the same Name & Open - and the
> largest Time.

Untested (your example was not reproducible) but how about

  AB3 <- AB2[order(AB2$Time, decreasing=TRUE), ]
  AB4 <- AB3[!duplicated(AB3[c("Name", "Open")]), ]

?

Best,
Ista
-- 
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org
On Tue, Aug 16, 2011 at 6:40 PM, Sam Steingold <sds at gnu.org> wrote:
>> * Ista Zahn <vmnua at cflpu.ebpurfgre.rqh> [2011-08-16 18:31:00 -0400]:
>> On Tue, Aug 16, 2011 at 6:29 PM, Ista Zahn <izahn at psych.rochester.edu> wrote:
>>> Hi Tia,
>
> "tia" == "thanks in advance" :-)

*facepalm* Thanks Sam, one day I'll learn internet acronyms...

>> AB3 <- AB2[order(AB2$Time, decreasing=TRUE), ]
>> AB4 <- AB3[!duplicated(AB3[c("Name", "Open")]), ]
>
> thanks!
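The order()/duplicated() suggestion in this thread was posted untested, so here is a small end-to-end sketch on invented data. It is one possible reading, not a confirmed solution. It assumes the ordering is meant to be applied to AB2 by its own Time column. It also adds two things that are not in the thread: a hypothetical row_id column, so that rows of A sharing the same Name and Open are not collapsed, and a final merge that restores rows of A with no qualifying row in B, so that dim(AB)[1] == dim(A)[1] still holds.

```r
# Toy data (hypothetical values); times are difftime in minutes since midnight
A <- data.frame(
  Name = factor(c("x", "x", "y", "z")),
  Open = as.difftime(c(600, 720, 615, 500), units = "mins")
)
B <- data.frame(
  Name = factor(c("x", "x", "x", "y", "y")),
  Time = as.difftime(c(570, 590, 700, 610, 630), units = "mins")
)

# Hypothetical helper column: identifies each row of A uniquely
A$row_id <- seq_len(nrow(A))

# Steps from the original question
AB1 <- merge(A, B, all.x = TRUE, all.y = FALSE, by = "Name")
AB2 <- AB1[which(AB1$Time <= AB1$Open), ]

# Sort so the latest Time comes first, then keep the first row per A row
AB3 <- AB2[order(AB2$Time, decreasing = TRUE), ]
AB4 <- AB3[!duplicated(AB3$row_id), ]

# Re-attach rows of A that had no B row with Time <= Open (Time becomes NA)
b_cols <- setdiff(names(B), "Name")
AB <- merge(A, AB4[c("row_id", b_cols)], by = "row_id", all.x = TRUE)
AB <- AB[order(AB$row_id), ]
```

With these toy values the matched times are 590, 700 and 610 minutes for the first three rows of A, and NA for the "z" row, which has no counterpart in B. If (Name, Open) pairs are known to be unique in A, deduplicating on c("Name", "Open") as suggested in the thread selects the same rows.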