thr3ads.net - R help - [R] merge(join) problem [Aug 2011]


Sam Steingold

2011-Aug-16 22:00 UTC

[R] merge(join) problem

I have two datasets:
A with columns Open and Name (and many others, irrelevant to the merge)
B with columns Time and Name (and many others, irrelevant to the merge)

I want the dataset AB with all these columns
Open from A - a difftime (time of day)
Time from B - a difftime (time of day)
Name (same in A & B) - a factor, does NOT index rows, i.e., there are
_many_ rows in both A & B with the same Name.
all the other columns from A & B.

Each row in AB must come from exactly one row in A.
(i.e., dim(AB)[1] == dim(A)[1]).

For each row in AB, Open>=Time, and "as small as possible".

The above conditions uniquely define AB.

The "obvious algorithm" is: for each row in A search B for a row
with the same Name and the largest Time <= Open.

However, I don't see an easy way to do it in R.
The obvious intermediary step is

AB1 <- merge(A, B, all.x = TRUE, all.y = FALSE, by = 'Name')

Now, AB1 has many rows with the same Name and Open.
I need to drop all of them except for the one with the largest Time <= Open.
I can do

AB2 <- AB1[which(AB1$Time <= AB1$Open),]

Now I need to keep just _one_ row with the same Name & Open - and the
largest Time.

How do I do that?

unique() seems to have the right name, but I don't see how it can help me...

tia.

-- 
Sam Steingold (http://sds.podval.org/) on CentOS release 5.6 (Final) X 11.0.60900031
http://jihadwatch.org http://honestreporting.com
http://ffii.org http://camera.org http://thereligionofpeace.com
UNIX is a way of thinking.  Windows is a way of not thinking.

Ista Zahn

2011-Aug-16 22:29 UTC

[R] merge(join) problem

Hi Tia,

On Tue, Aug 16, 2011 at 6:00 PM, Sam Steingold <sds at gnu.org> wrote:
> I have two datasets:
> A with columns Open and Name (and many others, irrelevant to the merge)
> B with columns Time and Name (and many others, irrelevant to the merge)
>
> I want the dataset AB with all these columns
> Open from A - a difftime (time of day)
> Time from B - a difftime (time of day)
> Name (same in A & B) - a factor, does NOT index rows, i.e., there are
> _many_ rows in both A & B with the same Name.
> all the other columns from A & B.
>
> Each row in AB must come from exactly one row in A.
> (i.e., dim(AB)[1] == dim(A)[1]).
>
> For each row in AB, Open>=Time, and "as small as possible".
>
> The above conditions uniquely define AB.
>
> The "obvious algorithm" is: for each row in A search B for a row
> with the same Name and the largest Time <= Open.
>
> However, I don't see an easy way to do it in R.
> The obvious intermediary step is
>
> AB1 <- merge(A, B, all.x = TRUE, all.y = FALSE, by = 'Name')
>
> Now, AB1 has many rows with the same Name and Open.
> I need to drop all of them except for the one with the largest Time <= Open.
> I can do
>
> AB2 <- AB1[which(AB1$Time <= AB1$Open),]
>
> Now I need to keep just _one_ row with the same Name & Open - and the
> largest Time.
Untested (your example was not reproducible), but how about

AB3 <- AB2[order(AB2$Time, decreasing = TRUE), ]
AB4 <- AB3[!duplicated(AB3[c("Name", "Open")]), ]

?
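
To make the suggestion reproducible, here is a minimal sketch of the whole pipeline on made-up data. The values in A and B are invented for illustration, and Open and Time are stored as plain numbers (say, minutes since midnight) rather than difftime objects, which is a simplifying assumption.

```r
# Toy data: values are made up; times are numeric minutes (an assumption).
A <- data.frame(Name = c("x", "x", "y"),
                Open = c(10, 20, 15))
B <- data.frame(Name = c("x", "x", "x", "y", "y"),
                Time = c(5, 8, 18, 12, 30))

# Step 1: pair every row of A with every row of B that shares its Name.
AB1 <- merge(A, B, by = "Name", all.x = TRUE)

# Step 2: keep only the pairs where Time does not exceed Open.
AB2 <- AB1[which(AB1$Time <= AB1$Open), ]

# Step 3: sort so that, within each (Name, Open) group, the largest Time comes first.
AB3 <- AB2[order(AB2$Time, decreasing = TRUE), ]

# Step 4: keep the first row of each (Name, Open) group, i.e. the largest Time.
AB4 <- AB3[!duplicated(AB3[c("Name", "Open")]), ]

# Optional: restore a readable row order.
AB4 <- AB4[order(AB4$Name, AB4$Open), ]
print(AB4)
```

One caveat: the which() filter in step 2 drops any row of A that has no row in B with the same Name and Time <= Open, so AB4 can end up with fewer rows than A. Two rows of A with identical Name and Open would also collapse into one.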

Best,
Ista
>
> How do I do that?
>
> unique() seems to have the right name, but I don't see how it can help me...
>
> tia.
>
> --
> Sam Steingold (http://sds.podval.org/) on CentOS release 5.6 (Final) X 11.0.60900031
> http://jihadwatch.org http://honestreporting.com
> http://ffii.org http://camera.org http://thereligionofpeace.com
> UNIX is a way of thinking.  Windows is a way of not thinking.
>


-- 
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org

Ista Zahn

2011-Aug-16 23:06 UTC

[R] merge(join) problem

On Tue, Aug 16, 2011 at 6:40 PM, Sam Steingold <sds at gnu.org> wrote:
>> * Ista Zahn <vmnua at cflpu.ebpurfgre.rqh> [2011-08-16 18:31:00 -0400]:
>> On Tue, Aug 16, 2011 at 6:29 PM, Ista Zahn <izahn at psych.rochester.edu> wrote:
>>> Hi Tia,
>
> "tia" == "thanks in advance" :-)
*facepalm* Thanks Sam, one day I'll learn internet acronyms...
>
>> AB3 <- AB2[order(AB2$Time, decreasing=TRUE), ]
>> AB4 <- AB3[!duplicated(AB3[c("Name", "Open")]), ]
>
> thanks!
>
> --
> Sam Steingold (http://sds.podval.org/) on CentOS release 5.6 (Final) X 11.0.60900031
> http://honestreporting.com http://openvotingconsortium.org http://ffii.org
> http://iris.org.il http://www.memritv.org http://dhimmi.com
> Warning! Dates in calendar are closer than they appear!
>


-- 
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org
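
For comparison, the "obvious algorithm" from the original question (for each row in A, search B for the row with the same Name and the largest Time <= Open) can be sketched directly. This is an illustrative, unoptimized version and not something posted in the thread: the helper match_row, the Price column, and the toy values are all hypothetical, Name is assumed to be character (convert with as.character() if it is a factor), and times are plain numbers.

```r
# Toy data: values are made up; "z" has no counterpart in B on purpose.
A <- data.frame(Name = c("x", "x", "y", "z"),
                Open = c(10, 20, 15, 7))
B <- data.frame(Name  = c("x", "x", "x", "y", "y"),
                Time  = c(5, 8, 18, 12, 30),
                Price = c(1.0, 1.1, 1.2, 2.0, 2.1))

# Hypothetical helper: for one (Name, Open) pair, return the index of the
# row of B with the same Name and the largest Time <= Open, or NA if none.
match_row <- function(name, open) {
  idx <- which(B$Name == name & B$Time <= open)
  if (length(idx) == 0) return(NA_integer_)
  idx[which.max(B$Time[idx])]
}

# One lookup per row of A; an NA index yields a row of NAs from B.
hit <- mapply(match_row, A$Name, A$Open, USE.NAMES = FALSE)
AB <- cbind(A, B[hit, setdiff(names(B), "Name"), drop = FALSE])
rownames(AB) <- NULL
print(AB)
```

Unlike the merge-then-filter route, this keeps exactly one row per row of A, with NA in the B columns where nothing qualifies. The cost is one scan of B per row of A, so it may be slow on large inputs.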
