I finally figured this out with the help of Dave and Dennis.
The steps I had to do were:
1. Convert dates to Date objects (with as.Date)
2. Create a column in each frame that was paste(date, quarter)
3. Then %in% worked instantly.
# start with an all-zero indicator column (overwritten at the end)
arr$gw <- 0
# convert the m/d/yy strings to Date objects in both frames
arr$Date <- as.Date(as.character(arr$Date), format = "%m/%d/%y")
weather$Date <- as.Date(as.character(weather$Date), format = "%m/%d/%y")
# one combined date+quarter key per row
weather$dq <- paste(as.character(weather$Date), as.character(weather$quarter))
arr$dq <- paste(as.character(arr$Date), as.character(arr$quarter))
# 1 if the arrival's date+quarter appears in the good-weather set, else 0
arr$gw <- as.numeric(arr$dq %in% weather$dq)
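For reference, here are the steps above run end to end on the small 1/1/09 sample posted later in this thread. This is a sketch. Only the Date and quarter columns are rebuilt, and the data frames are typed in rather than loaded from the real files.

```r
# Date and quarter columns of the 1/1/09 sample from this thread
# (other columns omitted for brevity)
weather <- data.frame(Date = "1/1/09", quarter = 60:63,
                      stringsAsFactors = FALSE)
arr <- data.frame(Date = "1/1/09",
                  quarter = c(59, 59, rep(60, 11), rep(61, 4), 66, 67),
                  stringsAsFactors = FALSE)

# Step 1: parse the m/d/yy strings into Date objects
arr$Date     <- as.Date(arr$Date, format = "%m/%d/%y")
weather$Date <- as.Date(weather$Date, format = "%m/%d/%y")

# Step 2: one combined date+quarter key per row, so the two are matched as a pair
arr$dq     <- paste(as.character(arr$Date), as.character(arr$quarter))
weather$dq <- paste(as.character(weather$Date), as.character(weather$quarter))

# Step 3: 1 = arrival in a good-weather quarter, 0 = not
arr$gw <- as.numeric(arr$dq %in% weather$dq)
table(arr$quarter, arr$gw)
```

Arrivals in quarters 60 and 61 get gw = 1. This matches the gw column in the dput(arr) output further down the thread.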
Thanks for all the help,
Jim
On 1/17/10 10:16 PM, David Winsemius wrote:
>
> On Jan 17, 2010, at 9:22 PM, Dennis Murphy wrote:
>
>> I thought it was a solution :)
>>
>> The gw column is an indicator that is meant to match quarters with
>> good weather (in the weather data frame) to the quarters in the arr
>> data frame: 1 = good weather, 0 = not. I split up both data frames by
>> Date since the quarter values are in the same range from day to day,
>> and then ran a loop to generate the indicator gw by finding whether
>> the quarter in the i-th component of the arrl list matched the
>> corresponding reference table of quarters (of good weather) in the wc
>> list. do.call then slurps it all together into a data frame.
>> My understanding was that James just wanted to be able to distinguish
>> arrivals during good weather quarters from those in bad weather
>> quarters.
>
> This was my understanding, too. I was just trying to get to some point
> where that could be expressed in an R-encodable format and implemented.
>
>
>>
>> I did the tables as a sanity check, relative to the hand calculations
>> I did earlier as a ballpark estimate.
>
> The rather small dataset may have been enough. We will need to see
> what the OP says ... is that a sufficient solution?
>
>
>>
>> Does this make sense?
>>
>> Dennis
>>
>> On Sun, Jan 17, 2010 at 6:04 PM, David Winsemius
>> <dwinsemius@comcast.net> wrote:
>>
>> I'm not clear about your last message. Do you have a solution?
>>
>> --
>> David.
>>
>> On Jan 17, 2010, at 8:27 PM, Dennis Murphy wrote:
>>
>>> Hi James and David:
>>>
>>> I tried the following: split the quarters from weather into a
>>> list by Date, and ditto with arr.
>>> Then assign value of gw by running a (sleazy) loop over the
>>> components of the arr list...
>>>
>>> wc <- split(weather$quarter, weather$Date)
>>> arrl <- split(arr, arr$Date)
>>>
>>> # Note that there are four dates in wc and three in arrl...
>>> for(i in seq_along(arrl)) {
>>>     arrl[[i]]$gw <- as.numeric(arrl[[i]]$quarter %in% wc[[i]])
>>> }
>>> arr2 <- do.call(rbind, arrl)
>>> dim(arr2)
>>> [1] 1126 9
>>> table(arr2$gw)
>>>
>>> 0 1
>>> 661 465
>>> with(arr2, table(Date, gw))
>>> gw
>>> Date 0 1
>>> 2009-01-01 368 99
>>> 2009-01-02 266 348
>>> 2009-01-03 27 18
>>>
>>> OK, I was a bit off, but at least we know this is in the
>>> ballpark of my estimates :)
>>> I'm sure David will come up with something more elegant, but
>>> this seems to work.
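A caution on the loop above: wc[[i]] and arrl[[i]] are paired by position. When wc has four dates and arrl has three, a date present in one list but not the other can pair arrivals with another day's weather. Below is a minimal sketch that pairs the lists by date name instead. It uses made-up data (not the thread's files) in which each frame has a date the other lacks.

```r
# Made-up data: weather has a date (2009-01-04) that arr lacks, and arr has
# a date (2009-01-03) that weather lacks
weather <- data.frame(
  Date    = as.Date(c("2009-01-01", "2009-01-01", "2009-01-02", "2009-01-04")),
  quarter = c(60, 61, 70, 80)
)
arr <- data.frame(
  Date    = as.Date(c("2009-01-01", "2009-01-01", "2009-01-02", "2009-01-03")),
  quarter = c(60, 62, 70, 80)
)

wc   <- split(weather$quarter, weather$Date)  # good quarters, one vector per date
arrl <- split(arr, arr$Date)                  # arrivals, one data frame per date

for (d in names(arrl)) {
  # Look up the good quarters for the same date by name; a date with no
  # good-weather rows gives NULL, and x %in% NULL is all FALSE
  arrl[[d]]$gw <- as.numeric(arrl[[d]]$quarter %in% wc[[d]])
}
arr2 <- do.call(rbind, arrl)
arr2
```

With positional pairing, the 2009-01-03 arrivals would have been checked against the 2009-01-04 weather quarters. The quarter-80 arrival would then wrongly get gw = 1.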
>>>
>>> HTH,
>>> Dennis
>>>
>>>
>>> On Sun, Jan 17, 2010 at 5:17 PM, James Rome <jamesrome@gmail.com> wrote:
>>>
>>> Any entry in the weather data is a good day. That is the point. And
>>> please ignore my mistake about the quarters getting too large in
>>> weather. I am being swamped with versions, and it does not matter for
>>> this purpose. So, the bad weather days are not in the weather data set.
>>>
>>> I am trying to get gw=1 in arr if the date and quarter are
>>> in weather.
>>>
>>> Thanks,
>>> Jim
>>>
>>> On 1/17/10 7:46 PM, David Winsemius wrote:
>>> > But, but, but .... there is no weather goodness variable
>>> > in weather?!?!?!
>>> >
>>> > > str(weather)
>>> > 'data.frame': 155 obs. of 4 variables:
>>> > $ Date   :Class 'Date'  num [1:155] 14245 14245 14245 14245 14245 ...
>>> > $ minute : int  5 15 30 45 0 15 30 45 0 15 ...
>>> > $ hour   : int  15 15 15 15 17 17 17 17 18 18 ...
>>> > $ quarter: int  65 75 90 105 68 83 98 113 72 87 ...
>>> >
>>> > I thought you said the "weather" dataframe would have some
>>> > information about "goodness" that we were supposed to map to
>>> > arrivals? What is the meaning of those variables? How do we define a
>>> > "good" quarter hour? And why are the values of quarter not 1, 2, 3, 4?
>>> > They ought to be a factor or integer that could be matched to those
>>> > that are in "arr", which are also apparently not so defined. Let's see
>>> > a better codebook or description of these variables.
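The sample rows Jim posted later in the thread give a clue: hour 15 with minutes 5, 15, 30 and 45 gives quarters 60 to 63. So quarter appears to be a quarter-hour-of-day index, 4*hour + minute %/% 15, running from 0 to 95. The str() values above (65, 75, 90, 105, 68, ...) instead fit 4*hour + minute. That would explain the "quarters getting too large" mistake Jim mentions. This is an inference from the data, not something stated in the thread, and quarter_of_day() is a hypothetical helper.

```r
# Hypothetical helper: quarter-hour of the day, from 0 (00:00-00:14) to 95 (23:45-23:59)
quarter_of_day <- function(hour, minute) {
  4L * as.integer(hour) + as.integer(minute) %/% 15L
}

quarter_of_day(15, c(5, 15, 30, 45))  # 60 61 62 63, as in the 1/1/09 sample
quarter_of_day(17, c(0, 15, 30, 45))  # 68 69 70 71
```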
>>> >
>>> > On Jan 17, 2010, at 6:47 PM, James Rome wrote:
>>> >
>>> >> Here are some sample data sets.
>>> >>
>>> >> I also tried making a combined field in each set such as
>>> >> adq=paste(as.character(arr$Date), as.character(arr$quarter))
>>> >> and similarly for the weather set, so I have unique single things to
>>> >> compare, but that did not seem to help much.
>>> >>
>>> >> Thanks,
>>> >> Jim
>>> >>
>>> >> On 1/17/10 5:50 PM, David Winsemius wrote:
>>> >>> My guess (since we still have no data on which to test these ideas)
>>> >>> is that you need either to merge() or to use a matrix created from
>>> >>> the dates and qtr-hours entries in "gw", since matching on dates and
>>> >>> hours separately will not uniquely classify the good qtr-hours within
>>> >>> their proper corresponding dates. You want a structure (or a matching
>>> >>> process) that takes:
>>> >>>        qhr1  qhr2  qhr3  qhr4  .......
>>> >>> date1  good  bad   good  bad
>>> >>> date2  bad   good  good  good
>>> >>> date3  bad   bad   bad   good
>>> >>> .
>>> >>> .
>>> >>> .
>>> >>> and lets you use the values in "arr" to get values in "gw". Notice
>>> >>> that the notion of arr$Date %in% gw$date & arr$qtrhr %in% gw$qtrhr
>>> >>> simply will not accomplish anything correct.
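The date-by-quarter matrix David describes can be built with table() and then indexed with a two-column character matrix, one row per arrival. Below is a minimal sketch on made-up data, assuming each frame has Date and quarter columns. unclass() only drops the "table" class so the result is an ordinary logical matrix.

```r
# Made-up data with Date and quarter columns
weather <- data.frame(
  Date    = as.Date(c("2009-01-01", "2009-01-01", "2009-01-02")),
  quarter = c(60, 61, 70)
)
arr <- data.frame(
  Date    = as.Date(c("2009-01-01", "2009-01-01", "2009-01-02", "2009-01-03")),
  quarter = c(60, 62, 70, 60)
)

# Use every date and quarter seen in either frame as row/column levels,
# so each arrival has a cell to look up
dates    <- sort(unique(c(as.character(arr$Date), as.character(weather$Date))))
quarters <- sort(unique(c(arr$quarter, weather$quarter)))

# Date-by-quarter matrix: TRUE where weather has a row for that pair
good <- unclass(table(factor(as.character(weather$Date), levels = dates),
                      factor(weather$quarter, levels = quarters))) > 0

# Index by (row name, column name) pairs, one pair per arrival
idx <- cbind(as.character(arr$Date), as.character(arr$quarter))
arr$gw <- as.numeric(good[idx])
arr$gw  # 1 0 1 0
```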
>>> >>>
>>> >>> Merging by multiple criteria (with the merge function) would do that,
>>> >>> or you could construct a matrix whose entries were the categories
>>> >>> good/bad. The table function could create the matrix for the purpose
>>> >>> of using an indexed solution if you are dead-set against the merge
>>> >>> concept.
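The merge() route can be sketched as follows, using made-up data. Keeping only the key columns of weather before merging means none of its other data (efficiency and so on) is carried into arr. This also addresses the concern that the two sets contain different data. The Flight labels F1 to F4 are placeholders.

```r
# Made-up data; Flight labels F1-F4 are placeholders
weather <- data.frame(
  Date       = as.Date(c("2009-01-01", "2009-01-01", "2009-01-02")),
  quarter    = c(60, 61, 70),
  efficiency = c(NA, 72, 63.3)   # weather data we do not want carried into arr
)
arr <- data.frame(
  Date    = as.Date(c("2009-01-01", "2009-01-01", "2009-01-02", "2009-01-03")),
  quarter = c(60, 62, 70, 60),
  Flight  = c("F1", "F2", "F3", "F4"),
  stringsAsFactors = FALSE
)

# Keep only the unique date+quarter keys from weather and flag them as good
keys <- unique(weather[, c("Date", "quarter")])
keys$gw <- 1

# Left join on both columns at once; unmatched arrivals get NA, then 0
arr2 <- merge(arr, keys, by = c("Date", "quarter"), all.x = TRUE)
arr2$gw[is.na(arr2$gw)] <- 0
arr2
```

all.x = TRUE keeps every arrival. merge() returns rows sorted by the key columns, so reorder arr2 afterwards if the original row order matters.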
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>> On Jan 17, 2010, at 4:47 PM, James Rome wrote:
>>> >>>
>>> >>>> Thank you Dennis.
>>> >>>> arr$gw <- as.numeric(weather$Date == arr$Date & arr$quarter %in% weather$quarter)
>>> >>>> seems to be what I want to do, but in fact, with the full data set, it
>>> >>>> misidentifies the rows, so I think the warning message must mean
>>> >>>> something.
>>> >>>>
>>> >>>>> arr$Date <- as.Date(as.character(arr$Date), format="%m/%d/%y")
>>> >>>>> weather$Date <- as.Date(as.character(weather$Date), format="%m/%d/%y")
>>> >>>>> gw = c(length(arr))
>>> >>>>> gw[1:length(arr[,1])]=FALSE
>>> >>>>> gw[arr$Date==weather$Date & weather$quarter %in% arr$quarter]
>>> >>>> Warning in `==.default`(arr$Date, weather$Date) :
>>> >>>>   longer object length is not a multiple of shorter object length
>>> >>>> Warning in arr$Date == weather$Date & weather$quarter %in% arr$quarter :
>>> >>>>   longer object length is not a multiple of shorter object length
>>> >>>> [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 0
>>> >>>> 0 0 0 0
>>> >>>> [38] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0
>>> >>>> 0 0 0 0
>>> >>>> [75] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0
>>> >>>> 0 0 0 0
>>> >>>> [112] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0
>>> 0 0 0 0 0 0
>>> >>>> 0 0
>>> >>>> 0 0 0 0
>>> >>>> [149] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0
>>> 0 0 0 0 0 0
>>> >>>> 0 0
>>> >>>> 0 0 0 0
>>> >>>> [186] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0
>>> 0 0 0 0 0 0
>>> >>>> 0 0
>>> >>>> 0 0 0 0
>>> >>>> [223] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0
>>> 0 0 0 0 0 0
>>> >>>> 0 0
>>> >>>> 0 0 0 0
>>> >>>> [260] 0 0 0 0 0 0 0 0
>>> >>>>
>>> >>>> There are many, many more matches in the 99k-line arrival data set.
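The warnings above are the clue. == compares two vectors element by element and recycles the shorter one. So arr$Date == weather$Date compares the i-th arrival with whichever weather row happens to line up with it, not with all weather rows. On the one-date sample every date is the same, which is why the suggestion appeared to work there. Here is a small illustration with made-up dates:

```r
# Made-up dates: four arrivals, three weather rows
arr_date     <- as.Date(c("2009-01-01", "2009-01-01", "2009-01-02", "2009-01-02"))
weather_date <- as.Date(c("2009-01-01", "2009-01-02", "2009-01-02"))

# == pairs element i with element i, recycling weather_date to
# 01-01, 01-02, 01-02, 01-01 (R warns because 4 is not a multiple of 3)
res <- arr_date == weather_date
res
# Arrivals 2 and 4 are FALSE only because of how the rows happen to line up,
# even though both of their dates do occur in weather_date
arr_date %in% weather_date
```

Testing date and quarter with two separate %in% calls does not fix this either, because each test ignores the other column. The date and quarter have to be matched as a pair.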
>>> >>>>
>>> >>>> Thanks a bunch,
>>> >>>> Jim
>>> >>>>
>>> >>>>
>>> >>>> On 1/17/10 3:21 PM, Dennis Murphy wrote:
>>> >>>>> Hi:
>>> >>>>>
>>> >>>>> To read a data set from an R-help message into R, one uses
>>> >>>>>
>>> >>>>> read.table(textConnection("<verbatim text>"), ...)
>>> >>>>>
>>> >>>>> Your weather data set had
>>> >>>>> (a) a variable name with a space in it, which R misread and which
>>> >>>>> had to be altered manually;
>>> >>>>> (b) a missing value with no NA, which R interpreted as an incomplete
>>> >>>>> line; again, it had to be altered manually.
>>> >>>>>
>>> >>>>> This is why David suggested the use of dput(), so that these
>>> >>>>> vagaries don't have to be dealt with by those who are trying to
>>> >>>>> help.
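Here is a sketch of reading the weather sample as posted. It assumes the header is fixed by hand to Efficiency_Val, a name without a space, as Dennis describes. Later versions of R added read.table(text = ...) as a shorthand for wrapping the text in textConnection(). fill = TRUE pads the short first row with NA instead of stopping.

```r
# The good-weather sample as posted, with the header name fixed by hand
# to Efficiency_Val (the posted header had a space in it)
txt <- "Date minute hour quarter Efficiency_Val
1/1/09 5 15 60
1/1/09 15 15 61 72
1/1/09 30 15 62 63.3
1/1/09 45 15 63 85.4"

# fill = TRUE pads the short first data row with NA instead of stopping
weather <- read.table(text = txt, header = TRUE, fill = TRUE,
                      stringsAsFactors = FALSE)
str(weather)
```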
>>> >>>>>
>>> >>>>> That being said, for the example that you gave and the desired value
>>> >>>>> that you wanted, try
>>> >>>>>
>>> >>>>> arr$gw <- as.numeric(weather$Date == arr$Date & arr$quarter %in% weather$quarter)
>>> >>>>>
>>> >>>>> (I changed DateTime to Date in the arr data frame...)
>>> >>>>>
>>> >>>>> You'll get warnings like
>>> >>>>>
>>> >>>>> Warning messages:
>>> >>>>> 1: In is.na(e1) | is.na(e2) :
>>> >>>>>   longer object length is not a multiple of shorter object length
>>> >>>>>
>>> >>>>> but it seems to do the right thing. The first equality is there to
>>> >>>>> constrain matches for quarter to be within the same day.
>>> >>>>>
>>> >>>>> For future reference,
>>> >>>>>
>>> >>>>>> dput(weather)
>>> >>>>> structure(list(Date = structure(c(1L, 1L, 1L, 1L), .Label = "1/1/09",
>>> >>>>> class = "factor"),
>>> >>>>>     minute = c(5L, 15L, 30L, 45L), hour = c(15L, 15L, 15L, 15L),
>>> >>>>>     quarter = 60:63, efficiency = c(NA, 72, 63.3, 85.4)),
>>> >>>>>     .Names = c("Date", "minute", "hour", "quarter", "efficiency"),
>>> >>>>>     class = "data.frame", row.names = c(NA, -4L))
>>> >>>>>> dput(arr)
>>> >>>>> structure(list(Date = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
>>> >>>>> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "1/1/09",
>>> >>>>> class = "factor"),
>>> >>>>>     weekday = c(5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
>>> >>>>>     5L, 5L, 5L, 5L, 5L, 5L, 5L), month = c(1L, 1L, 1L, 1L, 1L,
>>> >>>>>     1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
>>> >>>>>     quarter = c(59L, 59L, 60L, 60L, 60L, 60L, 60L, 60L, 60L,
>>> >>>>>     60L, 60L, 60L, 60L, 61L, 61L, 61L, 61L, 66L, 67L),
>>> >>>>>     ICAO = structure(c(6L, 8L, 7L, 3L, 6L, 3L, 5L, 3L, 3L, 1L,
>>> >>>>>     3L, 5L, 3L, 3L, 6L, 6L, 2L, 4L, 3L), .Label = c("AAL", "AWE",
>>> >>>>>     "BTA", "CHQ", "CJC", "COA", "JBU", "NWA"), class = "factor"),
>>> >>>>>     Flight = structure(c(15L, 19L, 18L, 6L, 17L, 8L, 12L, 5L, 4L,
>>> >>>>>     1L, 3L, 13L, 9L, 10L, 14L, 16L, 2L, 11L, 7L), .Label = c("AAL842",
>>> >>>>>     "AWE307", "BTA1234", "BTA2064", "BTA2085", "BTA2347", "BTA2405",
>>> >>>>>     "BTA2916", "BTA3072", "BTA3086", "CHQ5312", "CJC3225", "CJC3359",
>>> >>>>>     "COA1166", "COA349", "COA855", "COA886", "JBU554", "NWA9934"),
>>> >>>>>     class = "factor"),
>>> >>>>>     gw = c(FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,
>>> >>>>>     TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE,
>>> >>>>>     FALSE)), .Names = c("Date", "weekday", "month", "quarter",
>>> >>>>>     "ICAO", "Flight", "gw"), row.names = c(NA, -19L),
>>> >>>>>     class = "data.frame")
>>> >>>>>
>>> >>>>> These can be copied and pasted directly into an R session without
>>> >>>>> modification.
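The round trip Dennis describes can be checked directly. dput() prints an expression, and evaluating that text rebuilds the object. Below is a minimal sketch using the weather sample, with the same columns as the dput(weather) output above. capture.output() stands in for copying the printed text into a message.

```r
# The good-weather sample, built to match the dput(weather) output above
weather <- data.frame(Date = factor("1/1/09"),
                      minute = c(5L, 15L, 30L, 45L), hour = 15L,
                      quarter = 60:63, efficiency = c(NA, 72, 63.3, 85.4))

# Sender: dput() prints an R expression that rebuilds the object;
# capture.output() grabs that text as if it were pasted into a message
txt <- paste(capture.output(dput(weather)), collapse = "\n")

# Receiver: evaluating the pasted text gives back an identical data frame
weather2 <- eval(parse(text = txt))
identical(weather, weather2)
```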
>>> >>>>>
>>> >>>>> HTH,
>>> >>>>> Dennis
>>> >>>>>
>>> >>>>> On Sun, Jan 17, 2010 at 10:51 AM, James Rome <jamesrome@gmail.com>
>>> >>>>> wrote:
>>> >>>>>
>>> >>>>> On 1/17/10 1:06 PM, David Winsemius wrote:
>>> >>>>>>
>>> >>>>>> On Jan 17, 2010, at 12:37 PM, James Rome wrote:
>>> >>>>>>
>>> >>>>>>> I don't think it is that simple because it is not a one-to-one
>>> >>>>>>> match. In the arr data frame, there are many arrivals in a quarter
>>> >>>>>>> hour with good weather on a given day. So I need to match the date
>>> >>>>>>> and the quarter hour.
>>> >>>>>>>
>>> >>>>>>> And all of the rows in the weather data frame are times with good
>>> >>>>>>> weather: unique date + quarter hour. That is why I needed the loop.
>>> >>>>>>> For each date and quarter hour in weather, I want to mark all the
>>> >>>>>>> entries with the corresponding date and quarter hour as TRUE in the
>>> >>>>>>> arr$gw column.
>>> >>>>>>>
>>> >>>>>>> I did convert the dates to POSIXlt dates and rewrote my function as
>>> >>>>>>> gooddates = function(all, good) {
>>> >>>>>>> la = length(all) # All the arrivals
>>> >>>>>>> lw = length(good) # The good 15-minute periods
>>> >>>>>>> for(j in 1:lw) {
>>> >>>>>>> d=good$Date[j]
>>> >>>>>>> q=good$quarter[j]
>>> >>>>>>> all$gw[all$Date==d && all$quarter==q]=TRUE
>>> >>>>>>
>>> >>>>>>
>>> >>>>>> You are attempting a vectorized test and assignment with "&&", which
>>> >>>>>> seems unlikely to succeed, but even then I am not sure your problems
>>> >>>>>> would be over. (I'm also guessing that you might not have reported a
>>> >>>>>> warning.)
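For comparison, here is a sketch of the loop with three problems fixed:

- & instead of &&. && looks at only one element: older versions of R silently used the first one, and R 4.3.0 and later stop with an error.
- nrow() instead of length(), because length() of a data frame counts columns.
- Returning the modified copy, because assignments inside an R function do not change the caller's data frame.

The data are the Date and quarter columns of the thread's 1/1/09 sample. On 99k rows the vectorised key approach will still be much faster than looping.

```r
# Sketch of the loop, assuming Date and quarter columns in both frames
gooddates <- function(all, good) {
  all$gw <- FALSE
  for (j in seq_len(nrow(good))) {   # nrow(), not length(): length() counts columns
    d <- good$Date[j]
    q <- good$quarter[j]
    # & works element-wise; && would test only a single value
    all$gw[all$Date == d & all$quarter == q] <- TRUE
  }
  all                                # return the modified copy to the caller
}

weather <- data.frame(Date = as.Date("2009-01-01"), quarter = 60:63)
arr <- data.frame(Date = as.Date("2009-01-01"),
                  quarter = c(59, 59, rep(60, 11), rep(61, 4), 66, 67))
arr <- gooddates(arr, weather)
table(arr$gw)
```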
>>> >>>>>
>>> >>>>> Why shouldn't the && succeed? You are correct there, because I do get
>>> >>>>> items if I use either part of this "and" test, but when I insert the
>>> >>>>> &&, I get no hits. And I got no warnings.
>>> >>>>>>
>>> >>>>>> Why not merge arr to gw by date and quarter?
>>> >>>>> The sets contain different data, and the only thing I want from the
>>> >>>>> weather set is the fact that it has an entry for a given date and
>>> >>>>> time.
>>> >>>>>>
>>> >>>>>> Answering these questions would be greatly speeded up with a small
>>> >>>>>> sample dataset. Are you aware of the virtues of the dput function?
>>> >>>>>>
>>> >>>>>
>>> >>>>> What I want is for a 1 to be in the gw column for quarters
>>> >>>>> 60, 61, 62, 63, ...
>>> >>>>>
>>> >>>>> For example, here is some data from the good weather set:
>>> >>>>> Date    minute  hour  quarter  Efficiency Val
>>> >>>>> 1/1/09  5       15    60
>>> >>>>> 1/1/09  15      15    61       72
>>> >>>>> 1/1/09  30      15    62       63.3
>>> >>>>> 1/1/09  45      15    63       85.4
>>> >>>>>
>>> >>>>> And this is from the arrivals set:
>>> >>>>> DateTime  weekday  month  quarter  ICAO  Flight   gw
>>> >>>>> 1/1/09    5        1      59       COA   COA349   0
>>> >>>>> 1/1/09    5        1      59       NWA   NWA9934  0
>>> >>>>> 1/1/09    5        1      60       JBU   JBU554   0
>>> >>>>> 1/1/09    5        1      60       BTA   BTA2347  0
>>> >>>>> 1/1/09    5        1      60       COA   COA886   0
>>> >>>>> 1/1/09    5        1      60       BTA   BTA2916  0
>>> >>>>> 1/1/09    5        1      60       CJC   CJC3225  0
>>> >>>>> 1/1/09    5        1      60       BTA   BTA2085  0
>>> >>>>> 1/1/09    5        1      60       BTA   BTA2064  0
>>> >>>>> 1/1/09    5        1      60       AAL   AAL842   0
>>> >>>>> 1/1/09    5        1      60       BTA   BTA1234  0
>>> >>>>> 1/1/09    5        1      60       CJC   CJC3359  0
>>> >>>>> 1/1/09    5        1      60       BTA   BTA3072  0
>>> >>>>> 1/1/09    5        1      61       BTA   BTA3086  0
>>> >>>>> 1/1/09    5        1      61       COA   COA1166  0
>>> >>>>> 1/1/09    5        1      61       COA   COA855   0
>>> >>>>> 1/1/09    5        1      61       AWE   AWE307   0
>>> >>>>> 1/1/09    5        1      66       CHQ   CHQ5312  0
>>> >>>>> 1/1/09    5        1      67       BTA   BTA2405  0
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>> ______________________________________________
>>> >>>>> R-help@r-project.org mailing list
>>> >>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> >>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> >>>>> and provide commented, minimal, self-contained, reproducible code.
>>> >>>>>
>>> >>>>>
>>> >>>>
>>> >>>
>>> >>> David Winsemius, MD
>>> >>> Heritage Laboratories
>>> >>> West Hartford, CT
>>> >>>
>>> >> <arr.rda><weather.rda>
>>> >
>>> >
>>>
>>>
>>
>>
>>
>
>