Sean Baumgarten wrote on 12/14/2011 06:38:08 PM:
> Hello,
>
> I have a data frame with hourly or sub-hourly weather records that span
> several years, and from that data frame I'm trying to select only the
> records taken closest to noon for each day. Here's what I've done so far:
>
> #Add a column to the data frame showing the difference between noon and
> the observation time (I converted time to a 0-1 scale so 0.5 represents
> noon):
> data$Diff_from_noon <- abs(0.5-data$Time)
>
> #Find the minimum value of "Diff_from_noon" for each Date:
> aggregated <- aggregate(Diff_from_noon ~ Date, data, FUN=min)
>
>
> The problem is that the "aggregated" data frame only has two columns:
> Date and Diff_from_noon. I can't figure out how to get the columns with the
> actual weather variables to carry over from the original data frame.
>
> Any suggestions you have would be much appreciated.
>
> Thanks,
> Sean
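You didn't provide any example data, so I will use the airquality data set
that ships with R. After using the aggregate() function to find the minimum
Day for each Month, merge the resulting data frame with the original data
frame to see all the columns corresponding to the selected minimums. By
default, merge() joins on the columns the two data frames have in common
(here Month and Day), so only the matching rows are kept.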
> aggregated <- aggregate(Day ~ Month, airquality, FUN=min)
> aggregated
  Month Day
1     5   1
2     6   1
3     7   1
4     8   1
5     9   1
> merge(aggregated, airquality)
  Month Day Ozone Solar.R Wind Temp
1     5   1    41     190  7.4   67
2     6   1    NA     286  8.6   78
3     7   1   135     269  4.1   84
4     8   1    39      83  6.9   81
5     9   1    96     167  6.9   91
For your data, the code would look like this:
aggregated <- aggregate(Diff_from_noon ~ Date, data, FUN=min)
merge(aggregated, data)
I recommend that you use a name other than "data" for your data frame,
since data() is a built-in R function.
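As a rough sketch of how the same steps might look on data shaped like yours, here is a small made-up example. The data frame name "weather", the Temp column, and all of the values are invented for illustration; only Date, Time and Diff_from_noon come from your description.

```r
# Hypothetical data: two days, three observations each.
# Time is on a 0-1 scale, so 0.5 represents noon.
weather <- data.frame(
  Date = rep(c("2011-12-13", "2011-12-14"), each = 3),
  Time = c(0.25, 0.48, 0.75, 0.30, 0.55, 0.90),
  Temp = c(10.1, 14.3, 12.0, 9.5, 13.8, 8.2)
)

# Distance of each observation from noon.
weather$Diff_from_noon <- abs(0.5 - weather$Time)

# Smallest distance per day, then merge back on the shared columns
# (Date and Diff_from_noon) to recover the remaining columns.
aggregated <- aggregate(Diff_from_noon ~ Date, weather, FUN = min)
closest <- merge(aggregated, weather)
closest
```

With these values, closest keeps one row per Date: the 0.48 observation for the first day and the 0.55 observation for the second, each with its Temp. If two observations on the same day are equally far from noon, both can appear in the result, so you may want to check for duplicated dates afterwards.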
Jean