Kara Przeczek
2008-Nov-14 20:24 UTC
[R] can I use one data frame to subset data from another?
I am new to R and have been struggling with the following problem. I apologize if there is an obvious answer that I have missed.

Problem:
I have a set of hourly temperature data that I would like to average over a number of different periods. For example, I would like the average temperature between 11:00 am on April 1 2008 and 12:00 pm on April 12 2008, the average temperature between 2:00 pm on April 1 2008 and 2:00 pm on April 12 2008, and so on. I have two different data frames. The first has the air temperature and date/time data (7188 rows); the other has the start and end date/times of the periods I would like to average over (36 rows).

name: "met_h"
  DateTime           AT_h
  09/21/2007 100     3.66850
  09/21/2007 200     3.63925
  09/21/2007 300     3.39800
  09/21/2007 400     3.38000
  09/21/2007 500     3.11425
  09/21/2007 600     2.89850
  09/21/2007 700     2.65600
  09/21/2007 800     2.08900
  09/21/2007 900     2.17325
  09/21/2007 1000    3.22350
  .....

name: "period"
  Start              End
  4/1/2008 11:00     4/12/2008 12:00
  4/1/2008 14:00     4/12/2008 14:00
  4/1/2008 17:00     4/12/2008 16:00
  4/1/2008 19:00     4/12/2008 17:00
  4/12/2008 13:00    4/20/2008 12:00
  4/12/2008 15:00    4/20/2008 15:00
  4/12/2008 17:00    4/20/2008 17:00
  4/12/2008 18:00    4/20/2008 19:00
  4/20/2008 13:00    4/27/2008 12:00
  4/20/2008 16:00    4/27/2008 13:00
  .....

I first converted the date/time columns using strptime:

  strptime(met_h$DateTime, format="%m/%d/%Y %H:%M")

I am attempting to use the second data frame to select the data to average in the first data frame:

  mean(met_h$AT_h[met_h$DateTime>=(period$Start) & met_h$DateTime<=(period$End)])

This gives me many error messages, firstly because the reference data frame "period" is not the same length as the temperature data frame "met_h". However, I want to create an output of the same length as "period": I would like to use the "period" data frame to pick out certain data from the larger "met_h".

Thank you very much for your time!
Kara

Kara Przeczek
M.Sc. Candidate NRES - Environmental Science
University of Northern B.C.
3333 University Way
Prince George B.C V2N 4Z9
Phone: (250) 960-5427
przeczek at unbc.ca
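A note on the conversion step, offered only as a sketch: the met_h times above are written as "100", "200", ..., "1000", with no colon and no leading zero, so the format "%m/%d/%Y %H:%M" (which expects a colon) returns NA for every one of them, and the strptime result is also never assigned back to the column. The helpers below (parse_met_datetime and parse_period_datetime are hypothetical names, not part of R) pad the met_h time to four digits before parsing it with "%H%M", and return POSIXct, which is generally easier to keep in a data frame column than the POSIXlt that strptime returns. They assume the columns hold text (or factors) and that all stamps share one fixed time zone; tz = "UTC" is an assumption chosen to sidestep daylight-saving gaps.

```r
# Hypothetical helper: parse met_h-style stamps such as "09/21/2007 100"
# (month/day/year, then the time as hour*100 + minutes with no colon).
parse_met_datetime <- function(x, tz = "UTC") {
  x <- as.character(x)                  # also handles factor columns
  date_part <- sub(" .*$", "", x)       # "09/21/2007"
  time_part <- sub("^.* ", "", x)       # "100"
  # Pad to four digits ("100" -> "0100") so %H%M reads hours, then minutes
  time_part <- sprintf("%04d", as.integer(time_part))
  # Strings that do not match the format come back as NA
  as.POSIXct(paste(date_part, time_part), format = "%m/%d/%Y %H%M", tz = tz)
}

# Hypothetical helper: the period columns already contain a colon,
# so the original "%m/%d/%Y %H:%M" format is enough for them
parse_period_datetime <- function(x, tz = "UTC") {
  as.POSIXct(as.character(x), format = "%m/%d/%Y %H:%M", tz = tz)
}

# Usage: the result must be assigned, the conversion does not modify in place
# met_h$DateTime <- parse_met_datetime(met_h$DateTime)
# period$Start   <- parse_period_datetime(period$Start)
# period$End     <- parse_period_datetime(period$End)
```

Using the same tz for both data frames matters: comparisons between the two sets of times are only meaningful when they are interpreted in the same zone.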
jim holtman
2008-Nov-14 23:47 UTC
[R] can I use one data frame to subset data from another?
Something like this should work:

  # one mean per row of 'period'; the result is a list of length nrow(period)
  yourMean <- lapply(seq(nrow(period)), function(.indx){
      mean(met_h$AT_h[(met_h$DateTime >= period$Start[.indx]) &
                      (met_h$DateTime <= period$End[.indx])])
  })

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
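The lapply version above returns a list with one element per row of period; unlist(yourMean) turns it into a plain numeric vector. A sketch of the same loop written with vapply, which returns that numeric vector directly and so can be stored as a new column of period, is below. It assumes met_h$DateTime, period$Start and period$End are already POSIXct in the same time zone. The toy data, the helper name period_mean, the column name AT_mean, and the choice of na.rm = TRUE (skip missing temperature readings) are all illustrative assumptions, not part of the original thread.

```r
# Hypothetical toy data standing in for met_h and period:
# 48 hourly readings where AT_h simply equals the hour index 0..47
met_h <- data.frame(
  DateTime = as.POSIXct("2008-04-01 00:00", format = "%Y-%m-%d %H:%M",
                        tz = "UTC") + 3600 * (0:47),
  AT_h     = as.numeric(0:47)
)
period <- data.frame(
  Start = as.POSIXct(c("2008-04-01 11:00", "2008-04-01 14:00"),
                     format = "%Y-%m-%d %H:%M", tz = "UTC"),
  End   = as.POSIXct(c("2008-04-02 12:00", "2008-04-02 14:00"),
                     format = "%Y-%m-%d %H:%M", tz = "UTC")
)

# Hypothetical helper: one mean per row of 'per'.
# vapply checks that each call returns a single number, so the
# result is a numeric vector of length nrow(per).
period_mean <- function(met, per) {
  vapply(seq_len(nrow(per)), function(i) {
    # Both window ends are inclusive, as in the lapply version
    in_window <- met$DateTime >= per$Start[i] & met$DateTime <= per$End[i]
    mean(met$AT_h[in_window], na.rm = TRUE)
  }, numeric(1))
}

period$AT_mean <- period_mean(met_h, period)
print(period)
```

A period with no matching hours gives NaN (the mean of an empty vector), which makes gaps in the temperature record easy to spot in the output.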