Ryan Utz
2011-Jun-22 18:02 UTC
[R] Linking 2 columns in 2 databases and applying a function
Hi all,

I have two datasets, one that represents a long-term time series and one that represents summary data for the time series. It looks something like this:

x <- data.frame(Year=c(2001,2001,2001,2001,2001,2001,2002,2002,2002,2002,2002,2002),
                Month=c(1,1,1,2,2,2,1,1,1,2,2,2),
                Q=c(5,5,5,6,6,6,3,3,3,4,4,5))
y <- data.frame(Year=c(2001,2001,2002,2002), Month=c(1,2,1,2), Threshold_Q=c(5,5,4,4))

What I'd like to do is link the Year and Month fields in both data frames, then determine whether Q exceeds Threshold_Q (noting it with something like 1 or 0 in a new field in data frame x).

If I were doing this in the more-familiar-to-me Matlab, I'd just write a pair of nested for-loops. But, as I understand it, that is not the idiomatic approach in R. I've tried reading the help pages and searching for solutions on the net, with no luck (I'm relatively new to R and the help pages are still a bit opaque to me). It seems like the functions "apply" or "lapply" are key, but I can't make sense of their syntax.

Any advice/help?

Many thanks,
Ryan

--
Ryan Utz, Ph.D.
Aquatic Ecologist/STREON Scientist
National Ecological Observatory Network
Home/Cell: (724) 272-7769
Work: (720) 746-4844 ext. 2488
Daniel Malter
2011-Jun-22 18:32 UTC
[R] Linking 2 columns in 2 databases and applying a function
For example, you can merge the two data frames and do a direct comparison:

df <- merge(x, y, all.x=TRUE, all.y=FALSE)  # keep every row of x; joins on the shared columns Year and Month
df
df$Q > df$Threshold_Q                       # TRUE where Q exceeds the threshold

HTH,
Daniel
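The merge suggested above returns a logical vector. As a minimal sketch of the 1/0 flag the original question asked for, the comparison can be stored in a new column. The column name `Exceeds` is a hypothetical choice, not something from the thread. Because `merge()` may reorder rows, the second half shows one possible variant that writes the flag straight into `x` by looking up each row's threshold with `match()` on a pasted Year/Month key.

```r
# Example data from the original post (Month written out in full)
x <- data.frame(
  Year  = rep(c(2001, 2002), each = 6),
  Month = rep(c(1, 2), each = 3, times = 2),
  Q     = c(5, 5, 5, 6, 6, 6, 3, 3, 3, 4, 4, 5)
)
y <- data.frame(
  Year        = c(2001, 2001, 2002, 2002),
  Month       = c(1, 2, 1, 2),
  Threshold_Q = c(5, 5, 4, 4)
)

# Variant 1: merge on explicit keys, then store the comparison as 0/1.
# "Exceeds" is a hypothetical column name chosen for this sketch.
df <- merge(x, y, by = c("Year", "Month"), all.x = TRUE)
df$Exceeds <- as.integer(df$Q > df$Threshold_Q)

# Variant 2: add the flag to x itself, keeping x's original row order.
# Build a combined Year/Month key and look up each row's threshold in y.
key_x <- paste(x$Year, x$Month)
key_y <- paste(y$Year, y$Month)
x$Exceeds <- as.integer(x$Q > y$Threshold_Q[match(key_x, key_y)])

x$Exceeds
# 0 0 0 1 1 1 0 0 0 0 0 1
```

Naming the `by` columns explicitly guards against an accidental join on any other column the two data frames might happen to share.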
Ryan Utz
2011-Jun-22 19:06 UTC
[R] Linking 2 columns in 2 databases and applying a function
Daniel,

That indeed does work... and I didn't even need to learn a new function. Thanks!

--
Ryan Utz