Evan:

You misunderstand the concept of a lagged variable.

Ulrik:

Well, yes, that is certainly a general solution that works. However,
given the *specific* structure described by the OP, an even more
direct (maybe more efficient?) way to do it just uses (logical)
subscripting:

odds <- (seq_len(nrow(mydata)) %% 2) == 1
newdat <- data.frame(mydata[odds, 1], mydata[!odds, 2] - mydata[odds, 2])
names(newdat) <- names(mydata)

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Fri, Mar 17, 2017 at 9:58 AM, Ulrik Stervbo <ulrik.stervbo at gmail.com> wrote:
> Hi Evan
>
> you can easily do this by applying diff() to each exp group.
>
> Either using dplyr:
> library(dplyr)
> mydata %>%
>   group_by(exp) %>%
>   summarise(difference = diff(rslt))
>
> Or with base R
> aggregate(mydata, by = list(group = mydata$exp), FUN = diff)
>
> HTH
> Ulrik
>
> On Fri, 17 Mar 2017 at 17:34 Evan Cooch <evan.cooch at gmail.com> wrote:
>
>> Suppose I have a dataframe that looks like the following:
>>
>> n=2
>> mydata <- data.frame(exp = rep(1:5,each=n), rslt =
>> c(12,15,7,8,24,28,33,15,22,11))
>> mydata
>>    exp rslt
>> 1    1   12
>> 2    1   15
>> 3    2    7
>> 4    2    8
>> 5    3   24
>> 6    3   28
>> 7    4   33
>> 8    4   15
>> 9    5   22
>> 10   5   11
>>
>> The variable 'exp' (for 'experiment') occurs in pairs over consecutive
>> rows -- 1,1, then 2,2, then 3,3, and so on. The first row in a pair is
>> the 'control', and the second is a 'treatment'. The rslt column is the
>> result.
>>
>> What I'm trying to do is create a subset of this dataframe that consists
>> of the exp number, and the lagged difference between the 'control' and
>> 'treatment' result. So, for exp=1, the difference is (15-12)=3. For
>> exp=2, the difference is (8-7)=1, and so on. What I'm hoping to do is
>> take mydata (above), and turn it into
>>
>>   exp diff
>> 1   1    3
>> 2   2    1
>> 3   3    4
>> 4   4  -18
>> 5   5  -11
>>
>> The basic 'trick' I can't figure out is how to create a lagged variable
>> between the second row (record) for a given level of exp, and the first
>> row for that exp. This is easy to do in SAS (which I'm more familiar
>> with), but I'm struggling with the equivalent in R. The brute force
>> approach I thought of is to simply split the dataframe into two (one
>> even rows, one odd rows), merge by exp, and then calculate a difference.
>> But this seems to require renaming the rslt column in the two new
>> dataframes so they are different in the merge (say, rslt_cont in the odd
>> dataframe, and rslt_trt in the even dataframe), allowing me to calculate
>> a difference between the two.
>>
>> While I suppose this would work, I'm wondering if I'm missing a more
>> elegant 'in place' approach that doesn't require me to split the data
>> frame and do everything via a merge.
>>
>> Suggestions/pointers to the obvious welcome. I've tried playing with
>> lag, and some approaches using lag in the zoo package, but haven't
>> found the magic trick. The problem (meaning, what I can't figure out)
>> seems to be conditioning the lag on the level of exp.
>>
>> Many thanks...
>>
>> mydata <- data.frame(x = c(20,35,45,55,70), n = rep(50,5), y =
>> c(6,17,26,37,44))
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
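
For readers who want to run the two suggestions side by side, here is a minimal sketch in base R. It assumes exactly two rows per experiment, control first and treatment second, as in the original post. The names by_group and by_position are illustrative only and do not appear in the thread. The formula interface to aggregate() is used here (a small change from the call quoted above) so that diff() is applied only to rslt and not to exp as well.

```r
# Rebuild the example data from the original post (two rows per experiment).
mydata <- data.frame(exp  = rep(1:5, each = 2),
                     rslt = c(12, 15, 7, 8, 24, 28, 33, 15, 22, 11))

# Grouped approach: diff() within each level of exp.
# With exactly two rows per group, diff() returns one value per group.
by_group <- aggregate(rslt ~ exp, data = mydata, FUN = diff)
names(by_group)[2] <- "diff"

# Positional approach: treatment (even rows) minus control (odd rows).
# This relies on the strict control/treatment row ordering.
odds <- seq_len(nrow(mydata)) %% 2 == 1
by_position <- data.frame(exp  = mydata$exp[odds],
                          diff = mydata$rslt[!odds] - mydata$rslt[odds])

print(by_group)
print(by_position)
```

Both data frames should hold the differences 3, 1, 4, -18, -11 requested in the original post. The grouped version does not depend on row order across groups, while the positional version breaks if any experiment has a missing or extra row.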
On 3/17/2017 1:19 PM, Bert Gunter wrote:
> Evan:
>
> You misunderstand the concept of a lagged variable.

Well, lag in R, perhaps (and by my own admission). In SAS, that's exactly
how it works:

data test;
input exp rslt;
cards;
<data in the data frame in OP>
*;

data test2; set test; by exp;
diff=rslt-lag(rslt);
if last.exp;

> Ulrik:
>
> Well, yes, that is certainly a general solution that works. However,
> given the *specific* structure described by the OP, an even more
> direct (maybe more efficient?) way to do it just uses (logical)
> subscripting:
>
> odds <- (seq_len(nrow(mydata)) %% 2) == 1
> newdat <- data.frame(mydata[odds, 1], mydata[!odds, 2] - mydata[odds, 2])
> names(newdat) <- names(mydata)

Interesting - thanks!
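
A rough base R analogue of the SAS step above, offered as a sketch and not as an exact translation: ave() builds the previous value of rslt within each level of exp, and duplicated(..., fromLast = TRUE) stands in for "if last.exp;". The names prev, is_last and result are hypothetical helpers introduced for illustration.

```r
mydata <- data.frame(exp  = rep(1:5, each = 2),
                     rslt = c(12, 15, 7, 8, 24, 28, 33, 15, 22, 11))

# Previous value of rslt within each exp group; NA on a group's first row.
prev <- ave(mydata$rslt, mydata$exp,
            FUN = function(x) c(NA, head(x, -1)))

# Current value minus the within-group lagged value.
mydata$diff <- mydata$rslt - prev

# Counterpart of "if last.exp;": keep only the last row of each group.
is_last <- !duplicated(mydata$exp, fromLast = TRUE)
result  <- mydata[is_last, c("exp", "diff")]
rownames(result) <- NULL
print(result)
```

Unlike the odd/even subscripting, this version does not assume a fixed number of rows per experiment; for a group with more than two rows it returns the difference between the last two.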
Evan:

Yes, I stand partially corrected. You have the concept correct, but R
implements it differently than SAS. I think what you want for your
approach is diff():

evens <- (seq_len(nrow(mydata)) %% 2) == 0
newdat <- data.frame(exp = mydata[evens, 1], reslt = diff(mydata[, 2])[evens[-1]])

... which seems neater to me than what I offered previously.

Cheers,
Bert

On Fri, Mar 17, 2017 at 10:25 AM, Evan Cooch <evan.cooch at gmail.com> wrote:
> Well, lag in R, perhaps (and by my own admission). In SAS, that's exactly
> how it works:
>
> data test;
> input exp rslt;
> cards;
> <data in the data frame in OP>
> *;
>
> data test2; set test; by exp;
> diff=rslt-lag(rslt);
> if last.exp;
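
The indexing in diff(mydata[, 2])[evens[-1]] is easy to misread, so here is a small sketch that unpacks it on the example values. The variable names d and keep are illustrative only. The point is that diff() returns one fewer element than its input, with d[i] equal to rslt[i + 1] - rslt[i], so the mask has to be shifted by one position to line up.

```r
rslt <- c(12, 15, 7, 8, 24, 28, 33, 15, 22, 11)

# All consecutive differences: d[i] is rslt[i + 1] - rslt[i], length 9.
d <- diff(rslt)

# TRUE for rows 2, 4, 6, ... (the treatment rows).
evens <- seq_along(rslt) %% 2 == 0

# Dropping the first element shifts the mask so that keep[i] refers to
# row i + 1; it is TRUE exactly where d[i] is a within-pair difference.
keep <- evens[-1]

print(d)
print(d[keep])
```

The differences that get dropped (d[2], d[4], ...) are the ones that straddle two experiments, such as 7 - 15 between exp 1 and exp 2.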