I think I must be missing something obvious, but I'm having trouble
getting a data transformation to work on groupings of data within a data
frame (csss3) as defined by 2 factors (population, locid). The data are
sorted by year within locid within population and I want to lag another
variable (dbc), i.e., shift it down by 1 row, replacing the first row with
NA, within groups defined by locid nested within population. I thought I
could do something using by(csss3, list(locid, population), function) but
don't seem to be having any success. Any suggestions?

Brian

Brian S. Cade
U. S. Geological Survey
Fort Collins Science Center
2150 Centre Ave., Bldg. C
Fort Collins, CO 80526-8818
email: brian_cade@usgs.gov
tel: 970 226-9326

Fair enough. To clarify what I'm trying to achieve, I've pasted below a
small piece of the larger data frame, showing the hierarchical structure of
the factors POPULATION and LOCID, the ascending order of YEAR, and the
variable DBC. I would like to transform DBC into another variable that is
the previous year's DBC (call it LAG1DBC) within LOCID within POPULATION.
The desired outcome is shown in the second example data set pasted below
the first. The setup is needed for some 1st order autoregressive analyses
(not in the time series library). Any examples I've tried using by() only
seem to work for outputting results, not for creating new variables in an
existing data frame. I suspect that people do similar types of
hierarchical subgroup data manipulations all the time in R (I know how to
do these easily in SYSTAT), so I'm sure I'm missing some obvious, simple
trick. My search of the R-help list archives and various other R
documentation has not yielded any solutions yet. Suggestions are
graciously welcomed.
LOCID POPULATION YEAR DBC
1 algb-1 A 1992 0.70451575
2 algb-1 A 1993 0.59506851
3 algb-1 A 1997 0.84837544
4 algb-1 A 1998 0.50283182
5 algb-1 A 2000 0.91242707
6 algb-2 A 1992 0.09747155
7 algb-2 A 1993 0.84772253
8 algb-2 A 1997 0.43974081
9 algb-2 A 1998 0.83108544
10 algb-2 A 2000 0.22291192
11 algb-3 A 1992 0.44234175
12 algb-3 A 1993 0.54089534
5680 taylr-73 B 2001 0.43918082
5681 taylr-73 B 2002 0.34694427
5682 taylr-73 B 2003 3.35619190
5683 taylr-73 B 2004 0.71575815
5684 taylr-73 B 2005 0.42038506
5685 taylr-74 B 1992 3.88410354
5686 taylr-74 B 1993 3.32472557
5687 taylr-74 B 1994 3.29861501
5688 taylr-74 B 1996 0.48153827
5689 taylr-74 B 1997 3.63570636
5690 taylr-74 B 1998 1.94630194
LOCID POPULATION YEAR DBC LAG1DBC
1 algb-1 A 1992 0.70451575 NA
2 algb-1 A 1993 0.59506851 0.70451575
3 algb-1 A 1997 0.84837544 0.59506851
4 algb-1 A 1998 0.50283182 0.84837544
5 algb-1 A 2000 0.91242707 0.50283182
6 algb-2 A 1992 0.09747155 NA
7 algb-2 A 1993 0.84772253 0.09747155
8 algb-2 A 1997 0.43974081 0.84772253
9 algb-2 A 1998 0.83108544 0.43974081
10 algb-2 A 2000 0.22291192 0.83108544
11 algb-3 A 1992 0.44234175 NA
12 algb-3 A 1993 0.54089534 0.44234175
5680 taylr-73 B 2001 0.43918082 NA
5681 taylr-73 B 2002 0.34694427 0.43918082
5682 taylr-73 B 2003 3.35619190 0.34694427
5683 taylr-73 B 2004 0.71575815 3.35619190
5684 taylr-73 B 2005 0.42038506 0.71575815
5685 taylr-74 B 1992 3.88410354 NA
5686 taylr-74 B 1993 3.32472557 3.88410354
5687 taylr-74 B 1994 3.29861501 3.32472557
5688 taylr-74 B 1996 0.48153827 3.29861501
5689 taylr-74 B 1997 3.63570636 0.48153827
5690 taylr-74 B 1998 1.94630194 3.63570636
Brian
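One way the lag described above could be sketched is with ave(), which applies a function within groups and returns a vector aligned with the original rows. This is only a sketch on a few rows copied from the posted excerpt; the data frame name dat and the helper lag1 are made up for illustration (the real data frame is csss3), and it assumes the rows are already sorted by YEAR within each group.

```r
# A few rows from the posted excerpt; `dat` is a stand-in for csss3.
dat <- data.frame(
  LOCID      = c("algb-1", "algb-1", "algb-1",
                 "algb-2", "algb-2",
                 "taylr-73", "taylr-73"),
  POPULATION = c("A", "A", "A", "A", "A", "B", "B"),
  YEAR       = c(1992, 1993, 1997, 1992, 1993, 2001, 2002),
  DBC        = c(0.70451575, 0.59506851, 0.84837544,
                 0.09747155, 0.84772253,
                 0.43918082, 0.34694427)
)

# Hypothetical helper: shift a vector down by one position, putting NA
# first. Empty groups (unused POPULATION x LOCID cells) are returned as is.
lag1 <- function(x) if (length(x) > 0L) c(NA, x[-length(x)]) else x

# ave() applies lag1 within each POPULATION x LOCID cell and keeps the
# result in the original row order, so it can be assigned as a column.
dat$LAG1DBC <- ave(dat$DBC, dat$POPULATION, dat$LOCID, FUN = lag1)

print(dat)
```

Because the result has one value per row, it can be stored directly in the data frame, which is the step that by() does not do on its own.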
Florence Combes <fcombes@gmail.com>
10/13/2005 05:34 AM
To: Brian S Cade <brian_cade@usgs.gov>
cc:
Subject: Re: [R] subsetting with by() or other function??

Maybe an example of the data you have and the data you want would help
the people on the list understand the problem, and so be able to help
you?

Best regards,
Florence.
______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html
Dimitris: Thank you for the suggestion, but I get an error, just as when I
tried similar commands using by(). The error given is:

  Error in "$<-.data.frame"(`*tmp*`, "LAGDBC", value = tapply(csss3lagm81$DBC, :
    replacement has 1089 rows, data has 8314

So I'm not sure what the problem is - why does the transformed tmp only
have 1089 rows instead of 8314 like the full data frame?
Brian
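One likely reading of the error above, assuming 1089 is the number of distinct LOCID values (the thread does not confirm this): when the function passed to tapply() returns a vector, tapply() returns one list element per group rather than one value per row, so the result cannot be assigned as a column. The sketch below reproduces the mismatch on a toy data frame with made-up values and shows one possible way to get back to one value per row with split() and unsplit(); the names dat, lag1 and per_group are hypothetical.

```r
# Toy stand-in for the real data (hypothetical values): 2 groups, 5 rows.
dat <- data.frame(
  LOCID = c("a", "a", "a", "b", "b"),
  DBC   = c(1, 2, 3, 10, 20)
)

# Shift a vector down by one position, putting NA first.
lag1 <- function(x) c(NA, x[-length(x)])

# tapply() with a vector-valued function gives one list element per group,
# so its length is the number of groups, not the number of rows.
per_group <- tapply(dat$DBC, dat$LOCID, lag1)
length(per_group)   # number of distinct LOCID values
nrow(dat)           # number of rows in the data frame

# split() / lapply() / unsplit() computes the lag per group and then puts
# the pieces back in the original row order.
dat$LAG1DBC <- unsplit(lapply(split(dat$DBC, dat$LOCID), lag1), dat$LOCID)

print(dat)
```

For groups defined by two factors, the grouping argument could be built with interaction(dat$POPULATION, dat$LOCID, drop = TRUE), assuming those columns exist as in the posted excerpt.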
"Dimitris Rizopoulos" <dimitris.rizopoulos@med.kuleuven.be>
10/13/2005 10:04 AM
To: "Brian S Cade" <brian_cade@usgs.gov>
cc:
Subject: Re: [R] subsetting with by() or other function??

I think this should be something like:

    dat$LAG1DBC <- tapply(dat$DBC, dat$LOCID,
                          function(x) c(NA, x[-length(x)]))

I hope it helps.
Best,
Dimitris
----
Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven
Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://www.med.kuleuven.be/biostat/
http://www.student.kuleuven.be/~m0390867/dimitris.htm