Brian S Cade
2005-Oct-13 20:28 UTC
[R] subsetting data frame using by() or tapply() or other
Ok, so I see that the problem I'm having creating a new variable (LAG1DBC) in the example data transformation below is that tapply() is creating a list that is not dimensionally consistent with the data frame (data). So how do I go from the list output of tapply() to a dimensionally consistent vector that can create the new variable in my original data frame?

I've been trying to use a function like

    data$LAG1DBC <- tapply(data$DBC, data$LOCID, function(x) c(NA, x[-length(x)]))

which creates a list of dimension much smaller than the nrows in data. And I've tried things like using as.data.frame.array() or as.data.frame.list() in front of tapply() and still have the same problem. I know this can't be that unusual of a data manipulation and that someone has to have done similar things before.

I want to go from something like this:

        LOCID POPULATION YEAR        DBC
1      algb-1          A 1992 0.70451575
2      algb-1          A 1993 0.59506851
3      algb-1          A 1997 0.84837544
4      algb-1          A 1998 0.50283182
5      algb-1          A 2000 0.91242707
6      algb-2          A 1992 0.09747155
7      algb-2          A 1993 0.84772253
8      algb-2          A 1997 0.43974081
9      algb-2          A 1998 0.83108544
10     algb-2          A 2000 0.22291192
11     algb-3          A 1992 0.44234175
12     algb-3          A 1993 0.54089534
5680 taylr-73          B 2001 0.43918082
5681 taylr-73          B 2002 0.34694427
5682 taylr-73          B 2003 3.35619190
5683 taylr-73          B 2004 0.71575815
5684 taylr-73          B 2005 0.42038506
5685 taylr-74          B 1992 3.88410354
5686 taylr-74          B 1993 3.32472557
5687 taylr-74          B 1994 3.29861501
5688 taylr-74          B 1996 0.48153827
5689 taylr-74          B 1997 3.63570636
5690 taylr-74          B 1998 1.94630194

to something like this:

        LOCID POPULATION YEAR        DBC    LAG1DBC
1      algb-1          A 1992 0.70451575         NA
2      algb-1          A 1993 0.59506851 0.70451575
3      algb-1          A 1997 0.84837544 0.59506851
4      algb-1          A 1998 0.50283182 0.84837544
5      algb-1          A 2000 0.91242707 0.50283182
6      algb-2          A 1992 0.09747155         NA
7      algb-2          A 1993 0.84772253 0.09747155
8      algb-2          A 1997 0.43974081 0.84772253
9      algb-2          A 1998 0.83108544 0.43974081
10     algb-2          A 2000 0.22291192 0.83108544
11     algb-3          A 1992 0.44234175         NA
12     algb-3          A 1993 0.54089534 0.44234175
5680 taylr-73          B 2001 0.43918082         NA
5681 taylr-73          B 2002 0.34694427 0.43918082
5682 taylr-73          B 2003 3.35619190 0.34694427
5683 taylr-73          B 2004 0.71575815 3.35619190
5684 taylr-73          B 2005 0.42038506 0.71575815
5685 taylr-74          B 1992 3.88410354         NA
5686 taylr-74          B 1993 3.32472557 3.88410354
5687 taylr-74          B 1994 3.29861501 3.32472557
5688 taylr-74          B 1996 0.48153827 3.29861501
5689 taylr-74          B 1997 3.63570636 0.48153827
5690 taylr-74          B 1998 1.94630194 3.63570636

Brian

Brian S. Cade
U. S. Geological Survey
Fort Collins Science Center
2150 Centre Ave., Bldg. C
Fort Collins, CO 80526-8818
email: brian_cade@usgs.gov
tel: 970 226-9326
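The mismatch described in the question can be reproduced on a small made-up data frame. This is only a sketch: the `toy` object and its values are assumptions for illustration, not the poster's data. It shows that tapply() returns one list element per LOCID, so the result has as many elements as there are groups rather than rows.

```r
# Hypothetical toy data, not the poster's data set
toy <- data.frame(
  LOCID = c("a-1", "a-1", "a-1", "a-2", "a-2"),
  YEAR  = c(1992, 1993, 1997, 1992, 1993),
  DBC   = c(0.7, 0.6, 0.8, 0.1, 0.9)
)

# Same call shape as in the question: shift each group's values down by one
lagged <- tapply(toy$DBC, toy$LOCID, function(x) c(NA, x[-length(x)]))

is.list(lagged)   # the result is a list (one element per LOCID)
length(lagged)    # number of groups
nrow(toy)         # number of rows, which is larger
lengths(lagged)   # each element holds one group's lagged values
```

The per-group lengths add up to nrow(toy), which is why flattening the list is enough to get a vector of the right length.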
Marc Schwartz (via MN)
2005-Oct-13 21:04 UTC
[R] subsetting data frame using by() or tapply() or other
On Thu, 2005-10-13 at 14:28 -0600, Brian S Cade wrote:
> Ok so I see the problem that I'm having creating a new variable (LAG1DBC)
> in the example data transformation below is that tapply() is creating a
> list that is not dimensionally consistent with the data frame (data). So
> how do I go from the list output of tapply() to create a dimensionally
> consistent vector that can create the new variable in my original data
> frame? I've been trying to use a function like
> data$LAG1DBC <- tapply(data$DBC, data$LOCID, function(x) c(NA, x[-length(x)]))
> which creates a list of dimension much smaller than the nrows in data. And
> I've tried things like using as.data.frame.array() or as.data.frame.list()
> in front of tapply() and still have the same problem. I know this can't
> be that unusual of a data manipulation and that someone has to have done
> similar things before.
>
> I want to go from something like this:
>
>         LOCID POPULATION YEAR        DBC
> 1      algb-1          A 1992 0.70451575
> 2      algb-1          A 1993 0.59506851
> 3      algb-1          A 1997 0.84837544
> 4      algb-1          A 1998 0.50283182
> 5      algb-1          A 2000 0.91242707
> 6      algb-2          A 1992 0.09747155
> 7      algb-2          A 1993 0.84772253
> 8      algb-2          A 1997 0.43974081
> 9      algb-2          A 1998 0.83108544
> 10     algb-2          A 2000 0.22291192
> 11     algb-3          A 1992 0.44234175
> 12     algb-3          A 1993 0.54089534
> 5680 taylr-73          B 2001 0.43918082
> 5681 taylr-73          B 2002 0.34694427
> 5682 taylr-73          B 2003 3.35619190
> 5683 taylr-73          B 2004 0.71575815
> 5684 taylr-73          B 2005 0.42038506
> 5685 taylr-74          B 1992 3.88410354
> 5686 taylr-74          B 1993 3.32472557
> 5687 taylr-74          B 1994 3.29861501
> 5688 taylr-74          B 1996 0.48153827
> 5689 taylr-74          B 1997 3.63570636
> 5690 taylr-74          B 1998 1.94630194
>
> to something like this:
>
>         LOCID POPULATION YEAR        DBC    LAG1DBC
> 1      algb-1          A 1992 0.70451575         NA
> 2      algb-1          A 1993 0.59506851 0.70451575
> 3      algb-1          A 1997 0.84837544 0.59506851
> 4      algb-1          A 1998 0.50283182 0.84837544
> 5      algb-1          A 2000 0.91242707 0.50283182
> 6      algb-2          A 1992 0.09747155         NA
> 7      algb-2          A 1993 0.84772253 0.09747155
> 8      algb-2          A 1997 0.43974081 0.84772253
> 9      algb-2          A 1998 0.83108544 0.43974081
> 10     algb-2          A 2000 0.22291192 0.83108544
> 11     algb-3          A 1992 0.44234175         NA
> 12     algb-3          A 1993 0.54089534 0.44234175
> 5680 taylr-73          B 2001 0.43918082         NA
> 5681 taylr-73          B 2002 0.34694427 0.43918082
> 5682 taylr-73          B 2003 3.35619190 0.34694427
> 5683 taylr-73          B 2004 0.71575815 3.35619190
> 5684 taylr-73          B 2005 0.42038506 0.71575815
> 5685 taylr-74          B 1992 3.88410354         NA
> 5686 taylr-74          B 1993 3.32472557 3.88410354
> 5687 taylr-74          B 1994 3.29861501 3.32472557
> 5688 taylr-74          B 1996 0.48153827 3.29861501
> 5689 taylr-74          B 1997 3.63570636 0.48153827
> 5690 taylr-74          B 1998 1.94630194 3.63570636
>
> Brian

Brian,

Use unlist():

> data$LAG1DBC <- unlist(tapply(data$DBC, data$LOCID, function(x) c(NA, x[-length(x)])))
> data
        LOCID POPULATION YEAR        DBC    LAG1DBC
1      algb-1          A 1992 0.70451575         NA
2      algb-1          A 1993 0.59506851 0.70451575
3      algb-1          A 1997 0.84837544 0.59506851
4      algb-1          A 1998 0.50283182 0.84837544
5      algb-1          A 2000 0.91242707 0.50283182
6      algb-2          A 1992 0.09747155         NA
7      algb-2          A 1993 0.84772253 0.09747155
8      algb-2          A 1997 0.43974081 0.84772253
9      algb-2          A 1998 0.83108544 0.43974081
10     algb-2          A 2000 0.22291192 0.83108544
11     algb-3          A 1992 0.44234175         NA
12     algb-3          A 1993 0.54089534 0.44234175
5680 taylr-73          B 2001 0.43918082         NA
5681 taylr-73          B 2002 0.34694427 0.43918082
5682 taylr-73          B 2003 3.35619190 0.34694427
5683 taylr-73          B 2004 0.71575815 3.35619190
5684 taylr-73          B 2005 0.42038506 0.71575815
5685 taylr-74          B 1992 3.88410354         NA
5686 taylr-74          B 1993 3.32472557 3.88410354
5687 taylr-74          B 1994 3.29861501 3.32472557
5688 taylr-74          B 1996 0.48153827 3.29861501
5689 taylr-74          B 1997 3.63570636 0.48153827
5690 taylr-74          B 1998 1.94630194 3.63570636

HTH,

Marc Schwartz
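One caveat worth sketching: unlist(tapply()) returns the values grouped by LOCID level, so it lines up with the data frame only when the rows are already grouped in that same order, as they are in the sample above. A possible alternative, not part of the original reply, is base R's ave(), which returns its result in the original row order. The `toy` data frame and the `lag1` helper below are hypothetical names used only for illustration, and both approaches still assume the rows within each LOCID are sorted by YEAR.

```r
# Hypothetical toy data with the LOCID groups deliberately interleaved
toy <- data.frame(
  LOCID = c("a-2", "a-1", "a-2", "a-1", "a-1"),
  YEAR  = c(1992, 1992, 1993, 1993, 1997),
  DBC   = c(0.1, 0.7, 0.9, 0.6, 0.8)
)

# Hypothetical helper: previous value within a group, NA for the first row
lag1 <- function(x) c(NA, x[-length(x)])

# unlist(tapply()) yields values in group order (all of a-1, then all of a-2),
# which does not match the row order of this interleaved data frame
by_group <- unlist(tapply(toy$DBC, toy$LOCID, lag1))

# ave() applies the same function per group but keeps the row order
toy$LAG1DBC <- ave(toy$DBC, toy$LOCID, FUN = lag1)
toy
```

For data that is already sorted by LOCID, the two approaches should give the same column, so ave() is mainly a safeguard against unsorted input.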