Steve E.
2014-Apr-16 19:01 UTC
[R] help incorporating data subset lengths in function with ddply
Dear R Community,

I am having some trouble with a task that I hope you might be able to help with. I have a dataset that includes the time and corresponding stream discharge from numerous storms (an example of the structure, with simplified data, is below). I would like to produce a field that details the duration of each storm, where each storm is a subset of the data and the duration runs from zero to the end of each unique storm. I have been trying to accomplish this with ddply, but to no avail, as I am unable to provide ddply (e.g., below) with the length of the storm (i.e., of the subset of data). Thank you in advance; any help would be appreciated.

existing df:
storm,Q_time,Q
s1,2008-08-07 21:15:00,0.000
s1,2008-08-07 21:16:00,3.020
s1,2008-08-07 21:17:00,6.041
s1,2008-08-07 21:18:00,9.061
s1,2008-08-07 21:19:00,12.082
s1,2008-08-07 21:20:00,15.102
s1,2008-08-07 21:21:00,18.123
s1,2008-08-07 21:22:00,11.143
s1,2008-08-07 21:23:00,0.000
s2,2010-10-05 21:00:00,0.000
s2,2010-10-05 21:01:00,1.812
s2,2010-10-05 21:02:00,3.625
s2,2010-10-05 21:03:00,5.437
s2,2010-10-05 21:04:00,7.249
s2,2010-10-05 21:05:00,9.061
s2,2010-10-05 21:06:00,0.874
s2,2010-10-05 21:07:00,0.000

desired df:
storm,Q_time,Q,duration
s1,2008-08-07 21:15:00,0.000,1
s1,2008-08-07 21:16:00,3.020,2
s1,2008-08-07 21:17:00,6.041,3
s1,2008-08-07 21:18:00,9.061,4
s1,2008-08-07 21:19:00,12.082,5
s1,2008-08-07 21:20:00,15.102,6
s1,2008-08-07 21:21:00,18.123,7
s1,2008-08-07 21:22:00,11.143,8
s1,2008-08-07 21:23:00,0.000,9
s2,2010-10-05 21:00:00,0.000,1
s2,2010-10-05 21:01:00,1.812,2
s2,2010-10-05 21:02:00,3.625,3
s2,2010-10-05 21:03:00,5.437,4
s2,2010-10-05 21:04:00,7.249,5
s2,2010-10-05 21:05:00,9.061,6
s2,2010-10-05 21:06:00,0.874,7
s2,2010-10-05 21:07:00,0.000,8

I have been trying variations of the following statement, but I cannot seem to get the length of the subset correct, as I receive an error of the type 'Error: arguments imply differing number of rows: 2401, 0'.
newdf <- ddply(df, "storm", transform, FUN = function(x)
  {duration=seq(from=1, by=1, length.out=nrow(x))})

I would really like to get a handle on ddply in this instance, as it will be quite helpful for many other similar calculations that I need to do with this dataset.

Thanks again,
Stevan

--
View this message in context: http://r.789695.n4.nabble.com/help-incorporating-data-subset-lengths-in-function-with-ddply-tp4688926.html
Sent from the R help mailing list archive at Nabble.com.
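A minimal base R sketch of the split-apply-combine step the ddply call is aiming for, assuming the rows are already ordered by storm and by time within storm, as in the example. The key point is that the new column is passed to transform() as a named argument (duration = ...), which transform() evaluates inside each piece, so nrow() sees only that storm. Passing FUN = function(x) {...} instead asks transform() to add a column called FUN holding a function, which appears to be what triggers the error. The helper name add_duration() is hypothetical. With plyr loaded, the equivalent one-liner should be ddply(existing, "storm", transform, duration = seq_along(Q)).

```r
# Rebuild the example data from the post (simplified storms s1 and s2)
existing <- read.csv(text = "storm,Q_time,Q
s1,2008-08-07 21:15:00,0.000
s1,2008-08-07 21:16:00,3.020
s1,2008-08-07 21:17:00,6.041
s1,2008-08-07 21:18:00,9.061
s1,2008-08-07 21:19:00,12.082
s1,2008-08-07 21:20:00,15.102
s1,2008-08-07 21:21:00,18.123
s1,2008-08-07 21:22:00,11.143
s1,2008-08-07 21:23:00,0.000
s2,2010-10-05 21:00:00,0.000
s2,2010-10-05 21:01:00,1.812
s2,2010-10-05 21:02:00,3.625
s2,2010-10-05 21:03:00,5.437
s2,2010-10-05 21:04:00,7.249
s2,2010-10-05 21:05:00,9.061
s2,2010-10-05 21:06:00,0.874
s2,2010-10-05 21:07:00,0.000", stringsAsFactors = FALSE)

# Hypothetical per-storm helper: the new column is a *named argument* of
# transform(), evaluated inside the piece, so nrow(piece) is that storm's length
add_duration <- function(piece) {
  transform(piece, duration = seq_len(nrow(piece)))
}

# Split by storm, apply the helper to each piece, then stack the pieces again
pieces <- lapply(split(existing, existing$storm), add_duration)
newdf <- do.call(rbind, pieces)
rownames(newdf) <- NULL  # drop the "s1.1", "s1.2", ... names that rbind creates
newdf
```

This is roughly what ddply does for you; writing it out once makes it clearer that the per-piece function receives a data frame and must return one.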
Frede Aakmann Tøgersen
2014-Apr-16 19:35 UTC
[R] help incorporating data subset lengths in function with ddply
Hi,

Are you looking for something like this:

mydat <- read.table(text = "storm,Q_time,Q,duration
s1,2008-08-07 21:15:00,0.000,1
s1,2008-08-07 21:16:00,3.020,2
s1,2008-08-07 21:17:00,6.041,3
s1,2008-08-07 21:18:00,9.061,4
s1,2008-08-07 21:19:00,12.082,5
s1,2008-08-07 21:20:00,15.102,6
s1,2008-08-07 21:21:00,18.123,7
s1,2008-08-07 21:22:00,11.143,8
s1,2008-08-07 21:23:00,0.000,9
s2,2010-10-05 21:00:00,0.000,1
s2,2010-10-05 21:01:00,1.812,2
s2,2010-10-05 21:02:00,3.625,3
s2,2010-10-05 21:03:00,5.437,4
s2,2010-10-05 21:04:00,7.249,5
s2,2010-10-05 21:05:00,9.061,6
s2,2010-10-05 21:06:00,0.874,7
s2,2010-10-05 21:07:00,0.000,8",
  header = TRUE, sep = ",", stringsAsFactors = FALSE)

# convert the time stamps to date-times
mydat$Q_time <- as.POSIXct(strptime(mydat$Q_time, format = "%Y-%m-%d %H:%M:%S"))

# duration of each storm: last time stamp minus first
tmp <- aggregate(Q_time ~ storm, data = mydat, FUN = function(x) diff(range(x)))
tmp
str(tmp)

Yours sincerely,
Frede Aakmann Tøgersen
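If the goal is the total length of each storm in a fixed unit, here is a sketch that converts explicitly with difftime(units = "mins") rather than relying on the automatic unit choice of diff(range(x)). The time stamps are regenerated from the example (one reading per minute), and the "UTC" time zone is an assumption, since the post does not give one.

```r
# Rebuild the example time stamps: one reading per minute
# (tz = "UTC" is an assumption; the thread does not state a time zone)
s1 <- seq(as.POSIXct("2008-08-07 21:15:00", tz = "UTC"), by = "1 min", length.out = 9)
s2 <- seq(as.POSIXct("2010-10-05 21:00:00", tz = "UTC"), by = "1 min", length.out = 8)
mydat <- data.frame(storm = rep(c("s1", "s2"), times = c(9, 8)),
                    Q_time = c(s1, s2))

# Total duration per storm (last reading minus first), always in minutes
total_min <- sapply(split(mydat$Q_time, mydat$storm), function(x) {
  as.numeric(difftime(max(x), min(x), units = "mins"))
})
total_min

# Look the per-storm total up again for every row, if it is wanted as a column
mydat$storm_total_min <- unname(total_min[mydat$storm])
```

Fixing the unit matters once storms of very different lengths are mixed, because difftime otherwise picks seconds, minutes, hours, or days depending on the size of the difference.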
Jeff Newmiller
2014-Apr-17 01:18 UTC
[R] help incorporating data subset lengths in function with ddply
Note that ddply is a heavyweight solution, and as your data get larger you may find that using it for little things like this hurts performance. Also, "df" is already a function in the stats package (the density of the F distribution) that you might actually want to use someday, and redefining it this way also confuses anyone reading your code.

existingdf <- read.csv(text = "storm,Q_time,Q
s1,2008-08-07 21:15:00,0.000
s1,2008-08-07 21:16:00,3.020
s1,2008-08-07 21:17:00,6.041
s1,2008-08-07 21:18:00,9.061
s1,2008-08-07 21:19:00,12.082
s1,2008-08-07 21:20:00,15.102
s1,2008-08-07 21:21:00,18.123
s1,2008-08-07 21:22:00,11.143
s1,2008-08-07 21:23:00,0.000
s2,2010-10-05 21:00:00,0.000
s2,2010-10-05 21:01:00,1.812
s2,2010-10-05 21:02:00,3.625
s2,2010-10-05 21:03:00,5.437
s2,2010-10-05 21:04:00,7.249
s2,2010-10-05 21:05:00,9.061
s2,2010-10-05 21:06:00,0.874
s2,2010-10-05 21:07:00,0.000
", as.is = TRUE)

library(plyr)

# plyr solution: the function receives one storm's rows as DF
newdf <- ddply(existingdf, "storm", function(DF) {
  transform(DF, duration = seq.int(length.out = nrow(DF)))
})

# base R solution: cumulative count of ones within each storm
newdf2 <- transform(existingdf,
                    duration = ave(rep(1, nrow(existingdf)), storm, FUN = cumsum))

Jeff Newmiller
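A base R sketch in the same spirit as the ave() solution, assuming rows are ordered by time within each storm. Using seq_along as the FUN gives the 1, 2, ... counter directly without building a vector of ones, and applying ave() to the numeric time stamps gives elapsed minutes since each storm's first reading, i.e. a duration that starts at zero. The time stamps are regenerated from the example, and the "UTC" time zone is an assumption.

```r
# Rebuild the example (one reading per minute; tz = "UTC" is an assumption)
s1 <- seq(as.POSIXct("2008-08-07 21:15:00", tz = "UTC"), by = "1 min", length.out = 9)
s2 <- seq(as.POSIXct("2010-10-05 21:00:00", tz = "UTC"), by = "1 min", length.out = 8)
existingdf <- data.frame(storm = rep(c("s1", "s2"), times = c(9, 8)),
                         Q_time = c(s1, s2))

# Row counter within each storm: 1, 2, ..., n (matches the "desired df")
existingdf$duration <- ave(seq_len(nrow(existingdf)), existingdf$storm,
                           FUN = seq_along)

# Elapsed minutes since the first reading of each storm: 0, 1, ..., n - 1
secs <- as.numeric(existingdf$Q_time)  # seconds since the epoch
existingdf$elapsed_min <- (secs - ave(secs, existingdf$storm, FUN = min)) / 60

existingdf
```

The elapsed-time version stays correct even if readings are irregularly spaced, whereas the row counter only equals minutes when there is exactly one reading per minute.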