Sunny Srivastava
2010-Dec-06 09:58 UTC
[R] [plyr] Question regarding ddply: use of .(as.name(varname)) and varname in ddply function
Dear R-Helpers:

I am trying to use *ddply* to extract the min and max of a particular column in a data.frame. I am using two different forms of the function call:

    ## var_name_to_split is a string, something like "var1", which is the
    ## name of a column in the data.frame

    ## case 1: fails with an error
    ddply(df, .(as.name(var_name_to_split)),
          function(x) c(min(x[, 3]), max(x[, 3])))

    ## case 2: works fine
    ddply(df, var_name_to_split,
          function(x) c(min(x[, 3]), max(x[, 3])))

I can't understand why I get the error in case 1. Can someone help me please?

Thank you in advance.

S.

----------

Here is the reproducible code: https://gist.github.com/730069

Here is sample data:

    structure(list(
      g10 = c(1L, 1L, 1L, 10L, 10L, 10L),
      l1  = c(0.410077661080032, 0.607497980054711, 0.640488621149069,
              -1.47837849145189, -1.48199933642397, -1.42815840788069),
      d1  = c(0.917769870675383, 0.959256755797054, 0.772928570498006,
              0.473545787883884, 0.590580940273922, 0.0448629265021484),
      l13 = c(0.0803696045647364, -0.291741079837731, -0.00191015929550312,
              0.295889063381279, 0.615383505686296, 0.71991154637985),
      d13 = c(-1.40821713632015, -1.27501365601403, -1.41150703235157,
              0.708943640186729, 0.276034890463749, 0.663383934998686)),
      .Names = c("g10", "l1", "d1", "l13", "d13"),
      row.names = c(1L, 2L, 3L, 1758L, 1759L, 1760L),
      class = "data.frame")

-----------

If someone doesn't want to open GitHub, here is the code:

    ## Doesn't work
    # grp -- name of a column of the data.frame df
    # function call is -- getMinMax1(df1, grp = "var1")
    getMinMax1 <- function(df, grp) {
      dfret <- ddply(df, .(as.name(grp)),  ## I am using as.name(grp), source of error
                     function(x) {
                       minmax <- c(min(x[, 3]), max(x[, 3]))
                       return(minmax)
                     })
      return(dfret)
    }

    ## Works fine
    # grp -- name of a column of the data.frame df
    # function call is -- getMinMax2(df1, grp = "var1")
    getMinMax2 <- function(df, grp) {
      dfret <- ddply(df, grp,  ## using the quoted variable name passed to grp when the function is called
                     function(x) {
                       minmax <- c(min(x[, 3]), max(x[, 3]))
                       return(minmax)
                     })
      return(dfret)
    }
Peter Ehlers
2010-Dec-06 10:58 UTC
[R] [plyr] Question regarding ddply: use of .(as.name(varname)) and varname in ddply function
On 2010-12-06 01:58, Sunny Srivastava wrote:
> Dear R-Helpers:
>
> I am trying to use *ddply* to extract the min and max of a particular
> column in a data.frame. I am using two different forms of the function call:
>
> ## var_name_to_split is a string, something like "var1", which is the
> ## name of a column in the data.frame
>
> ddply(df, .(as.name(var_name_to_split)),
>       function(x) c(min(x[, 3]), max(x[, 3])))  ## fails with an error - case 1
> ddply(df, var_name_to_split,
>       function(x) c(min(x[, 3]), max(x[, 3])))  ## works fine - case 2

Try it without the .(), i.e.

    ddply(df, as.name(), ....)

Peter Ehlers

> I can't understand why I get the error in case 1. Can someone help me
> please?
>
> Thank you in advance.
>
> S.

[ snip ]
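A minimal sketch of the suggestion above, assuming plyr is installed. The data frame is a cut-down, rounded version of the sample data, and the `min_max` helper is a hypothetical name introduced here for illustration; neither comes from the thread. It passes the symbol produced by `as.name()` directly as the grouping argument, next to the plain string form from case 2:

```r
library(plyr)

# Cut-down, rounded stand-in for the sample data (illustrative values only)
df <- data.frame(
  g10 = c(1L, 1L, 1L, 10L, 10L, 10L),
  d1  = c(0.92, 0.96, 0.77, 0.47, 0.59, 0.04)
)
var_name_to_split <- "g10"

# Hypothetical helper: per-group min and max of the d1 column
min_max <- function(x) c(min = min(x$d1), max = max(x$d1))

# Case 2 from the question: the column name as a string
by_string <- ddply(df, var_name_to_split, min_max)

# The suggestion: a symbol built with as.name(), without wrapping it in .()
by_symbol <- ddply(df, as.name(var_name_to_split), min_max)
```

Both calls should give one row per value of `g10`. If the symbol form misbehaves in a given plyr version, the string form from case 2 is the simpler choice inside a function.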
jim holtman
2010-Dec-06 11:05 UTC
[R] [plyr] Question regarding ddply: use of .(as.name(varname)) and varname in ddply function
Here is another approach to try:

    > require(data.table)
    > var <- "g10"
    > df <- data.table(df)
    > str(df)
    Classes ‘data.table’ and 'data.frame':  6 obs. of  5 variables:
     $ g10: int  1 1 1 10 10 10
     $ l1 : num  0.41 0.607 0.64 -1.478 -1.482 ...
     $ d1 : num  0.918 0.959 0.773 0.474 0.591 ...
     $ l13: num  0.08037 -0.29174 -0.00191 0.29589 0.61538 ...
     $ d13: num  -1.408 -1.275 -1.412 0.709 0.276 ...
    > df[,list(min=min(d1), max = max(d1)), by = eval(var)]
         g10        min       max
    [1,]   1 0.77292857 0.9592568
    [2,]  10 0.04486293 0.5905809

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Hadley Wickham
2010-Dec-06 22:28 UTC
[R] [plyr] Question regarding ddply: use of .(as.name(varname)) and varname in ddply function
On Mon, Dec 6, 2010 at 3:58 AM, Sunny Srivastava <research.baba at gmail.com> wrote:
> Dear R-Helpers:
>
> I am trying to use *ddply* to extract the min and max of a particular
> column in a data.frame. I am using two different forms of the function call:
>
> ## var_name_to_split is a string, something like "var1", which is the
> ## name of a column in the data.frame
>
> ddply(df, .(as.name(var_name_to_split)),
>       function(x) c(min(x[, 3]), max(x[, 3])))  ## fails with an error - case 1
> ddply(df, var_name_to_split,
>       function(x) c(min(x[, 3]), max(x[, 3])))  ## works fine - case 2
>
> I can't understand why I get the error in case 1. Can someone help me
> please?

Why do you expect case 1 to work?

Hadley

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/
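One way to see what case 1 is asking for: `.()` captures its arguments unevaluated, much as base R's `quote()` does, so `.(as.name(grp))` holds the call `as.name(grp)` rather than the symbol it would produce. The sketch below uses only base R and rests on the assumption that the captured expression is later evaluated against the data frame; the toy `df` and the variable names are made up for illustration.

```r
# Toy data, not the thread's sample data
df  <- data.frame(g10 = c(1L, 1L, 10L), d1 = c(0.9, 0.7, 0.4))
grp <- "g10"

# Stand-in for what .(as.name(grp)) captures: an unevaluated call
quoted_call <- quote(as.name(grp))

# What as.name(grp) yields when it is evaluated first: the symbol g10
quoted_symbol <- as.name(grp)

# Evaluating the call against the data frame gives back a symbol, not data
from_call <- eval(quoted_call, df)

# Evaluating the symbol against the data frame gives the grouping column
from_symbol <- eval(quoted_symbol, df)
```

Under that assumption, the grouping expression in case 1 resolves to a name instead of a vector of group values, which would explain why the split has nothing to work with, while a bare string or symbol resolves to the column itself.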