So in this case, "create_bins" returns a vector and I still get the same
error.

  create_bins <- function(x, nBins)
  {
    Breaks <- unique(quantile(x$pred, probs = seq(0, 1, 1/nBins)))
    bin <- cut(x$pred, breaks = Breaks, include.lowest = TRUE)
    bin
  }

  ### Using dplyr (fails)
  nBins = 10
  by_group <- dplyr::group_by(df, models)
  res_dplyr <- dplyr::summarize(by_group, create_bins, nBins)
  Error: not a vector

On Thu, Oct 29, 2015 at 8:28 PM, Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote:

> You are jumping the gun (your other email did get through) and you are
> posting using HTML (which does not come through on the list). Some time
> (re)reading the Posting Guide mentioned at the bottom of all emails on this
> list seems to be in order.
>
> The error is actually quite clear. You should return a vector from your
> function, not a data frame.
> ---------------------------------------------------------------------------
> Jeff Newmiller
> DCN: <jdnewmil at dcn.davis.ca.us>
> Research Engineer (Solar/Batteries/Software/Embedded Controllers)
> ---------------------------------------------------------------------------
> Sent from my phone. Please excuse my brevity.
>
> On October 29, 2015 4:55:19 PM MST, Axel Urbiz <axel.urbiz at gmail.com> wrote:
> >Hello,
> >
> >Sorry, resending this question as the prior was not sent properly.
> >
> >I'm using the plyr package below to add a variable named "bin" to my
> >original data frame "df" with the user-defined function "create_bins".
> >I'd like to get similar results using dplyr instead, but failing to do so.
> >
> >set.seed(4)
> >df <- data.frame(pred = rnorm(100),
> >                 models = gl(2, 50, 100, labels = c("model1", "model2")))
> >
> >### Using plyr (works fine)
> >create_bins <- function(x, nBins)
> >{
> >  Breaks <- unique(quantile(x$pred, probs = seq(0, 1, 1/nBins)))
> >  dfB <- data.frame(pred = x$pred,
> >                    bin = cut(x$pred, breaks = Breaks, include.lowest = TRUE))
> >  dfB
> >}
> >
> >nBins = 10
> >res_plyr <- plyr::ddply(df, plyr::.(models), create_bins, nBins)
> >head(res_plyr)
> >
> >### Using dplyr (fails)
> >
> >by_group <- dplyr::group_by(df, models)
> >res_dplyr <- dplyr::summarize(by_group, create_bins, nBins)
> >Error: not a vector
> >
> >Any help would be much appreciated.
> >
> >Best,
> >Axel.
> >
> > [[alternative HTML version deleted]]
> >
> >______________________________________________
> >R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >https://stat.ethz.ch/mailman/listinfo/r-help
> >PLEASE do read the posting guide
> >http://www.R-project.org/posting-guide.html
> >and provide commented, minimal, self-contained, reproducible code.

The error message is not very helpful and the stack trace is pretty
inscrutable as well:

  > dplyr::group_by(df, models) %>% dplyr::summarize(create_bins)
  Error: not a vector
  > traceback()
  14: stop(list(message = "not a vector", call = NULL, cppstack = NULL))
  13: .Call("dplyr_summarise_impl", PACKAGE = "dplyr", df, dots)
  12: summarise_impl(.data, dots)
  11: summarise_.tbl_df(.data, .dots = lazyeval::lazy_dots(...))
  10: summarise_(.data, .dots = lazyeval::lazy_dots(...))
  9: dplyr::summarize(., create_bins)
  8: function_list[[k]](value)
  7: withVisible(function_list[[k]](value))
  6: freduce(value, `_function_list`)
  5: `_fseq`(`_lhs`)
  4: eval(expr, envir, enclos)
  3: eval(quote(`_fseq`(`_lhs`)), env, env)
  2: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
  1: dplyr::group_by(df, models) %>% dplyr::summarize(create_bins)

It does not mean that your function, create_bins, does not return a vector --
the sum function gives the same result. help(summarize, package="dplyr")
says:

  ...: Name-value pairs of summary functions like 'min()', 'mean()',
       'max()' etc.

It apparently means calls to summary functions, not summary functions
themselves. The examples in the help file show the proper usage.

Use a call to your function and you will see it works better:

  > dplyr::group_by(df, models) %>% dplyr::summarize(create_bins(pred, nBins))
  Error: $ operator is invalid for atomic vectors

The traceback again is not very useful, because the call information was
stripped by dplyr (by the call=NULL in the call to stop()):

  > traceback()
  14: stop(list(message = "$ operator is invalid for atomic vectors",
          call = NULL, cppstack = NULL))
  13: .Call("dplyr_summarise_impl", PACKAGE = "dplyr", df, dots)

However, it is clear that the fault is in your function, which is expecting
a data.frame x with a column called pred but gets pred itself. Change x to
xpred in the argument list and x$pred to xpred in the body of the function.

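A minimal sketch of that change, using only base R so it can be checked
without dplyr. The name create_bins2 is hypothetical (it does not appear in
the thread); the body is the poster's function with x$pred replaced by the
vector argument xpred.

```r
# Data from the original post.
set.seed(4)
df <- data.frame(pred = rnorm(100),
                 models = gl(2, 50, 100, labels = c("model1", "model2")))

# Hypothetical name: the poster's function, rewritten to take the
# numeric vector itself rather than a data frame with a pred column.
create_bins2 <- function(xpred, nBins) {
  Breaks <- unique(quantile(xpred, probs = seq(0, 1, 1/nBins)))
  cut(xpred, breaks = Breaks, include.lowest = TRUE)
}

# Call it on the values of a single group, which is what a grouped
# dplyr verb would hand to it.
bins_model1 <- create_bins2(df$pred[df$models == "model1"], nBins = 10)

is.factor(bins_model1)   # the result is a factor of interval labels
length(bins_model1)      # one label per input value
```

With the vector argument, the "$ operator is invalid for atomic vectors"
error should no longer arise, since the function never uses $.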
You will run into more problems because your function returns a vector
the length of its input, but summarize expects a summary function: one
that returns a scalar for any size of vector input.

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Fri, Oct 30, 2015 at 4:04 AM, Axel Urbiz <axel.urbiz at gmail.com> wrote:

> So in this case, "create_bins" returns a vector and I still get the same
> error.
> [...]

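The scalar-versus-vector point can be checked outside dplyr. In the sketch
below, create_bins2 is a hypothetical name for the function rewritten to
take the pred vector, and split(), vapply() and lapply() stand in for the
grouping that dplyr would do.

```r
# Data from the original post.
set.seed(4)
df <- data.frame(pred = rnorm(100),
                 models = gl(2, 50, 100, labels = c("model1", "model2")))

# Hypothetical name: the binning function taking a plain numeric vector.
create_bins2 <- function(xpred, nBins) {
  Breaks <- unique(quantile(xpred, probs = seq(0, 1, 1/nBins)))
  cut(xpred, breaks = Breaks, include.lowest = TRUE)
}

# One numeric vector of pred values per model.
by_model <- split(df$pred, df$models)

# A summary function such as mean() collapses each group to one value.
group_means <- vapply(by_model, mean, numeric(1))
length(group_means)    # one value per group

# The binning function returns one label per input value instead.
group_bins <- lapply(by_model, create_bins2, nBins = 10)
lengths(group_bins)    # as many values as there are rows in each group
```

The first shape (one value per group) is what summarize is described as
expecting; the second (one value per row) is not.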
dplyr::mutate is probably what you want instead of dplyr::summarize:

  create_bins3 <- function (xpred, nBins) {
    Breaks <- unique(quantile(xpred, probs = seq(0, 1, 1/nBins)))
    bin <- cut(xpred, breaks = Breaks, include.lowest = TRUE)
    bin
  }
  dplyr::group_by(df, models) %>% dplyr::mutate(Bin=create_bins3(pred,nBins))
  #Source: local data frame [100 x 3]
  #Groups: models [2]
  #
  #         pred models             Bin
  #        (dbl) (fctr)          (fctr)
  #1   0.2167549 model1   (0.167,0.577]
  #2  -0.5424926 model1 (-0.869,-0.481]
  ...

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Fri, Oct 30, 2015 at 9:06 AM, William Dunlap <wdunlap at tibco.com> wrote:

> The error message is not very helpful and the stack trace is pretty
> inscrutable as well
> [...]

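As a rough check that the mutate call bins within each group, the sketch
below compares its result for model1 against calling create_bins3 directly
on that group's values. It assumes dplyr is installed and skips the check
otherwise; the variable names res, direct and via_mutate are made up for
illustration.

```r
# Data and function as given in the thread.
set.seed(4)
df <- data.frame(pred = rnorm(100),
                 models = gl(2, 50, 100, labels = c("model1", "model2")))
nBins <- 10

create_bins3 <- function(xpred, nBins) {
  Breaks <- unique(quantile(xpred, probs = seq(0, 1, 1/nBins)))
  cut(xpred, breaks = Breaks, include.lowest = TRUE)
}

# Bins computed directly on the model1 values only.
direct <- create_bins3(df$pred[df$models == "model1"], nBins)

# Only run the dplyr part if the package is available (assumption).
if (requireNamespace("dplyr", quietly = TRUE)) {
  res <- dplyr::mutate(dplyr::group_by(df, models),
                       Bin = create_bins3(pred, nBins))
  res <- as.data.frame(dplyr::ungroup(res))

  # mutate keeps one row per input row ...
  stopifnot(nrow(res) == nrow(df))

  # ... and the labels for model1 match the direct per-group call.
  via_mutate <- as.character(res$Bin[res$models == "model1"])
  stopifnot(identical(via_mutate, as.character(direct)))
}
```

Comparing as character avoids depending on how a given dplyr version
combines factor levels across groups.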