   Moving from interactive use of R to scripts and functions and have bumped
into what I believe is a problem with variable names. Did not see a solution
in the two R programming books I have or from my Web searches. Inexperience
with ess-tracebug keeps me from refining my bug tracking.

   Here's a test data set (cleverly called 'testset.dput'):

structure(list(stream = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L), .Label = c("B", "J", "S"), class = "factor"),
    sampdate = structure(c(8121, 8121, 8121, 8155, 8155, 8155,
    8185, 8185, 8185, 8205, 8205, 8205, 8236, 8236, 8236, 8257,
    8257, 8257, 8308, 8785, 8785, 8785, 8785, 8785, 8785, 8785,
    8847, 8847, 8847, 8847, 8847, 8847, 8847, 8875, 8875, 8875,
    8875, 8875, 8875, 8875, 8121, 8121, 8121, 8155, 8155, 8155,
    8185, 8185, 8185, 8205, 8205, 8205, 8236, 8236, 8236, 8257,
    8257, 8257, 8301, 8301, 8301), class = "Date"), param = structure(c(2L,
    6L, 7L, 2L, 6L, 7L, 2L, 6L, 7L, 2L, 6L, 7L, 2L, 6L, 7L, 2L,
    6L, 7L, 2L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L,
    6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 2L, 6L, 7L, 2L, 6L, 7L,
    2L, 6L, 7L, 2L, 6L, 7L, 2L, 6L, 7L, 2L, 6L, 7L, 2L, 6L, 7L
    ), .Label = c("Ca", "Cl", "K", "Mg", "Na", "SO4", "pH"), class = "factor"),
    quant = c(4, 33, 8.43, 4, 32, 8.46, 4, 31, 8.43, 6, 33, 8.32,
    5, 33, 8.5, 5, 32, 8.5, 5, 59.9, 3.46, 1.48, 29, 7.54, 64.6,
    7.36, 46, 2.95, 1.34, 21.8, 5.76, 48.8, 7.72, 74.2, 5.36,
    2.33, 38.4, 8.27, 141, 7.8, 3, 76, 6.64, 4, 74, 7.46, 2,
    82, 7.58, 5, 106, 7.91, 3, 56, 7.83, 3, 51, 7.6, 6, 149,
    7.73)), .Names = c("stream", "sampdate", "param", "quant"
), row.names = c(NA, -61L), class = "data.frame")

   I want to subset that data.frame on each of the stream names: B, J, and S.
This is the function that has the naming error (eda.R):

extstream = function(alldf) {
    sname = alldf$stream
    sdate = alldf$sampdate
    comp = alldf$param
    value = alldf$quant
    for (i in sname) {
        sname <- subset(alldf, alldf$stream, select = c(sdate, comp, value))
        return(sname)
    }
}

   This is the result of running source('eda.R') followed by

> extstream(testset)
Error in subset.data.frame(alldf, alldf$stream, select = c(sdate, comp,  :
   'subset' must be logical

   I've tried using sname for the rows to select, but that produces a
different error of trying to select undefined columns.

   A pointer to the correct syntax for subset() is needed.

Rich
Using return() within a for loop makes no sense: only the first one will be returned.

How about:

alldf.B = subset(alldf, stream=='B')
# etc...

Also, have a look at unique(alldf$stream) or levels(alldf$stream) if you want to use a for loop on each unique value.

cheers,
     Steve

-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Rich Shepard
Sent: Tuesday, 30 June 2015 12:04p
To: r-help at r-project.org
Subject: [R] Subset() within function: logical error

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
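The loop over levels(alldf$stream) suggested above might look roughly like the sketch below. It is run at top level rather than inside a function, and `toy` is a small hypothetical stand-in for the real data (same column names, made-up row selection), so treat it as an illustration rather than a tested solution for the full data set.

```r
# Hypothetical miniature of the test data: same column names, fewer rows.
toy <- data.frame(
  stream = factor(c("B", "B", "J", "J", "S", "S")),
  param  = factor(c("Cl", "SO4", "Cl", "SO4", "Cl", "SO4")),
  quant  = c(4, 33, 3.46, 64.6, 3, 76)
)

# One data frame per stream, collected in a named list.
pieces <- list()
for (s in levels(toy$stream)) {
  # The condition is a logical vector: TRUE for rows of the current stream.
  pieces[[s]] <- subset(toy, stream == s)
}

print(names(pieces))
print(sapply(pieces, nrow))
```

Collecting the pieces in a list, instead of calling return() inside the loop, keeps the result of every pass rather than only the first.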
Well, your code is, ah, too incorrect to convey what you want out of this effort. If I were to guess based on your description, you want all of the data, not a subset. An example data frame containing what you hope to extract might be helpful.

However, extracting subsets is rarely done for just one subset... usually you want to process the data in groups. Base functions such as ave, aggregate, or split work at a higher level than you seem to be thinking. Packages such as plyr and dplyr handle this breaking and recombining more succinctly, leaving you to think more about what you want to do with the pieces and less about making pieces.

---------------------------------------------------------------------------
Jeff Newmiller
DCN:<jdnewmil at dcn.davis.ca.us>
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.
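As a rough illustration of processing the data in groups with the base functions mentioned above, the sketch below uses aggregate() and ave() on a hypothetical miniature data frame (`toy`, with made-up row selection but the same column names as the test set). Taking the mean per stream and parameter is an assumed example of a group-wise computation, not something the original post asked for.

```r
# Hypothetical miniature of the test data: same column names, fewer rows.
toy <- data.frame(
  stream = factor(c("B", "B", "B", "J", "J", "S", "S")),
  param  = factor(c("Cl", "Cl", "SO4", "Cl", "SO4", "Cl", "SO4")),
  quant  = c(4, 6, 33, 3.46, 64.6, 3, 76)
)

# aggregate(): one summary row per stream/param combination.
means <- aggregate(quant ~ stream + param, data = toy, FUN = mean)
print(means)

# ave(): the group mean repeated on every row of its group,
# so the result lines up with the original data frame.
toy$group_mean <- ave(toy$quant, toy$stream, toy$param)
print(toy)
```

Neither call needs an explicit loop or a hand-built subset; the grouping variables do the splitting and recombining.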
On Jun 29, 2015, at 5:03 PM, Rich Shepard wrote:

> This is the function that has the naming error (eda.R):
>
> extstream = function(alldf) {
>     sname = alldf$stream
>     sdate = alldf$sampdate
>     comp = alldf$param
>     value = alldf$quant
>     for (i in sname) {
>         sname <- subset(alldf, alldf$stream, select = c(sdate, comp, value))

Never use the form dfrm$colname as the `subset` argument of subset(). You can see that 'stream' is a factor, right?

Furthermore, by inspection you can see that there is no colname == 'sdate', so I would guess that would be your next error. Or 'comp' or 'value' for that matter. Oh, now I see: you made them outside of `alldf`. Then how is that supposed to work? The subset function is supposed to be looking inside `alldf` to find those column names. Perhaps:

    subset(alldf, stream %in% c('B', 'J', 'S'), ....

.... but I have not figured out why you used 'subset' if you wanted:

    select = c(sdate, comp, value))

Furthermore, it is generally error prone to use `subset` inside functions. The help page warns against the practice. Better to use "[".

>         return(sname)
>     }
> }
>
>   This is the result of running source('eda.R') followed by
>
>> extstream(testset)
> Error in subset.data.frame(alldf, alldf$stream, select = c(sdate, comp,  :
>   'subset' must be logical
>
>   I've tried using sname for the rows to select, but that produces a
> different error of trying to select undefined columns.

Right. Those are not column names in any dataframe.

>
>   A pointer to the correct syntax for subset() is needed.

No. A pointer to the correct use of "[" is needed.

--
David Winsemius
Alameda, CA, USA
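A minimal sketch of the "[" approach inside a function follows. The helper name `one_stream`, its arguments, and the `toy` data frame are all hypothetical, introduced only for illustration; the column names are the real ones from the test data.

```r
# Hypothetical miniature of the test data: same column names, fewer rows.
toy <- data.frame(
  stream   = factor(c("B", "B", "J", "J", "S", "S")),
  sampdate = as.Date(c(8121, 8155, 8785, 8847, 8121, 8155),
                     origin = "1970-01-01"),
  param    = factor(c("Cl", "SO4", "Cl", "SO4", "Cl", "SO4")),
  quant    = c(4, 32, 3.46, 2.95, 3, 74)
)

# The factor itself is not a logical vector, which is what the
# "'subset' must be logical" error complains about.
print(is.logical(toy$stream))

# Hypothetical helper: rows for one stream, chosen columns, using "[".
one_stream <- function(df, s, cols = c("sampdate", "param", "quant")) {
  # Logical row index; the is.na() guard keeps NA streams from matching.
  keep <- !is.na(df$stream) & df$stream == s
  # Columns are named as character strings, so nothing is looked up
  # outside the data frame.
  df[keep, cols, drop = FALSE]
}

print(one_stream(toy, "J"))
```

Because the rows are chosen by an explicit logical vector and the columns by quoted names, the function does not depend on where it is called from, which is the usual reason for preferring "[" over subset() in programmed code.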
On Tue, 30 Jun 2015, Steve Taylor wrote:

> Using return() within a for loop makes no sense: only the first one will be returned.

Steve,

   Mea culpa. Didn't catch that.

> How about:
> alldf.B = subset(alldf, stream=='B')
> # etc...

   I used to do each stream manually, like the above, and want to learn how
to loop through all of them ...

> Also, have a look at unique(alldf$stream) or levels(alldf$stream) if you
> want to use a for loop on each unique value.

   ... which unique() and levels() will probably do. Will test these tomorrow
after reading the man pages.

Many thanks,

Rich
On Mon, 29 Jun 2015, David Winsemius wrote:

> No. A pointer to the correct use of "[" is needed.

   Thanks, David. This puts me on the right path.

Much appreciated,

Rich
If you want a pointer to the correct syntax for subset(), try help("subset")!!!

The syntax of your "extstream" function is totally screwed up, convoluted and over-complicated. Note that even if you had your "subset" argument specified correctly, the return() call will give you only the result from the *first* pass through the for loop.

That aside, the error message is perfectly clear: 'subset' must be logical. Your "subset" argument is "stream" which is a factor.

You *could* redefine your "extstream" function as follows:

extstream <- function(alldf) {
    sname <- levels(alldf$stream)          # one entry per stream: B, J, S
    rslt <- vector("list",length(sname))   # empty list to hold the pieces
    names(rslt) <- sname
    for (i in sname) {
        # rows for stream i; columns sampdate through quant
        rslt[[i]] <- subset(alldf, alldf$stream==i, sampdate:quant)
    }
    rslt                                   # return the whole list, after the loop
}

However you don't need to go through such contortions:

    split(testset,testset$stream)

will give essentially what you want. If you wish to strip out the redundant "stream" column from the data frames in the resulting list, you could do that using lapply().

cheers,

Rolf Turner

--
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276
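The split() plus lapply() route described above might be sketched as follows. `toy` is a hypothetical miniature of the test data (same column names, fewer rows), and dropping the column by name is just one assumed way of stripping it.

```r
# Hypothetical miniature of the test data: same column names, fewer rows.
toy <- data.frame(
  stream   = factor(c("B", "B", "J", "J", "S", "S")),
  sampdate = as.Date(c(8121, 8155, 8785, 8847, 8121, 8155),
                     origin = "1970-01-01"),
  param    = factor(c("Cl", "SO4", "Cl", "SO4", "Cl", "SO4")),
  quant    = c(4, 32, 3.46, 2.95, 3, 74)
)

# split() returns a named list with one data frame per level of stream.
by_stream <- split(toy, toy$stream)

# Drop the now-redundant stream column from every piece.
by_stream <- lapply(by_stream, function(d) {
  d[, names(d) != "stream", drop = FALSE]
})

print(names(by_stream))
print(by_stream$B)
```

The list elements can then be reached by name, for example by_stream$J or by_stream[["S"]].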
On Tue, 30 Jun 2015, Rolf Turner wrote:

> If you want a pointer to the correct syntax for subset(), try
> help("subset")!!!

> However you don't need to go through such contortions:
>
>    split(testset,testset$stream)
>
> will give essentially what you want. If you wish to strip out the redundant
> "stream" column from the data frames in the resulting list, you could do that
> using lapply()

Rolf,

   I did re-read the subset man page, but did not associate the error message
with the problem.

   Thanks very much for the lesson. I will read the split() man page; simple
is always better.

Regards,

Rich