Moving from interactive use of R to scripts and functions and have bumped
into what I believe is a problem with variable names. Did not see a solution
in the two R programming books I have or from my Web searches. Inexperience
with ess-tracebug keeps me from refining my bug tracking.
Here's a test data set (cleverly called 'testset.dput'):
structure(list(stream = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L), .Label = c("B", "J", "S"), class = "factor"),
sampdate = structure(c(8121, 8121, 8121, 8155, 8155, 8155,
8185, 8185, 8185, 8205, 8205, 8205, 8236, 8236, 8236, 8257,
8257, 8257, 8308, 8785, 8785, 8785, 8785, 8785, 8785, 8785,
8847, 8847, 8847, 8847, 8847, 8847, 8847, 8875, 8875, 8875,
8875, 8875, 8875, 8875, 8121, 8121, 8121, 8155, 8155, 8155,
8185, 8185, 8185, 8205, 8205, 8205, 8236, 8236, 8236, 8257,
8257, 8257, 8301, 8301, 8301), class = "Date"),
param = structure(c(2L,
6L, 7L, 2L, 6L, 7L, 2L, 6L, 7L, 2L, 6L, 7L, 2L, 6L, 7L, 2L,
6L, 7L, 2L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L,
6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 2L, 6L, 7L, 2L, 6L, 7L,
2L, 6L, 7L, 2L, 6L, 7L, 2L, 6L, 7L, 2L, 6L, 7L, 2L, 6L, 7L
), .Label = c("Ca", "Cl", "K", "Mg", "Na", "SO4", "pH"), class = "factor"),
quant = c(4, 33, 8.43, 4, 32, 8.46, 4, 31, 8.43, 6, 33, 8.32,
5, 33, 8.5, 5, 32, 8.5, 5, 59.9, 3.46, 1.48, 29, 7.54, 64.6,
7.36, 46, 2.95, 1.34, 21.8, 5.76, 48.8, 7.72, 74.2, 5.36,
2.33, 38.4, 8.27, 141, 7.8, 3, 76, 6.64, 4, 74, 7.46, 2,
82, 7.58, 5, 106, 7.91, 3, 56, 7.83, 3, 51, 7.6, 6, 149,
7.73)), .Names = c("stream", "sampdate", "param", "quant"),
row.names = c(NA, -61L), class = "data.frame")
I want to subset that data.frame on each of the stream names: B, J, and S.
This is the function that has the naming error (eda.R):
extstream = function(alldf) {
  sname = alldf$stream
  sdate = alldf$sampdate
  comp = alldf$param
  value = alldf$quant
  for (i in sname) {
    sname <- subset(alldf, alldf$stream, select = c(sdate, comp, value))
    return(sname)
  }
}
This is the result of running source('eda.R') followed by
> extstream(testset)
Error in subset.data.frame(alldf, alldf$stream, select = c(sdate, comp, :
'subset' must be logical
I've tried using sname for the rows to select, but that produces a
different error of trying to select undefined columns.
A pointer to the correct syntax for subset() is needed.
Rich
Using return() within a for loop makes no sense: only the first one will be
returned.
How about:
alldf.B = subset(alldf, stream=='B') # etc...
Also, have a look at unique(alldf$stream) or levels(alldf$stream) if you want to
use a for loop on each unique value.
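A quick sketch of the levels()/unique() difference and of a loop that keeps every result instead of returning early. It assumes a nine-row excerpt copied from 'testset' (rows 1-3, 20-22 and 41-43); the names noS and pieces are only illustrative.

```r
# Nine rows copied from 'testset' (rows 1-3, 20-22 and 41-43),
# keeping the original factor levels.
testset <- data.frame(
  stream   = factor(rep(c("B", "J", "S"), each = 3)),
  sampdate = as.Date(c(8121, 8121, 8121, 8785, 8785, 8785, 8121, 8121, 8121),
                     origin = "1970-01-01"),
  param    = factor(c("Cl", "SO4", "pH", "Ca", "Cl", "K", "Cl", "SO4", "pH"),
                    levels = c("Ca", "Cl", "K", "Mg", "Na", "SO4", "pH")),
  quant    = c(4, 33, 8.43, 59.9, 3.46, 1.48, 3, 76, 6.64)
)

# levels() lists every defined level; unique() lists only values present.
noS <- subset(testset, stream != "S")
levels(noS$stream)                 # "B" "J" "S"  (S is still a level)
as.character(unique(noS$stream))   # "B" "J"

# Loop over the streams, storing each piece rather than returning early.
pieces <- list()
for (s in levels(testset$stream)) {
  pieces[[s]] <- subset(testset, stream == s)
}
```

If some levels may be empty, looping over unique() (or calling droplevels() first) avoids producing zero-row pieces.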
cheers,
Steve
-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Rich Shepard
Sent: Tuesday, 30 June 2015 12:04p
To: r-help at r-project.org
Subject: [R] Subset() within function: logical error
[original message snipped]
______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Well, your code is, ah, too incorrect to convey what you want out of this
effort. If I were to guess based on your description, you want all of the data,
not a subset. An example data frame containing what you hope to extract might be
helpful.
However, extracting subsets is rarely done for just one subset... usually you
want to process the data in groups. Base functions such as ave, aggregate, or
split work at a higher level than you seem to be thinking. Packages such as plyr
and dplyr handle this breaking and recombining more succinctly, leaving you to
think more about what you want to do with the pieces and less about making
pieces.
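A minimal sketch of the base functions mentioned above (aggregate, ave, split), assuming the same nine-row excerpt of 'testset' (rows 1-3, 20-22 and 41-43). Averaging a parameter across streams is only for illustration; by_param and rows_per_stream are hypothetical names.

```r
# Nine rows copied from 'testset' (rows 1-3, 20-22 and 41-43).
testset <- data.frame(
  stream   = factor(rep(c("B", "J", "S"), each = 3)),
  sampdate = as.Date(c(8121, 8121, 8121, 8785, 8785, 8785, 8121, 8121, 8121),
                     origin = "1970-01-01"),
  param    = factor(c("Cl", "SO4", "pH", "Ca", "Cl", "K", "Cl", "SO4", "pH"),
                    levels = c("Ca", "Cl", "K", "Mg", "Na", "SO4", "pH")),
  quant    = c(4, 33, 8.43, 59.9, 3.46, 1.48, 3, 76, 6.64)
)

# aggregate(): one row per parameter present, mean 'quant' across streams
by_param <- aggregate(quant ~ param, data = testset, FUN = mean)

# ave(): the same group means, but returned once per row of testset
testset$param_mean <- ave(testset$quant, testset$param)

# split() + sapply(): break into streams and apply a function to each piece
rows_per_stream <- sapply(split(testset, testset$stream), nrow)
```

None of these require writing the subsetting loop by hand; the grouping variable is passed as an argument.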
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.
On June 29, 2015 5:03:38 PM PDT, Rich Shepard <rshepard at appl-ecosys.com> wrote:
> [original message snipped]
On Jun 29, 2015, at 5:03 PM, Rich Shepard wrote:
> [data set and description snipped]
> This is the function that has the naming error (eda.R):
> extstream = function(alldf) {
>   sname = alldf$stream
>   sdate = alldf$sampdate
>   comp = alldf$param
>   value = alldf$quant
>   for (i in sname) {
>     sname <- subset(alldf, alldf$stream, select = c(sdate, comp, value))
Never use the form dfrm$colname as the argument to the subset argument of
subset. You can see that 'stream' is a factor, right?
Furthermore, by inspection you can see that there is no colname == 'sdate',
so I would guess that would be your next error. Or 'comp' or 'value' for
that matter. Oh, now I see: you made them outside of `alldf`. Then how is
that supposed to work? The subset function is supposed to be looking inside
`alldf` to find those column names.
Perhaps:
subset(alldf, stream %in% c('B', 'J', 'S'), ....
.... but I have not figured out why you used 'subset' if you wanted:
select = c(sdate, comp, value))
Furthermore, it is generally error prone to use `subset` inside functions.
The help page warns against the practice. Better to use "[".
>     return(sname)
>   }
> }
> [...]
> I've tried using sname for the rows to select, but that produces a
> different error of trying to select undefined columns.
Right. Those are not column names in any dataframe.
> A pointer to the correct syntax for subset() is needed.
No. A pointer to the correct use of "[" is needed.
--
David Winsemius
Alameda, CA, USA
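A sketch of the "[" approach suggested above, written as a small function. stream_rows() and all_rows are hypothetical names, and the data are a nine-row excerpt copied from 'testset' (rows 1-3, 20-22 and 41-43). Columns are named as character strings that exist inside the data frame, so nothing depends on variables created outside it.

```r
# Nine rows copied from 'testset' (rows 1-3, 20-22 and 41-43).
testset <- data.frame(
  stream   = factor(rep(c("B", "J", "S"), each = 3)),
  sampdate = as.Date(c(8121, 8121, 8121, 8785, 8785, 8785, 8121, 8121, 8121),
                     origin = "1970-01-01"),
  param    = factor(c("Cl", "SO4", "pH", "Ca", "Cl", "K", "Cl", "SO4", "pH"),
                    levels = c("Ca", "Cl", "K", "Mg", "Na", "SO4", "pH")),
  quant    = c(4, 33, 8.43, 59.9, 3.46, 1.48, 3, 76, 6.64)
)

# Hypothetical helper: rows for one stream and selected columns, via "[".
# which() turns the logical test into row numbers and, like subset(),
# drops rows where the test is NA.
stream_rows <- function(alldf, s) {
  alldf[which(alldf$stream == s), c("sampdate", "param", "quant")]
}

j_rows <- stream_rows(testset, "J")

# All three streams at once, as a list named by stream
snames <- levels(testset$stream)
all_rows <- lapply(setNames(snames, snames), function(s) stream_rows(testset, s))
```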
On Tue, 30 Jun 2015, Steve Taylor wrote:
> Using return() within a for loop makes no sense: only the first one will be returned.
Steve,
Mea culpa. Didn't catch that.
> How about:
> alldf.B = subset(alldf, stream=='B') # etc...
I used to do each stream manually, like the above, and want to learn how to
loop through all of them ...
> Also, have a look at unique(alldf$stream) or levels(alldf$stream) if you
> want to use a for loop on each unique value.
... which unique() and levels() will probably do. Will test these tomorrow
after reading the man pages.
Many thanks,
Rich
On Mon, 29 Jun 2015, David Winsemius wrote:
> No. A pointer to the correct use of "[" is needed.
Thanks, David. This puts me on the right path.
Much appreciated,
Rich
If you want a pointer to the correct syntax for subset(), try
help("subset")!!!
The syntax of your "extstream" function is totally screwed up,
convoluted and over-complicated. Note that even if you had your
"subset" argument specified correctly, the return() call will give you
only the result from the *first* pass through the for loop.
That aside, the error message is perfectly clear: 'subset' must be
logical. Your "subset" argument is "stream", which is a factor.
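A short reproduction of that error and of the logical condition that fixes it, assuming a nine-row excerpt copied from 'testset' (rows 1-3, 20-22 and 41-43). The message text shown is what an English-locale R session reports; msg and b_rows are illustrative names.

```r
# Nine rows copied from 'testset' (rows 1-3, 20-22 and 41-43).
testset <- data.frame(
  stream   = factor(rep(c("B", "J", "S"), each = 3)),
  sampdate = as.Date(c(8121, 8121, 8121, 8785, 8785, 8785, 8121, 8121, 8121),
                     origin = "1970-01-01"),
  param    = factor(c("Cl", "SO4", "pH", "Ca", "Cl", "K", "Cl", "SO4", "pH"),
                    levels = c("Ca", "Cl", "K", "Mg", "Na", "SO4", "pH")),
  quant    = c(4, 33, 8.43, 59.9, 3.46, 1.48, 3, 76, 6.64)
)

# Passing the factor itself reproduces the error from the thread.
msg <- tryCatch(subset(testset, testset$stream), error = conditionMessage)
msg                                  # "'subset' must be logical" (English locale)

# A comparison yields one TRUE/FALSE per row, which subset() accepts.
is.logical(testset$stream == "B")    # TRUE
b_rows <- subset(testset, stream == "B")
```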
You *could* redefine your "extstream" function as follows:
extstream <- function(alldf) {
  sname <- levels(alldf$stream)
  rslt <- vector("list", length(sname))
  names(rslt) <- sname
  for (i in sname) {
    rslt[[i]] <- subset(alldf, alldf$stream == i, sampdate:quant)
  }
  rslt
}
However you don't need to go through such contortions:
split(testset,testset$stream)
will give essentially what you want. If you wish to strip out the
redundant "stream" column from the data frames in the resulting list,
you could do that using lapply().
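One way to do the split() plus lapply() step, sketched on a nine-row excerpt copied from 'testset' (rows 1-3, 20-22 and 41-43); by_stream is an illustrative name. List-style indexing d[...] with a logical column vector always returns a data frame, even if only one column remains.

```r
# Nine rows copied from 'testset' (rows 1-3, 20-22 and 41-43).
testset <- data.frame(
  stream   = factor(rep(c("B", "J", "S"), each = 3)),
  sampdate = as.Date(c(8121, 8121, 8121, 8785, 8785, 8785, 8121, 8121, 8121),
                     origin = "1970-01-01"),
  param    = factor(c("Cl", "SO4", "pH", "Ca", "Cl", "K", "Cl", "SO4", "pH"),
                    levels = c("Ca", "Cl", "K", "Mg", "Na", "SO4", "pH")),
  quant    = c(4, 33, 8.43, 59.9, 3.46, 1.48, 3, 76, 6.64)
)

# One data frame per stream, in a list named "B", "J", "S"
by_stream <- split(testset, testset$stream)

# Drop the now-redundant 'stream' column from every piece
by_stream <- lapply(by_stream, function(d) d[names(d) != "stream"])
```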
cheers,
Rolf Turner
--
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276
On 30/06/15 12:03, Rich Shepard wrote:
> [original message snipped]
On Tue, 30 Jun 2015, Rolf Turner wrote:
> [quoted message snipped]
Rolf,
I did re-read the subset man page, but did not associate the error message
with the problem. Thanks very much for the lesson.
I will read the split() man page; simple is always better.
Regards,
Rich