R version 2.5.0, under gentoo linux. This may be just my ignorance about naming conventions inside loops and subsets, but the following looks like a bug to me.

y = c( 1963, 1963, 1964, 1964, 1965, 1965 );
r1= rnorm(6);
d= data.frame ( y=y, r1=r1 );

## note: I am not attach()ing anything anywhere

## this should give me two results, which it does
ahw.y= subset(d, d$y==1963); print(summary(ahw.y));

## this should give me the same two results, which it does not. All 6
## are included now.
for (y in 1963:1963) {
  subd= subset(d, d$y==y);
  print(summary(subd));
}

## this should give me the same two results, which it does
for (yr in 1963:1963) {
  subd= subset(d, d$y==yr);
  print(summary(subd));
}

hope this helps. (if it's a bother, please let me know and I won't post such emails anymore. would save me time, too.)

regards,

/ivo
In the 'subset' function, the 'subset' argument (the row condition) is evaluated inside the data frame, so it can contain the names of the columns (without the df$ qualifier). So in your 'for' loop you basically have

subset(d, d$y == d$y)

which selects all the data, since you have a column named 'y', which is the same name as your loop variable.

On 5/17/07, ivo welch wrote:
> ## this should give me the same two results, which it does not. All 6
> ## are included now.
> for (y in 1963:1963) {
>   subd= subset(d, d$y==y);
>   print(summary(subd));
> }

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
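The name clash described in this reply can be reproduced outside a loop. The following is a minimal sketch, assuming the same data layout as the original post but with fixed r1 values (an assumption made here so the result is reproducible); it also shows two possible workarounds, neither of which was proposed in the thread itself.

```r
# Same layout as the original post; r1 values are fixed for reproducibility.
d <- data.frame(y  = c(1963, 1963, 1964, 1964, 1965, 1965),
                r1 = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6))

y <- 1963  # a variable that shares its name with the column d$y

# Inside subset(), the bare `y` is looked up in `d` first, so this
# condition is effectively d$y == d$y and keeps every row.
all_rows <- subset(d, d$y == y)
nrow(all_rows)      # 6

# Workaround 1: use a variable name that is not a column of `d`.
yr <- 1963
two_rows <- subset(d, y == yr)
nrow(two_rows)      # 2

# Workaround 2: plain logical indexing follows ordinary scoping, so the
# bare `y` here is the variable defined above, not the column.
two_rows_idx <- d[d$y == y, ]
nrow(two_rows_idx)  # 2
```

Both workarounds return the two 1963 rows; which one reads better is a matter of taste.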
ahh... it is the silent substitution of the data frame's columns in the subset statement. I should have known this. (PS: this may not be desirable behavior; maybe it would be useful to issue a warning if the same name is also defined in an upper frame. just an opinion...)

my misunderstanding.

/iaw
... but it **is** explicitly documented in ?subset:

"For data frames, the subset argument works on the rows. Note that subset will be evaluated in the data frame, so columns can be referred to (by name) as variables in the expression (see the examples)."

Bert Gunter
Genentech Nonclinical Statistics

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of ivo welch
Sent: Thursday, May 17, 2007 11:53 AM
To: jim holtman
Cc: r-help
Subject: Re: [R] bug or feature?

ahh...it is the silent substitution of the data frame in the subset statement. I should have known this. (PS: this may not be desirable behavior; maybe it would be useful to issue a warning if the same name is defined in an upper data frame. just an opinion...)

mea misunderstanding.

/iaw
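The rule quoted from ?subset ("evaluated in the data frame") can be observed directly with eval(). This is only a rough sketch of the mechanism, not a copy of the subset() source: the same unevaluated condition is evaluated once with the data frame taking precedence and once in the surrounding environment alone. The small data frame and its values are made up for illustration.

```r
d <- data.frame(y = c(1963, 1963, 1964), r1 = c(0.1, 0.2, 0.3))
y <- 1963

# Capture the condition without evaluating it.
cond <- quote(d$y == y)

# Data frame first, then the current environment: the bare `y` is the
# column, so the comparison is d$y == d$y.
in_frame <- eval(cond, d, environment())
in_frame  # TRUE TRUE TRUE

# Current environment only: the bare `y` is the scalar 1963.
in_env <- eval(cond, environment())
in_env    # TRUE TRUE FALSE
```

The only difference between the two calls is where the name `y` is looked up first.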
of course it is. virtually everything in R is documented somewhere, and once one sees it, pretty well documented, too. my suggestion for a warning is more a matter of "user friendliness": a warning, not an error. if a variable exists in the upper frames and a user reuses its name, there is a good chance that the reference is ambiguous. for such cases, a user may be well advised to write dataframe$variablename explicitly instead of simply variablename, IMHO.

I believe many C compilers and the perl interpreter routinely issue warnings when there is a good chance that a behavior is not what a naive user might expect, even though the code is perfectly correct and unambiguous. of course, I am really NAIVE very often.

regards,

/ivo

On 5/17/07, Bert Gunter <gunter.berton at gene.com> wrote:
> ... but it **is** explicitly documented in ?subset:
>
> "For data frames, the subset argument works on the rows. Note that subset
> will be evaluated in the data frame, so columns can be referred to (by name)
> as variables in the expression (see the examples). "
>
> Bert Gunter
> Genentech Nonclinical Statistics
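The warning suggested above could be prototyped in user code. The sketch below is purely hypothetical: `subset_warn` is a made-up helper, not part of R, and it only inspects the immediate calling environment rather than all upper frames. Note also that all.vars() reports the name after `$`, so `d$y` counts as a use of `y`; the helper may therefore warn more often than strictly necessary.

```r
# Hypothetical helper (not part of R): filters rows like subset(), but warns
# when a name used in the condition is both a column of `x` and a variable
# in the calling environment.
subset_warn <- function(x, condition) {
  e <- substitute(condition)
  caller <- parent.frame()
  used <- intersect(all.vars(e), names(x))
  clashes <- used[vapply(used, exists, logical(1),
                         envir = caller, inherits = FALSE)]
  if (length(clashes) > 0) {
    warning("column name(s) also defined in the calling environment: ",
            paste(clashes, collapse = ", "))
  }
  keep <- eval(e, x, caller)            # columns still take precedence
  x[keep & !is.na(keep), , drop = FALSE]
}

d <- data.frame(y  = c(1963, 1963, 1964, 1964, 1965, 1965),
                r1 = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6))

# No variable named `y` has been defined at this point, so a fresh
# session gives no warning here.
yr <- 1963
ok <- subset_warn(d, y == yr)

# Now `y` exists both as a column and as a variable: capture the warning.
y <- 1963
msg <- tryCatch(subset_warn(d, d$y == y),
                warning = function(w) conditionMessage(w))
```

The helper does not change which rows are selected; it only makes the ambiguity visible, which is the "warning, not an error" behavior asked for.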