Richard.Cotton at hsl.gov.uk
2006-Oct-31 14:02 UTC
[R] Odd behaviour of removing 'nothing' from an array or data frame
I've just found some behaviour which strikes me as odd, but I'm not sure whether it's a bug or a feature. If you don't mind, I'd like to explain via a couple of examples.

Let x = 1:10.
Then intuitively, to me at least, the command x[-integer(0)] should leave x untouched. However, the actual output under R 2.4.0 is integer(0).

A slightly more involved example demonstrates why I think this behaviour is back to front.
First we define a data frame, in this case some people, with their heights.

peoples.heights = data.frame(names = c("Alice", "Bob", "Carol"), heights = c(1.67, 1.85, 175))

To make sure the heights are sensible, we define a filter to remove impossibly tall people.

dubious.records = which(peoples.heights$heights > 2.5) #3
peoples.heights = peoples.heights[-dubious.records,]

This all works fine since dubious.records is not empty. However, if all the records had been entered properly, then we would get

#dubious.records = integer(0)

Then the command peoples.heights = peoples.heights[-dubious.records,] strips all the rows to give

#[1] names heights
#<0 rows> (or 0-length row.names)

i.e. instead of removing the bad records, I've lost everything.
I know that it's possible to recode this so problems don't occur, but the point is that the answer is unexpected.

Can anybody explain if this behaviour is intentional or useful in some way, or is it an oversight?

Regards,
Richie.

Mathematical Sciences Unit
HSL
Buxton
SK17 9JN
01298 21(x8672)
Gabor Grothendieck
2006-Oct-31 14:19 UTC
[R] Odd behaviour of removing 'nothing' from an array or data frame
But what if you wanted to get the subset of those rows containing heights over 250? In that case you would want zero rows to be returned.
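Gabor's point can be checked directly. The following is a minimal sketch using the data frame from the original post; the variable names over.250 and tallest are made up for this illustration.

```r
# Same data as in the original post (175 is the mistyped height).
peoples.heights <- data.frame(
  names   = c("Alice", "Bob", "Carol"),
  heights = c(1.67, 1.85, 175)
)

# No height exceeds 250, so which() returns a zero-length integer vector.
over.250 <- which(peoples.heights$heights > 250)

# Selecting rows with a zero-length index gives a zero-row data frame;
# the columns are kept.
tallest <- peoples.heights[over.250, ]
nrow(tallest)
ncol(tallest)
```

Here an empty index meaning "no rows" is the result one would want, which is the same rule that makes peoples.heights[-integer(0), ] empty.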
Peter Dalgaard
2006-Oct-31 14:27 UTC
[R] Odd behaviour of removing 'nothing' from an array or data frame
Richard.Cotton at hsl.gov.uk writes:

> Can anybody explain if this behaviour is intentional or useful in some
> way, or is it an oversight?

Consistency! It's not particularly useful, but it follows from general principles, which in the long run it doesn't pay to depart from.

The issue is that the result of using an indexing operator ("[") should depend only on the _value_ of its argument, not the expression used to compute it. Just like you most likely expect log(2+2) not to be different from log(4). And since

> dubious.records <- integer(0)
> identical(dubious.records, -dubious.records)
[1] TRUE

how can peoples.heights[-dubious.records,] be different from peoples.heights[dubious.records,]?

R could actually look at the expression and act on the minus sign, but that way lies madness. Consider

keep <- -dubious.records
drop <- dubious.records

peoples.heights[keep,]
peoples.heights[-dubious.records,]
peoples.heights[-keep,]

etc... I think you'll get the picture.

--
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907
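The value-only argument can be tried on a plain vector. This is a small sketch rather than anything from Peter's message; x and idx are illustrative names.

```r
dubious.records <- integer(0)

# Negating a zero-length vector yields the same zero-length vector.
same.value <- identical(dubious.records, -dubious.records)

x <- 1:10

# Both calls hand "[" exactly the same value, so the results must agree.
same.result <- identical(x[dubious.records], x[-dubious.records])

# With a non-empty index the sign is part of the value, so the results differ.
idx <- c(2L, 3L, 5L)
differs <- !identical(x[idx], x[-idx])

same.value
same.result
differs
```

All three values come out TRUE: the sign only carries information when there is at least one element to attach it to.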
Richard.Cotton at hsl.gov.uk
2006-Oct-31 14:50 UTC
[R] Odd behaviour of removing 'nothing' from an array or data frame
Thanks for the reply Peter, though I'm not quite convinced.

> > #dubious.records = integer(0)
> > identical(dubious.records, -dubious.records)
> [1] TRUE
>
> how can peoples.heights[-dubious.records,] be different from
> peoples.heights[dubious.records,]?

Tell me if I'm being willfully ignorant here, but I'm sure they should be different. In the first case, the minus sign represents subtraction, so it is correct that dubious.records and -dubious.records are identical. However, in the second case, inside the square brackets, the minus sign represents set complement, not subtraction, so dubious.records and -dubious.records are not the same.

If x = runif(10), then x[-c(2,3,5)] means "remove from x the values at the second, third and fifth position". By extension x[-integer(0)] should mean "remove from x no values", and not "remove from x all values", which is the current behaviour.

Regards,
Richie.
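The set-complement reading can be obtained without changing "[" by computing the complement explicitly. A minimal sketch follows; drop.positions is a hypothetical helper written for this illustration, not a function that ships with R.

```r
# Hypothetical helper: drop the elements at positions idx, treating an
# empty idx as "drop nothing".
drop.positions <- function(x, idx) {
  keep <- setdiff(seq_along(x), idx)  # positions that survive
  x[keep]
}

x <- 1:10

some.dropped <- drop.positions(x, c(2, 3, 5))
none.dropped <- drop.positions(x, integer(0))

some.dropped
none.dropped
```

With an empty idx the complement is every position, so the vector comes back whole. For the rows of a data frame the same idea would use seq_len(nrow(df)) in place of seq_along(x).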
Leeds, Mark (IED)
2006-Oct-31 15:18 UTC
[R] Odd behaviour of removing 'nothing' from an array or data frame
I think I had a similar problem at some point in the distant past, and got around it by checking the length of dubious.records before sending it into the expression. If the length is zero, don't send it into the expression.
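The length check might look like the sketch below. The data are an assumption: the heights from the original post with the mistyped 175 replaced by 1.75, so that nothing is dubious. The variable cleaned is an illustrative name.

```r
# Assumed data: the original example with every record entered correctly.
peoples.heights <- data.frame(
  names   = c("Alice", "Bob", "Carol"),
  heights = c(1.67, 1.85, 1.75)
)

dubious.records <- which(peoples.heights$heights > 2.5)

# Only subset when there is something to remove.
if (length(dubious.records) > 0) {
  peoples.heights <- peoples.heights[-dubious.records, ]
}
nrow(peoples.heights)

# An alternative that needs no guard: index with the logical condition itself.
cleaned <- peoples.heights[!(peoples.heights$heights > 2.5), ]
nrow(cleaned)
```

Both approaches keep all three rows here. The logical form behaves differently from which() when heights contains NA, since an NA in a logical row index produces a row of NAs, so it is only a drop-in replacement when there are no missing values.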
Richard.Cotton at hsl.gov.uk
2006-Oct-31 16:28 UTC
[R] Odd behaviour of removing 'nothing' from an array or data frame
Thanks to all for the swift replies.

I'm willing to concede that changing the behaviour of [-integer(0)] is probably going to end in tears, so I'll stop arguing. I've added a quick note on the traps page of the R wiki to explain things; feel free to expand upon it.

http://wiki.r-project.org/rwiki/doku.php?id=tips:surprises:traps

Regards,
Richie.