[Using R 2.2.0 on Windows XP; OK, OK, I will update soon!]

I have noticed some undesirable behaviour when applying ifelse to a data frame. Here is my code:

A <- scan()
1.000000 0.000000 0.000000 0 0.00000
0.027702 0.972045 0.000253 0 0.00000

A <- matrix(A,nrow=2,ncol=5,byrow=T)
A == 0
ifelse(A==0,0,-A*log(A))

A <- as.data.frame(A)
ifelse(A==0,0,-A*log(A))

and this is the output:

> A <- scan()
1: 1.000000 0.000000 0.000000 0 0.00000
6: 0.027702 0.972045 0.000253 0 0.00000
11:
Read 10 items
> A <- matrix(A,nrow=2,ncol=5,byrow=T)
> A == 0
      [,1]  [,2]  [,3] [,4] [,5]
[1,] FALSE  TRUE  TRUE TRUE TRUE
[2,] FALSE FALSE FALSE TRUE TRUE
> ifelse(A==0,0,-A*log(A))
           [,1]       [,2]        [,3] [,4] [,5]
[1,] 0.00000000 0.00000000 0.000000000    0    0
[2,] 0.09934632 0.02756057 0.002095377    0    0
>
> A <- as.data.frame(A)
> ifelse(A==0,0,-A*log(A))
[[1]]
[1] 0.00000000 0.09934632

[[2]]
[1]        NaN 0.02756057

[[3]]
[1] 0

[[4]]
[1] NaN NaN

[[5]]
[1] 0

[[6]]
[1] 0.00000000 0.09934632

[[7]]
[1] 0

[[8]]
[1] 0

[[9]]
[1] 0

[[10]]
[1] 0

>

Is this a bug or a feature? Can the behaviour be explained?

Regards, Murray Jorgensen

--
Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: maj at waikato.ac.nz Fax 7 838 4155
Phone +64 7 838 4773 wk Home +64 7 825 0441 Mobile 021 1395 862
Hi

you could also use another approach in the case of data frames:

A <- as.data.frame(A)
A0 <- -A*log(A)
A0[is.na(A0)] <- 0

which changes the NaNs to zeroes.

HTH
Petr

On 5 Jan 2007 at 16:38, talepanda wrote:

Date sent: Fri, 5 Jan 2007 16:38:05 +0900
From: talepanda <talepanda at gmail.com>
To: maj at stats.waikato.ac.nz
Copies to: r-help at stat.math.ethz.ch
Subject: Re: [R] ifelse on data frames

> It can be explained.
>
> > class(A)
> [1] "data.frame"
> > length(A)
> [1] 5
> > class(A==0)
> [1] "matrix"
> > length(A==0)
> [1] 10
> > class(-A*log(A))
> [1] "data.frame"
> > length(-A*log(A))
> [1] 5
>
> As you can see, the result of A==0 is a matrix with length=10, while the
> result of -A*log(A) is still a data.frame with length=5.
>
> Then, when calling ifelse( [length=10], 0, [length=5] ), the NO (3rd)
> argument is internally repeated by rep(-A*log(A),length.out=10) (try
> this). The result is a "list" with length=10, and each element has 2
> sub-elements.
>
> So the elements returned where (A==0) is FALSE each have 2 sub-elements,
> as you see.
>
> I think what is confusing you is the behaviour of A==0.
>
> However, when using 'ifelse', I think you should use a matrix as the
> argument, because a data.frame is not consistent with the purpose of
> 'ifelse'.
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html and provide commented,
> minimal, self-contained, reproducible code.

Petr Pikal
petr.pikal at precheza.cz
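
For readers who want to check talepanda's explanation, here is a minimal sketch, assuming the same ten values as in the original post. The variable names test, no, recycled and M are only for illustration, and the rep() call is an approximation of what ifelse() does with its third argument, not a copy of its source.

```r
# Rebuild the data from the original post as a data frame.
A <- as.data.frame(matrix(c(1.000000, 0.000000, 0.000000, 0, 0,
                            0.027702, 0.972045, 0.000253, 0, 0),
                          nrow = 2, ncol = 5, byrow = TRUE))

test <- A == 0        # logical matrix: one entry per cell
no   <- -A * log(A)   # data frame: one entry per column

length(test)          # number of cells
length(no)            # number of columns

# Recycling the data frame to the length of the test repeats whole
# columns, so the result is a plain list of 2-element vectors.
recycled <- rep(no, length.out = length(test))
class(recycled)
lengths(recycled)

# talepanda's suggestion: hand ifelse() a matrix instead.
M <- as.matrix(A)
res <- ifelse(M == 0, 0, -M * log(M))
res
```

With a matrix, the test and the third argument have the same length, so ifelse() picks values cell by cell and the NaNs produced by 0*log(0) are never selected.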
On Friday 05 January 2007 12:34, Petr Pikal wrote:
> Hi
>
> you could use also another approach in case of dataframes
>
> A <- as.data.frame(A)
> A0 <- -A*log(A)
> A0[is.na(A0)] <- 0

I think you meant A0[which(is.na(A0))] <- 0

> which changes NaN's to zeroes
>
> HTH
> Petr

Regards
``MrJ Man'' wrote:

> On Friday 05 January 2007 12:34, Petr Pikal wrote:
> > Hi
> >
> > you could use also another approach in case of data frames
> >
> > A <- as.data.frame(A)
> > A0 <- -A*log(A)
> > A0[is.na(A0)] <- 0
>
> I think you meant A0[which(is.na(A0))] <- 0

He most certainly DOES NOT mean this! You should try things out before offering gratuitous advice.

(a) A0[is.na(A0)] <- 0 works perfectly.

(b) A0[which(is.na(A0))] <- 0 gets it wrong!!!

I would have thought that

A0[which(is.na(A0),arr.ind=TRUE)] <- 0

would work and get it right, but it gives the error message

Error in `[<-.data.frame`(`*tmp*`, which(is.na(A0), arr.ind = TRUE), value = 0) :
        only logical matrix subscripts are allowed in replacement

> > which changes NaN's to zeroes
> >
> > HTH
> > Petr

cheers,

Rolf Turner
rolf at math.unb.ca
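
A small sketch of why the two forms differ, assuming the same data as in the original post (the names na_mask, pos, fixed and M0 are only for illustration). The integer positions returned by which() count cells of the 2 x 5 mask, whereas a data frame indexed by a single numeric vector reads the numbers as column indices, so the two do not line up.

```r
A <- as.data.frame(matrix(c(1.000000, 0.000000, 0.000000, 0, 0,
                            0.027702, 0.972045, 0.000253, 0, 0),
                          nrow = 2, ncol = 5, byrow = TRUE))
A0 <- -A * log(A)

na_mask <- is.na(A0)    # 2 x 5 logical matrix, TRUE where A0 is NaN
pos <- which(na_mask)   # cell positions in column-major order
pos                     # 3 5 7 8 9 10

# Petr's version: a logical matrix selects individual cells.
fixed <- A0
fixed[na_mask] <- 0

# Integer positions are safe on a matrix, where they do mean cells.
M0 <- as.matrix(A0)
M0[pos] <- 0
```

Both fixed and M0 end up with the same values; only the container differs (data frame versus matrix).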
maj at stats.waikato.ac.nz said the following on 2007-01-05 04:18:

> [Using R 2.2.0 on Windows XP; OK, OK, I will update soon!]
>
> I have noticed some undesirable behaviour when applying
> ifelse to a data frame. Here is my code:
>
> A <- scan()
> 1.000000 0.000000 0.000000 0 0.00000
> 0.027702 0.972045 0.000253 0 0.00000
>
> A <- matrix(A,nrow=2,ncol=5,byrow=T)
> A == 0
> ifelse(A==0,0,-A*log(A))
>
> A <- as.data.frame(A)
> ifelse(A==0,0,-A*log(A))

How about using

sapply(A, function(x) ifelse(x == 0, 0, -x*log(x)))

?

HTH,
Henric
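
A short sketch of the column-wise approach, again assuming the data from the original post. The helper name entropy_term and the lapply() variant are additions for illustration, not part of Henric's message. Note that sapply() simplifies its result to a matrix, so a second form is shown for the case where a data frame is wanted back.

```r
A <- as.data.frame(matrix(c(1.000000, 0.000000, 0.000000, 0, 0,
                            0.027702, 0.972045, 0.000253, 0, 0),
                          nrow = 2, ncol = 5, byrow = TRUE))

# Hypothetical helper: applies the rule to one numeric column at a time,
# so ifelse() always sees plain vectors of equal length.
entropy_term <- function(x) ifelse(x == 0, 0, -x * log(x))

# Henric's suggestion: the result is simplified to a 2 x 5 matrix.
res_matrix <- sapply(A, entropy_term)

# Variant that keeps the data frame structure: assigning into A[]
# replaces the columns but leaves the class and names alone.
res_df <- A
res_df[] <- lapply(A, entropy_term)
```

Either form avoids the length mismatch, because each call to ifelse() works on a single column.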