Arthur Steinmetz
2008-Jan-30 15:53 UTC
[R] numeric coercion when one or more elements is non numerice
I don't understand this behavior. Why does every data point get trashed by data.matrix when there is one non-numeric element in the array? Thanks.

> temp
         GDP  CPIYOY
19540 2098.1 garbage
19632 2085.4     0.9
19724 2052.5     0.8
19814 2042.4     1.1
> data.matrix(temp)
      GDP CPIYOY
19540   4      4
19632   3      2
19724   2      1
19814   1      3

I'd like garbage to become NA, but when I tried filtering the array to scrub the data it had no effect. This illustrates it:

> temp[1,2] <- NA
> temp
         GDP CPIYOY
19540 2098.1   <NA>
19632 2085.4    0.9
19724 2052.5    0.8
19814 2042.4    1.1
> data.matrix(temp)
      GDP CPIYOY
19540   4     NA
19632   3      2
19724   2      1
19814   1      3

-- Art Steinmetz
Gavin Simpson
2008-Jan-30 16:22 UTC
[R] numeric coercion when one or more elements is non numerice
On Wed, 2008-01-30 at 07:53 -0800, Arthur Steinmetz wrote:
> I don't understand this behavior. Why does every data point get
> trashed by data.matrix when there is one non-numeric element in the
> array? Thanks.

I suspect it is because your GDP variable is not what you think it is. What does str(temp) say about GDP? I'll guess it says something like this:

> str(temp)
'data.frame':   4 obs. of  2 variables:
 $ GDP   : Factor w/ 4 levels "2042.4","2052.5",..: 4 3 2 1
 $ CPIYOY: Factor w/ 4 levels "0.8","0.9","1.1",..: 4 2 1 3

which indicates that GDP is a factor. As the example below shows, if GDP is numeric then data.matrix does produce what you want. If GDP is a factor, however, you get the behaviour you observe.

> temp <- data.frame(GDP = c(2098.1, 2085.4, 2052.5, 2042.4),
+                    CPIYOY = c("garbage", "0.9", "0.8", "1.1"))
> str(temp)
'data.frame':   4 obs. of  2 variables:
 $ GDP   : num  2098 2085 2052 2042
 $ CPIYOY: Factor w/ 4 levels "0.8","0.9","1.1",..: 4 2 1 3
> temp
     GDP  CPIYOY
1 2098.1 garbage
2 2085.4     0.9
3 2052.5     0.8
4 2042.4     1.1
> data.matrix(temp)
        GDP CPIYOY
[1,] 2098.1      4
[2,] 2085.4      2
[3,] 2052.5      1
[4,] 2042.4      3
> temp2 <- temp
> temp2$GDP <- as.factor(temp2$GDP)
> data.matrix(temp2)
     GDP CPIYOY
[1,]   4      4
[2,]   3      2
[3,]   2      1
[4,]   1      3

One option could be to convert anything in $CPIYOY that is "garbage" to NA and, having made sure that temp$GDP is numeric and not a factor, then use data.matrix, which will now do what you want.

> temp$CPIYOY[temp$CPIYOY == "garbage"] <- NA
> temp
     GDP CPIYOY
1 2098.1   <NA>
2 2085.4    0.9
3 2052.5    0.8
4 2042.4    1.1
> data.matrix(temp)
        GDP CPIYOY
[1,] 2098.1     NA
[2,] 2085.4      2
[3,] 2052.5      1
[4,] 2042.4      3

If temp$GDP is a factor, you can't just do as.numeric(temp$GDP), as this returns the factor's internal level codes and so results in the same behaviour as data.matrix.
You need to convert to character then to numeric:

> temp2$GDP <- as.numeric(as.character(temp2$GDP))
> temp2
     GDP CPIYOY
1 2098.1   <NA>
2 2085.4    0.9
3 2052.5    0.8
4 2042.4    1.1
> data.matrix(temp2)
        GDP CPIYOY
[1,] 2098.1     NA
[2,] 2085.4      2
[3,] 2052.5      1
[4,] 2042.4      3

HTH

G

--
Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
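
A minimal sketch of the point made in the reply above: as.numeric() on a factor returns the internal level codes, while going through as.character() first recovers the printed values. The data frame is a hypothetical reconstruction of the one in the thread. stringsAsFactors = TRUE is set explicitly as an assumption, because R 4.0.0 and later no longer turn strings into factors by default.

```r
# Hypothetical data mirroring the thread. stringsAsFactors = TRUE reproduces
# the factor columns that older versions of R created by default.
temp <- data.frame(
  GDP    = c("2098.1", "2085.4", "2052.5", "2042.4"),
  CPIYOY = c("garbage", "0.9", "0.8", "1.1"),
  stringsAsFactors = TRUE
)

# as.numeric() on a factor gives the level codes, not the labels.
codes <- as.numeric(temp$GDP)

# Converting to character first recovers the values as printed.
values <- as.numeric(as.character(temp$GDP))

# Labels that cannot be parsed as numbers become NA. R emits a coercion
# warning for them, which is suppressed here.
cpi <- suppressWarnings(as.numeric(as.character(temp$CPIYOY)))

print(codes)
print(values)
print(cpi)
```

Comparing codes with values shows why data.matrix() appeared to trash every data point: the numbers it returned were positions in the sorted set of levels, not the data.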
Arthur Steinmetz
2008-Jan-31 20:57 UTC
[R] numeric coercion when one or more elements is non numerice
That did it. Thanks! What I was getting was

> temp
          GDP  CPIYOY
23832  3108.2 garbage
23923 garbage     1.8
24015  3214.1     1.8
24107  3291.8       2
> str(temp)
`data.frame':   4 obs. of  2 variables:
 $ GDP   :Error in importIntoEnv(impenv, impnames, ns, impvars) :
  objects 'dev.interactive', 'palette', 'extendrange', 'xy.coords' are not exported by 'namespace:grDevices'

so your tip of

> temp$GDP <- as.numeric(as.character(temp$GDP))

did the trick. Two twists:

1. I am not sure what characters will actually be garbage, but I see that as.numeric coerces any garbage to NA automatically, so I don't need to search and replace.

2. Since I expect 'garbage' to be scattered about, I need to step through the vectors in the frame:

for (j in 1:dim(hist)[2]) { hist[,j] <- as.numeric(as.character(hist[,j])) }

Although I suspect, given how powerful R is, there is a way to avoid the for loop and operate on the whole frame. Anyway, problem solved!
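
One possible way to avoid the explicit for loop, offered as a sketch rather than the definitive answer: apply the same conversion to every column with lapply() and assign the result back with hist[], which keeps the data frame structure. The data and the helper name to_numeric are hypothetical, and stringsAsFactors = TRUE is assumed so that the columns are factors as in the thread.

```r
# Hypothetical stand-in for the 'hist' frame mentioned in the message.
hist <- data.frame(
  GDP    = c("3108.2", "garbage", "3214.1", "3291.8"),
  CPIYOY = c("garbage", "1.8", "1.8", "2"),
  stringsAsFactors = TRUE
)

# Hypothetical helper: factor or character to numeric, with anything that
# cannot be parsed becoming NA (the coercion warning is suppressed).
to_numeric <- function(x) suppressWarnings(as.numeric(as.character(x)))

# hist[] <- ... replaces each column in place while keeping the data frame,
# its row names and its column names.
hist[] <- lapply(hist, to_numeric)

# With every column numeric, data.matrix() keeps the values themselves.
m <- data.matrix(hist)
print(m)
```

Suppressing the warning is a design choice for this sketch. Leaving it visible may be preferable when unexpected garbage should be noticed.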