Dear Noah,
You get NaN values because you divide by zero if all values are equal.
So it does not happen only when all values are zero. Try scale(rep(1,
10))
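To see where the NaN comes from, here is a small sketch that redoes by
hand what scale() does with its default arguments: subtract the mean,
then divide by the standard deviation. The variable names are only for
illustration.

```r
x <- rep(1, 10)

centred <- x - mean(x)   # every element becomes 0
s <- sd(x)               # 0, because all values are identical
manual <- centred / s    # 0 / 0, which R reports as NaN

all(is.nan(manual))      # TRUE
all(is.nan(scale(x)))    # TRUE, scale() runs into the same 0 / 0
```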
A workaround is to turn the scaling off when all values are identical
(have zero variance). You can rearrange your nested loops with the
(untested) code below.
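library(reshape)
library(plyr)
# columns to scale: every column whose name contains 'norm_'
group_names <- grep('norm_', names(rawdata))
Molten <- melt(rawdata, id.vars = "code", measure.vars = group_names)
MoltenScaled <- ddply(Molten, c("code", "variable"), function(x){
  # always centre; divide by the sd only when the variance is non-zero
  x$value <- c(scale(x$value, scale = var(x$value) != 0))
  x  # return the modified data frame, not the result of the assignment
})
# Note: if a code has several rows, cast() may aggregate them unless a
# unique row identifier is also kept in id.vars.
cast(code ~ variable, data = MoltenScaled)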
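If you prefer to stay in base R, the same idea can be sketched without
reshaping. This is only an illustration under assumptions: safe_scale
is a hypothetical helper name, and the small rawdata below is made-up
toy data standing in for yours (a grouping column "code" and columns
whose names contain "norm_").

```r
# Hypothetical helper: always centre, but divide by the standard
# deviation only when it is defined and greater than zero.
safe_scale <- function(x) {
  s <- sd(x, na.rm = TRUE)
  do_scale <- !is.na(s) && s > 0
  c(scale(x, center = TRUE, scale = do_scale))
}

# Toy data, invented for this example
rawdata <- data.frame(
  code   = rep(c("a", "b"), each = 4),
  norm_x = c(1, 2, 3, 4, 0, 0, 0, 0),
  norm_y = c(5, 5, 5, 5, 2, 4, 6, 8)
)

# Scale each 'norm_' column within each level of code
norm_cols <- grep("norm_", names(rawdata), value = TRUE)
for (col in norm_cols) {
  rawdata[[col]] <- ave(rawdata[[col]], rawdata$code, FUN = safe_scale)
}

rawdata
```

Groups with identical values end up as 0 instead of NaN, and ave()
keeps the rows in their original order, so nothing needs to be cast
back afterwards.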
HTH,
Thierry
----------------------------------------------------------------------------
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature
and Forest
Cel biometrie, methodologie en kwaliteitszorg / Section biometrics,
methodology and quality assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium
tel. + 32 54/436 185
Thierry.Onkelinx at inbo.be
www.inbo.be
To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to
say what the experiment died of.
~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data.
~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of
data.
~ John Tukey
-----Original message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
On behalf of Noah Silverman
Sent: Monday 3 August 2009 8:26
To: r help
Subject: [R] Scale set of 0 values returns NAN??
Hi,
More questions in my ongoing quest to convert from RapidMiner to R.
One thing has become VERY CLEAR: None of the issues I'm asking about
here are addressed in RapidMiner. How it handles missing values,
scaling, etc. is hidden within the "black box". Using R is forcing me
to take a much deeper look at my data and how my experiments are
constructed. (That's a very "Good Thing")
So, on to the question...
I'm scaling data based on groups. I have it working well in a nice
loop. (This WORKS, but if someone has a faster/cleaner way, I'd be
curious.)
#group-wide normalization
groups <- unique(rawdata$group)
group_names = grep('norm_',names(rawdata))
for(group in groups){
    for(name in group_names){
        rawdata[rawdata$code==group, name] <-
            c(scale(rawdata[rawdata$code==group, name]))
    }
}
My problem is that if the particular list of data I'm scoring is all 0,
then scale returns NaN for all of them, subsequently breaking my SVM
training.
> foo <- c(0,0,0,0,0)
> scale(foo)
     [,1]
[1,]  NaN
[2,]  NaN
[3,]  NaN
[4,]  NaN
[5,]  NaN
attr(,"scaled:center")
[1] 0
attr(,"scaled:scale")
[1] 0
I would have expected scale to just return back 0 for all the values.
Is there some trick to fixing this?
Thanks!
-Noah