Hi,

Standard correlations (Pearson's, Spearman's, Kendall's tau) do not accurately reflect how closely the model (a GAM) fits the data. I was told that the fit can be measured more accurately using a root mean square deviation (RMSD) calculation on binned data.

For example, let 'o' be the real, observed data and 'm' be the model data. I believe I can calculate the root mean square deviation as:

    sqrt( mean( (o - m)^2 ) )

However, this does not bin the data into mean sets. What I would like to do is:

    oangry <- c( mean(o[1:5]), mean(o[6:10]), ... )
    mangry <- c( mean(m[1:5]), mean(m[6:10]), ... )

Then:

    sqrt( mean( (oangry - mangry)^2 ) )

I would like to simplify that calculation into something like:

    sqrt( mean( (bin(o, 5) - bin(m, 5))^2 ) )

I have read the help for ?cut, ?table, ?hist, and ?split, but am stumped as to which one to use in this case, if any.

How do you calculate c( mean(o[1:5]), mean(o[6:10]), ... ) for a vector of arbitrary length, using an appropriate number of bins (fixed at 5, or perhaps calculated using Sturges' formula)?

I have also posted a more detailed version of this question on StackOverflow:

http://stackoverflow.com/questions/3073365/root-mean-square-deviation-on-binned-gam-results-using-r

Many thanks.

Dave
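For illustration, a minimal sketch of the kind of bin() helper being asked for here, assuming index-based bins of 5 consecutive values. The bin() name and the sample data are hypothetical, not from the thread:

    # Hypothetical helper: mean of every consecutive group of `size` elements.
    # Any trailing elements that do not fill a whole bin are averaged together.
    bin <- function(x, size = 5) {
      groups <- ceiling(seq_along(x) / size)   # 1,1,1,1,1,2,2,...
      tapply(x, groups, mean)
    }

    o <- runif(67, 0, 9)           # made-up "observed" data
    m <- o + rnorm(67, sd = 0.5)   # made-up stand-in for the GAM predictions

    rmsd_binned <- sqrt(mean((bin(o, 5) - bin(m, 5))^2))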
Hi,

To calculate the means of binned data from a vector 'd1' of arbitrary length, the following works:

    d1 <- runif( 67, 0, 9 )
    while ( length(d1) %% 5 != 0 ) { d1 <- d1[-length(d1)] }
    dmean1 <- apply( matrix(d1, nrow = 5), 2, mean )

Unfortunately, this means dropping (two) data points from the end before I can execute:

    sqrt( mean( (dmean1 - dmean2)^2 ) )

where 'dmean2' is a vector of values constructed by running the GAM on 'dmean1'.

What is a better way to do this?

Many thanks!

Dave
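One possible way to keep the trailing points instead of dropping them is to pad the vector with NA up to a multiple of 5 and take column means that ignore the padding. This is a sketch only, not an answer given in the thread:

    d1 <- runif(67, 0, 9)
    n  <- 5 * ceiling(length(d1) / 5)            # next multiple of 5
    padded <- c(d1, rep(NA, n - length(d1)))     # pad the tail with NA
    # na.rm = TRUE means the two padding NAs are simply ignored in the last bin
    dmean1 <- colMeans(matrix(padded, nrow = 5), na.rm = TRUE)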
Don't know about the correlations (never used them in a GAM context, actually...), but you can "bin" the means like this:

    > x <- 1:100
    > tapply( x, cut(x, 5), mean )
    (0.901,20.7]  (20.7,40.6]  (40.6,60.4]  (60.4,80.3]   (80.3,100]
            10.5         30.5         50.5         70.5         90.5

Cheers
Joris
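A sketch connecting this suggestion back to the RMSD question in the first post. Cutting the index (seq_along) rather than the values groups the observations by position, roughly five per bin; that positional grouping is an assumption about what was wanted, and the data here are made up:

    o <- runif(67, 0, 9)            # made-up observed data
    m <- o + rnorm(67, sd = 0.5)    # made-up stand-in for the GAM predictions

    bins <- cut(seq_along(o), ceiling(length(o) / 5))
    obar <- tapply(o, bins, mean)   # binned means of the observations
    mbar <- tapply(m, bins, mean)   # binned means of the model values
    rmsd <- sqrt(mean((obar - mbar)^2))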
On Jun 18, 2010, at 7:54 PM, David Jarvis wrote:

> Standard correlations (Pearson's, Spearman's, Kendall's tau) do not
> accurately reflect how closely the model (a GAM) fits the data. I was told
> that the fit can be measured more accurately using a root mean square
> deviation (RMSD) calculation on binned data.

By whom? ... and with what theoretical basis?

> I would like to simplify that calculation into something like:
>
>     sqrt( mean( (bin(o, 5) - bin(m, 5))^2 ) )

I doubt that your strategy offers any statistical advantage, but if you want to play around with it, then consider:

    binned.x <- round( (x + 2.5) / 5 )

--
David.
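A sketch of how that one-liner might be used: it assigns each value an integer index for a 5-unit-wide value bin, which can then drive tapply(). The data and the follow-on RMSD step are assumptions for illustration, not part of the reply:

    x <- runif(67, 0, 30)              # made-up observed values
    y <- x + rnorm(67)                 # made-up model values
    binned.x <- round((x + 2.5) / 5)   # index of the 5-unit-wide bin each value falls in
    xbar <- tapply(x, binned.x, mean)  # per-bin means of the observations
    ybar <- tapply(y, binned.x, mean)  # per-bin means of the model values
    sqrt(mean((xbar - ybar)^2))        # RMSD on the binned means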
Just for the record, if you have NA's in it, you do:

    tapply( d, cut(d, round(length(d)/5)), mean, na.rm = TRUE )

tapply applies a function over a vector, by groups defined by another vector. In this case, it applies the function mean (with the argument na.rm = TRUE) over the vector d, by the groups defined by the cut function. cut splits a numeric vector into intervals of equal width; here the vector is d and the number of bins is round(length(d)/5).

Cheers
Joris

On Sun, Jun 20, 2010 at 1:24 AM, Joris Meys <jorismeys at gmail.com> wrote:
> On Sat, Jun 19, 2010 at 4:29 AM, David Jarvis <thangalin at gmail.com> wrote:
>> Hi, Joris.
>>
>> Thanks again; I don't get it. Reading the help pages for R reminds me of
>> reading the manual pages for Unix: great for people who already know what
>> they mean.
>
> Just read them, from top to bottom, and take a look at the examples. If you
> shy away from them, forget about ever finding your way around R. Never skip
> the details, and run the examples at the bottom. Then you can see what's
> going on; it often clarifies things a whole lot.
>
>> I can see how cut is dividing the data into 14 rows, and I can take the
>> factor results from cut:
>>
>>     tapply( d, cut(d, round(length(d)/5)), mean )
>>
>> But the results are ... well, negative?
>
> That is explained in the help file. The left side is not included in the
> interval: (0,1] is equivalent to ]0,1]. To include the extreme values, the
> lower limit is extended by 0.1% of the range.
>
>> > tapply( d, cut(d, round(length(d)/5)), mean )
>> (-0.009,0.685]   (0.685,1.38]    (1.38,2.07]    (2.07,2.77]    (2.77,3.46]
>>              0              1              2             NA              3
>>    (3.46,4.15]    (4.15,4.85]    (4.85,5.54]    (5.54,6.23]    (6.23,6.93]
>>              4             NA              5              6             NA
>>    (6.93,7.62]    (7.62,8.32]    (8.32,9.01]
>>              7              8              9
>>
>> I don't see how rounding up with ceiling would apply.
>
> Well: 67/5 = 13.4. round gives 13 bins, ceiling gives 14 bins. It's a
> matter of choice.
>
>> I appreciate your patience; I think this might be beyond my capacity to
>> understand.
>
> You ain't stupid. Lazy maybe, but definitely not stupid ;)
>
> Cheers
> Joris
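A short, self-contained sketch of the recipe above, on made-up data (not from the thread): cut() divides the range of d into round(length(d)/5) equal-width intervals, tapply() averages the values that fall into each interval, and na.rm = TRUE is passed through to mean() as in the post above.

    d <- runif(67, 0, 9)                                        # made-up data
    dmean <- tapply(d, cut(d, round(length(d) / 5)), mean, na.rm = TRUE)
    dmean   # one mean per interval; intervals that catch no values show NA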