Displaying 20 results from an estimated 132 matches for "overfit".
2007 Oct 03
1
How to avoid overfitting in gam(mgcv)
Dear listers,
I'm using gam() (from mgcv) for semi-parametric regression on small and
noisy datasets (10 to 200 observations), and I am facing a problem of
overfitting.
According to the book (Simon N. Wood, Generalized Additive Models: An
Introduction with R), overfitting can be avoided by inflating the
effective degrees of freedom in the GCV evaluation with an increased
"gamma" value (e.g. 1.4). But in my case, it didn't make a
significant c...
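A minimal sketch of the approach the message describes, assuming the mgcv package (a recommended package that normally ships with R) and simulated data standing in for the poster's small, noisy datasets. The `gamma` argument of `gam()` makes each effective degree of freedom count more heavily in the GCV score, so a value such as 1.4 usually yields a smoother fit. The data-generating function, `n = 50` and the seed are arbitrary illustrative choices.

```r
library(mgcv)

# Simulated small, noisy dataset (illustrative assumption, not the poster's data)
set.seed(1)
n <- 50
x <- runif(n)
y <- sin(2 * pi * x) + rnorm(n, sd = 0.5)

# Smoothing parameter chosen by GCV with the default gamma = 1
fit_default <- gam(y ~ s(x), method = "GCV.Cp")

# Same model, but effective degrees of freedom are inflated by 1.4 in the GCV score
fit_gamma <- gam(y ~ s(x), method = "GCV.Cp", gamma = 1.4)

# Total effective degrees of freedom of each fit (lower means smoother)
edf <- c(default = sum(fit_default$edf), gamma_1.4 = sum(fit_gamma$edf))
print(edf)
```

If `gamma` alone changes little on very small samples, lowering the basis dimension (for example `s(x, k = 5)`) is another way to cap how wiggly the smooth can become.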
2004 Dec 22
2
GAM: Overfitting
I am analyzing particulate matter data (PM10) on a small data set (147
observations). I fitted a semi-parametric model and am worried about
overfitting. How can one check for model fit in GAM?
Jean G. Orelien
2008 Feb 16
2
Possible overfitting of a GAM
The subject is a Generalized Additive Model. Experts caution us against
overfitting the data, which can cause inaccurate results. I am not a
statistician (my background is in Computer Science). Perhaps some kind soul
would take a look and vet the model for overfitting the data.
The study estimated the ebb and flow of traffic through a voting place. Just
one voting place wa...
2017 Nov 21
0
Do I need to transform backtest returns before using pbo (probability of backtest overfitting) package functions?
...make it reproducible):
(Full vignette:
https://cran.r-project.org/web/packages/pbo/vignettes/pbo.html)
library(pbo)
# First, we assemble the trials into a T x N matrix (T rows, N columns) where
# each column represents a trial and each trial has the same length T. This
# example is random data, so the backtest should be overfit.
set.seed(765)
n <- 100
t <- 2400
m <- data.frame(matrix(rnorm(n*t),nrow=t,ncol=n,
dimnames=list(1:t,1:n)), check.names=FALSE)
sr_base <- 0
mu_base <- sr_base/(252.0)
sigma_base <- 1.00/sqrt(252.0)
for ( i in 1:n ) {
m[,i] = m[,i] * sigma_base / sd(m[,...
2013 Jan 15
0
e1071 SVM, cross-validation and overfitting
I am accustomed to the LIBSVM package, which provides cross-validation
on training with the -v option
% svm-train -v 5 ...
This does 5-fold cross-validation while building the model and avoids
overfitting.
But I don't see how to accomplish that in the e1071 package. (I
learned that svm(... cross=5 ...) only _tests_ using cross-validation
-- it doesn't affect the training.) Can
2010 Apr 08
2
Overfitting/Calibration plots (Statistics question)
...dicted
response [Y hat] on the horizontal axis).
According to Frank Harrell's "Regression Modeling Strategies" book
(pp. 61-63), when making such a plot on new data (having obtained a
model from other data) we should expect the points to be around a line
with slope < 1, indicating overfitting. As he writes, "Typically, low
predictions will be too low and high predictions too high."
However, when I make these plots, both with real data and with simple
simulated data, I get the opposite: the points are scattered around a
line with slope >1. Low predictions are too high a...
2006 Feb 28
3
does svm have a CV to obtain the best "cost" parameter?
Hi all,
I am using the "svm" command in the e1071 package.
Does it have an automatic way of setting the "cost" parameter?
I changed a few values for the "cost" parameter but I hope there is a
systematic way of obtaining the best "cost" value.
I noticed that there is a "cross" (cross-validation) parameter in the "svm"
function.
But I
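The systematic approach asked about here is a grid search scored by k-fold cross-validation, followed by a refit on all of the data with the winning value. As a package-free sketch, the code below tunes a ridge penalty `lambda` as a stand-in for the SVM `cost` so that it runs in base R; the grid, the 5 folds and the simulated data are assumptions. e1071 also ships a `tune.svm()` helper for this kind of search over `cost`, so its documentation is the place to look for the SVM-specific version.

```r
set.seed(1)

# Simulated regression data (assumption): 2 real effects, 8 noise predictors
n <- 60
p <- 10
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
y <- drop(X %*% c(2, -1, rep(0, p - 2))) + rnorm(n)

# Closed-form ridge coefficients; no intercept, since the simulated data have mean zero
ridge_fit <- function(X, y, lambda) {
  solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, y))
}

# Mean squared prediction error over the folds for one lambda
cv_error <- function(lambda, folds) {
  errs <- sapply(sort(unique(folds)), function(f) {
    train <- folds != f
    beta <- ridge_fit(X[train, , drop = FALSE], y[train], lambda)
    mean((y[!train] - X[!train, , drop = FALSE] %*% beta)^2)
  })
  mean(errs)
}

# Fix the fold assignment once so every lambda is scored on the same splits
folds <- sample(rep(1:5, length.out = n))
lambdas <- 10^seq(-2, 3, length.out = 20)
cv <- sapply(lambdas, cv_error, folds = folds)

# Pick the best lambda, then refit on all of the data
best_lambda <- lambdas[which.min(cv)]
final_beta <- ridge_fit(X, y, best_lambda)
print(best_lambda)
```

The final refit on the full data is the step that the `cross` argument mentioned above does not perform by itself; `cross` only reports an estimate of accuracy.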
2010 Jun 29
1
Model validation and penalization with rms package
I've been using Frank Harrell's rms package to do bootstrap model
validation. Is it the case that the optimum penalization may still
give a model which is substantially overfitted?
I calculated corrected R^2, optimism in R^2, and corrected slope for
various penalties for a simple example:
library(rms)  # provides ols()
x1 <- rnorm(45)
x2 <- rnorm(45)
x3 <- rnorm(45)
y <- x1 + 2*x2 + rnorm(45,0,3)
ols0 <- ols(y ~ x1 + x2 + x3, x=TRUE, y=TRUE)
corrected.Rsq <- rep(0,60)
optimism.Rsq...
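For readers without rms at hand, here is a base-R sketch of the bootstrap optimism correction that rms's `validate()` is built around (this is not rms's code): refit on each bootstrap sample, measure how much better the refit looks on its own sample than on the original data, and subtract the average gap from the apparent R^2. The data mirror the example above; `B = 200` resamples and the seed are assumptions.

```r
set.seed(2010)

# Same data-generating setup as the example above (x3 has no real effect)
n <- 45
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- d$x1 + 2 * d$x2 + rnorm(n, 0, 3)
form <- y ~ x1 + x2 + x3

# R^2 of predictions against observed values
r2 <- function(obs, pred) 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)

# Apparent R^2: model evaluated on the data it was fitted to
apparent <- r2(d$y, fitted(lm(form, data = d)))

# Optimism per replicate: R^2 on the bootstrap sample minus R^2 on the original data
B <- 200
optimism <- replicate(B, {
  idx <- sample(n, replace = TRUE)
  fit_b <- lm(form, data = d[idx, ])
  r2(d$y[idx], fitted(fit_b)) - r2(d$y, predict(fit_b, newdata = d))
})

corrected <- apparent - mean(optimism)
print(c(apparent = apparent, optimism = mean(optimism), corrected = corrected))
```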
2017 Nov 21
0
Do I need to transform backtest returns before using pbo (probability of backtest overfitting) package functions?
2018 May 04
2
RFC: Are auto-generated assertions a good practice?
On Fri, May 4, 2018 at 10:16 AM Sanjay Patel <spatel at rotateright.com> wrote:
> I understand the overfit argument (but in most cases it just shows that a
> unit test isn't minimized)...
>
Even minimized tests sometimes need a few other things to set up the
circumstance (many DWARF tests, for example, produce the full DWARF
output, but maybe you only care about one part of it (maybe you care...
2010 Jul 14
1
question about SVM in e1071
Hi,
I have a question about the parameter C (cost) in the svm function in e1071. I
thought larger C is more prone to overfitting than smaller C, and hence leads to
more support vectors. However, using the Wisconsin breast cancer example at
the link:
http://planatscher.net/svmtut/svmtut.html
I found that the largest cost has the fewest support vectors, which is contrary
to what I expected. Please see the scripts below:
Am I mis...
2017 Nov 21
2
Do I need to transform backtest returns before using pbo (probability of backtest overfitting) package functions?
2002 Mar 01
2
step, leaps, lasso, LSE or what?
...thods that are available for selecting
variables in a regression without simply imposing my own bias (having "good
judgement"). The methods implemented in leaps, step and stepAIC seem to
fall into the general class of stepwise procedures. But these are commonly
condemned for inducing overfitting.
In Hastie, Tibshirani and Friedman, "The Elements of Statistical Learning",
chapter 3, they describe a number of procedures that seem better. The use of
cross-validation in the training stage presumably helps guard against
overfitting. They seem particularly favorable to shrinkage...
2013 Mar 01
2
solving x in a polynomial function
Hi there,
Does anyone know how I solve for x from a given y in a polynomial
function? Here's some example code:
##example file
a<-1:10
b<-c(1,2,2.5,3,3.5,4,6,7,7.5,8)
po.lm<-lm(a~b+I(b^2)+I(b^3)+I(b^4)); summary(po.lm)
(please ignore that the model is severely overfit- that's not the point).
Let's say I want to solve for the value b where a = 5.5.
Any thoughts? I did come across the polynom package, but I don't think
that does it- I suspect the answer is simpler than I am making it out
to be. Any help would be welcome.
--
Michael Rennie, Research...
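One base-R answer, offered as a sketch: subtract the target value from the intercept of the fitted quartic and find the roots of the resulting polynomial with `polyroot()`, which takes coefficients in increasing order of degree, the same order `coef()` returns for this formula. Keeping only (numerically) real roots inside the observed range of `b` avoids extrapolating; the imaginary-part tolerance `1e-6` is an arbitrary choice.

```r
# Example data and model from the message above
a <- 1:10
b <- c(1, 2, 2.5, 3, 3.5, 4, 6, 7, 7.5, 8)
po.lm <- lm(a ~ b + I(b^2) + I(b^3) + I(b^4))

target <- 5.5

# Coefficients in increasing order of degree: intercept, b, b^2, b^3, b^4
cf <- unname(coef(po.lm))
cf[1] <- cf[1] - target  # solve fitted(b) - target = 0

roots <- polyroot(cf)

# Keep real roots that fall inside the observed range of b
real_roots <- Re(roots[abs(Im(roots)) < 1e-6])
b_at_target <- real_roots[real_roots >= min(b) & real_roots <= max(b)]
print(b_at_target)

# Check: the fitted value at each root should equal the target
print(predict(po.lm, newdata = data.frame(b = b_at_target)))
```

When only one crossing is expected, `uniroot(function(x) predict(po.lm, data.frame(b = x)) - 5.5, range(b))` is a simpler alternative.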
2018 May 04
0
RFC: Are auto-generated assertions a good practice?
I understand the overfit argument (but in most cases it just shows that a
unit test isn't minimized)...but I don't see how the complete
auto-generated assertions could be worse at detecting a miscompile than
incomplete manually-generated assertions?
The whole point of auto-generating complete checks is to catch
mi...
2018 May 04
2
RFC: Are auto-generated assertions a good practice?
Yep - all about balance.
The main risks are tests that overfit (golden files being the worst case -
checking that the entire output matches /exactly/ - this is what FileCheck
is intended to help avoid) and maintainability. In the case of the
autogenerated FileCheck lines I've seen so far - they seem like they still
walk a fairly good line of checking exact...
2017 Nov 21
0
Do I need to transform backtest returns before using pbo (probability of backtest overfitting) package functions?
2017 Nov 21
1
Do I need to transform backtest returns before using pbo (probability of backtest overfitting) package functions?
2018 May 04
0
RFC: Are auto-generated assertions a good practice?
On Fri, May 4, 2018 at 11:30 AM, David Blaikie <dblaikie at gmail.com> wrote:
>
>
> On Fri, May 4, 2018 at 10:16 AM Sanjay Patel <spatel at rotateright.com>
> wrote:
>
>> I understand the overfit argument (but in most cases it just shows that a
>> unit test isn't minimized)...
>>
>
> Even minimized tests sometimes need a few other things to setup the
> circumstance (many DWARF tests, for example - produce the full DWARF
> output, but maybe you only care about one...
2017 Nov 21
2
Do I need to transform backtest returns before using pbo (probability of backtest overfitting) package functions?