thr3ads.net - similar to: "Rpart -- using predict() when missing data is present?"

How consistent is predict() syntax?

2007 Apr 13

I have a situation where lagged values of a time-series are used to predict future values. I have packed together the time-series and the lagged values into a data frame:
  > str(D)
  'data.frame': 191 obs. of 13 variables:
   $ y : num -0.21 -2.28 -2.71 2.26 -1.11 1.71 2.63 -0.45 -0.11 4.79 ...
   $ y.l1 : num NA -0.21 -2.28 -2.71 2.26 -1.11 1.71 2.63 -0.45 -0.11 ...
   $ y.l2 : num

In-sample / Out-of-sample using R

2004 Apr 13

I'm trying to learn how to use R to:
* Make a random partition of a data frame between in-sample and out-of-sample
* Estimate a model (e.g. lm()) for the in-sample
* Make predictions for all observations
* Compare the in-sample error sigma against the out-of-sample error sigma.
I came up with the following code. I think it's okay, but I can't help feeling this is
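The four steps listed in this post can be sketched roughly as below. This is a minimal illustration, not the poster's code (which is cut off in the snippet): the data frame D, the simulated model, and the 50/50 split are all assumptions made for the example.

```r
set.seed(1)

# Simulated data frame (an assumption; the poster's data is not shown)
D <- data.frame(x = runif(200))
D$y <- 2 + 3 * D$x + rnorm(200)

# 1. Random partition: half the rows are in-sample
in.sample <- sample(nrow(D), size = nrow(D) / 2)
is.in <- seq_len(nrow(D)) %in% in.sample

# 2. Estimate the model on the in-sample rows only
m <- lm(y ~ x, data = D[is.in, ])

# 3. Predict for all observations
yhat <- predict(m, newdata = D)

# 4. Compare root-mean-square errors in-sample and out-of-sample
err <- D$y - yhat
sigma.in <- sqrt(mean(err[is.in]^2))
sigma.out <- sqrt(mean(err[!is.in]^2))
```

Root-mean-square error is used here as one possible reading of "error sigma"; a degrees-of-freedom correction could be applied to the in-sample figure if preferred.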

Puzzled at rpart prediction

2005 Aug 04

I'm in a situation where I say:
  > predict(m.rpart, newdata=D[N1+t,])
        0   1
  173 0.8 0.2
which I interpret as meaning: an 80% chance of "0" and a 20% chance of "1". Okay. This is consistent with:
  > predict(m.rpart, newdata=D[N1+t,], type="class")
  [1] 0
  Levels: 0 1
But I'm puzzled at the following. If I say: > predict(m.rpart,

Interleaving elements of two vectors?

2006 Mar 06

Suppose one has
  x <- c(1, 2, 7, 9, 14)
  y <- c(71, 72, 77)
How would one write an R function which alternates between elements of one vector and the next? In other words, one wants
  z <- c(x[1], y[1], x[2], y[2], x[3], y[3], x[4], y[4], x[5], y[5])
I couldn't think of a clever and general way to write this. I am aware of gdata::interleave() but it deals
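One possible approach to the interleaving question, sketched under assumptions: the helper name interleave2 is hypothetical, and it assumes that positions past the end of the shorter vector should simply be skipped (and that the inputs contain no genuine NA values, since NA is used as padding).

```r
# interleave2() is a hypothetical helper, not part of any package.
interleave2 <- function(x, y) {
  n <- max(length(x), length(y))
  # Indexing past the end of a vector yields NA, which pads the shorter one
  m <- rbind(x[seq_len(n)], y[seq_len(n)])
  # A matrix is stored column by column, so c(m) alternates x and y
  z <- c(m)
  # Drop the padding (this would also drop genuine NAs in the inputs)
  z[!is.na(z)]
}

x <- c(1, 2, 7, 9, 14)
y <- c(71, 72, 77)
z <- interleave2(x, y)
# z is 1 71 2 72 7 77 9 14
```

For two vectors of equal length, c(rbind(x, y)) alone is enough.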

Prediction when using orthogonal polynomials in regression

2006 Jan 26

Folks, I'm doing fine with using orthogonal polynomials in a regression context:
  # We will deal with noisy data from the d.g.p. y = sin(x) + e
  x <- seq(0, 3.141592654, length.out=20)
  y <- sin(x) + 0.1*rnorm(10)
  d <- lm(y ~ poly(x, 4))
  plot(x, y, type="l"); lines(x, d$fitted.values, col="blue") # Fits great!
  all.equal(as.numeric(d$coefficients[1] + m

R and MLE

2005 Jun 07

I learned R & MLE in the last few days. It is great! I wrote up my explorations as http://www.mayin.org/ajayshah/KB/R/mle/mle.html I will be most happy if R gurus will look at this and comment on how it can be improved. I have a few specific questions: * Should one use optim() or should one use stats4::mle()? I felt that mle() wasn't adding much value compared with optim, and

Puzzled at ifelse()

2005 Jul 12

I have a situation where this is fine:
  > if (length(x)>15) { clever <- rr.ATM(x, maxtrim=7) } else { clever <- rr.ATM(x) }
  > clever
  $ATM
  [1] 1848.929
  $sigma
  [1] 1.613415
  $trim
  [1] 0
  $lo
  [1] 1845.714
  $hi
  [1] 1852.143
But this variant, using ifelse(), breaks:
  > clever <- ifelse(length(x)>15, rr.ATM(x, maxtrim=7), rr.ATM(x))
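A likely explanation, sketched with a stand-in function: ifelse() returns a value shaped like its test, so a length-one test yields a length-one result, and a list-valued branch gets cut down to its first element. The function f below is hypothetical and only mimics rr.ATM() in returning a list; the real rr.ATM() is not shown in the post.

```r
# f() is a hypothetical stand-in for rr.ATM(); it returns a list.
f <- function(x, maxtrim = 0) list(ATM = mean(x), trim = maxtrim)

x <- c(1, 2, 3)

# ifelse() is vectorised over the test, which has length 1 here,
# so only the first element of the chosen list survives.
bad <- ifelse(length(x) > 15, f(x, maxtrim = 7), f(x))

# if/else is an ordinary expression and returns the whole list.
good <- if (length(x) > 15) f(x, maxtrim = 7) else f(x)

length(bad)   # 1
length(good)  # 2
```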

R commandline editor question

2005 May 27

I am using R 2.1 on Apple OS X. When I get the ">" prompt, I find it works well with emacs commandline editing. Keys like M-f C-k etc. work fine. The one thing that I really yearn for, which is missing, is bracket matching. When I am doing something which ends in )))) it is really useful to have emacs or vi-style bracket matching, so as to be able to visually keep track of whether I

Need a factor level even though there are no observations

2005 May 08

I'm in this situation:
  factorlabels <- c("School", "College", "Beyond")
with data for 8 families:
  education.man <- c(1,2,1,2,1,2,1,2) # Note : no "3" values
  education.wife <- c(1,2,3,1,2,3,1,2) # 1,2,3 are all present.
My goal is to create this table: School College Beyond
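One way to keep an unobserved level, sketched under the assumption that codes 1, 2, 3 map to the three labels in order: pass levels explicitly to factor(), so that table() reports a zero count instead of omitting the level. The target table in the post is cut off, so this only illustrates the level-retention step.

```r
factorlabels <- c("School", "College", "Beyond")
education.man  <- c(1, 2, 1, 2, 1, 2, 1, 2)
education.wife <- c(1, 2, 3, 1, 2, 3, 1, 2)

# Declaring levels = 1:3 keeps "Beyond" even where no 3 is observed
man  <- factor(education.man,  levels = 1:3, labels = factorlabels)
wife <- factor(education.wife, levels = 1:3, labels = factorlabels)

tab.man  <- table(man)    # School 4, College 4, Beyond 0
tab.wife <- table(wife)   # School 3, College 3, Beyond 2
```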

Problem with get.hist.quote() in tseries

2005 Aug 19

When using get.hist.quote(), I find the dates are broken. This is with R 2.1.1 on Mac OS X `panther'.
  > library(tseries)
  Loading required package: quadprog
  'tseries' version: 0.9-27
  'tseries' is a package for time series analysis and computational finance.
  See 'library(help="tseries")' for details.
  > x <-

Catching an error with lm()

2005 May 24

Folks, I'm in a situation where I do a few thousand regressions, and some of them are bad data. How do I get back an error value (return code such as NULL) from lm(), instead of an error _message_? Here's an example:
  > x <- c(NA, 3, 4)
  > y <- c(2, NA, NA)
  > d <- lm(y ~ x)
  Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 0 (non-NA) cases
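A minimal sketch of one way to get NULL back instead of an error, using base R's tryCatch(). The wrapper name safe.lm and the second, well-behaved data set are assumptions added for illustration.

```r
# safe.lm() is a hypothetical wrapper: it returns NULL when lm() fails.
safe.lm <- function(formula, data) {
  tryCatch(lm(formula, data = data), error = function(e) NULL)
}

# The data from the post: no row has both x and y present
bad <- data.frame(x = c(NA, 3, 4), y = c(2, NA, NA))
d <- safe.lm(y ~ x, bad)
is.null(d)    # TRUE

# A made-up data set where the regression succeeds
ok <- data.frame(x = 1:5, y = c(1.1, 1.9, 3.2, 3.9, 5.1))
d2 <- safe.lm(y ~ x, ok)
```

In a loop over thousands of regressions, the caller can then test is.null() on each result and skip the failures.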

Puzzled in utilising summary.lm() to obtain Var(x)

2005 Jun 14

I have a program which is doing a few thousand runs of lm(). Suppose it is a simple model y = a + bx1 + cx2 + e I have the R object "d" where d <- summary(lm(y ~ x1 + x2)) I would like to obtain Var(x2) out of "d". How might I do it? I can, of course, always do sd(x2). But it would be much more convenient if I could snoop around the contents of summary.lm and

Extracting some rows from a data frame - lapses into a vector

2005 Aug 16

I have a data frame with one column "x":
  > str(data)
  `data.frame': 20 obs. of 1 variable:
   $ x: num 0.0495 0.0986 0.9662 0.7501 0.8621 ...
Normally, I know that the notation dataframe[indexes,] gives you a new data frame which is the specified set of rows. But I find:
  > str(data[1:10,])
   num [1:10] 0.0495 0.0986 0.9662 0.7501 0.8621 ...
Here, it looks like the operation
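The behaviour described here matches the default drop = TRUE in data frame subsetting, which simplifies a single-column result to a vector. A short sketch, with simulated values standing in for the poster's data:

```r
# Simulated one-column data frame (the poster's values are not reproduced)
data <- data.frame(x = runif(20))

v <- data[1:10, ]                  # single column: simplified to a vector
d <- data[1:10, , drop = FALSE]    # drop = FALSE keeps the data frame

is.data.frame(v)   # FALSE
is.data.frame(d)   # TRUE
```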

Question on lm(): When does R-squared come out as NA?

2005 Sep 25

I have a situation with a large dataset (3000+ observations), where I'm doing lags as regressors, where I get:
  Call:
  lm(formula = rj ~ rM + rM.1 + rM.2 + rM.3 + rM.4)
  Residuals:
  1990-06-04 1994-11-14 1998-08-21 2002-03-13 2005-09-15
    -5.64672   -0.59596   -0.04143    0.55412    8.18229
  Coefficients:
               Estimate Std. Error t value Pr(>|t|)
  (Intercept) -0.003297   0.017603

Presentation of multiple models in one table using xtable

2006 Aug 14

Consider this situation:
  > x1 <- runif(100); x2 <- runif(100); y <- 2 + 3*x1 - 4*x2 + rnorm(100)
  > m1 <- summary(lm(y ~ x1))
  > m2 <- summary(lm(y ~ x2))
  > m3 <- summary(lm(y ~ x1 + x2))
Now you have estimated 3 different "competing" models, and suppose you want to present the set of models in one table. xtable(m1) is cool, but doing that thrice would give

The math underlying the `betareg' package?

2007 Jan 18

Folks, The betareg package appears to be polished and works well. But I would like to look at the exact formulas for the underlying model being estimated, the likelihood function, etc. E.g. if one has to compute \frac{\partial E(y)}{\partial x_i}, this requires careful calculations through these formulas. I read "Regression analysis of variates observed on (0,1): percentages, proportions and

update.packages() is broken?

2005 Aug 26

Folks, I am using R 2.1.1 on Apple OS X 10.3. Earlier, I used to say
  $ sudo R
  > update.packages()
and all the packages used to get installed. For several weeks, I noticed that nothing has been coming through. I used the R-for-Mac graphics console and I find that there are many packages where new versions have come out which I don't have. Is something wrong with update.packages()? I

A performance anomaly

2005 Jun 06

I wrote a simple log likelihood (for the ordinary least squares (OLS) model), in two ways. The first works out the likelihood. The second merely calls the first, but after transforming the variance parameter, so as to allow an unconstrained maximisation. So the second suffers a slight cost for one exp() and then it pays the cost of calling the first. I did performance measurement. One would

Placing axes label strings closer to the graph?

2005 Oct 01

Folks, I have placed an example of a self-contained R program later in this mail. It generates a file inflation.pdf. When I stare at the picture, I see the "X label string" and "Y label string" sitting lonely and far away from the axes. How can these distances be adjusted? I read ?par and didn't find this directly. I want to hang on to 2.8 x 2.8 inches as the overall size

Tobit estimation?

2006 Jan 19

Folks, Based on http://www.biostat.wustl.edu/archives/html/s-news/1999-06/msg00125.html I thought I should experiment with using survreg() to estimate tobit models. I start by simulating a data frame with 100 observations from a tobit model
  > x1 <- runif(100)
  > x2 <- runif(100)*3
  > ystar <- 2 + 3*x1 - 4*x2 + rnorm(100)*2
  > y <- ystar
  > censored <- ystar <= 0
