Displaying 20 results from an estimated 140 matches similar to: "tree model with at most one split point per variable"
2017 Jul 28
0
problem with "unique" function
Most likely, previous computations have ended up giving slightly different values of, say, 0.13333. A pragmatic way out is to round to, say, 5 digits before applying unique. In this particular case, all the numbers appear to be multiples of 1/30, so another option is to multiply by 30, round, and divide by 30.
-pd
> On 28 Jul 2017, at 17:17 , li li <hannah.hlx at gmail.com> wrote:
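A minimal sketch of both suggestions in the reply above, assuming a hypothetical vector x whose entries are multiples of 1/30 reached through different arithmetic (the poster's actual data are not reproduced in this excerpt):

```r
# Hypothetical data: multiples of 1/30 reached through different arithmetic,
# so mathematically equal entries may differ in the last few bits.
x <- c(4 / 30, 0.1 + 1 / 30, 1 / 3 - 0.2, 14 / 30, 0.5 - 1 / 30)

# Option 1: round to 5 digits before calling unique()
u_round <- unique(round(x, 5))

# Option 2: snap to the 1/30 grid: multiply by 30, round, divide by 30
u_grid <- unique(round(x * 30) / 30)

print(length(unique(x)))  # may exceed 2 because of floating-point noise
print(u_round)
print(u_grid)
```

Rounding to a fixed number of digits is the more general fix. Snapping to the grid only applies when the spacing of the values (here assumed to be 1/30) is known in advance.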
2017 Jul 28
3
problem with "unique" function
I have the joint distribution of three discrete random variables z1, z2 and
z3 which is captured by "z"
and "prob" as described below.
For example, the probability for z1=-0.46667, z2=-1 and z3=-1 is 2.752e-13.
Also, the probabilities add up to 1.
> head(z)
           z1      z2      z3
[1,] -0.46667 -1.0000 -1.0000
[2,] -0.33333 -0.9333 -0.9333
[3,] -0.20000 -0.8667 -0.8667
2010 Dec 13
1
How does R compute sums of squares?
Consider the following missing data problem:
y = c(1, 2, 2, 2, 3)
a = factor(c(1, 1, 1, 2, 2))
b = factor(c(1, 2, 3, 1, 2))
fit = lm(y ~ a + b)
anova(fit)
Analysis of Variance Table
Response: y
          Df  Sum Sq Mean Sq    F value    Pr(>F)
a          1 0.83333 0.83333 1.3637e+33 < 2.2e-16 ***
b          2 1.16667 0.58333 9.5461e+32 < 2.2e-16 ***
Residuals  1 0.00000 0.00000
---
2012 Sep 29
1
Unexpected behavior with weights in binomial glm()
Hi useRs,
I'm experiencing something quite weird with glm() and weights, and
maybe someone can explain what I'm doing wrong. I have a dataset
where each row represents a single case, and I run
glm(...,family="binomial") and get my coefficients. However, some of
my cases have the exact same values for predictor variables, so I
should be able to aggregate up my data frame and
2012 Oct 08
1
arima.sim
Hi,
I have been using arima.sim from the stats package recently, and I'm
wondering why I get different results when using what seem to be the
same parameters. For example, I've given examples of three different
ways to run arima.sim with what I believe are the same parameters.
It's my understanding from the R documentation that rnorm is the
default function for rand.gen if not
2008 Apr 06
1
row by row similarity
Hello all and thanks in advance for any advice.
I am very new to R and have searched my question but have not come up with
anything quite like what I would like to do.
My problem is:
I have a data set for individuals (rows) and values for behaviours
(columns). I would like to know the proportion of shared behaviours for all
possible pairs of individuals. The sum of shared behaviours divided by
2006 Nov 13
0
Confidence intervals for relative risk
Wolfgang,
It is common to handle relative risk problems using Poisson regression.
In your example you have 8 events out of 508 tries, and 0/500 in the second
data set.
> tdata <- data.frame(y=c(8,0), n=c(508,500), group=1:0)
> fit <- glm(y ~ group + offset(log(n)), data=tdata, family=poisson)
Because of the zero, the standard beta/se(beta) confidence intervals don't
work.
2006 Jan 04
1
Difficulty with 'merge'
Dear R-helpers,
Happy New Year to all the helpful members of the list.
Here is the behavior I'm looking for:
> v1 <- c("a","b","c")
> n1 <- c(0, 1, 2)
> v2 <- c("c", "a", "b")
> n2 <- c(0, 1 , 2)
> (f1 <- data.frame(v1, n1))
  v1 n1
1  a  0
2  b  1
3  c  2
> (f2 <- data.frame(v2, n2))
2009 Dec 21
3
Signif. codes
My question is about the "Signif. codes" and the p-value, specifically, the
output when I run
summary(nameofregression.lm)
So you get this little key:
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
And on a regression I ran, next to the intercept data, I get '***'
Coefficients:
Estimate Std. Error t value Pr(>|t|)
2005 Feb 22
2
ERROR NaNs produced; when comparing two logistic regression models with the ANOVA CHI test
Dear R-list,
When comparing two logistic regression models with the anova Chi test, I
obtain the following error (there are no NAs in the time series). How can
this be solved so that I can compare two models on the same dataset where
different explanatory variables are used?
l.KBDI <- glm(zna.arson2 ~ zna.KBDI,family = binomial)
l.NDWI <- glm(zna.arson2 ~ zna.NDWI,family
2010 May 11
1
how to extract the variables used in decision tree
Hi, dear R community,
How can I extract the variables actually used in tree construction? I want to
extract these variables and combine them with other variables as features in
the next model-building step.
> printcp(fit.dimer)
Classification tree:
rpart(formula = outcome ~ ., data = p_df, method = "class")
Variables actually used in tree construction:
[1] CT DP DY FC NE NW QT SK TA WC WD WG WW
2012 Jul 04
1
Error in hclust?
Dear R users,
I have noted a difference in the merge distances given by hclust using
centroid method.
For the following data:
x<-c(1009.9,1012.5,1011.1,1011.8,1009.3,1010.6)
and using Euclidean distance, hclust using centroid method gives the
following results:
> x.dist<-dist(x)
> x.aah<-hclust(x.dist,method="centroid")
> x.aah$merge
[,1] [,2]
[1,] -3 -6
2007 Mar 02
4
significant anova but no distinct groups ?
Dear all,
I am studying a dataset using the aov() function.
The independent variable 'cds' is a factor() with 8 levels, and here is
the result of studying the dependent variable 'rta' with aov():
> summary(aov(rta ~ cds))
            Df  Sum Sq Mean Sq F value  Pr(>F)
cds          7 0.34713 0.04959  2.3807 0.02777
Residuals   92 1.91635 0.02083
The dependent variable
2012 Sep 07
7
Producing a table with mean values
Hi All,
I have a data set with three size classes (pico, nano and micro) and 12
different sites (Seamounts). I want to produce a table with the mean and
standard deviation values for each site.
     Seamount   Pico    Nano   Micro Total_Ch
1 Off_Mount 1 0.0691 0.24200 0.00100  0.31210
2 Off_Mount 1 0.0938 0.00521 0.02060  0.11961
3 Off_Mount 1 0.1130 0.20000 0.06620  0.37920
4 Off_Mount 1
2024 Feb 27
4
converting MATLAB -> R | element-wise operation
So, trying to convert a very long, somewhat technical bit of lin alg
MATLAB code to R. Most of it working, but ran into a stumbling block that
is probably simple enough for someone to explain.
Basically, trying to 'line up' MATLAB results from an element-wise
division of a matrix by a vector with R output.
Here is a simplified version of the MATLAB code I'm translating:
NN = [1,
2024 Feb 27
2
[External] converting MATLAB -> R | element-wise operation
> t(t(NN)/lambda)
[,1] [,2] [,3]
[1,] 0.5 0.6666667 0.75
[2,] 2.0 1.6666667 1.50
>
R recycles a vector down the columns of a matrix (column-major order), whereas MATLAB's element-wise division by a row vector applies it across each row.
> On Feb 27, 2024, at 14:54, Evan Cooch <evan.cooch at gmail.com> wrote:
>
> So, trying to convert a very long, somewhat technical bit of lin alg
> MATLAB code to R. Most of it working, but ran into a stumbling block
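A minimal sketch of why the double transpose in the reply above is needed. It assumes NN is the 2 x 3 matrix with rows 1:3 and 4:6 and that lambda is c(2, 3, 4). Both are guesses consistent with the output shown above, since the original MATLAB code is cut off in this excerpt. R recycles lambda down the columns of NN, so a plain NN / lambda does not divide column j by lambda[j]. sweep() is a base R alternative to transposing twice.

```r
# Assumed inputs (not given in full in the excerpt)
NN <- matrix(1:6, nrow = 2, byrow = TRUE)
lambda <- c(2, 3, 4)

# Plain division recycles lambda down the columns of NN
recycled <- NN / lambda

# Divide column j by lambda[j]: double transpose, or sweep() over columns
by_transpose <- t(t(NN) / lambda)
by_sweep <- sweep(NN, 2, lambda, "/")

print(recycled)
print(by_transpose)
print(by_sweep)
```

The sweep() form states the intent (operate along margin 2, the columns) directly, which may be easier to read in a long translation than nested calls to t().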
2007 Aug 28
0
help with aggregate(): tables of means for terms in an mlm
I'm trying to extend some work in the car and heplots packages
that requires getting a table of multivariate means for one
(or later, more) terms in an mlm object. I can do this for
concrete examples, using aggregate(), but can't figure out how to
generalize it. I want to return a result that has the factor-level
combinations as rownames, and the means as the body of the table
2012 Jul 06
2
Anova Type II and Contrasts
The study design of the data I have to analyse is simple. There is 1 control group (CTRL) and 2 different treatment groups (TREAT_1 and TREAT_2).
The data also includes 2 covariates COV1 and COV2. I have been asked to check if there is a linear or quadratic treatment effect in the data.
I created a dummy data set to explain my situation:
df1 <- data.frame(
Observation =
2010 Nov 06
0
variable type assignment in daisy
Dear Rhelp,
I ran daisy on 5 lifestyle variables, 3 of which were nominal and 2 ordinal, and assigned the types “nominal” and “ordinal” to the variables, respectively. The output reported their types as “I” for interval(?). Running it on the Rdata example “flower” gave the same types in the output as the types that were assigned. Why is this so? The code and output are below.
2009 Jun 14
1
time function behavior for ts class objects
Hi all-
I am trying to use the time function for ts class objects and do not
understand the return value. I want to use it to set up a time trend in
arima fits. It does not seem to return a correct linear sequence that
matches the underlying time series. I am running:
R version 2.8.1 (2008-12-22).
For example:
R> ## create a time series
R> x <- rnorm(24)
R> (xts <-