Displaying 20 results from an estimated 4000 matches similar to: "Regression with very high number of categorical variables"
2011 May 13
6
Powerful PC to run R
Dear all,
I'm currently running R on my laptop -- a Lenovo Thinkpad X201 (Intel Core
i7 CPU, M620, 2.67 GHz, 8 GB RAM). The problem is that some of my
calculations run for several days, sometimes even weeks (mainly simulations
over a large parameter space). Depending on the external conditions, my
laptop sometimes shuts down due to overheating.
I'm now thinking about buying a more
2011 Apr 12
2
Testing equality of coefficients in coxph model
Dear all,
I'm running a coxph model of the form:
coxph(Surv(Start, End, Death.ID) ~ x1 + x2 + a1 + a2 + a3)
Within this model, I would like to compare the influence of x1 and x2 on the
hazard rate.
Specifically I am interested in testing whether the estimated coefficient
for x1 is equal (or not) to the estimated coefficient for x2.
I was thinking of using a Chow-test for this but the Chow
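One common way to compare two coefficients within a single fitted model is a Wald test on their difference, using the estimated covariance matrix. This is not the Chow test mentioned in the post, just a sketch of an alternative. The `lung` data set and the covariates `age` and `ph.ecog` are stand-ins chosen for illustration, not the poster's data, and the comparison is only meaningful when both covariates are on comparable scales.

```r
library(survival)

# Illustrative fit on the lung data shipped with the survival package
# (assumption: stands in for the poster's Start/End/Death.ID data).
fit <- coxph(Surv(time, status) ~ age + ph.ecog + sex, data = lung)

b <- coef(fit)   # estimated coefficients
V <- vcov(fit)   # their estimated covariance matrix

# Wald test of H0: beta_age == beta_ph.ecog
delta <- unname(b["age"] - b["ph.ecog"])
se    <- sqrt(V["age", "age"] + V["ph.ecog", "ph.ecog"] - 2 * V["age", "ph.ecog"])
z     <- delta / se
p     <- 2 * pnorm(-abs(z))

cat("difference:", delta, " z:", z, " p-value:", p, "\n")
```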
2016 Apr 16
1
Social Network Simulation
Dear all,
I am trying to simulate a series of networks that have characteristics
similar to real life social networks. Specifically I am interested in
networks that have (a) a reasonable degree of clustering (as measured by
the transitivity function in igraph) and (b) a reasonable degree of degree
polarization (as measured by the average degree of the top 10% nodes with
highest degree divided by
2011 Mar 26
1
Effect size in multiple regression
Dear all,
is there a convenient way to determine the effect size for a regression
coefficient in a multiple regression model?
I have a model of the form lm(y ~ A*B*C*D) and would like to determine
Cohen's f2 (http://en.wikipedia.org/wiki/Effect_size) for each predictor
without having to do it manually.
Thanks,
Michael
Michael Haenlein
Associate Professor of Marketing
ESCP Europe
Paris,
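A minimal sketch of how Cohen's f2 could be computed per term, assuming the usual "local" definition f2 = (R2_full - R2_reduced) / (1 - R2_full), where the reduced model drops one term. The helper name `cohens_f2` and the `mtcars` example are assumptions for illustration. For a model such as y ~ A*B*C*D, dropping a main effect while its interactions stay in the model is hard to interpret, so the example uses an additive model.

```r
# Hypothetical helper: local Cohen's f2 for each term of an lm fit.
cohens_f2 <- function(fit) {
  r2_full <- summary(fit)$r.squared
  term_labels <- attr(terms(fit), "term.labels")
  sapply(term_labels, function(term) {
    # Refit the model without this one term
    reduced <- update(fit, as.formula(paste(". ~ . -", term)))
    r2_reduced <- summary(reduced)$r.squared
    (r2_full - r2_reduced) / (1 - r2_full)
  })
}

# Illustrative additive model on a built-in data set
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
f2 <- cohens_f2(fit)
print(round(f2, 3))
```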
2013 Jan 22
2
Approximating discrete distribution by continuous distribution
Dear all,
I have a discrete distribution showing how age is distributed across a
population using a certain set of bands:
Age <- matrix(c(74045062, 71978405, 122718362, 40489415), ncol=1,
dimnames=list(c("<18", "18-34", "35-64", "65+"),c()))
Age_dist <- Age/sum(Age)
For example I know that 23.94% of all people are between 0-18 years, 23.28%
2012 May 29
2
Wilcoxon-Mann-Whitney U value: outcomes from different stat packages
Given this example
#start code
a<-c(0,70,50,100,70,650,1300,6900,1780,4930,1120,700,190,940,
760,100,300,36270,5610,249680,1760,4040,164890,17230,75140,1870,22380,5890,2430)
b<-c(0,0,10,30,50,440,1000,140,70,90,60,60,20,90,180,30,90,
3220,490,20790,290,740,5350,940,3910,0,640,850,260)
wilcox.test(a, b, paired=FALSE)
#sum of rank for first sample
sum.rank.a <-
2010 Nov 11
2
predict.coxph and predict.survreg
Dear all,
I'm struggling with predicting "expected time until death" for a coxph and
survreg model.
I have two datasets. Dataset 1 includes a certain number of people for which
I know a vector of covariates (age, gender, etc.) and their event times
(i.e., I know whether they have died and when if death occurred prior to the
end of the observation period). Dataset 2 includes another
2010 Jul 14
1
Printing status updates in while-loop
Dear all,
I'm using a while loop in the context of an iterative optimization
procedure. Within my while loop I have a counter variable that helps me to
determine how long the loop has been running. Before the loop I initialize
it as counter <- 0 and the last condition within my loop is counter <-
counter + 1.
I'd like to print out the current status of "counter" while the
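A small sketch of one way to show the counter while a loop is still running: print with `cat()` and call `flush.console()` so buffered consoles (for example the Windows GUI) display the line immediately. The stopping rule `max_iter` is a placeholder, not part of the original post.

```r
counter <- 0
max_iter <- 5  # hypothetical stopping rule for this sketch

while (counter < max_iter) {
  # ... one step of the iterative optimization would go here ...
  counter <- counter + 1
  cat("Iteration", counter, "\n")
  flush.console()  # push the output to the console right away
}
```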
2011 Sep 21
2
Cannot allocate vector of size x
Dear all,
I am running a simulation in which I randomly generate a series of vectors
to test whether they fulfill a certain condition. In most cases, there is no
problem. But from time to time, the (randomly) generated vectors are too
large for my system and I get the error message: "Cannot allocate vector of
size x".
The problem is that in those cases my simulation stops and I have to
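Allocation failures are ordinary R errors, so one option is to wrap the risky step in `tryCatch()` and move on to the next draw instead of stopping. The sketch below fakes the failure with `stop()`; the function `make_vector` and the failure pattern are invented for illustration.

```r
# Hypothetical generator that fails on some draws
make_vector <- function(i) {
  if (i %% 3 == 0) stop("cannot allocate vector of size x")
  seq_len(i)
}

ok <- logical(6)
for (i in 1:6) {
  ok[i] <- tryCatch({
    v <- make_vector(i)
    # ... test whether v fulfils the condition here ...
    TRUE
  }, error = function(e) {
    message("Draw ", i, " skipped: ", conditionMessage(e))
    FALSE  # record the failure and continue with the next draw
  })
}
print(ok)
```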
2010 Sep 08
1
Aggregating data from two data frames
Dear all,
I'm working with two data frames.
The first frame (agg_data) consists of two columns. agg_data[,1] is a unique
ID for each row and agg_data[,2] contains a continuous variable.
The second data frame (geo_data) consists of several columns. One of these
columns (geo_data$ZCTA) corresponds to the unique ID in the first data
frame. The problem is that only a subset of the unique ID
2011 Jul 15
2
Convert continuous variable into discrete variable
Dear all,
I have a continuous variable that can take on values between 0 and 100, for
example: x<-runif(100,0,100)
I also have a second variable that defines a series of thresholds, for
example: y<-c(3, 4.5, 6, 8)
I would like to convert my continuous variable into a discrete one using the
threshold variables:
If x is between 0 and 3 the discrete variable should be 1
If x is between 3
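A short sketch of this kind of binning with `findInterval()`, which returns how many thresholds each value has reached; adding 1 gives the codes 1, 2, 3, and so on. That values above the last threshold get the code 5, and that intervals are closed on the left, are assumptions, since the rule is cut off in the excerpt.

```r
set.seed(1)
x <- runif(100, 0, 100)
y <- c(3, 4.5, 6, 8)

# findInterval() counts the thresholds <= x, so [0, 3) maps to 0;
# adding 1 shifts the codes to start at 1.
x_discrete <- findInterval(x, y) + 1

table(x_discrete)
```

`cut(x, breaks = c(0, y, 100), labels = FALSE, include.lowest = TRUE)` is a similar option; it uses right-closed intervals by default, so the two differ only for values exactly on a threshold.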
2011 Sep 19
1
Binary optimization problem in R
Dear all,
I would like to solve a problem similar to a multiple knapsack problem and
am looking for a function in R that can help me.
Specifically, my situation is as follows: I have a list of n items which I
would like to allocate to m groups with fixed size. Each item has a certain
profit value and this profit depends on the type of group the item is in. My
problem is to allocate the items
2010 Jul 25
1
Equivalent to go-to statement
Dear all,
I'm working with code that consists of two parts: In Part 1 I'm generating
a random graph using the igraph library (which represents the relationships
between different nodes) and a vector (which represents a certain
characteristic for each node):
library(igraph)
g <- watts.strogatz.game(1,100,5,0.05)
z <- rlnorm(100,0,1)
In Part 2 I'm iteratively changing the
2012 Apr 12
2
Curve fitting, probably splines
Dear all,
This is probably more related to statistics than to [R] but I hope someone
can give me an idea how to solve it nevertheless:
Assume I have a variable y that is a function of x: y=f(x). I know the
average value of y for different intervals of x. For example, I know that
in the interval [0;x1] the average y is y1, in the interval [x1;x2] the
average y is y2 and so forth.
I would like to
2017 Dec 14
1
Aggregation across two variables in data.table
Dear all,
I have a data.frame that includes a series of demographic variables for a
set of respondents plus a dependent variable (Theta). For example:
   Age Education        Marital  Familysize Income Housing             Theta
1: 50  Associate degree Divorced 4          70K+   Owned with mortgage 9.147777
2: 65
2010 Nov 19
2
question about constraint minimization
Hi,
I am a beginner in R and have a question about constrained minimization. A
function, y=f(x1,x2,x3....x12), needs to be minimized. There are 3
requirements for the minimization:
(1) x2+x3+...+x12=1.5 (x1 is excluded);
(2) x1=x3=x4;
(3) x1, x3 and x5 each lie in the range -1 to 0. The remaining variables
(x2, x4, x6, x7, ...., x12) each lie in the range 0 to 1.
The
2012 Nov 15
1
Stepwise regression scope: all interacting terms (.^2)
Dear Gurus,
Thank you in advance for your assistance. I'm trying to understand scope better when performing stepwise regression using "step." I have a model with a binary response variable and 10 predictor variables. When I perform stepwise regression I define scope=.^2 to allow interactions between all terms. But I am missing something. When I perform stepwise regression (both
2012 Nov 15
1
Step-wise method for large dimension
Hi,
I want to apply the following code to my data with 400 predictors.
I was wondering if there is an alternative way instead of typing 400 predictors for the following code.
I really appreciate your help.
fit0<-lm(Y~1, data= mydata)
fit.final<- lm(Y~X1+X2+X3+.....+X400, data=mydata) ???
step(fit0, scope=list(lower=fit0, upper=fit.final), data=mydata, direction="forward")
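One way to avoid typing every predictor is the `.` shorthand, which stands for all columns of the data other than the response. The sketch below uses a small simulated `mydata` with five predictors as a stand-in for the 400-column data; the simulated coefficients are arbitrary.

```r
set.seed(42)
# Simulated stand-in for the poster's data: columns X1..X5 plus Y
mydata <- data.frame(matrix(rnorm(100 * 5), ncol = 5))
mydata$Y <- 2 * mydata$X1 - mydata$X3 + rnorm(100)

fit0 <- lm(Y ~ 1, data = mydata)       # intercept-only model
fit.final <- lm(Y ~ ., data = mydata)  # "." expands to all other columns

sel <- step(fit0,
            scope = list(lower = formula(fit0), upper = formula(fit.final)),
            direction = "forward", trace = 0)
print(formula(sel))
```

Fitting the full model requires more observations than predictors. If that is not the case, the upper formula could instead be built without fitting, for example with `reformulate(setdiff(names(mydata), "Y"), response = "Y")`.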
2011 Jun 18
1
Applying function to all elements of a formula
Hi,
I would like to do a regression like:
reg <- lm(y~log(.), data)
where the log function is applied to "." in the form:
log(x1)+ log(x2)+ log(x3)...
instead of in the form
log(x1+x2+x3+...)
Is this possible?
Thank you,
Scott
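As far as I know, `log(.)` is not expanded term by term inside a formula, but the formula can be built programmatically. A minimal sketch with `reformulate()`, assuming all predictors are strictly positive; the simulated `data` is only for illustration.

```r
set.seed(1)
# Simulated stand-in data with strictly positive predictors
data <- data.frame(y  = rnorm(50),
                   x1 = runif(50, 1, 10),
                   x2 = runif(50, 1, 10),
                   x3 = runif(50, 1, 10))

predictors <- setdiff(names(data), "y")

# Builds y ~ log(x1) + log(x2) + log(x3)
f <- reformulate(paste0("log(", predictors, ")"), response = "y")
reg <- lm(f, data = data)
print(f)
```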
2009 Aug 31
3
Two way joining vs heatmap
Hi
STATISTICA has a function called "Two-way joining" (see
http://www.statsoft.com/TEXTBOOK/stcluan.html#twotwo) and the
reference material states that this is based on the method as
published by Hartigan (found this paper:
http://www.jstor.org/pss/2284710 through wikipedia).
What is the relationship (if any) between the "heatmap" function in R
and this technique? Is there an