DS
2008-Jul-07 23:56 UTC
[R] question on lm or glm matrix of coeficients X test data terms
Hi,

is there an easy way to get the calculated weights in a regression equation?

For example, if my model has 2 variables, 1 and 2, with coefficients .05 and .6,
how can I get the computed values for a test dataset for each coefficient?

data
var1,var2
10,100

So I want to get .5, 60 back in a vector. This is a one-row example, but I would
want to get a matrix of multiplied-out coefficients and terms for use in
comparing the contribution of variables to the final score, as in a scorecard
using logistic regression.

Please advise.

thanks
Dhruv
Jorge Ivan Velez
2008-Jul-08 00:08 UTC
[R] question on lm or glm matrix of coeficients X test data terms
Dear Dhruv,
Try this:
# data set
set.seed(123)
X=matrix(rpois(10,10),ncol=2)
X
     [,1] [,2]
[1,]    8   15
[2,]    9   11
[3,]   14    5
[4,]   10    4
[5,]   10   13

# outcome: multiply each row of X elementwise by the coefficients
t(apply(X,1,function(x,betas){
  if(length(x)!=length(betas)) stop("x and betas are of different length!")
  y=x*betas
  y
},betas=c(0.05,0.6)))
     [,1] [,2]
[1,] 0.40  9.0
[2,] 0.45  6.6
[3,] 0.70  3.0
[4,] 0.50  2.4
[5,] 0.50  7.8
HTH,
Jorge
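
The same row-by-coefficient multiplication can also be written without apply(). The following is only a sketch, assuming the coefficients are supplied in the same order as the columns of X; the column names var1 and var2 and the hard-coded values (copied from the printed X above) are illustrative assumptions.

```r
# Illustrative data: the 5 x 2 matrix printed above, entered by hand
X <- matrix(c(8, 9, 14, 10, 10,
              15, 11, 5, 4, 13), ncol = 2,
            dimnames = list(NULL, c("var1", "var2")))
betas <- c(var1 = 0.05, var2 = 0.6)

# Same guard as in the apply() version: one coefficient per column
stopifnot(ncol(X) == length(betas))

# sweep() multiplies column j of X by betas[j]; the result has the shape of X
contrib <- sweep(X, 2, betas, "*")
contrib

# Summing each row gives the linear score (no intercept in this toy case)
score <- rowSums(contrib)
score
```

sweep() keeps the column names of X, which is convenient later when the names of the largest contributions have to be reported.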
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
DS
2008-Jul-08 00:57 UTC
[R] question on lm or glm matrix of coeficients X test data terms
Thanks Jorge. I appreciate your quick help.

Will this work if I have 20 columns of data but my regression only has 5
variables?

I am looking for something generic where I can give it my model and test data
and get back a vector of the multiplied coefficients (with no hard coding).

When predict is called with an input model and data, R must be multiplying all
coefficients by the variables and summing the products, but is there a way to
get the components of the regression terms stored in a matrix before they are
added?

The idea is to build n models with various terms and, after producing a
prediction, list the top 3 variables that had the biggest impact for that
particular set of predictor values.

E.g. if I build a model to predict default of loans, I would then need to list
the top factors in the model that can be used to explain why the loan is risky.
With 10-16 variables, which can be present or not for each case, there would be
a different 2 or 3 variables that led to the said prediction.

Dhruv
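
One generic way to get the per-coefficient products from a fitted model, without hard-coding anything, is to build the design matrix for the test data from the model's own formula and scale its columns by the fitted coefficients. This is a sketch under assumptions: the data, the column names x1..x20, and the choice of five predictors are all made up for illustration.

```r
# Made-up training data: 20 candidate columns, only 5 used in the model
set.seed(1)
train <- as.data.frame(matrix(rnorm(200 * 20), ncol = 20))
names(train) <- paste0("x", 1:20)
train$y <- 1 + 0.5 * train$x1 - 0.3 * train$x4 + 0.8 * train$x7 + rnorm(200)

fit <- lm(y ~ x1 + x4 + x7 + x9 + x12, data = train)

# Made-up test data with the same 20 columns (no response needed)
test <- as.data.frame(matrix(rnorm(10 * 20), ncol = 20))
names(test) <- paste0("x", 1:20)

# Design matrix for the test rows, built from the model's own terms,
# so the 15 unused columns are ignored automatically
mm <- model.matrix(delete.response(terms(fit)), data = test)

# One column per coefficient (including the intercept), before summation
contrib <- sweep(mm, 2, coef(fit), "*")

# Sanity check: the row sums should reproduce predict()
all.equal(unname(rowSums(contrib)), unname(predict(fit, newdata = test)))
```

For a glm the same products live on the scale of the linear predictor, so the row sums would be compared with predict(fit, newdata = test, type = "link") instead.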
--- On Mon 07/07, Jorge Ivan Velez < jorgeivanvelez at gmail.com > wrote:
From: Jorge Ivan Velez [mailto: jorgeivanvelez at gmail.com]
To: ds5j at excite.com
Date: Mon, 7 Jul 2008 20:12:53 -0400
Subject: Re: [R] question on lm or glm matrix of coeficients X test data terms

Dear Dhruv,

Try also:

# data set
set.seed(123)
X=matrix(rpois(10,10),ncol=2)

# Function to estimate your outcome
outcome=function(x,betas){
  if(length(x)!=length(betas)) stop("x and betas are of different length!")
  y=x*betas
  y
}

# outcome for beta1=0.05 and beta2=0.6
t(apply(X,1,outcome,betas=c(0.05,0.6)))

# outcome for beta1=5 and beta2=6
t(apply(X,1,outcome,betas=c(5,6)))

HTH,
Jorge
DS
2008-Jul-08 02:04 UTC
[R] question on lm or glm matrix of coeficients X test data terms
Thanks Jorge. I appreciate your multiple improvements.

This still involves hard-coding the coefficients. I wonder if this is what glm
and lm are doing.

For example, with

m<-lm(K~a+b,data=data)

my guess is that m$coefficients would have 0 for all variables except a and b,
and that R then multiplies the weights the same way as your function.

I will try to use your code with the coefficients from the model and see if
that works, and report back what I find tomorrow.

Then, if I can add code to return the names of the columns with the 3 highest
resulting values, I should be done.

Thanks a lot Jorge.

regards,
Dhruv
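
A quick check of what the coefficient vector actually contains may help here. In base R, coef(m) for an lm fit holds only the intercept and the terms named in the formula; it does not carry zeros for unused data columns. The sketch below uses made-up data (columns a, b, u, v and response K are assumptions) and pads the coefficients with zeros by name so they line up with all data columns.

```r
# Made-up data: four candidate predictors, only a and b enter the model
set.seed(2)
dat <- data.frame(a = rnorm(50), b = rnorm(50), u = rnorm(50), v = rnorm(50))
dat$K <- 2 + 1.5 * dat$a - 0.7 * dat$b + rnorm(50)

m <- lm(K ~ a + b, data = dat)
names(coef(m))   # "(Intercept)" "a" "b" : no entries for u or v

# Build a zero-padded coefficient vector aligned to the data columns by name
predictors <- c("a", "b", "u", "v")
betas <- setNames(numeric(length(predictors)), predictors)
shared <- intersect(predictors, names(coef(m)))
betas[shared] <- coef(m)[shared]

# Per-variable products; unused columns contribute exactly zero
contrib <- sweep(as.matrix(dat[, predictors]), 2, betas, "*")

# Adding the intercept back to the row sums reproduces the fitted values
all.equal(unname(rowSums(contrib) + coef(m)[["(Intercept)"]]), unname(fitted(m)))
```

Matching by name rather than by position avoids silent misalignment when the data columns are in a different order from the model terms.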
--- On Mon 07/07, Jorge Ivan Velez < jorgeivanvelez at gmail.com > wrote:
From: Jorge Ivan Velez [mailto: jorgeivanvelez at gmail.com]
To: ds5j at excite.com
Date: Mon, 7 Jul 2008 21:42:54 -0400
Subject: Re: [R] question on lm or glm matrix of coeficients X test data terms

That's R: you come up with solutions every time. I hope I don't bother you with
this. Try also:

# data set (10 rows, 10 columns)
set.seed(123)
X=matrix(rpois(100,10),ncol=10)

# Function to estimate your outcome (returns the summed score for one row)
outcome=function(x,betas){
  if(length(x)!=length(betas)) stop("x and beta have different lengths!")
  y=x*betas
  sum(y)
}

# let's assume that you want to include x1, x4, x7 and x9 only
# by using beta1=0.5, beta4=0.6, beta7=-0.1, beta9=0.3
betas=c(0.5,0,0,0.6,0,0,-0.1,0,0.3,0)

# Results
apply(X,1,outcome,betas=betas)

HTH,
Jorge

On Mon, Jul 7, 2008 at 9:31 PM, Jorge Ivan Velez <jorgeivanvelez at gmail.com> wrote:

Sorry, I forgot to take the sum over the rows:

# data set (10 rows, 10 columns)
set.seed(123)
X=matrix(rpois(100,10),ncol=10)

# Function to estimate your outcome
outcome=function(x,betas){
  if(length(x)!=length(betas)) stop("x and beta have different lengths!")
  y=x*betas
  y
}

# let's assume that you want to include x1, x4, x7 and x9 only
# by using beta1=0.5, beta4=0.6, beta7=-0.1, beta9=0.3
betas=c(0.5,0,0,0.6,0,0,-0.1,0,0.3,0)

# Results
apply(t(apply(X,1,outcome,betas=betas)),1,sum)

HTH,
Jorge

On Mon, Jul 7, 2008 at 9:23 PM, Jorge Ivan Velez <jorgeivanvelez at gmail.com> wrote:

Dear Dhruv,

It's me again. I've been thinking about it a little bit. If you want to
include/exclude variables to estimate your outcome, you could try something
like this:

# data set (10 rows, 10 columns)
set.seed(123)
X=matrix(rpois(100,10),ncol=10)

# Function to estimate your outcome
outcome=function(x,betas){
  if(length(x)!=length(betas)) stop("x and beta have different lengths!")
  y=x*betas
  y
}

# let's assume that you want to include x1, x4, x7 and x9 only
# by using beta1=0.5, beta4=0.6, beta7=-0.1, beta9=0.3
betas=c(0.5,0,0,0.6,0,0,-0.1,0,0.3,0)

# Results
t(apply(X,1,outcome,betas=betas))

HTH,
Jorge

On Mon, Jul 7, 2008 at 9:11 PM, Jorge Ivan Velez <jorgeivanvelez at gmail.com> wrote:

Dear Dhruv,

The short answer is no, because the function I built doesn't work for more
variables than coefficients (see the "stop" I introduced). You should make some
modifications, such as setting the remaining coefficients to 1 or 0. For example:

# data set (10 rows, 10 columns)
set.seed(123)
X=matrix(rpois(100,10),ncol=10)
X

# Function to estimate your outcome; missing coefficients are filled with val
outcome=function(x,betas,val){
  k=length(x)
  nb=length(betas)
  if(length(x)!=length(betas)) betas=c(betas, rep(val,k-nb))
  y=x*betas
  y
}

# beta1=1, beta2=2, the rest is equal to zero
t(apply(X,1,outcome,betas=c(1,2),val=0))

# beta1=1, beta2=2, the rest is equal to 1
t(apply(X,1,outcome,betas=c(1,2),val=1))

HTH,
Jorge
DS
2008-Jul-08 23:33 UTC
[R] question on lm or glm matrix of coeficients X test data terms
Hi,

I found some of what I was looking for.

Using the following I can get a matrix of the regression coefficients multiplied
out by the variable data:

g<-predict(comodel,newdata=data4,type='terms')
m<-cbind(data4,g)

What remains is how to pick, for each data row, the 3-4 columns with the highest
values. I need to get the column names of the top 3 coefficients from this
matrix: loop through each row, pick the top 3 highest coefficient/variable
products, and then get the column names for these 3.

Is there an easy way to get this in an R function?

thanks
Dhruv
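
One property of predict(..., type = "terms") is worth keeping in mind when reading the matrix g: the term columns are centered, and the overall mean is stored separately in the "constant" attribute, so the entries are not the raw coefficient-times-value products. The sketch below illustrates this on the built-in mtcars data; the model formula is an arbitrary assumption and stands in for comodel and data4.

```r
# Stand-in model and test data (mtcars ships with R)
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
newdata <- mtcars[1:5, ]

# One column per model term, one row per observation
g <- predict(fit, newdata = newdata, type = "terms")
colnames(g)   # "wt" "hp" "qsec"

# The terms are centered; the constant holds what was subtracted out
const <- attr(g, "constant")

# Row sums plus the constant reproduce the ordinary predictions
all.equal(unname(rowSums(g) + const),
          unname(predict(fit, newdata = newdata)))

# The raw product for one term differs from g by a constant shift
raw_wt <- coef(fit)[["wt"]] * newdata$wt
range(raw_wt - g[, "wt"])   # both ends equal: a single constant offset
```

Because every row of a column is shifted by the same amount, centering does not change which rows are large for a given term, but it can change which term ranks highest within a row. For a glm, the terms are reported on the scale of the linear predictor.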
DS
2008-Jul-09 01:32 UTC
[R] question on lm or glm matrix of coeficients X test data terms
thanks Jorge. This is great!
regards,
Dhruv
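
The row-wise top-k selection discussed in this thread can also be wrapped in a small helper. This is a sketch only: the function name top_k_names and the by_abs option are hypothetical additions, the latter in case "biggest impact" should count large negative contributions as well as large positive ones.

```r
# Hypothetical helper: names of the k largest entries in each row of a matrix
top_k_names <- function(mat, k = 3, by_abs = FALSE) {
  stopifnot(is.matrix(mat), !is.null(colnames(mat)), k <= ncol(mat))
  key <- if (by_abs) abs(mat) else mat   # rank by magnitude if requested
  apply(key, 1, function(x) {
    idx <- order(x, decreasing = TRUE)[seq_len(k)]
    paste(names(x)[idx], collapse = ";")
  })
}

# Small made-up matrix of per-term contributions
g <- matrix(c(0.2, -0.9,  0.5, 0.1,
              0.7,  0.3, -0.4, 0.6),
            nrow = 2, byrow = TRUE,
            dimnames = list(NULL, c("A", "B", "C", "D")))

top_k_names(g, k = 3)                 # "C;A;D" "A;D;B"
top_k_names(g, k = 3, by_abs = TRUE)  # "B;C;A" "A;D;C"
```

Ties are broken by column order, since order() is stable for equal values.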
--- On Tue 07/08, Jorge Ivan Velez < jorgeivanvelez at gmail.com > wrote:
From: Jorge Ivan Velez [mailto: jorgeivanvelez at gmail.com]
To: ds5j at excite.com
Date: Tue, 8 Jul 2008 20:45:06 -0400
Subject: Re: [R] question on lm or glm matrix of coeficients X test data terms

Hi Dhruv,

Thanks for the data. Here is what you need so far:

# Data set
yourdata=structure(c(0.024575733, 0.775009533, 0.216823408, 0.413676529,
0.270053406, 0.579946123, 0.634013362, 0.928518128, 0.405825012,
0.862204203, 0.856558209, 0.187865722, 0.818774004, 0.918802224, 0.469496189,
0.240583922, 0.390818789, 0.767969261, 0.13339806, 0.986023924, 0.442655239,
0.437441939, 0.313678293, 0.952285599, 0.528433974, 0.328609537, 0.84584467,
0.608194527, 0.96139021, 0.485592658, 0.251827955, 0.289777559),
.Dim = c(4L, 8L), .Dimnames = list(
NULL, c("A", "B", "C", "D", "E", "F", "A:B", "G:H")))

# Column names used by ftopk (not defined in the code as posted)
cnames=colnames(yourdata)

# Function to select the names of the top k values in a row
ftopk=function(x,top=3){
  res=cnames[order(x, decreasing = TRUE)][1:top]
  paste(res,collapse=";",sep="")
}

# Application of the function: top 3 columns for each row
topk=apply(yourdata,1,ftopk,top=3)

# Result
data.frame(yourdata,topk)
           A         B         C         D         E         F       A.B       G.H      topk
1 0.02457573 0.2700534 0.4058250 0.8187740 0.3908188 0.4426552 0.5284340 0.9613902 G:H;D;A:B
2 0.77500953 0.5799461 0.8622042 0.9188022 0.7679693 0.4374419 0.3286095 0.4855927     D;C;A
3 0.21682341 0.6340134 0.8565582 0.4694962 0.1333981 0.3136783 0.8458447 0.2518280   C;A:B;B
4 0.41367653 0.9285181 0.1878657 0.2405839 0.9860239 0.9522856 0.6081945 0.2897776     E;F;B

HTH,
Jorge

On Tue, Jul 8, 2008 at 8:19 PM, DS <ds5j at excite.com> wrote:

Hi Jorge,

I am attaching some sample data that looks like the coefficient matrix.
In the spreadsheet I have listed, for each row, the column names I would want
to extract (the ones with the highest values in the row).

Hope this helps.

thanks
regards,
Dhruv

--- On Tue 07/08, Jorge Ivan Velez < jorgeivanvelez at gmail.com > wrote:
From: Jorge Ivan Velez [mailto: jorgeivanvelez at gmail.com]
To: ds5j at excite.com
Date: Tue, 8 Jul 2008 19:36:52 -0400
Subject: Re: [R] question on lm or glm matrix of coeficients X test data terms

Dear Dhruv,

Could you please send me part of your data set m? Just 10-20 rows, so I'll have
an idea of what you have and what you'd like. I hope you don't mind.

Thanks a lot,
Jorge