Dear helpers,

I want to use rpart several times in a loop to build classification trees. My problem is that rpart needs a formula as its argument, and for that the variables need to have names, which they do not in my case. Every iteration of the loop has a different dataset with many variables (e.g. 38 or more), so I can't type the names by hand every time. Is there a function that generates names for the variables in a data frame? If so, how can I then use the argument

rpart(classlabels~. ,.....)

Thanks
On Tue, 22 Jul 2003, Luis Miguel Almeida da Silva wrote:

> Dear helpers
>
> I want to use rpart several times in a loop to build a classification
> tree. My problem is that rpart needs a formula as argument and for that
> the variables need to have names and this doesn't happen in my case.
> Every iteration in the loop has a different dataset with several
> variables (ex. 38 or more) and so I can't type the names by hand every
> time. Is there any function that generates names for variables in a
> dataframe. If so, how can I use then the argument
>

If your data is organised in a data.frame, (dummy) variable names are
available by default:

R> mydata <- data.frame(matrix(rnorm(25), ncol=5))
R> mydata
          X1          X2         X3         X4          X5
1  1.3806313 -0.41827136  0.9591628 -1.3351038  0.02746110
2  0.5114590 -1.34111439 -0.9617552 -0.8367088 -0.06913021
3 -1.7508089 -0.49387076 -1.7597395  2.3899490 -0.15209650
4 -1.6753809 -1.28381808 -1.0424903  0.1002998  0.27784949
5 -0.2605535 -0.09035652 -2.5786418  1.0483400 -0.70445615
R> rpart(X1 ~ ., data = mydata)
n= 5

node), split, n, deviance, yval
      * denotes terminal node

1) root 5 7.463698 -0.3589306 *

best,

Torsten

> rpart(classlabels~. ,.....)
>
> thanks
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
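As a small illustration of the default names mentioned above (a sketch only; the objects m, df1 and df2 are made up for this example): data.frame() applied to a matrix without column names produces X1, X2, ..., while as.data.frame() produces V1, V2, ... instead.

```r
# A matrix without column names (hypothetical example data)
m <- matrix(0, nrow = 2, ncol = 3)

df1 <- data.frame(m)      # columns are named X1, X2, X3
df2 <- as.data.frame(m)   # columns are named V1, V2, V3

names(df1)
names(df2)
```

Either way, every column has a name that a formula can refer to.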
Dear Luis,
You might want to have a look at
"Bill Venables. Programmer's niche. R News, 2(2):24-26, June 2002"
which you can find at http://cran.r-project.org/doc/Rnews/
or look into the manual "R Language Definition", chapter
"Computing on the language".
Assuming that in your case the variable to be classified is called
classLabels, and the names of your datasets are in a vector called
dataNames, you could use something like
    lapply(dataNames, function(theName) {
      eval(parse(text = paste("rpart(classLabels ~ ., data =", theName, ")")))
    })
which will return a list of results. This is achieved by generating the
command line you would type as a string using paste, which is then parsed
and explicitly evaluated.
See the corresponding help pages for more details.
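For comparison, here is a minimal sketch of a variant that looks each data frame up with get() instead of pasting and parsing a command string. The datasets d1 and d2 are invented stand-ins, and it is assumed that each one contains a factor column called classLabels.

```r
library(rpart)

set.seed(1)
# Hypothetical datasets, each with a factor column classLabels
d1 <- data.frame(matrix(rnorm(60), ncol = 3),
                 classLabels = factor(rep(c("a", "b"), 10)))
d2 <- data.frame(matrix(rnorm(100), ncol = 5),
                 classLabels = factor(rep(c("a", "b"), 10)))
dataNames <- c("d1", "d2")

# get() finds each data frame by name, so the call need not be built as text
fits <- lapply(dataNames, function(theName) {
  rpart(classLabels ~ ., data = get(theName))
})
names(fits) <- dataNames
```

The result is again a list with one fitted tree per dataset name.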
HTH
Thomas
---
Thomas Hotz
Research Associate in Medical Statistics
University of Leicester
United Kingdom
I hadn't noticed that. I have already found a way to do it:

x <- 1:40
colnames(df.treino) <- paste("Ncp",x,sep=".")

This generates names that I can relate to the variables. Thanks anyway.

The problem is that I use rpart in a loop and the class labels are in the last
column. For the above example I would "type"

rpart(Ncp.40~.,data=df.treino)

But in the next step of the loop I might have only 35 variables, and the class
labels would then be in Ncp.36. So I have to update the formula passed to rpart
each time, and that is my problem.
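A small sketch of the naming step that does not depend on a fixed 1:40 (the data frame here is random stand-in data with 36 columns, and lastName is a name made up for illustration): the names are generated from the actual number of columns, and the name of the last column is then available for the formula.

```r
# Hypothetical data: 35 predictors plus the class labels in the last column
df.treino <- data.frame(matrix(rnorm(10 * 36), ncol = 36))

# Name the columns from the actual number of columns rather than a fixed 1:40
colnames(df.treino) <- paste("Ncp", seq_len(ncol(df.treino)), sep = ".")

# The class labels sit in the column with the last name
lastName <- colnames(df.treino)[ncol(df.treino)]
lastName
```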
Read Bill Venables' column in R News issue 2/2, page 24.

Andy
Dear Luis,
you could change my previous reply to something like
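    lapply(dataNames, function(theName) {
      # paste0() avoids the space that paste() would put between "X" and the number
      response <- paste0("X", ncol(get(theName)))
      eval(parse(text = paste("rpart(", response, "~ ., data =", theName, ")")))
    })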
making use of get(), ncol() and Torsten's suggestion.
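One detail worth checking when a variable name is pasted together like this: paste() separates its arguments with a space by default, which matters when "X" and a number are meant to form a single name. A short sketch (n is just an example value):

```r
n <- 5  # hypothetical number of columns

paste("X", n)            # "X 5"  (space inserted; not a valid variable name)
paste("X", n, sep = "")  # "X5"
paste0("X", n)           # "X5"
```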
HTH
Thomas
It worked! thank you
-----Original Message-----
From: Torsten Hothorn [mailto:hothorn at ci.tuwien.ac.at]
Sent: Tue 22/07/2003 15:16
To: Luis Miguel Almeida da Silva
Cc: r-help at stat.math.ethz.ch
Subject: RE: [R] variable names
On Tue, 22 Jul 2003, Luis Miguel Almeida da Silva wrote:
> But in the next step of the loop I can have only 35 variables and the
> class labels would be at the Ncp.36. So I have to refresh the formula in
> rpart... and that is my problem
>
R> df.treino <- data.frame(matrix(rnorm(25), ncol=5))
R> thisformula <- as.formula(paste(colnames(df.treino)[ncol(df.treino)],
"~ ."))
R> thisformula
X5 ~ .
R> rpart(thisformula, data = df.treino)
n= 5
node), split, n, deviance, yval
* denotes terminal node
1) root 5 3.032904 -0.3392065 *
Torsten
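The same idea can be wrapped in a small helper and applied to datasets with different numbers of columns. This is only a sketch: lastColFormula and makeData are hypothetical names, and the data are random stand-ins with the class labels placed in the last column.

```r
library(rpart)

# Build "<last column> ~ ." for any data frame (helper name is hypothetical)
lastColFormula <- function(d) {
  as.formula(paste(colnames(d)[ncol(d)], "~ ."))
}

set.seed(1)
# Hypothetical datasets with p columns; the class labels are in the last one
makeData <- function(p) {
  d <- data.frame(matrix(rnorm(20 * (p - 1)), ncol = p - 1),
                  factor(rep(c("a", "b"), 10)))
  colnames(d) <- paste("Ncp", seq_len(p), sep = ".")
  d
}
datasets <- list(makeData(40), makeData(36))

# The formula is rebuilt for each dataset, so nothing is typed by hand
fits <- lapply(datasets, function(d) rpart(lastColFormula(d), data = d))
```

For the second dataset the helper yields the formula Ncp.36 ~ ., which matches what would otherwise be typed by hand.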