Dear helpers,

I want to use rpart several times in a loop to build a classification tree. My problem is that rpart needs a formula as its argument, so the variables need to have names, and in my case they don't. Every iteration of the loop has a different dataset with many variables (e.g. 38 or more), so I can't type the names by hand every time. Is there a function that generates names for the variables in a data frame? If so, how can I then use the argument

    rpart(classlabels~. ,.....)

Thanks
On Tue, 22 Jul 2003, Luis Miguel Almeida da Silva wrote:

> Dear helpers
>
> I want to use rpart several times in a loop to build a classification
> tree. My problem is that rpart needs a formula as argument and for that
> the variables need to have names and this doesn't happen in my case.
> Every iteration in the loop has a different dataset with several
> variables (ex. 38 or more) and so I can't type the names by hand every
> time. Is there any function that generates names for variables in a
> dataframe. If so, how can I use then the argument
>

If your data is organised in a data.frame, (dummy) variable names are available by default:

R> mydata <- data.frame(matrix(rnorm(25), ncol=5))
R> mydata
          X1          X2         X3         X4          X5
1  1.3806313 -0.41827136  0.9591628 -1.3351038  0.02746110
2  0.5114590 -1.34111439 -0.9617552 -0.8367088 -0.06913021
3 -1.7508089 -0.49387076 -1.7597395  2.3899490 -0.15209650
4 -1.6753809 -1.28381808 -1.0424903  0.1002998  0.27784949
5 -0.2605535 -0.09035652 -2.5786418  1.0483400 -0.70445615
R> rpart(X1 ~ ., data = mydata)
n= 5

node), split, n, deviance, yval
      * denotes terminal node

1) root 5 7.463698 -0.3589306 *

best,

Torsten

> rpart(classlabels~. ,.....)
>
> thanks
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
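As a small illustration of the point about default names, here is a minimal sketch in base R. The matrix m and the prefix "V" are made up for the example and do not come from the thread; the sketch only shows that data.frame() names the columns of an unnamed matrix X1, X2, ... and that other names can be generated from the number of columns.

```r
# Sketch only: the matrix `m` and the prefix "V" are invented for illustration.
set.seed(1)                        # make the random numbers reproducible
m <- matrix(rnorm(25), ncol = 5)   # a matrix without column names

mydata <- data.frame(m)            # data.frame() supplies the names X1, ..., X5
default_names <- names(mydata)
print(default_names)

# If other names are wanted, generate them from the number of columns.
names(mydata) <- paste0("V", seq_len(ncol(mydata)))
print(names(mydata))
```

Either set of names can then be used in a formula such as X5 ~ . or V5 ~ .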
Dear Luis,

You might want to have a look at "Bill Venables. Programmer's niche. R News, 2(2):24-26, June 2002", which you can find at http://cran.r-project.org/doc/Rnews/, or look into the manual "R Language Definition", chapter "Computing on the language".

Assuming that in your case the variable to be classified is called classLabels, and the names of your datasets are in a vector called dataNames, you could use something like

    lapply(dataNames, function(theName) {
      eval(parse(text = paste("rpart(classLabels ~ ., data =", theName, ")")))
    })

which will return a list of results. This is achieved by generating the command line you would type as a string using paste, which is then parsed and explicitly evaluated. See the corresponding help pages for more details.

HTH

Thomas

---
Thomas Hotz
Research Associate in Medical Statistics
University of Leicester
United Kingdom

Department of Epidemiology and Public Health
22-28 Princess Road West
Leicester LE1 6TP
Tel +44 116 252-5410
Fax +44 116 252-5423

Division of Medicine for the Elderly
Department of Medicine
The Glenfield Hospital
Leicester LE3 9QP
Tel +44 116 256-3643
Fax +44 116 232-2976

> -----Original Message-----
> From: Luis Miguel Almeida da Silva [mailto:lsilva at fc.up.pt]
> Sent: 22 July 2003 14:49
> To: r-help at stat.math.ethz.ch
> Subject: [R] variable names
>
> Dear helpers
>
> I want to use rpart several times in a loop to build a
> classification tree. My problem is that rpart needs a formula
> as argument and for that the variables need to have names and
> this doesn't happen in my case. Every iteration in the loop
> has a different dataset with several variables (ex. 38 or
> more) and so I can't type the names by hand every time. Is
> there any function that generates names for variables in a
> dataframe. If so, how can I use then the argument
>
> rpart(classlabels~. ,.....)
>
> thanks
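A note on the lapply() suggestion in Thomas's reply: if the datasets can be collected in a list of data frames rather than referred to by name, the same result might be obtained without building and parsing a command string. The sketch below rests on assumptions that are not in the thread: the helper make_data() and the list dataList are hypothetical stand-ins for the real data, and every data frame is assumed to have a factor column called classLabels.

```r
library(rpart)

# Hypothetical stand-ins for the real datasets: each data frame has a factor
# column called classLabels plus a varying number of numeric predictors.
set.seed(1)
make_data <- function(p, n = 60) {
  d <- data.frame(matrix(rnorm(n * p), ncol = p))
  d$classLabels <- factor(ifelse(d$X1 > 0, "a", "b"))
  d
}
dataList <- list(first = make_data(3), second = make_data(5))

# The same formula works for every dataset, because "." expands to all
# columns of `data` other than the response.
fits <- lapply(dataList, function(d) {
  rpart(classLabels ~ ., data = d, method = "class")
})
```

The result is again a list of fitted trees, one per dataset, carrying the names of the list elements.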
I hadn't noticed that. I had already found a way to do it:

    x <- 1:40
    colnames(df.treino) <- paste("Ncp", x, sep = ".")

This generates names that I can relate to the variables. Thanks anyway.

The problem is that I use rpart in a loop and the class labels are in the last column. For the above example I would "type"

    rpart(Ncp.40 ~ ., data = df.treino)

But in the next step of the loop I may have only 35 variables, and the class labels would then be in Ncp.36. So I have to update the formula in rpart at every step, and that is my problem.

-----Original Message-----
From: Torsten Hothorn [mailto:hothorn at ci.tuwien.ac.at]
Sent: Tue 22/07/2003 14:57
To: Luis Miguel Almeida da Silva
Cc: r-help at stat.math.ethz.ch
Subject: Re: [R] variable names

> If your data is organised in a data.frame, (dummy) variable names are
> available by default:
Read Bill Venables' column in R News issue 2/2, page 24.

Andy

> From: Luis Miguel Almeida da Silva [mailto:lsilva at fc.up.pt]
>
> The problem is that I use rpart in a loop and the class
> labels are in the last column. For the above example I would "type"
>
> rpart(Ncp.40~.,data=df.treino)
>
> But in the next step of the loop I can have only 35 variables
> and the class labels would be at the Ncp.36. So I have to
> refresh the formula in rpart... and that is my problem
Dear Luis,

you could change my previous reply to something like

    lapply(dataNames, function(theName) {
      # sep = "" so that "X" and the column count join into one name, e.g. X5
      eval(parse(text = paste("rpart(X", ncol(get(theName)),
                              " ~ ., data = ", theName, ")", sep = "")))
    })

making use of get(), ncol() and Torsten's suggestion (the default column names X1, X2, ...).

HTH

Thomas

> -----Original Message-----
> From: Luis Miguel Almeida da Silva [mailto:lsilva at fc.up.pt]
> Sent: 22 July 2003 15:11
> To: Torsten Hothorn
> Cc: r-help at stat.math.ethz.ch
> Subject: RE: [R] variable names
>
> The problem is that I use rpart in a loop and the class
> labels are in the last column. For the above example I would "type"
>
> rpart(Ncp.40~.,data=df.treino)
>
> But in the next step of the loop I can have only 35 variables
> and the class labels would be at the Ncp.36. So I have to
> refresh the formula in rpart... and that is my problem
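The get() and ncol() idea in Thomas's reply can also be combined with as.formula(), so that only the formula is built from a string rather than the whole command. The following is a sketch under stated assumptions: the data frames dfA and dfB and the helper make_data() are hypothetical, and the columns are named Ncp.1, Ncp.2, ... with the class labels in the last column, as in Luis's message.

```r
library(rpart)

# Hypothetical data: two data frames whose last column holds the class labels;
# their names are listed in dataNames, as in the reply above.
set.seed(2)
make_data <- function(p, n = 60) {
  d <- data.frame(matrix(rnorm(n * p), ncol = p))
  d[[p + 1]] <- factor(ifelse(d[[1]] > 0, "yes", "no"))
  names(d) <- paste("Ncp", seq_len(p + 1), sep = ".")
  d
}
dfA <- make_data(4)
dfB <- make_data(6)
dataNames <- c("dfA", "dfB")

fits <- lapply(dataNames, function(theName) {
  d <- get(theName)                 # look the data frame up by its name
  response <- names(d)[ncol(d)]     # name of the last column, e.g. "Ncp.5"
  rpart(as.formula(paste(response, "~ .")), data = d, method = "class")
})
names(fits) <- dataNames
```

Because the response name is read from the data frame itself, the code does not depend on how many columns each dataset has.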
It worked! Thank you.

-----Original Message-----
From: Torsten Hothorn [mailto:hothorn at ci.tuwien.ac.at]
Sent: Tue 22/07/2003 15:16
To: Luis Miguel Almeida da Silva
Cc: r-help at stat.math.ethz.ch
Subject: RE: [R] variable names

On Tue, 22 Jul 2003, Luis Miguel Almeida da Silva wrote:

> The problem is that I use rpart in a loop and the class labels are in
> the last column. For the above example I would "type"
>
> rpart(Ncp.40~.,data=df.treino)
>
> But in the next step of the loop I can have only 35 variables and the
> class labels would be at the Ncp.36. So I have to refresh the formula
> in rpart... and that is my problem
>

R> df.treino <- data.frame(matrix(rnorm(25), ncol=5))
R> thisformula <- as.formula(paste(colnames(df.treino)[ncol(df.treino)], "~ ."))
R> thisformula
X5 ~ .
R> rpart(thisformula, data = df.treino)
n= 5

node), split, n, deviance, yval
      * denotes terminal node

1) root 5 3.032904 -0.3392065 *

Torsten
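Another possibility, closer to the rpart(classlabels~. ,.....) call in the original question, is to give the last column a fixed name in every iteration so that the formula never has to change. This is only a sketch: the simulated data, the column counts 39 and 35, and the list fits are invented for illustration and are not part of the thread.

```r
library(rpart)

set.seed(3)
fits <- list()
for (p in c(39, 35)) {   # hypothetical numbers of predictors per iteration
  # Simulated stand-in for the dataset of this iteration:
  # p numeric predictors followed by the class labels in the last column.
  df.treino <- data.frame(matrix(rnorm(80 * p), ncol = p))
  df.treino[[p + 1]] <- factor(ifelse(df.treino[[1]] > 0, "a", "b"))
  names(df.treino) <- paste("Ncp", seq_len(ncol(df.treino)), sep = ".")

  # Give the last column a fixed name so the formula stays the same.
  names(df.treino)[ncol(df.treino)] <- "classlabels"
  fits[[as.character(p)]] <- rpart(classlabels ~ ., data = df.treino,
                                   method = "class")
}
```

The trade-off is that the last column loses its Ncp.n name, which may or may not matter depending on how the names are used later.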