Frank Duan
2006-Jan-04 03:23 UTC
[R] Looking for packages to do Feature Selection and Classification
Hi All,

Sorry if this is a repost (a quick browse didn't give me the answer).

I wonder if there are packages that can do feature selection and classification at the same time. For instance, I am using SVM to classify my samples, but the model easily overfits when all of the features are used. Thus, it is necessary to select "good" features to build an optimum hyperplane (?). Here is a simple example: suppose I have 100 "useful" features and 100 "useless" (noise) features. I want the SVM to give me the same results when 1) using only the 100 useful features or 2) using all 200 features.

Any suggestions, or could someone point me to a reference?

Thanks in advance!

Frank
Diaz.Ramon
2006-Jan-04 08:30 UTC
[R] Looking for packages to do Feature Selection and Classification
Dear Frank,

I expect you'll get many different answers, since a wide variety of approaches have been suggested. So I'll stick to self-advertisement: I've written an R package, varSelRF (available from CRAN), that uses random forest together with a simple variable selection approach and also provides bootstrap estimates of the error rate of the procedure. Andy Liaw and collaborators previously developed and published a somewhat similar procedure. You probably also want to take a look at several packages available from Bioconductor.

Best,

R.

-- 
Ramón Díaz-Uriarte
Bioinformatics Unit
Centro Nacional de Investigaciones Oncológicas (CNIO)
(Spanish National Cancer Center)
Melchor Fernández Almagro, 3
28029 Madrid (Spain)
Fax: +-34-91-224-6972
Phone: +-34-91-224-6900

http://ligarto.org/rdiaz
PGP KeyID: 0xE89B3462
(http://ligarto.org/rdiaz/0xE89B3462.asc)
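The general recipe Ramón describes (rank the variables, repeatedly discard the least important ones, and estimate the error of the procedure) can be tried on Frank's 100-useful/100-noise example. What follows is a minimal Python sketch under explicit assumptions, not varSelRF itself: a t-like score (absolute difference of class means over the pooled standard deviation) stands in for random-forest importance, a nearest-centroid classifier stands in for the SVM or forest, and the 20% drop fraction, the mean shift of 2.0, the selection rule and the single held-out test set are illustrative choices. All function names are hypothetical.

```python
import random


def make_data(n_per_class, n_useful, n_noise, shift, seed):
    """Two-class synthetic data: the first n_useful columns have class means
    0 and `shift`; the remaining n_noise columns are pure N(0, 1) noise."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            useful = [rng.gauss(shift * label, 1.0) for _ in range(n_useful)]
            noise = [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
            X.append(useful + noise)
            y.append(label)
    return X, y


def _mean(v):
    return sum(v) / len(v)


def _var(v):
    m = _mean(v)
    return sum((x - m) ** 2 for x in v) / (len(v) - 1)


def score(X, y, j):
    """|difference in class means| / pooled SD for column j.
    Assumption: a stand-in for random-forest variable importance."""
    a = [row[j] for row, lab in zip(X, y) if lab == 0]
    b = [row[j] for row, lab in zip(X, y) if lab == 1]
    pooled_sd = ((_var(a) + _var(b)) / 2) ** 0.5
    return abs(_mean(a) - _mean(b)) / pooled_sd


def centroid_error(Xtr, ytr, Xte, yte, features):
    """Held-out error of a nearest-centroid classifier that only looks at
    `features`. Assumption: a stand-in for the SVM or random forest."""
    centroids = {}
    for lab in (0, 1):
        rows = [r for r, l in zip(Xtr, ytr) if l == lab]
        centroids[lab] = [_mean([r[j] for r in rows]) for j in features]
    wrong = 0
    for row, lab in zip(Xte, yte):
        dist = {
            c: sum((row[j] - m) ** 2 for j, m in zip(features, cent))
            for c, cent in centroids.items()
        }
        if min(dist, key=dist.get) != lab:
            wrong += 1
    return wrong / len(yte)


def backward_elimination(Xtr, ytr, Xte, yte, drop_frac=0.2, min_vars=2):
    """Score features on the training data, repeatedly drop the `drop_frac`
    lowest-scoring ones, and record (features, held-out error) at each step."""
    features = list(range(len(Xtr[0])))
    path = []
    while True:
        path.append((features, centroid_error(Xtr, ytr, Xte, yte, features)))
        if len(features) <= min_vars:
            return path
        ranked = sorted(features, key=lambda j: score(Xtr, ytr, j), reverse=True)
        features = ranked[: max(min_vars, int(len(ranked) * (1 - drop_frac)))]


def smallest_good_subset(path, tol=0.0):
    """Hypothetical rule: smallest recorded subset whose error is within
    `tol` of the best recorded error."""
    best = min(err for _, err in path)
    return min((f for f, err in path if err <= best + tol), key=len)


if __name__ == "__main__":
    # Frank's example: columns 0-99 are useful, columns 100-199 are noise.
    Xtr, ytr = make_data(50, 100, 100, shift=2.0, seed=1)
    Xte, yte = make_data(50, 100, 100, shift=2.0, seed=2)
    path = backward_elimination(Xtr, ytr, Xte, yte)
    for feats, err in path:
        print(f"{len(feats):4d} features, held-out error {err:.2f}")
    chosen = smallest_good_subset(path)
    print("chosen:", len(chosen), "features, noise among them:",
          sum(j >= 100 for j in chosen))
```

With this much signal the noise columns score near zero, so they should be among the first to be discarded. Note that the same held-out set is used both to pick the subset and to report its error, so that error is optimistic; estimating the error rate of the whole selection procedure, as the bootstrap estimates Ramón mentions do, avoids this.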
Diaz.Ramon
2006-Jan-06 01:18 UTC
[R] Looking for packages to do Feature Selection and Classification
Thanks for the reference; it looks very interesting.

Best,

R.

-----Original Message-----
From: Weiwei Shi [mailto:helprhelp at gmail.com]
Sent: Thu 1/5/2006 9:01 PM
To: Diaz.Ramon
Cc: Frank Duan; r-help
Subject: Re: [R] Looking for packages to do Feature Selection and Classification

FYI: check the following paper on SVM (using libsvm) as well as random forest in the context of feature selection.
http://www.csie.ntu.edu.tw/~cjlin/papers/features.pdf

HTH

-- 
Weiwei Shi, Ph.D
"Did you always know?" "No, I did not. But I believed..." ---Matrix III