Paul Smith
2022-Mar-03 21:00 UTC
[R] Looking for package for data generation for classification and regression
Dear All,

I need to generate artificial data for machine learning classification and regression analysis. I am looking for something similar to Python's sklearn.datasets.make_classification and sklearn.datasets.make_regression:

https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_classification.html

https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_regression.html

I have searched CRAN for something similar, but found nothing. Could someone please help me with this?

Thanks in advance,

Paul
Tom Woolman
2022-Mar-03 21:04 UTC
[R] Looking for package for data generation for classification and regression
Hi Paul,

Have you considered just going onto Kaggle and GitHub and searching for some of the many freely available real datasets that are posted there? I'm seeing a lot of productivity these days in research focused on data generation, and not just on creating algorithms and predictive models. Which is a good thing for us ;)

One of the research papers I'm working on now is based on mining a dataset I discovered on Kaggle a few months back and trying to create a novel solution for it. Proper credit will of course be given to the data provider in the citation references.

Thanks,

Tom
Ranjan Maitra
2022-Mar-04 05:00 UTC
[R] Looking for package for data generation for classification and regression
Not sure if this helps, but at least for classification and clustering, there is the MixSim package on CRAN, which generates classification datasets according to a specified overall overlap measure.

Hope this helps!

Best wishes,

Ranjan
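
For readers who want to try the MixSim suggestion, here is a minimal sketch of how it might be used to produce a labelled classification dataset. It assumes MixSim is installed from CRAN; the seed, the number of classes (3), the number of features (4), the overlap value (0.05), the sample size (300), and the column names are arbitrary illustrative choices, not values from this thread.

```r
# Sketch only: requires the MixSim package from CRAN.
# install.packages("MixSim")
library(MixSim)

set.seed(42)  # arbitrary seed, for reproducibility

# Draw the parameters of a 3-component Gaussian mixture in 4 dimensions
# whose average pairwise overlap is 0.05 (smaller = better separated).
Q <- MixSim(BarOmega = 0.05, K = 3, p = 4)

# Simulate 300 labelled observations from that mixture.
A <- simdataset(n = 300, Pi = Q$Pi, Mu = Q$Mu, S = Q$S)

# Assemble a data frame: features x1..x4 plus a class factor.
X <- A$X
colnames(X) <- paste0("x", seq_len(ncol(X)))  # hypothetical column names
dat <- data.frame(X, class = factor(A$id))

str(dat)
```

Raising BarOmega should make the classes overlap more and the classification task harder, which plays a role loosely comparable to the class_sep argument of make_classification. This covers only the classification half of the question; MixSim is not a regression data generator.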