Ranjan Maitra
2022-Mar-04 05:00 UTC
[R] Looking for package for data generation for classification and regression
On Thu Mar03'22 09:00:08PM, Paul Smith wrote:

> From: Paul Smith <phhs80 at gmail.com>
> Date: Thu, 3 Mar 2022 21:00:08 +0000
> To: "r-help at r-project.org" <r-help at r-project.org>
> Subject: [R] Looking for package for data generation for classification and regression
>
> Dear All,
>
> I am in need of generating artificial data for machine learning
> classification and regression analysis. What I am looking for is
> something similar to Python sklearn.datasets.make_classification and
> sklearn.datasets.make_regression:
>
> https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_classification.html
>
> https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_regression.html
>
> I have searched CRAN for something similar, but found nothing. Could
> someone please help me with this?

Not sure if this helps, but at least for classification and clustering, there is the MixSim package on CRAN, which provides classification datasets according to an overall overlap measure.

Hope this helps!

Best wishes,
Ranjan
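
For readers who want to try the MixSim suggestion, here is a minimal sketch of how a labelled dataset might be simulated with it. This is not from the original message: the overlap value, the number of classes and features, the sample size, the seed, and the variable names (Q, sim, dat) are all arbitrary choices made for illustration, and it assumes the MixSim package is installed from CRAN.

```r
# Illustrative sketch only; all parameter values below are assumptions.
library(MixSim)

set.seed(1234)  # arbitrary seed, for reproducibility

# Build a Gaussian mixture with K = 3 components in p = 4 dimensions
# whose average pairwise overlap is BarOmega = 0.05.
Q <- MixSim(BarOmega = 0.05, K = 3, p = 4)

# Draw n = 300 observations from that mixture.
# sim$X holds the features, sim$id the component (class) labels.
sim <- simdataset(n = 300, Pi = Q$Pi, Mu = Q$Mu, S = Q$S)

# Assemble a data frame ready for a classifier or a clustering method.
dat <- data.frame(sim$X, class = factor(sim$id))

str(dat)
table(dat$class)
```

A smaller BarOmega should give better-separated classes and a larger one a harder problem, which is roughly the role that class separation plays in sklearn's make_classification. Note that MixSim targets classification and clustering; it does not cover the make_regression side of the question.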
Paul Smith
2022-Mar-04 10:41 UTC
[R] Looking for package for data generation for classification and regression
On Fri, Mar 4, 2022 at 8:07 AM Ranjan Maitra <mlmaitra at gmx.com> wrote:

> > I am in need of generating artificial data for machine learning
> > classification and regression analysis. What I am looking for is
> > something similar to Python sklearn.datasets.make_classification and
> > sklearn.datasets.make_regression:
> >
> > https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_classification.html
> >
> > https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_regression.html
> >
> > I have searched CRAN for something similar, but found nothing. Could
> > someone please help me with this?
>
> Not sure if this helps, but at least for classification and clustering, there is the MixSim package on CRAN which provides classification datasets according to an overall overlap measure.

Thanks, Ranjan, that is also quite helpful, since clustering is also a topic of the course!

Paul