Paul Smith
2022-Mar-04 10:41 UTC
[R] Looking for package for data generation for classification and regression
On Fri, Mar 4, 2022 at 8:07 AM Ranjan Maitra <mlmaitra at gmx.com> wrote:
>
> > I am in need of generating artificial data for machine learning
> > classification and regression analysis. What I am looking for is
> > something similar to Python sklearn.datasets.make_classification and
> > sklearn.datasets.make_regression:
> >
> > https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_classification.html
> >
> > https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_regression.html
> >
> > I have searched CRAN for something similar, but found nothing. Could
> > someone please help me with this?
>
> Not sure if this helps, but at least for classification and clustering, there is the MixSim package on CRAN which provides classification datasets according to an overall overlap measure.

Thanks, Ranjan, that is also quite helpful, since clustering is also a
topic of the course!

Paul
Ranjan Maitra
2022-Mar-04 17:03 UTC
[R] Looking for package for data generation for classification and regression
On Fri Mar04'22 10:41:24AM, Paul Smith wrote:
> From: Paul Smith <phhs80 at gmail.com>
> Date: Fri, 4 Mar 2022 10:41:24 +0000
> To: Ranjan Maitra <mlmaitra at gmx.com>
> Cc: "r-help at r-project.org" <r-help at r-project.org>
> Subject: Re: [R] Looking for package for data generation for
>  classification and regression
>
> On Fri, Mar 4, 2022 at 8:07 AM Ranjan Maitra <mlmaitra at gmx.com> wrote:
> >
> > > I am in need of generating artificial data for machine learning
> > > classification and regression analysis. What I am looking for is
> > > something similar to Python sklearn.datasets.make_classification and
> > > sklearn.datasets.make_regression:
> > >
> > > https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_classification.html
> > >
> > > https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_regression.html
> > >
> > > I have searched CRAN for something similar, but found nothing. Could
> > > someone please help me with this?
> >
> > Not sure if this helps, but at least for classification and clustering, there is the MixSim package on CRAN which provides classification datasets according to an overall overlap measure.
>
> Thanks, Ranjan, that is also quite helpful, since clustering is also a
> topic of the course!
>
> Paul
>

The Clustering Algorithms Referee Package (CARP) uses the same codebase but is more general.

https://jmlr.org/papers/v12/melnykov11a.html

Unfortunately, it is written in C, so it may not help. It is on www.mloss.org at:

https://mloss.org/software/view/248/

but perhaps should also be moved to GitHub.

Best wishes,
Ranjan