thr3ads.net - R help - [R] sample train and test data using dplyr [Dec 2016]

Ulrik Stervbo

2016-Dec-08 17:53 UTC

[R] sample train and test data using dplyr

In addition to 'sample', and if you insist on dplyr, you can use
'sample_n'.

Best,
Ulrik

On Thu, 8 Dec 2016 at 18:47 Bert Gunter <bgunter.4567 at gmail.com> wrote:
> Usually we expect posters to do their homework by reading necessary R
> documentation and relevant subject matter resources (e.g. on
> clustering) and making a serious attempt to solve the problem by
> offering their code to us as part of a reproducible example of
> how it failed. You have done none of these things, and so you may not
> receive a helpful reply -- or maybe some kind soul will offer one.
>
> I am not such a kind soul. However I will tell you that ?sample is
> probably relevant and that you should read and follow the posting
> guide at the foot of this email to post a coherent query, which, IMO,
> yours is not.
>
> Cheers,
> Bert
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
>
>
> On Thu, Dec 8, 2016 at 8:57 AM, Partha Sinha <pnsinha68 at gmail.com> wrote:
> > I want to create two files, train and test, using dplyr (by random
> > sampling). How can I do this using, let's say, the iris data?
> > Regards
> > Parth
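A minimal sketch of Ulrik's `sample_n` suggestion applied to the iris data Parth asked about, assuming dplyr is installed. The `row_id` column, the seed and the 70/30 split are illustrative choices, not part of the original posts. Joining on a unique id rather than on all columns matters for iris because it contains duplicated rows (`anyDuplicated(iris)` is non-zero): an `anti_join()` on all columns would drop both copies of such a row from the test set when only one copy was sampled into the training set. In dplyr 1.0.0 and later, `slice_sample(n = ...)` supersedes `sample_n()`.

```r
library(dplyr)

set.seed(1)  # arbitrary seed so the split is reproducible

# Add a unique row id (hypothetical column name); iris has duplicated
# rows, so rows cannot be identified by their values alone.
iris_id <- mutate(iris, row_id = row_number())

n_train <- round(0.7 * nrow(iris_id))  # 70% of 150 rows = 105

train <- sample_n(iris_id, n_train)                # random rows, no replacement
test  <- anti_join(iris_id, train, by = "row_id")  # every row not in train

# To write the two sets to files (file names are hypothetical):
# write.csv(select(train, -row_id), "train.csv", row.names = FALSE)
# write.csv(select(test, -row_id), "test.csv", row.names = FALSE)
```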

Partha Sinha

2016-Dec-09 05:55 UTC

How do I get two non-overlapping sets of data?
Regards
Parth

Ulrik Stervbo

2016-Dec-09 06:42 UTC

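library(dplyr)

set.seed(42)  # fix the seed so the random split is reproducible
df <- data.frame(x = 1:12, y = rnorm(12))

If you use sample:

RowIndex <- sample(nrow(df), 5)  # 5 distinct random row numbers
TrainSet <- df[RowIndex, ]
TestSet <- df[-RowIndex, ]       # every row not picked for training

Or with dplyr:

TrainSet <- sample_n(df, 5)
TestSet <- anti_join(df, TrainSet, by = "x")  # rows of df not in TrainSet; x is a unique id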

HTH
Ulrik

On Fri, 9 Dec 2016, 06:56 Partha Sinha, <pnsinha68 at gmail.com> wrote:
> How to get two sets of non overlapping data?
> Regards
> Parth

Jim Lemon

2016-Dec-10 06:33 UTC

Sample without replacement and then split that sample into train and
test components.

Jim

On Fri, Dec 9, 2016 at 4:55 PM, Partha Sinha <pnsinha68 at gmail.com> wrote:
> How to get two sets of non overlapping data?
> Regards
> Parth
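Jim's suggestion can be sketched in base R as follows, assuming the iris data from the original question; the seed and the 70/30 proportion are illustrative choices. Calling `sample()` on the row count without a size returns a random permutation of all row indices (sampling without replacement), and cutting that permutation in two guarantees the sets cannot overlap.

```r
set.seed(1)  # arbitrary seed so the split is reproducible

# Sample without replacement: a random permutation of all row indices.
perm <- sample(nrow(iris))

n_train <- round(0.7 * nrow(iris))  # 70/30 split, an illustrative choice

# Split the permutation: the first n_train indices go to train,
# the rest to test, so no row can appear in both.
train <- iris[perm[seq_len(n_train)], ]
test  <- iris[perm[-seq_len(n_train)], ]
```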
