In addition to 'sample', and if you insist on dplyr, you can use 'sample_n'.

Best,
Ulrik

On Thu, 8 Dec 2016 at 18:47 Bert Gunter <bgunter.4567 at gmail.com> wrote:

> Usually we expect posters to do their homework by reading the necessary R
> documentation and relevant subject-matter resources (e.g. on
> clustering) and making a serious attempt to solve the problem by
> offering their code to us as part of a reproducible example of
> how it failed. You have done none of these things, and so you may not
> receive a helpful reply -- or maybe some kind soul will offer one.
>
> I am not such a kind soul. However, I will tell you that ?sample is
> probably relevant and that you should read and follow the posting
> guide at the foot of this email to post a coherent query, which, IMO,
> yours is not.
>
> Cheers,
> Bert
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
> On Thu, Dec 8, 2016 at 8:57 AM, Partha Sinha <pnsinha68 at gmail.com> wrote:
> > I want to create two files, train and test, using dplyr (by random
> > sampling). How do I do the same using, let's say, the iris data?
> > Regards
> > Parth
> >
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
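Since the original question asks for two files, here is a minimal base-R sketch that uses sample() as Bert suggests. The 70/30 split, the seed, the file names train.csv and test.csv, and writing them to tempdir() are all illustrative assumptions, not something anyone in the thread specified.

```r
# Sketch: random 70/30 split of iris, written to two CSV files.
# Split ratio, seed and file names are illustrative assumptions.
set.seed(42)                                  # make the random split reproducible
n_train   <- round(0.7 * nrow(iris))          # 105 of the 150 rows
train_idx <- sample(nrow(iris), n_train)      # sample() draws without replacement by default
train <- iris[train_idx, ]                    # rows drawn for training
test  <- iris[-train_idx, ]                   # every row not drawn

out_dir <- tempdir()                          # harmless location for the example
write.csv(train, file.path(out_dir, "train.csv"), row.names = FALSE)
write.csv(test,  file.path(out_dir, "test.csv"),  row.names = FALSE)
```

Negative indexing with the same index vector guarantees the two parts are disjoint and together cover every row.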
How do I get two non-overlapping sets of data?

Regards
Parth

On 8 December 2016 at 23:23, Ulrik Stervbo <ulrik.stervbo at gmail.com> wrote:
> In addition to 'sample', and if you insist on dplyr, you can use
> 'sample_n'.
set.seed(1)  # make the random draw reproducible
df <- data.frame(x = 1:12, y = rnorm(12))

If you use sample:

RowIndex <- sample(nrow(df), 5)  # 5 distinct row numbers
TrainSet <- df[RowIndex, ]
TestSet <- df[-RowIndex, ]       # all remaining rows

Or with dplyr:

library(dplyr)
TrainSet <- sample_n(df, 5)
TestSet <- anti_join(df, TrainSet, by = c("x", "y"))  # rows of df not in TrainSet

HTH
Ulrik

On Fri, 9 Dec 2016, 06:56 Partha Sinha, <pnsinha68 at gmail.com> wrote:
> How do I get two non-overlapping sets of data?
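Joining on all columns, as in the dplyr variant above, can misfire when the data contain duplicated rows (iris does): anti_join() then removes every copy of a row that was sampled into the training set, so the test set comes out short. A sketch that avoids this, assuming dplyr >= 1.0.0 (where slice_sample() supersedes sample_n()) and a hypothetical helper column row_id, joins on a unique per-row id instead.

```r
library(dplyr)

set.seed(2016)                                        # arbitrary seed, for reproducibility
iris_id <- iris %>% mutate(row_id = row_number())     # hypothetical unique id per row
train <- iris_id %>% slice_sample(n = 100)            # 100 rows drawn without replacement
test  <- iris_id %>% anti_join(train, by = "row_id")  # every row whose id was not drawn

# Drop the helper column afterwards if it is not wanted:
# train <- select(train, -row_id); test <- select(test, -row_id)
```

Because the join key is unique, the two sets are guaranteed to be disjoint and to add up to all 150 rows.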
Sample without replacement and then split that sample into train and test components.

Jim

On Fri, Dec 9, 2016 at 4:55 PM, Partha Sinha <pnsinha68 at gmail.com> wrote:
> How do I get two non-overlapping sets of data?
> Regards
> Parth
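Jim's suggestion can be read as: draw a random permutation of all row numbers (sampling without replacement), then cut it in two. A minimal sketch on iris follows; the seed and the training-set size of 100 are illustrative assumptions.

```r
# Sketch: shuffle all row indices once, then split the shuffled vector.
set.seed(123)                                 # arbitrary seed, for reproducibility
shuffled <- sample(nrow(iris))                # random permutation of 1..150, no replacement
n_train  <- 100                               # illustrative training-set size
train <- iris[shuffled[seq_len(n_train)], ]   # first 100 shuffled rows
test  <- iris[shuffled[-seq_len(n_train)], ]  # the remaining 50 rows
```

Since each row number appears exactly once in the permutation, no row can land in both sets.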