This may be a simple problem, but I am looking to select a subset of rows from a dataframe that has the same distribution parameters as the rows of another dataframe.

e.g. I have a 500-row dataframe with 20 columns. I want to select a subset of rows from a larger dataframe that matches the distribution of values for one or more of the columns within the 500-row dataframe (i.e. within the same range, but also having the same mean/median and overall shape).

By basic subsetting I can get a set whose distribution is approximately similar to that of the 500-row dataset, but not closely matched, and this might be a problem for the analysis. Any help would be much appreciated, thanks.

--
View this message in context: http://www.nabble.com/Select-subset-with-specific-distribution-parameters.-tp24848201p24848201.html
Sent from the R help mailing list archive at Nabble.com.
stephen sefick
2009-Aug-09 12:13 UTC
[R] Select subset with specific distribution parameters.
Could you provide a reproducible example?

On Thu, Aug 6, 2009 at 10:59 AM, sedm1000 wrote:
> This may be a simple problem, but I am looking to select a subset of rows
> from a dataframe that will have the same parameters as all the rows in
> another dataframe.
> [...]

--
Stephen Sefick
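A reproducible example of the kind requested does not need the real data; simulated data frames are enough. The sketch below is one possible approach, not something proposed in the thread: it bins the matching column of the small data frame by its own quantiles and draws the same number of rows per bin from the larger one. The names `ref`, `pool`, `x`, and the choice of 20 bins are all hypothetical.

```r
set.seed(1)  # fixed seed so the example is reproducible

# Hypothetical stand-ins for the two data frames in the question
ref  <- data.frame(x = rnorm(500, mean = 10, sd = 2))    # the 500-row data frame
pool <- data.frame(x = runif(20000, min = 0, max = 20))  # the larger data frame

# Bin the reference column by its own quantiles (20 bins, an arbitrary choice),
# then assign pool rows to the same bins
breaks   <- quantile(ref$x, probs = seq(0, 1, by = 0.05))
ref_bin  <- cut(ref$x,  breaks = breaks, include.lowest = TRUE)
pool_bin <- cut(pool$x, breaks = breaks, include.lowest = TRUE)  # NA outside the reference range

# For each bin, draw as many pool rows as the reference has in that bin
picked <- unlist(lapply(levels(ref_bin), function(b) {
  n_needed   <- sum(ref_bin == b)
  candidates <- which(pool_bin == b)            # which() drops the NA entries
  stopifnot(length(candidates) >= n_needed)     # stops if the pool is too sparse in a bin
  candidates[sample.int(length(candidates), n_needed)]
}))
matched <- pool[picked, , drop = FALSE]

# Compare the matched subset with the reference
summary(ref$x)
summary(matched$x)
ks.test(ref$x, matched$x)
```

Narrower bins give a closer match in shape but need more candidate rows per bin in the larger data frame. Matching on several columns at once would need a different scheme, such as binning on combinations of columns.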
I'm sorry, an example would have needed a lot of data, so I skipped it. I've settled for taking random sets from within a defined range to simulate the potential distributions, and calculating a rough significance value from that... Thanks for your help.
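The post does not say which statistic was compared or how the significance value was computed, so the following is only a rough sketch of what such a resampling approach could look like. It assumes simulated data, a hypothetical outcome column `y`, the mean as the statistic of interest, and a simple two-sided empirical p-value.

```r
set.seed(42)  # fixed seed so the example is reproducible

# Hypothetical data: x is the matching variable, y is the quantity being analysed
ref  <- data.frame(x = rnorm(500, 10, 2),    y = rnorm(500, 0.2, 1))
pool <- data.frame(x = runif(20000, 0, 20),  y = rnorm(20000, 0, 1))

# Keep only pool rows whose x lies within the range seen in the reference set
in_range <- pool[pool$x >= min(ref$x) & pool$x <= max(ref$x), ]

# Draw many random subsets of the same size and record the statistic of interest
n_sim <- 1000
sim_means <- replicate(
  n_sim,
  mean(in_range$y[sample.int(nrow(in_range), nrow(ref))])
)

# Rough two-sided empirical p-value: the share of random subsets whose mean is
# at least as far from the centre of the simulated means as the observed mean
obs     <- mean(ref$y)
centre  <- mean(sim_means)
p_value <- (sum(abs(sim_means - centre) >= abs(obs - centre)) + 1) / (n_sim + 1)
p_value
```

Restricting by range alone does not reproduce the shape of the reference distribution, so a significance value obtained this way is approximate at best.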