thr3ads.net - R help - [R] A programming question - is what I want to do possible in R? [Sep 2009]

ewaters

2009-Sep-28 02:45 UTC

[R] A programming question - is what I want to do possible in R?

I have a large data frame, 77 rows, with 10 columns. Each row represents a
unique individual with 10 characteristics, some of which are categorical
factors and some continuous numeric variables. Each of the ten variables is
important (the 10 columns obviously correspond to the individuals of
interest). Importantly, this data set represents a population (not sample)
of people with a certain medical condition.

What I want to do is to select 2000 random samples of between 2 and 24
individuals, preserving all the information.

I can easily write loops that will sample from 1:77 between 2 and 24 times.
What I really want to know is whether there is any way to easily link the
output of loops like that to the data set, so I don't have to trawl through
and do it manually 2000 times.

Any advice on whether I should even attempt that in R, or try some sort of
hash table in C or somewhere, would be appreciated.

jim holtman

2009-Sep-28 12:02 UTC

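nTimes <- 15  # number of rows to draw in each sample
randomSamples <- lapply(1:2000, function(i) {
    # i is just the sample counter; lapply passes it in, so the function must accept it
    largeDF[sample(nrow(largeDF), nTimes), ]
})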

This will create a list of 2000 data frames, one per random sample, each
keeping all of the columns.
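
The original question asked for samples of between 2 and 24 individuals rather than a fixed size. A minimal sketch of one way to do that follows. It assumes the size of each sample should itself be drawn uniformly from 2:24, which the question does not specify, and it builds a hypothetical stand-in for largeDF (the columns id, group and score are invented for illustration).

```r
# Hypothetical stand-in for the 77-row data frame described in the question
set.seed(1)
largeDF <- data.frame(
    id    = 1:77,
    group = factor(sample(c("a", "b"), 77, replace = TRUE)),
    score = rnorm(77)
)

nSamples <- 2000

# Assumption: each sample's size is drawn uniformly from 2..24
sizes <- sample(2:24, nSamples, replace = TRUE)

# One data frame per sample; rows are drawn without replacement within a sample
randomSamples <- lapply(sizes, function(n) {
    largeDF[sample(nrow(largeDF), n), , drop = FALSE]
})
```

Each element of randomSamples is then an ordinary data frame, so something like sapply(randomSamples, function(d) mean(d$score)) would give one summary value per sample.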

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

David Winsemius

2009-Sep-28 12:06 UTC

On Sep 27, 2009, at 10:45 PM, ewaters wrote:
>
> I have a large data frame, 77 rows, with 10 columns. Each row represents a
> unique individual with 10 characteristics, some of which are categorical
> factors and some continuous numeric variables.
Most of us would consider that a small dataframe, unless of course we  
entered the values by hand as you may have.
> Each of the ten variables is
> important (the 10 columns obviously correspond to the individuals of
> interest). Importantly, this data set represents a population (not sample)
> of people with a certain medical condition.
You have the world's enumeration of persons with condition X? Besides  
that obvious objection, if you really thought you had an entire  
population, there would be little point in doing statistics through  
random sampling.
>
> What I want to do is to select 2000 random samples of between 2 and 24
> individuals, preserving all the information.
>
> I can easily write loops that will sample from 1:77 2 - 24 times, what I
> really want to know is there any way to easily link the output of loops like
> that to the data set so I don't have to trawl through and do it manually
> 2000 times?
If you have a vector, vec, of any length that represents a sample from
1:77, and your dataframe is df1, then you can use that index vector to
extract a group thusly:

df1[vec, ]

Example:

 > set.seed(97)
 > df1 <- data.frame(casenum=1:77, ht=rnorm(77, 56, 7), wt=rnorm(77, 160, 30))
 > df1[c(17, 29, 36, 55, 72), ]
    casenum       ht       wt
17      17 62.68708 110.0956
29      29 69.97378 124.9440
36      36 49.97847 101.8919
55      55 57.46707 169.7421
72      72 52.25796 118.2071
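
To connect this to the loop in the question, one option is to keep the index vectors themselves and apply df1[vec, ] to each of them, so it is always possible to trace which individuals went into which sample. The following is only a sketch under assumptions: the names idxList, groups and meanHt, and the choice of 5 samples of 10 rows, are illustrative and not from the original post.

```r
set.seed(97)
df1 <- data.frame(casenum = 1:77, ht = rnorm(77, 56, 7), wt = rnorm(77, 160, 30))

# Draw the index vectors first and keep them (5 samples of 10 rows, for illustration)
idxList <- replicate(5, sample(nrow(df1), 10), simplify = FALSE)

# Use each index vector to pull the matching rows, with all columns preserved
groups <- lapply(idxList, function(vec) df1[vec, ])

# Example of a per-sample summary computed from the extracted rows
meanHt <- sapply(groups, function(g) mean(g$ht))
```

Because casenum runs 1:77 in this made-up data, groups[[k]]$casenum matches idxList[[k]], which is a quick way to check that the rows were linked correctly.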
>
> Any advice on whether I should even attempt that in R, or try some sort of
> hash table in C or somewhere, would be appreciated.

-- 

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
