Greetings:

I am looking for some help (probably really basic) with looping. What I want to do is repeatedly sample observations (about 100 per sample) from a large dataset (100,000 observations). I would like the samples labelled sample.1, sample.2, and so on (or some other suitably simple naming scheme). To do this manually I would

> smp.1 <- sample(100000, 100)
> sample.1 <- dataset[smp.1,]
> smp.2 <- sample(100000, 100)
> sample.2 <- dataset[smp.2,]
.
.
.
> smp.50 <- sample(100000, 100)
> sample.50 <- dataset[smp.50,]

and so on.

I tried the following loop code to generate the 50 samples:

> for (i in 1:50){
+   smp.[i] <- sample(100000, 100)
+   sample.[i] <- dataset[smp.[i],]}

Unfortunately, that does not work -- specifying the looping variable i in the way that I have fails because R uses that syntax to reference positions in a vector (x[i] would be the ith element of the vector x).

Is it possible to use the value of the looping variable in a name within the loop structure?

Cheers,
Neil Hepburn

==========================================
Neil Hepburn, Economics Instructor
Social Sciences Department,
The University of Alberta Augustana Campus
4901 - 46 Avenue
Camrose, Alberta
T4V 2R3

Phone (780) 697-1588
email nhepburn at augustana.ca
You do not say -- and I am unable to divine -- whether you wish to sample with or without replacement, each time or as a whole. In general, when you want to do this sort of thing, the fastest way is to sample everything you need at once and then form it into a list or matrix or whatever. For example, for sampling 100 each time, 200 times, without replacement:

mySamples <- matrix(sample(yourDatavector, 100*200, replace=FALSE), ncol=200)

will give you a 100 row by 200 column matrix of samples without replacement from yourDatavector. I hope that you can adapt this to suit your needs.

Bert Gunter
Nonclinical Statistics
7-7374

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Neil Hepburn
Sent: Monday, February 26, 2007 4:11 PM
To: r-help at stat.math.ethz.ch
Subject: [R] looping

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
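The one-call suggestion above is written for a data vector. For the rows of a data frame, the same idea can be applied to row indices instead. What follows is only a minimal sketch under assumptions: `dataset` here is a made-up stand-in for the real data, and the names `idx`, `samples`, `n_per_sample` and `n_samples` are illustrative, not from the thread.

```r
set.seed(1)  # only so the sketch is reproducible

# Hypothetical stand-in for the real 100,000-row dataset
dataset <- data.frame(id = 1:100000, x = runif(100000))

n_per_sample <- 100
n_samples    <- 50

# One call to sample() draws every row index needed; each column of the
# matrix holds the indices of one sample. Because replace = FALSE applies
# to the whole draw, no row can appear in more than one sample.
idx <- matrix(sample(nrow(dataset), n_per_sample * n_samples, replace = FALSE),
              ncol = n_samples)

# Turn each column of indices into its own data frame of sampled rows
samples <- lapply(seq_len(n_samples), function(j) dataset[idx[, j], ])
names(samples) <- paste("sample", seq_len(n_samples), sep = ".")

dim(samples[["sample.1"]])
```

Note the design choice: sampling without replacement "as a whole" makes the samples disjoint. If each sample should instead be drawn independently (so that a row may turn up in several samples), the indices would have to be drawn separately per sample, for instance with one `sample()` call per column.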
Another way is to use an indexed list, which is far tidier than your method. If by "about 100" you mean an irregular number, then a list is your friend (i.e., a ragged array, which can hold sometimes 97 observations, sometimes 105, etc.). Similar to your example:

dat <- runif(100000,0,100)  # fake dataset
smp <- list()               # need an empty list first
for(i in 1:1000) smp[[i]] <- sample(dat,100)

However, if you are new to R/S, the best advice is to learn to _not_ use the for loop (because it is often slow, and there are "vectorized" alternatives). For example, if we want to find the mean of each sample and return a tidy result:

sapply(smp,mean)

or a crazy new analysis you might be working on:

crazy <- function(x,y) (sum(x>y)^2)/sum(x)
sapply(smp,crazy,10)

etc.

+mt
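To make the "ragged" case concrete, here is a small sketch in which the sample sizes really do vary. The size range 95 to 105 and the names `sizes`, `sample_means` and `crazy_vals` are assumptions made purely for illustration. It also uses `vapply()`, a stricter relative of `sapply()` that checks each result has the declared shape.

```r
set.seed(42)  # only so the sketch is reproducible
dat <- runif(100000, 0, 100)   # fake dataset, as in the message above

# Hypothetical sample sizes for "about 100": anywhere from 95 to 105
sizes <- sample(95:105, 20, replace = TRUE)

# lapply() builds the list of samples without an explicit for loop
smp <- lapply(sizes, function(n) sample(dat, n))

# One summary value per sample; vapply() insists each result is one number
sample_means <- vapply(smp, mean, numeric(1))

# Extra arguments are passed through to the function, as with sapply()
crazy <- function(x, y) (sum(x > y)^2) / sum(x)
crazy_vals <- vapply(smp, crazy, numeric(1), y = 10)
```

Because the samples live in a list, their differing lengths cause no trouble; a matrix would require every sample to have the same size.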
For the example that you give, using lapply, sapply, or replicate may be the better way to go (simplify=FALSE keeps the result as a plain list with one element per sample):

> mysample <- replicate( 50, dataset[ sample(100000,100), ], simplify=FALSE )

If you really want to use a loop, then use a list:

> mysamples <- list()
> mysampdata <- list()
> for (i in 1:50){
+   mysamples[[i]] <- sample(100000, 100)
+   mysampdata[[i]] <- dataset[ mysamples[[i]], ]
+ }

Then you can use lapply or sapply to do something with each sampled dataset:

> sapply( mysampdata, summary )

Or you can access individual elements in a number of ways:

> summary( mysampdata[[1]] )
> names(mysampdata) <- paste('d',1:50, sep='')
> with(mysampdata, summary(d2))
> summary( mysampdata$d3 )
> attach(mysampdata)
> summary(d4)
> detach()

Hope this helps,

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at intermountainmail.org
(801) 408-8111
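For completeness, the literal question (building an object name from the loop variable) can be done in base R with `paste()` and `assign()`, and the objects read back with `get()` or `mget()`. The sketch below assumes a made-up `dataset`; it is offered only to show that the naming scheme is possible, not as a recommendation over the list-based replies.

```r
set.seed(7)  # only so the sketch is reproducible

# Hypothetical stand-in for the real 100,000-row dataset
dataset <- data.frame(id = 1:100000, x = rnorm(100000))

for (i in 1:50) {
  smp <- sample(nrow(dataset), 100)
  # paste() builds the name "sample.1", "sample.2", ...;
  # assign() binds the sampled rows to that name
  assign(paste("sample", i, sep = "."), dataset[smp, ])
}

# get() looks an object up by a name constructed at run time
nrow(get(paste("sample", 3, sep = ".")))

# mget() gathers the objects back into a list for joint processing
all_samples <- mget(paste("sample", 1:50, sep = "."))
```

The drawback is that every later step has to rebuild the names with `paste()` and fetch the objects with `get()`, which is why keeping the samples in one list from the start tends to be easier to work with.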