I have a data set of individual trees and the plots that they are in:

Tree   Plot
56749  1
63494  1
87375  1
37494  2
92753  3
34847  3
38747  4
etc...

So each plot is represented once for every individual that occurs in it. Plots get different numbers of rows because there can be a different number of individuals in each plot.

I want to make a data frame that consists of one individual from each plot. I would like to randomly choose one individual from each plot that is present in the data set. I will have to do this to multiple data sets, which may contain different plots and may contain up to 1200 plots, so I can't choose the plots by hand.

Please help me with this. I'm an ecologist and I'm in Panama, with no one around who is educated in R. Whoever solves this problem for me will be acknowledged in any resulting publications.

Thanks!
-Claire
--
View this message in context: http://www.nabble.com/sampling-problem---new-to-R-tf3872130.html#a10970708
Sent from the R help mailing list archive at Nabble.com.
This should create your samples:

x <- "Tree Plot
56749 1
63494 1
87375 1
37494 2
92753 3
34847 3
38747 4
"
x <- read.table(textConnection(x), header=TRUE)
for(i in 1:10){  # take 10 samples
    # partition data by plot
    z <- by(x, x$Plot, function(.plot){
        .plot[sample(nrow(.plot), 1),]  # choose a random row from the plot
    })
    z <- do.call('rbind', z)  # create data frame
    print(z)
}

On 6/5/07, baldeck <cabaldeck@yahoo.com> wrote:

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
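Because Claire has to repeat this on several data sets, the by()/do.call() idea above can be wrapped in a reusable function. A minimal sketch, assuming the data frame has a column named Plot; the function name pick_one_per_plot is hypothetical. Like Jim's code, it samples a row position with sample.int(nrow(...), 1), so a plot holding a single tree is handled correctly. set.seed() is there only to make a run repeatable.

```r
# Hypothetical helper: return one randomly chosen row per plot.
# Assumes a data frame with a column named by 'plot_col' (default "Plot").
pick_one_per_plot <- function(df, plot_col = "Plot") {
  # split the rows of the data frame into one data frame per plot
  pieces <- split(df, df[[plot_col]])
  # from each piece keep a single randomly chosen row
  picked <- lapply(pieces, function(p) p[sample.int(nrow(p), 1), , drop = FALSE])
  # stack the one-row data frames back into a single data frame
  out <- do.call(rbind, picked)
  rownames(out) <- NULL
  out
}

trees <- read.table(text = "
Tree Plot
56749 1
63494 1
87375 1
37494 2
92753 3
34847 3
38747 4
", header = TRUE)

set.seed(1)  # only for a reproducible example
one_each <- pick_one_per_plot(trees)
one_each
```

The same call can then be applied to each new data set, for example inside lapply() over a list of data frames.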
Claire,

Here's one way to do it:

# first, generate some sample data to try
> treedata <- cbind.data.frame(Tree=1:25, Plot=sample(1:5, 25, replace=TRUE))
> treedata <- treedata[order(treedata$Plot),]
> treedata
   Tree Plot
1     1    1
2     2    1
6     6    1
9     9    1
11   11    1
17   17    1
18   18    1
23   23    1
13   13    2
16   16    2
25   25    2
5     5    3
10   10    3
4     4    4
7     7    4
8     8    4
24   24    4
3     3    5
12   12    5
14   14    5
15   15    5
19   19    5
20   20    5
21   21    5
22   22    5

# then randomly choose one tree from each plot,
# getting a different random set each time
> sapply(split(treedata$Tree, treedata$Plot), sample, 1)
 1  2  3  4  5
 2 25 10  4 21
> sapply(split(treedata$Tree, treedata$Plot), sample, 1)
 1  2  3  4  5
23 13  5  4 14

Hope that solves it for you,
Sarah

On 6/5/07, baldeck <cabaldeck at yahoo.com> wrote:

--
Sarah Goslee
http://www.functionaldiversity.org
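One caveat with passing sample directly to sapply(): when a plot contains only one tree, sample(x, 1) treats a single number x >= 1 as 1:x (this is documented in ?sample). In Claire's data, plot 4 holds only tree 38747, so it could come back as any integer from 1 to 38747. Sarah's simulated data happens to have at least two trees in every plot, so the problem does not show there. A minimal sketch of the pitfall and a safe variant, using the tree numbers from Claire's post:

```r
trees <- data.frame(
  Tree = c(56749, 63494, 87375, 37494, 92753, 34847, 38747),
  Plot = c(1, 1, 1, 2, 3, 3, 4)
)
by_plot <- split(trees$Tree, trees$Plot)

set.seed(42)
# Unsafe: plots 2 and 4 hold one tree each, so sample() draws from 1:37494 and 1:38747
unsafe <- sapply(by_plot, sample, 1)

# Safe: sample a position within the plot, then index the tree numbers
safe <- sapply(by_plot, function(x) x[sample.int(length(x), 1)])
safe
```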
On Tue, 5 Jun 2007, baldeck wrote:

> I have a data set of individual trees and the plots that they are in:
>
> Tree Plot
> 56749 1
> 63494 1
> 87375 1
> 37494 2
> 92753 3
> 34847 3
> 38747 4 etc...

You haven't told us what form the 'data set' is, but I will presume a data frame called DF.

The obvious first step is to split by Plot. Using 'resample' from ?sample,

sapply(with(DF, split(Tree, Plot)), resample, size=1)

gives a vector of trees ('individuals'?) whose names are the plots sampled from.

That seems to be what you want, but if not please come back to us with a more extensive example including the desired output.

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
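Note that resample() is not a base R function; it is defined in the Examples section of ?sample, so it has to be created before the call above will run. A minimal sketch, assuming the definition shown in ?sample in recent R versions and a data frame named DF as in the message. The last step turns the named vector into the two-column data frame Claire asked for.

```r
# resample() as given in the Examples of ?sample (not exported by base R):
# it always samples from the elements of x, even when length(x) == 1
resample <- function(x, ...) x[sample.int(length(x), ...)]

DF <- data.frame(
  Tree = c(56749, 63494, 87375, 37494, 92753, 34847, 38747),
  Plot = c(1, 1, 1, 2, 3, 3, 4)
)

picked <- sapply(with(DF, split(Tree, Plot)), resample, size = 1)

# names(picked) are the plot labels; rebuild a Tree/Plot data frame
result <- data.frame(Tree = unname(picked), Plot = as.numeric(names(picked)))
result
```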
I dealt with something like this recently.

x <- data.frame(plot = gl(2,5), tree = rnorm(10))
y <- split(x, x$plot)            # one data frame per plot
ss <- character(length(y))       # one row name per plot
for(i in seq_along(y)){
    ss[i] <- sample(row.names(y[[i]]), 1)   # pick one row name from this plot
}
z <- x[ss,]                      # subset the original data by row name

People help out of the goodness of their hearts and not for publication recognition.

Harold

> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of baldeck
> Sent: Tuesday, June 05, 2007 10:30 AM
> To: r-help at stat.math.ethz.ch
> Subject: [R] sampling problem - new to R
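The row-name approach can also be written without an explicit loop: split the row positions rather than the data, pick one position per plot, and subset the original data frame once. A minimal sketch that reuses Harold's simulated data, with a third plot added as an assumption to show that nothing is hard-coded to two plots:

```r
x <- data.frame(plot = gl(3, 5), tree = rnorm(15))

# split the row positions (not the data) by plot
rows_by_plot <- split(seq_len(nrow(x)), x$plot)

# draw one row position per plot; indexing avoids sample()'s length-1 behaviour
chosen <- vapply(rows_by_plot, function(r) r[sample.int(length(r), 1)], integer(1))

# one row per plot, in plot order
z <- x[chosen, ]
z
```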
Hi

If I understand correctly, use split and sample with lapply. If DF is your data frame,

# index into x so that plots containing a single tree are handled correctly
lapply(split(DF$Tree, DF$Plot), function(x) x[sample(length(x), 1)])

selects a random tree from each plot. Or you can get it in tabular form with sapply.

Regards
Petr

r-help-bounces at stat.math.ethz.ch wrote on 05.06.2007 16:29:49:
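For data sets with many plots (Claire mentions up to 1200), a different technique avoids split() altogether. It shuffles the rows at random and then keeps the first row seen for each plot with duplicated(). After a random permutation, every tree in a plot is equally likely to come first, so this still picks one tree per plot uniformly at random. A minimal sketch, with the column names and tree numbers taken from the thread:

```r
DF <- data.frame(
  Tree = c(56749, 63494, 87375, 37494, 92753, 34847, 38747),
  Plot = c(1, 1, 1, 2, 3, 3, 4)
)

# randomly permute the rows
shuffled <- DF[sample.int(nrow(DF)), ]

# keep the first occurrence of each plot in the shuffled order
one_per_plot <- shuffled[!duplicated(shuffled$Plot), ]

# optional: sort by plot for readability
one_per_plot <- one_per_plot[order(one_per_plot$Plot), ]
one_per_plot
```

The result is already a data frame with the original columns, so no conversion step is needed.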