All,

I have data that looks like the sample below. For each id there may be more than one value per day, and I want to select one value at random for that day for that id. The end result would ideally be a matrix with id as rows and date as columns, populated by the randomly selected hab value. Thanks to someone on here (Jim) I know how to build the matrix, but I now realize I need to randomly select some of my values. All help is appreciated.

jm

id date loctype habtype
50022 1/25/2006 0 6
50022 1/31/2006 0 6
50022 2/8/2006 0 6
50022 2/13/2006 0 6
50022 2/15/2006 0 6
50022 2/24/2006 0 6
50022 3/2/2006 0 6
50022 3/9/2006 0 6
50022 3/16/2006 0 6
50022 3/24/2006 0 6
50022 4/9/2006 0 3
50022 4/18/2006 0 6
50022 4/27/2006 0 3
50022 5/23/2006 1 3
50022 5/23/2006 1 6
50022 5/24/2006 1 3
50022 5/24/2006 1 3
50022 5/24/2006 1 3
50022 5/24/2006 1 3
50022 5/25/2006 1 3
50022 5/25/2006 1 3
50022 5/25/2006 1 3
50022 5/26/2006 1 5
50022 5/26/2006 1 3
50022 5/27/2006 1 5
50022 5/27/2006 1 3
50022 5/28/2006 1 3
50022 5/29/2006 1 3
50022 5/30/2006 1 5
50022 5/30/2006 1 3
50022 5/31/2006 1 3
50022 5/31/2006 1 3
50022 6/1/2006 1 3
50022 6/2/2006 1 3
50022 6/3/2006 1 3
50022 6/4/2006 1 3
50022 6/5/2006 1 3
50022 6/6/2006 1 5
50022 6/6/2006 1 5
50022 6/6/2006 1 5
50022 6/6/2006 1 3
50022 6/6/2006 1 3
50022 6/7/2006 1 5
50022 6/7/2006 1 3
50022 6/7/2006 1 3
50022 6/7/2006 1 3
50022 6/7/2006 1 3
50022 6/8/2006 1 3
50022 6/8/2006 1 3
50022 6/8/2006 1 3
50022 6/9/2006 1 5
50022 6/10/2006 1 3
50022 6/11/2006 1 5
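To keep an example reproducible without the platform-dependent `"clipboard"` connection, the posted listing can also be read from a string. This is a minimal sketch, assuming the rows are whitespace-separated as posted; only a handful of rows are included, and the full listing can be pasted into `txt` the same way.

```r
# A few of the posted rows, read from a string instead of the clipboard
txt <- "
id date loctype habtype
50022 5/23/2006 1 3
50022 5/23/2006 1 6
50022 5/24/2006 1 3
50022 5/24/2006 1 3
50022 5/28/2006 1 3
"
df <- read.table(text = txt, header = TRUE,
                 colClasses = c("integer", "character", "integer", "integer"))

# Convert the m/d/Y strings to Date so that dates sort chronologically
df$date <- as.Date(df$date, format = "%m/%d/%Y")
str(df)
```

Converting `date` to `Date` is optional for the sampling itself, but it makes the columns of the final id by date matrix come out in calendar order.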
# Highlight (copy) the data from the original message, without the header
# read the data in from the clipboard
df <- do.call(data.frame, scan("clipboard", what = list(id = 0, date = "", loctype = 0, habtype = 0)))

# split the data by date, sample 1 observation from each split, and rbind
sampled_df <- do.call(rbind, lapply(split(df, df$date), function(x) x[sample(1:nrow(x), 1), ]))

On Mon, Jun 29, 2009 at 9:11 AM, James Martin <just.struttin@gmail.com> wrote:
> [...]
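Splitting on `date` alone keeps one row per calendar day, pooled across all birds. A minimal base-R variant of the same split/sample/rbind idea that keys on both `id` and `date` might look like this; the small data frame is hypothetical and only there to make the sketch self-contained.

```r
# Hypothetical toy data: two birds (ids 1 and 2) with repeated fixes on some days
df <- data.frame(
  id      = c(1, 1, 1, 1, 2, 2),
  date    = c("5/23/2006", "5/23/2006", "5/23/2006", "5/24/2006",
              "5/23/2006", "5/23/2006"),
  loctype = c(0, 0, 0, 0, 1, 1),
  habtype = c(1, 2, 1, 3, 5, 3)
)

# Split by id AND date; drop = TRUE discards id/date combinations with no rows
groups <- split(df, list(df$id, df$date), drop = TRUE)

# Keep one randomly chosen row from each group, then stack them back together
set.seed(1)  # only so the draw is repeatable
sampled_df <- do.call(rbind, lapply(groups, function(x) x[sample(nrow(x), 1), ]))
rownames(sampled_df) <- NULL
sampled_df
```

Here bird 2 has no fix on 5/24, so `drop = TRUE` leaves three groups and therefore three rows.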
On Wed, Jul 1, 2009 at 2:10 PM, Sunil Suchindran <sunilsuchindran at gmail.com> wrote:
> [...]

ddply from the plyr package (http://had.co.nz/plyr) makes this sort of operation a little simpler:

library(plyr)
ddply(df, "date", function(df) df[sample(nrow(df), 1), ])

Hadley

--
http://had.co.nz/
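The `sample(nrow(df), 1)` calls in this thread pick a row index, which is safe. If you instead sample the `habtype` values themselves, `sample()` has a documented pitfall: a length-one numeric `x` with `x >= 1` is treated as `1:x`. The `resample()` helper below is the idiom given in the examples of R's `?sample` help page; the variable names are illustrative.

```r
# sample(x, 1) is unsafe for picking a *value*: when x has length 1 and
# x >= 1, sample() treats it as 1:x. A bird with a single fix of habtype 6
# could then be assigned any habitat from 1 to 6.
# The resample() idiom from ?sample indexes by position instead.
resample <- function(x, ...) x[sample.int(length(x), ...)]

set.seed(42)
hab_one_fix  <- c(6)        # only one observation that day
hab_many_fix <- c(5, 5, 3)  # several observations that day

resample(hab_one_fix, 1)    # always 6
resample(hab_many_fix, 1)   # one of 5, 5, 3
```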
Hadley, Sunil, and list,

This is not quite doing what I wanted it to do (as far as I can tell); perhaps I did not explain it thoroughly. It seems to be sampling one value for each day, leaving ~200 observations. I need it to randomly choose one hab value for each bird whenever there is more than one value for a given day. I will try to give an example below.

id,date,location2,hab
1,05/23/06,0,1
1,05/23/06,0,2
1,05/23/06,0,1

So in this case the animal was located 3 times on May 23rd, but I only want one of the locations, and instead of arbitrarily choosing one I want to randomly sample one. I hope I did a better job explaining my issue.

Thanks in advance.

jm

On Wed, Jul 1, 2009 at 3:38 PM, hadley wickham <h.wickham at gmail.com> wrote:
> [...]

--
James A. Martin
850-445-9773
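With James's example, the grouping has to be by bird and by day. If only the `hab` value is needed rather than the whole row, base R's `aggregate()` can do the grouping. This is a swapped-in alternative to the `ddply()` calls in the thread, sketched under the assumption that the columns are named as in the example above.

```r
# James's three-row example, read from text (date kept as character)
bird <- read.csv(text = "id,date,location2,hab
1,05/23/06,0,1
1,05/23/06,0,2
1,05/23/06,0,1", colClasses = c("integer", "character", "integer", "integer"))

# Safe "pick one value at random" helper (idiom from ?sample)
resample <- function(x, ...) x[sample.int(length(x), ...)]

# One randomly chosen hab value per bird (id) per day (date)
set.seed(7)
picked <- aggregate(hab ~ id + date, data = bird,
                    FUN = function(h) resample(h, 1))
picked
```

Because hab 1 appears twice out of three fixes on that day, it is drawn with probability 2/3, which matches sampling one of the three locations uniformly.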
On Thu, Jul 2, 2009 at 8:15 AM, James Martin <just.struttin at gmail.com> wrote:
> [...]

ddply(df, c("date", "location"), function(df) df[sample(nrow(df), 1), ])

Hadley

--
http://had.co.nz/
ddply(df, c("date", "id"), function(df) df[sample(nrow(df), 1), ])

Thanks to Hadley and Sunil. The above code solves my problem.

jm

--
James A. Martin
850-445-9773
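James mentions he already has a way to build the matrix (from Jim's earlier reply, not shown here). For completeness, one base-R way to go from a sampled data frame (one row per id and date) to an id by date matrix is `tapply()`. This is a sketch, not necessarily the method Jim suggested, and the second id below is hypothetical.

```r
# Hypothetical sampled result: one row per id/date (as returned by the ddply call)
sampled <- data.frame(
  id      = c(50022, 50022, 50023),
  date    = as.Date(c("2006-05-23", "2006-05-24", "2006-05-23")),
  habtype = c(6, 3, 5)
)

# Rows = id, columns = date, cells = the sampled habtype.
# Each id/date cell holds exactly one value, so FUN just returns it;
# combinations with no observation become NA.
hab_matrix <- tapply(sampled$habtype,
                     list(sampled$id, format(sampled$date)),
                     FUN = function(h) h[1])
hab_matrix
```

Formatting the dates as ISO strings (`YYYY-MM-DD`) means the alphabetical ordering of the column labels is also chronological.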