All,

I have data that looks like below. For each id there may be more than one value per day. I want to select a random value for that day for that id. The end result would hopefully be a matrix with the id as rows, date as columns and populated by the random hab value. Thanks to someone on here (Jim) I know how to do the matrix, but now realize I need to randomly select some of my values. All help is appreciated.

jm

id, date, loctype, habtype
50022 1/25/2006 0 6
50022 1/31/2006 0 6
50022 2/8/2006 0 6
50022 2/13/2006 0 6
50022 2/15/2006 0 6
50022 2/24/2006 0 6
50022 3/2/2006 0 6
50022 3/9/2006 0 6
50022 3/16/2006 0 6
50022 3/24/2006 0 6
50022 4/9/2006 0 3
50022 4/18/2006 0 6
50022 4/27/2006 0 3
50022 5/23/2006 1 3
50022 5/23/2006 1 6
50022 5/24/2006 1 3
50022 5/24/2006 1 3
50022 5/24/2006 1 3
50022 5/24/2006 1 3
50022 5/25/2006 1 3
50022 5/25/2006 1 3
50022 5/25/2006 1 3
50022 5/26/2006 1 5
50022 5/26/2006 1 3
50022 5/27/2006 1 5
50022 5/27/2006 1 3
50022 5/28/2006 1 3
50022 5/29/2006 1 3
50022 5/30/2006 1 5
50022 5/30/2006 1 3
50022 5/31/2006 1 3
50022 5/31/2006 1 3
50022 6/1/2006 1 3
50022 6/2/2006 1 3
50022 6/3/2006 1 3
50022 6/4/2006 1 3
50022 6/5/2006 1 3
50022 6/6/2006 1 5
50022 6/6/2006 1 5
50022 6/6/2006 1 5
50022 6/6/2006 1 3
50022 6/6/2006 1 3
50022 6/7/2006 1 5
50022 6/7/2006 1 3
50022 6/7/2006 1 3
50022 6/7/2006 1 3
50022 6/7/2006 1 3
50022 6/8/2006 1 3
50022 6/8/2006 1 3
50022 6/8/2006 1 3
50022 6/9/2006 1 5
50022 6/10/2006 1 3
50022 6/11/2006 1 5
# Highlight and copy the data rows below (without the header line),
# then read the data in from the clipboard
df <- do.call(data.frame, scan("clipboard",
    what = list(id = 0, date = "", loctype = 0, habtype = 0)))

# split the data by date, sample 1 observation from each split, and rbind
sampled_df <- do.call(rbind, lapply(split(df, df$date),
    function(x) x[sample(1:nrow(x), 1), ]))
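For anyone who wants to try the split/sample/rbind idea without a clipboard, here is a minimal sketch. It assumes a few rows of the posted data pasted into a string (the names txt and pick_one are mine, not from the thread) and uses read.table(text = ...) in place of scan("clipboard", ...).

```r
# A few rows of the posted data, inlined so the example is self-contained.
txt <- "id date loctype habtype
50022 5/23/2006 1 3
50022 5/23/2006 1 6
50022 5/24/2006 1 3
50022 5/26/2006 1 5
50022 5/26/2006 1 3"

df <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)

set.seed(1)  # only so the random draw is repeatable

# pick_one is a hypothetical helper: return one randomly chosen row of x.
pick_one <- function(x) x[sample(nrow(x), 1), , drop = FALSE]

# Split by date, draw one row from each piece, and bind the pieces back together.
sampled_df <- do.call(rbind, lapply(split(df, df$date), pick_one))
rownames(sampled_df) <- NULL

print(sampled_df)
```

Note that this groups by date only, so it keeps one row per day across all ids.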
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
On Wed, Jul 1, 2009 at 2:10 PM, Sunil Suchindran <sunilsuchindran at gmail.com> wrote:
> # split the data by date, sample 1 observation from each split, and rbind
>
> sampled_df <- do.call(rbind, lapply(split(df,
> df$date),function(x)x[sample(1:nrow(x), 1),]))

ddply from the plyr package (http://had.co.nz/plyr) makes this sort of operation a little simpler:

ddply(df, "date", function(df) df[sample(nrow(df), 1), ])

Hadley

--
http://had.co.nz/
Hadley, Sunil, and list,

This is not quite doing what I wanted it to do (as far as I can tell). Perhaps I did not explain it thoroughly. It seems to be sampling one value for each day, leaving ~200 observations. I need it to randomly choose one hab value for each bird if there is more than one value for a given day. I will try an example below.

id,date,location2,hab
1,05/23/06,0,1
1,05/23/06,0,2
1,05/23/06,0,1

So in this case the animal was located 3 times on May 23rd, but I only want one of the locations, and instead of arbitrarily choosing one I want to randomly sample one. I hope I did a better job explaining my issue. Thanks in advance.

jm

--
James A. Martin
850-445-9773
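To make the difference concrete, here is a small sketch in base R. The data frame is made up for illustration (a second bird, id 2, is my addition, as is the helper name pick_one). It compares grouping by date alone with grouping by id and date together.

```r
# Hypothetical data: bird 1 located three times on one day, bird 2 on two days.
df <- data.frame(
  id   = c(1, 1, 1, 2, 2),
  date = c("05/23/06", "05/23/06", "05/23/06", "05/23/06", "05/24/06"),
  hab  = c(1, 2, 1, 3, 3),
  stringsAsFactors = FALSE
)

# pick_one is a hypothetical helper: return one randomly chosen row of x.
pick_one <- function(x) x[sample(nrow(x), 1), , drop = FALSE]

# Grouping by date alone keeps one row per day, so a bird is dropped on 05/23/06.
by_date <- do.call(rbind, lapply(split(df, df$date), pick_one))

# Grouping by id and date keeps one row per bird per day.
# drop = TRUE discards id/date combinations that never occur in the data.
groups <- split(df, list(df$id, df$date), drop = TRUE)
by_id_date <- do.call(rbind, lapply(groups, pick_one))

nrow(by_date)     # 2: one row for each distinct date
nrow(by_id_date)  # 3: one row for each bird/day combination
```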
On Thu, Jul 2, 2009 at 8:15 AM, James Martin <just.struttin at gmail.com> wrote:
> Hadley, Sunil, and list,
>
> This is not quite doing what I wanted it to do (as far as I can tell).
> Perhaps I did not explain it thoroughly. It seems to be sampling one value
> for each day, leaving ~200 observations. I need it to randomly choose one
> hab value for each bird if there is more than one value for a given day.
> I will try an example below.
>
> id,date,location2,hab
>
> 1,05/23/06,0,1
> 1,05/23/06,0,2
> 1,05/23/06,0,1
>
> So in this case the animal was located 3 times on May 23rd, but I only want
> one of the locations, and instead of arbitrarily choosing one I want to
> randomly sample one.

ddply(df, c("date", "location"), function(df) df[sample(nrow(df), 1), ])

Hadley

--
http://had.co.nz/
ddply(df, c("date", "id"), function(df) df[sample(nrow(df), 1), ])
Thanks to Hadley and Sunil. The above code solves my problem.
jm
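For the archive, here is a rough base-R sketch of the whole pipeline: one random row per id and date, then the id-by-date matrix. It is not necessarily the approach Jim suggested for the matrix step, and it uses split/lapply rather than ddply so it runs without plyr. The second bird (id 50023), the helper pick_one, and the name hab_matrix are assumptions made for illustration.

```r
# A few rows in the posted layout; id 50023 is hypothetical.
df <- data.frame(
  id      = c(50022, 50022, 50022, 50022, 50023, 50023),
  date    = c("5/23/2006", "5/23/2006", "5/24/2006", "5/26/2006",
              "5/23/2006", "5/23/2006"),
  habtype = c(3, 6, 3, 5, 5, 5),
  stringsAsFactors = FALSE
)

set.seed(42)  # only so the random draw is repeatable

# pick_one is a hypothetical helper: return one randomly chosen row of x.
pick_one <- function(x) x[sample(nrow(x), 1), , drop = FALSE]

# One random row per id/date combination (same grouping as the ddply call above).
sampled <- do.call(rbind,
                   lapply(split(df, list(df$id, df$date), drop = TRUE), pick_one))

# Parse the dates so the columns sort chronologically rather than alphabetically.
sampled$date <- as.Date(sampled$date, format = "%m/%d/%Y")

# ids as rows, dates as columns; id/date pairs with no observation become NA.
hab_matrix <- tapply(sampled$habtype,
                     list(id = sampled$id, date = as.character(sampled$date)),
                     function(v) v[1])

print(hab_matrix)
```

Each cell holds a single value because sampled already has at most one row per id and date; the function(v) v[1] is only there to hand that value to tapply.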