thr3ads.net - R help - [R] sample (randomly select) to get a number of successive days [Dec 2018]


Dagmar Cimiotti

2018-Dec-10 07:37 UTC

[R] sample (randomly select) to get a number of successive days

Hi Marc,

Yes, you have understood it exactly; that is what I want. But I do not know how
to do it. I know how to randomly pick the first day, but I do not know how to
define a range of values that covers the 25 days starting from that random day.

Best,
Dagmar


Hi,

I am confused.

As far as I can tell, only the first day is selected randomly from your dataset.
The subsequent 24 days are deterministic, since they need to be consecutive days
from the first day, for a total of 25 consecutive days.

Thus, all you need to do is to randomly select 1 day from within the time range
of your dataset to be the first day, that is also far enough from the maximum
date, to allow you to then select the data from the additional 24 consecutive
days.

So randomly pick your first day and set a range of values, covering the 25 days,
to use to then subset your full dataset.

What am I missing?

Regards,

Marc Schwartz

> On Dec 7, 2018, at 7:18 PM, Dagmar Cimiotti <dagmar.cimiotti at ftz-west.uni-kiel.de> wrote:
>
> Hi Jim and everyone else,
>
> Hmm, no, this is not what I am looking for. I think your approach would
> randomly sample two values from day 1 and two from day 2. But I want the
> opposite: I want to randomly draw two successive (!) days and put those
> values in a new data frame to continue working with them.
>
> In my real data I have a huge time span and I want to draw 25
> consecutive days. So maybe my example was a little misleading. And now
> that I read it again, my text was, too. Sorry about that!
>
> Good try, though, and I am very grateful for your willingness to help me.
> Would anyone give it another try?
>
> Dagmar
>
> Am 07.12.2018 um 10:30 schrieb Jim Lemon:
>> Hi Dagmar,
>> This will probably involve creating a variable to differentiate the
>> two days in each data.frame:
>>
>> myframe$day<-as.Date(as.character(myframe$Timestamp),"%d.%m.%Y %H:%M:%S")
>> days<-unique(myframe$day)
>>
>> Then just sample the two subsets and concatenate them:
>>
>> myframe[c(sample(which(myframe$day==days[1]),2),
>>    sample(which(myframe$day==days[2]),2)),]
>>
>> Jim
>>
>>
>> On Fri, Dec 7, 2018 at 8:08 PM Dagmar Cimiotti
>> <dagmar.cimiotti at ftz-west.uni-kiel.de>  wrote:
>>> Dear all,
>>>
>>> I have data from a time span like this:
>>>
>>> myframe <- data.frame (Timestamp=c("24.09.2012 09:00:00", "24.09.2012 10:00:00",
>>>                                    "25.09.2012 09:00:00", "25.09.2012 09:00:00",
>>>                                    "24.09.2012 09:00:00", "24.09.2012 10:00:00"),
>>>                        Event=c(50,60,30,40,42,54) )
>>> myframe
>>>
>>>
>>> I want to create a new dataframe which includes in this example the
>>> data from two successive days (in my real data I have a big time span
>>> and want data from 25 consecutive days). I understand that I can do a
>>> simple sample like this:
>>>
>>> mysample <- myframe[sample(1:nrow(myframe), 4,replace=FALSE),]
>>> mysample
>>>
>>> But I need the data from consecutive days in my random sample. Can
>>> anyone help me with this?
>>>
>>>
>>> Many thanks in advance,
>>> Dagmar


Marc Schwartz

2018-Dec-10 13:53 UTC


[R] sample (randomly select) to get a number of successive days

Hi,

Given that your original data frame example is:

myframe <- data.frame (Timestamp=c("24.09.2012 09:00:00", "24.09.2012 10:00:00",
                                   "25.09.2012 09:00:00", "25.09.2012 09:00:00",
                                   "24.09.2012 09:00:00", "24.09.2012 10:00:00"),
                       Event=c(50,60,30,40,42,54))

> str(myframe)
'data.frame':	6 obs. of  2 variables:
 $ Timestamp: Factor w/ 3 levels "24.09.2012 09:00:00",..: 1 2 3 3 1 2
 $ Event    : num  50 60 30 40 42 54


Your Timestamp variable is a factor, not a datetime variable. So you first need
to coerce it to one, in order to be able to define a range of dates.

Thus:

## See ?as.POSIXlt and the See Also links therein for more information on how R
## handles dates/times

myframe$Timestamp <- as.POSIXct(myframe$Timestamp, format = "%d.%m.%Y %H:%M:%S")

> str(myframe)
'data.frame':	6 obs. of  2 variables:
 $ Timestamp: POSIXct, format: "2012-09-24 09:00:00" ...
 $ Event    : num  50 60 30 40 42 54
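As a side note on the coercion step: in R 4.0.0 and later, data.frame() keeps strings as character by default, so the column may already be character rather than a factor. The sketch below converts explicitly either way and checks for values that fail to parse. The sample values, the object names (ts, parsed, bad) and the choice of tz = "UTC" are assumptions made only for illustration.

```r
# Hypothetical timestamps in the same "dd.mm.YYYY HH:MM:SS" layout,
# plus one deliberately malformed entry
ts <- factor(c("24.09.2012 09:00:00", "25.09.2012 09:00:00", "not a date"))

# Convert factor -> character explicitly, then parse with a fixed time zone
# (tz = "UTC" is an assumption; use the zone your data were recorded in)
parsed <- as.POSIXct(as.character(ts), format = "%d.%m.%Y %H:%M:%S", tz = "UTC")

# Entries that do not match the format become NA instead of raising an error
bad <- is.na(parsed)
print(as.character(ts[bad]))
```

Checking is.na() right after parsing is a cheap way to catch rows whose timestamp does not follow the expected format before they silently drop out of a later date-based subset.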


To keep it simple: since the range selection appears to depend only on the day
and not on the time, let's use the date part of the datetime as the basis for
defining your interval. For clarity, let's create a new column in the data
frame that holds just the date:

myframe$day <- as.Date(myframe$Timestamp)

> str(myframe)
'data.frame':	6 obs. of  3 variables:
 $ Timestamp: POSIXct, format: "2012-09-24 09:00:00" ...
 $ Event    : num  50 60 30 40 42 54
 $ day      : Date, format: "2012-09-24" ...

> myframe
            Timestamp Event        day
1 2012-09-24 09:00:00    50 2012-09-24
2 2012-09-24 10:00:00    60 2012-09-24
3 2012-09-25 09:00:00    30 2012-09-25
4 2012-09-25 09:00:00    40 2012-09-25
5 2012-09-24 09:00:00    42 2012-09-24
6 2012-09-24 10:00:00    54 2012-09-24
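One thing that may be worth checking when deriving the date column: as.Date() on a POSIXct value accepts a tz argument, and if the zone used for the conversion differs from the zone in which the timestamps were recorded, observations shortly after midnight can end up on the neighbouring day. A minimal sketch, assuming a hypothetical timestamp and the "Europe/Berlin" zone (neither is from the original data):

```r
# Hypothetical timestamp shortly after midnight, Berlin local time
# (Berlin is at UTC+2 in September)
x <- as.POSIXct("2012-09-25 00:30:00", tz = "Europe/Berlin")

day_utc   <- as.Date(x, tz = "UTC")            # calendar date as seen in UTC
day_local <- as.Date(x, tz = "Europe/Berlin")  # calendar date as seen locally

print(day_utc)
print(day_local)
```

Here the two results differ by one day, so passing tz explicitly removes the ambiguity. For the example data in this thread, where all times are mid-morning, it is unlikely to matter.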


With that in place, let's presume that you selected 2012-09-24 as your
starting date. You can then use ?seq.Date to define the range:

set.seed(1)
start <- sample(myframe$day, 1)

> start
[1] "2012-09-24"

> str(start)
 Date[1:1], format: "2012-09-24"


So, create the range of 25 dates:

> seq(start, length.out = 25, by = "day")
 [1] "2012-09-24" "2012-09-25" "2012-09-26" "2012-09-27" "2012-09-28"
 [6] "2012-09-29" "2012-09-30" "2012-10-01" "2012-10-02" "2012-10-03"
[11] "2012-10-04" "2012-10-05" "2012-10-06" "2012-10-07" "2012-10-08"
[16] "2012-10-09" "2012-10-10" "2012-10-11" "2012-10-12" "2012-10-13"
[21] "2012-10-14" "2012-10-15" "2012-10-16" "2012-10-17" "2012-10-18"
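Because arithmetic on Date objects works in whole days, the same window can also be built by adding an integer offset to the start. The sketch below compares both constructions and checks that the result really is 25 consecutive days. The start date is fixed by hand purely to keep the example reproducible, and n_days, window and window_alt are illustrative names.

```r
start  <- as.Date("2012-09-24")  # fixed here only for reproducibility
n_days <- 25

# Construction used above: a daily sequence of a given length
window <- seq(start, by = "day", length.out = n_days)

# Alternative: add day offsets 0, 1, ..., 24 to the start date
window_alt <- start + 0:(n_days - 1)

print(all(window == window_alt))   # both constructions agree
print(range(window))               # first and last day of the window
print(all(diff(window) == 1))      # every step is exactly one day
```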


Now, use the result of the above to subset your data frame. See ?subset and
?"%in%":

myframe.rand <- subset(myframe, day %in% seq(start, length.out = 25, by = "day"))


In your example, all rows will be returned, but from your larger dataset, you
will only get the rows that have dates within the range defined.
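To see the subsetting step actually drop rows, here is a sketch on a made-up data frame spanning 60 days. It uses plain bracket indexing with a lower and upper bound instead of subset() and %in%; ?subset itself recommends bracket indexing for programmatic use. The data frame df and all of its values are hypothetical.

```r
# Hypothetical data: one event per day over 60 days
df <- data.frame(day   = seq(as.Date("2012-09-01"), by = "day", length.out = 60),
                 Event = seq_len(60))

start <- as.Date("2012-09-24")  # fixed here only for reproducibility
end   <- start + 24             # last day of a 25-day window

# Logical index: TRUE for rows whose day falls inside the window
in_window <- df$day >= start & df$day <= end
df_window <- df[in_window, ]

print(nrow(df_window))
print(range(df_window$day))
```

With several rows per day, as in the original example, the same index keeps every row of every day in the window, so the row count will generally exceed 25.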

Given the above, I will leave it to you to restrict the candidate starting
dates in your full dataset so that the start is at least 24 days before your
'max' date, which guarantees that 25 consecutive days are available.
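One possible way to handle that remaining step is to draw the start only from dates that leave room for the full window. This is a sketch under assumptions, not necessarily what was intended above: the data frame df is made up, the seed is arbitrary, and the candidates are calendar days between the first date and the latest permissible start, whether or not every one of them has observations.

```r
set.seed(42)  # arbitrary seed, only for reproducibility

# Hypothetical data: one event per day over 60 days
df <- data.frame(day   = seq(as.Date("2012-09-01"), by = "day", length.out = 60),
                 Event = seq_len(60))

n_days <- 25

# Latest start that still leaves n_days consecutive days before the max date
last_start <- max(df$day) - (n_days - 1)
stopifnot(last_start >= min(df$day))  # the data must span at least n_days

# All permissible start dates; sample an index so the Date class is preserved
candidates <- seq(min(df$day), last_start, by = "day")
start <- candidates[sample.int(length(candidates), 1)]

# Build the window and subset as before
window  <- seq(start, by = "day", length.out = n_days)
df_rand <- df[df$day %in% window, ]

print(start)
print(range(df_rand$day))
```

If your real data have gaps, the window still covers 25 calendar days, but some of those days may contribute no rows. Whether that is acceptable depends on the analysis.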

Regards,

Marc Schwartz

