Dear all,

I am trying to compute hourly averages with the cut() function, which almost works as *I* expected. What puzzled me is that if there is only one item at the end of the data, it results in NA.

An example will explain what I mean:

datum <- seq(ISOdate(2004,8,31), ISOdate(2004,9,1), "min")

cut(datum[1370:1381], "hour", labels=FALSE)
[1] 1 1 1 1 1 1 1 1 1 1 1 NA

cut(datum[1370:1382], "hour", labels=FALSE)
[1] 1 1 1 1 1 1 1 1 1 1 1 2 2

I do not understand why the last item in the first call is NA. I found it only when there was a switch from DST to standard time, as it caused trouble in one of my functions and I found an NA value where I did not expect it.

I can make some workaround, but can you please explain why the first call results in an NA value at the end of the vector, and whether this is *intended* behaviour? If so, I can account for it when improving my function(s); if not, I can make some temporary workaround.

Thank you.

Petr Pikal
petr.pikal at precheza.cz
On Wed, 3 Nov 2004, Petr Pikal wrote:

> Dear all
>
> I am trying to compute hourly averages with the cut() function, which
> almost works as *I* expected. What puzzled me is that if there is only
> one item at the end of the data, it results in NA.
>
> An example will explain what I mean:
>
> datum <- seq(ISOdate(2004,8,31), ISOdate(2004,9,1), "min")
>
> cut(datum[1370:1381], "hour", labels=FALSE)
> [1] 1 1 1 1 1 1 1 1 1 1 1 NA
>
> cut(datum[1370:1382], "hour", labels=FALSE)
> [1] 1 1 1 1 1 1 1 1 1 1 1 2 2
>
> I do not understand why the last item in the first call is NA. I found
> it only when there was a switch from DST to standard time, as it
> caused trouble in one of my functions and I found an NA value where I
> did not expect it.

cut(datum[1370:1381], "hour", labels=FALSE, include.lowest=TRUE)

is what you need. See ?cut, in the See Also, which says

include.lowest: logical, indicating if an 'x[i]' equal to the lowest
          (or highest, for 'right = FALSE') 'breaks' value should be
          included.

> I can make some workaround, but can you please explain why the
> first call results in an NA value at the end of the vector, and
> whether this is *intended* behaviour?

It is the documented behaviour, for better or for worse.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
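The effect of include.lowest is easiest to see with the default (numeric) method of cut(). The following is a minimal sketch on made-up numbers, not the date-time case from the question; it assumes only that, as the quoted help text suggests, the intervals are closed on the left (right = FALSE). The behaviour of the date-time method may differ between R versions.

```r
# Hypothetical toy data: the last value sits exactly on the highest break.
x <- c(1, 1.5, 2, 3)
breaks <- c(1, 2, 3)

# With right = FALSE the intervals are [1,2) and [2,3),
# so the value 3 falls outside every interval and becomes NA.
open_top <- cut(x, breaks, labels = FALSE, right = FALSE)

# include.lowest = TRUE closes the last interval to [2,3],
# so the value 3 is assigned to the second bin.
closed_top <- cut(x, breaks, labels = FALSE, right = FALSE,
                  include.lowest = TRUE)

print(open_top)
print(closed_top)
```

The name include.lowest is slightly misleading here: with right = FALSE it is the highest break, not the lowest, that gets included.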
Petr Pikal <petr.pikal <at> precheza.cz> writes:

:
: Dear all
:
: I am trying to compute hourly averages with the cut() function, which
: almost works as *I* expected. What puzzled me is that if there is only
: one item at the end of the data, it results in NA.
:
: An example will explain what I mean:
:
: datum <- seq(ISOdate(2004,8,31), ISOdate(2004,9,1), "min")
:
: cut(datum[1370:1381], "hour", labels=FALSE)
: [1] 1 1 1 1 1 1 1 1 1 1 1 NA
:
: cut(datum[1370:1382], "hour", labels=FALSE)
: [1] 1 1 1 1 1 1 1 1 1 1 1 2 2
:
: I do not understand why the last item in the first call is NA. I found
: it only when there was a switch from DST to standard time, as it
: caused trouble in one of my functions and I found an NA value where I
: did not expect it.
:
: I can make some workaround, but can you please explain why the
: first call results in an NA value at the end of the vector, and
: whether this is *intended* behaviour? If so, I can account for it when
: improving my function(s); if not, I can make some temporary workaround.
:

Your question has already been answered, but here is an alternative approach that avoids cut. We format the datetimes, truncate each string at the hour, make that a factor, and then get the integer codes:

R> cutHour <- function(x) as.integer(factor(substring(format(x),1,13)))
R> cutHour(datum[1370:1381])
[1] 1 1 1 1 1 1 1 1 1 1 1 2
R> cutHour(datum[1370:1382])
[1] 1 1 1 1 1 1 1 1 1 1 1 2 2
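Since the original goal was hourly averages, the codes returned by cutHour can be passed to tapply() as the grouping variable. This is a rough sketch only: the values vector is hypothetical stand-in data (one measurement per time stamp), and cutHour is repeated from the message above.

```r
# cutHour as defined in the message above.
cutHour <- function(x) as.integer(factor(substring(format(x), 1, 13)))

datum <- seq(ISOdate(2004, 8, 31), ISOdate(2004, 9, 1), "min")
x <- datum[1370:1381]

# Hypothetical measurements, one per time stamp (simply 1, 2, ..., 12).
values <- seq_along(x)

# Mean of the measurements within each hour group.
hourly <- tapply(values, cutHour(x), mean)
print(hourly)
```

The single observation on the hour boundary forms its own group instead of turning into NA, so it contributes an average of its own rather than being dropped.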
Dear Prof. Ripley,

Thank you very much for the explanation (without it I would not have considered that include.lowest had anything to do with my observation). I changed my code to get rid of single final POSIX dates.

BTW, there is no mention of include.lowest on the cut.POSIXt help page, and I think that in the case of dates it does something that is maybe not so *understandable* (61 minutes in one hour).

datum <- seq(ISOdate(2004,8,31), ISOdate(2004,9,1), "min")

# part of the datum variable
datum[1379:1381]
[1] "2004-09-01 12:58:00 Střední Evropa (letní čas)" "2004-09-01 12:59:00 Střední Evropa (letní čas)"
[3] "2004-09-01 13:00:00 Střední Evropa (letní čas)"

# the last item seems to me to belong to the time from 13:00:00 to 13:59:00, i.e. it is part of the hour starting at 13:00

cut(datum[1370:1381], "hour", include.lowest=TRUE) # this includes it in the previous hour
[1] 2004-09-01 12:00:00 2004-09-01 12:00:00 2004-09-01 12:00:00 2004-09-01 12:00:00 2004-09-01 12:00:00 2004-09-01 12:00:00
[7] 2004-09-01 12:00:00 2004-09-01 12:00:00 2004-09-01 12:00:00 2004-09-01 12:00:00 2004-09-01 12:00:00 2004-09-01 12:00:00
Levels: 2004-09-01 12:00:00

cut(datum[1370:1381], "hour") # this drops it from the result; correct but unfortunate
[1] 2004-09-01 12:00:00 2004-09-01 12:00:00 2004-09-01 12:00:00 2004-09-01 12:00:00 2004-09-01 12:00:00 2004-09-01 12:00:00
[7] 2004-09-01 12:00:00 2004-09-01 12:00:00 2004-09-01 12:00:00 2004-09-01 12:00:00 2004-09-01 12:00:00 <NA>
Levels: 2004-09-01 12:00:00

# so as a result an hour can have 61 minutes
levels(cut(datum[1321:1381], "hour", include.lowest=TRUE))
[1] "2004-09-01 12:00:00"
length(cut(datum[1321:1381], "hour", include.lowest=TRUE)) #???
[1] 61

Is this correct?

Thank you again.

Best regards
Petr Pikal

On 3 Nov 2004 at 11:20, Prof Brian Ripley wrote:

> On Wed, 3 Nov 2004, Petr Pikal wrote:
>
> > Dear all
> >
> > I am trying to compute hourly averages with the cut() function, which
> > almost works as *I* expected.
> > What puzzled me is that if there is only one item at the end of
> > the data, it results in NA.
> >
> > An example will explain what I mean:
> >
> > datum <- seq(ISOdate(2004,8,31), ISOdate(2004,9,1), "min")
> >
> > cut(datum[1370:1381], "hour", labels=FALSE)
> > [1] 1 1 1 1 1 1 1 1 1 1 1 NA
> >
> > cut(datum[1370:1382], "hour", labels=FALSE)
> > [1] 1 1 1 1 1 1 1 1 1 1 1 2 2
> >
> > I do not understand why the last item in the first call is NA. I
> > found it only when there was a switch from DST to standard time, as
> > it caused trouble in one of my functions and I found an NA value
> > where I did not expect it.
>
> cut(datum[1370:1381], "hour", labels=FALSE, include.lowest=TRUE)
>
> is what you need. See ?cut, in the See Also, which says
>
> include.lowest: logical, indicating if an 'x[i]' equal to the lowest
>           (or highest, for 'right = FALSE') 'breaks' value should be
>           included.
>
> > I can make some workaround, but can you please explain why the
> > first call results in an NA value at the end of the vector, and
> > whether this is *intended* behaviour?
>
> It is the documented behaviour, for better or for worse.
>
> --
> Brian D. Ripley,                  ripley at stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html

Petr Pikal
petr.pikal at precheza.cz
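One way to check how many observations fall into each clock hour, without merging the boundary value into the previous hour, is to truncate every time stamp to the start of its own hour. The following is a minimal sketch using trunc() on the datum vector from the thread; it is an alternative to cut(), not a description of what cut() does internally. ISOdate() works in GMT, so the printed labels may differ from the local-time output shown above.

```r
datum <- seq(ISOdate(2004, 8, 31), ISOdate(2004, 9, 1), "min")
x <- datum[1321:1381]  # 61 time stamps; the last one lies exactly on the hour

# Truncate each time stamp down to the start of its own hour
# and turn the result into a character label.
hour_start <- format(trunc(x, "hours"), "%Y-%m-%d %H:%M:%S")

# Count the observations per hour.
counts <- table(hour_start)
print(counts)
```

Here the first hour holds 60 observations and the boundary value is counted separately in the following hour, which matches the intuition that an hour has 60 minutes.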