This is a simple fix. I extracted the part of cut.R that calculates the
breaks from a number of intervals, converted the breaks to date-times, and
passed them explicitly to cut again. I used lubridate's as_datetime because
it is simpler; it can of course be replaced with as.POSIXct.
The break labels are always formatted one way, but users can format them any
way they want by calling divide directly. I found the return value of divide
useful in its own right, so it is worth extracting as a separate function.
------------------------------------
# Focused on one case: cut x into a given number of intervals.
# divide() splits the range of x into interval_count intervals. Taken from
# https://github.com/wch/r-source/blob/trunk/src/library/base/R/cut.R
library(lubridate)  # provides as_datetime()

divide <- function(x, interval_count) {
  if (is.na(interval_count) || interval_count < 2L)
    stop("invalid number of intervals")
  nb <- as.integer(interval_count + 1) # one more than #{intervals}
  dx <- diff(rx <- range(x, na.rm = TRUE))
  if (dx == 0) {
    # all values identical: build a small range around the single value
    dx <- abs(rx[1L])
    breaks <- seq.int(rx[1L] - dx/1000, rx[2L] + dx/1000,
                      length.out = nb)
  } else {
    breaks <- seq.int(rx[1L], rx[2L], length.out = nb)
    # widen the outer breaks slightly so the extremes fall inside
    breaks[c(1L, nb)] <- c(rx[1L] - dx/1000, rx[2L] + dx/1000)
  }
  breaks
}

# cut date-times into interval_count intervals, labelled by date-time breaks
cut_date_time <- function(x, interval_count) {
  brks <- divide(as.numeric(x), interval_count)
  cut(x, as_datetime(brks))
}

# return only the date-time breaks
divide_date_time <- function(x, interval_count) {
  as_datetime(divide(as.numeric(x), interval_count))
}
--------------------
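As noted above, as_datetime can be replaced with as.POSIXct. A minimal
base-R sketch of that variant follows. The names breaks_sketch and
cut_date_time_base, and the tz argument with its "UTC" default, are only
illustrative and are not part of the code above; the example input is the
short one from the original bug report.

```r
# Sketch only: same break calculation as divide(), repeated here so the
# example runs on its own without lubridate.
breaks_sketch <- function(x, interval_count) {
  if (is.na(interval_count) || interval_count < 2L)
    stop("invalid number of intervals")
  nb <- as.integer(interval_count + 1)
  dx <- diff(rx <- range(x, na.rm = TRUE))
  if (dx == 0) {
    dx <- abs(rx[1L])
    breaks <- seq.int(rx[1L] - dx/1000, rx[2L] + dx/1000, length.out = nb)
  } else {
    breaks <- seq.int(rx[1L], rx[2L], length.out = nb)
    breaks[c(1L, nb)] <- c(rx[1L] - dx/1000, rx[2L] + dx/1000)
  }
  breaks
}

# Hypothetical base-R counterpart of cut_date_time(); tz is an assumption.
cut_date_time_base <- function(x, interval_count, tz = "UTC") {
  brks <- breaks_sketch(as.numeric(x), interval_count)
  # convert the numeric breaks back to date-times before calling cut()
  cut(x, as.POSIXct(brks, origin = "1970-01-01", tz = tz))
}

x <- seq(as.POSIXct("2000-01-01", tz = "UTC"), by = "days", length.out = 20)
res <- cut_date_time_base(x, 30)
nlevels(res)  # one level per interval, including the empty ones
```

Because the breaks are passed as date-times, cut labels every interval from
the breaks themselves rather than from the input values.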
Best,
Xianghui Dong
On Thu, Apr 6, 2017 at 3:37 PM, Xianghui Dong <xhdong at umd.edu> wrote:
> The exact error was reported before in *Bug 14288*
> <https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=14288> *- bug in
> cut.POSIXt(..., breaks = <numeric>) and cut.Date.* But the fix in that
> bug report only covered the simplest case.
>
> This is the error I got:
> -----------------------------
>
> x <- structure(c(1057067700, 1057215720, 1060597800, 1061470800,
> 1061911680, 1062048000, 1062137880, 1064479440, 1064926380, 1064995140,
> 1066822800, 1068033720, 1070869740, 1070939820, 1071030540, 1074244560,
> 1077545880, 1078449720, 1084955460, 1129020000, 1130324280, 1130404800,
> 1131519420, 1132640100, 1133772000, 1137567960, 1138952640, 1141810380,
> 1147444200, 1161643440, 1164086160), class = c("POSIXct", "POSIXt"),
> tzone = "UTC")
>
> > cut(x, 20)
> Error in `levels<-.factor`(`*tmp*`, value = as.character(if
> (is.numeric(breaks)) x[!duplicated(res)] else breaks[-length(breaks)])) :
> number of levels differs
> -----------------------------
>
> The cause of the bug is that the input date-time values are spread out,
> so only 10 of the 20 intervals actually contain any values.
> -------------------
>
> cut_n <- cut(as.numeric(x), 20)
>
> > unique(cut_n)
> [1] (1.057e+09,1.062e+09] (1.062e+09,1.068e+09] (1.068e+09,1.073e+09]
> (1.073e+09,1.078e+09]
> [5] (1.084e+09,1.089e+09] (1.127e+09,1.132e+09] (1.132e+09,1.137e+09]
> (1.137e+09,1.143e+09]
> [9] (1.143e+09,1.148e+09] (1.159e+09,1.164e+09]
> 20 Levels: (1.057e+09,1.062e+09] (1.062e+09,1.068e+09]
> (1.068e+09,1.073e+09] ... (1.159e+09,1.164e+09]
> ------------------------
> To get proper labels for all 20 intervals, the breaks need to be converted
> from numbers to date-time strings. The current code does not actually
> convert the breaks; it just reuses the original date-time values from the
> input data. This does not work unless the break values happen to coincide
> with values in the original input. An even simpler example, from the
> original bug report:
> -----------------------
> x <- seq(as.POSIXct("2000-01-01"), by = "days", length = 20)
> > cut(x, breaks = 30)
> Error in `levels<-.factor`(`*tmp*`, value = as.character(if
> (is.numeric(breaks)) x[!duplicated(res)] else breaks[-length(breaks)])) :
> number of levels differs
> ---------------------
>
> I think fixing the bug will require either
> - getting the actual numeric values of the breaks from "cut" (modifying
>   "cut" if needed), then converting them back to date-times, or
> - using a regex to extract the break values from the labels, then
>   converting them to date-times
>
> Best,
> Xianghui Dong
>
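The second option in the quoted message (extracting the break values from
the labels with a regex) could look roughly like the sketch below. It is
only an illustration under assumptions: it relies on the default
"(lower,upper]" label format of cut on numeric input, and the parsed values
are only as precise as the digits printed in the labels (dig.lab = 15 here).
The variable names are illustrative.

```r
x <- seq(as.POSIXct("2000-01-01", tz = "UTC"), by = "days", length.out = 20)

# cut the underlying numbers; dig.lab = 15 requests up to 15 significant
# digits in the labels so the break values can be read back
num_cut <- cut(as.numeric(x), 30, dig.lab = 15)
lev <- levels(num_cut)

# drop the opening bracket and keep the text before the comma:
# that is the lower bound of each interval
lower <- as.numeric(sub("^.(.*),.*$", "\\1", lev))

# relabel the levels with the lower bounds converted back to date-times
lower_dt <- as.POSIXct(lower, origin = "1970-01-01", tz = "UTC")
levels(num_cut) <- as.character(lower_dt)
nlevels(num_cut)  # all intervals keep a label, including the empty ones
```

Compared with computing the breaks directly, this round-trips through
formatted text, so it is more fragile if the label format changes.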