Liviu Andronic
2010-Feb-16 17:06 UTC
[R] OT: computing percentage changes with negative and zero values?
Dear all,

I need to compute percentage changes of my data, but unfortunately they contain both negative and zero values, and I am quite confused about how to proceed. Searching the internet, I found that many people have run into similar issues, with no obvious solution available. Over the last couple of weeks I have been playing with all the data transformations I could think of. Below I illustrate the issues encountered on a dummy example:

> x$var
 [1]  0.43 -0.79  0.69  0.76  0.00 -1.51 -0.71  0.80  1.17  1.58  1.48 -1.83 -0.88  1.44 -0.72 -0.22  1.89 -1.27 -0.76
[20]  1.33

- raw data: percentage variations of the original data (which contain negative and zero values) get messed up when passing from a negative to a positive value (e.g. going from -0.71 to 0.80 gives -2.127, a negative change for a rise), and around the value 0 (going from 0.00 to -1.51 gives -Inf).

> x[, "raw"] <- c(NA, diff(x$var) / x[1:19, "var"])

- raw data with abs denominator: compared to the above, this improves the handling of the signs, but it still fails around zero and in some cases gives unexpected results (see [1]).

> x[, "raw abs"] <- c(NA, diff(x$var) / abs(x[1:19, "var"]))

- raw data + constant: add a constant to the data to make them strictly positive, then compute the deltas.
This solves the negative and zero value problems, but I am not sure whether it introduces some bias along the way.

> x[, "raw +cst"] <- c(NA, diff(2 + x$var) / (2 + x[1:19, "var"]))

- log, car::box.cox.powers: both transformations involve adding a constant to the original data.

> x[, "log"] <- c(NA, diff(log(2 + x$var)) / log(2 + x[1:19, "var"]))
> require(car)
> x1 <- box.cox.powers(2 + x$var); x1$lambda
> x[, "box cox"] <- c(NA, diff(box.cox(2 + x$var, x1$lambda)) / box.cox(2 + x[1:19, "var"], x1$lambda))

- sqrt: very similar to the above, but the results are a bit different (and apparently better).

> x[, "sqrt"] <- c(NA, diff(sqrt(2 + x$var)) / sqrt(2 + x[1:19, "var"]))

- exp: the exponential transformation introduces too much, and unevenly distributed, variability (my actual data contain values bigger than 5), and the variations can quickly reach astronomical levels.

> x[, "exp"] <- c(NA, diff(exp(x$var)) / exp(x[1:19, "var"]))

- atan transformation: this is a home-grown solution, which ensures that values from -Inf to +Inf are mapped into the interval (0, pi).
Again, I am not sure what bias this might introduce.

> mytan <- function(x) .5*pi + atan(x)
> x[, "mytan"] <- c(NA, diff(mytan(x$var)) / mytan(x[1:19, "var"]))

The resulting data frame:

> round(x, 3)
     var    raw raw abs raw +cst    log   sqrt box cox    exp  mytan
1   0.43     NA      NA       NA     NA     NA      NA     NA     NA
2  -0.79 -2.837  -2.837   -0.502 -0.785 -0.294  -0.840 -0.705 -0.544
3   0.69 -1.873   1.873    1.223  4.191  0.491   6.289  3.393  1.411
4   0.76  0.101   0.101    0.026  0.026  0.013   0.038  0.073  0.021
5   0.00 -1.000  -1.000   -0.275 -0.317 -0.149  -0.407 -0.532 -0.293
6  -1.51   -Inf    -Inf   -0.755 -2.029 -0.505  -1.591 -0.779 -0.628
7  -0.71 -0.530   0.530    1.633 -1.357  0.623  -1.517  1.226  0.630
8   0.80 -2.127   2.127    1.171  3.043  0.473   4.631  3.527  1.355
9   1.17  0.462   0.462    0.132  0.121  0.064   0.185  0.448  0.084
10  1.58  0.350   0.350    0.129  0.105  0.063   0.169  0.507  0.059
11  1.48 -0.063  -0.063   -0.028 -0.022 -0.014  -0.035 -0.095 -0.012
12 -1.83 -2.236  -2.236   -0.951 -2.421 -0.779  -1.450 -0.963 -0.804
13 -0.88 -0.519   0.519    5.588 -1.064  1.567  -1.124  1.586  0.698
14  1.44 -2.636   2.636    2.071  9.902  0.753  16.643  9.176  1.985
15 -0.72 -1.500  -1.500   -0.628 -0.800 -0.390  -0.870 -0.885 -0.626
16 -0.22 -0.694   0.694    0.391  1.336  0.179   1.679  0.649  0.430
17  1.89 -9.591   9.591    1.185  1.356  0.478   2.333  7.248  0.960
18 -1.27 -1.672  -1.672   -0.812 -1.232 -0.567  -1.115 -0.958 -0.749
19 -0.76 -0.402   0.402    0.699 -1.684  0.303  -1.841  0.665  0.381
20  1.33 -2.750   2.750    1.685  4.592  0.639   7.559  7.085  1.711

As you can see, I am quite unsure how to proceed. My actual data represent financial EPS (earnings per share) forecasts, ranging from -1 to 5, so they have a "natural zero point" (see David Winsemius' comments in [2]). However, I need to compute percentage variations, since I am primarily interested in the evolution of the forecasts (for a given company), while EPS data for two different companies are not necessarily comparable. The percentage data would subsequently be used in statistical analyses (regression, etc.).
Please advise,
Liviu

[1] http://sci.tech-archive.net/Archive/sci.stat.math/2006-04/msg00544.html
[2] http://sci.tech-archive.net/Archive/sci.stat.math/2006-04/msg00548.html
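For anyone wanting to rerun the comparison, several of the columns in the table above can be reproduced from the printed x$var values alone. This is a minimal sketch, assuming only base R: the helper `pct_change()` and the column names `raw_abs`, `raw_cst` and `sqrt_cst` are hypothetical (not from the post), and the log, Box-Cox and exp columns are left out to keep it short. The last lines illustrate the worry about bias in the "raw + constant" approach, by computing the same changes with two different constants.

```r
# Hypothetical helper (not from the original post): relative change of f(v)
# between consecutive observations, (f(v[t]) - f(v[t-1])) / denom(f(v[t-1])).
pct_change <- function(v, f = identity, denom = identity) {
  fv <- f(v)
  n <- length(fv)
  c(NA, diff(fv) / denom(fv[-n]))
}

# The 20 dummy values printed as x$var in the post
var <- c(0.43, -0.79, 0.69, 0.76, 0.00, -1.51, -0.71, 0.80, 1.17, 1.58,
         1.48, -1.83, -0.88, 1.44, -0.72, -0.22, 1.89, -1.27, -0.76, 1.33)

mytan <- function(x) 0.5 * pi + atan(x)  # maps (-Inf, Inf) into (0, pi)

res <- data.frame(
  var      = var,
  raw      = pct_change(var),                               # x[t-1] as denominator
  raw_abs  = pct_change(var, denom = abs),                  # |x[t-1]| as denominator
  raw_cst  = pct_change(var, f = function(v) 2 + v),        # shift by the constant 2
  sqrt_cst = pct_change(var, f = function(v) sqrt(2 + v)),  # sqrt after the shift
  mytan    = pct_change(var, f = mytan)                     # atan-based squeeze
)
print(round(res, 3))

# Sensitivity of the "raw + constant" approach to the arbitrary constant k:
# the same data give different percentage changes for k = 2 and k = 10.
shift_cst <- function(k) pct_change(var, f = function(v) k + v)
print(round(cbind(k2 = shift_cst(2), k10 = shift_cst(10)), 3))
```

Since every shifted value is positive for both constants, a larger k enlarges every denominator, so each change shrinks in magnitude (the first one goes from about -0.502 with k = 2 to about -0.117 with k = 10). Any later regression on these values therefore inherits a scale fixed by the choice of k.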
Jim Lemon
2010-Feb-17 11:13 UTC
[R] OT: computing percentage changes with negative and zero values?
On 02/17/2010 04:06 AM, Liviu Andronic wrote:
> Dear all
> I need to compute percentage changes of my data, but unfortunately
> they contain both negative and zero values, and I am quite confused on
> how to proceed. Searching the internet I found that many people ran
> into similar issues, with no obvious solution available.
> ...
> As you have noticed, I'm quite unsure on how to proceed. My actual
> data represents financial EPS (earnings per share) forecasts, ranging
> from -1 to 5. So, it has a "natural zero point" (see David Winsemius'
> comments in [2]). However, I need to compute percentage variations
> since I am primarily interested in the evolution of the forecasts (for
> a given company), while EPS data between two companies are not
> necessarily comparable. The percentage data would subsequently be used
> in performing statistical analyses (regression, etc.).
>

Hi Liviu,
My understanding of percentage change is the absolute value of the change divided by the absolute value of the initial value, with the result multiplied by 100. Thus:

100*abs(diff(x$var))/abs(x$var[1:19])
 [1] 283.720930 187.341772  10.144928 100.000000        Inf  52.980132
 [7] 212.676056  46.250000  35.042735   6.329114 223.648649  51.912568
[13] 263.636364 150.000000  69.444444 959.090909 167.195767  40.157480
[19] 275.000000

gives me correct values (and the Inf for the fifth value is correct, as the denominator is zero). My definition might not be the only one, though.

Jim
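Jim's formula can be wrapped in a small function for reuse. This is a minimal sketch in base R; the name `abs_pct_change()` is hypothetical, and it uses the same 20 dummy values as the first message. Because of the abs() in the numerator, the measure keeps only the size of a change, not its direction. The last lines show that multiplying by sign(diff(v)) recovers the direction, which amounts to 100 times the "raw abs" column of the first message.

```r
# Hypothetical wrapper around Jim's definition:
# 100 * |x[t] - x[t-1]| / |x[t-1]|
abs_pct_change <- function(v) {
  n <- length(v)
  100 * abs(diff(v)) / abs(v[-n])
}

# The 20 dummy values printed as x$var in the first message
var <- c(0.43, -0.79, 0.69, 0.76, 0.00, -1.51, -0.71, 0.80, 1.17, 1.58,
         1.48, -1.83, -0.88, 1.44, -0.72, -0.22, 1.89, -1.27, -0.76, 1.33)

p <- abs_pct_change(var)
print(p)  # the fifth element is Inf because var[5] == 0

# Only the magnitude survives: a rise from 1 to 2 and a fall from 1 to 0
# both come out as 100.
print(abs_pct_change(c(1, 2)))
print(abs_pct_change(c(1, 0)))

# Reattaching the sign of each change gives 100 * diff(v) / abs(v[t-1]),
# i.e. the "raw abs" idea expressed in percent.
signed <- sign(diff(var)) * p
print(signed)
```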