Gavin Simpson
2009-Aug-26 14:52 UTC
[R] Batch replacement, by factor, of values in a data frame
Dear List,
I'm wondering if there is a better/cleaner/more efficient way of
replacing 0 values in a variable with the minimum of the non-missing and
non-zero values of that same variable, but doing it within the levels of
a factor?
Consider the dummy example data presented at the end of my message.
Within each 'Site' there are some 0 values and possibly some NA's. I can
compute the minimum of the non-missing and non-zero values by 'Site' as
indicated below, using aggregate() for example. Save for looping over the
'Site's and replacing 0's with the relevant minimum, is there a way of
using a vectorised approach to do the replacement?
Thanks in advance,
G
## dummy data
set.seed(123)
D <- data.frame(Site = factor(rep(LETTERS[1:5], times = 10)),
                Var = runif(5*10))
D <- D[with(D, order(Site, Var)), ]
## simulate some 0's
D[c(1,3,11,12,23,27,34,36,41,49), "Var"] <- 0
## just to complicate matters, some NA
D[sample(NROW(D), 3), "Var"] <- NA
head(D)
## Compute minimums per Site
aggregate(D$Var, by = list(Site = D$Site),
          FUN = function(x) min(x[x>0], na.rm = TRUE))
## How to replace the appropriate 0's with the appropriate minimum?
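One vectorised way to finish the aggregate() route is to look up each zero row's Site in the aggregate() result with match() and assign all the replacements in one step. This is a minimal sketch, assuming that only the 0's (not the NA's) should be replaced; the names `mins` and `zero` are illustrative, and the dummy data are rebuilt so the block runs on its own.

```r
## rebuild the dummy data from the question
set.seed(123)
D <- data.frame(Site = factor(rep(LETTERS[1:5], times = 10)),
                Var = runif(5*10))
D <- D[with(D, order(Site, Var)), ]
D[c(1,3,11,12,23,27,34,36,41,49), "Var"] <- 0
D[sample(NROW(D), 3), "Var"] <- NA

## per-Site minimum of the non-missing, non-zero values
## (aggregate() on a bare vector names the result column 'x')
mins <- aggregate(D$Var, by = list(Site = D$Site),
                  FUN = function(x) min(x[x > 0], na.rm = TRUE))

## positions of the 0's; which() drops the NA comparisons,
## so NA values are left untouched
zero <- which(D$Var == 0)

## match each zero row's Site to its row in 'mins' and replace in one go
D$Var[zero] <- mins$x[match(D$Site[zero], mins$Site)]
```

If a Site had no positive values at all, min() would return Inf with a warning, so that case may need separate handling.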
--
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Dr. Gavin Simpson [t] +44 (0)20 7679 0522
ECRC, UCL Geography, [f] +44 (0)20 7679 0565
Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/
UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Phil Spector
2009-Aug-26 15:06 UTC
[R] Batch replacement, by factor, of values in a data frame
The ave function is very handy for things like this:

## per-row minimum of the non-zero, non-missing values of that row's Site
mins = ave(D$Var, D$Site, FUN = function(x) min(x[x > 0], na.rm = TRUE))
## replace the 0's (and the NA's) with the Site minimum
D$Var = ifelse(is.na(D$Var) | D$Var == 0, mins, D$Var)

should do the required replacements.

- Phil Spector
Statistical Computing Facility
Department of Statistics
UC Berkeley
spector at stat.berkeley.edu
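ave() returns a vector the same length as its input, with each element replaced by FUN applied to that element's group, which is why its result can be fed straight into ifelse(). The small sketch below uses a hypothetical toy vector `x` and factor `g` (not from the thread) to show this. Because the ifelse() condition above also replaces the NA values, the sketch assumes only the 0's should change and indexes on `x == 0` alone.

```r
## hypothetical toy data: two groups, each with a 0, one with an NA
x <- c(0, 0.5, 0.2, NA, 0, 0.9, 0.3)
g <- factor(c("a", "a", "a", "a", "b", "b", "b"))

## group-wise minimum of the positive, non-missing values,
## recycled back to every position of its group
mins <- ave(x, g, FUN = function(v) min(v[v > 0], na.rm = TRUE))
print(mins)

## replace only the 0's; which() drops NA comparisons, so NA stays NA
zero <- which(x == 0)
x[zero] <- mins[zero]
print(x)
```

Here group "a" gets 0.2 and group "b" gets 0.3, while the NA in group "a" is kept.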