Hello,
Your function isn't hard to generalize so that it also handles numeric columns.
d <- read.table(text="
trg_type child_type_1
1 Scientists NA
2 of used
", header=TRUE, stringsAsFactors=TRUE)  # needed in R >= 4.0.0, where strings no longer become factors by default
str(d)
subs_na <- function(tok, na_factor_level = "NOT_REALIZED", na_num = 99999) {
  ifac <- which(sapply(tok, is.factor))   # factor columns
  inum <- which(sapply(tok, is.numeric))  # numeric columns
  for(i in ifac) {
    # add the new level first, otherwise the assignment produces NA (with a warning)
    levels(tok[, i]) <- c(levels(tok[, i]), na_factor_level)
    tok[is.na(tok[, i]), i] <- as.factor(na_factor_level)
  }
  for(i in inum)
    tok[is.na(tok[, i]), i] <- na_num
  tok
}
r1 <- substitute_na(d)  # your original function, quoted below, must be defined first
r2 <- subs_na(d)
str(r1)
str(r2)
identical(r1, r2) # TRUE
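Both functions add na_factor_level to the factor's levels before assigning it. A minimal illustration of why that step matters, using a one-column factor built from the example values (the variable names f, g and h are just for illustration):

```r
# A factor can only hold values that are among its levels.
f <- factor(c("Scientists", NA))

# Assigning a value that is not a level leaves NA in place.
# R warns "invalid factor level, NA generated" here; suppressed for brevity.
g <- f
suppressWarnings(g[is.na(g)] <- "NOT_REALIZED")
print(g)   # the NA is still there

# Adding the level first makes the same assignment succeed.
h <- f
levels(h) <- c(levels(h), "NOT_REALIZED")
h[is.na(h)] <- "NOT_REALIZED"
print(h)
```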
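You could use the same approach as in subs_na for character columns, Dates, etc.: select the columns with the matching test (is.character, inherits(x, "Date"), ...) and assign a suitable fill value to their NA entries.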
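A sketch of that generalization, assuming one fill value per column type. The function name subs_na_by_class, the na_chr argument and the Date fill value 1900-01-01 are hypothetical choices for illustration, not part of any package; pick sentinels that cannot occur in your real data. It applies lapply() over the columns instead of explicit loops; for factor and numeric columns it should give the same result as subs_na(), but it has not been benchmarked:

```r
# Hypothetical generalization of subs_na(): one fill value per column type.
subs_na_by_class <- function(tok,
                             na_factor_level = "NOT_REALIZED",
                             na_num = 99999,
                             na_chr = "NOT_REALIZED",          # hypothetical
                             na_date = as.Date("1900-01-01")) { # hypothetical sentinel
  # tok[] <- ... keeps the data.frame class, names and row names
  tok[] <- lapply(tok, function(col) {
    miss <- is.na(col)
    if (is.factor(col)) {
      # add the fill level before assigning it (union avoids a duplicate level)
      levels(col) <- union(levels(col), na_factor_level)
      col[miss] <- na_factor_level
    } else if (inherits(col, "Date")) {
      col[miss] <- na_date
    } else if (is.numeric(col)) {
      col[miss] <- na_num
    } else if (is.character(col)) {
      col[miss] <- na_chr
    }
    col   # other column types are returned unchanged
  })
  tok
}

# Usage on a small data frame with one column of each type
d2 <- data.frame(
  f  = factor(c("a", NA)),
  n  = c(1.5, NA),
  s  = c("x", NA),
  dt = as.Date(c("2012-08-22", NA)),
  stringsAsFactors = FALSE
)
r3 <- subs_na_by_class(d2)
str(r3)
```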
Hope this helps,
Rui Barradas
On 22-08-2012 20:16, Ingmar Schuster wrote:
> Hi,
>
> I have a data set with variables that are _not_ missing at random. Now I
> use a package for learning a Bayesian network which won't accept NA as a
> value. From a database I query data.frames with k, k+n, k+2n, ... variables
> (there are always at least k variables as leftmost columns). Using
> rbind.fill from the reshape package on two data frames I would get a data
> frame like
>
> trg_type child_type_1
> 1 Scientists NA
> 2 of used
>
> Now to get rid of NA values I use the following function, which works for
> data frames whose columns are all factors:
>
> substitute_na <- function(tok, na_factor_level = "NOT_REALIZED") {
>   for (i in seq_along(tok)) {
>     levels(tok[, i]) <- c(levels(tok[, i]), na_factor_level)
>   }
>   tok[is.na(tok)] <- as.factor(na_factor_level)
>   return(tok)
> }
>
> Is there a better/faster way to do it? It would also be great to be able to
> distinguish factor columns from numeric columns and use a special numeric
> value there. The current version of rbind.fill has no direct reference to
> the fill value that I could change for my purpose.
>
>
> Thanks!
>
> Ingmar
>