Gorjanc Gregor
2005-Feb-13 02:04 UTC
[R] corrupt data frame: columns will be truncated or padded with NAs in: format.data.frame(x, digits = digits)
Hello R users!

I have written a function (see the end of this message) that eases my work when analysing data in another programme, which sometimes needs a special data structure. However, I ran into several problems with the data frame it creates.

---------------------------------------------------------------

The data frame (produced from the example at the end) looks the way I want:

   c1 c2 f2 f1       y1.A       y2.A       y1.B        y2.B
1   1  2  M  A -1.2776840 -1.4695219         NA          NA
3   3  6  M  A  0.1593941  0.7581128         NA          NA
5   5 10  M  A  1.1085950  0.8556062         NA          NA
7   7 14  F  A -1.8259281  3.0675536         NA          NA
9   9 18  F  A  0.8017311 -0.1056571         NA          NA
2   2  4  M  B       <NA>       <NA>  0.3577166  0.27310051
4   4  8  M  B       <NA>       <NA> -0.8021399 -1.10060507
6   6 12  F  B       <NA>       <NA> -0.4912098  0.04526153
8   8 16  F  B       <NA>       <NA> -1.2522998 -1.03796810
10 10 20  F  B       <NA>       <NA> -0.3446779  0.53854276
Warning message:
corrupt data frame: columns will be truncated or padded with NAs in: format.data.frame(x, digits = digits)

Is this data frame really corrupted, as R says?

---------------------------------------------------------------

Then I have a problem with this function if a factor column is among the other columns, i.e. the columns that are being "divided" according to levels. For example, this call

mt.by.factor(x=data, factor="f1", common=c("c1", "c2"))

gives me:

   c1 c2 f1        y1.A        y2.A f2.A         y1.B       y2.B f2.B
1   1  2  A -0.02040825 -0.28686293    2           NA         NA   NA
3   3  6  A -0.60497978  0.84527030    2           NA         NA   NA
5   5 10  A -0.74968516 -0.01094755    2           NA         NA   NA
7   7 14  A  0.07658122 -0.30101228    1           NA         NA   NA
9   9 18  A -0.68788670 -0.02177379    1           NA         NA   NA
2   2  4  B        <NA>        <NA> <NA>  0.003037107  0.4067418    2
4   4  8  B        <NA>        <NA> <NA> -0.035371363 -1.9397670    2
6   6 12  B        <NA>        <NA> <NA>  0.970424682 -1.3881620    1
8   8 16  B        <NA>        <NA> <NA> -1.169746470  0.7670071    1
10 10 20  B        <NA>        <NA> <NA>  1.238606959 -0.1831825    1
Warning message:
corrupt data frame: columns will be truncated or padded with NAs in: format.data.frame(x, digits = digits)

Why are the factor columns 'f2.A' and 'f2.B' now represented as integers? It looks like I lost the factor class somewhere, but I do not know why. It must have happened in this part of the function (the whole function is at the end). Can anyone help me with this?

# - add all other columns but as a set for each level of a factor
levels <- unique(X[factor])
for (level in 1:length(unlist(levels))) {
  X[x[factor] == as.character(levels[level, ]),
    paste(other, as.character(levels[level, ]), sep=".")] <-
    x[x[factor] == as.character(levels[level, ]), other]
}

---------------------------------------------------------------

Another thing is the NAs. If I compute means I get:

> mean(data1$y1.A)
[1] -0.2067784
> mean(data1$y1.A, na.rm=T)
[1] -0.2067784
> mean(data1$y1.B)
[1] NA
> mean(data1$y1.B, na.rm=T)
[1] -0.5065222

So <NA> and NA do not behave the same. Is this OK? It does not really bother me, but I am just curious.

---------------------------------------------------------------

Here is the whole description of the function, the function itself and an example. Thanks in advance.
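The warning quoted above comes from format.data.frame, that is, from formatting the data frame for printing, so one way to look into the "corrupt" question is to compare the length of every column with the number of rows. The sketch below is only an illustration under assumptions: `check_columns` is a hypothetical helper that does not appear in the post, and `bad` is a toy object built by hand so that one column is too short. It does not show how the original function arrives at such a state. It is at least consistent with the means reported above: the five printed y1.A values average to -0.2067784, which is what mean(data1$y1.A) returns even without na.rm.

```r
# Hypothetical diagnostic helper (not from the original post):
# report length and class of every column and flag columns whose
# length differs from the number of rows.
check_columns <- function(df) {
  cols <- unclass(df)                        # plain list of columns
  lens <- vapply(cols, length, integer(1))
  data.frame(
    column = names(cols),
    length = unname(lens),
    class  = unname(vapply(cols, function(col) class(col)[1], character(1))),
    ok     = unname(lens) == nrow(df),
    stringsAsFactors = FALSE
  )
}

# Toy object, deliberately inconsistent: 4 rows, but y1.A holds only
# 2 values. Built by hand for illustration only.
bad <- structure(
  list(f1 = factor(c("A", "A", "B", "B")), y1.A = c(0.5, -0.5)),
  class = "data.frame",
  row.names = 1:4
)

res <- check_columns(bad)

# The short column holds no NA at all, so na.rm makes no difference.
m1 <- mean(bad$y1.A)
m2 <- mean(bad$y1.A, na.rm = TRUE)
```

Running check_columns(data1) on the result of mt.by.factor would show whether the y1.A and y2.A columns really have fewer elements than the data frame has rows.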
# mt.by.factor.R
#-------------------------------------------------------------------------
# What: Create multiple trait data frame by given factor
# Time-stamp: <2005-02-12 02:28:00 ggorjan>
#-------------------------------------------------------------------------
# Quite often one wants to treat a trait for different levels e.g. sex,
# breed, ... as a different trait. This function eases preparation of data
# for such an analysis.
#
# Input data frame with given variables is expanded in such a way, that
# output represents a data frame with c + l + n * v columns, where c is a
# number of common columns for all levels of a factor, l is a factor
# column, n is a number of levels in a factor and v number of variables
# that should be given for each level of a factor. Number of rows stays
# the same.
#
#-------------------------------------------------------------------------

# Example
n=10
(data <- data.frame(y1=rnorm(n=n),
                    y2=rnorm(n=n),
                    f1=factor(rep(c("A", "B"), n/2)),
                    f2=factor(c(rep(c("M"), n/2), rep(c("F"), n/2))),
                    c1=1:n,
                    c2=2*(1:n)))
(data1 <-mt.by.factor(x=data, factor="f1", common=c("c1", "c2", "f2")))
(data1 <-mt.by.factor(x=data, factor="f1", common=c("c1", "c2")))

# x <- data
factor <- "f1"
common <- c("c1", "c2")

# Function
mt.by.factor <- function(x, factor, common, sort=TRUE)
{
  # Checks
  if (!is.data.frame(x)) {
    stop("`x' must be a data frame")
  }
  if (!is.factor(x[[factor]])) {
    stop("`factor' must be a factor")
  }

  # Sort
  if (sort) {
    x <- x[order(x[, factor]),]
  }

  # New data frame
  X <- x[common]         # Common columns
  X[factor] <- x[factor] # Factor column

  # Other columns
  # - remove common and factor
  other <- names(x)
  for (i in 1:length(names(x[common]))) {
    other <- other[other != common[i]]
  }
  for (i in 1:length(names(x[factor]))) {
    other <- other[other != factor[i]]
  }

  # - add all other columns but as a set for each level of a factor
  levels <- unique(X[factor])
  for (level in 1:length(unlist(levels))) {
    X[x[factor] == as.character(levels[level, ]),
      paste(other, as.character(levels[level, ]), sep=".")] <-
      x[x[factor] == as.character(levels[level, ]), other]
  }
  return(X)
}

#-------------------------------------------------------------------------
# mt.by.factor.R ends here

--
Lep pozdrav / With regards,
    Gregor GORJANC

---------------------------------------------------------------
University of Ljubljana
Biotechnical Faculty       URI: http://www.bfro.uni-lj.si
Zootechnical Department    email: gregor.gorjanc <at> bfro.uni-lj.si
Groblje 3                  tel: +386 (0)1 72 17 861
SI-1230 Domzale            fax: +386 (0)1 72 17 888
Slovenia
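For comparison with the function above, here is a minimal sketch of one other way to build the per-level columns. It is not the author's method: `split_by_level` and its argument `by` are hypothetical names, and the sorting step is left out. Each remaining variable is copied whole and the rows outside the current level are set to NA, so every new column keeps the full number of rows and a factor such as f2 keeps its factor class. The data mirror the example in the post, with a fixed seed added as an assumption so the run is repeatable.

```r
# Hypothetical alternative (not the function from the post): one full-length
# column per remaining variable and factor level, NA outside that level.
split_by_level <- function(x, by, common) {
  stopifnot(is.data.frame(x), is.factor(x[[by]]))
  other <- setdiff(names(x), c(common, by))   # variables to split
  out <- x[c(common, by)]                     # common columns + factor column
  for (lev in levels(x[[by]])) {
    in_level <- x[[by]] == lev
    for (v in other) {
      col <- x[[v]]
      col[!in_level] <- NA                    # keeps the class of `col`
      out[[paste(v, lev, sep = ".")]] <- col
    }
  }
  out
}

# Data as in the post's example; set.seed() is an addition for repeatability.
set.seed(1)
n <- 10
data <- data.frame(y1 = rnorm(n = n),
                   y2 = rnorm(n = n),
                   f1 = factor(rep(c("A", "B"), n / 2)),
                   f2 = factor(c(rep("M", n / 2), rep("F", n / 2))),
                   c1 = 1:n,
                   c2 = 2 * (1:n))

wide <- split_by_level(data, by = "f1", common = c("c1", "c2"))
ncol(wide)   # 9 columns
```

With two common columns, one factor column, two levels and three remaining variables this gives 2 + 1 + 2 * 3 = 9 columns, in line with the c + l + n * v description in the header comment.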