Dear R-Help

I am running apply on a data.frame containing factors and numeric
columns. It appears to convert all columns to character. Does it
convert the data.frame into a matrix? Is this expected? I want it to
recognise numeric columns and round the numbers. Can I use another
function instead of apply, or should I use a for loop in this case?

> summary(xmat)
       A               B             C             D
 Min.   :  1.0   414    :  1   Stage 2:  5   Min.   :-0.075369
 1st Qu.:113.8   422    :  1   Stage 3:  6   1st Qu.:-0.018102
 Median :226.5   426    :  1   Stage 4:441   Median :-0.003033
 Mean   :226.5   436    :  1                 Mean   : 0.008007
 3rd Qu.:339.2   460    :  1                 3rd Qu.: 0.015499
 Max.   :452.0   462    :  1                 Max.   : 0.400578
                 (Other):446
       E                F                G
 Min.   :0.2345   Min.   :0.9808   Min.   :0.01558
 1st Qu.:0.2840   1st Qu.:0.9899   1st Qu.:0.02352
 Median :0.3265   Median :0.9965   Median :0.02966
 Mean   :0.3690   Mean   :1.0079   Mean   :0.03580
 3rd Qu.:0.3859   3rd Qu.:1.0129   3rd Qu.:0.03980
 Max.   :2.0422   Max.   :1.3742   Max.   :0.20062

> for(i in 1:7) print(class(xmat[,i]))
[1] "integer"
[1] "factor"
[1] "factor"
[1] "numeric"
[1] "numeric"
[1] "numeric"
[1] "numeric"

> apply(xmat, 2, class)
          A           B           C           D           E           F
"character" "character" "character" "character" "character" "character"
          G
"character"

Thanks for your help
Aedin
On Fri, 13 Apr 2007, aedin culhane wrote:

> Dear R-Help
> I am running apply on a data.frame containing factors and numeric
> columns. It appears to convert all columns to character. Does it
> convert the data.frame into a matrix? Is this expected? I want it to recognise

Yes, and quite explicit on the help page

Arguments:

       X: the array to be used.
              ^^^^^

     If 'X' is not an array but has a dimension attribute, 'apply'
     attempts to coerce it to an array via 'as.matrix' if it is
     two-dimensional (e.g., data frames) or via 'as.array'.

I am baffled as to how you managed to miss this, as it is part of the
homework the posting guide asked you to do *before* posting.

> numeric columns and round the numbers. Can I use another function instead
> of apply, or should I use a for loop in this case?

You haven't told us _what_ you want to do, but lapply works by column.

> > summary(xmat)
>        A               B             C             D
>  Min.   :  1.0   414    :  1   Stage 2:  5   Min.   :-0.075369
>  1st Qu.:113.8   422    :  1   Stage 3:  6   1st Qu.:-0.018102
>  Median :226.5   426    :  1   Stage 4:441   Median :-0.003033
>  Mean   :226.5   436    :  1                 Mean   : 0.008007
>  3rd Qu.:339.2   460    :  1                 3rd Qu.: 0.015499
>  Max.   :452.0   462    :  1                 Max.   : 0.400578
>                  (Other):446
>        E                F                G
>  Min.   :0.2345   Min.   :0.9808   Min.   :0.01558
>  1st Qu.:0.2840   1st Qu.:0.9899   1st Qu.:0.02352
>  Median :0.3265   Median :0.9965   Median :0.02966
>  Mean   :0.3690   Mean   :1.0079   Mean   :0.03580
>  3rd Qu.:0.3859   3rd Qu.:1.0129   3rd Qu.:0.03980
>  Max.   :2.0422   Max.   :1.3742   Max.   :0.20062
>
> > for(i in 1:7) print(class(xmat[,i]))
> [1] "integer"
> [1] "factor"
> [1] "factor"
> [1] "numeric"
> [1] "numeric"
> [1] "numeric"
> [1] "numeric"

Better, sapply(xmat, class).

> > apply(xmat, 2, class)
>           A           B           C           D           E           F
> "character" "character" "character" "character" "character" "character"
>           G
> "character"

Well, all columns of a matrix are of the same class.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
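
The coercion described on the help page can be seen directly on a small example. The sketch below is only an illustration: the data frame is a made-up stand-in for the poster's xmat (its column names and values are assumptions), and it simply contrasts apply(), which goes through as.matrix(), with sapply(), which visits the columns one at a time.

```r
# Hypothetical stand-in for the poster's xmat: integer, factor and
# numeric columns. Names and values are invented for illustration.
xmat <- data.frame(
  A = 1:4,
  B = factor(c("414", "422", "426", "436")),
  C = factor(c("Stage 2", "Stage 3", "Stage 4", "Stage 4")),
  D = c(-0.075369, -0.018102, -0.003033, 0.400578)
)

# apply() first coerces the data frame with as.matrix(). Because factor
# columns are present, the result is a character matrix, so every
# column handed to class() is a character vector.
m <- as.matrix(xmat)
print(mode(m))
print(apply(xmat, 2, class))

# sapply() treats the data frame as a list of columns, so each column
# keeps its own class.
col_classes <- sapply(xmat, class)
print(col_classes)
```

Checking mode(as.matrix(xmat)) before calling apply() is a quick way to see what type the function will actually receive.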
?apply says

    If X is not an array but has a dimension attribute, apply attempts
    to coerce it to an array via as.matrix if it is two-dimensional
    (e.g., data frames). . .

It would probably be easiest with a for loop, but you could also try
something like the code below (and insert your operations in #...).

myfunc <- function(x, classOfX) {
  # x arrives as one row of the matrix built by apply();
  # turn it back into a one-row data frame
  x <- as.data.frame(t(x))
  # rebuild the factor columns first
  factvars <- which(classOfX == "factor")
  x[,factvars] <- lapply(x[,factvars], factor)
  # then coerce every column back to its original class
  for (i in seq_along(x)) x[,i] <- as(x[,i], Class = classOfX[i])
  # ...
  return(x)
}

x <- data.frame(a = as.integer(1:10), b = factor(letters[1:10]), c = runif(10))

# Left fold of f over the list L, starting from x. The accumulated
# value is returned explicitly, since a for loop itself evaluates to NULL.
Fold <- function(f, x, L) {
  for (e in L) x <- f(x, e)
  x
}

y <- Fold(rbind, vector(), apply(x, 1, myfunc, rapply(x, class)))

> rapply(x, class)
        a         b         c
"integer"  "factor" "numeric"
> rapply(y, class)
        a         b         c
"integer"  "factor" "numeric"
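
For the step that stacks per-row results back into one data frame, base R already provides Reduce() and do.call(), so a hand-written fold is optional. The sketch below assumes a list of one-row data frames like the ones a row-wise function would return; the list `pieces` and its values are invented for illustration.

```r
# Hypothetical list of one-row data frames standing in for the
# per-row results; names and values are made up.
pieces <- list(
  data.frame(a = 1L, b = "x", c = 0.25),
  data.frame(a = 2L, b = "y", c = 0.50),
  data.frame(a = 3L, b = "z", c = 0.75)
)

# Reduce() folds rbind over the list from left to right:
# rbind(rbind(pieces[[1]], pieces[[2]]), pieces[[3]])
reduced <- Reduce(rbind, pieces)

# do.call() passes all list elements to a single rbind() call.
bound <- do.call(rbind, pieces)

print(reduced)
print(sapply(bound, class))
```

For long lists the single do.call(rbind, ...) call is usually the cheaper of the two, because the fold copies the growing data frame at every step.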
aedin culhane <aedin at jimmy.harvard.edu> writes:

> Dear R-Help
> I am running apply on a data.frame containing factors and numeric
> columns. It appears to convert all columns to character. Does it
> convert the data.frame into a matrix? Is this expected? I want it to
> recognise numeric columns and round the numbers. Can I use another
> function instead of apply, or should I use a for loop in this case?

If you want to modify the data.frame object, a for loop will likely be
the best bet. As noted in other replies, lapply will operate on the
columns of a data.frame since a data.frame is a list. But the return
value will be a list, not a data.frame.

I think for loops get a bad rap. There are times when they are
appropriate and even optimal in R programming.

+ seth

-- 
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
http://bioconductor.org
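
Both routes mentioned in this reply can be sketched on a toy data frame. This is a minimal illustration under stated assumptions, not the poster's actual data: the column names, the values, and the choice of rounding to two decimals are all invented. It shows a for loop that rounds only the double columns in place, then the lapply() version, whose list result can be written back with `xmat2[] <-` so that the object stays a data.frame.

```r
# Hypothetical data frame; names, values and the number of digits
# are assumptions made for illustration.
xmat <- data.frame(
  A = 1:3,
  C = factor(c("Stage 2", "Stage 3", "Stage 4")),
  D = c(-0.075369, 0.008007, 0.400578)
)

# for loop: round the double columns in place and leave integer and
# factor columns untouched.
rounded <- xmat
for (nm in names(rounded)) {
  if (is.double(rounded[[nm]])) {
    rounded[[nm]] <- round(rounded[[nm]], 2)
  }
}

# lapply() visits the same columns but returns a plain list.
as_list <- lapply(xmat, function(col) {
  if (is.double(col)) round(col, 2) else col
})
print(class(as_list))

# Assigning the list into xmat2[] replaces the columns while keeping
# the data.frame class and row names.
xmat2 <- xmat
xmat2[] <- as_list
print(class(xmat2))
print(all.equal(rounded, xmat2))
```

Testing with is.double() rather than is.numeric() is a deliberate choice here: it skips integer columns, which do not need rounding.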