Dear R-Help
I am running apply on a data.frame containing factors and numeric
columns. It appears to convert all columns into character. Does it
convert the data.frame into a matrix? Is this expected? I want it to
recognise the numeric columns and round the numbers. Can I use another
function instead of apply, or should I use a for loop in this case?
> summary(xmat)
A B C D
Min. : 1.0 414 : 1 Stage 2: 5 Min. :-0.075369
1st Qu.:113.8 422 : 1 Stage 3: 6 1st Qu.:-0.018102
Median :226.5 426 : 1 Stage 4:441 Median :-0.003033
Mean :226.5 436 : 1 Mean : 0.008007
3rd Qu.:339.2 460 : 1 3rd Qu.: 0.015499
Max. :452.0 462 : 1 Max. : 0.400578
(Other):446
E F G
Min. :0.2345 Min. :0.9808 Min. :0.01558
1st Qu.:0.2840 1st Qu.:0.9899 1st Qu.:0.02352
Median :0.3265 Median :0.9965 Median :0.02966
Mean :0.3690 Mean :1.0079 Mean :0.03580
3rd Qu.:0.3859 3rd Qu.:1.0129 3rd Qu.:0.03980
Max. :2.0422 Max. :1.3742 Max. :0.20062
> for(i in 1:7) print(class(xmat[,i]))
[1] "integer"
[1] "factor"
[1] "factor"
[1] "numeric"
[1] "numeric"
[1] "numeric"
[1] "numeric"
> apply(xmat, 2, class)
          A           B           C           D           E           F 
"character" "character" "character" "character" "character" "character" 
          G 
"character" 
Thanks for your help
Aedin
On Fri, 13 Apr 2007, aedin culhane wrote:

> Dear R-Help
> I am running apply on a data.frame containing factors and numeric
> columns. It appears to convert are columns into as.character? Does it
> convert data.frame into matrix? Is this expected? I wish it to recognise

Yes, and quite explicit on the help page

Arguments:

       X: the array to be used.
              ^^^^^

     If 'X' is not an array but has a dimension attribute, 'apply'
     attempts to coerce it to an array via 'as.matrix' if it is
     two-dimensional (e.g., data frames) or via 'as.array'.

I am baffled as to how you managed to miss this, as it is part of the
homework the posting guide asked you to do *before* posting.

> numerical columns and round numbers. Can I use another function instead
> of apply, or should I use a for loop in the case?

You haven't told us _what_ you want to do, but lapply works by column.

> > for(i in 1:7) print(class(xmat[,i]))

Better, sapply(xmat, class).

> > apply(xmat, 2, class)

Well, all columns of a matrix are of the same class.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
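A minimal illustration of both points above, using a made-up three-column data frame `toy` as a hypothetical stand-in for xmat (these are not Aedin's data): as.matrix() turns a data frame that contains a factor column into a character matrix, which is what apply() then works on, whereas sapply() walks the list of columns directly and sees each column's own class.

```r
# Hypothetical stand-in for xmat: integer, factor and double columns
toy <- data.frame(A = 1:3,
                  B = factor(c("414", "422", "426")),
                  D = c(-0.075369, 0.008007, 0.400578))

# apply() works on as.matrix(toy); with a factor column present this is
# a character matrix, so every column passed to FUN is character
m <- as.matrix(toy)
typeof(m)                 # "character"
apply(toy, 2, class)      # "character" for every column

# A data frame is a list of columns, so sapply()/lapply() see each
# column with its own class
sapply(toy, class)        # A "integer", B "factor", D "numeric"
```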
?apply says

  If X is not an array but has a dimension attribute, apply attempts to
  coerce it to an array via as.matrix if it is two-dimensional (e.g., data
  frames) ...
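One side effect of that as.matrix() coercion is worth knowing before processing rows with apply(): according to ?as.matrix, the data frame method returns a character matrix when a non-numeric column is present, converting the other non-character columns with format(). With the default options(digits = 7), doubles therefore reach FUN as strings with about 7 significant digits. A sketch on a hypothetical one-row data frame `z`:

```r
# Hypothetical one-row data frame with mixed column types
z <- data.frame(a = 1L, b = factor("z"), c = pi)

# With a factor present, as.matrix() gives a character matrix and the
# double column is converted with format(), i.e. ~7 significant digits
m <- as.matrix(z)
m[, "c"]                      # "3.141593"

# Converting the string back does not recover the original value
as.numeric(m[, "c"]) == pi    # FALSE
```

So values rebuilt from the rows of apply() may differ slightly from the originals, which matters if the goal is to round them afterwards.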
It would probably be easiest with a for loop, but you could also try
something like the code below (insert your own operations at the "# ..."
comment).
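# Rebuild one row (a character vector passed in by apply) as a one-row
# data frame, restoring the original column classes given in classOfX.
myfunc <- function(x, classOfX) {
  # keep strings as character so as() converts the values, not factor codes
  x <- as.data.frame(t(x), stringsAsFactors = FALSE)
  factvars <- which(classOfX == "factor")
  x[, factvars] <- lapply(x[, factvars, drop = FALSE], factor)
  for (i in seq(along = x)) x[, i] <- as(x[, i], Class = classOfX[i])
  # ...
  return(x)
}
x <- data.frame(a = as.integer(1:10), b = factor(letters[1:10]), c = runif(10))
# Left fold: combine the elements of L one at a time with f
Fold <- function(f, x, L) {
  for (e in L) x <- f(x, e)
  x  # return the accumulated value (a bare for loop returns NULL)
}
y <- Fold(rbind, vector(), apply(x, 1, myfunc, rapply(x, class)))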
> rapply(x,class)
a b c
"integer" "factor" "numeric"
> rapply(y,class)
a b c
"integer" "factor" "numeric"
aedin culhane <aedin at jimmy.harvard.edu> writes:

> [...] I wish it to recognise
> numerical columns and round numbers. Can I use another function instead
> of apply, or should I use a for loop in the case?

If you want to modify the data.frame object, a for loop will likely be
the best bet. As noted in other replies, lapply will operate on the
columns of a data.frame, since a data.frame is a list. But the return
value will be a list, not a data.frame.

I think for loops get a bad rap. There are times when they are
appropriate and even optimal in R programming.

+ seth

-- 
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
http://bioconductor.org
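Both options from the replies can be sketched as follows. This uses a hypothetical stand-in for xmat and assumes that "round numbers" means rounding the double columns to two decimal places. The first version is a for loop that modifies the data frame column by column. The second uses lapply(), whose result is a plain list; assigning it back with the `df[] <- lapply(...)` idiom (not mentioned in the thread) keeps the data.frame structure.

```r
# Hypothetical stand-in for xmat (integer, factor and double columns)
xmat <- data.frame(A = 1:3,
                   B = factor(c("414", "422", "426")),
                   D = c(-0.075369, 0.008007, 0.400578))

# 1. for loop: round only the double columns of a copy, in place
loop_res <- xmat
for (i in seq_along(loop_res)) {
  if (is.double(loop_res[[i]])) loop_res[[i]] <- round(loop_res[[i]], 2)
}

# 2. lapply over the columns returns a plain list ...
round_col <- function(col) if (is.double(col)) round(col, 2) else col
lst <- lapply(xmat, round_col)
class(lst)                   # "list"

# ... but assigning into the [] of a data frame keeps its structure
lapply_res <- xmat
lapply_res[] <- lapply(lapply_res, round_col)
sapply(lapply_res, class)    # A "integer", B "factor", D "numeric"
```

is.double() is used rather than is.numeric() so that the integer column is left untouched; factors are skipped by either test.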