thr3ads.net - R help - [R] aggregate.data.frame - prevent conversion to factors? show statistics for NA values of "by" variable? [Jul 2007]


Thomas Pujol

2007-Jul-31 20:18 UTC

[R] aggregate.data.frame - prevent conversion to factors? show statistics for NA values of "by" variable?

I have two questions regarding the "aggregate.data.frame" method of
the "aggregate" function.

My situation:

a. My "x" variable is a data.frame ("mydf") with two
columns, both columns of type/format "numeric".

b. My "by" variable is a data.frame("mybys") with two
columns, both columns of type/format "character".

c. Some of the values contained in "mybys" are originally
"NA".

Prior to submitting the by variables to the aggregate function, I convert the NA
values to the text-string "is_na". (I do this because I want to
understand the statistics of variables where their "by" value is NA,
and want this information in the results of the aggregate function.)

My questions:

1. Is there a "better" way (other than converting NAs to some
text-string) to see the "statistics" ("mean", etc.) of the
variables where the by is "NA"? (i.e. to have them included within the
results of the aggregate function)

2. When I run the aggregate function, the two columns that contain the
"by" variables are always formatted as "factors".  Is there
a way to prevent this, and to instead have them retain the format in the
original "mybys" data.frame (i.e. to have them come back formatted as
"character")?  Or do I just need to re-format them once I have my
results?



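# Data to aggregate: two numeric columns, each with one NA.
mydf <- data.frame(testvar1 = c(1, 3, 5, 7, 8, 3, 5, NA, 4, 5, 7, 9),
                   testvar2 = c(11, 33, 55, 77, 88, 33, 55, NA, 44, 55, 77, 99))
str(mydf)

# Grouping variables: character vectors (the numbers are coerced to strings).
myby1 <- c('red', 'blue', 1, 2, NA, 'big', 1, 2, 'red', 1, NA, 12)
myby2 <- c('wet', 'dry', 99, 95, NA, 'damp', 95, 99, 'red', 99, NA, NA)

# Recode NA as the string "is_na" so that those rows form their own group.
myby1.new <- ifelse(is.na(myby1), "is_na", myby1)
myby2.new <- ifelse(is.na(myby2), "is_na", myby2)
str(myby1.new)
str(myby2.new)

mybys <- data.frame(mbn1 = myby1.new, mbn2 = myby2.new,
                    stringsAsFactors = FALSE)
str(mybys)

# Aggregate, then inspect the types of the returned "by" columns.
myagg1 <- aggregate(x = mydf, by = mybys, FUN = mean)
str(myagg1)

# Convert the "by" columns back to character, one column at a time.
myagg2 <- myagg1
myagg2[names(mybys)] <- lapply(myagg1[names(mybys)], as.character)
str(myagg2)

myagg1
myagg2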

       

Prof Brian Ripley

2007-Aug-01 03:24 UTC


The behaviour has been changed in the R-devel version of R, so the 'by' 
columns are not converted to factors.
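For readers on an R version that still returns factors, a version-independent cleanup can be sketched as follows. The helper name factors_to_character is hypothetical (it is not part of R), and the small data set is made up for illustration. The helper converts any factor columns of a result to character and leaves the other columns alone, so it is harmless on versions where the 'by' columns already come back as character.

```r
# Hypothetical helper (not part of base R): factor columns -> character.
factors_to_character <- function(df) {
  df[] <- lapply(df, function(col) {
    if (is.factor(col)) as.character(col) else col
  })
  df
}

# Illustrative data, not taken from the original post.
x  <- data.frame(v = c(1, 3, 5, 7))
by <- data.frame(g = c("a", "b", "a", "b"), stringsAsFactors = FALSE)

agg <- factors_to_character(aggregate(x = x, by = by, FUN = mean))
str(agg)  # g is character whether or not aggregate() returned a factor
```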

On Tue, 31 Jul 2007, Thomas Pujol wrote:

> I have two questions regarding the "aggregate.data.frame" method of the
> "aggregate" function.
>
> My situation:
>
> a. My "x" variable is a data.frame ("mydf") with two columns, both
> columns of type/format "numeric".
>
> b. My "by" variable is a data.frame ("mybys") with two columns, both
> columns of type/format "character".
>
> c. Some of the values contained in "mybys" are originally "NA".

I think you mean NA_character_, not the same thing.
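The distinction can be seen in a short sketch (the vector x is made up for illustration): a missing value inside a character vector is NA_character_, whereas "NA" in quotes is an ordinary two-character string that is not missing.

```r
x <- c("red", NA, "NA")

is.na(x)                        # FALSE TRUE FALSE: only the second element is missing
identical(x[2], NA_character_)  # TRUE: NA in a character vector is NA_character_
x[3] == "NA"                    # TRUE: the quoted "NA" is just a string
```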

> Prior to submitting the by variables to the aggregate function, I
> convert the NA values to the text-string "is_na". (I do this because I
> want to understand the statistics of variables where their "by" value is
> NA, and want this information in the results of the aggregate function.)
>
> My questions:
>
> 1. Is there a "better" way (other than converting NAs to some
> text-string) to see the "statistics" ("mean", etc.) of the variables
> where the by is "NA"? (i.e. to have them included within the results of
> the aggregate function)

You need to tell R that the NA (not "NA") values form a group, which is
not obvious as they are unknown.  So you do need to recode them.  Making
them a factor is the obvious way (with exclude=""), and I don't understand
your aversion to factors for categorical variables.
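A minimal sketch of the factor approach, on made-up data: the grouping vector is turned into a factor that keeps NA as a level. It is written here with exclude = NULL, the documented way to retain NA as a level, where the reply above uses exclude="". The claim that aggregate() then reports the NA group as its own row is an assumption about current R releases, not something stated in the thread.

```r
# Illustrative data, not taken from the original post.
x <- data.frame(v = c(1, 2, 3, 4, 5))
g <- c("a", NA, "a", NA, "b")

# exclude = NULL keeps NA as a level, so the missing values form a group.
gf <- factor(g, exclude = NULL)
levels(gf)

# Assumption: with NA as a level, aggregate() keeps those rows as one group.
agg <- aggregate(x = x, by = list(g = gf), FUN = mean)
agg
```

Note that is.na() on such a factor returns FALSE for the NA level, so test the result with is.na(as.character(agg$g)) instead.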

> 2. When I run the aggregate function, the two columns that contain the
> "by" variables are always formatted as "factors".  Is there a way to
> prevent this, and to instead have them retain the format in the original
> "mybys" data.frame (i.e. to have them come back formatted as "character")?
> Or do I just need to re-format them once I have my results?
-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
