Hi,
I am trying to aggregate some data and I am confused by the results.
I load a data frame "all" from a csv file, and then I do:
(FOO,BAR,X,Y come from the header line in the csv file,
BTW, how do I rename a column?)
byFOO <- aggregate(list(all$BAR,all$QUUX,all$X/all$Y),
                     by = list(FOO=all$FOO),
                     FUN = mean);
I expect a data frame with 4 columns: FOO,BAR,QUUX and X/Y with all FOO
being different (they are character strings, do I need a special
incantation to turn them into factors?)
what I get is indeed a data frame but with names
[1] "FOO"
[2] "c.1.78e.11..4.38e.09..1.461e.11..4.3186e.10..1.1181e.10..5.5389e.10.."
[3] "c.33879300..3713870..190963000..7042170..4590010..91569200..12108200.."
[4] "c.1.37087599544937..1.72690992018244..1.82034830430797..1.70338983050847.."
why? how do I fix the column names?
then I am trying to add to that same frame byFOO some other columns:
byFOO$Count <- aggregate(all$FOO, by = list(all$FOO), FUN = length);
byFOO$Mean <- aggregate(all$Value, by = list(all$FOO), FUN = mean);
byFOO$Total <- aggregate(all$Value, by = list(all$FOO), FUN = sum);
however, byFOO$Count et al are not columns in byFOO with the appropriate
names ("Count" &c) but data frames with columns
"Group.1" and "x".
Luckily, at least it appears that byFOO$Count$Group.1 is the same as
byFOO$FOO, as they should be, although I don't see any function which
would check that two vectors are the same ("==" returns a vector which I
have to manually inspect for presence of "FALSE").
So, how do I aggregate the data frame?
How do I rename a column?
How do I check that two vectors are the same?
thanks a lot!
PS. I have not used R for a few years, so please be gentle...
PPS. Please do not tell me to RTFM - I did. At least tell me what to
search for.
-- 
Sam Steingold (http://sds.podval.org/) on CentOS release 5.3 (Final)
http://dhimmi.com http://camera.org http://palestinefacts.org
http://memri.org http://jihadwatch.org http://ffii.org http://pmw.org.il
There are 10 kinds of people: those who count in binary and those who do not.
On 02/15/2011 12:42 AM, Sam Steingold wrote:
> why? how do I fix the column names?

The fact that the column names in your aggregate result contain multiple
numbers suggests that something has gone wrong with reading your data in
from the file. Have you had a look at your data.frame 'all'? Are BAR, X,
etc. numeric? Judging from the 'c. etc' they aren't.

> So, how do I aggregate the data frame?

aggregate accepts either a data.frame or a vector as its first argument
(actually anything that can be coerced into a data.frame). In the case of
a data.frame it applies the aggregation function to each column. So your
first aggregate call should be OK (except that your input might be wrong,
see above). However, you didn't use named arguments in your list(), so R
generates names for you. Hence the strange names.

aggregate returns a data.frame. So if you want to combine more than one
aggregate call, you can use merge to merge the results:

    Count <- aggregate(all$FOO, by = list(FOO=all$FOO), FUN = length)
    byFOO <- merge(byFOO, Count, by="FOO")

If you want a vector, you could use tapply.

> How do I rename a column?

?names, e.g.

    names(all) <- c("column1", "column2", ...)

> How do I check that two vectors are the same?

?all

    all(vector1 == vector2)

but first have a look at:
http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f

HTH,
Jan
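A minimal sketch of the advice above (named list elements, then merge), not a definitive recipe. The data frame all_df and its values are hypothetical stand-ins for the poster's CSV data, and XoverY is an illustrative name for the X/Y column:

```r
# Hypothetical stand-in for the CSV data; only the column names come from the thread.
all_df <- data.frame(
  FOO  = c("a", "b", "a", "b"),
  BAR  = c(1, 2, 3, 4),
  QUUX = c(10, 20, 30, 40),
  X    = c(2, 4, 6, 8),
  Y    = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# Naming the list elements gives the result readable column names.
byFOO <- aggregate(
  list(BAR = all_df$BAR, QUUX = all_df$QUUX, XoverY = all_df$X / all_df$Y),
  by  = list(FOO = all_df$FOO),
  FUN = mean
)

# A second aggregate() returns its own data frame, so name its columns too.
Count <- aggregate(list(Count = all_df$FOO),
                   by = list(FOO = all_df$FOO), FUN = length)

# merge() joins the two results on the shared key column.
byFOO <- merge(byFOO, Count, by = "FOO")

# identical() yields a single TRUE/FALSE for two vectors;
# all.equal() compares numbers up to a small tolerance.
same_keys    <- identical(byFOO$FOO, Count$FOO)
close_enough <- isTRUE(all.equal(0.1 + 0.2, 0.3))
```

Merging on the key avoids relying on both results happening to list the groups in the same order.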
On 2011-02-14 15:42, Sam Steingold wrote:
> what I get is indeed a data frame but with names
>
> [1] "FOO"
> [2] "c.1.78e.11..4.38e.09..1.461e.11..4.3186e.10..1.1181e.10..5.5389e.10.."
> [3] "c.33879300..3713870..190963000..7042170..4590010..91569200..12108200.."
> [4] "c.1.37087599544937..1.72690992018244..1.82034830430797..1.70338983050847.."

I think that all you need is to provide names in your aggregate() call:

    byFOO <- aggregate(list(V1 = all$BAR, V2 = ....)

For renaming variables in the data frame at any time, see help(names).

Peter Ehlers
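For the renaming question, a short sketch of what help(names) describes. The data frame res is a hypothetical aggregate() result carrying the default "Group.1" and "x" names mentioned in the thread; the new names are illustrative only:

```r
# Hypothetical result with the default names aggregate() assigns.
res <- data.frame(Group.1 = c("a", "b"), x = c(2L, 2L))

# Replace every column name at once...
names(res) <- c("FOO", "Count")

# ...or rename a single column by matching its current name.
names(res)[names(res) == "Count"] <- "N"
```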
> * Sam Steingold <fqf at tah.bet> [2011-02-14 18:42:40 -0500]:
>
> byFOO$Mean <- aggregate(all$Value, by = list(all$FOO), FUN = mean)$x;

this fails with

There were 50 or more warnings (use warnings() to see the first 50)
> warnings()
Warning messages:
1: In mean.default(X[[1L]], ...) :
  argument is not numeric or logical: returning NA
2: In mean.default(X[[2L]], ...) :
  argument is not numeric or logical: returning NA
3: In mean.default(X[[3L]], ...) :
  argument is not numeric or logical: returning NA
4: In mean.default(X[[4L]], ...) :
  argument is not numeric or logical: returning NA
5: In mean.default(X[[5L]], ...) :
  argument is not numeric or logical: returning NA

note that there are absolutely no NAs in all$Value:

> all(!is.na(all$Value))
[1] TRUE

-- 
Sam Steingold (http://sds.podval.org/) on CentOS release 5.3 (Final)
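The warning text suggests that the Value column was stored as character or factor rather than numeric, which is a different problem from missing values: a string that is not a number is still not NA. A possible way to check, sketched on hypothetical data (the vectors value and value_fac are invented for illustration):

```r
# Hypothetical column read as text because of one stray entry.
value <- c("1.5", "2.5", "n/a", "4")

# class() shows how R stored the column; mean() needs numeric or logical input.
value_class <- class(value)

# as.numeric() turns unparseable entries into NA (normally with a warning),
# which points at the offending rows.
value_num <- suppressWarnings(as.numeric(value))
bad_rows  <- which(is.na(value_num) & !is.na(value))

# For a factor column, go through as.character() first;
# as.numeric() on a factor returns the internal level codes instead.
value_fac <- factor(c("10", "20", "10"))
from_fac  <- as.numeric(as.character(value_fac))
```

Running str() on the whole data frame right after reading the file shows the storage type of every column at once.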