Hello,

I have a huge data frame with three columns, 'Roof', 'Month' and 'Temp'. I want to run analyses on the numeric Temp data by the factors Roof and Month, separately and together. For more than one factor I understand I should use aggregate(), but I am struggling with tapply() for single-factor analysis.

> tapply(Temp, INDEX = Roof, FUN = median)

This works fine. However, if I try to do anything a bit more complex, such as:

> tapply(Temp, INDEX = Roof, FUN = kruskal.test)

it gives the error:

Error in length(g) : 'g' is missing

What could be the problem?

Thanks

--
View this message in context: http://r.789695.n4.nabble.com/tapply-confusion-tp4641729.html
Sent from the R help mailing list archive at Nabble.com.
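For reference, here is a minimal sketch of the single-factor and two-factor summaries described in the question. It assumes a data frame named `dfrm` with columns Roof, Month and Temp. The data below are simulated purely for illustration, so the names and values are hypothetical.

```r
# Hypothetical example data: 3 roofs x 4 months, 10 readings per cell
set.seed(1)
dfrm <- expand.grid(Roof = c("A", "B", "C"), Month = month.abb[1:4], rep = 1:10)
dfrm$Temp <- rnorm(nrow(dfrm), mean = 20, sd = 3)

# Single factor: median Temp for each Roof
med_roof <- tapply(dfrm$Temp, INDEX = dfrm$Roof, FUN = median)
print(med_roof)

# Two factors with tapply(): a Roof x Month matrix of medians
med_both <- tapply(dfrm$Temp, INDEX = list(dfrm$Roof, dfrm$Month), FUN = median)
print(med_both)

# Two factors with aggregate(): one row per Roof/Month combination
med_agg <- aggregate(Temp ~ Roof + Month, data = dfrm, FUN = median)
print(med_agg)
```

median() works with tapply() because it only needs the one vector of values that tapply() hands it for each group.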
On Wednesday, 29 August 2012 at 07:37 -0700, andyspeak wrote:

> > tapply(Temp, INDEX = Roof, FUN = kruskal.test)
>
> it gives the error - Error in length(g) : 'g' is missing
>
> What could be the problem?

If you read ?kruskal.test, you'll notice that its default method takes (at least) two arguments, the second being "g". Its description is:

g: a vector or factor object giving the group for the corresponding elements of 'x'. Ignored if 'x' is a list.

So you do not need tapply(); just call:

kruskal.test(Temp, Roof)

The "theoretical" reason you cannot use tapply() is that it calls FUN separately for each subset of the data. kruskal.test() would never be passed the whole data set, which it needs in order to test for differences between the groups.

Regards
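A minimal sketch of the whole-data call suggested above, with the formula interface alongside it. The data frame `dfrm` and its values are simulated and hypothetical. The try() line reproduces the failure: tapply() gives kruskal.test() one roof's values at a time and never supplies g.

```r
# Hypothetical simulated data standing in for the poster's data frame
set.seed(42)
dfrm <- data.frame(
  Roof = factor(rep(c("A", "B", "C"), each = 20)),
  Temp = c(rnorm(20, 20), rnorm(20, 21), rnorm(20, 23))
)

# Default method: x = the data, g = the grouping factor
kt1 <- kruskal.test(dfrm$Temp, dfrm$Roof)

# Equivalent formula interface
kt2 <- kruskal.test(Temp ~ Roof, data = dfrm)
print(kt1)

# tapply() calls kruskal.test() once per roof with only x, so g is missing
res <- try(tapply(dfrm$Temp, dfrm$Roof, kruskal.test), silent = TRUE)
print(inherits(res, "try-error"))
```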
On Aug 29, 2012, at 7:37 AM, andyspeak wrote:

>> tapply(Temp, INDEX = Roof, FUN = kruskal.test)
>
> it gives the error - Error in length(g) : 'g' is missing

What is the sound of one hand clapping? You are sending a bunch of single vectors, with no grouping variable, to a function that expects two data columns. Maybe you should explain in natural language what test you had in mind, and we could help get you there.

--
David Winsemius, MD
Alameda, CA, USA
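The "one hand clapping" point can be seen directly: tapply() calls FUN once per group and passes it only that group's values. In this sketch the vectors Temp and Roof are simulated stand-ins with hypothetical values, and FUN simply reports how many values it received.

```r
# Hypothetical simulated vectors: 3 roofs, 4 readings each
set.seed(7)
Temp <- rnorm(12, mean = 20)
Roof <- factor(rep(c("A", "B", "C"), times = 4))

# FUN receives one roof's Temp values per call, never Roof itself
seen <- tapply(Temp, Roof, function(x) length(x))
print(seen)
```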
Hello,

Thank you for the help.

kruskal.test(Temp, Roof) is simple, but it returns just one result for the whole temperature data set, grouped by roof.

I want to compare the Temp data between the roofs within each month. Because I have temperature data for the three roofs over 16 different months, I want 16 separate kruskal.test() results. How do I do this?

Thanks
Actually, it's okay. I just created 16 subsets of the data frame, one for each month, and then ran kruskal.test() 16 times. I'm sure there is a nice way to code this so that it runs automatically and produces a tidy table of the results, but I only started learning R two weeks ago!

Thanks for all the help
On Aug 30, 2012, at 4:02 AM, andyspeak wrote:

> I want to compare the Temp data for each Roof in each Month. So because i
> have temperature data on the three roofs for 16 different months then i want
> 16 separate kruskal.test results.

lapply(split(dfrm, dfrm$Month), function(xfrm) {
  kruskal.test(xfrm[["Temp"]], xfrm[["Roof"]])
})

Notice that I used an assumed name, dfrm, for the data frame. You have apparently been following unwise advice to use attach(); you would be well advised to disregard that advice.

--
David Winsemius, MD
Alameda, CA, USA
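Building on that lapply() call, the "nice table of the results" asked for earlier can be assembled by pulling the statistic, degrees of freedom and p-value out of each htest object. This is a sketch rather than code from the thread: the data are simulated, and the names `tests` and `results` are hypothetical.

```r
# Hypothetical simulated data: 3 roofs x 16 months, 10 readings per cell
set.seed(123)
dfrm <- expand.grid(Roof = c("A", "B", "C"), Month = 1:16, rep = 1:10)
dfrm$Temp <- rnorm(nrow(dfrm), mean = 20 + as.integer(dfrm$Roof), sd = 2)

# One Kruskal-Wallis test of Temp by Roof within each month
tests <- lapply(split(dfrm, dfrm$Month), function(xfrm) {
  kruskal.test(xfrm[["Temp"]], xfrm[["Roof"]])
})

# Collect the pieces of each htest object into one summary table
results <- data.frame(
  Month     = names(tests),
  statistic = sapply(tests, function(t) unname(t$statistic)),
  df        = sapply(tests, function(t) unname(t$parameter)),
  p.value   = sapply(tests, function(t) t$p.value),
  row.names = NULL
)
print(results)
```

Using kruskal.test(Temp ~ Roof, data = xfrm) inside the function would give the same tests; the choice between the two call styles is a matter of taste.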