José Augusto M. de Andrade Junior
2008-Jan-03  10:53 UTC
[R] Data frame manipulation - newbie question
Hi all,

Could someone please explain how I can efficiently query a data frame
with several factors, as shown below:

----------------------------------------------------------------------
Data frame: pt.knn
----------------------------------------------------------------------
row | k.idx | step.forwd | pt.num | model | prev | value | abs.error
  1 |  200  |     0      |   1    | lm    |  9   | 10.5  |   1.5
  2 |  200  |     0      |   2    | lm    | 11   | 10.5  |   1.5
  3 |  201  |     1      |   1    | lm    | 10   | 12    |   2.0
  4 |  201  |     1      |   2    | lm    | 12   | 12    |   2.0
  5 |  202  |     2      |   1    | lm    | 12   | 12.1  |   0.1
  6 |  202  |     2      |   2    | lm    | 12   | 12.1  |   0.1
  7 |  200  |     0      |   1    | rlm   | 10.1 | 10.5  |   0.4
  8 |  200  |     0      |   2    | rlm   | 10.3 | 10.5  |   0.2
  9 |  201  |     1      |   1    | rlm   | 11.6 | 12    |   0.4
 10 |  201  |     1      |   2    | rlm   | 11.4 | 12    |   0.6
 11 |  202  |     2      |   1    | rlm   | 11.8 | 12.1  |   0.1
 12 |  202  |     2      |   2    | rlm   | 11.9 | 12.1  |   0.2
----------------------------------------------------------------------

The k.idx, step.forwd, pt.num and model columns are FACTORS.
prev, value and abs.error are numeric.

I need to take the mean value of the numeric columns (prev, value and
abs.error) for each combination of k.idx, step.forwd and model. So rows
1 and 2, 3 and 4, 5 and 6, 7 and 8, 9 and 10, 11 and 12 must be grouped
together.

Next, I need to plot a boxplot of the mean(abs.error) of each model
for each k.idx.
I need to compare the abs.error of the two models for each step and
the overall mean abs.error of each model. And so on.

I read the manuals, but the examples there are too simple. I know how
to do this manipulation in a "brute force" manner, but I wish to learn
how to work the right way with R.

Could someone help me?
Thanks in advance.

José Augusto
Undergraduate student
University of São Paulo
Business Administration Faculty
Hi

r-help-bounces at r-project.org wrote on 03.01.2008 11:53:38:

> I need to take the mean value of the numeric columns (prev, value and
> abs.error) for each k.idx and step.forwd and model. So: rows 1 and 2,
> 3 and 4, 5 and 6, 7 and 8, 9 and 10, 11 and 12 must be grouped
> together.

aggregate(numeric.columns, list(factors), mean)

> Next, i need to plot a boxplot of the mean(abs.error) of each model
> for each k.idx.

Maybe

boxplot(split(abs.error, interaction(k.idx, model)))

Regards
Petr
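The two one-liners above are schematic (numeric.columns and factors are placeholders), so here is a minimal sketch of how they might look when filled in for the posted data. The data frame is rebuilt from the table in the original post; the helper name num.cols, the object name means, and the second boxplot call are assumptions added for illustration, not part of the reply.

```r
# Rebuild the example data frame from the original post
# (values copied from the posted table).
pt.knn <- data.frame(
  k.idx      = factor(rep(rep(c(200, 201, 202), each = 2), times = 2)),
  step.forwd = factor(rep(rep(0:2, each = 2), times = 2)),
  pt.num     = factor(rep(1:2, times = 6)),
  model      = factor(rep(c("lm", "rlm"), each = 6)),
  prev       = c(9, 11, 10, 12, 12, 12, 10.1, 10.3, 11.6, 11.4, 11.8, 11.9),
  value      = rep(rep(c(10.5, 12, 12.1), each = 2), times = 2),
  abs.error  = c(1.5, 1.5, 2.0, 2.0, 0.1, 0.1, 0.4, 0.2, 0.4, 0.6, 0.1, 0.2)
)

# Hypothetical helper: names of the numeric columns to average.
num.cols <- c("prev", "value", "abs.error")

# One row per observed (k.idx, step.forwd, model) combination,
# holding the mean of each numeric column.
means <- aggregate(
  pt.knn[, num.cols],
  by  = with(pt.knn, list(k.idx = k.idx, step.forwd = step.forwd, model = model)),
  FUN = mean
)
print(means)

# The suggestion from the reply: one box per k.idx-by-model combination,
# drawn from the raw abs.error values.
boxplot(split(pt.knn$abs.error, interaction(pt.knn$k.idx, pt.knn$model)),
        ylab = "abs.error")

# Another possible reading of the request (an assumption): one box per
# model, drawn from the per-group mean abs.error computed above.
boxplot(abs.error ~ model, data = means, ylab = "mean abs.error")
```

With only two points per group, each box in the first plot summarises just two values, so the second plot may be closer to what was asked for. Which one is wanted depends on how "mean(abs.error) of each model for each k.idx" is read.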
Hi,

you may want to use the apply / tapply functions. Some find them a bit
hard to grasp at first, but they will help you in many situations once
you get the hang of them.

Maybe you can get some information on my site:
http://www.rensenieuwenhuis.nl/r-project/manual/basics/tables/

Hope this helps,

Rense Nieuwenhuis
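As a rough illustration of the tapply approach mentioned above, the sketch below applies it to the remaining parts of the question: comparing the two models at each step and getting the overall mean abs.error per model. Only the three columns needed are rebuilt from the posted table; the object names by.step, step.diff and overall are assumptions chosen for this example.

```r
# Only the columns needed here, copied from the posted table.
pt.knn <- data.frame(
  step.forwd = factor(rep(rep(0:2, each = 2), times = 2)),
  model      = factor(rep(c("lm", "rlm"), each = 6)),
  abs.error  = c(1.5, 1.5, 2.0, 2.0, 0.1, 0.1, 0.4, 0.2, 0.4, 0.6, 0.1, 0.2)
)

# Mean abs.error per step (rows) and model (columns).
# Passing a list of two factors to tapply returns a matrix.
by.step <- with(pt.knn,
                tapply(abs.error,
                       list(step.forwd = step.forwd, model = model),
                       mean))
print(by.step)

# Difference between the two models at each step
# (positive values mean lm has the larger mean error).
step.diff <- by.step[, "lm"] - by.step[, "rlm"]
print(step.diff)

# Overall mean abs.error of each model.
overall <- with(pt.knn, tapply(abs.error, model, mean))
print(overall)
```

tapply returns an array indexed by factor levels, which is convenient for side-by-side comparison; aggregate returns a data frame instead, which tends to be easier to pass on to plotting or modelling functions.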