Giovanni Azua
2011-Nov-14 00:33 UTC
[R] 2^k*r (with replications) experimental design question
Hello,

I have one replication (r=1 of the 2^k*r) of a 2^k experimental design in the context of performance analysis, i.e. my response variables are Throughput and Response Time. I use the "aov" function and the results look OK:

> str(throughput)
'data.frame':   286 obs. of  7 variables:
 $ Time          : int  6 7 8 9 10 11 12 13 14 15 ...
 $ Throughput    : int  42 44 33 41 43 40 37 40 42 37 ...
 $ No_databases  : Factor w/ 2 levels "1","4": 1 1 1 1 1 1 1 1 1 1 ...
 $ Partitioning  : Factor w/ 2 levels "sharding","replication": 1 1 1 1 1 1 1 1 1 1 ...
 $ No_middlewares: Factor w/ 2 levels "2","4": 1 1 1 1 1 1 1 1 1 1 ...
 $ Queue_size    : Factor w/ 2 levels "40","100": 1 1 1 1 1 1 1 1 1 1 ...
 $ No_clients    : Factor w/ 1 level "128": 1 1 1 1 1 1 1 1 1 1 ...
> head(throughput)
  Time Throughput No_databases Partitioning No_middlewares Queue_size
1    6         42            1     sharding              2         40
2    7         44            1     sharding              2         40
3    8         33            1     sharding              2         40
4    9         41            1     sharding              2         40
5   10         43            1     sharding              2         40
6   11         40            1     sharding              2         40
>
> throughput.aov <- aov(Throughput~No_databases+Partitioning+No_middlewares+Queue_size,data=throughput)
> summary(throughput.aov)
                Df    Sum Sq  Mean Sq F value    Pr(>F)
No_databases     1  28488651 28488651 53.4981 2.713e-12 ***
Partitioning     1     71687    71687  0.1346  0.713966
No_middlewares   1   5624454  5624454 10.5620  0.001295 **
Queue_size       1     50892    50892  0.0956  0.757443
Residuals      281 149637226   532517
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>

This is roughly what I expected, and I am happy: it says that Throughput is significantly affected, first by the number of database instances and second by the number of middleware instances.

The problem is that I need to integrate multiple replications of this same 2^k design so I can also account for experimental error, i.e. the _r_ of 2^k*r, but I can't see how to integrate the _r_ term into the data and into the aov function parameters. Can anyone advise?

TIA,
Best regards,
Giovanni
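A minimal sketch of what a 2^k*r layout could look like in R, assuming the four two-level factors shown by str(throughput) and a hypothetical replicate count of r = 3; the response values are simulated, not taken from the thread. The hypothetical helper make_design() stacks one copy of the 16 factor combinations per replicate and tags each copy with a Replicate factor. Fitting the full factorial model (all main effects and interactions) then leaves 2^4 * (r - 1) residual degrees of freedom, which is the within-cell experimental error that replication is meant to provide.

```r
# Hypothetical sketch of a 2^k * r layout (k = 4 two-level factors, r replicates).
# Factor names and levels come from str(throughput); the response is simulated.
set.seed(42)

make_design <- function(r) {
  cells <- expand.grid(
    No_databases   = factor(c("1", "4"), levels = c("1", "4")),
    Partitioning   = factor(c("sharding", "replication"),
                            levels = c("sharding", "replication")),
    No_middlewares = factor(c("2", "4"), levels = c("2", "4")),
    Queue_size     = factor(c("40", "100"), levels = c("40", "100"))
  )
  # One copy of all 2^4 = 16 cells per replicate, tagged by a Replicate factor
  design <- cells[rep(seq_len(nrow(cells)), times = r), ]
  design$Replicate <- factor(rep(seq_len(r), each = nrow(cells)))
  rownames(design) <- NULL
  design
}

r <- 3
design <- make_design(r)

# Simulated throughput with made-up effect sizes, for illustration only
design$Throughput <- 40 +
  20 * (design$No_databases == "4") +
   5 * (design$No_middlewares == "4") +
  rnorm(nrow(design), sd = 3)

# Full factorial model: all main effects and interactions.
# With r > 1 the residual term is the within-cell (pure) error,
# with 2^4 * (r - 1) degrees of freedom.
fit_full <- aov(Throughput ~ No_databases * Partitioning * No_middlewares * Queue_size,
                data = design)
summary(fit_full)
df.residual(fit_full)  # 16 * (3 - 1) = 32
```

The Replicate column is not used by this particular model; it only records which run each observation came from, so it can later be used as a blocking factor.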
Dennis Murphy
2011-Nov-14 01:38 UTC
[R] 2^k*r (with replications) experimental design question
I'm guessing you have nine replicates of a 2^5 factorial design with a couple of missing values. If so, define a variable to designate the replicates and use it as a blocking factor in the ANOVA. If you want to treat the replicates as a random rather than a fixed factor, then look into the nlme or lme4 packages.

HTH,
Dennis
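The blocking suggestion can be sketched as follows, again with simulated data and a hypothetical r = 4 replicates: the replicate label becomes a factor and is entered in the aov() formula as a fixed block. As a base-R stand-in for a random block, aov() also accepts an Error(Replicate) stratum; that is the classical stratified ANOVA, not the mixed-model fit from nlme or lme4.

```r
# Hypothetical sketch: replicate as a blocking factor (simulated data only).
set.seed(1)

cells <- expand.grid(
  No_databases   = factor(c("1", "4"), levels = c("1", "4")),
  Partitioning   = factor(c("sharding", "replication"),
                          levels = c("sharding", "replication")),
  No_middlewares = factor(c("2", "4"), levels = c("2", "4")),
  Queue_size     = factor(c("40", "100"), levels = c("40", "100"))
)
r <- 4
d <- cells[rep(seq_len(nrow(cells)), times = r), ]
d$Replicate <- factor(rep(seq_len(r), each = nrow(cells)))

# Made-up effects plus a per-replicate offset (run-to-run drift)
rep_shift <- rnorm(r, sd = 5)
d$Throughput <- 40 +
  20 * (d$No_databases == "4") +
   5 * (d$No_middlewares == "4") +
  rep_shift[as.integer(d$Replicate)] +
  rnorm(nrow(d), sd = 3)

# Replicate as a fixed block: enter it in the model formula
fit_block <- aov(Throughput ~ Replicate + No_databases + Partitioning +
                   No_middlewares + Queue_size, data = d)
summary(fit_block)
df.residual(fit_block)  # 64 observations - 8 parameters = 56

# Classical random-block alternative in base R: an Error() stratum
fit_strat <- aov(Throughput ~ No_databases + Partitioning +
                   No_middlewares + Queue_size + Error(Replicate), data = d)
summary(fit_strat)
```

For an actual mixed-model fit, the corresponding untested calls would be along the lines of lme(Throughput ~ No_databases + Partitioning + No_middlewares + Queue_size, random = ~ 1 | Replicate, data = d) with nlme, or lmer(Throughput ~ No_databases + Partitioning + No_middlewares + Queue_size + (1 | Replicate), data = d) with lme4.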