Dan E. Kelley
2000-Mar-14 23:29 UTC
[R] boxplots of 1 datum AND comparing rank and boolean
Q: When R does 'plot()' in a context that yields boxplots, is there a way to force it to draw something even if there are only one or two data in the category? I'd like it to draw the data, perhaps using the outlier symbols. My code (*** marks the line in question) is the following, for R-1.0.0:

d <- read.table("nserc-results-pgsb", header=FALSE,
                col.names=c("name","dept","rank","accept"))
# These data look like:
# First.Student  Some.Department    1 1
# Second.Student Another.Department 2 1
# Third.Student  Another.Department 3 0
attach(d)
rank.inv <- 1/rank
ll <- lm(accept ~ rank.inv + dept, data=d)
print(summary(ll))
print(anova(ll))
plot(dept,resid(ll))   # makes boxplots ***

Actually, if anybody has a bright idea how I should analyse such data, I'd love to hear it. As you can see in the above, I transformed to 1/rank since our committee recorded high 'rank' values for students we favoured. It's not clear to me how to compare rankings to boolean (accept/deny) results, so the 'lm()' above might be silly.

Thanks in advance for any advice. This group is so generous, it amazes me.

PS: just because I think it's fun to read what sort of work folks are doing, the above is work I'm doing in trying to analyze the patterns in the granting of scholarships by NSERC, the science granting agency in Canada. I chair a committee at my university that ranks postgraduate students and sends the files to NSERC. While NSERC nearly obeys our rankings, it seems to me that they favour some departments. I'd like to test that (hence "accept ~ rank.inv + dept" in the above).

Dan E. Kelley                   internet: mailto:Dan.Kelley at Dal.CA
Oceanography Department         phone: (902)494-1694
Dalhousie University            fax: (902)494-2885
Halifax, NS, CANADA, B3H 4J1    phys.ocean.dal.ca/~kelley
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe" (in the "body", not the subject !)
To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
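On the question of comparing rank with a boolean outcome: for a 0/1 response such as 'accept', logistic regression via glm() is one common alternative to lm(). What follows is only a minimal sketch under stated assumptions, not an analysis of the real data. The data frame is simulated (the original file is not public), the column names follow the post, and the choice of a likelihood-ratio test for the department effect is an assumption about what "favour some departments" should mean.

```r
# Simulated stand-in for the non-public data; every value here is made up.
set.seed(42)
d <- data.frame(accept = rbinom(100, size = 1, prob = 0.4),
                rank   = sample(1:100),
                dept   = gl(5, 20))
d$rank.inv <- 1 / d$rank        # same transformation as in the post

# Logistic regression models the probability of acceptance
# rather than the 0/1 value itself.
fit0 <- glm(accept ~ rank.inv,        family = binomial, data = d)
fit1 <- glm(accept ~ rank.inv + dept, family = binomial, data = d)

print(summary(fit1))

# Likelihood-ratio test: does adding dept improve the fit beyond rank alone?
print(anova(fit0, fit1, test = "Chisq"))
```

With simulated data that has no built-in department effect, the test should usually be non-significant; with the real file, the same two calls would compare the models with and without 'dept'.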
>>>>> "Dan" == Dan E Kelley <kelley at Phys.Ocean.Dal.CA> writes:

Dan> Q: When R does 'plot()' in a context that yields boxplots, is there a
Dan> way to force it to draw something even if there are only 1 or two data
Dan> in the category? I'd like for it to draw the data, perhaps using the
Dan> outlier symbols. My code is (*** marks the line in question) is the
Dan> following, for R-1.0.0:

Dan> d <- read.table("nserc-results-pgsb", header=FALSE,
Dan> col.names=c("name","dept","rank","accept"))
Dan> # These data look like:
Dan> # First.Student Some.Department 1 1
Dan> # Second.Student Another.Department 2 1
Dan> # Third.Student Another.Department 3 0

but contain more than just three observations, right?

Dan> attach(d)
Dan> rank.inv <- 1/rank
Dan> ll <- lm(accept ~ rank.inv + dept, data=d)
Dan> print(summary(ll))
Dan> print(anova(ll))
Dan> plot(dept,resid(ll)) # makes boxplots ***

Dan> Actually, if anybody has a bright idea how I should analyse such data,
Dan> I'd love to hear it. As you can see in the above, I transformed to
Dan> 1/rank since our committee recorded high 'rank' values for students we
Dan> favoured. It's not clear to me how to compare rankings to boolean
Dan> (accept/deny) results, so the 'lm()' above might be silly.

I may have misunderstood you completely. The problem is that I cannot repeat your example, since you didn't use "public" data. (Usually, you'd construct data, something like

d <- data.frame(accept = rbinom(100, size=1, prob = .4),
                rank = sample(1:100),
                dept = gl(5, 20))
)

Are you discussing the boxplots that are produced with only 1 or 2 observations per group? Here are boxplots for n=1, 2, 3, and 4 obs. per group. What's wrong with these?

do.call("boxplot", lapply(1:4,seq))
title("Boxplot()s of very few points")

*Or* are you suggesting that for n=1, n=2 (and maybe n=3) per group, plot(factor, continuous) shouldn't use boxplot()s but rather dot plots? This is a suggestion that I've heard and had myself before, and it is very well worth discussing.
- How should the decision between boxplot and dotplot be made? Should it depend only on n?
  Wouldn't one want the box + the single observations, e.g. when in one group n = 3, but in all other groups n ~= 20 (which would make boxplots there in any case)?
- (When) should jittering be used?

Regards,
Martin Maechler <maechler at stat.math.ethz.ch>    stat.ethz.ch/~maechler
Seminar fuer Statistik, ETH-Zentrum LEO D10       Leonhardstr. 27
ETH (Federal Inst. Technology)                    8092 Zurich SWITZERLAND
phone: x-41-1-632-3408    fax: ...-1228           <><
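The two open questions in the reply (box versus individual points, and jittering) can be tried out with a sketch like the following. It is an illustration only, not behaviour that plot() or boxplot() provides by itself: the data are simulated, and both the threshold 'n.small' and the jitter amount are arbitrary assumptions chosen for the example.

```r
# Simulated data (hypothetical): four groups of 20 and one group of 3.
set.seed(1)
grp <- factor(rep(c("A", "B", "C", "D", "E"), times = c(20, 20, 20, 20, 3)))
y   <- rnorm(length(grp))

n.small <- 5                         # assumed cut-off, not a recommendation
counts  <- table(grp)
small   <- names(counts)[as.vector(counts) <= n.small]

# Draw a box for every group, so the large groups look as usual.
boxplot(y ~ grp, main = "Boxes plus raw points for small groups")

# Overlay the individual observations, but only for the small groups.
# A little horizontal jitter keeps coincident points distinguishable.
for (g in small) {
  idx <- grp == g
  at  <- match(g, levels(grp))       # x position of this group's box
  points(jitter(rep(at, sum(idx)), amount = 0.1), y[idx], pch = 1)
}
```

Keeping the box and adding the points means the decision does not have to be all-or-nothing: groups above the cut-off are unchanged, and groups below it show every observation.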