greg.kochanski@phon.ox.ac.uk
2006-Jan-29 18:16 UTC
[Rd] mosaicplot() labels overlap (PR#8536)
Full_Name: Greg Kochanski
Version: 2.2.1
OS: Debian Linux (testing)
Submission from: (NULL) (212.159.16.190)

This is really a feature request.

When you do mosaicplot() on a data set where the probability of several nearby rows is small, the labels for those rows are plotted overlapping each other.

This situation can be improved by calling mosaicplot() with a large value of "off", but sometimes even off=50 (the largest allowable value) isn't sufficient, especially if the labels are several characters long.

The problem exists even if the labels don't overlap, because one needs space between the labels to avoid confusion. For instance, labels "L*H", "!H*", and "L%" when too close together turn into "L*H!H*L%", which is confusing to anyone.

The problem could be solved by breaking the assumption that the label position need always be exactly matched to the graphic. This is OK, especially for rows, because
(a) the graphical blocks that are part of a single row aren't aligned with each other anyway, and
(b) if you can read the labels, you can generally match things up by counting.

One fairly nice way to do this is to position the labels so as to minimize the sum of the squared errors between each label center and the average position of the blocks on that row, subject to the constraint that the labels be non-overlapping.

This problem is actually not too hard to solve: it is essentially Kruskal's algorithm for finding a best-fit monotonic sequence (which probably exists in CRAN already).

Neglecting edge effects, assume you have a vector of desired positions z, and a vector of minimum widths for each label w. Then you can compute the space used up by the labels:

    s[i] = -0.5*w[1] + sum(w[j] for j < i) + 0.5*w[i]

and compute

    y = M(z - s) + s

where M() gives the best-fit monotonically nondecreasing fit to its argument. y should then be the correct place to put each label.

If there's a likelihood of getting a patch accepted, I could probably supply one.

(Given the opportunity, I'd think about shifting the blocks up and down also, to do an overall alignment.)
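
The placement rule described above can be sketched in a few lines. This is only an illustration under stated assumptions, not a patch for mosaicplot(): it is written in Python rather than R, the function names `isotonic_fit` and `place_labels` are hypothetical, M() is implemented with the standard pool-adjacent-violators procedure for an unweighted least-squares monotone fit, and edge effects are neglected as in the description.

```python
def isotonic_fit(values):
    """Least-squares monotonically nondecreasing fit to `values`,
    computed by pooling adjacent violators (plays the role of M())."""
    blocks = []  # each block is [sum of values, number of values]
    for v in values:
        blocks.append([v, 1])
        # Merge the last two blocks while their means are out of order.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fit = []
    for total, count in blocks:
        fit.extend([total / count] * count)
    return fit


def place_labels(z, w):
    """Return label centres y that minimise sum((y - z)**2) subject to
    adjacent labels of minimum widths w not overlapping.

    z: desired label centres, in plotting order (assumed input)
    w: minimum width of each label (assumed input)
    """
    # s[i] is the centre of label i when all labels are packed edge to
    # edge, measured from the centre of the first label.
    s = []
    left = -0.5 * w[0]
    for width in w:
        s.append(left + 0.5 * width)
        left += width
    # Non-overlap is equivalent to (y - s) being nondecreasing.
    m = isotonic_fit([zi - si for zi, si in zip(z, s)])
    return [mi + si for mi, si in zip(m, s)]
```

For example, with desired centres z = [0.0, 0.1, 0.2, 5.0] and unit widths, the first three labels are spread out to roughly -0.9, 0.1, and 1.1 (exactly one label width apart, centred on their average desired position), while the fourth stays at 5.0. The negative first position shows the neglected edge effect: a real implementation would also have to keep the labels inside the plot region.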
On Sun, 29 Jan 2006 greg.kochanski at phon.ox.ac.uk wrote:

> This is really a feature request.

Hence not a bug (just for the record).

A potential solution to your problem is to write your own labeling function for strucplot() in vcd, implementing the approach you suggest. Also check whether any of the available labeling functions can be used to produce acceptable results. One way which might work, depending on your specific data, is to rotate the labels. Details can be found in the package vignettes.

Z