If an array has missing values in different rows, plotting using the formula
interface can produce errors. Example:
fake.data <- matrix(rep(-100:100, 4),
                    ncol = 4)
par(mfrow = c(1, 2))
boxplot(fake.data ~ col(fake.data))
abline(h = 0, lty = 2)
boxplot(as.data.frame(fake.data))
abline(h = 0, lty = 2)
##### Add the missing data
fake.data[190:200, 1] <- NA
fake.data[1:5, 3] <- NA
## But only columns 1 and 3 should change!! (and in opposite directions)
par(mfrow = c(1, 2))
boxplot(fake.data ~ col(fake.data))
abline(h = 0, lty = 2)
boxplot(as.data.frame(fake.data))
abline(h = 0, lty = 2)
### The problem is that the same rows are removed from all the columns:
bp.a <- boxplot(fake.data ~ col(fake.data))
bp.df <- boxplot(as.data.frame(fake.data))
### which happens during the call to
eval(m, parent.frame())
inside boxplot.formula
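
The row-wise removal can be checked without drawing anything. What follows is only a sketch: it calls model.frame directly with na.action = na.omit, on the assumption that this mirrors what the formula method does internally in the versions listed below. The names n.per.column, mf and n.after.omit are introduced here for illustration.

```r
# Rebuild the example data: four identical columns holding -100:100.
fake.data <- matrix(rep(-100:100, 4), ncol = 4)
fake.data[190:200, 1] <- NA
fake.data[1:5, 3] <- NA

# Non-missing values per column: what a column-by-column boxplot would use.
n.per.column <- colSums(!is.na(fake.data))

# Build the model frame explicitly with na.omit. A row is dropped as soon
# as ANY column of the matrix has an NA in that row.
mf <- model.frame(fake.data ~ col(fake.data), na.action = na.omit)
n.after.omit <- nrow(mf)

print(n.per.column)   # 190 201 196 201
print(n.after.omit)   # 185: rows 1:5 and 190:200 are gone for every column
```

Columns 2 and 4 have no missing values, yet each loses the same 16 rows as columns 1 and 3.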
**********************************
This happens in at least:
         _
platform i686-pc-linux-gnu
arch     i686
os       linux-gnu
system   i686, linux-gnu
status   Patched
major    1
minor    9.0
year     2004
month    05
day      02
language R

         _
platform i386-pc-linux-gnu
arch     i386
os       linux-gnu
system   i386, linux-gnu
status
major    1
minor    8.1
year     2003
month    11
day      21
language R

         _
platform i686-pc-linux-gnu
arch     i686
os       linux-gnu
system   i686, linux-gnu
status   Under development (unstable)
major    2
minor    0.0
year     2004
month    04
day      30
language R
-- 
Ramón Díaz-Uriarte
Bioinformatics Unit
Centro Nacional de Investigaciones Oncológicas (CNIO)
(Spanish National Cancer Center)
Melchor Fernández Almagro, 3
28029 Madrid (Spain)
Fax: +-34-91-224-6972
Phone: +-34-91-224-6900
http://bioinfo.cnio.es/~rdiaz
PGP KeyID: 0xE89B3462
(http://bioinfo.cnio.es/~rdiaz/0xE89B3462.asc)
Prof Brian Ripley
2004-May-03 13:34 UTC
[Rd] boxplot.formula with missing values (PR#6846)
I think this *is* the correct behaviour for a formula method. The problem I
see is that boxplot.formula does not have an na.action argument and so you
may not have realised that na.action=na.omit is the default. Note that
subset= will 'remove the same rows from all columns', too.

It really is not the intention that the formula interface is used with
matrices, and as.vector will do what I think you intended:

    boxplot(as.vector(fake.data) ~ as.vector(col(fake.data)))

Also, setting options(na.action=na.pass) will work as you expected.

I've added an na.action argument for R-devel.

On Mon, 3 May 2004 rdiaz@cnio.es wrote:

> If an array has missing values in different rows, plotting using the formula
> interface can produce errors. Example:

Well, not do what you expected, but the error appears to be in your
expectations.

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
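
The as.vector form suggested in the reply can be checked numerically by comparing the group sizes that boxplot reports. This is a small sketch, not part of the original exchange: plot = FALSE only suppresses drawing, and n.formula and n.df are names chosen here for illustration.

```r
# Rebuild the example data with the same missing values as in the report.
fake.data <- matrix(rep(-100:100, 4), ncol = 4)
fake.data[190:200, 1] <- NA
fake.data[1:5, 3] <- NA

# Formula interface on the flattened matrix: each cell is its own
# observation, so an NA removes only that one value from its group.
n.formula <- boxplot(as.vector(fake.data) ~ as.vector(col(fake.data)),
                     plot = FALSE)$n

# Data-frame method: each column is summarised separately.
n.df <- boxplot(as.data.frame(fake.data), plot = FALSE)$n

print(n.formula)   # 190 201 196 201
print(n.df)        # 190 201 196 201
```

With the matrix flattened, only columns 1 and 3 lose observations, which is what the original report expected.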