If an array has missing values in different rows, plotting using the formula
interface can produce errors. Example:

fake.data <- matrix(rep(-100:100, 4),
                    ncol = 4)

par(mfrow = c(1, 2))
boxplot(fake.data ~ col(fake.data))
abline(h = 0, lty = 2)
boxplot(as.data.frame(fake.data))
abline(h = 0, lty = 2)

##### Add the missing data
fake.data[190:200, 1] <- NA
fake.data[1:5, 3] <- NA

## But only columns 1 and 3 should change!! (and in opposite directions)
par(mfrow = c(1, 2))
boxplot(fake.data ~ col(fake.data))
abline(h = 0, lty = 2)
boxplot(as.data.frame(fake.data))
abline(h = 0, lty = 2)

### The problem is that the same rows are removed from all the columns:

bp.a <- boxplot(fake.data ~ col(fake.data))
bp.df <- boxplot(as.data.frame(fake.data))

### which happens during the call to

eval(m, parent.frame())

inside boxplot.formula

**********************************

This happens in at least:

         _
platform i686-pc-linux-gnu
arch     i686
os       linux-gnu
system   i686, linux-gnu
status   Patched
major    1
minor    9.0
year     2004
month    05
day      02
language R

         _
platform i386-pc-linux-gnu
arch     i386
os       linux-gnu
system   i386, linux-gnu
status
major    1
minor    8.1
year     2003
month    11
day      21
language R

         _
platform i686-pc-linux-gnu
arch     i686
os       linux-gnu
system   i686, linux-gnu
status   Under development (unstable)
major    2
minor    0.0
year     2004
month    04
day      30
language R

-- 
Ramón Díaz-Uriarte
Bioinformatics Unit
Centro Nacional de Investigaciones Oncológicas (CNIO)
(Spanish National Cancer Center)
Melchor Fernández Almagro, 3
28029 Madrid (Spain)
Fax: +-34-91-224-6972
Phone: +-34-91-224-6900

http://bioinfo.cnio.es/~rdiaz
PGP KeyID: 0xE89B3462
(http://bioinfo.cnio.es/~rdiaz/0xE89B3462.asc)
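The row-wise removal described in the report can be checked numerically without drawing anything. The sketch below is an illustration, not part of the original report. It builds the model frame by hand with na.action = na.omit, on the assumption that this mirrors what boxplot.formula did in the R versions listed above (later versions of R may behave differently). The names n.per.column, bp.df.n, mf and n.complete.rows are illustrative.

```r
# Rebuild the example data from the report.
fake.data <- matrix(rep(-100:100, 4), ncol = 4)
fake.data[190:200, 1] <- NA
fake.data[1:5, 3] <- NA

# Non-missing observations in each column: what each box should be based on.
n.per.column <- colSums(!is.na(fake.data))

# The data-frame route treats every column as its own group, so NAs are
# dropped per column. plot = FALSE returns the statistics without plotting.
bp.df.n <- boxplot(as.data.frame(fake.data), plot = FALSE)$n

# Assumption: this call stands in for the model frame built inside
# boxplot.formula. The matrix response is a single variable with 201 rows,
# so na.omit drops a whole row whenever any of its four entries is NA.
mf <- model.frame(fake.data ~ col(fake.data), na.action = na.omit)
n.complete.rows <- nrow(mf)

print(n.per.column)     # 190 201 196 201
print(bp.df.n)          # same counts as n.per.column
print(n.complete.rows)  # 185
```

Rows 1:5 and 190:200 each contain an NA in some column, so 16 of the 201 rows are discarded for every column, which is why all four boxes change rather than only columns 1 and 3.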
Prof Brian Ripley
2004-May-03 13:34 UTC
[Rd] boxplot.formula with missing values (PR#6846)
I think this *is* the correct behaviour for a formula method. The problem
I see is that boxplot.formula does not have an na.action argument and so
you may not have realised that na.action=na.omit is the default. Note
that subset= will `remove the same rows from all columns', too.

It really is not the intention that the formula interface is used with
matrices, and as.vector will do what I think you intended:

boxplot(as.vector(fake.data) ~ as.vector(col(fake.data)))

Also, setting options(na.action=na.pass) will work as you expected.

I've added an na.action argument for R-devel.

On Mon, 3 May 2004 rdiaz@cnio.es wrote:

> If an array has missing values in different rows, plotting using the formula
> interface can produce errors. Example:

Well, not do what you expected, but the error appears to be in your
expectations.

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
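The two suggestions in the reply can be sketched as follows. This is an illustration under stated assumptions rather than code from the thread: the helper names y, g, bp.vec and mf.pass are made up for the example, plot = FALSE is used so that only the computed statistics are returned, and the na.pass step calls model.frame directly instead of setting the global option, to avoid changing session state.

```r
# Example data from the original report.
fake.data <- matrix(rep(-100:100, 4), ncol = 4)
fake.data[190:200, 1] <- NA
fake.data[1:5, 3] <- NA

# Suggestion 1: flatten both sides of the formula with as.vector, so that
# every observation is its own row of the model frame. An NA then removes
# only that single observation, not a whole row of the matrix.
y <- as.vector(fake.data)
g <- as.vector(col(fake.data))
bp.vec <- boxplot(y ~ g, plot = FALSE)
print(bp.vec$n)  # observations used per group

# Suggestion 2: with na.pass nothing is removed when the model frame is
# built, so all 201 rows survive and the NAs are handled per group later.
mf.pass <- model.frame(fake.data ~ col(fake.data), na.action = na.pass)
print(nrow(mf.pass))
```

With the flattened vectors, the group sizes match the per-column counts of non-missing values (190, 201, 196, 201), which is the behaviour the original poster expected.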