thr3ads.net - R help - [R] FW: Selecting undefined column of a data frame (was [BioC] read.phenoData vs read.AnnotatedDataFrame) [Aug 2007]


Steven McKinney

2007-Aug-03 17:37 UTC

[R] FW: Selecting undefined column of a data frame (was [BioC] read.phenoData vs read.AnnotatedDataFrame)

Hi all,

What are current methods people use in R to identify
mis-spelled column names when selecting columns
from a data frame?

Alice Johnstone recently tackled this issue
(see [BioC] posting below).

Due to a mis-spelled column name ("FileName"
instead of "Filename") which produced no warning,
Alice spent a fair amount of time tracking down
this bug.  With my fumbling fingers I'll be tracking
down such a bug soon too.

Is there any options() setting, or debug technique
that will flag data frame column extractions that
reference a non-existent column?  It seems to me
that the "[.data.frame" extractor used to throw an
error if given a mis-spelled variable name, and I
still see lines of code in "[.data.frame" such as

if (any(is.na(cols))) 
            stop("undefined columns selected")



In R 2.5.1 a NULL is silently returned.
> foo <- data.frame(Filename = c("a", "b"))
> foo[, "FileName"]
NULL

Has something changed so that the code lines
if (any(is.na(cols))) 
            stop("undefined columns selected")
in "[.data.frame" no longer work properly (if
I am understanding the intention properly)?

If not, could "[.data.frame" check an
options() variable setting (say
warn.undefined.colnames) and throw a warning
if a non-existent column name is referenced?



> sessionInfo()
R version 2.5.1 (2007-06-27) 
powerpc-apple-darwin8.9.1 

locale:
en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] "stats"     "graphics"  "grDevices" "utils"     "datasets"  "methods"  
"base"

other attached packages:
     plotrix         lme4       Matrix      lattice 
     "2.2-3"  "0.99875-4" "0.999375-0"     "0.16-2" 


Steven McKinney

Statistician
Molecular Oncology and Breast Cancer Program
British Columbia Cancer Research Centre

email: smckinney +at+ bccrc +dot+ ca

tel: 604-675-8000 x7561

BCCRC
Molecular Oncology
675 West 10th Ave, Floor 4
Vancouver B.C. 
V5Z 1L3
Canada




-----Original Message-----
From: bioconductor-bounces at stat.math.ethz.ch on behalf of Johnstone, Alice
Sent: Wed 8/1/2007 7:20 PM
To: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] read.phenoData vs read.AnnotatedDataFrame
 
For interest's sake, I have found out why I wasn't getting my expected
results when using read.AnnotatedDataFrame.
It turns out the error was made in the ReadAffy command, where I specified
the filenames to be read from my AnnotatedDataFrame object.  There was a
typo with a capital N ($FileName) rather than lowercase n
($Filename) as in my target file... whoops.  However this meant the
filenames argument was ignored without an error message(!) and instead
of using the information in the AnnotatedDataFrame object (which
included filenames, but not alphabetically) it read the .cel files in
alphabetical order from the working directory - hence the wrong file was
given the wrong label (given by the order of the AnnotatedDataFrame
object) and my comparisons were confused without it being obvious why
or where.
Our solution: specify that the filename is as.character so assignment of
file to target is correct (after correcting $Filename) now that we are
using read.AnnotatedDataFrame rather than read.phenoData.

Data<-ReadAffy(filenames=as.character(pData(pd)$Filename),phenoData=pd)

Hurrah!

It may be beneficial to others if, when the filenames argument isn't
specified, the filenames were read from the phenoData object when they
are included there.

Thanks!

-----Original Message-----
From: Martin Morgan [mailto:mtmorgan at fhcrc.org] 
Sent: Thursday, 26 July 2007 11:49 a.m.
To: Johnstone, Alice
Cc: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] read.phenoData vs read.AnnotatedDataFrame

Hi Alice --

"Johnstone, Alice" <Alice.Johnstone at esr.cri.nz> writes:
> Using R2.5.0 and Bioconductor I have been following code to analyse
> Affymetrix expression data: 2 treatments vs control.  The original
> code was run last year and used the read.phenoData command, however
> with the newer version I get the error message
>
> Warning messages:
> read.phenoData is deprecated, use read.AnnotatedDataFrame instead
> The phenoData class is deprecated, use AnnotatedDataFrame (with
> ExpressionSet) instead
>  
> I use the read.AnnotatedDataFrame command, but when it comes to the 
> end of the analysis the comparison of the treatment to the controls 
> gets mixed up compared to what you get using the original 
> read.phenoData, i.e. it looks like the 3 groups get labelled wrong and so
> the comparisons are different (but they can still be matched up).
> My questions are,
> 1) do you need to set up your target file differently when using 
> read.AnnotatedDataFrame - what is the standard format?
I can't quite tell where things are going wrong for you, so it would
help if you can narrow down where the problem occurs.  I think
read.AnnotatedDataFrame should be comparable to read.phenoData. Does
> pData(pd)
look right? What about
> pData(Data)
and
> pData(eset.rma)
? It's not important but pData(pd)$Target is the same as pd$Target.
Since the analysis is on eset.rma, it probably makes sense to use the
pData from there to construct your design matrix
> targs<-factor(eset.rma$Target)
> design<-model.matrix(~0+targs)
> colnames(design)<-levels(targs)
Does design look right?
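
As a rough guide to what "right" looks like here, a minimal self-contained sketch using only base R. The target labels below are made up for illustration and merely stand in for eset.rma$Target; they are not Alice's data.

```r
# Made-up target labels standing in for eset.rma$Target (assumption).
targs <- factor(c("control", "treatment1", "treatment2",
                  "control", "treatment1", "treatment2"))

# One indicator column per group, no intercept term.
design <- model.matrix(~ 0 + targs)
colnames(design) <- levels(targs)

# Each row should contain exactly one 1, in the column of its own group.
print(design)
stopifnot(all(rowSums(design) == 1))
```

If the rows of the design matrix do not line up with the order in which the .cel files were actually read, the contrasts end up comparing the wrong arrays even though the matrix itself looks well formed.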
> I have three columns sample, filename and target.
> 2) do you need to use a different model matrix to what I have?  
> 3) do you use a different command for making the contrasts?
Depends on the question! If you're performing the same analysis as last
year, then the model matrix and contrasts have to be the same!
> I have included my code below if that is of any assistance.
> Many Thanks!
> Alice
>  
> ##Read data
> pd<-read.AnnotatedDataFrame("targets.txt",header=T,row.name="sample")
> Data<-ReadAffy(filenames=pData(pd)$FileName,phenoData=pd)
> ##normalisation
> eset.rma<-rma(Data)
> ##analysis
> targs<-factor(pData(pd)$Target)
> design<-model.matrix(~0+targs)
> colnames(design)<-levels(targs)
> fit<-lmFit(eset.rma,design)
> cont.wt<-makeContrasts("treatment1-control","treatment2-control",levels=design)
> fit2<-contrasts.fit(fit,cont.wt)
> fit2.eb<-eBayes(fit2)
> testconts<-classifyTestsF(fit2.eb,p.value=0.01)
> topTable(fit2.eb,coef=2,n=300)
> topTable(fit2.eb,coef=1,n=300)
>  
>
> 	[[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: 
> http://news.gmane.org/gmane.science.biology.informatics.conductor
--
Martin Morgan
Bioconductor / Computational Biology
http://bioconductor.org


Steven McKinney

2007-Aug-03 18:10 UTC


[R] FW: Selecting undefined column of a data frame (was [BioC]read.phenoData vs read.AnnotatedDataFrame)

I see now that for my example

> foo <- data.frame(Filename = c("a", "b"))
> foo[, "FileName"]
NULL

the issue is in this clause of the 
"[.data.frame" extractor.

The lines
        if (drop && length(y) == 1L) 
            return(.subset2(y, 1L)) 
return the NULL result just before the
error check
        cols <- names(y)
        if (any(is.na(cols))) 
            stop("undefined columns selected")
is performed.

Is this intended behaviour, or has a logical
bug crept into the "[.data.frame" extractor?


    if (missing(i)) {
        if (missing(j) && drop && length(x) == 1L) 
            return(.subset2(x, 1L))
        y <- if (missing(j)) 
            x
        else .subset(x, j)
        ## This returns a result before the undefined columns
        ## check is done.  Is this intended?
        if (drop && length(y) == 1L) 
            return(.subset2(y, 1L))
        cols <- names(y)
        if (any(is.na(cols))) 
            stop("undefined columns selected")
        if (any(duplicated(cols))) 
            names(y) <- make.unique(cols)
        nrow <- .row_names_info(x, 2L)
        if (drop && !mdrop && nrow == 1L) 
            return(structure(y, class = NULL, row.names = NULL))
        else return(structure(y, class = oldClass(x),
            row.names = .row_names_info(x, 0L)))
    }
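
The early return in the excerpt above can be reproduced step by step with the same low-level calls. This is only a sketch of that one code path as quoted for R 2.5.1; other R versions may order the checks differently.

```r
foo <- data.frame(Filename = c("a", "b"))

# The call the method makes when i is missing and j = "FileName".
y <- .subset(foo, "FileName")

length(y)        # 1, so the drop branch is taken
names(y)         # NA: the marker the later is.na(cols) check looks for
.subset2(y, 1L)  # NULL: what the early return hands back
```

The NA name is present in y, but the function returns before any code inspects it.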



> sessionInfo()
R version 2.5.1 (2007-06-27) 
powerpc-apple-darwin8.9.1 

locale:
en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] "stats"     "graphics"  "grDevices" "utils"     "datasets"  "methods"  
"base"

other attached packages:
     plotrix         lme4       Matrix      lattice 
     "2.2-3"  "0.99875-4" "0.999375-0"     "0.16-2" 

Should this discussion move to R-devel?

Steven McKinney

Statistician
Molecular Oncology and Breast Cancer Program
British Columbia Cancer Research Centre

email: smckinney +at+ bccrc +dot+ ca

tel: 604-675-8000 x7561

BCCRC
Molecular Oncology
675 West 10th Ave, Floor 4
Vancouver B.C. 
V5Z 1L3
Canada





______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Prof Brian Ripley

2007-Aug-03 19:25 UTC


[R] FW: Selecting undefined column of a data frame (was [BioC] read.phenoData vs read.AnnotatedDataFrame)

You are reading the wrong part of the code for your argument list:

>  foo["FileName"]
Error in `[.data.frame`(foo, "FileName") : undefined columns selected

[.data.frame is one of the most complex functions in R, and does many 
different things depending on which arguments are supplied.


-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Steven McKinney

2007-Aug-03 21:50 UTC


[R] FW: Selecting undefined column of a data frame (was [BioC] read.phenoData vs read.AnnotatedDataFrame)

> What would break is that three methods for doing the same thing would
> give different answers.
> 
> Please do have the courtesy to actually read the detailed explanation you
> are given.
Sorry Prof. Ripley, I am attempting to read carefully, as this
issue has deeper coding/debugging implications, and as you
point out, 
  "[.data.frame is one of the most complex functions in R"
so please bear with me.  This change in behaviour has 
taken away a side-effect debugging tool, discussed below.

> On Fri, 3 Aug 2007, Steven McKinney wrote:
>
> >> -----Original Message-----
> >> From: Prof Brian Ripley [mailto:ripley at stats.ox.ac.uk]
> >> Sent: Fri 8/3/2007 1:05 PM
> >> To: Steven McKinney
> >> Cc: r-help at stat.math.ethz.ch
> >> Subject: Re: [R] FW: Selecting undefined column of a data frame
> >> (was [BioC] read.phenoData vs read.AnnotatedDataFrame)
> >>
> >> I've since seen your followup; a more detailed explanation may help.
> >> The path through the code for your argument list does not go where you
> >> quoted, and there is a reason for it.
> >>
> >> Generally when you extract in R and ask for a non-existent index you get
> >> NA or NULL as the result (and no warning), e.g.
> >>
> >>> y <- list(x=1, y=2)
> >>> y[["z"]]
> >> NULL
> >>
> >> Because data frames 'must' have (column) names, they are a partial
> >> exception and when the result is a data frame you get an error if it would
> >> contain undefined columns.
> >>
> >> But in the case of foo[, "FileName"], the result is a single column and so
> >> will not have a name: there seems no reason to be different from
> >>
> >>> foo[["FileName"]]
> >> NULL
> >>> foo$FileName
> >> NULL
> >>
> >> which similarly select a single column.  At one time they were different
> >> in R, for no documented reason.

This difference provided a side-effect debugging tool, in that where

  > bar <- foo[, "FileName"]

used to throw an error, alerting as to a typo, it now does not.
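
To see which extraction forms complain in a given R installation, a small sketch that tries each one and reports the outcome. The helper name try_extract is made up for this illustration. The result of the last line depends on the R version, so no outcome is stated for it.

```r
foo <- data.frame(Filename = c("a", "b"))

# Hypothetical helper: report what an extraction does instead of stopping.
try_extract <- function(expr) {
  tryCatch({
    val <- expr                      # forces the lazily evaluated argument
    if (is.null(val)) "NULL" else "value"
  }, error = function(e) paste("error:", conditionMessage(e)))
}

try_extract(foo$FileName)        # "NULL"
try_extract(foo[["FileName"]])   # "NULL"
try_extract(foo["FileName"])     # "error: undefined columns selected"
try_extract(foo[, "FileName"])   # outcome depends on the R version
```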

Having been burned by NULL results due to typos in code lines using
the $ extractor such as
 
  > bar <- foo$FileName

I learned to use
  > bar <- foo[, "FileName"]
to help cut down on typo bugs.  With the ubiquity of
camelCase object names, this is a constant typing bug hazard.


I am wondering what to do now to double check spelling
when accessing columns of a dataframe.

If "[.data.frame" stays as is, can a debug mechanism
be implemented in R that forces strict adherence
to existing list names in debug mode?  This would also help debug
typos in camelCase names when using the $ and [[
extractors and accessors.
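
One way such strictness might be approximated today without changing base R is an S3 wrapper class whose `$` and `[[` methods refuse unknown names. This is only a sketch: the class name strict_df and the helper as_strict are invented here, and only the single-index form of `[[` is handled.

```r
# Hypothetical class and helper names; nothing here is part of base R.
as_strict <- function(df) {
  stopifnot(is.data.frame(df))
  class(df) <- c("strict_df", class(df))
  df
}

# `$` fails on a name that is not an exact column name.
`$.strict_df` <- function(x, name) {
  if (!name %in% names(x))
    stop("undefined column: ", name, call. = FALSE)
  .subset2(x, name)
}

# `[[` with a character index fails the same way (single-index form only).
`[[.strict_df` <- function(x, i, ...) {
  if (is.character(i) && !all(i %in% names(x)))
    stop("undefined column: ", paste(setdiff(i, names(x)), collapse = ", "),
         call. = FALSE)
  .subset2(x, i, ...)
}

foo <- as_strict(data.frame(Filename = c("a", "b")))
foo$Filename                       # works as usual
res <- tryCatch(foo$FileName, error = function(e) conditionMessage(e))
res                                # "undefined column: FileName"
```

A wrapper like this also disables the partial matching that `$` normally allows, and code that probes for optional columns with `$` may fail on the wrapped object, so it is probably best kept to debugging sessions.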

Are there other debugging tools already in R that
can help point out such camelCase list element
name typos?

Bert Gunter

2007-Aug-03 22:19 UTC


[R] FW: Selecting undefined column of a data frame (was [BioC]read.phenoData vs read.AnnotatedDataFrame)

I suspect you'll get some creative answers, but if all you're worried
about is whether a column exists before you do something with it, what's
wrong with:

nm <- ... ## a character vector of names
if(!all(nm %in% names(yourdata))) ## complain
else ## do something

I think this is called defensive programming.
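
A runnable version of that check might look like the following sketch. The function name check_cols is made up, and the case-insensitive hint is an addition aimed at the FileName/Filename kind of typo discussed in this thread.

```r
# Hypothetical helper; df and nm play the roles of yourdata and nm above.
check_cols <- function(df, nm) {
  missing_nm <- setdiff(nm, names(df))
  if (length(missing_nm) > 0L) {
    # Existing names that differ from a requested name only in case.
    hint <- names(df)[tolower(names(df)) %in% tolower(missing_nm)]
    msg <- paste0("undefined columns: ", paste(missing_nm, collapse = ", "))
    if (length(hint) > 0L)
      msg <- paste0(msg, " (did you mean: ",
                    paste(hint, collapse = ", "), "?)")
    stop(msg, call. = FALSE)
  }
  invisible(TRUE)
}

yourdata <- data.frame(Filename = c("a", "b"), Target = c("x", "y"))
check_cols(yourdata, c("Filename", "Target"))    # passes silently
msg <- tryCatch(check_cols(yourdata, "FileName"),
                error = function(e) conditionMessage(e))
msg   # "undefined columns: FileName (did you mean: Filename?)"
```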

Bert Gunter
Genentech
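
The check above can be fleshed out into something reusable. What follows is only a minimal sketch: the helper name `require_columns` is hypothetical (it is not part of base R), and it relies on nothing beyond `setdiff()`, `names()` and `stop()`.

```r
## Hypothetical helper: stop with an informative message if any requested
## column name is absent from the data frame.
require_columns <- function(df, nm) {
    missing_nm <- setdiff(nm, names(df))
    if (length(missing_nm) > 0) {
        stop("undefined columns selected: ",
             paste(missing_nm, collapse = ", "), call. = FALSE)
    }
    invisible(TRUE)
}

foo <- data.frame(Filename = c("a", "b"))

require_columns(foo, "Filename")        ## passes silently

## The mis-spelled name is caught before any extraction happens.
res <- tryCatch(require_columns(foo, "FileName"),
                error = function(e) conditionMessage(e))
print(res)
```

Reporting the offending names with `setdiff()` rather than a bare `all(... %in% ...)` test makes a capitalisation slip such as "FileName" easier to spot in the error message.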

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Steven McKinney
Sent: Friday, August 03, 2007 10:38 AM
To: r-help at stat.math.ethz.ch
Subject: [R] FW: Selecting undefined column of a data frame (was
[BioC]read.phenoData vs read.AnnotatedDataFrame)

Hi all,

What are current methods people use in R to identify
mis-spelled column names when selecting columns
from a data frame?

Alice Johnson recently tackled this issue
(see [BioC] posting below).

Due to a mis-spelled column name ("FileName"
instead of "Filename") which produced no warning,
Alice spent a fair amount of time tracking down
this bug.  With my fumbling fingers I'll be tracking
down such a bug soon too.

Is there any options() setting, or debug technique
that will flag data frame column extractions that
reference a non-existent column?  It seems to me
that the "[.data.frame" extractor used to throw an
error if given a mis-spelled variable name, and I
still see lines of code in "[.data.frame" such as

if (any(is.na(cols))) 
            stop("undefined columns selected")

In R 2.5.1 a NULL is silently returned.
> foo <- data.frame(Filename = c("a", "b"))
> foo[, "FileName"]
NULL

Has something changed so that the code lines
if (any(is.na(cols))) 
            stop("undefined columns selected")
in "[.data.frame" no longer work properly (if
I am understanding the intention properly)?

If not, could  "[.data.frame" check an
options() variable setting (say
warn.undefined.colnames) and throw a warning
if a non-existent column name is referenced?

> sessionInfo()
R version 2.5.1 (2007-06-27)
powerpc-apple-darwin8.9.1

locale:
en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] "stats"     "graphics"  "grDevices" "utils"     "datasets"  "methods"   "base"

other attached packages:
     plotrix         lme4       Matrix      lattice
     "2.2-3"  "0.99875-4" "0.999375-0"     "0.16-2"

Steven McKinney

Statistician
Molecular Oncology and Breast Cancer Program
British Columbia Cancer Research Centre

email: smckinney +at+ bccrc +dot+ ca

tel: 604-675-8000 x7561

BCCRC
Molecular Oncology
675 West 10th Ave, Floor 4
Vancouver B.C. 
V5Z 1L3
Canada

-----Original Message-----
From: bioconductor-bounces at stat.math.ethz.ch on behalf of Johnstone, Alice
Sent: Wed 8/1/2007 7:20 PM
To: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] read.phenoData vs read.AnnotatedDataFrame

For interest's sake, I have found out why I wasn't getting my expected
results when using read.AnnotatedDataFrame.
It turns out the error was made in the ReadAffy command, where I specified
the filenames to be read from my AnnotatedDataFrame object.  There was a
typo with a capital N ($FileName) rather than the lowercase n
($Filename) used in my target file... whoops.  This meant the
filenames argument was ignored without any error message(!), and instead
of using the information in the AnnotatedDataFrame object (which
included the filenames, but not in alphabetical order) it read the .cel
files in alphabetical order from the working directory.  Hence the wrong
file was given the wrong label (the labels follow the order of the
AnnotatedDataFrame object) and my comparisons were confused without it
being obvious why or where.
Our solution: after correcting $Filename, specify that the filenames are
as.character so that the assignment of file to target is correct now that
we are using read.AnnotatedDataFrame rather than read.phenoData.

Data<-ReadAffy(filenames=as.character(pData(pd)$Filename),phenoData=pd)

Hurrah!

It may be beneficial to others, that if the filename argument isn't
specified, that filenames are read from the phenoData object if included
here.

Thanks!
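
The failure mode described in this message can be reproduced without Bioconductor. In the sketch below, `read_files()` is a hypothetical stand-in and is not ReadAffy's actual code; it is only assumed to fall back to a default when it receives no file names. The toy `targets` data frame and its file names are likewise invented for illustration.

```r
targets <- data.frame(Filename = c("b.cel", "a.cel"),
                      Target   = c("control", "treatment1"),
                      stringsAsFactors = TRUE)

## `$` returns NULL, with no error or warning, for a name that differs in case.
wrong <- targets$FileName
right <- as.character(targets$Filename)   ## factor -> character, in row order

## Hypothetical stand-in for a reader that falls back to a default
## (here: alphabetical order) when no file names are supplied.
read_files <- function(filenames = character(0),
                       available = c("a.cel", "b.cel")) {
    if (length(filenames) == 0) sort(available) else filenames
}

read_files(wrong)   ## falls back silently: "a.cel" "b.cel"
read_files(right)   ## keeps the intended order: "b.cel" "a.cel"

## A guard that fails loudly instead of falling back.
stopifnot(!is.null(targets$Filename), length(right) == nrow(targets))
```

Checking for NULL (or for a length mismatch against the number of samples) before handing the vector on is one way to turn a silent fallback into an immediate error.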

-----Original Message-----
From: Martin Morgan [mailto:mtmorgan at fhcrc.org] 
Sent: Thursday, 26 July 2007 11:49 a.m.
To: Johnstone, Alice
Cc: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] read.phenoData vs read.AnnotatedDataFrame

Hi Alice --

"Johnstone, Alice" <Alice.Johnstone at esr.cri.nz> writes:
> Using R 2.5.0 and Bioconductor I have been following code to analyse 
> Affymetrix expression data: 2 treatments vs control.  The original 
> code was run last year and used the read.phenoData command, however 
> with the newer version I get the error message
> Warning messages:
> read.phenoData is deprecated, use read.AnnotatedDataFrame instead
> The phenoData class is deprecated, use AnnotatedDataFrame (with
> ExpressionSet) instead
>  
> I use the read.AnnotatedDataFrame command, but when it comes to the 
> end of the analysis the comparison of the treatment to the controls 
> gets mixed up compared to what you get using the original 
> read.phenoData ie it looks like the 3 groups get labelled wrong and so
> the comparisons are different (but they can still be matched up).
> My questions are,
> 1) do you need to set up your target file differently when using 
> read.AnnotatedDataFrame - what is the standard format?
I can't quite tell where things are going wrong for you, so it would
help if you can narrow down where the problem occurs.  I think
read.AnnotatedDataFrame should be comparable to read.phenoData. Does
> pData(pd)
look right? What about
> pData(Data)
and
> pData(eset.rma)
? It's not important but pData(pd)$Target is the same as pd$Target.
Since the analysis is on eset.rma, it probably makes sense to use the
pData from there to construct your design matrix
> targs<-factor(eset.rma$Target)
> design<-model.matrix(~0+targs)
> colnames(design)<-levels(targs)
Does design look right?
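
To make "does design look right?" concrete, here is a small sketch using only base R. The `target` vector is a toy stand-in for eset.rma$Target (three groups with two samples each is an assumption made for illustration, not the actual experiment).

```r
## Toy stand-in for eset.rma$Target: three groups, two samples each.
target <- c("control", "control", "treatment1", "treatment1",
            "treatment2", "treatment2")

targs  <- factor(target)
design <- model.matrix(~ 0 + targs)
colnames(design) <- levels(targs)

print(design)

## Each row should have exactly one 1, in the column of that sample's group.
stopifnot(all(rowSums(design) == 1),
          identical(colnames(design)[max.col(design)], target))
```

If the rows of the design matrix do not line up with the order of the samples actually read in, the contrasts will compare the wrong groups even though every individual step runs without error.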
> I have three columns sample, filename and target.
> 2) do you need to use a different model matrix to what I have?  
> 3) do you use a different command for making the contrasts?
Depends on the question! If you're performing the same analysis as last
year, then the model matrix and contrasts have to be the same!
> I have included my code below if that is of any assistance.
> Many Thanks!
> Alice
>  
>  
>  
> ##Read data
> pd<-read.AnnotatedDataFrame("targets.txt",header=T,row.name="sample")
> Data<-ReadAffy(filenames=pData(pd)$FileName,phenoData=pd)
> ##normalisation
> eset.rma<-rma(Data)
> ##analysis
> targs<-factor(pData(pd)$Target)
> design<-model.matrix(~0+targs)
> colnames(design)<-levels(targs)
> fit<-lmFit(eset.rma,design)
> cont.wt<-makeContrasts("treatment1-control","treatment2-control",levels=design)
> fit2<-contrasts.fit(fit,cont.wt)
> fit2.eb<-eBayes(fit2)
> testconts<-classifyTestsF(fit2.eb,p.value=0.01)
> topTable(fit2.eb,coef=2,n=300)
> topTable(fit2.eb,coef=1,n=300)
>  
>
> 	[[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: 
> http://news.gmane.org/gmane.science.biology.informatics.conductor
--
Martin Morgan
Bioconductor / Computational Biology
http://bioconductor.org


