peverlorenvanthemaat@amc.uva.nl
2006-Jan-20 10:47 UTC
[Rd] read.table with ":" in column names (PR#8511)
Full_Name: emiel ver loren
Version: 2.2.0
OS: Windows XP
Submission from: (NULL) (145.117.31.248)

Dear R-community and developers,

I have been trying to read in a tab-delimited file where the column names and the row names are of the form "GO:0000051" (gene ontology IDs). When using:

  > gomat<-read.table("test.txt")
  > colnames(gomat)[1]
  [1] "GO.0000051"
  > rownames(gomat)[1]
  [1] "GO:0000002"

Which means that ":" is transformed into a "." !! This seems like Excel when it is trying to guess what I really ment (and turning 1/1/1 into 1-1-2001).

Furthermore, I found the following quite strange as well:

  > gomat2<-read.delim2("test.txt",header=FALSE)
  > gomat2[1,1:2]
            V1         V2
  1 GO:0000051 GO:0000280
  > as.character(gomat2[1,1:2])
  [1] "8" "2"
  > as.character(gomat2[1,1])
  [1] "GO:0000051"

I have found a way to work around it, but I am wondering what's happening....

The tab-delimited file looks like:

  GO:0000051 GO:0000280 GO:0000740
  GO:0000002 0 0 0
  GO:0000004 0 0 0
  GO:0000012 0 0 0
  GO:0000014 0 0 0
  GO:0000015 0 0 0
  GO:0000018 0 0 0
  GO:0000019 0 0 0

Thanks for helping, and

Emiel
ripley@stats.ox.ac.uk
2006-Jan-20 11:14 UTC
[Rd] read.table with ":" in column names (PR#8511)
Please do not report documented behaviour as a bug! See the 'check.names' argument to read.table.

In your second example you are applying as.character to a data frame, and you seem not to realize that. We specifically ask you NOT to use R-bugs to ask questions. (What is happening is that you got the internal codes of the factor columns, which is not what you intended. If you want character columns, read them as such.)

On Fri, 20 Jan 2006 peverlorenvanthemaat at amc.uva.nl wrote:

> Full_Name: emiel ver loren
> Version: 2.2.0

We do ask you not to send reports on obsolete versions of R.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
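The two suggestions in the reply above can be sketched as follows. This is a minimal illustration rather than part of the original message: the temporary file, the `path` variable, and the shortened data are assumptions made so the example runs on its own.

```r
# Shortened, tab-delimited copy of the reported file (illustrative data only).
path <- tempfile(fileext = ".txt")
writeLines(c(
  "GO:0000051\tGO:0000280\tGO:0000740",
  "GO:0000002\t0\t0\t0",
  "GO:0000004\t0\t0\t0"
), path)

# Default (check.names = TRUE): column names are passed through make.names().
gomat <- read.table(path)
colnames(gomat)[1]
# [1] "GO.0000051"

# check.names = FALSE keeps the header exactly as written in the file.
gomat_raw <- read.table(path, check.names = FALSE)
colnames(gomat_raw)[1]
# [1] "GO:0000051"

# Reading without a header and asking for character columns instead of
# factors, so that no factor codes can leak out later.
gomat2 <- read.delim2(path, header = FALSE, stringsAsFactors = FALSE)
gomat2[1, 1]
# [1] "GO:0000051"
```

Setting `stringsAsFactors = FALSE` explicitly keeps the behaviour the same whatever the default of the installed R version happens to be.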
peverlorenvanthemaat at amc.uva.nl writes:

> Which means that ":" is transformed into a "." !! This seems like Excel when it
> is trying to guess what I am really ment (and turning 1/1/1 into 1-1-2001).

This is what check.names=FALSE is for... (and NOT a bug, please don't abuse the bug repository, use the mailing lists)

> Furthermore, I found the following quite strange as well:
>
> > as.character(gomat2[1,1:2])
> [1] "8" "2"
> > as.character(gomat2[1,1])
> [1] "GO:0000051"

Yes, this is a bit nasty, but... What is happening is similar to this:

  > d <- data.frame(a=factor(LETTERS), b=factor(letters))
  > d[1,]
    a b
  1 A a
  > as.character(d[1,])
  [1] "1" "1"
  > as.character(d[1,1])
  [1] "A"
  > as.character(d[1,1,drop=FALSE])
  [1] "1"

or this:

  > l <- list(a=factor("x"),b=factor("y"))
  > l
  $a
  [1] x
  Levels: x

  $b
  [1] y
  Levels: y

  > as.character(l)
  [1] "1" "1"

The thing is that as.character on a list will first coerce factors to numeric, then numeric to character. I'm not sure whether there could be a rationale for it, but it isn't S-PLUS compatible (not 6.2.1 anyway, which is the most recent one that I have access to).

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907
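The coercion described above can be reproduced, and avoided, roughly along these lines. This is a sketch only: the two-row data frame and the variable names `codes` and `labels` are made up for illustration, and `stringsAsFactors = TRUE` is set explicitly on the assumption that the reader's R version may not create factors by default.

```r
# Factor columns, as read.table produced them by default in R 2.2.0.
d <- data.frame(
  V1 = c("GO:0000051", "GO:0000002"),
  V2 = c("GO:0000280", "0"),
  stringsAsFactors = TRUE
)

# as.character() treats the data frame row as a list, so each factor is
# reduced to its internal integer code before being turned into a string.
codes <- as.character(d[1, ])
codes
# [1] "2" "2"

# Converting column by column yields the labels instead.
labels <- vapply(d[1, ], as.character, character(1))
labels[["V1"]]
# [1] "GO:0000051"

# Alternatively, turn the factor columns into character columns first.
d[] <- lapply(d, as.character)
as.character(d[1, ])
# [1] "GO:0000051" "GO:0000280"
```

The `d[] <- lapply(...)` form keeps `d` a data frame while replacing every column, which is usually the simplest fix when the data have already been read.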
On Fri, 20 Jan 2006 peverlorenvanthemaat at amc.uva.nl wrote:

> Which means that ":" is transformed into a "." !! This seems like Excel when it
> is trying to guess what I am really ment (and turning 1/1/1 into 1-1-2001).

Wrong. ?read.table says with reference to the check.names = TRUE argument that:

"check.names: logical. If 'TRUE' then the names of the variables in the data frame are checked to ensure that they are syntactically valid variable names. If necessary they are adjusted (by 'make.names') so that they are, and also to ensure that there are no duplicates."

  > make.names("GO:0000051")
  [1] "GO.0000051"

You can use "GO:0000051" as a column name if quoted, otherwise ":" is an operator, so the default value of the check.names argument is sound. If you "ment" to do what you say, you should have set check.names=FALSE.

-- 
Roger Bivand
Economic Geography Section, Department of Economics,
Norwegian School of Economics and Business Administration,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no
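As an illustration of the point about quoting (a sketch, not taken from the original message; the data frame `df` and its values are made up), a non-syntactic column name can be kept and then accessed with backticks or by string:

```r
# make.names() is what check.names = TRUE applies to each column name.
make.names("GO:0000051")
# [1] "GO.0000051"

# Keep the non-syntactic name by switching the check off.
df <- data.frame("GO:0000051" = c(0, 1), check.names = FALSE)
names(df)
# [1] "GO:0000051"

# Quoted access works; unquoted, the colon would be parsed as an operator.
df$`GO:0000051`
df[["GO:0000051"]]
```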
Joerg van den Hoff
2006-Jan-20 14:11 UTC
[Rd] read.table with ":" in column names (PR#8511)
in my view it's not always good to get this answer, but your "problem" is not too deeply hidden in the manpages, so simply read the documentation of read.table:

?read.table

(and look out for the "check.names" flag)

regards,
joerg