Dear all,

I am having trouble importing values written in scientific notation using read.table(). I'm sure this is a frequent problem, as many people in my lab have it as well, so I suspect I am just having trouble googling for the right solution.

The problem is that, given a file like this:

a 1 2e-4
b 2 3e-8
...

the third column gets imported as a factor, or as a string if I set the as.is parameter of read.table() to TRUE for this column. However, I just want a simple numeric vector :-) I'm sure there is a simple trick for this. If you can point me to the right function or manual, I think I should be able to find out the details myself.

Thanks in advance,
January

-- 
------------ January Weiner 3 ---------------------+---------------
Division of Bioinformatics, University of Muenster | Schloßplatz 4
(+49)(251)8321634                                  | D48149 Münster
http://www.uni-muenster.de/Biologie.Botanik/ebb/   | Germany
Your example does not exhibit that behavior when I try it (below). Can you provide a reproducible example following the style shown here:

> Lines <- "a 1 2e-4
+ b 2 3e-8"
>
> DF <- read.table(textConnection(Lines))
> str(DF)
'data.frame':   2 obs. of  3 variables:
 $ V1: Factor w/ 2 levels "a","b": 1 2
 $ V2: int  1 2
 $ V3: num  2e-04 3e-08
> R.version.string # Windows XP
[1] "R version 2.4.0 (2006-10-03)"

On 10/10/06, January Weiner <january at uni-muenster.de> wrote:
> [...]
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
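For readers on a current R, here is a minimal sketch of the same check. It assumes R >= 2.14.0, where read.table() accepts the data inline through its text= argument instead of textConnection(). Also note that since R 4.0.0 the default is stringsAsFactors = FALSE, so V1 comes back as character rather than factor; the object names are illustrative only.

```r
# Same two-line example as in the transcript, passed inline via text=
# (text= is available in read.table() from R 2.14.0 onwards)
DF <- read.table(text = "a 1 2e-4\nb 2 3e-8")

# Inspect the class R inferred for each column
col_classes <- sapply(DF, class)
print(col_classes)

# V3 should be numeric: 2e-4 and 3e-8 are ordinary E notation
str(DF)
```

If V3 shows up as anything other than numeric here, the input file differs from the two lines quoted, which is exactly what a reproducible example would reveal.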
On FC5 Linux:

gannet% cat > foo.dat
a 1 2e-4
b 2 3e-8
gannet% R
...
> read.table("foo.dat")
  V1 V2    V3
1  a  1 2e-04
2  b  2 3e-08
> sapply(read.table("foo.dat"), class)
       V1        V2        V3
 "factor" "integer" "numeric"

so please tell us your environment and give a reproducible example. (This is using the OS function strtod, so it might be a deficiency in your OS's implementation of ISO C.)

On Tue, 10 Oct 2006, January Weiner wrote:
> [...]

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
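Why a column can silently end up as a string: when colClasses is not given, read.table() passes each column of strings to type.convert(), and if even one entry cannot be parsed as a number the whole column falls back to character (or factor). A minimal sketch of that behaviour follows; as.is = TRUE is set explicitly because recent R versions warn when it is omitted, and the vector names are hypothetical.

```r
# Strings as read.table() sees them before type conversion
good  <- c("2e-4", "3e-8")
mixed <- c("2e-4", "e-10")   # "e-10" has no digits before the exponent

# type.convert() is what read.table() uses to guess a column's type
conv_good  <- type.convert(good,  as.is = TRUE)
conv_mixed <- type.convert(mixed, as.is = TRUE)

class(conv_good)   # "numeric"
class(conv_mixed)  # "character": one unparseable entry demotes the column
```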
Oh, thanks, that was hint enough :-) I see it now. It turns out that R does not understand values like

e-10

which stands for 1e-10 and is produced by some of the bioinformatics applications that I use (notably BLAST). However, instead of reporting this, R just assumes that the whole column is a string. Is there a way to enforce a specific conversion in R (for example, to be able to see where the errors are)?

January
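One way to make the conversion explicit and see where it fails, as a sketch: read the suspect column as character via colClasses, convert it yourself with as.numeric(), and list the rows whose non-missing strings became NA. The sample lines and the names raw, num and bad are hypothetical.

```r
lines <- c("a 1 2e-4",
           "b 2 3e-8",
           "c 3 e-10")   # BLAST-style value with no leading mantissa

# Read the third column as plain character so nothing is guessed
DF <- read.table(text = lines,
                 colClasses = c("character", "integer", "character"))

raw <- DF$V3
# Entries that as.numeric() cannot parse become NA (the warning is suppressed)
num <- suppressWarnings(as.numeric(raw))

# Rows where a non-missing string failed to convert
bad <- which(is.na(num) & !is.na(raw))
print(DF[bad, ])
```

Dropping the suppressWarnings() wrapper gives the "NAs introduced by coercion" warning instead, which is the verbose behaviour asked about, though without row numbers.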
A cheeky solution, subverting the coercion mechanism used by read.table():

# install a coerce function which can fix the "e+10" syntax for an imaginary class myDouble:
> setAs("character", "myDouble", function(from) as.double(sub('^(-?)e', '\\11e', from)))
Warning message:
in the method signature for function 'coerce' no definition for class: ‘myDouble’ in: matchSignature(signature, fdef, where)

# load some data:
> Lines <- scan(sep="\n", what="")
a 1 3e-8
b 2 1e+10
c 3 e-10
d 4 e+3
e 5 e+1

# process it without using the imaginary class, using a real double instead, to see what happens
# (note I've used textConnection(Lines) here, where your filename would go):
> T <- read.table(textConnection(Lines), colClasses=c("character", "integer", "double"))
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
        scan() expected 'a real', got 'e-10'

# process it, specifying the imaginary class myDouble:
> T <- read.table(textConnection(Lines), colClasses=c("character", "integer", "myDouble"))
> T
  V1 V2    V3
1  a  1 3e-08
2  b  2 1e+10
3  c  3 1e-10
4  d  4 1e+03
5  e  5 1e+01
> lapply(T, class)
$V1
[1] "character"

$V2
[1] "integer"

$V3
[1] "numeric"

Someone's bound to shoot me down for hackery here :-)

-Alex

On 10 Oct 2006, at 11:43, January Weiner wrote:
> [...]
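A self-contained variant of Alex's trick, as a sketch rather than his exact code: it declares myDouble with setClass() first (my assumption, to avoid the "no definition for class" warning) and uses the pattern "^(-?)e", so a value starting with a bare e, optionally after a minus sign, gets a 1 prepended before parsing. It relies on read.table() handing any non-basic colClasses entry to methods::as().

```r
library(methods)  # setClass()/setAs(); attached by default in most sessions

# Declare a (virtual) placeholder class so setAs() has a class to refer to
setClass("myDouble")

# Coercion used by read.table(colClasses = "myDouble"):
# turn "e-10" into "1e-10" and "-e+3" into "-1e+3", then parse as double
setAs("character", "myDouble", function(from)
  as.double(sub("^(-?)e", "\\11e", from)))

lines <- c("a 1 3e-8", "b 2 1e+10", "c 3 e-10", "d 4 e+3", "e 5 e+1")

DF <- read.table(text = lines,
                 colClasses = c("character", "integer", "myDouble"))
str(DF)
```

In the replacement string, \\1 is the captured optional minus sign and the following "1e" is literal text; R only recognises single-digit backreferences, so \\11 is not read as group 11.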
>>>>> "January" == January Weiner <january at uni-muenster.de> writes:

    > Dear all, I am having troubles importing values written as
    > scientific notation using read.table(). I'm sure this is a
    > frequent problem, as many people in my lab have this
    > problem as well, so I'm sure that I just have troubles
    > googling for the right solution.

    > The problem is, that, given a file like that:

    > a 1 2e-4
    > b 2 3e-8
    > ...

Note: this is advocacy for education in clear quantitative language and is a borderline off-topic rant...

The other day I read a paper from a student who used notation like 2e-4 in the text. Blech! I sent it back for revisions. Lately I have noticed, here and in other places, a tendency to use floating point notation (also referred to as exponential notation) where scientific notation is appropriate, and vice versa. The notation 2e-4 is a convenient way to express floating point numbers with a simple text string, but it is certainly not scientific notation. No wonder you had trouble googling it!

Mike
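To make the distinction concrete with the values from this thread: the text string 2e-4 (what Mike calls floating point or exponential notation) encodes the number that scientific notation writes with an explicit power of ten, and the BLAST-style string e-10 leaves out the mantissa of 1:

```latex
\texttt{2e-4} \;\leftrightarrow\; 2 \times 10^{-4} = 0.0002,
\qquad
\texttt{e-10} \;\leftrightarrow\; 1 \times 10^{-10}
```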