Note: this e-mail was meant to precede the one with my coerce hack.
Here is an example of the colClasses approach the other posters mentioned,
with some debugging notes:
# create a pretend file for this example
> Lines <- scan(sep="\n", what="")
a 1 3e-8
b 2 1e+10
c 3 e-10
d 4 e+3
> file <- textConnection(Lines)
# import as you would a file, and specify the column types.
> T <- read.table(file, colClasses=c("character",
"integer",
"numeric"))
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines,
na.strings, :
scan() expected 'a real', got 'e-10'
# That's not very helpful, so let's just import everything as character,
# reopening the connection first:
> file <- textConnection(Lines)
> T <- read.table(file, colClasses="character")
> lapply(T, mode)
$V1
[1] "character"
$V2
[1] "character"
$V3
[1] "character"
> T
V1 V2 V3
1 a 1 3e-8
2 b 2 1e+10
3 c 3 e-10
4 d 4 e+3
# try the conversion to double:
> (D<-as.double(T$V3))
[1] 3e-08 1e+10 NA NA
Warning message:
NAs introduced by coercion
# let's see which are bad:
> T[is.na(D),]
V1 V2 V3
3 c 3 e-10
4 d 4 e+3
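If you want something you can paste in one go rather than typing into an interactive scan(), the steps above could be wrapped up roughly as follows. This is only a sketch: find_bad_numeric is a made-up helper name, it uses the text= argument of read.table (available in newer versions of R) instead of a textConnection, and the repair step assumes, as the original poster says, that a bare "e-10" stands for "1e-10".

```r
# Same pretend data as above, given directly as a character vector
lines <- c("a 1 3e-8", "b 2 1e+10", "c 3 e-10", "d 4 e+3")

# Read every column as character so nothing is guessed silently
tab <- read.table(text = lines, colClasses = "character")

# Hypothetical helper: return the rows whose entry in column `col`
# is present as text but turns into NA under as.numeric()
find_bad_numeric <- function(df, col) {
  converted <- suppressWarnings(as.numeric(df[[col]]))
  df[is.na(converted) & !is.na(df[[col]]), , drop = FALSE]
}

# Rows that fail the conversion (here: rows 3 and 4)
bad <- find_bad_numeric(tab, "V3")
print(bad)

# Assumed repair: a value consisting only of an exponent, such as
# "e-10", gets a leading "1" before conversion
repaired <- sub("^([eE][+-]?[0-9]+)$", "1\\1", tab$V3)
values <- as.numeric(repaired)
print(values)
```

Whether prefixing a "1" is the right repair depends on the program that wrote the file, so it is worth looking at the rows returned by find_bad_numeric before converting.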
-Alex
On 10 Oct 2006, at 12:17, January Weiner wrote:
> Oh, thanks, that was hint enough :-) I see it now. It turns out that R does
> not understand
>
> e-10
>
> ...which stands for 1e-10 and is produced by some of the bioinformatic
> applications that I use (notably BLAST). However, R instead of being
> verbose on that just assumes that the whole column is a string.
>
> Is there a way to enforce a specific conversion in R (for example, to
> be able to see where the errors are)?
>
> January
>
> --
> ------------ January Weiner 3 ---------------------+---------------
> Division of Bioinformatics, University of Muenster | Schloßplatz 4
> (+49)(251)8321634 | D48149 Münster
> http://www.uni-muenster.de/Biologie.Botanik/ebb/ | Germany