Hi,
I have a bit of a problem with as.numeric or as.double.
I read in an excel-file (either xlsx::read.xlsx2 or gdata::read.xls).
Select a subset and then try to make it numeric:
# read in the excel-file
alldata<-read.xlsx2("input.xls",1)
# select the subset
s<-subset(alldata,
select=c("cI","cII","cIII","cIV","cV"))
# unluckily we have "n/a" for missing values in the file - so we turn
it
into "proper" missing values
s[s == "n/a"]<-NA
n<-data.matrix(s);
The problem I have is that it does not convert the date the way I would
expect.
just as an example:
> s[1,2]
[1] 30.94346629
3136 Levels: 0.026307482 0.028239812 0.02849896 0.029054564 0.029540352
0.030248034 0.030841352 0.032966308 ... n/a
turned into:
> n[1,2]
[1] 3020
And I would like to have there 30.94346629 as well. I assume that has to
do with the "Levels" attribute - but not sure what to make of these in
the first place.
I also tried to convert each value on its own:
#make some space that holds the actual numeric data
n <- array(dim=c(length(s[,1]),length(s)))
# now turn everything into doubles
for (c in 1:length(s)) {
for (r in 1:length(s[,1])) {
n[r,c]<-as.double(s[r,c])
}
}
but that gave the same result - just a lot slower.
Thanks
Lutz
--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
On 25/05/2011 9:20 a.m., Lutz Fischer wrote:> Hi, > > I have a bit of a problem with as.numeric or as.double. > > I read in an excel-file (either xlsx::read.xlsx2 or gdata::read.xls). > Select a subset and then try to make it numeric: > > > # read in the excel-file > alldata<-read.xlsx2("input.xls",1) > # select the subset > s<-subset(alldata, select=c("cI","cII","cIII","cIV","cV")) > # unluckily we have "n/a" for missing values in the file - so we turn it > into "proper" missing values > s[s == "n/a"]<-NA > > n<-data.matrix(s); > > > > > The problem I have is that it does not convert the date the way I would > expect. > > just as an example: > > s[1,2] > [1] 30.94346629 > 3136 Levels: 0.026307482 0.028239812 0.02849896 0.029054564 0.029540352 > 0.030248034 0.030841352 0.032966308 ... n/a > > turned into: > > n[1,2] > [1] 3020 > > And I would like to have there 30.94346629 as well. I assume that has to > do with the "Levels" attribute - but not sure what to make of these in > the first place. > > I also tried to convert each value on its own: > > #make some space that holds the actual numeric data > n <- array(dim=c(length(s[,1]),length(s))) > # now turn everything into doubles > for (c in 1:length(s)) { > for (r in 1:length(s[,1])) { > n[r,c]<-as.double(s[r,c]) > } > } > > but that gave the same result - just a lot slower. > > > > Thanks > Lutz > >Your problem is the conversion to factors when the data is read. Use options(stringsAsFactors = FALSE) before you read the data, then the mixed columns of numeric and missing will be read as character data and the conversion to numeric will go as you expect. (But I haven't tested this.) David Scott -- _________________________________________________________________ David Scott Department of Statistics The University of Auckland, PB 92019 Auckland 1142, NEW ZEALAND Phone: +64 9 923 5055, or +64 9 373 7599 ext 85055 Email: d.scott at auckland.ac.nz, Fax: +64 9 373 7018
On 24 May 2011, at 10:20 PM, Lutz Fischer wrote:> n<-data.matrix(s); > > s[1,2] > [1] 30.94346629 # 3136 Levels: 0.026307482 > turned into: > > n[1,2] > [1] 3020Dear Lutz, ?3020? is the factor level associated with 30.94346629, in turn generated by importing with strings in the column Try passing "na.strings = ?n/a? ? into the read.xls function to avoid the problem If not, then as.numeric(as.character(datafulloffactors) will work as.numeric sees the factor levels: as.character forces as.numeric to see the ?labels? t
Reasonably Related Threads
- Error when writing to Excel files using the packages xlsx and xlsx2
- Read shortcuts of MS Excel files through R
- Read in alphanumeric column without decimals
- importar simultaneamente varios archivos excel
- 'force create mode' not working when 'unix extensions = yes'