Hi, I have a bit of a problem with as.numeric or as.double. I read in an excel-file (either xlsx::read.xlsx2 or gdata::read.xls). Select a subset and then try to make it numeric: # read in the excel-file alldata<-read.xlsx2("input.xls",1) # select the subset s<-subset(alldata, select=c("cI","cII","cIII","cIV","cV")) # unluckily we have "n/a" for missing values in the file - so we turn it into "proper" missing values s[s == "n/a"]<-NA n<-data.matrix(s); The problem I have is that it does not convert the date the way I would expect. just as an example: > s[1,2] [1] 30.94346629 3136 Levels: 0.026307482 0.028239812 0.02849896 0.029054564 0.029540352 0.030248034 0.030841352 0.032966308 ... n/a turned into: > n[1,2] [1] 3020 And I would like to have there 30.94346629 as well. I assume that has to do with the "Levels" attribute - but not sure what to make of these in the first place. I also tried to convert each value on its own: #make some space that holds the actual numeric data n <- array(dim=c(length(s[,1]),length(s))) # now turn everything into doubles for (c in 1:length(s)) { for (r in 1:length(s[,1])) { n[r,c]<-as.double(s[r,c]) } } but that gave the same result - just a lot slower. Thanks Lutz -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
On 25/05/2011 9:20 a.m., Lutz Fischer wrote:> Hi, > > I have a bit of a problem with as.numeric or as.double. > > I read in an excel-file (either xlsx::read.xlsx2 or gdata::read.xls). > Select a subset and then try to make it numeric: > > > # read in the excel-file > alldata<-read.xlsx2("input.xls",1) > # select the subset > s<-subset(alldata, select=c("cI","cII","cIII","cIV","cV")) > # unluckily we have "n/a" for missing values in the file - so we turn it > into "proper" missing values > s[s == "n/a"]<-NA > > n<-data.matrix(s); > > > > > The problem I have is that it does not convert the date the way I would > expect. > > just as an example: > > s[1,2] > [1] 30.94346629 > 3136 Levels: 0.026307482 0.028239812 0.02849896 0.029054564 0.029540352 > 0.030248034 0.030841352 0.032966308 ... n/a > > turned into: > > n[1,2] > [1] 3020 > > And I would like to have there 30.94346629 as well. I assume that has to > do with the "Levels" attribute - but not sure what to make of these in > the first place. > > I also tried to convert each value on its own: > > #make some space that holds the actual numeric data > n <- array(dim=c(length(s[,1]),length(s))) > # now turn everything into doubles > for (c in 1:length(s)) { > for (r in 1:length(s[,1])) { > n[r,c]<-as.double(s[r,c]) > } > } > > but that gave the same result - just a lot slower. > > > > Thanks > Lutz > >Your problem is the conversion to factors when the data is read. Use options(stringsAsFactors = FALSE) before you read the data, then the mixed columns of numeric and missing will be read as character data and the conversion to numeric will go as you expect. (But I haven't tested this.) David Scott -- _________________________________________________________________ David Scott Department of Statistics The University of Auckland, PB 92019 Auckland 1142, NEW ZEALAND Phone: +64 9 923 5055, or +64 9 373 7599 ext 85055 Email: d.scott at auckland.ac.nz, Fax: +64 9 373 7018
On 24 May 2011, at 10:20 PM, Lutz Fischer wrote:> n<-data.matrix(s); > > s[1,2] > [1] 30.94346629 # 3136 Levels: 0.026307482 > turned into: > > n[1,2] > [1] 3020Dear Lutz, ?3020? is the factor level associated with 30.94346629, in turn generated by importing with strings in the column Try passing "na.strings = ?n/a? ? into the read.xls function to avoid the problem If not, then as.numeric(as.character(datafulloffactors) will work as.numeric sees the factor levels: as.character forces as.numeric to see the ?labels? t
Reasonably Related Threads
- Error when writing to Excel files using the packages xlsx and xlsx2
- Read shortcuts of MS Excel files through R
- Read in alphanumeric column without decimals
- importar simultaneamente varios archivos excel
- 'force create mode' not working when 'unix extensions = yes'