Sebastian Lerch
2011-Jun-03 14:24 UTC
[R] Problem using read.xls - Everything converted to factors
Hello,

I would like to use the read.xls function from the gdata package to read data from Microsoft Excel files, but I ran into a problem. For example, I used the following code:

    testfile<-read.xls("/home/.../wsjecon0603.xls", #file path
               header=F,
               dec=",",
               na.strings="n.a.",
               skip=5,
               sheet=2,
               col.names=c("Name","Firm","GDP1","GDP2","GDP3","GDP4","CPI5",
                           "CPI11","UNEMP5","UNEMP11","PROF03","PROF04",
                           "STARTS03","STARTS04"),
               nrows=54,
               #colClasses=c(character,character,numeric,numeric,numeric,numeric,numeric,numeric,numeric,numeric,numeric,numeric,numeric,numeric)
               )
    print(testfile)

Although the xls file contains numeric values in all the columns except the ones which I named "Name" and "Firm", every column in the data frame has class "factor". I tried to use the colClasses option as above, and also with quotation marks around each word, but this does not work and I always receive the following error:

    Error in is(object, Class) :
      trying to get slot "className" of an object of a basic class ("list") with no slots
    Calls: read.xls -> read.csv -> read.table -> <Anonymous> -> is

After some hours of research I figured out how to change the classes of the columns manually:

    testfile$GDP2<-as.numeric(levels(testfile$GDP2))[testfile$GDP2]
    testfile$Name<-as.character(levels(testfile$Name))[testfile$Name] #and so on

This works, but it is a lot of work since I have to import many different data sets. So I was wondering if there is another way to have the classes recognized correctly.

Additionally, I would like to know if there is any way to import data from different sheets with the same layout into one data frame at once.

I use Ubuntu 11.04 with RKWard, in case this is of any importance.

Thanks in advance for your answers,
Sebastian
Gabor Grothendieck
2011-Jun-03 15:13 UTC
[R] Problem using read.xls - Everything converted to factors
On Fri, Jun 3, 2011 at 10:24 AM, Sebastian Lerch <lerch at lavabit.com> wrote:
> Although the xls file contains numeric values in all the columns except the
> ones which I named "Name" and "Firm", everything in the data frame has
> "factor" as class.
> [...]
> Additionally I would like to know if there is any way to import data from
> different sheets with the same layout at once into one data frame.

Assuming you are using the gdata package, read.xls has a ... argument which it passes on to read.table, so see ?read.table. In particular, as.is = TRUE prevents conversion to factors. Note that any column which contains even one non-numeric value will not be regarded as numeric.

You can rbind the results from different sheets if they have the same layout.

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
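A minimal sketch of the advice above, using inline text as a stand-in for a worksheet so that it runs without gdata or an Excel file. The data and the helper `read_sheet` are hypothetical; the arguments (`as.is`, `colClasses`, `dec`, `na.strings`) are those of read.table, which is where read.xls eventually sends them. It may also explain the reported error: unquoted `c(character, numeric)` builds a list of functions rather than a character vector, which seems consistent with the complaint about an object of class "list".

```r
# Inline text stands in for one worksheet (hypothetical data).
sheet_text <- 'Name;Firm;GDP1
"A. Smith";"Alpha Bank";2,5
"B. Jones";"Beta Corp";n.a.
"C. Brown";"Gamma LLC";3,1'

# Hypothetical helper: fixed layout arguments, extras passed on via ...
read_sheet <- function(text, ...) {
  read.table(text = text, sep = ";", header = TRUE,
             dec = ",", na.strings = "n.a.", ...)
}

# Text columns become factors when stringsAsFactors = TRUE
# (the default in R at the time of this thread).
as_factors <- read_sheet(sheet_text, stringsAsFactors = TRUE)

# as.is = TRUE keeps text as character; numeric columns are still detected.
as_is <- read_sheet(sheet_text, as.is = TRUE)

# colClasses must be a character vector, so the class names are quoted.
typed <- read_sheet(sheet_text,
                    colClasses = c("character", "character", "numeric"))

print(sapply(as_factors, class))
print(sapply(as_is, class))
print(sapply(typed, class))

# Sheets with the same layout: read each one, then rbind the pieces.
sheets <- list(sheet_text, sheet_text)
combined <- do.call(rbind, lapply(sheets, read_sheet, as.is = TRUE))
print(dim(combined))
```

With a real workbook the last step would presumably look like `do.call(rbind, lapply(1:3, function(s) read.xls(path, sheet = s, as.is = TRUE)))`, where `path` and the sheet range are placeholders.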
Petr PIKAL
2011-Jun-03 15:34 UTC
[R] Re: Problem using read.xls - Everything converted to factors
Hi

> I tried to use the colClasses option as above and as well with " "'s
> around each word, but this does not work and I will always receive the
> following error:

Hm. That should work. You have got some advice from Gabor, but when numeric columns come in as non-numeric I often find that the problem is the formatting of the original values. Numbers like 10 253,52 are treated as non-numeric because there is an extra space character between the thousands and the hundreds. Maybe missing values are also not always marked as n.a. and sometimes the value is simply missing, and I suppose this can lead to conversion of the whole column to a character vector.

> After some hours of research I figured out how I can manually change
> the classes of the columns:
>
> testfile$GDP2<-as.numeric(levels(testfile$GDP2))[testfile$GDP2]
> testfile$Name<-as.character(levels(testfile$Name))[testfile$Name] #and so on

You can save some time by using sapply:

    num.cols <- 3:14  # every column except Name and Firm
    # assumes these columns are character, e.g. read with as.is = TRUE
    testfile[, num.cols] <- sapply(testfile[, num.cols], as.numeric)

This should convert all character columns to numeric at once, but you will get NAs for all values which could not be converted for any reason.

Regards
Petr
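The formatting problem described above can be handled before conversion. The following is only a sketch under assumptions: the sample values and the helper name `to_number` are made up, and it assumes a space as thousands separator, a comma as decimal mark, and "n.a." as the missing-value marker. It goes through `as.character` first because calling `as.numeric` directly on a factor returns the internal level codes, not the labels.

```r
# Hypothetical helper: turn strings such as "10 253,52" into numbers.
to_number <- function(x, na.strings = "n.a.") {
  x <- as.character(x)                 # for factors: use labels, not codes
  x[x %in% na.strings] <- NA           # explicit missing-value markers
  x <- gsub("[[:space:]]", "", x)      # drop thousands separators
  x <- sub(",", ".", x, fixed = TRUE)  # decimal comma -> decimal point
  as.numeric(x)
}

raw <- c("10 253,52", "987,10", "n.a.", "1 000")
print(to_number(raw))

# The pitfall: as.numeric() on a factor gives level codes.
f <- factor(c("3.5", "4"))
print(as.numeric(f))                   # codes
print(as.numeric(as.character(f)))     # values

# Converting several columns of a data frame at once (made-up data).
df <- data.frame(Name = c("a", "b"),
                 GDP1 = c("1 200,5", "n.a."),
                 GDP2 = factor(c("3,5", "4")),
                 stringsAsFactors = FALSE)
num_cols <- c("GDP1", "GDP2")
df[num_cols] <- lapply(df[num_cols], to_number)
print(sapply(df, class))
```

Using `lapply` with `df[num_cols] <- ...` keeps each column as its own vector, whereas `sapply` returns a matrix first; either works for all-numeric results.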
Hi All,

Before writing a simple formula interpreter in R myself, I wanted to find out whether anyone has done it before. I could not locate one myself. I am looking for something like this: take a data frame and a simple formula string as input and return an output with the formula applied to different columns of the data frame (even better if it is time series compatible).

A crude and bad example is below:
http://www.codeproject.com/KB/recipes/formulainterpreter.aspx

I would appreciate it if anyone could provide any pointers or direction.

Thanks
On Fri, Jun 3, 2011 at 11:48 AM, amit jain <buddyhi at indiatimes.com> wrote:
> I am looking for something like this - take a dataframe and a simple formula
> string as input and gives an output with formula applied on different columns
> of dataframe ( even better if it is time series compatible)

Try this:

    fo <- "x+y/2"
    eval(parse(text = fo), list(x = 1, y = 2))

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
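The same idea extends from a list of scalars to the columns of a data frame, since `eval` accepts a data frame as its evaluation environment. A small sketch, where the function name `apply_formula` and the sample data are made up for illustration:

```r
# Hypothetical wrapper: evaluate a formula string against data frame columns.
apply_formula <- function(df, formula_text) {
  expr <- parse(text = formula_text)[[1]]   # first expression in the string
  # Column names are looked up in df; operators and functions in base R.
  eval(expr, envir = df, enclos = baseenv())
}

df <- data.frame(x = c(1, 2, 3), y = c(2, 4, 6))
df$z <- apply_formula(df, "x + y / 2")
print(df)
```

Because arithmetic in R is vectorized, the formula is applied row by row without an explicit loop. Note that `eval(parse(text = ...))` runs arbitrary code, so this is only suitable for formula strings from a trusted source.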