Unfortunately, some data I need to work with was delivered in a rather messy Excel file. I want to import into R and clean up some things so that I can do my analysis. Pulling in a CSV from Excel is the easy part. My current challenge is dealing with some text mixed in the values. i.e. 118 5.7 <2.0 3.7 Since this column in Excel has a "<2.0" value, then R reads the column as a factor with levels. Ideally, I want to convert it a normal vector of scalars and code code the "<2.0" as 0. Can anyone suggest an easy way to do this? Thanks! -- Noah Silverman UCLA Department of Statistics 8117 Math Sciences Building Los Angeles, CA 90095 [[alternative HTML version deleted]]
-----Original Message----- From: Noah Silverman Sent: Tuesday, February 28, 2012 3:27 PM To: r-help Subject: [R] Cleaning up messy Excel data Unfortunately, some data I need to work with was delivered in a rather messy Excel file. I want to import into R and clean up some things so that I can do my analysis. Pulling in a CSV from Excel is the easy part. My current challenge is dealing with some text mixed in the values. i.e. 118 5.7 <2.0 3.7 Since this column in Excel has a "<2.0" value, then R reads the column as a factor with levels. Ideally, I want to convert it a normal vector of scalars and code code the "<2.0" as 0. Can anyone suggest an easy way to do this? -------------------------------------- ?as.character will show you how to change the "factor" column into a character column. Then, you can replace text using any of a number of procedures. see for example ?gsub finally, you can use as.numeric if you want numbers. "Coding" is best done in the context of factors, so you might want to consider where replacing <2 with NA is more appropriate than replacing with 0. In this end, the choice might be context sensitive. Rob -------------------------------- ------------------------------------------ Robert W. Baer, Ph.D. Professor of Physiology Kirksville College of Osteopathic Medicine A. T. Still University of Health Sciences 800 W. Jefferson St. Kirksville, MO 63501 660-626-2322 FAX 660-626-2965
First of all when reading in the CSV file, use 'as.is = TRUE' to prevent the changing to factors. Now that things are character in that column, you can use some pattern expressions (gsub, regex, ...) to search for and change your data. E.g., sub("<.*", "0", yourCol) should do it for you. On Tue, Feb 28, 2012 at 4:27 PM, Noah Silverman <noahsilverman at ucla.edu> wrote:> Unfortunately, some data I need to work with was delivered in a rather messy Excel file. ?I want to import into R and clean up some things so that I can do my analysis. ?Pulling in a CSV from Excel is the easy part. > > My current challenge is dealing with some text mixed in the values. > i.e. ? 118 ? 5.7 ? <2.0 ?3.7 > > Since this column in Excel has a "<2.0" value, then R reads the column as a factor with levels. ?Ideally, I want to convert it a normal vector of scalars and code code the "<2.0" as 0. > > Can anyone suggest an easy way to do this? > > Thanks! > > > -- > Noah Silverman > UCLA Department of Statistics > 8117 Math Sciences Building > Los Angeles, CA 90095 > > > ? ? ? ?[[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.-- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? Tell me what you want to do, not how you want to do it.
Just replace that value with zero. If you provide some reproducible code I could probably give you a solution. ?dput good luck, Stephen On 02/28/2012 03:27 PM, Noah Silverman wrote:> Unfortunately, some data I need to work with was delivered in a rather messy Excel file. I want to import into R and clean up some things so that I can do my analysis. Pulling in a CSV from Excel is the easy part. > > My current challenge is dealing with some text mixed in the values. > i.e. 118 5.7<2.0 3.7 > > Since this column in Excel has a "<2.0" value, then R reads the column as a factor with levels. Ideally, I want to convert it a normal vector of scalars and code code the "<2.0" as 0. > > Can anyone suggest an easy way to do this? > > Thanks! > > > -- > Noah Silverman > UCLA Department of Statistics > 8117 Math Sciences Building > Los Angeles, CA 90095 > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.-- Stephen Sefick ************************************************** Auburn University Biological Sciences 331 Funchess Hall Auburn, Alabama 36849 ************************************************** sas0025 at auburn.edu http://www.auburn.edu/~sas0025 ************************************************** Let's not spend our time and resources thinking about things that are so little or so large that all they really do for us is puff us up and make us feel like gods. We are mammals, and have not exhausted the annoying little problems of being mammals. -K. Mullis "A big computer, a complex algorithm and a long time does not equal science." -Robert Gentleman
(mydata <- as.factor(c("1","2","3", ">2", "5", ">2"))) str(mydata) newdata <- as.character(mydata) newdata[newdata==">2"] <- 0 newdata <- as.numeric(newdata) str(newdata) We really need to keep Excel (and other spreadsheets) out of peoples hands. John Kane Kingston ON Canada> -----Original Message----- > From: noahsilverman at ucla.edu > Sent: Tue, 28 Feb 2012 13:27:13 -0800 > To: r-help at r-project.org > Subject: [R] Cleaning up messy Excel data > > Unfortunately, some data I need to work with was delivered in a rather > messy Excel file. I want to import into R and clean up some things so > that I can do my analysis. Pulling in a CSV from Excel is the easy part. > > My current challenge is dealing with some text mixed in the values. > i.e. 118 5.7 <2.0 3.7 > > Since this column in Excel has a "<2.0" value, then R reads the column as > a factor with levels. Ideally, I want to convert it a normal vector of > scalars and code code the "<2.0" as 0. > > Can anyone suggest an easy way to do this? > > Thanks! > > > -- > Noah Silverman > UCLA Department of Statistics > 8117 Math Sciences Building > Los Angeles, CA 90095 > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.____________________________________________________________ FREE ONLINE PHOTOSHARING - Share your photos online with your friends and family! Visit http://www.inbox.com/photosharing to find out more!
When I was still teaching undergraduate intro biz-stat (among that community it is always abbreviated), we needed to control the spreadsheet behaviour of TAs who entered marks into a spreadsheet. We came up with TellTable (the Sourceforge site is still around with refs at http://telltable-s.sourceforge.net/), which put openoffice calc on a server and made sure change recording was on and the menu to switch off change recording was removed. It is used over a web browser with a VNC client. Neil Smith wrote a Java application to view all the changes by who, what, when etc., and we discovered the infrastructure was quite nice for running any single user app in a shared mode with version control. However, with Google Docs, we realized we could try to make money or enjoy life, and so the project is now moribund. However, the ideas are there, and if anyone gets interested, I'll be happy to try to dig up materials, though I suspect that it would be easier to work with the ideas and more modern tools. The key idea is that there is just ONE master file, and that there is some discipline over keeping that file OK. My opinion is that this concept could be exploited much more for lots of different situations, but it seems that cloud technology is being used to create lots of versions of files rather than consolidate and control such files. JN On 03/03/2012 06:00 AM, r-help-request at r-project.org wrote:> Message: 76 > Date: Fri, 2 Mar 2012 20:04:05 -0500 > From: jim holtman <jholtman at gmail.com> > To: Greg Snow <538280 at gmail.com> > Cc: r-help <r-help at r-project.org> > Subject: Re: [R] Cleaning up messy Excel data > Message-ID: > <CAAxdm-6VzxcLi4Mr0GUkwGe5EVA0-Gx03FRUey9Ej3CaJY4hag at mail.gmail.com> > Content-Type: text/plain; charset=ISO-8859-1 > > Unfortunately they only know how to use Excel and Word. They are not > folks who use a computer every day. Many of them run factories or > warehouses and asking them to use something like Access would not > happen in my lifetime (I have retired twice already). > > I don't have any problems with them "messing" up the data that I send > them; they are pretty good about making changes within the context of > the spreadsheet. The other issue is that I working with people in > twenty different locations spread across the US, so I might be able to > one of them to use Access (there is one I know that uses it), but that > leaves 19 other people I would not be able to communicate with.
Seconded John Kane Kingston ON Canada> -----Original Message----- > From: rolf.turner at xtra.co.nz > Sent: Sat, 03 Mar 2012 13:46:42 +1300 > To: 538280 at gmail.com > Subject: Re: [R] Cleaning up messy Excel data > > On 03/03/12 12:41, Greg Snow wrote: > > <SNIP> >> It is possible to do the right thing in >> Excel, but Excel does not encourage (let alone force) you to do the >> right thing, but makes it easy to do the wrong thing. > <SNIP> > > Fortune! > > cheers, > > Rolf Turner > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.____________________________________________________________ FREE ONLINE PHOTOSHARING - Share your photos online with your friends and family! Visit http://www.inbox.com/photosharing to find out more!