Markus Weisner
2012-Apr-22 18:34 UTC
[R] need advice on using excel to check data for import into R
I have created an S4 object type for conducting fire department data analysis. The object includes validity check that ensures certain fields are present and that duplicate records don't exist for certain combinations of columns (e.g. no duplicate incident number / incident data / unit ID ensures that the data does not show the same fire engine responding twice on the same call). I am finding that I spend a lot of time taking client data, converting it to my S4 object, and then sending it back to the client to correct data validity issues. I am trying to figure out a clever way to have excel (typically the program used by my clients) check client data prior to them submitting it to me. I have been working with somebody on trying to develop an excel toolbar add-in with limited success. My question is whether anybody can think of clever alternatives for clients to validate their data … for example, is their a R excel plugin (that would be easily installed by a client) where I might be able write some lines of R to check the data and output messages … or maybe some sort of server where they could upload their data and I could have some lines of R code that would check the code and send back potential error messages? I realize this is a fairly open ended question … just looking for some general ideas and directions to go. Getting a little frustrated with spending most of my work time dealing with data cleaning issues … guessing this is a problem shared by many of us that use R! Thanks, Markus [[alternative HTML version deleted]]
Richard M. Heiberger
2012-Apr-22 19:43 UTC
[R] need advice on using excel to check data for import into R
This looks like a perfect case for an RExcel solution. RExcel is an addin that allows you, among other things, to place an arbitrary R function inside the Excel automatic recalculation mode. For details see rcom.univie.ac.at There are many references item listed on the wiki page in the left panel. For further followup, please sign up for the rcom mailing list, again with the details on the web site. Rich On Sun, Apr 22, 2012 at 2:34 PM, Markus Weisner <r@themarkus.com> wrote:> I have created an S4 object type for conducting fire department data > analysis. The object includes validity check that ensures certain fields > are present and that duplicate records don't exist for certain combinations > of columns (e.g. no duplicate incident number / incident data / unit ID > ensures that the data does not show the same fire engine responding twice > on the same call). > > I am finding that I spend a lot of time taking client data, converting it > to my S4 object, and then sending it back to the client to correct data > validity issues. > > I am trying to figure out a clever way to have excel (typically the program > used by my clients) check client data prior to them submitting it to me. I > have been working with somebody on trying to develop an excel toolbar > add-in with limited success. > > My question is whether anybody can think of clever alternatives for clients > to validate their data … for example, is their a R excel plugin (that would > be easily installed by a client) where I might be able write some lines of > R to check the data and output messages … or maybe some sort of server > where they could upload their data and I could have some lines of R code > that would check the code and send back potential error messages? > > I realize this is a fairly open ended question … just looking for some > general ideas and directions to go. Getting a little frustrated with > spending most of my work time dealing with data cleaning issues … guessing > this is a problem shared by many of us that use R! > > Thanks, > Markus > > [[alternative HTML version deleted]] > > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html<http://www.r-project.org/posting-guide.html> > and provide commented, minimal, self-contained, reproducible code. > >[[alternative HTML version deleted]]
Greg Snow
2012-Apr-23 20:10 UTC
[R] need advice on using excel to check data for import into R
This is really a job for a database, and Excel is not a database (even though many think it is). I have some clients that I have convinced to create an Access database rather than use Excel (still MS product so it can't be that scary, right?). They were often a little reluctant at first because they would be using a new tool, and they actually had to think about the design of the database up front, but once they got to serious data entry they were very grateful for me directing them to Access over Excel. Databases have tools to validate data on entry so there will be fewer cases where you need to ask them for corrections (and it will be easier for them to fix any problems that do sneak through). On Sun, Apr 22, 2012 at 12:34 PM, Markus Weisner <r at themarkus.com> wrote:> I have created an S4 object type for conducting fire department data > analysis. ?The object includes validity check that ensures certain fields > are present and that duplicate records don't exist for certain combinations > of columns (e.g. no duplicate incident number / incident data / unit ID > ensures that the data does not show the same fire engine responding twice > on the same call). > > I am finding that I spend a lot of time taking client data, converting it > to my S4 object, and then sending it back to the client to correct data > validity issues. > > I am trying to figure out a clever way to have excel (typically the program > used by my clients) check client data prior to them submitting it to me. ?I > have been working with somebody on trying to develop an excel toolbar > add-in with limited success. > > My question is whether anybody can think of clever alternatives for clients > to validate their data ? for example, is their a R excel plugin (that would > be easily installed by a client) where I might be able write some lines of > R to check the data and output messages ? or maybe some sort of server > where they could upload their data and I could have some lines of R code > that would check the code and send back potential error messages? > > I realize this is a fairly open ended question ? just looking for some > general ideas and directions to go. Getting a little frustrated with > spending most of my work time dealing with data cleaning issues ? guessing > this is a problem shared by many of us that use R! > > Thanks, > Markus > > ? ? ? ?[[alternative HTML version deleted]] > > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >-- Gregory (Greg) L. Snow Ph.D. 538280 at gmail.com
Michael Bibo
2012-Apr-24 07:07 UTC
[R] need advice on using excel to check data for import into R
Markus Weisner <r <at> themarkus.com> writes:> My question is whether anybody can think of clever alternatives for clients > to validate their datafor example, is their a R excel plugin (that would> be easily installed by a client) where I might be able write some lines of > R to check the data and output messagesor maybe some sort of server> where they could upload their data and I could have some lines of R code > that would check the code and send back potential error messages? > > I realize this is a fairly open ended questionjust looking for some> general ideas and directions to go. Getting a little frustrated with > spending most of my work time dealing with data cleaning issuesguessing> this is a problem shared by many of us that use R! >For what it's worth, Excel does have some data validation tools built in. I refer people to "Disciplined Use of Spreadsheet Packages for Data Entry" available as a pdf from from the Statistical Services Centre at the University of Reading (http://www.reading.ac.uk/ssc/n/resources/DataEntryAndOrganisation.htm). Note that there is also a follow-up document "Role of a Database Package for Research Projects" for when "you are reaching the limits of Excel". Hope this helps, Michael Bibo Queensland Health
Seemingly Similar Threads
- dataframe selection using a multi-value key
- projecting GIS coordinates for analysis with spatstat package
- how to match exact phrase using gsub (or similar function)
- using match-type function to return correctly ordered data from a dataframe
- stumped on how to reorder factors