Mike Marchywka
2011-Feb-01 15:10 UTC
[R] general question on approaches to getting data from data providers
My question, buried in this rant, is: "Is there a mailing list or other means of identifying sites whose information is likely to be important to many R users but whose data is difficult to obtain because of the site's choice of technology?"

Quite often, people here ask questions about scraping HTML to get various types of "public" information (public being a bit debatable when the information is buried in formatting junk). In at least one case, I think it was something financial, I noted that R has packages with large components dedicated to scraping data from both gov and com sources, but there is no indication that they are working with cooperative groups on the other side of the information fence.

This morning, I tried to contact the census.gov webmaster after noting that all their data is in xls when in fact csv would probably be more appropriate for the data they have. I can open csv easily in notepad LOL. Then of course they point you to a certain company that makes a product to read this stuff.

Is there a different list or general community that has a charter for discussing ways to get computer-readable data from "data" providers? There are many websites that create other things, like fancy PDF graphics, that obliterate data or try to lock you into one commercial, proprietary, or limited tool chain for data analysis.

Thanks.