(Sorry for the slightly off-topic post.)

I'm giving a talk on data mining to some non-statisticians. They are all postgrad students, a mixture of Science and Commerce majors.

My intention is to show them the importance of statistics when doing data mining. I'm thinking of using two datasets, if possible: one from a scientific area and one that is commercially related. Ideally, at least one of the datasets, in its raw form, would violate some basic statistical assumptions, which would show why basic statistical knowledge is important. I also hope to introduce them to R, since many of them haven't heard of it yet.

Does anyone have, or know where I can get, such data? It doesn't have to be huge.

Thanks!

Kevin

--
Ko-Kang Kevin Wang
PhD Student
Centre for Mathematics and its Applications
Mathematical Sciences Institute (MSI)
Australian National University
Canberra, ACT 0200, Australia
Homepage: http://wwwmaths.anu.edu.au/~wangk/
Ph (W): +61-2-6125-2431
Ph (H): +61-2-6125-7407
Ph (M): +61-40-451-8301
Kevin Wang wrote:
> (Sorry for the slightly off topic post)
>
> I'm giving a talk (on data mining) to some non-statisticians (who're
> all postgrad students, but a mixture of Science and Commerce majors).
>
> My intention is to show them the importance of statistics when doing
> data mining. What I'm thinking of doing is using, hopefully, two
> datasets. One from scientific area and another that is
> commercially-related. However, it would be nice if the datasets (or
> at least one of them) will violate some kind of basic statistical
> assumptions (in its raw form anyway) -- hence showing having a basic
> statistical knowledge is important. Also hopefully, I can introduce R
> to them (since many of them haven't heard of it yet).
>
> Does anyone have (or know where I can get) such data? It doesn't have
> to be huge,.....
>
> Thanks!
>
> Kevin

The titanic3 dataset on our web site may fit the bill, although the response variable is binary. Issue loadUrl('http://biostat.mc.vanderbilt.edu/twiki/pub/Main/DataSets/titanic3.sav') to load() it.

A naive analysis would violate two assumptions: that age and passenger class act additively and, perhaps, that the effect of age is linear. At least it is a dataset that everyone already understands.

--
Frank E Harrell Jr
Professor and Chair, Department of Biostatistics
School of Medicine, Vanderbilt University
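
For anyone preparing a similar demonstration, here is a minimal sketch of how the two assumptions mentioned above (additivity of age and passenger class, linearity of age) might be checked in R. It does not use titanic3: the data are simulated, and the variable names (survived, age, pclass) and all coefficients are made up for illustration. The simulated model violates both assumptions by construction, so treat the result as a template rather than a finding about the real data.

```r
library(splines)  # ships with R; provides ns() for natural cubic splines

# Simulated stand-in data. These are NOT the titanic3 values.
set.seed(1)
n <- 1000
age <- runif(n, 1, 70)
pclass <- factor(sample(c("1st", "2nd", "3rd"), n, replace = TRUE))

# Hypothetical true model: the age slope differs by class (non-additive)
# and the age effect is curved (non-linear).
slope <- c("1st" = -0.01, "2nd" = -0.04, "3rd" = -0.07)[as.character(pclass)]
eta <- 1.5 + slope * age + 0.0008 * (age - 35)^2 - 0.8 * (pclass == "3rd")
survived <- rbinom(n, 1, plogis(eta))
d <- data.frame(survived, age, pclass)

# Naive fit: age enters linearly and additively with class.
fit_add <- glm(survived ~ age + pclass, family = binomial, data = d)

# Relax additivity: allow a separate age slope in each class.
fit_int <- glm(survived ~ age * pclass, family = binomial, data = d)

# Relax linearity: replace the straight line in age with a spline.
fit_spl <- glm(survived ~ ns(age, df = 3) + pclass, family = binomial, data = d)

# Likelihood-ratio tests of the naive model against each relaxation.
print(anova(fit_add, fit_int, test = "Chisq"))
print(anova(fit_add, fit_spl, test = "Chisq"))
```

A small p-value in the first comparison suggests the additive model is inadequate; in the second, that a straight line in age is inadequate. With the real dataset, the same three glm() calls would apply once the data frame is loaded and its column names are substituted.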