Regarding unusual combinations of factors in categorical data.
Are there any R packages that can be used to identify outliers, i.e. unusual combinations, in categorical datasets?

Thanks.

===============================================================================
Notice of Confidentiality

This transmission contains information that may be confidential and that may also be privileged. Unless you are the intended recipient of the message (or authorised to receive it for the intended recipient) you may not copy, forward, or otherwise use it, or disclose it or its contents to anyone else. If you have received this transmission in error please notify us immediately and delete it from your system.

RSA Insurance Group plc. Registered in England No. 2339826. The Registered Office is 9th Floor, One Plantation Place, 30 Fenchurch Street, London EC3M 3BD
===============================================================================
Hi,

On Mon, Nov 8, 2010 at 2:25 PM, Alan Chalk <Alan.Chalk at gcc.rsagroup.com> wrote:
> Regarding unusual combinations of factors in categorical data.

Where all variables are categorical?

> Are there any R packages that can be used to identify the outliers i.e.
> unusual combinations in categorical datasets ?

What counts as an "outlier" or "unusual" tends to vary: something unusual in one data set may not be unusual in another. If you are dealing with strictly categorical variables, I am not certain how you would define an outlier. The categories only have the meaning attached to them, so it seems they would only indicate outliers if you decided that an entire category was an outlier (e.g., males, females, half-man-half-ox).

If you have one continuous variable in mind, observed at different levels of a factor, then you could just use some simple plots (e.g., ggplot() + geom_point() + facet_grid(factor ~ .) or something similar). You could also z-score the values within each factor level and then extract z-scores more extreme than +/- 3, or whatever cutoff you like.

It might be easier to give you feedback if you have a more specific example.

Cheers,

Josh

> Thanks.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
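A minimal sketch of the z-score suggestion in the reply above, using base R only. The data frame `dat`, its columns `g` and `y`, and the planted extreme value are all made up for illustration; the cutoff of 3 is the one mentioned in the reply and can be changed freely.

```r
# Hypothetical data: a continuous variable y observed within levels of a factor g.
set.seed(1)
dat <- data.frame(
  g = factor(rep(c("a", "b", "c"), each = 50)),
  y = c(rnorm(50, 10, 1), rnorm(50, 20, 2), rnorm(50, 30, 3))
)

# Plant one clearly extreme value in level "b" so there is something to find.
dat$y[60] <- 60

# Standardise y within each level of g.
# ave() applies FUN per group and returns a vector aligned with the rows of dat.
dat$z <- ave(dat$y, dat$g, FUN = function(v) (v - mean(v)) / sd(v))

# Keep the rows whose within-level z-score is more extreme than +/- 3.
cutoff <- 3
flagged <- dat[abs(dat$z) > cutoff, ]
print(flagged)
```

Standardising within each level matters here: the planted value of 60 is extreme for level "b" but would be judged against a much wider spread if all three levels were pooled.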
Perhaps just use the ftable function to generate a flat contingency table and look for counts below some threshold.

Michael

On 9 November 2010 09:25, Alan Chalk <Alan.Chalk at gcc.rsagroup.com> wrote:
> Regarding unusual combinations of factors in categorical data.
> Are there any R packages that can be used to identify the outliers i.e.
> unusual combinations in categorical datasets ?
>
> Thanks.
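The contingency-table idea above could look roughly like this. It is only a sketch: the factors `sex`, `smoker` and `region`, the cell counts, and the threshold of 5 are invented for illustration.

```r
# Hypothetical categorical data, built from a grid of cells and made-up counts.
cells <- expand.grid(
  sex    = c("F", "M"),
  smoker = c("no", "yes"),
  region = c("north", "south")
)
cells$n <- c(30, 25, 12, 1, 28, 31, 2, 15)

# Expand to one row per observation, as raw data would usually arrive.
dat <- cells[rep(seq_len(nrow(cells)), cells$n), c("sex", "smoker", "region")]

# Cross-tabulate all factors and print the flat contingency table for inspection.
tab <- table(dat)
print(ftable(tab))

# Turn the table into one row per combination and keep the rare ones.
# Combinations that never occur appear here with a count of 0.
counts <- as.data.frame(tab, responseName = "n")
threshold <- 5
rare <- counts[counts$n < threshold, ]
print(rare)
```

A raw-count threshold ignores how common each individual level is, so a rare combination of two rare levels is flagged just like a surprising one.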
On 11/09/2010 09:25 AM, Alan Chalk wrote:
> Regarding unusual combinations of factors in categorical data.
> Are there any R packages that can be used to identify the outliers i.e.
> unusual combinations in categorical datasets ?
>
Hi Alan,
If your factors are dichotomous and you are looking for common patterns of intersection, try intersectDiagram (in the plotrix package).

Jim
On 11/8/2010 5:25 PM, Alan Chalk wrote:
> Regarding unusual combinations of factors in categorical data.
> Are there any R packages that can be used to identify the outliers i.e.
> unusual combinations in categorical datasets ?

"Unusual combinations" of factors are those that have large residuals in some loglinear model (or, equivalently, a glm with a Poisson family and log link): positive if the observed frequency is greater than expected, negative otherwise. The most basic 'null' loglinear model is that of mutual independence. However, if some of the factors are predictors, it makes sense to include their highest-order interaction in the null model.

Fit the model with loglm() (in MASS) or glm(), and use vcd::mosaic() to visualize the outliers.

HTH

-- 
Michael Friendly     Email: friendly AT yorku DOT ca
Professor, Psychology Dept.
York University      Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele Street    Web:   http://www.datavis.ca
Toronto, ONT  M3J 1P3 CANADA
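As a rough illustration of the residual-based approach described above, here is a sketch that fits the mutual-independence model with glm() and ranks cells by Pearson residual. The factors, the counts, and the choice of Pearson residuals are assumptions made for the example; one cell is deliberately inflated so that it stands out.

```r
# Hypothetical 2 x 2 x 2 table of counts, one row per combination of levels.
cells <- expand.grid(
  sex    = c("F", "M"),
  smoker = c("no", "yes"),
  region = c("north", "south")
)
# All cells hold 20 observations except the first (F, no, north), set to 60.
cells$n <- c(60, 20, 20, 20, 20, 20, 20, 20)

# Null model of mutual independence: main effects only, Poisson family (log link).
fit <- glm(n ~ sex + smoker + region, family = poisson, data = cells)

# Expected counts under independence and Pearson residuals,
# (observed - expected) / sqrt(expected).
cells$expected <- fitted(fit)
cells$pearson  <- residuals(fit, type = "pearson")

# Rank the combinations from most to least unusual under this null model.
ranked <- cells[order(-abs(cells$pearson)), ]
print(ranked)

# Not run: if the vcd package is installed, a shaded mosaic plot shows the same
# residuals graphically, e.g.
#   vcd::mosaic(xtabs(n ~ sex + smoker + region, data = cells), shade = TRUE)
```

A positive residual marks a combination seen more often than independence predicts, a negative one a combination seen less often. If some factors are predictors, their interaction can be added to the formula (for example `sex * smoker + region`) so that only departures from that richer null model are flagged.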