Tania Oh
2008-Apr-08 15:24 UTC
[R] how to check if a variable is preferentially present in a sample
Dear All,

I do apologise if this question is out of place for this list, but I've searched the mailing lists and read "Introductory Statistics with R" by Peter Dalgaard and couldn't find any hints on solving the question below.

I have a data frame (d) of values, which I rank in decreasing order of "val". Each value belongs to one of the groups 'A', 'B', 'C', 'D', or 'E'. I then take the first 10 entries of data frame 'd' and count the number of occurrences of each group. I want to test whether certain groups occur in these first 10 entries more frequently than expected by chance. Would a chi-square test or a hypergeometric test be more suitable? If neither, what would be an alternative solution in R?

Below is my data:

## data
L5 <- LETTERS[1:5]
d <- data.frame(cbind(val = rnorm(1:10)^2, group = sample(L5, 100, replace = TRUE)))

str(d)
##'data.frame': 100 obs. of 2 variables:
##$ val  : Factor w/ 10 levels "0.000169268449333046",..: 10 3 5 6 1 2 7 8 4 9 ...
##$ group: Factor w/ 5 levels "A","B","C","D",..: 4 4 4 5 3 1 5 2 1 2 ...

Many thanks in advance and apologies again,
tania

D.Phil. student
Department of Physiology, Anatomy and Genetics
University of Oxford
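Since the question asks specifically about a hypergeometric test, here is a minimal sketch of that option (it is not part of the original thread). For each group it computes the one-sided probability of seeing at least the observed number of that group's members among the top k rows, given how many members the group has in the whole data frame, using `phyper()` from base R. It assumes `val` has already been converted to numeric (with the data as generated above it is a factor); the function name `enrichment_test()` is hypothetical.

```r
# Hypothetical helper: one-sided hypergeometric (over-representation) test
# for each group among the top-k rows ranked by decreasing "val".
# Assumes d has a numeric "val" column and a "group" column.
enrichment_test <- function(d, k = 10) {
  top <- d[order(d$val, decreasing = TRUE), ][seq_len(k), ]
  groups <- sort(unique(as.character(d$group)))
  N <- nrow(d)
  res <- data.frame(group = groups, in_top = 0L, in_all = 0L,
                    p.value = NA_real_)
  for (i in seq_along(groups)) {
    g <- groups[i]
    m <- sum(d$group == g)    # members of group g in the whole data frame
    x <- sum(top$group == g)  # members of group g among the top k rows
    # P(X >= x) for X ~ Hypergeometric(m, N - m, k)
    res$p.value[i] <- phyper(x - 1, m, N - m, k, lower.tail = FALSE)
    res$in_top[i] <- x
    res$in_all[i] <- m
  }
  res
}

# Example with freshly simulated numeric data
set.seed(1)
d <- data.frame(val = rnorm(100)^2,
                group = sample(LETTERS[1:5], 100, replace = TRUE))
res <- enrichment_test(d, k = 10)
res
```

Because five groups are tested at once, the p-values could be adjusted, e.g. with `p.adjust(res$p.value, method = "BH")`. Note also that `rnorm(1:10)` draws only 10 values, which `cbind()` recycles to fill 100 rows, so in the original example data the top 10 rows all share the same `val` and their selection among ties is arbitrary; real data would not have this artifact.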
Jorge Velez
2008-Apr-08 20:56 UTC
[R] how to check if a variable is preferentially present in a sample
Hi Tania,

I think a chi-squared test could work. I tried a solution on your data set using a chi-squared approach. Here is what I got:

# ----------------
# Data set
set.seed(123)
d <- data.frame(cbind(val = rnorm(1:10)^2, group = sample(LETTERS[1:5], 100, replace = TRUE)))
# cbind() coerces everything to character, so "val" ends up as a factor
# (or character); convert it back to numeric
d[, "val"] <- as.numeric(as.character(d$val))

# Rank "d" in decreasing order of "val" and count the number of
# observations in each group among the first 10 entries
TABLE <- table(d[order(d$val, decreasing = TRUE), ][1:10, "group"])
TABLE
A B C D E
3 2 3 1 1

# Chi-squared test
cht <- chisq.test(TABLE)
cht

        Chi-squared test for given probabilities

data:  TABLE
X-squared = 2, df = 4, p-value = 0.7358

cht$p.value
[1] 0.7357589

Hope this helps,

Jorge
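A possible refinement of the chi-squared approach above, offered as a sketch rather than as part of the original reply: `chisq.test(TABLE)` compares the top-10 counts with equal proportions (1/5 per group), whereas the question is whether groups are over-represented relative to their share of the whole data frame. The `p` argument of `chisq.test()` accepts those overall proportions, and because the expected counts for 10 observations are below 5 (run as above, `chisq.test()` would normally warn that the approximation may be incorrect), `simulate.p.value = TRUE` gives a Monte Carlo p-value instead. The function name `top_k_chisq()` is hypothetical, and it assumes a numeric `val` column.

```r
# Hypothetical helper: goodness-of-fit test of the top-k group counts
# against the group proportions in the whole data frame.
# Assumes d has a numeric "val" column and a "group" column.
top_k_chisq <- function(d, k = 10, B = 2000) {
  d$group <- factor(d$group)                 # keep all levels, even empty ones
  top <- d[order(d$val, decreasing = TRUE), ][seq_len(k), ]
  observed <- table(top$group)               # counts among the top k rows
  expected_p <- prop.table(table(d$group))   # overall group proportions
  # Small expected counts, so use a simulated (Monte Carlo) p-value
  chisq.test(observed, p = expected_p, simulate.p.value = TRUE, B = B)
}

# Example with freshly simulated numeric data
set.seed(123)
d <- data.frame(val = rnorm(100)^2,
                group = sample(LETTERS[1:5], 100, replace = TRUE))
top_k_chisq(d, k = 10)
```

Like the original test, this is an omnibus test: a small p-value says the composition of the top k rows differs from the overall one, not which group is responsible; per-group tests would be needed for that.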