thr3ads.net - R help - [R] how to check if a variable is preferentially present in a sample [Apr 2008]


Tania Oh

2008-Apr-08 15:24 UTC

[R] how to check if a variable is preferentially present in a sample

Dear All,

I do apologise if this question is out of place for this list but I've  
tried searching mailing lists and read "Introductory Statistics with  
R" by Peter Dalgaard, but couldn't find any hints on solving my  
question below:

I have a data frame (d) of values which I will rank in decreasing
order of "val". Each value belongs to a group, either 'A', 'B', 'C',
'D', or 'E'. I then take the first 10 entries in data frame 'd' and
count the number of occurrences for each of the groups. I want to
test if certain groups occur more frequently than by chance in my
first 10 entries. Would a chi-square test or a hypergeometric test be
more suitable? If neither, what would be an alternative solution in
R? Below is my data:


## data
L5 <- LETTERS[1:5]
d <- data.frame(cbind(val= rnorm(1:10)^2, group=sample(L5,100,  
repl=TRUE)))

str(d)
##'data.frame':	100 obs. of  2 variables:
##$ val  : Factor w/ 10 levels "0.000169268449333046",..: 10 3 5 6 1 2 7 8 4 9 ...
##$ group: Factor w/ 5 levels "A","B","C","D",..: 4 4 4 5 3 1 5 2 1 2 ...
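
The str() output above shows "val" stored as a factor with only 10 levels. Two things contribute: rnorm(1:10) returns just 10 values, which are recycled across the 100 rows, and cbind() coerces both columns to character before data.frame() sees them. A minimal sketch of one way to avoid both, assuming 100 independent values were intended (the seed is arbitrary and only there for reproducibility):

```r
# Sketch: build the data frame without cbind(), so "val" stays numeric.
# Assumption: 100 independent squared normal draws were intended.
set.seed(1)  # arbitrary seed, not from the original post
L5 <- LETTERS[1:5]
d <- data.frame(
  val   = rnorm(100)^2,
  # explicit factor levels so that empty groups still show up in table()
  group = factor(sample(L5, 100, replace = TRUE), levels = L5)
)
str(d)
```

With this construction "val" can be ordered directly, without converting from factor to character to numeric first.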


Many thanks in advance and apologies again,
tania

D. phil student
Department of Physiology, Anatomy and Genetics
University of Oxford

Jorge Velez

2008-Apr-08 20:56 UTC


Hi Tania,

I think a chi-squared test could be suitable. I tried a solution based
on your data set using that approach. Here is what I got:

# ----------------
# Data set
set.seed(123)
d <- data.frame(cbind(val=rnorm(1:10)^2,
group=sample(LETTERS[1:5],100,repl=TRUE)))
d[,"val"]<-as.numeric(as.character(d$val))

# Ranking "d" in decreasing order based on "val" and counting the number of
# observations in each group
TABLE=table(d[order(d$val,decreasing=TRUE),][1:10,"group"])
TABLE

A B C D E
3 2 3 1 1

# Chi-squared
cht=chisq.test(TABLE)
cht

Chi-squared test for given probabilities

data:  TABLE
X-squared = 2, df = 4, p-value = 0.7358

cht$p.value
[1] 0.7357589
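
The question also asked about a hypergeometric test. As a rough sketch, not part of the original reply: taking the top 10 rows is a draw without replacement from the 100 rows, so if ranking is unrelated to group, the count of a given group in the top 10 follows a hypergeometric distribution. The helper name top_enrichment, the rebuilt numeric data frame, and the Holm adjustment are assumptions made for illustration. Note also that chisq.test(TABLE) compares the counts against equal group probabilities rather than the group frequencies in the full data, and that with expected counts of about 2 the chi-squared approximation is weak, so the last call below uses the overall proportions and a simulated p-value instead.

```r
# Illustrative data (assumption: numeric "val", 100 independent draws)
set.seed(123)
d <- data.frame(
  val   = rnorm(100)^2,
  group = factor(sample(LETTERS[1:5], 100, replace = TRUE),
                 levels = LETTERS[1:5])
)

# Hypothetical helper: one-sided hypergeometric p-value per group for
# over-representation among the k rows with the largest "val".
top_enrichment <- function(d, k = 10) {
  top   <- d[order(d$val, decreasing = TRUE), ][seq_len(k), ]
  total <- as.vector(table(d$group))    # group sizes in the full data
  hits  <- as.vector(table(top$group))  # group counts in the top k
  # P(X >= hits), X ~ Hypergeometric(m = total, n = N - total, k draws)
  p <- phyper(hits - 1, total, nrow(d) - total, k, lower.tail = FALSE)
  data.frame(group   = levels(d$group),
             in_top  = hits,
             in_all  = total,
             p_value = p,
             p_adj   = p.adjust(p, method = "holm"))  # five tests at once
}

res <- top_enrichment(d)
res

# Global alternative: goodness of fit against the overall group
# proportions, with a Monte Carlo p-value because expected counts are small.
top10 <- d[order(d$val, decreasing = TRUE), ][1:10, "group"]
gof <- chisq.test(table(top10),
                  p = as.vector(prop.table(table(d$group))),
                  simulate.p.value = TRUE, B = 10000)
gof$p.value
```

The per-group test answers "is this particular group over-represented?", while the goodness-of-fit test asks whether the top-10 composition as a whole departs from the overall composition.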


Hope this helps,


Jorge


