thr3ads.net - R help - [R] Ambiguities in vector [Sep 2007]

Birgit Lemcke

2007-Sep-20 07:49 UTC

[R] Ambiguities in vector

Hello all you helpful people out there!

I am still an R beginner, using R 2.5.1 on an Apple PowerBook G4 with Mac
OS X 10.4.10.

Perhaps my question in yesterday's mail was not clear, so
I will try to describe my problem in a different way.

Below are two tables. I would like to test the variables between the
tables, but for some variables in some species I have more than one
possibility.
Example: the variable is leaf form, coded by numbers. The species
Anth_crin can have the form round or elliptic, coded as 1 and 2,
so the entry is written as 12. But the two numbers should be treated
separately. Is there a way in R to have a class with
ambiguities? Or do I have to recode my variables in a different way
to handle this?
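
One possible recoding, sketched here under assumptions rather than taken from the thread: split each multi-state code into its digits and store one 0/1 indicator column per state. The vector `codes` and the set `states` are illustrative (the NA entry is made up to show missing values), and the approach assumes every state is a single digit, so it would not work if a state were coded as 10 or higher.

```r
# Illustrative codes for one variable; the NA entry is hypothetical
codes <- c(Anth_cap = "1", Anth_crin = "12", Anth_gram = "2",
           Aski_alt = "13", Aski_and = NA)
states <- c("1", "2", "3")   # assumed set of possible single-digit states

# Split each code into single characters, e.g. "12" -> "1", "2"
split_codes <- strsplit(codes, "")

# One row per species, one 0/1 column per state
indicators <- t(sapply(split_codes,
                       function(digits) as.integer(states %in% digits)))
colnames(indicators) <- states

# Keep missing codes missing instead of turning them into all zeros
indicators[is.na(codes), ] <- NA
indicators
```

With this layout the entry 12 for Anth_crin becomes a 1 in both the "1" and the "2" column, so the two states are treated separately.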

I really hope you understand now what I mean. If not, please ask me.

I would be very pleased if somebody could help me.

Here are the two tables:

MalTabChi
                X1   X4   X6   X8  X10  X14  X21  X24  X29  X38  X43  X50  X67  X76  X78  X80  X82  X84
Anth_cap         1    1    1    1    6    5    1   45   12    4   12    6   56    5    2    4    1    1
Anth_crin       12    1    1    2   76    5    1   45  256    2   25   56   56  345    2   23    1    2
Anth_eck        12    1   12    1    7    5   12    5   14   45    2   56    4    5    2   34    1   12
Anth_gram        2    1    1    1    6    5    1   25   25   23   25   45    5   45    2   23    1   12
Anth_insi        2    1    1    2   63    5    1    4    2    2    2   45   45   45   12    3    1   23
Anth_laxi       12    1    1   12    7   45    1    5  245   23    5   46   56  345    2   23    1   12
Anth_sing        1    1    2    1    7 2345    1    4  129 2345   12   46    4    5   23    2    1    1
Aski_albo_ari    3    1    1    2    6    5    2   46    2   34   15   34    5    5    3    4    1    1
Aski_alt        13    1    1    2    6    5    2    4    2    3   15   46    5    5    2   34    3    1
Aski_and         1    1    2    2    6    5    2    5 <NA> <NA>    1   36    4    5    3    3    3    1
Aski_capi        3    1    1    2   63    5    2    5    2    3    5   45    4    5    3    3    1    1
...

FemTabChi
                X1   X4   X6   X8  X10  X14  X21  X24  X29  X38  X43  X50  X67  X76  X78  X80  X82  X84
Anth_cap         1    1    1    2    4    4    1    5    6   14    2    6   56    5    2   23    1    1
Anth_crin        1    1    1    2   47    5    1   45    6    1    2   45    5   34    2    3    1   23
Anth_eck         1    1    1    2    4    5    1   45    6    1    2   56   46  345    2    3    1   12
Anth_gram        1    1    1    2 <NA>    5    2    5 <NA>    4 <NA>   56   56    5    2    3    1    1
Anth_insi        1    1    1    2    3    5    1    4   26   25    2    4    5    5    2    3    1  234
Anth_laxi        1    1    1    2   47    5    1   56    6   24    2    4    5  345   21   23    1    1
Anth_sing        1    1    1    2   47   24    1    2   24    2    2    4    4    5    2   12    1    1
Aski_albo_ari    2    1    1    2    4    5    2    4    2   34   15    4    5    5    3    4    1    1
Aski_alt        12    1    1    2    4    5    2    4   89    5   15   46    5    5    2    3    3    1
Aski_and         1    1    1    2    4    5    2    5  259    5    1   46   25    5    2   23    1    1
Aski_capi       23    1    1    2  234    5    2    5 2459  235    5   46    5   34    3    3    1    1
Aski_chart      12    1    1    2    4    5    2    4   29    2   15    4   25    5    2   34    3    1
Aski_deli        2    1    1    2 <NA>   45   12    4    6    2    5   46    5    3    3    4    1    1
...

Greetings

Birgit




Birgit Lemcke
Institut für Systematische Botanik
Zollikerstrasse 107
CH-8008 Zürich
Switzerland
Ph: +41 (0)44 634 8351
birgit.lemcke at systbot.uzh.ch


______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

bbolker

2007-Sep-20 18:24 UTC

[R] Ambiguities in vector

Birgit Lemcke wrote:
>
> Perhaps you haven't understood my question in the mail yesterday. So
> I will try to describe my problem in a different way
>
> You see the tables. I would like to test the variables between the
> tables.
>
> 
I'm afraid that even before we start to deal with the ambiguities
your question is not clear.   What do you want to know, and before
you sat down at the computer what statistical test did you intend
to use?  (For better or worse, most of the documentation of R
_assumes_ you know what you want to test and how you want to
do it.)   I'm supposing you want to do some kind of comparison
across communities (tables 1 and 2), but I don't know what kind.
Comparing a single cell of the table to another just asks if
the leaf form is the same in the two communities.  Do you just
want to ask if leaf forms of a given species are significantly different
in different communities?  I'm not sure what the null hypothesis
would be here.  What are the rows and columns?  Can we use
them to develop a hypothesis?

If you can say precisely what your question is and how you would
test it in the _absence_ of ambiguity (i.e., specify a statistical test --
you don't need to know how to run it in R, that's what the list is
actually for), then we can help you decide how to handle the
multiple coding problem.

  good luck
    Ben Bolker

Birgit Lemcke

2007-Oct-01 14:44 UTC

[R] Ambiguities in vector

Hello James,

first I have to thank you for your help, but there are some things I
don't understand yet.

I am not sure I understand what this example returns:

ratings <- data.frame(id = c(1,2,3,4), att1 = c(1,1,0,1),
                      att2 = c(1,0,0,1), att3 = c(0,1,1,1))
ratings

  id att1 att2 att3
1  1    1    1    0
2  2    1    0    1
3  3    0    0    1
4  4    1    1    1

tab <- crossprod(as.matrix(ratings[,-1]))
tab <- tab - diag(diag(tab))
tab

        att1 att2 att3
att1    0    2    2
att2    2    0    1
att3    2    1    0

As I understand it, this gives the number of times we find the same
value when comparing, for example, att1 and att2 across all ids. Is that right?

What does this line do: tab <- tab - diag(diag(tab))

And what does the original output of crossprod mean:

     att1 att2 att3
att1    3    2    2
att2    2    2    1
att3    2    1    3

I tried to do this with a part of my dataset

I used a table with 3 variables (only binary)
In the first part of the table I have the females (348 rows) and in  
the second part the males (also 348 rows).

Then I tried this:

CrossFemMal1_3<-crossprod(as.matrix(CrossFemMalVar1_3))

The output:
CrossFemMal1_3


   V1 V2 V3
V1 NA NA NA
V2 NA NA NA
V3 NA NA NA

There was one row of NAs in my dataset. I presume this is responsible
for the NA results? So how can I deal with the NAs here?

If I use two matrices (male and female), I get back, among other things,
the comparison of att1 male to att1 female. If I use the
percentage table output, I get for example 40%. Can I
then say that if the percentage is lower than 50%, the attributes are
significantly different?

Corresponding to your other suggestion:

sapply(c("1","2","3"), function(x)
ifelse(regexpr(x, FemV1) > 0, 1, 0))

It gives me this output

          1  2  3
   [1,]  1  0  0
   [2,]  1  0  0
   [3,]  1  0  0
   [4,]  1  0  0
   [5,]  1  0  0
   [6,]  1  0  0
   [7,]  1  0  0
   [8,]  1  0  0
   [9,]  0  1  0
      .     .   .   .
      .     .   .   .

I think I should now count the ones for 1, 2 and 3?

I tried to use table, but it only gives me the counts for 1 and 0:

table(FemV1Test)
FemV1Test
   0   1
657 387

How can I get the counts for every column?

And then do the same for MalV1 and compare the two somehow?

Thanks again in advance for your help.

Greetings

Birgit


On 29.09.2007 at 14:45, James Reilly wrote:
>
> Hi Birgit,
>
> The first argument to regexpr should be just one character value,  
> not a vector. Your call:
> regexpr(c("1","2","3"),FemV1)
> seems to have been interpreted as:
> regexpr("1",FemV1)
>
> I think you probably need something more like:
> sapply(c("1","2","3"), function(x)
ifelse(regexpr(x, FemV1) > 0, 1,
> 0))
> This will also work on multiple response data such as
> FemV1 <- c("13", "2", "13",
"123", "1", "23")
> Then colSums will give you frequency counts for each attribute.
>
> I think you would need greatly simplify the multiple response data  
> to apply anything like a paired t-test. Have you considered just  
> crosstabulating the attributes of male plants versus the females?  
> For some R code, see
> https://stat.ethz.ch/pipermail/r-help/2007-February/126125.html
>
> Regards,
> James
>
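
The dummy-variable step and the colSums count described in the message above can be sketched as follows. This is a minimal illustration, not James's exact code: it uses `grepl(..., fixed = TRUE)`, which is available in current R versions, in place of `regexpr(...) > 0`, and it reuses the example values of `FemV1` from the message. The name `FemV1Test` follows the thread; `states` is introduced here for illustration.

```r
FemV1 <- c("13", "2", "13", "123", "1", "23")   # example values from the message
states <- c("1", "2", "3")

# One 0/1 column per state: 1 if the state's digit occurs in the code
FemV1Test <- sapply(states,
                    function(s) as.integer(grepl(s, FemV1, fixed = TRUE)))

# Frequency of each state, counted per column
colSums(FemV1Test)

# table() on the whole matrix pools all cells, so it only reports 0s and 1s
table(FemV1Test)
```

Because `table()` treats the matrix as one long vector, it cannot separate the columns, whereas `colSums()` counts the 1s in each column separately.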
>
> On 29/9/07 3:37 AM, Birgit Lemcke wrote:
>> Hello James,
>> sorry that I have to ask you a second time, but I don't understand
>> what regexpr() is doing and how the syntax works.
>> I have a vector that I converted to character strings:
>> as.character(FemV1)
>>  [1] "1"  "1"  "1"  "1"  "1"  "1"  "1"  "1"  "2"
>> After that I did this, without really knowing what I was doing:
>> regexpr(c("1","2","3"),FemV1)
>> The output looked like this:
>>  [1]  1  1  1  1  1  1  1  1 -1
>> As I understand it, the function looks for 1, 2 or 3 in this case. If
>> there is a match it returns 1; if not, it returns -1.
>> But I don't know how this helps me now, so I hope you can explain it to me.
>> And there is another problem. For the continuous variables I
>> used a paired t-test; can I also perform this approach in a paired way?
>> Thanks a lot in advance.
>> Greetings
>> Birgit
>> On 21.09.2007 at 11:38, James Reilly wrote:
>>>
>>> If I understand you right, you have several multiple response  
>>> variables (with the responses encoded in numeric strings) and you  
>>> want to see whether these are associated with sex. To tabulate  
>>> the data, I would convert your variables into collections of  
>>> dummy variables using regexpr(), then use table(). You can use a  
>>> modified chi-squared test with a Rao-Scott correction on the  
>>> resulting tables; see Thomas and Decady (2004). Bootstrapping is  
>>> another possible approach.
>>>
>>> @article{,
>>> Author = {Thomas, D. Roland and Decady, Yves J.},
>>> Journal = {International Journal of Testing},
>>> Number = {1},
>>> Pages = {43--59},
>>> Title = {Testing for Association Using Multiple Response Survey Data: Approximate Procedures Based on the Rao-Scott Approach},
>>> Volume = {4},
>>> Year = {2004},
>>> Url = {http://search.ebscohost.com/login.aspx?direct=true&db=pbh&AN=13663214&site=ehost-live}
>>> }
>>>
>>> Hope this helps,
>>> James
>>> -- 
>>> James Reilly
>>> Department of Statistics, University of Auckland
>>> Private Bag 92019, Auckland, New Zealand
>>>
>>> On 21/9/07 7:14 AM, Birgit Lemcke wrote:
>>>> First, thanks for your answer.
>>>> Now I will try to explain better:
>>>> I have species in the rows and morphological attributes in the
>>>> columns, coded by numbers (qualitative variables; nominal and
>>>> ordinal).
>>>> One table holds the male plants of every species and the
>>>> other table the female plants of every species. The
>>>> variables contain every possible occurrence in this species and
>>>> this gender.
>>>> I would like to compare every variable between male and female
>>>> plants, for example using a chi-square test.
>>>> The null hypothesis could be: variable male is equal to variable
>>>> female.
>>>> The underlying question is whether male and female plants in this
>>>> species are significantly different, and which attributes are
>>>> responsible for this difference.
>>>> I really hope that this is easier to understand. If not, please
>>>> ask.
>>>> Thanks a million in advance.
>>>> Greetings
>>>> Birgit
>>>

Birgit Lemcke

2007-Oct-08 13:35 UTC

[R] Ambiguities in vector

Hello James,

all of your suggestions work very well except for this:

FemMal <- cbind(FemV1gezählt[2,], MalV1gezählt[2,])

colnames(FemMal) <- ("Females", "Males")
Error: syntax error

FemMal

  [,1] [,2]
1  133   79
2  203  237
3   51   76

But it works if I do that:

Namen<-c("Female","Male")
colnames(FemMal) <- (Namen)

FemMal

   Female Male
1    133   79
2    203  237
3     51   76
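
The likely reason for the syntax error is that plain parentheses in R can only wrap a single expression, so ("Females", "Males") is not a valid way to write a vector; c() is what builds one. The workaround above succeeds because Namen is created with c(). A minimal sketch, using the counts shown in the output above (the row names are added here only to mirror that output):

```r
# Counts copied from the FemMal output shown above
FemMal <- cbind(c(133, 203, 51), c(79, 237, 76))
rownames(FemMal) <- c("1", "2", "3")

# c() creates the character vector of column names in one step
colnames(FemMal) <- c("Females", "Males")
FemMal
```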

Greetings

Birgit



On 04.10.2007 at 17:19, James Reilly wrote:
>
> Hi Birgit,
>
> First, can I suggest that you don't copy off-list conversations to  
> the mailing list partway through? Not that I minded in this case,  
> but it probably confuses people and the posting guide warns against  
> it.
>
> I'll address your questions in reverse order.
>
> To get tables for each column, try:
> apply(FemV1Test, 2, table)
>
> Likewise for males:
> apply(MalV1, 2, table)
>
> To compare them, perhaps put them side by side:
> FemMal <- cbind(apply(FemV1Test, 2, table)[2,], apply(MalV1, 2,  
> table)[2,])
> colnames(FemMal) <- ("Females", "Males")
> FemMal
>
> You can then do arithmetic, plot them, sort by the difference, etc.
> plot(FemMal)
> FemMal[order(FemMal[,1]-FemMal[,2]),]
>
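
One caveat with the apply(..., 2, table) approach above, sketched with made-up data: if some column contains only 0s (or only 1s), the per-column tables have different lengths, apply() returns a list instead of a matrix, and the [2,] indexing no longer works. Counting the 1s with colSums() avoids this. The two small matrices below are hypothetical stand-ins for FemV1Test and MalV1.

```r
# Hypothetical 0/1 indicator matrices (rows = species, columns = states)
FemV1Test <- cbind("1" = c(1, 1, 0, 1), "2" = c(0, 1, 1, 0), "3" = c(0, 0, 0, 0))
MalV1     <- cbind("1" = c(1, 0, 0, 1), "2" = c(1, 1, 1, 0), "3" = c(0, 1, 0, 0))

# State "3" never occurs among the females, so the tables differ in length
# and apply() falls back to returning a list
is.list(apply(FemV1Test, 2, table))

# colSums() counts the 1s per column whether or not a state occurs
FemMal <- cbind(Females = colSums(FemV1Test), Males = colSums(MalV1))
FemMal

# Sort the states by the difference between females and males
FemMal[order(FemMal[, "Females"] - FemMal[, "Males"]), ]
```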
> About crossprod, cell (i,j) in the resulting matrix shows the  
> number of cases with a 1 for attribute i  and attribute j. This  
> shows which attributes overlap most and least.
>
> The command "tab <- tab - diag(diag(tab))" puts zeroes down the
> diagonal, as was requested. One cosmetic reason for doing this is
> that the diagonal elements are often much larger than the
> off-diagonal ones, and zeroing them makes the table easier to read or
> display graphically. E.g.
> http://pbil.univ-lyon1.fr/ADE-4/ade4-html/table.dist.html
>
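
The crossprod explanation above can be checked by hand on the ratings example from earlier in the thread. The sketch below only restates that example; the helper name `m` is introduced here for illustration.

```r
ratings <- data.frame(id = c(1, 2, 3, 4), att1 = c(1, 1, 0, 1),
                      att2 = c(1, 0, 0, 1), att3 = c(0, 1, 1, 1))
m <- as.matrix(ratings[, -1])

tab <- crossprod(m)                        # same as t(m) %*% m
tab["att1", "att2"]                        # cases with a 1 in both att1 and att2
sum(m[, "att1"] == 1 & m[, "att2"] == 1)   # the same count, done by hand

diag(tab)                                  # 1s per attribute, equal to colSums(m)
tab - diag(diag(tab))                      # same table with a zeroed diagonal
```

Note that only joint 1s are counted: a case where both attributes are 0 does not add to the cell, so the table measures co-occurrence rather than agreement.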
> Yes, any row with all NAs will make the crossprod all NAs too. You  
> can ignore any rows with NAs as follows:
> CrossFemMal1_3 <- crossprod(as.matrix(CrossFemMalVar1_3[
>     apply(CrossFemMalVar1_3, 1, function(x) !any(is.na(x))), ]))
>
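
An alternative way to drop the incomplete rows, offered as a sketch rather than as what was used in the thread, is complete.cases(), which returns TRUE for every row without an NA. The small data frame below is a hypothetical stand-in for CrossFemMalVar1_3, and `keep` is a name introduced here.

```r
# Hypothetical stand-in for CrossFemMalVar1_3, with one all-NA row
CrossFemMalVar1_3 <- data.frame(V1 = c(1, 0, NA, 1),
                                V2 = c(1, 1, NA, 0),
                                V3 = c(0, 1, NA, 1))

# TRUE for rows that contain no NA at all
keep <- complete.cases(CrossFemMalVar1_3)

# crossprod on the complete rows only
CrossFemMal1_3 <- crossprod(as.matrix(CrossFemMalVar1_3[keep, ]))
CrossFemMal1_3
```

Whichever form is used, it is worth checking how many rows are dropped (here `sum(!keep)`), since each removed row also removes one species from the comparison.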
> I'm not sure if I follow why you want to know about statistical  
> significance here. Do you really think of the species in your study  
> as a sample from a larger population of plant species, which you  
> are trying to generalise about?
>
> If so, is the population much larger than your sample? And was your  
> sample of species selected randomly, i.e. with equal selection  
> probabilities? If not, standard tests probably won't apply.
>
> Regards,
> James
>
>
> -- 
> James Reilly
> Department of Statistics, University of Auckland
> Private Bag 92019, Auckland, New Zealand