Hello all you helpful people out there!

I am still an R beginner, using R 2.5.1 on an Apple PowerBook G4 with Mac OS X 10.4.10.

Perhaps you haven't understood my question in yesterday's mail, so I will try to describe my problem in a different way.

You see the tables below. I would like to test the variables between the tables. But for some variables in some species, I have more than one possibility. Example: the variable is leaf form, coded by numbers. The species Anth_crin could have the form round or elliptic, coded as 1 and 2, so the entry is written as 12. But the two numbers should be treated separately.

Is there a possibility in R to have a class with ambiguities? Or do I have to recode my variables in a different way to handle this?

I really hope you understand now what I mean. If not, please ask me. I would be very pleased if somebody could help me.

Here are the two tables:

MalTabChi
                X1   X4   X6   X8  X10  X14  X21  X24  X29  X38  X43  X50  X67  X76  X78  X80  X82  X84
Anth_cap         1    1    1    1    6    5    1   45   12    4   12    6   56    5    2    4    1    1
Anth_crin       12    1    1    2   76    5    1   45  256    2   25   56   56  345    2   23    1    2
Anth_eck        12    1   12    1    7    5   12    5   14   45    2   56    4    5    2   34    1   12
Anth_gram        2    1    1    1    6    5    1   25   25   23   25   45    5   45    2   23    1   12
Anth_insi        2    1    1    2   63    5    1    4    2    2    2   45   45   45   12    3    1   23
Anth_laxi       12    1    1   12    7   45    1    5  245   23    5   46   56  345    2   23    1   12
Anth_sing        1    1    2    1    7 2345    1    4  129 2345   12   46    4    5   23    2    1    1
Aski_albo_ari    3    1    1    2    6    5    2   46    2   34   15   34    5    5    3    4    1    1
Aski_alt        13    1    1    2    6    5    2    4    2    3   15   46    5    5    2   34    3    1
Aski_and         1    1    2    2    6    5    2    5 <NA> <NA>    1   36    4    5    3    3    3    1
Aski_capi        3    1    1    2   63    5    2    5    2    3    5   45    4    5    3    3    1    1
.............

FemTabChi
                X1   X4   X6   X8  X10  X14  X21  X24  X29  X38  X43  X50  X67  X76  X78  X80  X82  X84
Anth_cap         1    1    1    2    4    4    1    5    6   14    2    6   56    5    2   23    1    1
Anth_crin        1    1    1    2   47    5    1   45    6    1    2   45    5   34    2    3    1   23
Anth_eck         1    1    1    2    4    5    1   45    6    1    2   56   46  345    2    3    1   12
Anth_gram        1    1    1    2 <NA>    5    2    5 <NA>    4 <NA>   56   56    5    2    3    1    1
Anth_insi        1    1    1    2    3    5    1    4   26   25    2    4    5    5    2    3    1  234
Anth_laxi        1    1    1    2   47    5    1   56    6   24    2    4    5  345   21   23    1    1
Anth_sing        1    1    1    2   47   24    1    2   24    2    2    4    4    5    2   12    1    1
Aski_albo_ari    2    1    1    2    4    5    2    4    2   34   15    4    5    5    3    4    1    1
Aski_alt        12    1    1    2    4    5    2    4   89    5   15   46    5    5    2    3    3    1
Aski_and         1    1    1    2    4    5    2    5  259    5    1   46   25    5    2   23    1    1
Aski_capi       23    1    1    2  234    5    2    5 2459  235    5   46    5   34    3    3    1    1
Aski_chart      12    1    1    2    4    5    2    4   29    2   15    4   25    5    2   34    3    1
Aski_deli        2    1    1    2 <NA>   45   12    4    6    2    5   46    5    3    3    4    1    1
..........

Greetings
Birgit

Birgit Lemcke
Institut für Systematische Botanik
Zollikerstrasse 107
CH-8008 Zürich
Switzerland
Ph: +41 (0)44 634 8351
birgit.lemcke at systbot.uzh.ch

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
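
One possible way to handle ambiguity codes like the ones in these tables, sketched here under assumptions rather than offered as the definitive approach: keep each entry as a character string, split it into single-digit states, and expand it into 0/1 indicator columns, so that 12 counts as both state 1 and state 2. The values below are copied from column X29 of MalTabChi for four species. The object names (x29, states, state_levels, indicator) are hypothetical, and the sketch assumes every state is coded by exactly one digit.

```r
# Column X29 of MalTabChi for four species (values copied from the table);
# the codes are kept as character strings so that "12" is not read as twelve
x29 <- c(Anth_cap = "12", Anth_crin = "256", Anth_eck = "14", Aski_and = NA)

# Split every code into its single-digit states (assumption: one digit per state)
states <- strsplit(x29, "")

# All states that occur anywhere in the column, ignoring missing values
all_states <- unlist(states)
state_levels <- sort(unique(all_states[!is.na(all_states)]))

# One row per species, one 0/1 column per state; NA rows stay NA
indicator <- t(sapply(states, function(s) {
  if (all(is.na(s))) rep(NA_integer_, length(state_levels))
  else as.integer(state_levels %in% s)
}))
colnames(indicator) <- paste("state", state_levels, sep = "")

print(indicator)

# Number of species showing each state
print(colSums(indicator, na.rm = TRUE))
```

With the data in this form, each state can be counted or compared on its own, while a species with several possible states simply contributes to several columns.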

Birgit Lemcke wrote:
>
> Perhaps you haven't understood my question in the mail yesterday. So
> I will try to describe my problem in a different way
>
> You see the tables. I would like to test the variables between the
> tables.
>

I'm afraid that even before we start to deal with the ambiguities your question is not clear. What do you want to know, and before you sat down at the computer what statistical test did you intend to use? (For better or worse, most of the documentation of R _assumes_ you know what you want to test and how you want to do it.)

I'm supposing you want to do some kind of comparison across communities (tables 1 and 2), but I don't know what kind. Comparing a single cell of the table to another just asks if the leaf form is the same in the two communities. Do you just want to ask if leaf forms of a given species are significantly different in different communities? I'm not sure what the null hypothesis would be here. What are the rows and columns? Can we use them to develop a hypothesis?

If you can say precisely what your question is and how you would test it in the _absence_ of ambiguity (i.e., specify a statistical test -- you don't need to know how to run it in R, that's what the list is actually for), then we can help you decide how to handle the multiple coding problem.

Good luck,
Ben Bolker

--
View this message in context: http://www.nabble.com/Ambiguities-in-vector-tf4485921.html#a12802997
Sent from the R help mailing list archive at Nabble.com.

Hello James,

first I have to thank you for your help, but there are some things I don't understand yet.

I am not sure I understand what this example returns:

    ratings <- data.frame(id = c(1,2,3,4), att1 = c(1,1,0,1),
                          att2 = c(1,0,0,1), att3 = c(0,1,1,1))
    ratings
      id att1 att2 att3
    1  1    1    1    0
    2  2    1    0    1
    3  3    0    0    1
    4  4    1    1    1
    tab <- crossprod(as.matrix(ratings[,-1]))
    tab <- tab - diag(diag(tab))
    tab
         att1 att2 att3
    att1    0    2    2
    att2    2    0    1
    att3    2    1    0

As I understand it, this gives the number of times we find the same value, for example when comparing att1 and att2 across all ids. Is that right?

What is this line doing:

    tab <- tab - diag(diag(tab))

And what does the original output of crossprod mean:

         att1 att2 att3
    att1    3    2    2
    att2    2    2    1
    att3    2    1    3

I tried to do this with part of my dataset. I used a table with 3 variables (binary only). The first part of the table holds the females (348 rows) and the second part the males (also 348 rows). Then I tried this:

    CrossFemMal1_3 <- crossprod(as.matrix(CrossFemMalVar1_3))

The output:

    CrossFemMal1_3
       V1 V2 V3
    V1 NA NA NA
    V2 NA NA NA
    V3 NA NA NA

There was one row of NAs in my dataset. I presume this is responsible for the NA results? So how can I deal with NAs here?

If I use two matrices (male and female), I get back, among other things, the comparison of att1 male to att1 female. If I use the percentage table output, I get for example 40%. Can I then say that if the percentage is lower than 50%, the attributes are significantly different?

Regarding your other suggestion:

    sapply(c("1","2","3"), function(x) ifelse(regexpr(x, FemV1) > 0, 1, 0))

It gives me this output:

         1 2 3
    [1,] 1 0 0
    [2,] 1 0 0
    [3,] 1 0 0
    [4,] 1 0 0
    [5,] 1 0 0
    [6,] 1 0 0
    [7,] 1 0 0
    [8,] 1 0 0
    [9,] 0 1 0
    . . . .
    . . . .

I think now I should count the ones for 1, 2 and 3? I tried to use table, but it gives me only the overall counts for 0 and 1:

    table(FemV1Test)
    FemV1Test
      0   1
    657 387

How can I specify that it gives me the counts for every column?
And then do the same for MalV1 and compare both somehow?

Thanks again in advance for your help.

Greetings
Birgit

On 29.09.2007 at 14:45, James Reilly wrote:
>
> Hi Birgit,
>
> The first argument to regexpr should be just one character value,
> not a vector. Your call:
>   regexpr(c("1","2","3"),FemV1)
> seems to have been interpreted as:
>   regexpr("1",FemV1)
>
> I think you probably need something more like:
>   sapply(c("1","2","3"), function(x) ifelse(regexpr(x, FemV1) > 0, 1, 0))
> This will also work on multiple response data such as
>   FemV1 <- c("13", "2", "13", "123", "1", "23")
> Then colSums will give you frequency counts for each attribute.
>
> I think you would need to greatly simplify the multiple response data
> to apply anything like a paired t-test. Have you considered just
> crosstabulating the attributes of male plants versus the females?
> For some R code, see
> https://stat.ethz.ch/pipermail/r-help/2007-February/126125.html
>
> Regards,
> James
>
>
> On 29/9/07 3:37 AM, Birgit Lemcke wrote:
>> Hello James,
>> sorry that I have to ask you a second time, but I don't understand
>> what regexpr() is doing and how the syntax works.
>> I have a vector that I converted to a character string:
>>   as.character(FalV1)
>>   [1] "1" "1" "1" "1" "1" "1" "1" "1" "2"
>> After that I did this, but without knowing what I am really doing:
>>   regexpr(c("1","2","3"),FemV1)
>> The output looked like this:
>>   [1]  1  1  1  1  1  1  1  1 -1
>> As I understood it, the function looks in this case for 1, 2 or 3.
>> If there is a match it gives me back 1, if not it gives me back -1.
>> But I don't know how this helps me now, so I hope you will explain
>> it to me.
>> And there is another problem I have. For the continuous variables I
>> used a paired t-test; can I perform this approach paired as well?
>> Thanks a lot in advance.
>> Greetings
>> Birgit
>> On 21.09.2007 at 11:38, James Reilly wrote:
>>>
>>> If I understand you right, you have several multiple response
>>> variables (with the responses encoded in numeric strings) and you
>>> want to see whether these are associated with sex. To tabulate
>>> the data, I would convert your variables into collections of
>>> dummy variables using regexpr(), then use table(). You can use a
>>> modified chi-squared test with a Rao-Scott correction on the
>>> resulting tables; see Thomas and Decady (2004). Bootstrapping is
>>> another possible approach.
>>>
>>> @article{,
>>>   Author = {Thomas, D. Roland and Decady, Yves J.},
>>>   Journal = {International Journal of Testing},
>>>   Number = {1},
>>>   Pages = {43--59},
>>>   Title = {Testing for Association Using Multiple Response Survey
>>>     Data: Approximate Procedures Based on the Rao-Scott Approach.},
>>>   Volume = {4},
>>>   Year = {2004},
>>>   Url = {http://search.ebscohost.com/login.aspx?direct=true&db=pbh&AN=13663214&site=ehost-live}
>>> }
>>>
>>> Hope this helps,
>>> James
>>> --
>>> James Reilly
>>> Department of Statistics, University of Auckland
>>> Private Bag 92019, Auckland, New Zealand
>>>
>>> On 21/9/07 7:14 AM, Birgit Lemcke wrote:
>>>> First thanks for your answer.
>>>> Now I try to explain better:
>>>> I have species in the rows and morphological attributes in the
>>>> columns coded by numbers (qualitative variables; nominal and
>>>> ordinal).
>>>> In one table for the male plants of every species and in the
>>>> other table for the female plants of every species. The
>>>> variables contain every possible occurrence in this species and
>>>> this gender.
>>>> I would like to compare every variable between male and female
>>>> plants for example using a ChiSquare Test.
>>>> The Null-hypothesis could be: Variable male is equal to variable
>>>> Female.
>>>> The underlying question is whether male and female plants in this
>>>> species are significantly different, and which attributes are
>>>> responsible for this difference.
>>>> I really hope that this is easier to understand. If not, please
>>>> ask.
>>>> Thanks a million in advance.
>>>> Greetings
>>>> Birgit
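
The dummy-variable approach James describes in the quoted message can be sketched roughly as follows. FemV1 is his own example vector; MalV1 is a made-up vector for illustration only, and the helper name to_dummies is hypothetical. The sketch assumes every state is a single digit from 1 to 3, so that a plain substring match is enough.

```r
# James's example of multiple-response codes for the female plants
FemV1 <- c("13", "2", "13", "123", "1", "23")
# Hypothetical codes for the male plants of the same six species (illustration only)
MalV1 <- c("1", "12", "3", "23", "2", "2")

# Turn a vector of codes into one 0/1 column per state.
# regexpr() returns -1 when the state does not occur in a code.
to_dummies <- function(codes, states = c("1", "2", "3")) {
  sapply(states, function(x) ifelse(regexpr(x, codes, fixed = TRUE) > 0, 1, 0))
}

fem <- to_dummies(FemV1)
mal <- to_dummies(MalV1)

# colSums() counts, per state, how many species show that state;
# cbind() puts the two sexes side by side for comparison
FemMal <- cbind(Females = colSums(fem), Males = colSums(mal))
print(FemMal)
```

Because a species can show several states, the column totals can add up to more than the number of species. That is why an ordinary chi-squared test on such a table needs a correction of the kind James mentions.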

Hello James,

all of your suggestions work very well except for this:

    FemMal <- cbind(FemV1gezählt[2,], MalV1gezählt[2,])
    colnames(FemMal) <- ("Females", "Males")
    Error: syntax error
    FemMal
      [,1] [,2]
    1  133   79
    2  203  237
    3   51   76

But it works if I do this:

    Namen <- c("Female", "Male")
    colnames(FemMal) <- (Namen)
    FemMal
      Female Male
    1    133   79
    2    203  237
    3     51   76

Greetings
Birgit

On 04.10.2007 at 17:19, James Reilly wrote:
>
> Hi Birgit,
>
> First, can I suggest that you don't copy off-list conversations to
> the mailing list partway through? Not that I minded in this case,
> but it probably confuses people and the posting guide warns against
> it.
>
> I'll address your questions in reverse order.
>
> To get tables for each column, try:
>   apply(FemV1Test, 2, table)
>
> Likewise for males:
>   apply(MalV1, 2, table)
>
> To compare them, perhaps put them side by side:
>   FemMal <- cbind(apply(FemV1Test, 2, table)[2,], apply(MalV1, 2, table)[2,])
>   colnames(FemMal) <- ("Females", "Males")
>   FemMal
>
> You can then do arithmetic, plot them, sort by the difference, etc.
>   plot(FemMal)
>   FemMal[order(FemMal[,1]-FemMal[,2]),]
>
> About crossprod, cell (i,j) in the resulting matrix shows the
> number of cases with a 1 for attribute i and attribute j. This
> shows which attributes overlap most and least.
>
> The command "tab <- tab - diag(diag(tab))" puts zeroes down the
> diagonal, as was requested. One cosmetic reason for doing this is
> that the diagonal elements are often much larger than the
> off-diagonal ones, and zeroing them makes the table easier to read or
> display graphically. E.g.
>   http://pbil.univ-lyon1.fr/ADE-4/ade4-html/table.dist.html
>
> Yes, any row with all NAs will make the crossprod all NAs too. You
> can ignore any rows with NAs as follows:
>   CrossFemMal1_3 <- crossprod(as.matrix(CrossFemMalVar1_3[apply(CrossFemMalVar1_3, 1, function (x) !any(is.na(x))),]))
>
> I'm not sure if I follow why you want to know about statistical
> significance here.
> Do you really think of the species in your study as a sample from a
> larger population of plant species, which you are trying to
> generalise about?
>
> If so, is the population much larger than your sample? And was your
> sample of species selected randomly, i.e. with equal selection
> probabilities? If not, standard tests probably won't apply.
>
> Regards,
> James
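
The syntax error reported in this message arises because a parenthesised, comma-separated list such as ("Females", "Males") is not a valid R expression; a character vector has to be built with c(). A minimal sketch of that fix follows, together with the crossprod steps discussed in the quoted reply. The counts are copied from the FemMal output in this message; the ratings data frame with its added NA row is a made-up illustration, and complete.cases() is used here as an alternative to the apply() call in the quoted reply.

```r
# Counts copied from the FemMal output shown in this message
FemMal <- cbind(c(133, 203, 51), c(79, 237, 76))
rownames(FemMal) <- c("1", "2", "3")
colnames(FemMal) <- c("Females", "Males")   # c() is required to build the names vector

# Sort the states by the female-minus-male difference
sorted <- FemMal[order(FemMal[, 1] - FemMal[, 2]), ]
print(sorted)

# Hypothetical binary attribute data with one row that is entirely NA
ratings <- data.frame(att1 = c(1, 1, 0, 1, NA),
                      att2 = c(1, 0, 0, 1, NA),
                      att3 = c(0, 1, 1, 1, NA))

# With the NA row included, every cell of the cross-product is NA
print(crossprod(as.matrix(ratings)))

# Drop incomplete rows first; cell (i, j) then counts the rows that have
# a 1 for both attribute i and attribute j
complete <- ratings[complete.cases(ratings), ]
tab <- crossprod(as.matrix(complete))
diag(tab) <- 0   # zero the diagonal, same effect as tab - diag(diag(tab))
print(tab)
```

The diagonal of the unmodified cross-product holds the number of rows with a 1 for each single attribute, which is why it is usually the largest entry in its row.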