Rachel Pearce
2004-Dec-13 09:37 UTC
[R] Percentages in contingency tables *warning trivial question*
I hesitate to post this question in the light of recent threads; indeed I have hesitated for several weeks. However, I have come to a full stop and really need some help if I am going to progress. I am a new user of R for medical statistics. I have attempted to read all the relevant documents, but would welcome any suggestions as to what I have missed.

I am trying to construct "table 1" type contingency (mostly) tables. I would like to include percentages, thus:

            Cases       Controls    Total
            N     %     N     %     N     %
Total       50    100   50    100   100   100

Sex: M      23    46    27    54    50    50

etc...

I hesitate even more to mention it here, but I am thinking of something along the lines of PROC TABULATE in SAS.

The closest I have found in the documentation I have read so far is an example given in the help for "addmargins":

Bee <- sample( c("Hum","Buzz"), 177, replace=TRUE )
Sea <- sample( c("White","Black","Red","Dead"), 177, replace=TRUE )
...
# Weird function needed to return the N when computing percentages
sqsm <- function( x ) sum( x )^2/100
B <- table(Sea, Bee)
round(sweep(addmargins(B, 1, list(list(All=sum, N=sqsm))), 2, apply( B, 2, sum )/100, "/" ), 1)
round(sweep(addmargins(B, 2, list(list(All=sum, N=sqsm))), 1, apply(B, 1, sum )/100, "/"), 1)

... which introduced me to "sweep" and maybe could be extended to do what I want. But I don't like using mysterious "weird" functions.

I recently found Paul Johnson's Rtips, where http://www.ku.edu/~pauljohn/R/Rtips.html#6.1 mentions the function prop.table, which is also close to what I want. But how to show Ns and percentages in the same table?

I wondered if there were a function which does this already. Or perhaps I should just write one for myself? Or should I not be trying to do this in R in the first place and go back to Excel (I no longer have access to SAS)? Please, NO! Or perhaps I am looking for the wrong thing in the manuals?
I have followed recent advice to look at Frank E Harrell's detailed tabulation code, but this seems to produce many errors on my system and with my version of R (see below). I do not have access to LaTeX (apologies for incorrect typography). I can provide details of the errors if it turns out that the answer to my question is RTFM by Prof Harrell.

I would like to add my two pennorth to the debate about "trivial" questions, of which I assume this is one. I believe that a very large amount of what is hard about learning R on one's own, with documentation but without a real person, is a matter of vocabulary. I only found sweep and prop.table by chance, since neither of them is indexed by words like "proportion" or "percentage", which is what I had been looking for. Similarly, I still do not know exactly what "sweep" does, since I have never heard this verb used in a mathematical / statistical context, and the help on sweep states that what it does is sweep. I have experienced many similar examples in the last few weeks. This is not to say that there is anything wrong with the help on these functions nor with the help in general, but what R does not have is an extensive indexing system by synonyms and uses. It is largely for reasons like this, I believe, that trivial questions continue to be asked. If one does not know the name of the function to do "verb" and one has tried "verb" and the synonyms which spring to mind and drawn a blank, where to next?

Another reason for difficulty is that while a function may exist to do something, it is sometimes hard to find the package where it is contained, e.g. Frank Harrell's functions seem to be in a package called Hmisc, which is not listed in the drop-down box for "load package".

System and version information:

platform i386-pc-mingw32
arch     i386
os       mingw32
system   i386, mingw32
status
major    2
minor    0.1
year     2004
month    11
day      15
language R

Rachel Pearce

British Society of Blood and Marrow Transplantation
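On the question of what "sweep" actually does: the short base-R sketch below may help. It assumes a small, hypothetical 2 x 2 table of counts (only the "M" row matches the figures in the post; the "F" row is made up for illustration) and shows that sweeping the column totals out of the table with "/" gives the same result as prop.table() with margin = 2.

```r
# Hypothetical counts: the M row matches the example table in the post,
# the F row is invented so that each column totals 50.
counts <- matrix(c(23, 27, 27, 23), nrow = 2,
                 dimnames = list(Sex = c("M", "F"),
                                 Group = c("Cases", "Controls")))

# The statistic to be "swept out": one total per column.
col_totals <- colSums(counts)

# sweep() applies a function ("/" here) between every cell of the table
# and the statistic belonging to that cell's column (margin 2).
pct_sweep <- sweep(counts, 2, col_totals, "/") * 100

# prop.table() performs the same division in one call.
pct_prop <- prop.table(counts, margin = 2) * 100

print(pct_sweep)
print(isTRUE(all.equal(pct_sweep, pct_prop)))
```

Replacing "/" with "-" and the totals with column means would centre each column instead; that generality is why the function has such an abstract name.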
Chuck Cleland
2004-Dec-13 10:47 UTC
[R] Percentages in contingency tables *warning trivial question*
You might want to look at CrossTable() in the gmodels package of the gregmisc bundle. For example:

> library(gmodels)
> sex <- as.factor(sample(c("Male", "Female"), 100, replace=TRUE))
> case <- as.factor(sample(c("Case", "Control"), 100, replace=TRUE))
> CrossTable(sex, case)

   Cell Contents
|-----------------|
|               N |
|   N / Row Total |
|   N / Col Total |
| N / Table Total |
|-----------------|

Total Observations in Table:  100

             | case
         sex |      Case |   Control | Row Total |
-------------|-----------|-----------|-----------|
      Female |        21 |        29 |        50 |
             |     0.420 |     0.580 |     0.500 |
             |     0.420 |     0.580 |           |
             |     0.210 |     0.290 |           |
-------------|-----------|-----------|-----------|
        Male |        29 |        21 |        50 |
             |     0.580 |     0.420 |     0.500 |
             |     0.580 |     0.420 |           |
             |     0.290 |     0.210 |           |
-------------|-----------|-----------|-----------|
Column Total |        50 |        50 |       100 |
             |     0.500 |     0.500 |           |
-------------|-----------|-----------|-----------|

--
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 452-1424 (M, W, F)
fax: (917) 438-0894
BXC (Bendix Carstensen)
2004-Dec-13 11:36 UTC
[R] Percentages in contingency tables *warning trivial question*
Rachel Pearce wrote:
(snip)
> I hesitate even more to mention it here, but I am thinking of
> something along the lines of PROC TABULATE in SAS.

This is one of the holes in the tabulation features in R. The simplest feature needed is the one in addmargins, but tabulation is still rudimentary in R.

I'm afraid that what you want would require:

1. Make the table of counts.
2. Make the table of percentages by sweeping out a margin (i.e. take the margin and divide the entire table by it; sweeping is just the generalization of this, where any desired function can be used instead of "/").
3. Define a new table with an extra dimension (c("N","pct")) and fill in the two original tables there.

The last step is necessary in the absence of a generalized cbind/rbind for tables/arrays. Please correct me if such a thing exists. If it does, it should be referenced under "see also" in the help page for cbind.

The weird example in addmargins only covers the case where a table of percentages is wanted with a margin of total counts, not the general problem.

Somebody should sit down and write a reasonable tabulation feature for R, but the problem in itself is complicated, so the syntax is likely to be arcane. For example, take a look at the syntax for proc tabulate in SAS, which is very strange, but given the features it covers (which are all desirable) it is difficult to come up with something simpler.

Bendix Carstensen
----------------------
Bendix Carstensen
Senior Statistician
Steno Diabetes Center
Niels Steensens Vej 2
DK-2820 Gentofte
Denmark
tel: +45 44 43 87 38
mob: +45 30 75 87 38
fax: +45 44 43 07 06
bxc at steno.dk
www.biostat.ku.dk/~bxc
----------------------
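The three steps listed in the reply above can be sketched in base R roughly as follows. This is only an illustration under stated assumptions, not code from the thread: the variables sex and case are simulated, the seed is arbitrary, and the names n, pct and both are hypothetical. Percentages are taken within columns, as in the table the original question asks for.

```r
set.seed(1)  # arbitrary seed, only so the simulated data are reproducible
sex  <- factor(sample(c("M", "F"), 100, replace = TRUE))
case <- factor(sample(c("Cases", "Controls"), 100, replace = TRUE))

# Step 1: table of counts, with "Sum" margins added for rows and columns.
n <- addmargins(table(sex, case))

# Step 2: table of percentages, obtained by sweeping the column totals
# (the "Sum" row) out of every column with "/".
pct <- round(100 * sweep(n, 2, n["Sum", ], "/"), 1)

# Step 3: a new array with an extra dimension holding both tables.
both <- array(c(as.vector(n), as.vector(pct)),
              dim = c(dim(n), 2),
              dimnames = c(dimnames(n), list(stat = c("N", "pct"))))

# ftable() flattens the array so that N and pct sit side by side
# under each column of the original table.
print(ftable(both, row.vars = "sex", col.vars = c("case", "stat")))
```

Putting the "stat" dimension last makes ftable() print N and pct next to each other within Cases, Controls and Sum, which is close to the layout sketched in the original question.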
Dirk Enzmann
2004-Dec-16 16:45 UTC
[R] Percentages in contingency tables *warning trivial question*
Being still unsatisfied with the CrossTable() function, I modified the code so that the function creates output similar to the SPSS procedure CROSSTABS. Most probably the code will not meet most R programmers' standards; perhaps someone else is willing to optimize it. Unfortunately, as an R beginner I am not able to write a documentation file (perhaps someone is willing to put some effort into that, too). The parameters that can be used can be found next to "function".

Including the function code here would cause nasty line breaks; you can find it at
http://www2.jura.uni-hamburg.de/instkrim/kriminologie/Mitarbeiter/Enzmann/Software/crosstabs.r

Dirk

At Mon, 13 Dec 2004 05:47:17 -0500 Chuck Cleland <ccleland at optonline.net> wrote:
(snip)
> You might want to look at CrossTable() in the gmodels package
> of the gregmisc bundle.
(snip)

--
*************************************************
Dr. Dirk Enzmann
Institute of Criminal Sciences
Dept. of Criminology
Schlueterstr. 28
D-20146 Hamburg
Germany
phone: +49-040-42838.7498 (office)
       +49-040-42838.4591 (Billon)
fax:   +49-040-42838.2344
email: dirk.enzmann at jura.uni-hamburg.de
www:   http://www2.jura.uni-hamburg.de/instkrim/kriminologie/Mitarbeiter/Enzmann/Enzmann.html