thr3ads.net - R help - [R] Percentages in contingency tables *warning trivial question* [Dec 2004]


Rachel Pearce

2004-Dec-13 09:37 UTC

[R] Percentages in contingency tables warning trivial question

I hesitate to post this question in the light of recent threads; indeed,
I have hesitated for several weeks. However, I have come to a full stop
and really need some help if I am going to progress. I am a new user of
R for medical statistics. I have attempted to read all the relevant
documents, but would welcome any suggestions as to what I have missed.

I am trying to construct "table 1" type tables (mostly contingency
tables). I would like to include percentages, thus:

             Cases       Controls    Total
             N     %     N     %     N     %
Total        50    100   50    100   100   100

Sex: M       23    46    27    54    50    50

etc...

I hesitate even more to mention it here, but I am thinking of something
along the lines of PROC TABULATE in SAS.

The closest I have found in the documentation I have read so far is an
example given in the help for "addmargins":

	Bee <- sample( c("Hum","Buzz"), 177, replace=TRUE )
	Sea <- sample( c("White","Black","Red","Dead"), 177, replace=TRUE )
	...
	# Weird function needed to return the N when computing percentages
	sqsm <- function( x ) sum( x )^2/100
	B <- table(Sea, Bee)
	round(sweep(addmargins(B, 1, list(list(All=sum, N=sqsm))), 2,
	            apply( B, 2, sum )/100, "/" ), 1)
	round(sweep(addmargins(B, 2, list(list(All=sum, N=sqsm))), 1,
	            apply( B, 1, sum )/100, "/" ), 1)

... which introduced me to "sweep" and could perhaps be extended to do
what I want. But I don't like using mysterious "weird" functions.

I recently found Paul Johnson's Rtips
(http://www.ku.edu/~pauljohn/R/Rtips.html#6.1), which mentions the
function prop.table; this is also close to what I want. But how do I
show Ns and percentages in the same table?

I wondered if there were a function which does this already. Or perhaps
I should just write one for myself? Or should I not be trying to do this
in R in the first place and go back to Excel (I no longer have access to
SAS)? Please, NO! Or perhaps I am looking for the wrong thing in the
manuals? 

I have followed recent advice to look at Frank E Harrell's detailed
tabulation code, but this seems to produce many errors on my system and
with my version of R (see below). I do not have access to LaTeX
(apologies for incorrect typography). I can provide details of the
errors if it turns out that the answer to my question is RTFM by Prof
Harrell.

I would like to add my two pennorth to the debate about "trivial"
questions, of which I assume this is one. I believe that a very large
amount of what is hard about learning R on one's own with documentation
but without a real person, is a matter of vocabulary. I only found sweep
and prop.table by chance, since neither of them is indexed by words like
"proportion" or "percentage", which is what I had been looking for.
Similarly, I still do not know exactly what "sweep" does, since I have
never heard this verb used in a mathematical / statistical context, and
the help on sweep states that what it does is sweep. I have experienced
many similar examples in the last few weeks. This is not to say that
there is anything wrong with the help on these functions nor with the
help in general, but what R does not have is an extensive indexing
system by synonyms and uses. It is largely for reasons like this, I
believe, that trivial questions continue to be asked. If one does not
know the name of the function to do "verb", and one has tried "verb" and
the synonyms which spring to mind and drawn a blank, where to next?

Another reason for difficulty is that while a function may exist to do
something, it is sometimes hard to find the package where it is
contained, e.g. Frank Harrell's functions seem to be in a package called
Hmisc which is not listed in the drop-down box for "load package".

System and version information:

platform i386-pc-mingw32
arch     i386           
os       mingw32        
system   i386, mingw32  
status                  
major    2              
minor    0.1            
year     2004           
month    11             
day      15             
language R     

Rachel Pearce

British Society of Blood and Marrow Transplantation

Chuck Cleland

2004-Dec-13 10:47 UTC


[R] Percentages in contingency tables warning trivial question

You might want to look at CrossTable() in the gmodels package of the 
gregmisc bundle.  For example:

 > library(gmodels)
 > sex <- as.factor(sample(c("Male", "Female"), 100, replace=TRUE))
 > case <- as.factor(sample(c("Case", "Control"), 100, replace=TRUE))
 > CrossTable(sex, case)

    Cell Contents
|-----------------|
|               N |
|   N / Row Total |
|   N / Col Total |
| N / Table Total |
|-----------------|

Total Observations in Table:  100

             | case
         sex |      Case |   Control | Row Total |
-------------|-----------|-----------|-----------|
      Female |        21 |        29 |        50 |
             |     0.420 |     0.580 |     0.500 |
             |     0.420 |     0.580 |           |
             |     0.210 |     0.290 |           |
-------------|-----------|-----------|-----------|
        Male |        29 |        21 |        50 |
             |     0.580 |     0.420 |     0.500 |
             |     0.580 |     0.420 |           |
             |     0.290 |     0.210 |           |
-------------|-----------|-----------|-----------|
Column Total |        50 |        50 |       100 |
             |     0.500 |     0.500 |           |
-------------|-----------|-----------|-----------|
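
For readers wondering how the four numbers in each cell relate to base R, the following is a rough sketch rather than a description of how CrossTable() is implemented. It re-enters the counts from the output above by hand and reproduces the three proportions with prop.table(); the object names (tab, row_prop, col_prop, table_prop) are made up for the illustration.

```r
# Counts copied by hand from the CrossTable() output above
tab <- as.table(matrix(c(21, 29, 29, 21), nrow = 2,
                       dimnames = list(sex  = c("Female", "Male"),
                                       case = c("Case", "Control"))))

row_prop   <- prop.table(tab, 1)  # N / Row Total
col_prop   <- prop.table(tab, 2)  # N / Col Total
table_prop <- prop.table(tab)     # N / Table Total

round(row_prop, 3)
round(col_prop, 3)
round(table_prop, 3)
```

Multiplying any of these by 100 gives percentages instead of proportions.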

-- 
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 452-1424 (M, W, F)
fax: (917) 438-0894

BXC (Bendix Carstensen)

2004-Dec-13 11:36 UTC


[R] Percentages in contingency tables warning trivial question

> I hesitate even more to mention it here, but I am thinking of 
> something along the lines of PROC TABULATE in SAS.

This is one of the holes in the tabulation features in R.
The simplest feature needed is the one in addmargins, but
tabulation is still rudimentary in R.

I'm afraid that what you want would require:

1. Make the table of counts
2. Make the table of percentages by sweeping out a margin
   (i.e. take the margin and divide the entire table by it;
   sweeping is just the generalization of this, using any
   desired function instead of "/")
3. Define a new table with an extra dimension (c("N","pct")) and
   fill in the two original tables there.

The last step is necessary in the absence of a generalized cbind/rbind
for tables/arrays.
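
The three steps above might be sketched roughly as follows. This is only an illustration under assumptions: the data are simulated, the variable names (sex, status, N, pct, both) and the seed are invented for the example, and ftable() is used merely as one convenient way to print the three-dimensional result flat. The margin added by addmargins() is labelled "Sum" by default.

```r
set.seed(1)  # arbitrary seed, only to make the illustration reproducible
sex    <- factor(sample(c("M", "F"), 100, replace = TRUE))
status <- factor(sample(c("Cases", "Controls"), 100, replace = TRUE))

# 1. Table of counts, with a "Sum" row and a "Sum" column added
N <- addmargins(table(sex, status))

# 2. Column percentages: sweep out margin 2, dividing every column
#    by its own total (the "Sum" row)
pct <- 100 * sweep(N, 2, N["Sum", ], "/")

# 3. New table with an extra dimension holding the two tables
both <- as.table(array(c(N, pct),
                       dim = c(dim(N), 2),
                       dimnames = c(dimnames(N),
                                    list(stat = c("N", "pct")))))

# Print flat: one row per sex, N and pct side by side within each status
ftable(round(both, 1), row.vars = "sex", col.vars = c("status", "stat"))
```

The printed layout has the N and pct columns paired under each of Cases, Controls and Sum, which is close to the "table 1" layout asked for in the original question.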

Please correct me if such a thing exists. If it does, it should be
referenced under "see also" in the help page for cbind.

The weird example in addmargins only covers the case where a table of
percentages is wanted with a margin of total counts, not the general
problem.
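
As a side note on why the "weird" function in that example works: each column is divided by (column total / 100), so a margin row holding sum(x)^2/100 turns back into sum(x), i.e. the N, while the row holding sum(x) becomes 100. A small sketch of the first half of the help-page example, with an arbitrary seed and the illustrative names col_n, with_margins and pct added here:

```r
set.seed(1)  # arbitrary seed, for reproducibility only
Bee <- sample(c("Hum", "Buzz"), 177, replace = TRUE)
Sea <- sample(c("White", "Black", "Red", "Dead"), 177, replace = TRUE)
B   <- table(Sea, Bee)

sqsm  <- function(x) sum(x)^2 / 100
col_n <- apply(B, 2, sum)  # column totals

# Append an "All" row (column sum) and an "N" row (column sum squared / 100)
with_margins <- addmargins(B, 1, list(list(All = sum, N = sqsm)))

# Divide every column by (its total / 100):
#   counts -> percentages, All -> 100, N -> the column total itself
pct <- sweep(with_margins, 2, col_n / 100, "/")
round(pct, 1)
```

So the squaring is only there to cancel the division that sweep() applies to every row, including the margin rows.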

Somebody should sit down and write a reasonable tabulation feature for R,
but the problem in itself is complicated, so the syntax is likely to be
arcane. For example, take a look at the syntax for proc tabulate in SAS,
which is very strange, but given the features it covers (which are all 
desirable) it is difficult to come up with something simpler.

Bendix Carstensen
----------------------
Bendix Carstensen
Senior Statistician
Steno Diabetes Center
Niels Steensens Vej 2
DK-2820 Gentofte
Denmark
tel: +45 44 43 87 38
mob: +45 30 75 87 38
fax: +45 44 43 07 06
bxc at steno.dk
www.biostat.ku.dk/~bxc
----------------------

Dirk Enzmann

2004-Dec-16 16:45 UTC


[R] Percentages in contingency tables warning trivial question

Being still unsatisfied with the CrossTable() function, I modified the 
code so that the function creates output similar to the SPSS 
procedure CROSSTABS. Most probably the code will not meet most R 
programmers' standards; perhaps someone else is willing to optimize it. 
Unfortunately, as an R beginner I am not able to write a documentation 
file (perhaps someone is willing to put some effort into that, too). The 
parameters that can be used can be found next to "function".

Including the function code here would cause nasty line breaks, so you 
can find it at

http://www2.jura.uni-hamburg.de/instkrim/kriminologie/Mitarbeiter/Enzmann/Software/crosstabs.r

Dirk

At Mon, 13 Dec 2004 05:47:17 -0500 Chuck Cleland 
<ccleland at optonline.net> wrote:

(snip)
 > You might want to look at CrossTable() in the gmodels package
 > of the gregmisc bundle.
(snip)

-- 
*************************************************
Dr. Dirk Enzmann
Institute of Criminal Sciences
Dept. of Criminology
Schlueterstr. 28
D-20146 Hamburg
Germany

phone: +49-040-42838.7498 (office)
        +49-040-42838.4591 (Billon)
fax:   +49-040-42838.2344
email: dirk.enzmann at jura.uni-hamburg.de
www: 
http://www2.jura.uni-hamburg.de/instkrim/kriminologie/Mitarbeiter/Enzmann/Enzmann.html
