Perhaps this is a common question, but I haven't been able to find the answer.

I have data with many factors, each taking many values. However, only relatively few combinations appear in the data, i.e. have nonzero counts; in other words, the resulting table is sparse. Say we have 10 factors, each with 10 levels. The result of table() would exceed the available memory (on a 32-bit machine). Is there any way to produce a table with empty cells omitted (without first producing the whole table and then removing rows)?

Thanks,
Steve

--
View this message in context: http://www.nabble.com/omit-empty-cells-in-crosstab--tp23222263p23222263.html
Sent from the R help mailing list archive at Nabble.com.
Hi Steve,

The general answer is yes, but the specifics will depend on your problem. Could you provide a small reproducible example to illustrate your problem?

Hadley

On Fri, Apr 24, 2009 at 1:19 PM, sjaffe <sjaffe at riskspan.com> wrote:
>
> Perhaps this is a common question but I haven't been able to find the answer.
>
> I have data with many factors, each taking many values. However, only
> relatively few combinations appear in the data, ie have nonzero counts, in
> other words the resulting table is sparse. Say we have 10 factors each with
> 10 levels. The result of table() would exceed the memory space (on a 32bit
> machine). Is there any way to produce a table with empty cells omitted?
> (without first producing the whole table and then removing rows.)
>
> Thanks,
> Steve
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

--
http://had.co.nz/
sjaffe <sjaffe <at> riskspan.com> writes:

>
> I have data with many factors, each taking many values. However, only
> relatively few combinations appear in the data, ie have nonzero counts, in
> other words the resulting table is sparse. Say we have 10 factors each with
> 10 levels. The result of table() would exceed the memory space (on a 32bit
> machine). Is there any way to produce a table with empty cells omitted?
> (without first producing the whole table and then removing rows.)

It would be easier if you had a reproducible example, but I suggest creating ONE new factor from the pasted levels, using unique(), and then creating a table of it.

Dieter
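Dieter's suggestion could be sketched roughly as follows. The data frame `df` and its columns `f1`, `f2`, `f3` are made up for illustration, and pasting each row into a single key with do.call(paste, ...) is only one assumed reading of "ONE new factor of the pasted levels". No explicit unique() call is needed in this version, because table() on the pasted key only reports keys that actually occur.

```r
# Hypothetical data: three factors with 10 levels each (1000 possible
# cells), but only three distinct combinations actually observed.
lv <- as.character(1:10)
df <- data.frame(
  f1 = factor(c("1", "1", "2", "9"), levels = lv),
  f2 = factor(c("3", "3", "4", "9"), levels = lv),
  f3 = factor(c("5", "5", "6", "9"), levels = lv)
)

# Paste the levels of each row into one key per observation.
key <- do.call(paste, c(df, sep = "."))

# Tabulate the key: a one-dimensional table whose length is the number
# of observed combinations, not the product of the numbers of levels.
counts <- table(key)
print(counts)
```

The separator "." is an arbitrary choice; it should be a string that cannot occur inside a level label, otherwise two different combinations could paste to the same key.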
Small example:

a<-c(1.1, 2.1, 9.1)
b<-cut(a,0:10)
c<-data.frame(b,b)
d<-table(c)
dim(d)
##result: c(10, 10)

But only 9 of the 100 cells are non-zero.
If there were 10 columns, the table would have 10 dimensions, each of length 10, and so 10^10 elements, too many to fit in memory.

--
View this message in context: http://www.nabble.com/omit-empty-cells-in-crosstab--tp23222263p23224071.html
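To make the size argument concrete, here is a small sketch that compares the number of cells table() would allocate with the number of combinations actually present. The name `dat` is an assumption used here to avoid masking the base function c(); the byte estimate assumes 4-byte integer counts, which is how table() stores its result.

```r
a <- c(1.1, 2.1, 9.1)
b <- cut(a, 0:10)
dat <- data.frame(b, b)   # columns are named b and b.1

# Cells in the full cross-tabulation: product of the numbers of levels.
n_cells <- prod(sapply(dat, nlevels))

# Cells that would be non-zero: distinct rows of the data.
n_observed <- nrow(unique(dat))

cat("full table cells:", n_cells, "\n")
cat("observed combinations:", n_observed, "\n")

# The 10-factor case from the question: 10^10 integer counts.
cells_10 <- 10^10
gib <- cells_10 * 4 / 2^30
cat("approx. size of a 10 x ... x 10 table in GiB:", round(gib, 1), "\n")
```

Besides the raw memory requirement, 10^10 elements also exceeds the 2^31 - 1 element limit that applied to R vectors before long vectors were introduced in R 3.0.0.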
On Fri, 2009-04-24 at 13:12 -0700, sjaffe wrote:
> small example:
>
> a<-c(1.1, 2.1, 9.1)
> b<-cut(a,0:10)
> c<-data.frame(b,b)
> d<-table(c)
> dim(d)
> ##result: c(10, 10)
>
> But only 9 of the 100 cells are non-zero.
> If there were 10 columns, the table have 10 dimensions each of length 10, so
> have 10^10 elements, too much even to fit in memory

Hi Steve,

In your example only 3 cells are > 0:

> d
        b.1
b        (0,1] (1,2] (2,3] (3,4] (4,5] (5,6] (6,7] (7,8] (8,9] (9,10]
  (0,1]      0     0     0     0     0     0     0     0     0      0
  (1,2]      0     1     0     0     0     0     0     0     0      0
  (2,3]      0     0     1     0     0     0     0     0     0      0
  (3,4]      0     0     0     0     0     0     0     0     0      0
  (4,5]      0     0     0     0     0     0     0     0     0      0
  (5,6]      0     0     0     0     0     0     0     0     0      0
  (6,7]      0     0     0     0     0     0     0     0     0      0
  (7,8]      0     0     0     0     0     0     0     0     0      0
  (8,9]      0     0     0     0     0     0     0     0     0      0
  (9,10]     0     0     0     0     0     0     0     0     0      1

If you want simple code that returns only the cells > 0, use this:

table(interaction(c,drop=TRUE))

  (1,2].(1,2]   (2,3].(2,3] (9,10].(9,10]
            1             1             1

--
Bernardo Rangel Tura, M.D,MPH,Ph.D
National Institute of Cardiology
Brazil
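If the factor columns need to stay separate in the result, rather than being fused into one label as with interaction(), one possible alternative is a long-format count table built with aggregate(). This is a sketch and not something proposed in the thread: the column names `x`, `y` and `n` are assumptions, and a fourth value (9.5) is added to the example so that one count is greater than 1.

```r
a <- c(1.1, 2.1, 9.1, 9.5)
b <- cut(a, 0:10)
dat <- data.frame(x = b, y = b)

# Give every row a weight of 1, then sum the weights per combination.
# Only combinations that occur in the data get a row in the result.
dat$n <- 1L
sparse <- aggregate(n ~ x + y, data = dat, FUN = sum)
print(sparse)
```

Each row of `sparse` is one non-empty cell of the cross-tabulation, with the factor levels in `x` and `y` and the count in `n`.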