Michael Friendly
2013-Dec-13 19:41 UTC
[R] extracting non-NA entries from a two-way frequency table
I have data in the form of a two-way table recording the number of families with varying numbers of boys (rows) and girls (columns: g0 -- g12) below, also given in dput() format. I want to convert this to a data frame containing only the non-NA entries, with columns boys, girls, Freq, where Freq is the table entry. Can anyone help with this? I suppose that the steps are to transpose each row to a column identifying the number of girls, and then delete the NAs, but I can't quite see how to do this. > Geissler boys g0 g1 g2 g3 g4 g5 g6 g7 g8 g9 g10 g11 g12 1 12 7 NA NA NA NA NA NA NA NA NA NA NA NA 2 11 24 45 NA NA NA NA NA NA NA NA NA NA NA 3 10 30 93 181 NA NA NA NA NA NA NA NA NA NA 4 9 90 287 492 478 NA NA NA NA NA NA NA NA NA 5 8 264 713 1027 1077 829 NA NA NA NA NA NA NA NA 6 7 631 1655 2418 2309 1801 1112 NA NA NA NA NA NA NA 7 6 1579 3725 4948 4757 3470 2310 1343 NA NA NA NA NA NA 8 5 3666 7908 9547 8498 6436 3878 2161 1033 NA NA NA NA NA 9 4 8628 16340 17332 14479 10263 5917 3072 1540 670 NA NA NA NA 10 3 20540 31611 30175 22221 13972 7603 3895 1783 837 286 NA NA NA 11 2 47819 57179 44793 28630 15700 8171 3951 1776 722 275 104 NA NA 12 1 114609 89213 53789 28101 13740 6233 2719 1152 432 151 72 24 NA 13 0 NA 108719 42860 17395 7004 2839 1096 436 161 66 30 8 3 Geissler <- structure(list(boys = c(12L, 11L, 10L, 9L, 8L, 7L, 6L, 5L, 4L, 3L, 2L, 1L, 0L), g0 = c(7L, 24L, 30L, 90L, 264L, 631L, 1579L, 3666L, 8628L, 20540L, 47819L, 114609L, NA), g1 = c(NA, 45L, 93L, 287L, 713L, 1655L, 3725L, 7908L, 16340L, 31611L, 57179L, 89213L, 108719L), g2 = c(NA, NA, 181L, 492L, 1027L, 2418L, 4948L, 9547L, 17332L, 30175L, 44793L, 53789L, 42860L), g3 = c(NA, NA, NA, 478L, 1077L, 2309L, 4757L, 8498L, 14479L, 22221L, 28630L, 28101L, 17395L ), g4 = c(NA, NA, NA, NA, 829L, 1801L, 3470L, 6436L, 10263L, 13972L, 15700L, 13740L, 7004L), g5 = c(NA, NA, NA, NA, NA, 1112L, 2310L, 3878L, 5917L, 7603L, 8171L, 6233L, 2839L), g6 = c(NA, NA, NA, NA, NA, NA, 1343L, 2161L, 3072L, 3895L, 3951L, 2719L, 1096L), g7 = c(NA, NA, NA, NA, NA, NA, NA, 1033L, 1540L, 1783L, 1776L, 1152L, 436L), g8 = c(NA, NA, NA, NA, NA, NA, NA, NA, 670L, 837L, 722L, 432L, 161L), g9 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, 286L, 275L, 151L, 66L), g10 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 104L, 72L, 30L), g11 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 24L, 8L), g12 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 3L)), .Names = c("boys", "g0", "g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8", "g9", "g10", "g11", "g12"), class = "data.frame", row.names = c(NA, -13L)) -- Michael Friendly Email: friendly AT yorku DOT ca Professor, Psychology Dept. & Chair, Quantitative Methods York University Voice: 416 736-2100 x66249 Fax: 416 736-5814 4700 Keele Street Web: http://www.datavis.ca Toronto, ONT M3J 1P3 CANADA
rmailbox at justemail.net
2013-Dec-13 20:01 UTC
[R] extracting non-NA entries from a two-way frequency table
Perhaps this? library(reshape2) library(stringr) GeisslerLong <- melt (Geissler, id.vars = c("boys")) GeisslerLong <- transform ( GeisslerLong, girls = as.numeric ( str_replace( variable, "g", '' )) ) GeisslerLong <- rename ( GeisslerLong, c( value = "Freq")) GeisslerLong <- arrange ( GeisslerLong, boys, girls) GeisslerLong <- subset ( GeisslerLong, !is.na ( Freq), select = c( boys, girls, Freq)) Eric ----- Original message ----- From: Michael Friendly <friendly at yorku.ca> To: "R-help" <r-help at r-project.org> Subject: [R] extracting non-NA entries from a two-way frequency table Date: Fri, 13 Dec 2013 14:41:58 -0500 I have data in the form of a two-way table recording the number of families with varying numbers of boys (rows) and girls (columns: g0 -- g12) below, also given in dput() format. I want to convert this to a data frame containing only the non-NA entries, with columns boys, girls, Freq, where Freq is the table entry. Can anyone help with this? I suppose that the steps are to transpose each row to a column identifying the number of girls, and then delete the NAs, but I can't quite see how to do this. > Geissler boys g0 g1 g2 g3 g4 g5 g6 g7 g8 g9 g10 g11 g12 1 12 7 NA NA NA NA NA NA NA NA NA NA NA NA 2 11 24 45 NA NA NA NA NA NA NA NA NA NA NA 3 10 30 93 181 NA NA NA NA NA NA NA NA NA NA 4 9 90 287 492 478 NA NA NA NA NA NA NA NA NA 5 8 264 713 1027 1077 829 NA NA NA NA NA NA NA NA 6 7 631 1655 2418 2309 1801 1112 NA NA NA NA NA NA NA 7 6 1579 3725 4948 4757 3470 2310 1343 NA NA NA NA NA NA 8 5 3666 7908 9547 8498 6436 3878 2161 1033 NA NA NA NA NA 9 4 8628 16340 17332 14479 10263 5917 3072 1540 670 NA NA NA NA 10 3 20540 31611 30175 22221 13972 7603 3895 1783 837 286 NA NA NA 11 2 47819 57179 44793 28630 15700 8171 3951 1776 722 275 104 NA NA 12 1 114609 89213 53789 28101 13740 6233 2719 1152 432 151 72 24 NA 13 0 NA 108719 42860 17395 7004 2839 1096 436 161 66 30 8 3 Geissler <- structure(list(boys = c(12L, 11L, 10L, 9L, 8L, 7L, 6L, 5L, 4L, 3L, 2L, 1L, 0L), g0 = c(7L, 24L, 30L, 90L, 264L, 631L, 1579L, 3666L, 8628L, 20540L, 47819L, 114609L, NA), g1 = c(NA, 45L, 93L, 287L, 713L, 1655L, 3725L, 7908L, 16340L, 31611L, 57179L, 89213L, 108719L), g2 = c(NA, NA, 181L, 492L, 1027L, 2418L, 4948L, 9547L, 17332L, 30175L, 44793L, 53789L, 42860L), g3 = c(NA, NA, NA, 478L, 1077L, 2309L, 4757L, 8498L, 14479L, 22221L, 28630L, 28101L, 17395L ), g4 = c(NA, NA, NA, NA, 829L, 1801L, 3470L, 6436L, 10263L, 13972L, 15700L, 13740L, 7004L), g5 = c(NA, NA, NA, NA, NA, 1112L, 2310L, 3878L, 5917L, 7603L, 8171L, 6233L, 2839L), g6 = c(NA, NA, NA, NA, NA, NA, 1343L, 2161L, 3072L, 3895L, 3951L, 2719L, 1096L), g7 = c(NA, NA, NA, NA, NA, NA, NA, 1033L, 1540L, 1783L, 1776L, 1152L, 436L), g8 = c(NA, NA, NA, NA, NA, NA, NA, NA, 670L, 837L, 722L, 432L, 161L), g9 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, 286L, 275L, 151L, 66L), g10 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 104L, 72L, 30L), g11 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 24L, 8L), g12 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 3L)), .Names = c("boys", "g0", "g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8", "g9", "g10", "g11", "g12"), class = "data.frame", row.names = c(NA, -13L)) -- Michael Friendly Email: friendly AT yorku DOT ca Professor, Psychology Dept. & Chair, Quantitative Methods York University Voice: 416 736-2100 x66249 Fax: 416 736-5814 4700 Keele Street Web: http://www.datavis.ca Toronto, ONT M3J 1P3 CANADA ______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
William Dunlap
2013-Dec-13 20:06 UTC
[R] extracting non-NA entries from a two-way frequency table
The following puts the data.frame into 'long' format and then drops rows with NA's for 'n'. f <- function(data){ df <- data.frame( expand.grid( boys = data[["boys"]], girls = as.integer(sub("^g", "", colnames(data)[-1])) ), n = unlist(data[, -1])) # n is all but the 'boys', the first, column df[!is.na(df[["n"]]), ] } E.g.,> f(Geissler)[c(2,30,75),]boys girls n 2 11 0 24 34 5 2 9547 104 0 7 436> Geissler[Geissler["boys"]==5, "g2"][1] 9547 reshape() can probably do this but I always get lost in its argument list. Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com> -----Original Message----- > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf > Of Michael Friendly > Sent: Friday, December 13, 2013 11:42 AM > To: R-help > Subject: [R] extracting non-NA entries from a two-way frequency table > > I have data in the form of a two-way table recording the number of > families with varying numbers > of boys (rows) and girls (columns: g0 -- g12) below, also given in > dput() format. > > I want to convert this to a data frame containing only the non-NA > entries, with columns > boys, girls, Freq, where Freq is the table entry. Can anyone help with > this? > I suppose that the steps are to transpose each row to a column > identifying the number of > girls, and then delete the NAs, but I can't quite see how to do this. > > > > Geissler > boys g0 g1 g2 g3 g4 g5 g6 g7 g8 g9 g10 g11 g12 > 1 12 7 NA NA NA NA NA NA NA NA NA NA NA NA > 2 11 24 45 NA NA NA NA NA NA NA NA NA NA NA > 3 10 30 93 181 NA NA NA NA NA NA NA NA NA NA > 4 9 90 287 492 478 NA NA NA NA NA NA NA NA NA > 5 8 264 713 1027 1077 829 NA NA NA NA NA NA NA NA > 6 7 631 1655 2418 2309 1801 1112 NA NA NA NA NA NA NA > 7 6 1579 3725 4948 4757 3470 2310 1343 NA NA NA NA NA NA > 8 5 3666 7908 9547 8498 6436 3878 2161 1033 NA NA NA NA NA > 9 4 8628 16340 17332 14479 10263 5917 3072 1540 670 NA NA NA NA > 10 3 20540 31611 30175 22221 13972 7603 3895 1783 837 286 NA NA NA > 11 2 47819 57179 44793 28630 15700 8171 3951 1776 722 275 104 NA NA > 12 1 114609 89213 53789 28101 13740 6233 2719 1152 432 151 72 24 NA > 13 0 NA 108719 42860 17395 7004 2839 1096 436 161 66 30 8 3 > > Geissler <- > structure(list(boys = c(12L, 11L, 10L, 9L, 8L, 7L, 6L, 5L, 4L, > 3L, 2L, 1L, 0L), g0 = c(7L, 24L, 30L, 90L, 264L, 631L, 1579L, > 3666L, 8628L, 20540L, 47819L, 114609L, NA), g1 = c(NA, 45L, 93L, > 287L, 713L, 1655L, 3725L, 7908L, 16340L, 31611L, 57179L, 89213L, > 108719L), g2 = c(NA, NA, 181L, 492L, 1027L, 2418L, 4948L, 9547L, > 17332L, 30175L, 44793L, 53789L, 42860L), g3 = c(NA, NA, NA, 478L, > 1077L, 2309L, 4757L, 8498L, 14479L, 22221L, 28630L, 28101L, 17395L > ), g4 = c(NA, NA, NA, NA, 829L, 1801L, 3470L, 6436L, 10263L, > 13972L, 15700L, 13740L, 7004L), g5 = c(NA, NA, NA, NA, NA, 1112L, > 2310L, 3878L, 5917L, 7603L, 8171L, 6233L, 2839L), g6 = c(NA, > NA, NA, NA, NA, NA, 1343L, 2161L, 3072L, 3895L, 3951L, 2719L, > 1096L), g7 = c(NA, NA, NA, NA, NA, NA, NA, 1033L, 1540L, 1783L, > 1776L, 1152L, 436L), g8 = c(NA, NA, NA, NA, NA, NA, NA, NA, 670L, > 837L, 722L, 432L, 161L), g9 = c(NA, NA, NA, NA, NA, NA, NA, NA, > NA, 286L, 275L, 151L, 66L), g10 = c(NA, NA, NA, NA, NA, NA, NA, > NA, NA, NA, 104L, 72L, 30L), g11 = c(NA, NA, NA, NA, NA, NA, > NA, NA, NA, NA, NA, 24L, 8L), g12 = c(NA, NA, NA, NA, NA, NA, > NA, NA, NA, NA, NA, NA, 3L)), .Names = c("boys", "g0", "g1", > "g2", "g3", "g4", "g5", "g6", "g7", "g8", "g9", "g10", "g11", > "g12"), class = "data.frame", row.names = c(NA, -13L)) > > > -- > Michael Friendly Email: friendly AT yorku DOT ca > Professor, Psychology Dept. & Chair, Quantitative Methods > York University Voice: 416 736-2100 x66249 Fax: 416 736-5814 > 4700 Keele Street Web: http://www.datavis.ca > Toronto, ONT M3J 1P3 CANADA > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.