I have a data set with 22 fields and several thousand records in which one field (count) indicates the number of times that each specific combination of the other 21 fields occurred in a bigger and largely unavailable data set.

So each record is unique in its combination of field values and has a field that identifies how many multiples of this record actually occurred.

Without resorting to writing a program that re-expands the data set to several million rows by cloning each row by the number of times the "count" field indicated, is there a way in R to use that field to come up with summary stats and bargraphs of the distribution of any one of the other fields?

best
Matthew
Dennis Murphy
2011-Jul-16 01:24 UTC
[R] summarized data set - how to use an "occurs" field
Hi:

Your count variable is a frequency associated with a given row of the data set. If you're more specific about what you want and can post a representative sample of (some facsimile of) your data using dput(), the list is likely to be more helpful. See the posting guide linked at the bottom of this message for guidelines.

Dennis

On Fri, Jul 15, 2011 at 3:10 PM, mloxton <mhloxton at gmail.com> wrote:
> I have a data set with 22 fields and several thousand records in which
> one field (count) indicates the number of times that each specific
> combination of the other 21 fields occurred in a bigger and largely
> unavailable data set.
> So each record is unique in its combination of field values and has a
> field that identifies how many multiples of this record actually
> occurred.
>
> Without resorting to writing a program that re-expands the data set to
> several million rows by cloning each row by the number of times the
> "count" field indicated, is there a way in R to use that field to come
> up with summary stats and bargraphs of the distribution of any one of
> the other fields?
>
> best
> Matthew
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
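To illustrate the dput() suggestion, here is a minimal sketch. The data frame `dat` and its columns `region`, `sex`, and `count` are hypothetical stand-ins, since the actual data were never posted to the thread.

```r
# Hypothetical stand-in for the summarized data: one row per unique
# combination of field values, plus a frequency column.
dat <- data.frame(
  region = c("north", "north", "south", "south"),
  sex    = c("F", "M", "F", "M"),
  count  = c(120L, 95L, 210L, 180L),
  stringsAsFactors = FALSE
)

# dput() prints an R expression that recreates the object, so readers
# can paste it into their own session. A few rows are usually enough.
dput(head(dat, 3))

# Round trip: evaluating the dput() text rebuilds the data frame.
txt  <- paste(capture.output(dput(dat)), collapse = "\n")
dat2 <- eval(parse(text = txt))
all.equal(dat, dat2)
```

Posting the dput() output rather than a printed table preserves column types (integer vs. numeric, character vs. factor), which often matters for the answer.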
David Winsemius
2011-Jul-16 01:38 UTC
[R] summarized data set - how to use an "occurs" field
On Jul 15, 2011, at 6:10 PM, mloxton wrote:
> I have a data set with 22 fields and several thousand records in which
> one field (count) indicates the number of times that each specific
> combination of the other 21 fields occurred in a bigger and largely
> unavailable data set.
> So each record is unique in its combination of field values and has a
> field that identifies how many multiples of this record actually
> occurred.
>
> Without resorting to writing a program that re-expands the data set to
> several million rows by cloning each row by the number of times the
> "count" field indicated, is there a way in R to use that field to come
> up with summary stats and bargraphs of the distribution of any one of
> the other fields?

    dfrm <- expand.grid(A=1:3, B=1:3)
    dfrm$counts <- 1:9
    xtabs(counts~A, data=dfrm)
    # A
    #  1  2  3
    # 12 15 18

    barplot(xtabs(counts~A, data=dfrm), xlab="Counts by A level")

--
David Winsemius, MD
West Hartford, CT
David, thanks, I think that should work perfectly.

Much obliged
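The xtabs() answer covers tabulation and bar plots. For the "summary stats" part of the question, the same count field can serve as a weight. The following is a minimal sketch, not something posted in the thread: it reuses David's toy `dfrm` and adds a hypothetical numeric field `age` so there is something to summarize.

```r
# Toy data in the same shape as David's example, with a hypothetical
# numeric field `age` added for illustration.
dfrm <- expand.grid(A = 1:3, B = 1:3)
dfrm$counts <- 1:9
dfrm$age <- c(20, 25, 30, 35, 40, 45, 50, 55, 60)

# Weighted mean: each row contributes in proportion to its count.
w_mean <- weighted.mean(dfrm$age, w = dfrm$counts)

# Weighted variance with denominator N - 1, where N is the total count,
# which is what var() would return on the fully expanded data.
n_total <- sum(dfrm$counts)
w_var <- sum(dfrm$counts * (dfrm$age - w_mean)^2) / (n_total - 1)

# Weighted median: sort by value, accumulate the counts, and take the
# first value whose cumulative count reaches half of the total.
# When N is even this returns the lower of the two middle values.
ord <- order(dfrm$age)
cum <- cumsum(dfrm$counts[ord])
w_median <- dfrm$age[ord][which(cum >= n_total / 2)[1]]

# Sanity check against brute-force expansion, feasible here only
# because the toy data are tiny.
expanded <- rep(dfrm$age, times = dfrm$counts)
all.equal(w_mean, mean(expanded))
all.equal(w_var, var(expanded))
```

On the real data the expansion step would be skipped; it is included here only to show that the weighted formulas agree with the statistics of the expanded rows.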