Hi everyone,

I've been struggling with a small problem for a while, and I'm wondering if anyone could help.

I have a dataset (from the retail industry) that indicates which brands are present in a panel of 500 stores:

store , brand
1 , B1
1 , B2
1 , B3
2 , B1
2 , B3
3 , B2
3 , B3
3 , B4

I would like to know how many brands are present in each store. I tried:

result <- aggregate(MyData$brand , by=list(MyData$store) , nlevels)

but I got:

Group.1 x
1 , 4
2 , 4
3 , 4

which is not the result I expected. I would like to get something like:

Group.1 x
1 , 3
2 , 2
3 , 3

Looking around, I found I can delete the empty levels of a factor using:

problem.factor <- problem.factor[,drop=TRUE]

But this solution isn't handy for me, as I have many stores and would have to make a subset of my data for each store before dropping the empty levels.

Nor can I simply count the lines for each store (N), because the same brand can appear several times in each store (several products of the same brand, and/or several weeks of observation).

I used to do this calculation in SAS with:

proc freq data = MyData noprint ; by store ;
tables brand / out = result ;
run ;

(the nice thing was that I got a data set I could merge with MyData)

Any idea how to do this in R?

Thanks in advance,

Kind regards,

Sylvain Willart,
PhD Marketing,
IAE Lille, France
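A minimal sketch that rebuilds the sample data and shows where the 4s come from. How MyData was actually read in is not shown in the post, so the data.frame() construction below is an assumption. Subsetting a factor keeps its full level set, so nlevels() sees all four brands in every store; factor() and droplevels() (the latter added to base R after this thread was posted) both rebuild the levels from the values actually present.

```r
# Sample data from the question (assumed construction; the post does not
# show how MyData was created). stringsAsFactors = TRUE makes brand a factor.
MyData <- data.frame(
  store = c(1, 1, 1, 2, 2, 3, 3, 3),
  brand = c("B1", "B2", "B3", "B1", "B3", "B2", "B3", "B4"),
  stringsAsFactors = TRUE
)

# Each store's subset inherits all four levels, so nlevels() returns 4.
res <- aggregate(MyData$brand, by = list(MyData$store), nlevels)
res$x                        # 4 4 4

# Rebuilding the levels from the values present fixes the count.
store2 <- MyData$brand[MyData$store == 2]
nlevels(store2)              # 4: B1, B2, B3, B4 are all still levels
nlevels(factor(store2))      # 2: only B1 and B3 occur
nlevels(droplevels(store2))  # 2: same result with droplevels()
```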
On Nov 8, 2009, at 8:38 AM, sylvain willart wrote:

> I tried:
> result <- aggregate(MyData$brand , by=list(MyData$store) , nlevels)
> [...]
> I would like to get sthg like:
> Group.1 x
> 1 , 3
> 2 , 2
> 3 , 3

Try:

result <- aggregate(MyData$brand , by=list(MyData$store) , length)

Quick, easy, and it generalizes to other situations. The factor levels got carried along identically, but length counts the number of elements in each store's group, i.e. the number of rows, so a brand listed several times in the same store is counted several times.

> Looking around, I found I can delete empty levels of factor using:
> problem.factor <- problem.factor[,drop=TRUE]

If you reapply factor() to each group, you get the same result as with drop=TRUE, because factor() rebuilds the levels from the values actually present. So you could have done this:

result <- aggregate(MyData$brand , by=list(MyData$store) , function(x) nlevels(factor(x)))
result
  Group.1 x
1       1 3
2       2 2
3       3 3
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
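The length suggestion matches the distinct-brand count only when each brand appears once per store. The sketch below uses a hypothetical variant of the sample data in which store 1 lists B1 twice (the case the question warns about) and compares length with the nlevels(factor(x)) version and with length(unique(x)), an equivalent alternative not mentioned in the thread.

```r
# Hypothetical variant of the sample data: store 1 lists B1 twice
# (e.g. two products of the same brand).
MyData <- data.frame(
  store = c(1, 1, 1, 1, 2, 2, 3, 3, 3),
  brand = c("B1", "B1", "B2", "B3", "B1", "B3", "B2", "B3", "B4"),
  stringsAsFactors = TRUE
)

# length counts rows per store, so the duplicate B1 is counted twice.
by_length <- aggregate(MyData$brand, by = list(MyData$store), length)

# Re-applying factor() drops the levels that do not occur in each store.
by_levels <- aggregate(MyData$brand, by = list(MyData$store),
                       function(x) nlevels(factor(x)))

# Counting unique values gives the same distinct-brand count.
by_unique <- aggregate(MyData$brand, by = list(MyData$store),
                       function(x) length(unique(x)))

by_length$x   # 4 2 3
by_levels$x   # 3 2 3
by_unique$x   # 3 2 3
```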
With xx as your sample data, will this work? See ?addmargins

jj <- table(xx)
addmargins(jj, 2)
# or for both margins
addmargins(jj, c(1,2))

or

apply(jj, 1, sum)

--- On Sun, 11/8/09, sylvain willart <sylvain.willart at gmail.com> wrote:

> From: sylvain willart <sylvain.willart at gmail.com>
> Subject: [R] Counting non-empty levels of a factor
> To: r-help at r-project.org
> Received: Sunday, November 8, 2009, 8:38 AM
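The table() route can also give distinct counts. table(xx) counts rows for each store/brand pair, so apply(jj, 1, sum) (equivalently rowSums(jj)) still counts duplicates, while rowSums(jj > 0) counts the non-empty cells, i.e. the brands present. A minimal sketch, assuming xx is a two-column data frame with columns store and brand, using hypothetical data in which store 1 lists B1 twice. The names n_brands, counts, and merged are illustrative only; the merge step builds a store-level data frame and joins it back onto the rows, roughly what the SAS out= data set was used for.

```r
# Hypothetical data with a repeated store/brand pair (store 1 has B1 twice).
xx <- data.frame(
  store = c(1, 1, 1, 1, 2, 2, 3, 3, 3),
  brand = c("B1", "B1", "B2", "B3", "B1", "B3", "B2", "B3", "B4")
)

jj <- table(xx)              # store x brand table of row counts
rowSums(jj)                  # rows per store, duplicates included: 4 2 3
n_brands <- rowSums(jj > 0)  # non-empty cells per store: 3 2 3

# Store-level data frame that can be merged back onto the row-level data.
counts <- data.frame(store = as.numeric(names(n_brands)),
                     n_brands = as.vector(n_brands))
merged <- merge(xx, counts, by = "store")
```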