Hi everyone,

I'm new to R (although I have been a Stata user for some time and am somewhat proficient in it), and I'm trying to use the 'diverse' R package to compute a few diversity measures on a sample of firms over a period of about 10 years. I was wondering if you could give me some hints on how best to proceed with the 'diverse' package.

My sample is set up as follows. It is an unbalanced panel: the number of firms varies from year to year, and each observation is identified by the companyid and year variables. In addition, I have a variable identifying the worker, workerid. I then have a set of variables which I want to use as the basis for calculating some of the measures in the 'diverse' package.

An example of the sample follows, using the gender variable (0 for male and 1 for female) as the variable of interest:

companyid  year  workerid    gender
85390      1999  46446384    0
85390      1999  126800000   1
85390      1999  163300000   0
85390      1999  60225451    0
85390      1999  60195422    0
85390      2000  60225451    0
85390      2000  3571000000  1
85390      2000  163300000   0
85390      2000  163300000   0
85390      2000  126800000   0
85390      2001  60195422    0
85390      2001  60225451    1
85390      2001  46446384    0
85390      2001  60195422    0
85390      2001  60225451    0
4391076    2005  13753759    0
4391076    2005  49988911    0
4391076    2005  112400000   0
4391076    2005  185500000   0
4391076    2005  35649643    0
4391076    2005  65809705    0
4391076    2005  114200000   0
4391076    2005  192100000   0
4391076    2005  64258701    0
4391076    2005  1212000000  1

Using the 'diverse' package, I need to calculate, for each firm and each year, for instance the diversity(gender) measure. In Stata this would be obtained just by issuing a "by firm year" command, but I have no idea how to tackle this issue in R.

Any ideas?

Best wishes,
Li

You really need to spend some time learning the basics of R. There are thousands of R packages, so you also need to spend time reading the documentation for the package so that you can show us what the data format should be like. Here are some simple ways to transform the data. You should also use dput() to include your data in your email, not just a listing which can remove important information about the structure of the original data:

> Example <- structure(list(
    companyid = c(85390L, 85390L, 85390L, 85390L, 85390L, 85390L, 85390L,
      85390L, 85390L, 85390L, 85390L, 85390L, 85390L, 85390L, 85390L,
      4391076L, 4391076L, 4391076L, 4391076L, 4391076L, 4391076L, 4391076L,
      4391076L, 4391076L, 4391076L),
    year = c(1999L, 1999L, 1999L, 1999L, 1999L, 2000L, 2000L, 2000L, 2000L,
      2000L, 2001L, 2001L, 2001L, 2001L, 2001L, 2005L, 2005L, 2005L, 2005L,
      2005L, 2005L, 2005L, 2005L, 2005L, 2005L),
    workerid = c(46446384, 126800000, 163300000, 60225451, 60195422,
      60225451, 3.571e+09, 163300000, 163300000, 126800000, 60195422,
      60225451, 46446384, 60195422, 60225451, 13753759, 49988911, 112400000,
      185500000, 35649643, 65809705, 114200000, 192100000, 64258701,
      1.212e+09),
    gender = c(0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L,
      0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L)),
    .Names = c("companyid", "year", "workerid", "gender"),
    class = "data.frame", row.names = c(NA, -25L))

> aggregate(gender~companyid+year, Example, mean)
  companyid year gender
1     85390 1999    0.2
2     85390 2000    0.2
3     85390 2001    0.2
4   4391076 2005    0.1

> aggregate(gender~companyid+year, Example, table)
  companyid year gender.0 gender.1
1     85390 1999        4        1
2     85390 2000        4        1
3     85390 2001        4        1
4   4391076 2005        9        1

> x <- xtabs(~gender+companyid+year, Example)
> ftable(x, row.vars=2:3, col.vars=1)
               gender 0 1
companyid year
85390     1999        4 1
          2000        4 1
          2001        4 1
          2005        0 0
4391076   1999        0 0
          2000        0 0
          2001        0 0
          2005        9 1

You should read these manual pages:

?dput
?aggregate
?xtabs
?ftable

----------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77843-4352

-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Li Jiang
Sent: Thursday, October 19, 2017 4:08 AM
To: r-help at r-project.org
Subject: [R] looping using 'diverse' package measures
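
The aggregate() calls in the reply above are not limited to mean and table: any function that takes the values of one firm-year and returns a single number can be passed as FUN, which is roughly what Stata's "by companyid year:" prefix does. The sketch below illustrates this with a hand-written Blau (Gini-Simpson) index, 1 - sum(p_i^2), where p_i is the share of category i. The helper name blau is made up for this example, and the code deliberately does not call the 'diverse' package, whose expected input format should be checked in its own documentation.

```r
# Example data from the thread (workerid is left out because it is not used).
Example <- data.frame(
  companyid = rep(c(85390L, 4391076L), times = c(15, 10)),
  year      = rep(c(1999L, 2000L, 2001L, 2005L), times = c(5, 5, 5, 10)),
  gender    = c(0, 1, 0, 0, 0,
                0, 1, 0, 0, 0,
                0, 1, 0, 0, 0,
                0, 0, 0, 0, 0, 0, 0, 0, 0, 1)
)

# Hypothetical helper (not part of any package): Blau index of a vector.
# p holds the share of each observed category; the index is 1 - sum(p^2).
blau <- function(x) {
  p <- table(x) / length(x)
  1 - sum(p^2)
}

# Apply the helper to gender within every companyid/year combination.
res <- aggregate(gender ~ companyid + year, data = Example, FUN = blau)
names(res)[3] <- "blau_gender"
print(res)
```

If a measure needs more than one column at a time, split(Example, list(Example$companyid, Example$year), drop = TRUE) followed by lapply() passes each firm-year subset, as a data frame, to whatever function is supplied.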