reichmanj at sbcglobal.net
2019-Mar-21 22:31 UTC
[R] counting unique values (summary stats)
r-help

I have the following little script to create a df of summary stats. I'm having problems obtaining the # of unique values:

    unique=sapply(myData, function (x)
        length(unique(x), replace = TRUE))

Can I do that, or am I using the wrong R function?

    summary.stats <- data.frame(mean=sapply(myData, mean, na.rm=TRUE),
        sd=sapply(myData, sd, na.rm=TRUE),
        min=sapply(myData, min, na.rm=TRUE),
        max=sapply(myData, max, na.rm=TRUE),
        median=sapply(myData, median, na.rm=TRUE),
        length=sapply(myData, length),
        unique=sapply(myData, function (x)
            length(unique(x), replace = TRUE))
        miss.val=sapply(myData, function(y)
            sum(length(which(is.na(y))))))

Jeff Reichman
On 3/21/19 3:31 PM, reichmanj at sbcglobal.net wrote:
> r-help
>
> I have the following little script to create a df of summary stats. I'm
> having problems obtaining the # of unique values
>
> unique=sapply(myData, function (x)
>     length(unique(x), replace = TRUE))

I just looked up the usage on `length` and do not see any possibility of using a "replace" parameter. It's also unclear what sort of data object `myData` might be.

(And you might consider using column names other than the names of R functions.)

-- 
David.

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
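A minimal sketch of the point about `length`, assuming a small made-up vector `x` (it is not from the original post). `length()` takes a single argument, so adding `replace = TRUE` should raise an error, while `length(unique(x))` on its own gives the number of distinct values:

```r
# Hypothetical example vector; not part of the original post.
x <- c(3, 1, 3, 2, 1)

# Number of distinct values: unique() drops duplicates,
# length() counts what is left.
n_unique <- length(unique(x))
n_unique

# length() accepts exactly one argument, so the extra replace = TRUE
# is an error. tryCatch() captures the condition instead of stopping.
res <- tryCatch(length(unique(x), replace = TRUE),
                error = function(e) e)
inherits(res, "error")
```

Wrapping the failing call in `tryCatch()` is only there so the snippet runs to the end; at the console the call simply stops with an error message.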
You have several problems. As David W pointed out, there is no replace= argument in the length() function (unique() does not have one either). The first step in debugging your code should be to read the manual page for any function returning an error. Also, you did not include a comma at the end of the line containing replace=TRUE. Finally, the code for counting the missing values is more complicated than it needs to be: sum(is.na(y)) is enough.

This code will only work if myData is a data frame that contains only columns with numeric data.

    options(digits=4)
    myData <- USArrests
    summary.stats <- data.frame(mean=sapply(myData, mean, na.rm=TRUE),
        sd=sapply(myData, sd, na.rm=TRUE),
        min=sapply(myData, min, na.rm=TRUE),
        max=sapply(myData, max, na.rm=TRUE),
        median=sapply(myData, median, na.rm=TRUE),
        length=sapply(myData, length),
        unique=sapply(myData, function (x) length(unique(x))),
        miss.val=sapply(myData, function(y) sum(is.na(y))))
    summary.stats
    #             mean     sd  min   max median length unique miss.val
    # Murder     7.788  4.356  0.8  17.4   7.25     50     43        0
    # Assault  170.760 83.338 45.0 337.0 159.00     50     45        0
    # UrbanPop  65.540 14.475 32.0  91.0  66.00     50     36        0
    # Rape      21.232  9.366  7.3  46.0  20.10     50     48        0

----------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77843-4352

-----Original Message-----
From: R-help <r-help-bounces at r-project.org> On Behalf Of David Winsemius
Sent: Thursday, March 21, 2019 5:55 PM
To: reichmanj at sbcglobal.net; 'r-help mailing list' <r-help at r-project.org>
Subject: Re: [R] counting unique values (summary stats)

> I just looked up the usage on `length` and do not see any possibility of
> using a "replace" parameter. It's also unclear what sort of data object
> `myData` might be.
>
> (And you might consider using column names other than the names of R
> functions.)
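A possible variation on the code above, offered only as a sketch: it keeps just the numeric columns so that a data frame with factor or character columns does not break `mean()` and `sd()`, and it uses column names that do not shadow R functions, as David W suggested. The helper name `summarize_numeric()` and the column names (`avg`, `n_unique`, and so on) are illustrative assumptions, not anything from the thread. Note that `unique()` treats NA as a value, so the sketch drops NAs before counting; whether that is what you want depends on your data.

```r
# Sketch: summary table restricted to numeric columns.
# summarize_numeric() and its column names are hypothetical.
summarize_numeric <- function(df) {
  # Keep only the numeric columns (still a data frame).
  num <- df[sapply(df, is.numeric)]
  data.frame(
    avg       = sapply(num, mean,   na.rm = TRUE),
    std_dev   = sapply(num, sd,     na.rm = TRUE),
    minimum   = sapply(num, min,    na.rm = TRUE),
    maximum   = sapply(num, max,    na.rm = TRUE),
    mid       = sapply(num, median, na.rm = TRUE),
    n         = sapply(num, length),
    # unique() counts NA as a distinct value, so remove NAs first.
    n_unique  = sapply(num, function(x) length(unique(x[!is.na(x)]))),
    n_missing = sapply(num, function(x) sum(is.na(x)))
  )
}

# iris ships with R: four numeric columns plus the factor Species.
stats <- summarize_numeric(iris)
stats

# Small made-up data frame with an NA and a non-numeric column.
toy <- summarize_numeric(data.frame(a = c(1, 1, NA), b = c("x", "y", "z")))
toy
```

Here `Species` is silently skipped rather than causing warnings or NA results, and the NA in the toy column `a` is reported under `n_missing` instead of being counted as a distinct value.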