This is my first attempt at this, so hopefully a few kind pointers can get me going in the right direction...

I have a large data frame of 20+ columns and 20,000 rows. I'd like to evaluate the distribution of values in each row, to determine whether they meet the criteria of a normal distribution. I'd loop this over all the rows in the data frame, and output the summary results to a new data frame.

I have a loop that should run a Shapiro-Wilk test over each row,

  y= data frame

  for (j in 1:nr) {
    y.temp<-list(y[j,])
    testsw <- lapply(y.temp, shapiro.test)
    testtable <- t(sapply(testsw, function(x) c(x$statistic, x$p.value)))
    colnames(testtable) <- c("W", "p.value")
  }

but it is currently throwing out an error:

  "Error in `rownames<-`(`*tmp*`, value = "1") :
    attempt to set rownames on object with no dimensions"

...which I guess is unrelated to the evaluation of normality, and more likely a faulty loop?

Any suggestions either for this test, or a better way to evaluate the normal distribution (e.g. qq-plot residuals for each row) would be greatly received. Thanks.

--
View this message in context: http://r.789695.n4.nabble.com/Finding-non-normal-distributions-per-row-of-data-frame-tp3259439p3259439.html
Sent from the R help mailing list archive at Nabble.com.
Patrizio Frederic
2011-Feb-04 11:40 UTC
[R] Finding non-normal distributions per row of data frame?
Hi Danny,

it sounds much easier than that. Try

  y <- data.frame(matrix(rnorm(100),10))
  nr <- ncol(y)
  test <- lapply(y, shapiro.test)
  sapply(test,function(x)c(x$statistic, x$p.value))

it should perform the required task.

Cheers,
P
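A note for readers of this thread: lapply() over a data frame walks over its columns, so Patrizio's snippet tests each column rather than each row. A minimal sketch of a row-wise variant follows, assuming y is entirely numeric and every row has between 3 and 5000 non-missing values (the range shapiro.test() accepts). The names y_mat, row_tests and row_results are illustrative and do not come from the original posts.

```r
set.seed(1)                                   # illustrative data only
y <- data.frame(matrix(rnorm(100), 10))       # 10 rows, 10 columns

# Assumption: y is fully numeric, so a matrix gives plain numeric rows
y_mat <- as.matrix(y)

# Run one Shapiro-Wilk test per row
row_tests <- lapply(seq_len(nrow(y_mat)),
                    function(i) shapiro.test(y_mat[i, ]))

# Collect W and the p-value: one output row per row of y
row_results <- t(sapply(row_tests, function(x)
  c(W = unname(x$statistic), p.value = x$p.value)))

head(row_results)
```

Naming the elements inside c() means the result already carries the "W" and "p.value" column names, so no separate colnames() call is needed.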
Greg Snow
2011-Feb-04 16:54 UTC
[R] Finding non-normal distributions per row of data frame?
I get a different set of errors than you do (what version of R are you using?).

Patrizio showed one way to do what you want. But what is it that you are really trying to accomplish? What do you think the result of 20,000 normality tests (each of which may not be answering the real question on its own) will tell you?

Your code below seems to be mixing concepts; it would benefit you to learn more about each one and when to use it.

If y is fully numeric, then it is more efficient to use a matrix than a data frame.

In your loop you assign y.temp to be a list containing 1 row from y; this results in a list with 1 element, which is a 1-row data frame. Why make it a list? Do you really want it to stay a data frame, or should it be a vector?

You then run lapply on a list with only one element. That works, but it is a bit wasteful and does not accomplish anything more than running the function on the single element.

Then the shapiro.test function is passed a data frame when it is expecting a vector (this gives an error on my install; you may have something different going on).

Then testtable is overwritten each time through the loop, so you are throwing away most of your work without ever doing anything with it.

Why the loop and the apply's? Why not just try something like

  apply(y, 1, shapiro.test)

?

And overall, what are you trying to accomplish? Because what this is likely to accomplish is probably less useful than just generating random numbers.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111
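The points in Greg's reply can be put together in a short sketch. It is an illustration under stated assumptions rather than code from the thread: the data are simulated (2,000 rows of 20 truly normal values, a hypothetical stand-in for the real 20,000-row data), and the names y_df, sw and testtable are chosen here for illustration.

```r
set.seed(42)                                  # illustrative data only
y <- matrix(rnorm(2000 * 20), nrow = 2000)    # hypothetical: every row is truly normal

# Why the original loop misbehaves: one row of a data frame is still a data frame
y_df <- as.data.frame(y)
is.data.frame(y_df[1, ])                      # TRUE
is.numeric(unlist(y_df[1, ]))                 # TRUE: unlist() yields a plain vector

# apply() over rows passes each row to the function as a numeric vector
sw <- apply(y, 1, function(row) {
  res <- shapiro.test(row)
  c(W = unname(res$statistic), p.value = res$p.value)
})

# Build the result once, outside any loop, so nothing is overwritten
testtable <- as.data.frame(t(sw))             # one row per row of y
head(testtable)

# Share of rows flagged at the 5% level although all rows are normal
mean(testtable$p.value < 0.05)
```

Because every simulated row really is normal, the final proportion should come out somewhere near 0.05. That is one way to read Greg's warning: with thousands of tests, a fixed share of rows gets flagged by chance alone, so a list of "non-normal" rows is hard to interpret without some adjustment for multiple testing.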