thr3ads.net - R help - [R] Finding non-normal distributions per row of data frame? [Feb 2011]


DB1984

2011-Feb-04 03:52 UTC

[R] Finding non-normal distributions per row of data frame?

This is my first attempt at this, so hopefully a few kind pointers can get me
going in the right direction...

I have a large data frame of 20+ columns and 20,000 rows. I'd like to
evaluate the distribution of values in each row, to determine whether they
meet the criteria of a normal distribution. I'd loop this over all the rows
in the data frame, and output the summary results to a new data frame.

I have a loop that should run a Shapiro-Wilk test over each row:

# y is the data frame

for (j in 1:nr) {
  y.temp <- list(y[j, ])
  testsw <- lapply(y.temp, shapiro.test)
  testtable <- t(sapply(testsw, function(x) c(x$statistic, x$p.value)))
  colnames(testtable) <- c("W", "p.value")
}


but it is currently throwing out an error:
 "Error in `rownames<-`(`*tmp*`, value = "1") : 
  attempt to set rownames on object with no dimensions" 

...which I guess is unrelated to the evaluation of normality, and more
likely a faulty loop?

Any suggestions, either for this test or for a better way to evaluate the normal
distribution (e.g. qq-plot residuals for each row), would be gratefully
received. Thanks.
-- 
View this message in context:
http://r.789695.n4.nabble.com/Finding-non-normal-distributions-per-row-of-data-frame-tp3259439p3259439.html
Sent from the R help mailing list archive at Nabble.com.

Patrizio Frederic

2011-Feb-04 11:40 UTC


Hi Danny,
it is much easier than that.

Try

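y <- data.frame(matrix(rnorm(100), 10))
nr <- ncol(y)

# note: lapply() on a data frame works column by column;
# use lapply(as.data.frame(t(y)), shapiro.test) to test rows instead
test <- lapply(y, shapiro.test)
sapply(test, function(x) c(x$statistic, x$p.value))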

it should perform the required task.
Cheers,

P

On Fri, Feb 4, 2011 at 4:52 AM, DB1984 <dannybolg at gmail.com> wrote:
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

Greg Snow

2011-Feb-04 16:54 UTC


I get a different set of errors than you do (what version of R are you using?).

Patrizio showed one way to do what you want. But what is it that you are
really trying to accomplish? What do you think the results of 20,000 normality
tests (each of which may not be answering the real question on its own) will
tell you?
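To make the concern about thousands of tests concrete, here is a small simulation sketch (not from the thread). Every row is drawn from a normal distribution, yet roughly 5% of rows are still flagged as non-normal at the 0.05 level, which is what an approximately uniform null p-value distribution implies. The row count is reduced to 2,000 for speed, and the 0.05 cut-off and the Benjamini-Hochberg adjustment are illustrative assumptions.

```r
set.seed(1)      # reproducible simulation
n_rows <- 2000   # reduced from 20,000 to keep it quick
n_cols <- 20

# Every row really is normal, so every "rejection" is a false positive
sim <- matrix(rnorm(n_rows * n_cols), nrow = n_rows)
p <- apply(sim, 1, function(v) shapiro.test(v)$p.value)

mean(p < 0.05)                          # share of rows flagged; expect roughly 0.05
sum(p.adjust(p, method = "BH") < 0.05)  # rows still flagged after an FDR adjustment
```

An adjustment such as p.adjust() controls the error rate across rows, but it still does not say whether non-normality matters for whatever analysis comes next.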

Your code seems to mix several concepts; it would benefit you to learn more
about each of them and when to use each one. If y is fully numeric, then it is
more efficient to use a matrix than a data frame.

In your loop you assign y.temp to be a list containing one row of y. This
results in a list with one element, which is a one-row data frame. Why make it
a list? Do you really want it to stay a data frame, or to be a vector?
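The difference between these objects can be checked directly. This sketch uses a small hypothetical data frame y (10 rows, 20 numeric columns) standing in for the poster's data.

```r
set.seed(42)
y <- data.frame(matrix(rnorm(200), nrow = 10))  # hypothetical 10 x 20 data

row_list <- list(y[1, ])   # what the loop builds: a list...
length(row_list)           # 1: a single element
class(row_list[[1]])       # "data.frame": that element is a data frame
nrow(row_list[[1]])        # 1: with one row

row_vec <- unlist(y[1, ])  # a plain (named) numeric vector instead
is.numeric(row_vec)        # TRUE
length(row_vec)            # 20: one value per column
```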

You then run lapply on a list with only one element. That works, but it is a
bit wasteful and accomplishes nothing beyond running the function on that
single element.

Next, the shapiro.test function is passed a data frame when it expects a
numeric vector (this gives an error on my install; you may have something
different going on).
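A minimal sketch of passing shapiro.test() what it expects: the one-row data frame is first coerced to a numeric vector with unlist(). The data frame y is again hypothetical example data.

```r
set.seed(42)
y <- data.frame(matrix(rnorm(200), nrow = 10))  # hypothetical 10 x 20 data

# Coerce the one-row data frame to a numeric vector before testing
res <- shapiro.test(unlist(y[1, ]))
res$statistic  # the W statistic, a number named "W"
res$p.value    # the corresponding p-value
```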

Then testtable is being overwritten each time through the loop, so you are
throwing away most of your work without ever doing anything with it.
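If a loop is kept, one way to avoid overwriting is to allocate the result table once, before the loop, and fill one row per iteration. This is a sketch on hypothetical data, not the poster's actual code; it ends with the per-row summary as a new data frame, as the original question asked.

```r
set.seed(42)
y <- data.frame(matrix(rnorm(200), nrow = 10))  # hypothetical 10 x 20 data
nr <- nrow(y)

# Allocate the result once, outside the loop, with one row per data row
testtable <- matrix(NA_real_, nrow = nr, ncol = 2,
                    dimnames = list(rownames(y), c("W", "p.value")))

for (j in seq_len(nr)) {               # seq_len() is safe even if nr is 0
  sw <- shapiro.test(unlist(y[j, ]))   # test row j as a numeric vector
  testtable[j, ] <- c(sw$statistic, sw$p.value)  # store, don't overwrite
}

testtable <- as.data.frame(testtable)  # summary results as a data frame
```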

Why both the loop and the apply functions?

Why not just try something like apply(y, 1, shapiro.test)?
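A sketch of the apply() route that also pulls out W and the p-value for each row, again on hypothetical data. Since all columns are numeric, the data frame is converted to a matrix up front; apply() would do that conversion internally anyway, and it hands each row to the function as a plain numeric vector.

```r
set.seed(42)
y <- data.frame(matrix(rnorm(200), nrow = 10))  # hypothetical 10 x 20 data
m <- as.matrix(y)  # fully numeric, so a matrix is the natural container

# apply(..., 1, ...) runs the function once per row; t() puts rows back as rows
res <- t(apply(m, 1, function(row) {
  sw <- shapiro.test(row)
  c(W = unname(sw$statistic), p.value = sw$p.value)  # unname avoids "W.W"
}))

res <- as.data.frame(res)  # one row per input row, columns W and p.value
head(res)
```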

And overall, what are you trying to accomplish? What this is likely to produce
is probably less useful than just generating random numbers.

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of DB1984
> Sent: Thursday, February 03, 2011 8:52 PM
> To: r-help at r-project.org
> Subject: [R] Finding non-normal distributions per row of data frame?
