R help - Sep 2004 - t test problem?


kan Liu

2004-Sep-22 07:52 UTC

[R] t test problem?

Hello,
 
I have two sets of data:
x=(124738, 128233, 85901, 33806, ...)
y=(25292, 21877, 45498, 63973, ....)
When I ran a t test on them, I got a two-tailed p-value of 0.117, so the
difference is not significant.
 
When I converted x and y to log scale and redid the t test, I got a two-tailed
p-value of 0.042, so the difference is significant.
 
Now I am confused about which result is correct. Any help would be much appreciated.
 
Thanks,
Liu


Dimitris Rizopoulos

2004-Sep-22 08:00 UTC

[R] t test problem?

Hi Liu,

before applying a t-test (or any test) you should first check if the 
assumptions of the test are supported by your data, i.e., in a t-test 
x and y must be normally distributed.

I hope it helps.

Best,
Dimitris

----
Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/16/396887
Fax: +32/16/337015
Web: med.kuleuven.ac.be/biostat
     student.kuleuven.ac.be/~m0390867/dimitris.htm



Vito Ricci

2004-Sep-22 08:03 UTC

[R] t test problem?

Hi,

Maybe your data follow a log-normal distribution, so that
their logs are normally distributed.
But remember that in that case the significance of the t test
applies only to the log-transformed data and not to the
original data. Check the basic assumptions of the t test;
alternatively, use non-parametric methods and compare
the results.
Best
Vito
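
The log-normal case can be sketched in R on simulated data. Everything below is an assumption made for illustration (the seed, the sample sizes, and the meanlog/sdlog values); it is not Liu's data. The sketch runs the Welch t test on both scales and also shows that the choice of logarithm base does not affect the p-value.

```r
# Simulated log-normal samples; all parameters are illustrative assumptions.
set.seed(1)
x <- rlnorm(100, meanlog = 11.3, sdlog = 0.9)
y <- rlnorm(100, meanlog = 11.0, sdlog = 0.9)

# Welch t test on the original scale: compares arithmetic means.
p_raw <- t.test(x, y)$p.value

# Welch t test on the log scale: compares the means of the logs
# (equivalently, geometric means on the original scale).
p_log   <- t.test(log(x), log(y))$p.value
p_log10 <- t.test(log10(x), log10(y))$p.value

# Natural log and log10 differ only by a constant factor,
# so the two log-scale p-values agree.
print(c(raw = p_raw, log = p_log, log10 = p_log10))
```

The two scales answer slightly different questions (difference in arithmetic means versus difference in mean log), so the p-values need not agree.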



Vito Ricci

2004-Sep-22 09:44 UTC

[R] t test problem?

Hi Liu,

I'd suggest using non-parametric tests (see 
cas.lancs.ac.uk/glossary_v1.1/nonparam.html)
such as:
wilcox.test() in the stats package
pairwise.wilcox.test() in the stats package

Then see whether the result you get (significant or not
significant) agrees with the t test result.

Parametric test    Analogous non-parametric test

Student t-test     Wilcoxon rank sum test
Paired t-test      Wilcoxon signed rank test or the sign test

To test normality you can use:

shapiro.test() in the stats package

If you decide to use the log scale, you must use it for
both samples.
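
A minimal sketch of these checks, run on simulated log-normal data rather than on the data from this thread (the seed and distribution parameters are assumptions). It applies shapiro.test() on both scales and shows that the unpaired Wilcoxon rank sum test gives the same result on the raw and log scales, because taking logs does not change the ranks.

```r
# Simulated positive, right-skewed data; parameters are illustrative assumptions.
set.seed(2)
x <- rlnorm(100, meanlog = 11.3, sdlog = 0.9)
y <- rlnorm(100, meanlog = 11.0, sdlog = 0.9)

# Shapiro-Wilk normality test on the raw and on the log scale.
sw_raw <- shapiro.test(x)$p.value
sw_log <- shapiro.test(log(x))$p.value

# Wilcoxon rank sum test (unpaired) on both scales.
w_raw <- wilcox.test(x, y)
w_log <- wilcox.test(log(x), log(y))

print(c(shapiro_raw = sw_raw, shapiro_log = sw_log))
print(c(W_raw = unname(w_raw$statistic), W_log = unname(w_log$statistic)))
```

The paired signed rank test works on the within-pair differences, so it is not invariant to the log transformation in the same way.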



bye
Vito



Wayne Jones

2004-Sep-22 10:27 UTC

[R] t test problem?

Hi Kan Liu, 

I've had a quick look at the data. The logged data seem reasonably nicely
distributed (roughly symmetrical + equal variance). Indeed the y variable
passed the (very strict) shapiro.test for normality. 

However, the main problem is that I do not get the same results as you for
the significance of the t.test. 

The only significant test I see is the paired t.test on the logged data. Are
your data paired? To see what I mean check out:
texasoft.com/winkpair.html

The non-parametric tests show no significance, whether or not the data are
treated as paired, on both the logged and natural scales, although in general
they do tend to be less powerful than parametric tests when the parametric
assumptions hold. 

Unless the data are paired, the means of these samples do not differ
significantly from one another.

Here are my workings: 

Temp.Dat<-read.table("data_natural.txt",header=T)

hist(log(Temp.Dat$x,10))
hist(log(Temp.Dat$y,10))

shapiro.test(log(Temp.Dat$x,10))
shapiro.test(log(Temp.Dat$y,10))

t.test(log(Temp.Dat$x,10), log(Temp.Dat$y,10))

       Welch Two Sample t-test

data:  log(Temp.Dat$x, 10) and log(Temp.Dat$y, 10) 
t = 0.9126, df = 195.806, p-value = 0.3626
alternative hypothesis: true difference in means is not equal to 0 
95 percent confidence interval:
 -0.0599837  0.1633168 
sample estimates:
mean of x mean of y 
 4.891313  4.839647

> t.test(log(Temp.Dat$x,10), log(Temp.Dat$y,10),paired=T)
        Paired t-test

data:  log(Temp.Dat$x, 10) and log(Temp.Dat$y, 10) 
t = 2.3535, df = 98, p-value = 0.02060
alternative hypothesis: true difference in means is not equal to 0 
95 percent confidence interval:
 0.008101002 0.095232132 
sample estimates:
mean of the differences 
             0.05166657 
> wilcox.test(log(Temp.Dat$x,10), log(Temp.Dat$y,10),paired=T)
        Wilcoxon signed rank test with continuity correction

data:  log(Temp.Dat$x, 10) and log(Temp.Dat$y, 10) 
V = 2972.5, p-value = 0.0828
alternative hypothesis: true mu is not equal to 0 
> wilcox.test(log(Temp.Dat$x,10), log(Temp.Dat$y,10),paired=F)
        Wilcoxon rank sum test with continuity correction

data:  log(Temp.Dat$x, 10) and log(Temp.Dat$y, 10) 
W = 5206, p-value = 0.4491
alternative hypothesis: true mu is not equal to 0

> wilcox.test(Temp.Dat$x, Temp.Dat$y,paired=F)
        Wilcoxon rank sum test with continuity correction

data:  Temp.Dat$x and Temp.Dat$y 
W = 5206, p-value = 0.4491
alternative hypothesis: true mu is not equal to 0 
> wilcox.test(Temp.Dat$x, Temp.Dat$y,paired=T)
        Wilcoxon signed rank test with continuity correction

data:  Temp.Dat$x and Temp.Dat$y 
V = 2896.5, p-value = 0.1417
alternative hypothesis: true mu is not equal to 0 
> t.test(Temp.Dat$x, Temp.Dat$y,paired=T)
        Paired t-test

data:  Temp.Dat$x and Temp.Dat$y 
t = 1.6731, df = 98, p-value = 0.0975
alternative hypothesis: true difference in means is not equal to 0 
95 percent confidence interval:
 -2351.81 27623.53 
sample estimates:
mean of the differences 
               12635.86 
> t.test(Temp.Dat$x, Temp.Dat$y,paired=F)
        Welch Two Sample t-test

data:  Temp.Dat$x and Temp.Dat$y 
t = 0.6432, df = 191.177, p-value = 0.5209
alternative hypothesis: true difference in means is not equal to 0 
95 percent confidence interval:
 -26116.18  51387.89 
sample estimates:
mean of x mean of y 
 120544.9  107909.0 
>
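
Why the paired and unpaired tests can disagree so sharply is easiest to see on simulated data. The sketch below is not the data from this thread: the seed, the shared per-pair level, the 0.05 shift, and the noise sizes are all assumptions chosen for illustration. When each pair shares a common level, the paired test removes that between-pair variation, which the Welch test has to treat as noise.

```r
# Simulated paired measurements on a log10-like scale (illustrative assumptions).
set.seed(3)
n <- 99
pair_level <- rnorm(n, mean = 4.85, sd = 0.4)   # level shared within each pair
x <- pair_level + 0.05 + rnorm(n, sd = 0.1)     # small systematic shift in x
y <- pair_level + rnorm(n, sd = 0.1)

welch  <- t.test(x, y)                  # ignores the pairing
paired <- t.test(x, y, paired = TRUE)   # uses within-pair differences

# The paired test is the same as a one-sample t test on the differences.
one_sample <- t.test(x - y)

print(c(welch = welch$p.value, paired = paired$p.value))
```

If the observations are not genuinely paired, the paired result has no meaning, however small its p-value.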
-----Original Message-----
From: kan Liu [mailto:kan_liu1@yahoo.com] 
Sent: 22 September 2004 10:22
To: Andrew Robinson; Dimitris Rizopoulos
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] t test problem?

Hi, many thanks for your helpful comments and suggestions. Attached are
the data in both log10 scale and original scale. I would be very grateful
if you could suggest which version of the test should be used. 

By the way, how can I check in R whether the variation is additive (natural
scale) or multiplicative (log scale)? How can I check whether the
distribution of the data is normal? 

PS: Can I confirm what your suggestions mean? To check whether x and y
differ in their means, do I need to check the distribution of x and of y on
both the natural and log scales, see which scale gives a normal distribution,
and then perform the t test on that scale? And if both scales give normal
distributions, should the t tests on both scales give similar results?

Thanks again.

Liu

Andrew Robinson <andrewr@uidaho.edu> wrote:
Hi Dimitris,

you are describing a more stringent requirement than the t-test
actually requires. It's the sampling distribution of the mean that
should be normal, and this condition is addressed by the Central
Limit Theorem.

Whether or not the CLT can be invoked depends on numerous factors,
including the distribution of the sample, and the size of the sample,
neither of which we have any information about. 
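
A small simulation can illustrate the point about the sampling distribution of the mean. It is only a sketch under assumed conditions: log-normal data with the parameters shown, samples of size 100, and a helper `sample_skewness()` that is defined here because base R has no skewness function.

```r
# Compare the skewness of raw skewed data with the skewness of sample means.
set.seed(4)
n <- 100          # assumed sample size
n_rep <- 5000     # number of simulated samples

# Helper defined for this sketch: moment-based sample skewness.
sample_skewness <- function(v) mean((v - mean(v))^3) / sd(v)^3

raw_draws    <- rlnorm(n_rep, meanlog = 11, sdlog = 0.9)
sample_means <- replicate(n_rep, mean(rlnorm(n, meanlog = 11, sdlog = 0.9)))

skew_raw   <- sample_skewness(raw_draws)
skew_means <- sample_skewness(sample_means)
print(c(raw = skew_raw, means = skew_means))
```

The means are much less skewed than the raw values, but with strongly skewed data some skewness can remain even at this sample size, which is why the sample size and the shape of the data both matter.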

Liu, the problem you describe is associated with the application of
the test rather than the test itself. The difference between log- and
natural- scaled data can often profitably be thought about by asking
whether you would naturally assume that the variation is additive
(natural scale) or multiplicative (log scale). Given the information
that you've presented there's no way we can tell which version of the
test is more reliable. 

I hope that this helps.

Andrew

-- 
Andrew Robinson Ph: 208 885 7115
Department of Forest Resources Fa: 208 885 6226
University of Idaho E : andrewr@uidaho.edu
PO Box 441133 W : uidaho.edu/~andrewr
Moscow ID 83843 Or: biometrics.uidaho.edu
No statement above necessarily represents my employer's opinion.


Ramon Diaz-Uriarte

2004-Sep-23 08:27 UTC

[R] t test problem?

On Wednesday 22 September 2004 13:07, Ted Harding wrote:
> On 22-Sep-04 kan Liu wrote:
> > Hi, Many thanks for your helpful comments and suggestions. The attached
> > are the data in both log10 scale and original scale. It would be very
> > grateful if you could suggest which version of test should be used.
> >
> > By the way, how to check whether the variation is additive (natural
> > scale) or multiplicative (log scale) in R? How to check whether the
> > distribution of the data is normal?
>
> As for additive vs multiplicative, this can only be judged in terms
> of the process by which the values are created in the real world.

Just my 2 cents: I often find it helpful to ask myself (or the "client")
whether, if there was a difference ("something") between the two samples,
I/she/he thinks the appropriate model is (please, read the "=" as "approx.
equal")

sample.1 = sample.2 + something [1]

OR

sample.1 = sample.2 * something [2]

(i.e., the ratio of means is a constant: sample.1/sample.2 = something)

which, by log transforming becomes

log(sample.1) = log(sample.2) + log(something)

I am not including here the issue of error distribution, but often times when 
the model for the means is like [2] the error terms are multiplicative (i.e., 
additive in the log scale). At least in many biological and engineering 
problems it is often evident whether [1] or [2] should be appropriate for the 
data, given what we know about the subject.
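
Under model [2], a t test on the logs estimates log(something), so exponentiating the estimate and its confidence interval gives a ratio on the original scale (strictly, a ratio of geometric means). A minimal sketch on simulated data, where the multiplier 1.3, the seed, and the distribution parameters are assumptions for illustration:

```r
# Simulate model [2]: sample.1 = sample.2 * something, with something = 1.3.
set.seed(5)
sample.2 <- rlnorm(100, meanlog = 11, sdlog = 0.9)
sample.1 <- 1.3 * rlnorm(100, meanlog = 11, sdlog = 0.9)

# Welch t test on the log scale.
fit <- t.test(log(sample.1), log(sample.2))

# Back-transform the difference of mean logs to a ratio of geometric means.
ratio    <- exp(unname(fit$estimate[1] - fit$estimate[2]))
ratio_ci <- exp(as.numeric(fit$conf.int))

print(c(ratio = ratio, lower = ratio_ci[1], upper = ratio_ci[2]))
```

Reporting the ratio with its interval is often easier to interpret than a difference of logs.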

Best,

R.
> As for normality vs non-normality, an appraisal can often be made
> simply by looking at a histogram of the data.
>
> In your case, the commands
>   hist(x,breaks=10000*(0:100))
>   hist(y,breaks=10000*(0:100))
> indicate that the distributions of x and y do not look at all
> "normal", since they both have considerable positive skewness
> (i.e. long upper tails relative to the main mass of the distribution).
>
> This does strongly suggest that a logarithmic transformation would
> give data which are more nearly normally distributed, as indeed
> is confirmed by the commands
>   hist(log(x))
>   hist(log(y))
> though in both cases the histograms show some irregularity compared
> with what you would expect from a sample from a normal distribution:
> the commands
>   hist(log(x),breaks=0.2*(40:80))
>   hist(log(y),breaks=0.2*(40:80))
> show that log(x) has an excessive peak at around 11.7,
> while log(y) has holes at around 11.1 and 12.1.
>
> Nevertheless, this inspection of the data shows that the use of
> log(x) and log(y) will come much closer to fulfilling the conditions
> of validity of the t test than using the raw data x and y.
>
> However, it is not merely the *normality* of each which is needed:
> the conditions for the usual t test also require that the two
> populations sampled for log(x) and log(y) should have the same
> standard deviations. In your case, this also turns out to be
> nearly enough true:
>
>   > sd(log(x))
>   [1] 0.902579
>   > sd(log(y))
>   [1] 0.9314807
>
> > PS, Can I confirm that do your suggestions mean that in order to check
> > whether there is a difference between x and y in terms of mean I need
> > check the distribution of x and that of y in both natural and log scales
> > and to see which present normal distribution?
>
> See above for an approach to this: the answer to your question is,
> in effect, "yes". It could of course have happened that neither the
> raw nor the log scale would be satisfactory, in which case you would
> need to consider other possibilities. And, if the SDs had turned out
> to be very different, you should not use the standard t test but
> a variant which is adapted to the situation (e.g. the Welch test).
>
> You can, of course, also perform formal tests for skewness, for
> normality, and for equality of variances.
>
> Best wishes,
> Ted.
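
The formal checks mentioned above can be sketched with functions from the stats package. The data here are simulated (the seed and parameters are assumptions, with log-scale SDs chosen near the 0.90 and 0.93 reported above), so the numbers will not match the thread's data. shapiro.test() checks normality, var.test() checks equality of variances, and t.test() gives either the pooled-variance test or the Welch variant.

```r
# Simulated log-normal samples; parameters are illustrative assumptions.
set.seed(6)
x <- rlnorm(100, meanlog = 11.3, sdlog = 0.90)
y <- rlnorm(100, meanlog = 11.0, sdlog = 0.93)
lx <- log(x)
ly <- log(y)

# Normality of each logged sample (Shapiro-Wilk).
norm_p <- c(x = shapiro.test(lx)$p.value, y = shapiro.test(ly)$p.value)

# Equality of variances (F test; itself sensitive to non-normality).
var_p <- var.test(lx, ly)$p.value

# Pooled-variance ("standard") t test versus the Welch variant.
student <- t.test(lx, ly, var.equal = TRUE)
welch   <- t.test(lx, ly)

print(norm_p)
print(c(var_test = var_p, student = student$p.value, welch = welch$p.value))
```

With equal sample sizes the two t statistics coincide and only the degrees of freedom differ, so the choice between them matters most when the group sizes and the SDs are both unequal.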
>
>
> --------------------------------------------------------------------
> E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
> Fax-to-email: +44 (0)870 094 0861   [NB: New number!]
> Date: 22-Sep-04                                       Time: 12:07:07
> ------------------------------ XFMail ------------------------------
>
-- 
Ramón Díaz-Uriarte
Bioinformatics Unit
Centro Nacional de Investigaciones Oncológicas (CNIO)
(Spanish National Cancer Center)
Melchor Fernández Almagro, 3
28029 Madrid (Spain)
Fax: +-34-91-224-6972
Phone: +-34-91-224-6900

ligarto.org/rdiaz
PGP KeyID: 0xE89B3462
(ligarto.org/rdiaz/0xE89B3462.asc)
