thr3ads.net - R help - [R] doing 1000 permutations and doing test statistics distribution [Feb 2020]

If this information is useful, please help other people find it:
Share via:

Bert Gunter

2020-Feb-04 22:45 UTC

[R] doing 1000 permutations and doing test statistics distribution

Yes, a clear thinko... Thanks for the correction.

-- Bert

On Tue, Feb 4, 2020 at 1:41 PM Duncan Murdoch <murdoch.duncan at
gmail.com>
wrote:
> On 04/02/2020 4:28 p.m., Ana Marija wrote:
> > I tired your code on this simplified data just for say 10
permutations:
> >
> > dat <- read.table(text = "   code.1 code.2 code.3 code.4
> > 1     82     93     NA     NA
> > 2     15     85     93     NA
> > 3     93     89     NA     NA
> > 4     81     NA     NA     NA",
> >                    header = TRUE, stringsAsFactors = FALSE)
> >
> > dat2=data.matrix(dat)
> >
> >> result<- lapply(seq_len(10), FUN = function(dat2){
> > +     dat2 <- dat2[, sample.int(4)]
> > +     print(colnames(dat2))
> > + } )
> > Error in dat2[, sample.int(4)] : incorrect number of dimensions
>
> Yes, Bert did suggest that, but it's obviously wrong.  The argument to
> FUN is an element of seq_len(10), it's not the full dataset.  Try
>
> result<- lapply(seq_len(10), FUN = function(i){
>       dat <- dat2[, sample.int(4)]
>       print(colnames(dat))
>   } )
>
> Duncan Murdoch
>
> >
> >
> > On Tue, Feb 4, 2020 at 3:10 PM Bert Gunter <bgunter.4567 at
gmail.com>
> wrote:
> >>
> >> I am not going to do your programming for you. If the following
doesn't
> suffice, maybe someone else will provide you something that will.
> >>
> >> m = your matrix
> >>
> >> code = your code that uses m
> >>
> >> your list of results <- lapply(seq_len(1000), FUN =
function(m){
> >>    m <- m[, sample.int(132)]
> >>   code
> >> } )
> >>
> >> or use an explicit for() loop to populate a list or vector with
your
> results.
> >>
> >> Again, if I have misunderstood what you want to do, then clarify,
and
> perhaps someone else will help.
> >>
> >> -- Bert
> >>
> >>
> >>
> >> Bert Gunter
> >>
> >> "The trouble with having an open mind is that people keep
coming along
> and sticking things into it."
> >> -- Opus (aka Berkeley Breathed in his "Bloom County"
comic strip )
> >>
> >>
> >> On Tue, Feb 4, 2020 at 12:41 PM Ana Marija <sokovic.anamarija
at gmail.com>
> wrote:
> >>>
> >>> Hi Bert,
> >>>
> >>> thanks for getting back to me. I have to permute those 132
columns
> >>> 1000 times and perform the code given in the previous email.
> >>>
> >>> Can you please show me how you would do that in the loop? This
is also
> >>> a huge data set ...
> >>>
> >>> Thanks
> >>> Ana
> >>>
> >>> On Tue, Feb 4, 2020 at 2:34 PM Bert Gunter <bgunter.4567 at
gmail.com>
> wrote:
> >>>>
> >>>> If you just want to permute columns of a matrix,
> >>>>
> >>>> ?sample
> >>>>> sample.int(10)
> >>>>   [1]  9  2 10  8  4  6  3  1  5  7
> >>>>
> >>>> and you can just use this as an index into the columns of
your
> matrix, presumably within a loop of some sort.
> >>>>
> >>>> If I have misunderstood, just ignore.
> >>>>
> >>>> Cheers,
> >>>> Bert
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On Tue, Feb 4, 2020 at 12:23 PM Ana Marija <
> sokovic.anamarija at gmail.com> wrote:
> >>>>>
> >>>>> Hello,
> >>>>>
> >>>>> I have a matrix
> >>>>>> dim(dat)
> >>>>> [1] 15568   132
> >>>>>
> >>>>> It looks like this:
> >>>>>
> >>>>>                     NoD_14381_norm.1 NoD_14381_norm.2
> NoD_14381_norm.3
> >>>>> NoD_14520_30mM.1 NoD_14520_30mM.2 NoD_14520_30mM.3
> >>>>> Ku8QhfS0n_hIOABXuE             4.75             4.25  
4.79
> >>>>>             4.33             4.63             3.85
> >>>>> Bx496XsFXiAlj.Eaeo             6.15             6.23  
6.55
> >>>>>             6.26             6.24             5.99
> >>>>> W38p0ogk.wIBVRXllY             7.13             7.35  
7.55
> >>>>>             7.37             7.36             7.55
> >>>>> QIBkqIS9LR5DfTlTS8             6.27             6.73  
6.45
> >>>>>             5.39             4.75             4.96
> >>>>> BZKiEvS0eQ305U0v34             6.35             7.02  
6.76
> >>>>>             5.45             5.25             5.02
> >>>>> 6TheVd.HiE1UF3lX6g             5.53             5.02  
5.36
> >>>>>             5.61             5.66             5.37
> >>>>>
> >>>>> So it is a matrix with gene names ex.
Ku8QhfS0n_hIOABXuE, and
> subjects
> >>>>> named ex. NoD_14381_norm.1
> >>>>>
> >>>>>
> >>>>> How to do 1000 permutations of these 132 columns and
on each created
> >>>>> new permuted matrix perform this code:
> >>>>>
> >>>>> subject="all_replicate"
> >>>>>
>
targets<-readTargets(paste(PhenotypeDir,"hg_sg_",subject,"_target.txt",
> sep=''))
> >>>>> Treat <-
factor(targets$Treatment,levels=c("C","T"))
> >>>>> Replicates <- factor(targets$rep)
> >>>>> design <- model.matrix(~Replicates+Treat)
> >>>>> corfit <- duplicateCorrelation(dat, block =
targets$Subject)
> >>>>> corfit$consensus.correlation
> >>>>> fit
>
<-lmFit(dat,design,block=targets$Subject,correlation=corfit$consensus.correlation)
> >>>>> fit<-eBayes(fit)
> >>>>> qval.cutoff=0.1; FC.cutoff=0.17
> >>>>> y1=topTable(fit, coef="TreatT",
> n=nrow(genes),adjust.method="BH",genelist=genes)
> >>>>>
> >>>>> y1 for each iteration of permutation would  have
P.Value column and
> >>>>> these I would have plotted on the end to find the
distribution of all
> >>>>> p values generated in those 1000 permutations.
> >>>>>
> >>>>> Please advise,
> >>>>> Ana
> >>>>>
> >>>>> ______________________________________________
> >>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE
and more, see
> >>>>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>>>> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> >>>>> and provide commented, minimal, self-contained,
reproducible code.
> >
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
>
	[[alternative HTML version deleted]]

Ana Marija

2020-Feb-05 15:15 UTC

head link

[R] doing 1000 permutations and doing test statistics distribution

I tried to solve the task via following code:

manyorders <- replicate(100, sample(colnames(dat)), simplify=FALSE)

all_results <- lapply(manyorders, function(ord) {
  tmpdat <- `colnames<-`(dat, ord) # copies and renames in one line
  corfit <- duplicateCorrelation(dat, block = targets$Subject)
  corfit$consensus.correlation
  fit
<-lmFit(dat,design,block=targets$Subject,correlation=corfit$consensus.correlation)
  fit <- eBayes(fit)
  y1 <- topTable(fit, coef="TreatT", n=nrow(genes),
adjust.method="BH", genelist=genes)
  list(fit = fit, y1 = y1)
})

and I wrote all_results in a file
write.table(all_results, file="all_res", sep = " ",
row.names = FALSE,
col.names = TRUE,quote=FALSE)

when I tried to open the file:> a=read.table("all_res", header=T)Error in read.table("all_res", header = T) :
  more columns than column names

also with fread:> a=fread("all_res")Warning messages:
1: In fread("all_res") :
  Detected 35300 column names but the data has 35700 columns (i.e.
invalid file). Added 400 extra default column names at the end.
2: In fread("all_res") :
  Stopped early on line 5. Expected 35700 fields but found 35400.
Consider fill=TRUE and comment.char=. First discarded non-empty line:
<<5.73583235032546 0.182418204566858 0.178323702841331
-1.69234503648485 -1.83571739423166 -1.72136694103431 1.35210840970636
1.37698365727967 1.65643614366521 1.33750809366081 1.22116614774455
1.07000318265432 -1.13789084968265 -1.0757049956716 -1.24824627469937
-0.208441151406013 -0.246971068064524 -0.272463550264579
-0.561691522816148 -0.407713100376217 -0.538071283637418
-0.979274581542649 -0.909855772342568 -0.974827213844384
0.21700425934175 0.134586251989901 0.0818947419096577 -0.6788584605>>

my original dat matrix has 132 columns and around 15000 rows

Can you please advise on this?

Thanks
Ana

On Tue, Feb 4, 2020 at 4:45 PM Bert Gunter <bgunter.4567 at gmail.com>
wrote:>
> Yes, a clear thinko... Thanks for the correction.
>
> -- Bert
>
> On Tue, Feb 4, 2020 at 1:41 PM Duncan Murdoch <murdoch.duncan at
gmail.com> wrote:
>>
>> On 04/02/2020 4:28 p.m., Ana Marija wrote:
>> > I tired your code on this simplified data just for say 10
permutations:
>> >
>> > dat <- read.table(text = "   code.1 code.2 code.3 code.4
>> > 1     82     93     NA     NA
>> > 2     15     85     93     NA
>> > 3     93     89     NA     NA
>> > 4     81     NA     NA     NA",
>> >                    header = TRUE, stringsAsFactors = FALSE)
>> >
>> > dat2=data.matrix(dat)
>> >
>> >> result<- lapply(seq_len(10), FUN = function(dat2){
>> > +     dat2 <- dat2[, sample.int(4)]
>> > +     print(colnames(dat2))
>> > + } )
>> > Error in dat2[, sample.int(4)] : incorrect number of dimensions
>>
>> Yes, Bert did suggest that, but it's obviously wrong.  The argument
to
>> FUN is an element of seq_len(10), it's not the full dataset.  Try
>>
>> result<- lapply(seq_len(10), FUN = function(i){
>>       dat <- dat2[, sample.int(4)]
>>       print(colnames(dat))
>>   } )
>>
>> Duncan Murdoch
>>
>> >
>> >
>> > On Tue, Feb 4, 2020 at 3:10 PM Bert Gunter <bgunter.4567 at
gmail.com> wrote:
>> >>
>> >> I am not going to do your programming for you. If the
following doesn't suffice, maybe someone else will provide you something
that will.
>> >>
>> >> m = your matrix
>> >>
>> >> code = your code that uses m
>> >>
>> >> your list of results <- lapply(seq_len(1000), FUN =
function(m){
>> >>    m <- m[, sample.int(132)]
>> >>   code
>> >> } )
>> >>
>> >> or use an explicit for() loop to populate a list or vector
with your results.
>> >>
>> >> Again, if I have misunderstood what you want to do, then
clarify, and perhaps someone else will help.
>> >>
>> >> -- Bert
>> >>
>> >>
>> >>
>> >> Bert Gunter
>> >>
>> >> "The trouble with having an open mind is that people keep
coming along and sticking things into it."
>> >> -- Opus (aka Berkeley Breathed in his "Bloom County"
comic strip )
>> >>
>> >>
>> >> On Tue, Feb 4, 2020 at 12:41 PM Ana Marija
<sokovic.anamarija at gmail.com> wrote:
>> >>>
>> >>> Hi Bert,
>> >>>
>> >>> thanks for getting back to me. I have to permute those 132
columns
>> >>> 1000 times and perform the code given in the previous
email.
>> >>>
>> >>> Can you please show me how you would do that in the loop?
This is also
>> >>> a huge data set ...
>> >>>
>> >>> Thanks
>> >>> Ana
>> >>>
>> >>> On Tue, Feb 4, 2020 at 2:34 PM Bert Gunter
<bgunter.4567 at gmail.com> wrote:
>> >>>>
>> >>>> If you just want to permute columns of a matrix,
>> >>>>
>> >>>> ?sample
>> >>>>> sample.int(10)
>> >>>>   [1]  9  2 10  8  4  6  3  1  5  7
>> >>>>
>> >>>> and you can just use this as an index into the columns
of your matrix, presumably within a loop of some sort.
>> >>>>
>> >>>> If I have misunderstood, just ignore.
>> >>>>
>> >>>> Cheers,
>> >>>> Bert
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> On Tue, Feb 4, 2020 at 12:23 PM Ana Marija
<sokovic.anamarija at gmail.com> wrote:
>> >>>>>
>> >>>>> Hello,
>> >>>>>
>> >>>>> I have a matrix
>> >>>>>> dim(dat)
>> >>>>> [1] 15568   132
>> >>>>>
>> >>>>> It looks like this:
>> >>>>>
>> >>>>>                     NoD_14381_norm.1
NoD_14381_norm.2 NoD_14381_norm.3
>> >>>>> NoD_14520_30mM.1 NoD_14520_30mM.2 NoD_14520_30mM.3
>> >>>>> Ku8QhfS0n_hIOABXuE             4.75            
4.25             4.79
>> >>>>>             4.33             4.63             3.85
>> >>>>> Bx496XsFXiAlj.Eaeo             6.15            
6.23             6.55
>> >>>>>             6.26             6.24             5.99
>> >>>>> W38p0ogk.wIBVRXllY             7.13            
7.35             7.55
>> >>>>>             7.37             7.36             7.55
>> >>>>> QIBkqIS9LR5DfTlTS8             6.27            
6.73             6.45
>> >>>>>             5.39             4.75             4.96
>> >>>>> BZKiEvS0eQ305U0v34             6.35            
7.02             6.76
>> >>>>>             5.45             5.25             5.02
>> >>>>> 6TheVd.HiE1UF3lX6g             5.53            
5.02             5.36
>> >>>>>             5.61             5.66             5.37
>> >>>>>
>> >>>>> So it is a matrix with gene names ex.
Ku8QhfS0n_hIOABXuE, and subjects
>> >>>>> named ex. NoD_14381_norm.1
>> >>>>>
>> >>>>>
>> >>>>> How to do 1000 permutations of these 132 columns
and on each created
>> >>>>> new permuted matrix perform this code:
>> >>>>>
>> >>>>> subject="all_replicate"
>> >>>>>
targets<-readTargets(paste(PhenotypeDir,"hg_sg_",subject,"_target.txt",
sep=''))
>> >>>>> Treat <-
factor(targets$Treatment,levels=c("C","T"))
>> >>>>> Replicates <- factor(targets$rep)
>> >>>>> design <- model.matrix(~Replicates+Treat)
>> >>>>> corfit <- duplicateCorrelation(dat, block =
targets$Subject)
>> >>>>> corfit$consensus.correlation
>> >>>>> fit
<-lmFit(dat,design,block=targets$Subject,correlation=corfit$consensus.correlation)
>> >>>>> fit<-eBayes(fit)
>> >>>>> qval.cutoff=0.1; FC.cutoff=0.17
>> >>>>> y1=topTable(fit, coef="TreatT",
n=nrow(genes),adjust.method="BH",genelist=genes)
>> >>>>>
>> >>>>> y1 for each iteration of permutation would  have
P.Value column and
>> >>>>> these I would have plotted on the end to find the
distribution of all
>> >>>>> p values generated in those 1000 permutations.
>> >>>>>
>> >>>>> Please advise,
>> >>>>> Ana
>> >>>>>
>> >>>>> ______________________________________________
>> >>>>> R-help at r-project.org mailing list -- To
UNSUBSCRIBE and more, see
>> >>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>> >>>>> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
>> >>>>> and provide commented, minimal, self-contained,
reproducible code.
>> >
>> > ______________________________________________
>> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more,
see
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>> >
>>

Ivan Krylov

2020-Feb-05 16:11 UTC

head link

[R] doing 1000 permutations and doing test statistics distribution

On Wed, 5 Feb 2020 09:15:16 -0600
Ana Marija <sokovic.anamarija at gmail.com> wrote:
> I tried to solve the task via following code:
> all_results <- lapply(manyorders, function(ord) {
# ...
>  list(fit = fit, y1 = y1)
> })
> and I wrote all_results in a file
> write.table(all_results, file="all_res", sep = " ",
row.names = FALSE,
> col.names = TRUE,quote=FALSE)
all_results is a list of lists of pairs of data.frames and MArrayLM
S3 objects, while write.table works best with data.frames (or things
that make sense when coerced to a data.frame). I am afraid that the
result of coercing a list of lists of lists into a data.frame may not
make sense.

What do you need the "all_res" file for? If you just want persistence
(save the data between runs of R programs, but not examine it
manually), consider using saveRDS/readRDS instead. If you need to
interoperate with non-R programs or want a text file for transparency
reasons, read on.
> when I tried to open the file:
> > a=read.table("all_res", header=T)  
> Error in read.table("all_res", header = T) :
>   more columns than column names
Do any of the strings stored in all_results contain spaces?
(y1[,'genelist'] might.) A combination of sep=" " and
quote=FALSE could
make such data impossible to read back unambiguously. Does it help to
re-enable quote=TRUE or switch to a different sep (like "\t")
that's
guaranteed to be absent from strings you are trying to save? Either
way, read.table() will not reconstruct the same all_results that was
fed to write.table() previously unless all_results is already a
data.frame (which it isn't).

-- 
Best regards,
Ivan

R help - Feb 2020 - doing 1000 permutations and doing test statistics distribution

[R] doing 1000 permutations and doing test statistics distribution

[R] doing 1000 permutations and doing test statistics distribution

[R] doing 1000 permutations and doing test statistics distribution