Ana Marija
2020-Feb-04 21:28 UTC
[R] doing 1000 permutations and doing test statistics distribution
I tired your code on this simplified data just for say 10 permutations: dat <- read.table(text = " code.1 code.2 code.3 code.4 1 82 93 NA NA 2 15 85 93 NA 3 93 89 NA NA 4 81 NA NA NA", header = TRUE, stringsAsFactors = FALSE) dat2=data.matrix(dat)> result<- lapply(seq_len(10), FUN = function(dat2){+ dat2 <- dat2[, sample.int(4)] + print(colnames(dat2)) + } ) Error in dat2[, sample.int(4)] : incorrect number of dimensions On Tue, Feb 4, 2020 at 3:10 PM Bert Gunter <bgunter.4567 at gmail.com> wrote:> > I am not going to do your programming for you. If the following doesn't suffice, maybe someone else will provide you something that will. > > m = your matrix > > code = your code that uses m > > your list of results <- lapply(seq_len(1000), FUN = function(m){ > m <- m[, sample.int(132)] > code > } ) > > or use an explicit for() loop to populate a list or vector with your results. > > Again, if I have misunderstood what you want to do, then clarify, and perhaps someone else will help. > > -- Bert > > > > Bert Gunter > > "The trouble with having an open mind is that people keep coming along and sticking things into it." > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) > > > On Tue, Feb 4, 2020 at 12:41 PM Ana Marija <sokovic.anamarija at gmail.com> wrote: >> >> Hi Bert, >> >> thanks for getting back to me. I have to permute those 132 columns >> 1000 times and perform the code given in the previous email. >> >> Can you please show me how you would do that in the loop? This is also >> a huge data set ... >> >> Thanks >> Ana >> >> On Tue, Feb 4, 2020 at 2:34 PM Bert Gunter <bgunter.4567 at gmail.com> wrote: >> > >> > If you just want to permute columns of a matrix, >> > >> > ?sample >> > > sample.int(10) >> > [1] 9 2 10 8 4 6 3 1 5 7 >> > >> > and you can just use this as an index into the columns of your matrix, presumably within a loop of some sort. >> > >> > If I have misunderstood, just ignore. >> > >> > Cheers, >> > Bert >> > >> > >> > >> > >> > On Tue, Feb 4, 2020 at 12:23 PM Ana Marija <sokovic.anamarija at gmail.com> wrote: >> >> >> >> Hello, >> >> >> >> I have a matrix >> >> > dim(dat) >> >> [1] 15568 132 >> >> >> >> It looks like this: >> >> >> >> NoD_14381_norm.1 NoD_14381_norm.2 NoD_14381_norm.3 >> >> NoD_14520_30mM.1 NoD_14520_30mM.2 NoD_14520_30mM.3 >> >> Ku8QhfS0n_hIOABXuE 4.75 4.25 4.79 >> >> 4.33 4.63 3.85 >> >> Bx496XsFXiAlj.Eaeo 6.15 6.23 6.55 >> >> 6.26 6.24 5.99 >> >> W38p0ogk.wIBVRXllY 7.13 7.35 7.55 >> >> 7.37 7.36 7.55 >> >> QIBkqIS9LR5DfTlTS8 6.27 6.73 6.45 >> >> 5.39 4.75 4.96 >> >> BZKiEvS0eQ305U0v34 6.35 7.02 6.76 >> >> 5.45 5.25 5.02 >> >> 6TheVd.HiE1UF3lX6g 5.53 5.02 5.36 >> >> 5.61 5.66 5.37 >> >> >> >> So it is a matrix with gene names ex. Ku8QhfS0n_hIOABXuE, and subjects >> >> named ex. NoD_14381_norm.1 >> >> >> >> >> >> How to do 1000 permutations of these 132 columns and on each created >> >> new permuted matrix perform this code: >> >> >> >> subject="all_replicate" >> >> targets<-readTargets(paste(PhenotypeDir,"hg_sg_",subject,"_target.txt", sep='')) >> >> Treat <- factor(targets$Treatment,levels=c("C","T")) >> >> Replicates <- factor(targets$rep) >> >> design <- model.matrix(~Replicates+Treat) >> >> corfit <- duplicateCorrelation(dat, block = targets$Subject) >> >> corfit$consensus.correlation >> >> fit <-lmFit(dat,design,block=targets$Subject,correlation=corfit$consensus.correlation) >> >> fit<-eBayes(fit) >> >> qval.cutoff=0.1; FC.cutoff=0.17 >> >> y1=topTable(fit, coef="TreatT", n=nrow(genes),adjust.method="BH",genelist=genes) >> >> >> >> y1 for each iteration of permutation would have P.Value column and >> >> these I would have plotted on the end to find the distribution of all >> >> p values generated in those 1000 permutations. >> >> >> >> Please advise, >> >> Ana >> >> >> >> ______________________________________________ >> >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >> >> https://stat.ethz.ch/mailman/listinfo/r-help >> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> >> and provide commented, minimal, self-contained, reproducible code.
Duncan Murdoch
2020-Feb-04 21:41 UTC
[R] doing 1000 permutations and doing test statistics distribution
On 04/02/2020 4:28 p.m., Ana Marija wrote:> I tired your code on this simplified data just for say 10 permutations: > > dat <- read.table(text = " code.1 code.2 code.3 code.4 > 1 82 93 NA NA > 2 15 85 93 NA > 3 93 89 NA NA > 4 81 NA NA NA", > header = TRUE, stringsAsFactors = FALSE) > > dat2=data.matrix(dat) > >> result<- lapply(seq_len(10), FUN = function(dat2){ > + dat2 <- dat2[, sample.int(4)] > + print(colnames(dat2)) > + } ) > Error in dat2[, sample.int(4)] : incorrect number of dimensionsYes, Bert did suggest that, but it's obviously wrong. The argument to FUN is an element of seq_len(10), it's not the full dataset. Try result<- lapply(seq_len(10), FUN = function(i){ dat <- dat2[, sample.int(4)] print(colnames(dat)) } ) Duncan Murdoch> > > On Tue, Feb 4, 2020 at 3:10 PM Bert Gunter <bgunter.4567 at gmail.com> wrote: >> >> I am not going to do your programming for you. If the following doesn't suffice, maybe someone else will provide you something that will. >> >> m = your matrix >> >> code = your code that uses m >> >> your list of results <- lapply(seq_len(1000), FUN = function(m){ >> m <- m[, sample.int(132)] >> code >> } ) >> >> or use an explicit for() loop to populate a list or vector with your results. >> >> Again, if I have misunderstood what you want to do, then clarify, and perhaps someone else will help. >> >> -- Bert >> >> >> >> Bert Gunter >> >> "The trouble with having an open mind is that people keep coming along and sticking things into it." >> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) >> >> >> On Tue, Feb 4, 2020 at 12:41 PM Ana Marija <sokovic.anamarija at gmail.com> wrote: >>> >>> Hi Bert, >>> >>> thanks for getting back to me. I have to permute those 132 columns >>> 1000 times and perform the code given in the previous email. >>> >>> Can you please show me how you would do that in the loop? This is also >>> a huge data set ... >>> >>> Thanks >>> Ana >>> >>> On Tue, Feb 4, 2020 at 2:34 PM Bert Gunter <bgunter.4567 at gmail.com> wrote: >>>> >>>> If you just want to permute columns of a matrix, >>>> >>>> ?sample >>>>> sample.int(10) >>>> [1] 9 2 10 8 4 6 3 1 5 7 >>>> >>>> and you can just use this as an index into the columns of your matrix, presumably within a loop of some sort. >>>> >>>> If I have misunderstood, just ignore. >>>> >>>> Cheers, >>>> Bert >>>> >>>> >>>> >>>> >>>> On Tue, Feb 4, 2020 at 12:23 PM Ana Marija <sokovic.anamarija at gmail.com> wrote: >>>>> >>>>> Hello, >>>>> >>>>> I have a matrix >>>>>> dim(dat) >>>>> [1] 15568 132 >>>>> >>>>> It looks like this: >>>>> >>>>> NoD_14381_norm.1 NoD_14381_norm.2 NoD_14381_norm.3 >>>>> NoD_14520_30mM.1 NoD_14520_30mM.2 NoD_14520_30mM.3 >>>>> Ku8QhfS0n_hIOABXuE 4.75 4.25 4.79 >>>>> 4.33 4.63 3.85 >>>>> Bx496XsFXiAlj.Eaeo 6.15 6.23 6.55 >>>>> 6.26 6.24 5.99 >>>>> W38p0ogk.wIBVRXllY 7.13 7.35 7.55 >>>>> 7.37 7.36 7.55 >>>>> QIBkqIS9LR5DfTlTS8 6.27 6.73 6.45 >>>>> 5.39 4.75 4.96 >>>>> BZKiEvS0eQ305U0v34 6.35 7.02 6.76 >>>>> 5.45 5.25 5.02 >>>>> 6TheVd.HiE1UF3lX6g 5.53 5.02 5.36 >>>>> 5.61 5.66 5.37 >>>>> >>>>> So it is a matrix with gene names ex. Ku8QhfS0n_hIOABXuE, and subjects >>>>> named ex. NoD_14381_norm.1 >>>>> >>>>> >>>>> How to do 1000 permutations of these 132 columns and on each created >>>>> new permuted matrix perform this code: >>>>> >>>>> subject="all_replicate" >>>>> targets<-readTargets(paste(PhenotypeDir,"hg_sg_",subject,"_target.txt", sep='')) >>>>> Treat <- factor(targets$Treatment,levels=c("C","T")) >>>>> Replicates <- factor(targets$rep) >>>>> design <- model.matrix(~Replicates+Treat) >>>>> corfit <- duplicateCorrelation(dat, block = targets$Subject) >>>>> corfit$consensus.correlation >>>>> fit <-lmFit(dat,design,block=targets$Subject,correlation=corfit$consensus.correlation) >>>>> fit<-eBayes(fit) >>>>> qval.cutoff=0.1; FC.cutoff=0.17 >>>>> y1=topTable(fit, coef="TreatT", n=nrow(genes),adjust.method="BH",genelist=genes) >>>>> >>>>> y1 for each iteration of permutation would have P.Value column and >>>>> these I would have plotted on the end to find the distribution of all >>>>> p values generated in those 1000 permutations. >>>>> >>>>> Please advise, >>>>> Ana >>>>> >>>>> ______________________________________________ >>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >>>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>>>> and provide commented, minimal, self-contained, reproducible code. > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >
Bert Gunter
2020-Feb-04 22:45 UTC
[R] doing 1000 permutations and doing test statistics distribution
Yes, a clear thinko... Thanks for the correction. -- Bert On Tue, Feb 4, 2020 at 1:41 PM Duncan Murdoch <murdoch.duncan at gmail.com> wrote:> On 04/02/2020 4:28 p.m., Ana Marija wrote: > > I tired your code on this simplified data just for say 10 permutations: > > > > dat <- read.table(text = " code.1 code.2 code.3 code.4 > > 1 82 93 NA NA > > 2 15 85 93 NA > > 3 93 89 NA NA > > 4 81 NA NA NA", > > header = TRUE, stringsAsFactors = FALSE) > > > > dat2=data.matrix(dat) > > > >> result<- lapply(seq_len(10), FUN = function(dat2){ > > + dat2 <- dat2[, sample.int(4)] > > + print(colnames(dat2)) > > + } ) > > Error in dat2[, sample.int(4)] : incorrect number of dimensions > > Yes, Bert did suggest that, but it's obviously wrong. The argument to > FUN is an element of seq_len(10), it's not the full dataset. Try > > result<- lapply(seq_len(10), FUN = function(i){ > dat <- dat2[, sample.int(4)] > print(colnames(dat)) > } ) > > Duncan Murdoch > > > > > > > On Tue, Feb 4, 2020 at 3:10 PM Bert Gunter <bgunter.4567 at gmail.com> > wrote: > >> > >> I am not going to do your programming for you. If the following doesn't > suffice, maybe someone else will provide you something that will. > >> > >> m = your matrix > >> > >> code = your code that uses m > >> > >> your list of results <- lapply(seq_len(1000), FUN = function(m){ > >> m <- m[, sample.int(132)] > >> code > >> } ) > >> > >> or use an explicit for() loop to populate a list or vector with your > results. > >> > >> Again, if I have misunderstood what you want to do, then clarify, and > perhaps someone else will help. > >> > >> -- Bert > >> > >> > >> > >> Bert Gunter > >> > >> "The trouble with having an open mind is that people keep coming along > and sticking things into it." > >> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) > >> > >> > >> On Tue, Feb 4, 2020 at 12:41 PM Ana Marija <sokovic.anamarija at gmail.com> > wrote: > >>> > >>> Hi Bert, > >>> > >>> thanks for getting back to me. I have to permute those 132 columns > >>> 1000 times and perform the code given in the previous email. > >>> > >>> Can you please show me how you would do that in the loop? This is also > >>> a huge data set ... > >>> > >>> Thanks > >>> Ana > >>> > >>> On Tue, Feb 4, 2020 at 2:34 PM Bert Gunter <bgunter.4567 at gmail.com> > wrote: > >>>> > >>>> If you just want to permute columns of a matrix, > >>>> > >>>> ?sample > >>>>> sample.int(10) > >>>> [1] 9 2 10 8 4 6 3 1 5 7 > >>>> > >>>> and you can just use this as an index into the columns of your > matrix, presumably within a loop of some sort. > >>>> > >>>> If I have misunderstood, just ignore. > >>>> > >>>> Cheers, > >>>> Bert > >>>> > >>>> > >>>> > >>>> > >>>> On Tue, Feb 4, 2020 at 12:23 PM Ana Marija < > sokovic.anamarija at gmail.com> wrote: > >>>>> > >>>>> Hello, > >>>>> > >>>>> I have a matrix > >>>>>> dim(dat) > >>>>> [1] 15568 132 > >>>>> > >>>>> It looks like this: > >>>>> > >>>>> NoD_14381_norm.1 NoD_14381_norm.2 > NoD_14381_norm.3 > >>>>> NoD_14520_30mM.1 NoD_14520_30mM.2 NoD_14520_30mM.3 > >>>>> Ku8QhfS0n_hIOABXuE 4.75 4.25 4.79 > >>>>> 4.33 4.63 3.85 > >>>>> Bx496XsFXiAlj.Eaeo 6.15 6.23 6.55 > >>>>> 6.26 6.24 5.99 > >>>>> W38p0ogk.wIBVRXllY 7.13 7.35 7.55 > >>>>> 7.37 7.36 7.55 > >>>>> QIBkqIS9LR5DfTlTS8 6.27 6.73 6.45 > >>>>> 5.39 4.75 4.96 > >>>>> BZKiEvS0eQ305U0v34 6.35 7.02 6.76 > >>>>> 5.45 5.25 5.02 > >>>>> 6TheVd.HiE1UF3lX6g 5.53 5.02 5.36 > >>>>> 5.61 5.66 5.37 > >>>>> > >>>>> So it is a matrix with gene names ex. Ku8QhfS0n_hIOABXuE, and > subjects > >>>>> named ex. NoD_14381_norm.1 > >>>>> > >>>>> > >>>>> How to do 1000 permutations of these 132 columns and on each created > >>>>> new permuted matrix perform this code: > >>>>> > >>>>> subject="all_replicate" > >>>>> > targets<-readTargets(paste(PhenotypeDir,"hg_sg_",subject,"_target.txt", > sep='')) > >>>>> Treat <- factor(targets$Treatment,levels=c("C","T")) > >>>>> Replicates <- factor(targets$rep) > >>>>> design <- model.matrix(~Replicates+Treat) > >>>>> corfit <- duplicateCorrelation(dat, block = targets$Subject) > >>>>> corfit$consensus.correlation > >>>>> fit > <-lmFit(dat,design,block=targets$Subject,correlation=corfit$consensus.correlation) > >>>>> fit<-eBayes(fit) > >>>>> qval.cutoff=0.1; FC.cutoff=0.17 > >>>>> y1=topTable(fit, coef="TreatT", > n=nrow(genes),adjust.method="BH",genelist=genes) > >>>>> > >>>>> y1 for each iteration of permutation would have P.Value column and > >>>>> these I would have plotted on the end to find the distribution of all > >>>>> p values generated in those 1000 permutations. > >>>>> > >>>>> Please advise, > >>>>> Ana > >>>>> > >>>>> ______________________________________________ > >>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > >>>>> https://stat.ethz.ch/mailman/listinfo/r-help > >>>>> PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > >>>>> and provide commented, minimal, self-contained, reproducible code. > > > > ______________________________________________ > > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > > >[[alternative HTML version deleted]]