Ana Marija
2020-Feb-04 20:46 UTC
[R] doing 1000 permutations and doing test statistics distribution
Hi Bert, thanks for getting back to me. I have to permute those 132 columns 1000 times and perform the code given in the previous email. Can you please show me how you would do that in the loop? This is also a huge data set ... Thanks Ana On Tue, Feb 4, 2020 at 2:34 PM Bert Gunter <bgunter.4567 at gmail.com> wrote:> > If you just want to permute columns of a matrix, > > ?sample > > sample.int(10) > [1] 9 2 10 8 4 6 3 1 5 7 > > and you can just use this as an index into the columns of your matrix, presumably within a loop of some sort. > > If I have misunderstood, just ignore. > > Cheers, > Bert > > > > > On Tue, Feb 4, 2020 at 12:23 PM Ana Marija <sokovic.anamarija at gmail.com> wrote: >> >> Hello, >> >> I have a matrix >> > dim(dat) >> [1] 15568 132 >> >> It looks like this: >> >> NoD_14381_norm.1 NoD_14381_norm.2 NoD_14381_norm.3 >> NoD_14520_30mM.1 NoD_14520_30mM.2 NoD_14520_30mM.3 >> Ku8QhfS0n_hIOABXuE 4.75 4.25 4.79 >> 4.33 4.63 3.85 >> Bx496XsFXiAlj.Eaeo 6.15 6.23 6.55 >> 6.26 6.24 5.99 >> W38p0ogk.wIBVRXllY 7.13 7.35 7.55 >> 7.37 7.36 7.55 >> QIBkqIS9LR5DfTlTS8 6.27 6.73 6.45 >> 5.39 4.75 4.96 >> BZKiEvS0eQ305U0v34 6.35 7.02 6.76 >> 5.45 5.25 5.02 >> 6TheVd.HiE1UF3lX6g 5.53 5.02 5.36 >> 5.61 5.66 5.37 >> >> So it is a matrix with gene names ex. Ku8QhfS0n_hIOABXuE, and subjects >> named ex. NoD_14381_norm.1 >> >> >> How to do 1000 permutations of these 132 columns and on each created >> new permuted matrix perform this code: >> >> subject="all_replicate" >> targets<-readTargets(paste(PhenotypeDir,"hg_sg_",subject,"_target.txt", sep='')) >> Treat <- factor(targets$Treatment,levels=c("C","T")) >> Replicates <- factor(targets$rep) >> design <- model.matrix(~Replicates+Treat) >> corfit <- duplicateCorrelation(dat, block = targets$Subject) >> corfit$consensus.correlation >> fit <-lmFit(dat,design,block=targets$Subject,correlation=corfit$consensus.correlation) >> fit<-eBayes(fit) >> qval.cutoff=0.1; FC.cutoff=0.17 >> y1=topTable(fit, coef="TreatT", n=nrow(genes),adjust.method="BH",genelist=genes) >> >> y1 for each iteration of permutation would have P.Value column and >> these I would have plotted on the end to find the distribution of all >> p values generated in those 1000 permutations. >> >> Please advise, >> Ana >> >> ______________________________________________ >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code.
Ana Marija
2020-Feb-04 20:46 UTC
[R] doing 1000 permutations and doing test statistics distribution
Basically I would just reshuffle column names in each of 1000 permutations how to do that and perform everything I described in my initial email On Tue, 4 Feb 2020 at 14:46, Ana Marija <sokovic.anamarija at gmail.com> wrote:> Hi Bert, > > thanks for getting back to me. I have to permute those 132 columns > 1000 times and perform the code given in the previous email. > > Can you please show me how you would do that in the loop? This is also > a huge data set ... > > Thanks > Ana > > On Tue, Feb 4, 2020 at 2:34 PM Bert Gunter <bgunter.4567 at gmail.com> wrote: > > > > If you just want to permute columns of a matrix, > > > > ?sample > > > sample.int(10) > > [1] 9 2 10 8 4 6 3 1 5 7 > > > > and you can just use this as an index into the columns of your matrix, > presumably within a loop of some sort. > > > > If I have misunderstood, just ignore. > > > > Cheers, > > Bert > > > > > > > > > > On Tue, Feb 4, 2020 at 12:23 PM Ana Marija <sokovic.anamarija at gmail.com> > wrote: > >> > >> Hello, > >> > >> I have a matrix > >> > dim(dat) > >> [1] 15568 132 > >> > >> It looks like this: > >> > >> NoD_14381_norm.1 NoD_14381_norm.2 NoD_14381_norm.3 > >> NoD_14520_30mM.1 NoD_14520_30mM.2 NoD_14520_30mM.3 > >> Ku8QhfS0n_hIOABXuE 4.75 4.25 4.79 > >> 4.33 4.63 3.85 > >> Bx496XsFXiAlj.Eaeo 6.15 6.23 6.55 > >> 6.26 6.24 5.99 > >> W38p0ogk.wIBVRXllY 7.13 7.35 7.55 > >> 7.37 7.36 7.55 > >> QIBkqIS9LR5DfTlTS8 6.27 6.73 6.45 > >> 5.39 4.75 4.96 > >> BZKiEvS0eQ305U0v34 6.35 7.02 6.76 > >> 5.45 5.25 5.02 > >> 6TheVd.HiE1UF3lX6g 5.53 5.02 5.36 > >> 5.61 5.66 5.37 > >> > >> So it is a matrix with gene names ex. Ku8QhfS0n_hIOABXuE, and subjects > >> named ex. NoD_14381_norm.1 > >> > >> > >> How to do 1000 permutations of these 132 columns and on each created > >> new permuted matrix perform this code: > >> > >> subject="all_replicate" > >> targets<-readTargets(paste(PhenotypeDir,"hg_sg_",subject,"_target.txt", > sep='')) > >> Treat <- factor(targets$Treatment,levels=c("C","T")) > >> Replicates <- factor(targets$rep) > >> design <- model.matrix(~Replicates+Treat) > >> corfit <- duplicateCorrelation(dat, block = targets$Subject) > >> corfit$consensus.correlation > >> fit > <-lmFit(dat,design,block=targets$Subject,correlation=corfit$consensus.correlation) > >> fit<-eBayes(fit) > >> qval.cutoff=0.1; FC.cutoff=0.17 > >> y1=topTable(fit, coef="TreatT", > n=nrow(genes),adjust.method="BH",genelist=genes) > >> > >> y1 for each iteration of permutation would have P.Value column and > >> these I would have plotted on the end to find the distribution of all > >> p values generated in those 1000 permutations. > >> > >> Please advise, > >> Ana > >> > >> ______________________________________________ > >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > >> https://stat.ethz.ch/mailman/listinfo/r-help > >> PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > >> and provide commented, minimal, self-contained, reproducible code. >[[alternative HTML version deleted]]
Bert Gunter
2020-Feb-04 21:10 UTC
[R] doing 1000 permutations and doing test statistics distribution
I am not going to do your programming for you. If the following doesn't suffice, maybe someone else will provide you something that will. m = your matrix code = your code that uses m your list of results <- lapply(seq_len(1000), FUN = function(m){ m <- m[, sample.int(132)] code } ) or use an explicit for() loop to populate a list or vector with your results. Again, if I have misunderstood what you want to do, then clarify, and perhaps someone else will help. -- Bert Bert Gunter "The trouble with having an open mind is that people keep coming along and sticking things into it." -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) On Tue, Feb 4, 2020 at 12:41 PM Ana Marija <sokovic.anamarija at gmail.com> wrote:> Hi Bert, > > thanks for getting back to me. I have to permute those 132 columns > 1000 times and perform the code given in the previous email. > > Can you please show me how you would do that in the loop? This is also > a huge data set ... > > Thanks > Ana > > On Tue, Feb 4, 2020 at 2:34 PM Bert Gunter <bgunter.4567 at gmail.com> wrote: > > > > If you just want to permute columns of a matrix, > > > > ?sample > > > sample.int(10) > > [1] 9 2 10 8 4 6 3 1 5 7 > > > > and you can just use this as an index into the columns of your matrix, > presumably within a loop of some sort. > > > > If I have misunderstood, just ignore. > > > > Cheers, > > Bert > > > > > > > > > > On Tue, Feb 4, 2020 at 12:23 PM Ana Marija <sokovic.anamarija at gmail.com> > wrote: > >> > >> Hello, > >> > >> I have a matrix > >> > dim(dat) > >> [1] 15568 132 > >> > >> It looks like this: > >> > >> NoD_14381_norm.1 NoD_14381_norm.2 NoD_14381_norm.3 > >> NoD_14520_30mM.1 NoD_14520_30mM.2 NoD_14520_30mM.3 > >> Ku8QhfS0n_hIOABXuE 4.75 4.25 4.79 > >> 4.33 4.63 3.85 > >> Bx496XsFXiAlj.Eaeo 6.15 6.23 6.55 > >> 6.26 6.24 5.99 > >> W38p0ogk.wIBVRXllY 7.13 7.35 7.55 > >> 7.37 7.36 7.55 > >> QIBkqIS9LR5DfTlTS8 6.27 6.73 6.45 > >> 5.39 4.75 4.96 > >> BZKiEvS0eQ305U0v34 6.35 7.02 6.76 > >> 5.45 5.25 5.02 > >> 6TheVd.HiE1UF3lX6g 5.53 5.02 5.36 > >> 5.61 5.66 5.37 > >> > >> So it is a matrix with gene names ex. Ku8QhfS0n_hIOABXuE, and subjects > >> named ex. NoD_14381_norm.1 > >> > >> > >> How to do 1000 permutations of these 132 columns and on each created > >> new permuted matrix perform this code: > >> > >> subject="all_replicate" > >> targets<-readTargets(paste(PhenotypeDir,"hg_sg_",subject,"_target.txt", > sep='')) > >> Treat <- factor(targets$Treatment,levels=c("C","T")) > >> Replicates <- factor(targets$rep) > >> design <- model.matrix(~Replicates+Treat) > >> corfit <- duplicateCorrelation(dat, block = targets$Subject) > >> corfit$consensus.correlation > >> fit > <-lmFit(dat,design,block=targets$Subject,correlation=corfit$consensus.correlation) > >> fit<-eBayes(fit) > >> qval.cutoff=0.1; FC.cutoff=0.17 > >> y1=topTable(fit, coef="TreatT", > n=nrow(genes),adjust.method="BH",genelist=genes) > >> > >> y1 for each iteration of permutation would have P.Value column and > >> these I would have plotted on the end to find the distribution of all > >> p values generated in those 1000 permutations. > >> > >> Please advise, > >> Ana > >> > >> ______________________________________________ > >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > >> https://stat.ethz.ch/mailman/listinfo/r-help > >> PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > >> and provide commented, minimal, self-contained, reproducible code. >[[alternative HTML version deleted]]
Ana Marija
2020-Feb-04 21:28 UTC
[R] doing 1000 permutations and doing test statistics distribution
I tired your code on this simplified data just for say 10 permutations: dat <- read.table(text = " code.1 code.2 code.3 code.4 1 82 93 NA NA 2 15 85 93 NA 3 93 89 NA NA 4 81 NA NA NA", header = TRUE, stringsAsFactors = FALSE) dat2=data.matrix(dat)> result<- lapply(seq_len(10), FUN = function(dat2){+ dat2 <- dat2[, sample.int(4)] + print(colnames(dat2)) + } ) Error in dat2[, sample.int(4)] : incorrect number of dimensions On Tue, Feb 4, 2020 at 3:10 PM Bert Gunter <bgunter.4567 at gmail.com> wrote:> > I am not going to do your programming for you. If the following doesn't suffice, maybe someone else will provide you something that will. > > m = your matrix > > code = your code that uses m > > your list of results <- lapply(seq_len(1000), FUN = function(m){ > m <- m[, sample.int(132)] > code > } ) > > or use an explicit for() loop to populate a list or vector with your results. > > Again, if I have misunderstood what you want to do, then clarify, and perhaps someone else will help. > > -- Bert > > > > Bert Gunter > > "The trouble with having an open mind is that people keep coming along and sticking things into it." > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) > > > On Tue, Feb 4, 2020 at 12:41 PM Ana Marija <sokovic.anamarija at gmail.com> wrote: >> >> Hi Bert, >> >> thanks for getting back to me. I have to permute those 132 columns >> 1000 times and perform the code given in the previous email. >> >> Can you please show me how you would do that in the loop? This is also >> a huge data set ... >> >> Thanks >> Ana >> >> On Tue, Feb 4, 2020 at 2:34 PM Bert Gunter <bgunter.4567 at gmail.com> wrote: >> > >> > If you just want to permute columns of a matrix, >> > >> > ?sample >> > > sample.int(10) >> > [1] 9 2 10 8 4 6 3 1 5 7 >> > >> > and you can just use this as an index into the columns of your matrix, presumably within a loop of some sort. >> > >> > If I have misunderstood, just ignore. >> > >> > Cheers, >> > Bert >> > >> > >> > >> > >> > On Tue, Feb 4, 2020 at 12:23 PM Ana Marija <sokovic.anamarija at gmail.com> wrote: >> >> >> >> Hello, >> >> >> >> I have a matrix >> >> > dim(dat) >> >> [1] 15568 132 >> >> >> >> It looks like this: >> >> >> >> NoD_14381_norm.1 NoD_14381_norm.2 NoD_14381_norm.3 >> >> NoD_14520_30mM.1 NoD_14520_30mM.2 NoD_14520_30mM.3 >> >> Ku8QhfS0n_hIOABXuE 4.75 4.25 4.79 >> >> 4.33 4.63 3.85 >> >> Bx496XsFXiAlj.Eaeo 6.15 6.23 6.55 >> >> 6.26 6.24 5.99 >> >> W38p0ogk.wIBVRXllY 7.13 7.35 7.55 >> >> 7.37 7.36 7.55 >> >> QIBkqIS9LR5DfTlTS8 6.27 6.73 6.45 >> >> 5.39 4.75 4.96 >> >> BZKiEvS0eQ305U0v34 6.35 7.02 6.76 >> >> 5.45 5.25 5.02 >> >> 6TheVd.HiE1UF3lX6g 5.53 5.02 5.36 >> >> 5.61 5.66 5.37 >> >> >> >> So it is a matrix with gene names ex. Ku8QhfS0n_hIOABXuE, and subjects >> >> named ex. NoD_14381_norm.1 >> >> >> >> >> >> How to do 1000 permutations of these 132 columns and on each created >> >> new permuted matrix perform this code: >> >> >> >> subject="all_replicate" >> >> targets<-readTargets(paste(PhenotypeDir,"hg_sg_",subject,"_target.txt", sep='')) >> >> Treat <- factor(targets$Treatment,levels=c("C","T")) >> >> Replicates <- factor(targets$rep) >> >> design <- model.matrix(~Replicates+Treat) >> >> corfit <- duplicateCorrelation(dat, block = targets$Subject) >> >> corfit$consensus.correlation >> >> fit <-lmFit(dat,design,block=targets$Subject,correlation=corfit$consensus.correlation) >> >> fit<-eBayes(fit) >> >> qval.cutoff=0.1; FC.cutoff=0.17 >> >> y1=topTable(fit, coef="TreatT", n=nrow(genes),adjust.method="BH",genelist=genes) >> >> >> >> y1 for each iteration of permutation would have P.Value column and >> >> these I would have plotted on the end to find the distribution of all >> >> p values generated in those 1000 permutations. >> >> >> >> Please advise, >> >> Ana >> >> >> >> ______________________________________________ >> >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >> >> https://stat.ethz.ch/mailman/listinfo/r-help >> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> >> and provide commented, minimal, self-contained, reproducible code.