similar to: Selecting a subsample so that it follows a distribution.

Displaying 20 results from an estimated 200 matches similar to: "Selecting a subsample so that it follows a distribution."

2011 Oct 30
1
Normality tests on groups of rows in a data frame, grouped based on content in other columns
Dear R users, I have a data frame in the form below, on which I would like to make normality tests on the values in the ExpressionLevel column. > head(df) ID Plant Tissue Gene ExpressionLevel 1 1 p1 t1 g1 366.53 2 2 p1 t1 g2 0.57 3 3 p1 t1 g3 11.81 4 4 p1 t2 g1 498.43 5 5 p1 t2 g2 2.14 6 6 p1 t2 g3 7.85 I
2008 Jul 02
1
help on list comparison
Hi, I want to compare two lists by their names and get the values of that list. Can anybody let me know the syntax for comparing lists by their names using a for loop? c.genes<- list() for(i in 1:100) c.genes[[1]]<- geneset(which(geneset == tobecampared[i])) } Here geneset is a list, and tobecampared is also a list. Thank you, Ramya
2008 Jun 27
3
For loop
Hi, could you please let me know how to use a list in a for loop? Here geneset is a loop. I am trying to match the names of the list with the 1st row of the output. result<- list() for(i in 1:length(output) { result[[i]] <- geneset(which(geneset %n% output[,1])) } Kindly help me out.
2008 Jul 28
2
writing the plots
Hi there, I want to write the plots to PDFs and the details about the graph to a separate notepad. plot(as.numeric(lapply(resultgenes,length)), main= "Geneset.gene#.bias.test",xlab="Top.Ranked.Genesets", ylab="gene.number.per.geneset") lines(loess.smooth(c(1:1000),as.numeric(lapply(resultgenes,length)), span = 2/3, degree = 1, family =
2008 Jul 21
3
vector help
Hi, I have a vector test. It has 3 elements. I want to join the three into one vector: "Geneset=HSA04910_INSULIN_SIGNALING_PATHWAY-157-20". How can I do it? > class(test) [1] "character" > test [1] "Geneset=HSA04910_INSULIN_SIGNALING_PATHWAY" "157" [3] "20" Ramya
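One common way to do what this question asks is `paste()` with the `collapse` argument. The sketch below reuses the three values shown in the question; the name `joined` is made up for illustration.

```r
# Example vector copied from the question above
test <- c("Geneset=HSA04910_INSULIN_SIGNALING_PATHWAY", "157", "20")

# collapse = "-" glues all elements of one vector into a single string
joined <- paste(test, collapse = "-")
print(joined)
```

Note that `sep` would not help here: `sep` separates the arguments of `paste()`, while `collapse` joins the elements within a vector.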
2007 Sep 25
1
'load' does not properly add 'show' methods for classes extending 'list'
The GeneSetCollection class in the Bioconductor package GSEABase extends 'list' > library(GSEABase) > showClass("GeneSetCollection") Slots: Name: .Data Class: list Extends: Class "list", from data part Class "vector", by class "list", distance 2 Class "AssayData", by class "list", distance 2 If I create
2010 Apr 19
2
Error message GSA package
Dear list, I have gene expression measurements obtained by PCR on 11 genes, tabulated as a data matrix. I'm attempting to use GSA package to distinguish any significant changes in these genes as a pathway. My response variable is binary, 0=no disease, 1=disease. I have read the PCR data into R as follows: data <-
2009 Jun 24
1
Rscript segfaults with lazy loading
Hi, I have an RData file containing a GeneSetCollection object (Bioconductor), http://www.cs.mu.oz.au/~gabraham/c2.RData. I think it uses lazy loading because packages are only loaded when I access the object (see below) in the R console. When I try the same with Rscript, it segfaults. This happens on 2.9.0 both on Linux and Mac: Rscript -e 'load("c2.RData"); c2[1]' ***
2009 Apr 09
1
.Call()
Hi guys, I want to transfer the following code from R into a .Call-compatible form. How can I do that? Thanks! INT sim; for(i in 1:sim){ if(i>2) genemat <- genemat[,sample(1:ncol(genemat))] ranklist[,1] <- apply(genemat, 1, function(x){ (mean(x[cols]) - mean(x[-cols]))/sd(x)}) ranklist <- ranklist[order(ranklist[,1]),]
2013 Jan 18
0
repeat resampling with different subsample sizes
Hi, I'm trying to write code (see below) to randomly resample measurements of one variable (here, the variable "counts" in the data frame "dat") with different subsample sizes. The code works fine for a single subsample size (in the code below = 10). I then tried to generalize this by writing a function with a loop, where in each loop the function
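The question's own code is cut off, so the following is only a sketch of one way to repeat a resampling step over several subsample sizes. The simulated `counts` values, the function name `resample_stat`, the choice of the mean as the statistic, and sampling with replacement are all assumptions made for illustration.

```r
set.seed(42)  # for reproducibility

# Made-up stand-in for the "counts" column of the data frame "dat"
dat <- data.frame(counts = rpois(200, lambda = 5))

# Hypothetical helper: for each size in `sizes`, draw `n_rep` resamples
# of that size and record the mean of each one
resample_stat <- function(x, sizes, n_rep = 100) {
  out <- lapply(sizes, function(n) {
    replicate(n_rep, mean(sample(x, size = n, replace = TRUE)))
  })
  names(out) <- paste0("n", sizes)
  out
}

res <- resample_stat(dat$counts, sizes = c(10, 20, 50), n_rep = 100)

# Spread of the resampled means for each subsample size
print(sapply(res, sd))
```

Returning a named list keeps the results for each size separate, which avoids overwriting a single result object inside an explicit loop.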
2008 Sep 16
1
analyze subsample of dataframe
Hi there, I'm dealing with a pretty big dataset (~22,000 entries) with numerous entries for every day over a period of several years. I have a column "judy" (for Julian Day) with 0 beginning on Jan. 1st of every new year (I want to compare tendencies between years). However, in order to control for a leap year (2004), I simply need to subtract 1 from every judy value for the year
2009 Jul 21
1
Subsample points for mclust
Hi all! I have an ordered vector of values. The distribution of these values can be modeled by a sum of Gaussians, so I'm using the package 'mclust' to get the Gaussians' parameters for this 1D distribution. It works very well, but for input sizes above 100,000 values it starts taking really forever. Unfortunately my dataset has around 4.6M values... My question: is it
2012 Jun 28
2
Size of subsample in ecodist mantel()
What is the size of the bootstrapped subsample in ecodist mantel()? Thanks
2012 Aug 16
1
Big Data reading subsample csv
Hello, I'm most grateful for your time to read this. I have an uber-size 30GB file of 6 million records and 3000 (mostly categorical data) columns in csv format. I want to bootstrap subsamples for multinomial regression, but it's proving difficult: even with 64GB of RAM in my machine and twice that in swap file, the process becomes super slow and halts. I'm thinking about generating
2010 Dec 19
1
Random selection from a subsample
Dear Mailing List, I have a data set (data4) consisting of a number of factors and a response variable. I wish to randomly sample from a combination of two of those factors (GIS_station and Distance_code2) and return a new data frame containing the original data structure (i.e. all the columns) but only containing the randomly selected rows. The number of rows in each combination of GIS_station
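A possible base-R approach to this kind of request is to split the row indices by the two factors and sample within each group. The sketch below uses made-up data; only the names `data4`, `GIS_station` and `Distance_code2` come from the question, while the `response` column, the per-group size `n_per_group` and the other names are assumptions.

```r
set.seed(1)  # for reproducibility

# Made-up stand-in for data4: 2 stations x 2 distance codes, 3 rows each
data4 <- data.frame(
  GIS_station    = rep(c("A", "B"), each = 6),
  Distance_code2 = rep(c(1, 2), times = 6),
  response       = rnorm(12)
)

n_per_group <- 2  # assumed number of rows to draw per combination

# Row indices grouped by every observed combination of the two factors
groups <- split(seq_len(nrow(data4)),
                list(data4$GIS_station, data4$Distance_code2),
                drop = TRUE)

# Sample within each group; sample.int() avoids the sample() surprise
# when a group holds a single row index
picked <- unlist(lapply(groups, function(rows) {
  rows[sample.int(length(rows), min(n_per_group, length(rows)))]
}))

# All original columns are kept; only the selected rows remain
sub <- data4[sort(picked), ]
print(sub)
```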
2009 Jun 26
1
Where can I find information on how to subsample a time series?
I suspect I'm looking in the wrong places, so guidance to the relevant documentation would be as welcome as a little code snippet. I have time series data stored in a MySQL database. There is the usual DATE field, along with a double precision number: there are daily values (including only normal working days: Monday through Friday). I actually have to do a couple things here. Because of
2017 Sep 25
0
Sample of a subsample
For personal aesthetic reasons, I changed the name "data" to "dat". Your code, with a slight modification: set.seed (1357) ## for reproducibility dat <- data.frame(var1=seq(1:40), var2=seq(40,1)) dat$sampleNo <- 0 idx <- sample(seq(1,nrow(dat)), size=10, replace=F) dat[idx,"sampleNo"] <-1 ## yielding > dat var1 var2 sampleNo 1 1 40
2017 Sep 25
2
Sample of a subsample
Hello everybody! I have the following problem: I'd like to select a sample from a subsample in a dataset. Actually, I don't want to select it, but to create a new variable sampleNo that indicates which sample (one or two) a case belongs to. Let's suppose I have a dataset containing 40 cases: data <- data.frame(var1=seq(1:40), var2=seq(40,1)) The first sample (n=10) I drew like
2017 Sep 25
1
Sample of a subsample
Hi David, I was about to post a reply when Bert responded. His answer is good and his comment to use the name 'dat' rather than 'data' is instructive. I am providing my suggestion as well because I think it may address what was causing you some confusion (mainly to use "which", but also the missing !) idx2 <- sample( which( (!data$var1%%2) & data$sampleNo==0 ),
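The approach discussed in the "Sample of a subsample" replies above can be sketched as follows. The first part follows the code quoted in the replies; the second sample's size is cut off in the snippet, so the value 5 and the names `candidates` and `n2` are assumptions for illustration.

```r
set.seed(1357)  # for reproducibility

dat <- data.frame(var1 = 1:40, var2 = 40:1)
dat$sampleNo <- 0

# First sample: 10 rows drawn from the whole data set
idx <- sample(seq_len(nrow(dat)), size = 10, replace = FALSE)
dat[idx, "sampleNo"] <- 1

# Second sample: only rows with an even var1 that are not in sample 1
candidates <- which(dat$var1 %% 2 == 0 & dat$sampleNo == 0)
n2 <- 5  # assumed size; the original value is truncated in the snippet

# Index into `candidates` via sample.int() so that a single remaining
# candidate is not misread by sample() as the range 1:candidate
idx2 <- candidates[sample.int(length(candidates), n2)]
dat[idx2, "sampleNo"] <- 2

print(table(dat$sampleNo))
```

Using `which()` converts the logical condition into row numbers, so the second draw can only land on rows that satisfy both conditions.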
2009 Apr 06
3
how to subsample all possible combinations of n species taken 1:n at a time?
Hello I apologise for the length of this entry but please bear with me. In short: I need a way of subsampling communities from all possible communities of n taxa taken 1:n at a time without having to calculate all possible combinations (because this gives me a memory error - using combn() or expand.grid() at least). Does anyone know of a function? Or can you help me edit the combn or