thr3ads.net - similar to: "Selecting a subsample so that it follows a distribution."

Displaying 20 results from an estimated 200 matches similar to: "Selecting a subsample so that it follows a distribution."

Normality tests on groups of rows in a data frame, grouped based on content in other columns

2011 Oct 30

Dear R users, I have a data frame in the form below, on which I would like to make normality tests on the values in the ExpressionLevel column.

> head(df)
  ID Plant Tissue Gene ExpressionLevel
1  1    p1     t1   g1          366.53
2  2    p1     t1   g2            0.57
3  3    p1     t1   g3           11.81
4  4    p1     t2   g1          498.43
5  5    p1     t2   g2            2.14
6  6    p1     t2   g3            7.85

I
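The snippet above is cut off before it says which columns define the groups, so the following is only a sketch under an assumption: one Shapiro-Wilk test per Tissue and Gene combination. The data are simulated and the column name `shapiro_p` is made up for illustration; only `ExpressionLevel`, `Plant`, `Tissue` and `Gene` come from the post.

```r
# Hypothetical data with the same columns as the post (ID omitted); values are simulated.
set.seed(42)
df <- expand.grid(Plant  = paste0("p", 1:6),
                  Tissue = c("t1", "t2"),
                  Gene   = c("g1", "g2", "g3"))
df$ExpressionLevel <- rlnorm(nrow(df), meanlog = 2, sdlog = 1)

# Assumed grouping: one Shapiro-Wilk test per Tissue x Gene group.
# shapiro.test() needs between 3 and 5000 non-missing values per group.
pvals <- aggregate(ExpressionLevel ~ Tissue + Gene, data = df,
                   FUN = function(x) shapiro.test(x)$p.value)
names(pvals)[3] <- "shapiro_p"
print(pvals)
```

With many groups, the resulting p-values would normally need a multiple-testing correction, for example via `p.adjust()`.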

help on list comparison

2008 Jul 02

Hi, I want to compare two lists by their names and get the values of that list. Can anybody let me know the syntax for comparing the lists by their names using a for loop? c.genes<- list() for(i in 1:100) c.genes[[1]]<- geneset(which(geneset == tobecampared[i])) } Here geneset is a list and tobecampared is also a list. Thank you, Ramya
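A minimal sketch of one way to do what the post asks, assuming `geneset` is a named list and `tobecampared` holds the names to look up. The example values are invented, since the post does not show the data. Lists are indexed with `[` or `[[`, not called like a function as in `geneset(...)`.

```r
# Hypothetical example data; the original post does not show the lists.
geneset <- list(g1 = c(1.2, 3.4), g2 = 5.6, g3 = c(7.8, 9.0))
tobecampared <- list("g3", "g1", "g9")   # "g9" has no match in geneset

wanted <- unlist(tobecampared)

# Vectorised: keep the elements of geneset whose names appear in 'wanted'.
c.genes <- geneset[intersect(wanted, names(geneset))]

# Equivalent for loop, as requested in the post.
c.genes.loop <- list()
for (nm in wanted) {
  if (nm %in% names(geneset)) {
    c.genes.loop[[nm]] <- geneset[[nm]]
  }
}
```

Both versions return the matching elements in the order given by `tobecampared` and skip names that are absent from `geneset`.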

For loop

2008 Jun 27

Hi, Could you please let me know how to use a list in a for loop? Here geneset is a list. I am trying to match the names of the list with the 1st row of the output. result<- list() for(i in 1:length(output) { result[[i]] <- geneset(which(geneset %n% output[,1])) } Kindly help me out

writing the plots

2008 Jul 28

Hi there, I want to write the plots to PDFs and the details about the graph to a separate Notepad file. plot(as.numeric(lapply(resultgenes,length)), main= "Geneset.gene#.bias.test",xlab="Top.Ranked.Genesets", ylab="gene.number.per.geneset") lines(loess.smooth(c(1:1000),as.numeric(lapply(resultgenes,length)), span = 2/3, degree = 1, family =

vector help

2008 Jul 21

Hi, I have a vector test. It has 3 elements. I want to join the three into a single element: "Geneset=HSA04910_INSULIN_SIGNALING_PATHWAY-157- 20". How can I do it? > class(test) [1] "character" > test [1] "Geneset=HSA04910_INSULIN_SIGNALING_PATHWAY" "157" [3] "20" Ramya
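For reference, joining the elements of a character vector into one string is what `paste()` with the `collapse` argument does. The sketch below rebuilds `test` from the printed output in the post. It assumes the space in "-157- 20" is a line-wrap artifact and that a plain "-" separator is wanted; `joined` is a name introduced here.

```r
# 'test' as printed in the post.
test <- c("Geneset=HSA04910_INSULIN_SIGNALING_PATHWAY", "157", "20")

# collapse = "-" glues all elements into a single string separated by "-".
joined <- paste(test, collapse = "-")
print(joined)
length(joined)   # a character vector of length 1
```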

'load' does not properly add 'show' methods for classes extending 'list'

2007 Sep 25

The GeneSetCollection class in the Bioconductor package GSEABase extends 'list'

> library(GSEABase)
> showClass("GeneSetCollection")
Slots:
Name:  .Data
Class:  list
Extends:
Class "list", from data part
Class "vector", by class "list", distance 2
Class "AssayData", by class "list", distance 2

If I create

Error message GSA package

2010 Apr 19

Dear list, I have gene expression measurements obtained by PCR on 11 genes, tabulated as a data matrix. I'm attempting to use GSA package to distinguish any significant changes in these genes as a pathway. My response variable is binary, 0=no disease, 1=disease. I have read the PCR data into R as follows: data <-

Rscript segfaults with lazy loading

2009 Jun 24

Hi, I have an RData file containing a GeneSetCollection object (Bioconductor), http://www.cs.mu.oz.au/~gabraham/c2.RData. I think it uses lazy loading because packages are only loaded when I access the object (see below) in the R console. When I try the same with Rscript, it segfaults. This happens on 2.9.0 both on Linux and Mac: Rscript -e 'load("c2.RData"); c2[1]' ***

.Call()

2009 Apr 09

Hi guys, I want to transfer the following code from R into a .Call-compatible form. How can I do that? Thanks!!! INT sim; for(i in 1:sim){ if(i>2) genemat <- genemat[,sample(1:ncol(genemat))] ranklist[,1] <- apply(genemat, 1, function(x){ (mean(x[cols]) - mean(x[-cols]))/sd(x)}) ranklist <- ranklist[order(ranklist[,1]),]

repeat resampling with different subsample sizes

2013 Jan 18

Hi, I'm trying to write code (see below) to randomly resample measurements of one variable (here, the variable "counts" in the data frame "dat") with different subsample sizes. The code works fine for a single subsample size (in the code below = 10). I then tried to generalize this by writing a function with a loop, where in each loop the function

analyze subsample of dataframe

2008 Sep 16

Hi there, I'm dealing with a pretty big dataset (~22,000 entries) with numerous entries for every day over a period of several years. I have a column "judy" (for Julian Day) with 0 beginning on Jan. 1st of every new year (I want to compare tendencies between years). However, in order to control for a leap year (2004), I simply need to subtract 1 from every judy value for the year

Subsample points for mclust

2009 Jul 21

Hi all! I have an ordered vector of values. The distribution of these values can be modeled by a sum of Gaussians, so I'm using the package 'mclust' to get the Gaussians' parameters for this 1D distribution. It works very well, but for input sizes above 100,000 values it starts taking forever. Unfortunately my dataset has around 4.6M values... My question: is it

Size of subsample in ecodist mantel()

2012 Jun 28

What is the size of the bootstrapped subsample in ecodist mantel()? Thanks

Big Data reading subsample csv

2012 Aug 16

Hello, I'm most grateful for your time to read this. I have a very large 30GB file of 6 million records and 3000 columns (mostly categorical data) in csv format. I want to bootstrap subsamples for multinomial regression, but it's proving difficult: even with 64GB of RAM in my machine and a swap file twice that size, the process becomes super slow and halts. I'm thinking about generating

Random selection from a subsample

2010 Dec 19

Dear Mailing List I have a data set (data4) consisting of a number of factors and a response variable. I wish to randomly sample from a combination of two of those factors (GIS_station and Distance_code2) and return a new dataframe containing the original data structure (i.e. all the columns) but only containing the randomly selected rows. The number of rows in each combination of GIS_station

Where can I find information on how to subsample a time series?

2009 Jun 26

I suspect I'm looking in the wrong places, so guidance to the relevant documentation would be as welcome as a little code snippet. I have time series data stored in a MySQL database. There is the usual DATE field, along with a double precision number: there are daily values (including only normal working days: Monday through Friday). I actually have to do a couple things here. Because of

Sample of a subsample

2017 Sep 25

For personal aesthetic reasons, I changed the name "data" to "dat". Your code, with a slight modification:

set.seed (1357) ## for reproducibility
dat <- data.frame(var1=seq(1:40), var2=seq(40,1))
dat$sampleNo <- 0
idx <- sample(seq(1,nrow(dat)), size=10, replace=F)
dat[idx,"sampleNo"] <-1

## yielding
> dat
  var1 var2 sampleNo
1    1   40

Sample of a subsample

2017 Sep 25

Hello everybody! I have the following problem: I'd like to select a sample from a subsample in a dataset. Actually, I don't want to select it, but to create a new variable sampleNo that indicates which sample (one or two) a case belongs to. Let's suppose I have a dataset containing 40 cases: data <- data.frame(var1=seq(1:40), var2=seq(40,1)) The first sample (n=10) I drew like

Sample of a subsample

2017 Sep 25

Hi David, I was about to post a reply when Bert responded. His answer is good and his comment to use the name 'dat' rather than 'data' is instructive. I am providing my suggestion as well because I think it may address what was causing you some confusion (mainly to use "which", but also the missing !) idx2 <- sample( which( (!data$var1%%2) & data$sampleNo==0 ),
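Pulling the three snippets of this thread together, the two-stage draw might be sketched as follows. The size of the second sample is cut off in the snippet, so `size2 <- 5` is a placeholder assumption, and `eligible` and `size2` are names introduced here. The selection condition (even `var1`, not already in sample one) is taken from the `which()` call quoted above.

```r
set.seed(1357)                        # for reproducibility, as in the thread
dat <- data.frame(var1 = 1:40, var2 = 40:1)
dat$sampleNo <- 0

# First sample: 10 cases drawn from all rows.
idx <- sample(nrow(dat), size = 10)
dat$sampleNo[idx] <- 1

# Second sample: only cases with even var1 that are not yet in sample one.
eligible <- which(dat$var1 %% 2 == 0 & dat$sampleNo == 0)
size2 <- 5                            # placeholder; the real size is truncated in the post
# Index via sample.int() so that a single eligible row is not read as 1:n.
idx2 <- eligible[sample.int(length(eligible), size = min(size2, length(eligible)))]
dat$sampleNo[idx2] <- 2

table(dat$sampleNo)
```

Indexing through `sample.int()` sidesteps the documented behaviour of `sample(x)` when `x` is a single number, where it samples from `1:x` instead of returning `x`.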

how to subsample all possible combinations of n species taken 1:n at a time?

2009 Apr 06

Hello I apologise for the length of this entry but please bear with me. In short: I need a way of subsampling communities from all possible communities of n taxa taken 1:n at a time without having to calculate all possible combinations (because this gives me a memory error - using combn() or expand.grid() at least). Does anyone know of a function? Or can you help me edit the combn or
