thr3ads.net - similar to: "Fw: Memory problem on a linux cluster using a large data set [Broadcast]"

Displaying 20 results from an estimated 1000 matches similar to: "Fw: Memory problem on a linux cluster using a large data set [Broadcast]"

Memory problem on a linux cluster using a large data set

2006 Dec 18

Hello, I have a large data set 320.000 rows and 1000 columns. All the data has the values 0,1,2. I wrote a script to remove all the rows with more than 46 missing values. This works perfect on a smaller dataset. But the problem arises when I try to run it on the larger data set I get an error “cannot allocate vector size 1240 kb”. I’ve searched through previous posts and found out that it might
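The row filter described in this post can be sketched in base R as follows. This is only an illustration under assumptions: the post's script is not shown, so the function name `drop_sparse_rows`, the toy matrix `geno`, and the use of `NA` for missing values are all hypothetical. Storing the 0/1/2 codes as integers rather than doubles halves the memory per cell, which may matter at 320.000 x 1000.

```r
# Hypothetical helper: keep rows with at most `max_missing` NA values.
drop_sparse_rows <- function(x, max_missing = 46) {
  # rowSums(is.na(x)) counts missing entries per row without an explicit loop
  x[rowSums(is.na(x)) <= max_missing, , drop = FALSE]
}

# Toy integer matrix standing in for the real data (values 0, 1, 2, NA)
geno <- matrix(c(0L, 1L, NA, 2L,
                 NA, NA, NA, 1L,
                 2L, 2L, 0L, 1L),
               nrow = 3, byrow = TRUE)

# With a threshold of 1 for this toy case, the second row (3 NAs) is dropped
filtered <- drop_sparse_rows(geno, max_missing = 1)
```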

Memory problem on a linux cluster using a large data set [Broadcast]

2006 Dec 21

Thank you all for your help! So with all your suggestions we will try to run it on a computer with a 64-bit processor. But I've been told that the new R versions all work on a 32-bit processor. I read in other posts that only the old R versions were capable of larger data sets and were running under 64-bit processors. I also read that they are adapting the new R version for 64 bits

Imputing data

2011 Dec 02

So I have a very big matrix of about 900 by 400 and there are a couple of NA in the list. I have used the following functions to impute the missing data data(pc) pc.na<-pc pc.roughfix <- na.roughfix(pc.na) pc.narf <- randomForest(pc.na, na.action=na.roughfix) yet it does not replace the NA in the list. Presently I want to replace the NA with maybe the mean of the rows or columns or

anyone know why package "RandomForest" na.roughfix is so slow??

2010 Jun 30

Hi all, I am using the package "random forest" for random forest predictions. I like the package. However, I have fairly large data sets, and it can often take *hours* just to go through the "na.roughfix" call, which simply goes through and cleans up any NA values to either the median (numerical data) or the most frequent occurrence (factors). I am going to start

rfImpute

2007 Aug 10

I am having trouble with the rfImpute function in the randomForest package. Here is a sample... clunk.roughfix<-na.roughfix(clunk) > > clunk.impute<-rfImpute(CONVERT~.,data=clunk) ntree OOB 1 2 300: 26.80% 3.83% 85.37% ntree OOB 1 2 300: 18.56% 5.74% 51.22% Error in randomForest.default(xf, y, ntree = ntree, ..., do.trace = ntree, : NA not

Matching and merging two rows with missing values

2007 Oct 11

Hello, I have two rows which are almost identical but miss different values at different locations. I would like to merge these two rows so that the missing values are replaced by the element in the same column on the other row making one row. If both rows contain a NA, NA remains in the output. 1 2 3 4 5 Row1 AA AG GG NA NA Row2 NA AG GG AA NA The
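One way to sketch the merge described in this post, assuming the two rows are available as plain character vectors (the names `row1`, `row2` and `merged` are illustrative, and the rest of the original question is cut off, so this may not match the poster's actual data layout):

```r
# The two example rows from the post, as character vectors
row1 <- c("AA", "AG", "GG", NA, NA)
row2 <- c(NA, "AG", "GG", "AA", NA)

# Take row1 where it is present, otherwise fall back to row2;
# positions missing in both rows stay NA
merged <- ifelse(is.na(row1), row2, row1)
```

This does not check whether the two rows disagree where both are present; in that case the value from `row1` is kept.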

NA in R package randomForest

2012 Mar 26

I have a question regarding NA in randomForest (in R). I have a dataset which includes both numerical and non-numerical variables, and the data includes some NA. I tried to use na.roughfix but then I get an error message "na.roughfix only works for numeric or factor". I also tried rfImpute but this does not work either because I have some NA in my response variable. Does anyone have som

help in R

2006 Apr 26

Hi, I can't understand where I am going wrong. Below is my code. I would really appreciate your help. Thanks. > genfile<-read.table("c:/tina/phd/bs871/hw/genfile.txt",skip=1) > > #read in SNP data > snp.dat <- as.matrix(genfile) > snp.name <- scan("c:/tina/phd/bs871/hw/genfile.txt",nline=1,what="character") Read 100 items

2 x 3 Probability under the null

2011 Oct 27

I have a 2 x 3 matrix called snp and I want to compute the following probability: choose(sum(snp[,1]), snp[1,1]) * choose(sum(snp[,2]), snp[1,2]) * choose(sum(snp[,3]), snp[1,3])/choose(sum(snp), sum(snp[1,])) but I keep getting Infs and NaNs. Is there a function that can do this in R? -- Thanks, Jim.
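The Infs and NaNs here are most likely overflow: choose() returns a double, and with large counts both the numerator and the denominator exceed the double range. A common workaround, sketched below, is to evaluate the same expression on the log scale with base R's lchoose() and exponentiate at the end. The function name `log_prob_null` and the example counts are assumptions made for illustration, not taken from the thread.

```r
# Hypothetical helper: log of
#   prod_j choose(colsum_j, snp[1, j]) / choose(sum(snp), sum(snp[1, ]))
log_prob_null <- function(snp) {
  # lchoose(n, k) is log(choose(n, k)), so products become sums
  sum(lchoose(colSums(snp), snp[1, ])) - lchoose(sum(snp), sum(snp[1, ]))
}

# Made-up 2 x 3 count matrix, large enough that choose() overflows to Inf
snp <- matrix(c(1200, 800, 300,
                1100, 900, 350),
              nrow = 2, byrow = TRUE)

log_p <- log_prob_null(snp)  # finite on the log scale
p <- exp(log_p)              # may underflow to 0 for very large tables
```

If the probability itself underflows, keeping `log_p` and comparing on the log scale avoids the problem.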

Errors melt()ing data...

2008 Feb 28

Hi, I'm trying to melt() some data for subsequent cast()ing and am encountering errors. The overall process requires a couple of cast()s and melt()s. ########Start Session 1########## ## I have the data in a (fully) melted format and can cast it fine... > norm1[1:10,] Pool SNP Sample.Name variable value 1 1 rs1045485 CA0092 Height.1 0.003488853 2 1 rs1045485

efficient code. how to reduce running time?

2007 Jan 21

Hi, I am new to R, and even though I've made my code run and do what it needs to, it is taking forever and I can't use it like this. I was wondering if you could help me find ways to fix the code to run faster. Here is my code. The data set is a bunch of 0s and 1s in a data.frame. What I am doing is this. I pick a column and make up a new column Y with values associated with that

automating regression or correlations for many variables

2011 Apr 04

Dear All, I have a large data frame with 10 rows and 82 columns. I want to apply the same function to all of the columns with a single command. e.g. zl <- lm (snp$a_109909 ~ snp$lat) will fit a linear model to the values in lat and a_109909. What I want to do is fit linear models for the values in each column against lat. I tried doing zl <- (snp[,2:82] ~ snp$lat[,1]) but got the following

splitting multiple data in one column into multiple rows with one entry per column

2009 Jul 26

Dear R colleagues, I annotated a list of single nucleotide polymorphisms (SNP) with the corresponding genes using biomaRt. The result is the following data.frame (pasted from R): snp ensembl_gene_id 1 rs8032583 2 rs1071600 ENSG00000101605 3 rs13406898 ENSG00000167165 4 rs7030479

Order a data frame based on the order of another data frame

2012 Mar 05

Hi, I am trying to match the order of the rownames of a dataframe with the rownames of another dataframe (I can't simply sort both sets because I would have to change the order of many other connected datasets if I did that): Also, the second dataset (snp.matrix$fam) is a snp matrix slot: so for example: data_one: x y

na.action in randomForest --- Summary

2003 Aug 05

A few days ago I asked whether there were options other than na.action=na.fail for the R port of Breiman's randomForest; the function's help page did not say anything about other options. I have since discovered that a pdf document called "The randomForest Package" and made available by Andy Liaw (who made the tool available in R---thank you) does discuss an option. It is an implementation of

SNPRelate: Plink conversion

2013 Nov 08

Hi, Following my earlier posts about having problems performing a PCA, I have worked out what the problem is. The problem lies within the PLINK to gds conversion. It seems as though the SNPs are imported as "samples" and in turn, the samples are recognised as SNPs: >snpsgdsSummary("chr2L") Some values of snp.position are invalid (should be > 0)! Some values of

2 D density plot interpretation and manipulating the data

2020 Oct 09

Hi Abby, Thanks for getting back to me, yes I believe I did that by doing this: SNP$density <- get_density(SNP$mean, SNP$var) > summary(SNP$density) Min. 1st Qu. Median Mean 3rd Qu. Max. 0 383 696 738 1170 1789 where get_density() is a function from here: https://slowkow.com/notes/ggplot2-color-by-density/ and keep only entries with density > 400

Problems reshaping data with cast()

2008 Feb 07

Hi, I'm trying to cast() some data, but keep on getting the following error... > norm.all.melted.height <- transform(all.melted.height, + norm.height = value / ave(value, SNP, Pool, FUN = max) + ) Warning messages: 1: In FUN(X[[147L]], ...) : no non-missing arguments to max; returning -Inf 2: In FUN(X[[147L]],

permutation test - query

2009 Aug 31

Hi, My query is regarding permutation tests and reshuffling of genotype/phenotype data. I have been using the haplo.stats package of R for haplotype analysis and I would like to perform an analysis for which I'm requesting your advice. I have a data set of individuals genotyped for 12 SNP and a dichotomous phenotype. At first, I have tested each of those SNP independently in order to bypass

Error in inherits(x, "data.frame") : subscript out of bounds

2010 Mar 05

Hi, I have a list p with different size dataframes and length of over 8000. I'm trying to calculate correlations between the rows of dataframes of this list and columns of another dataset (type data.frame also) so that first column is correlated with all the rows in the list dataframe. Some information from another dataset is also included to the final output (all.corrs). This worked a
