similar to: Memory problem on a linux cluster using a large data set [Broadcast]

Displaying 20 results from an estimated 1000 matches similar to: "Memory problem on a linux cluster using a large data set [Broadcast]"

2007 Jan 10
1
Fw: Memory problem on a linux cluster using a large data set [Broadcast]
Hi, I listened to all your advice and ran my data on a computer with a 64-bit processor, but I still get the same error saying "it cannot allocate a vector of that size 1240 kb". I don't want to cut my data into smaller pieces because we are looking at interaction. So are there any other options for me to try out, or should I wait for the development of more advanced computers?
2006 Dec 18
1
Memory problem on a linux cluster using a large data set
Hello, I have a large data set with 320,000 rows and 1000 columns. All the data has the values 0, 1, 2. I wrote a script to remove all the rows with more than 46 missing values. This works perfectly on a smaller data set, but when I try to run it on the larger data set I get an error “cannot allocate vector size 1240 kb”. I’ve searched through previous posts and found out that it might
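The row filter described in the post above can be sketched as follows. This is only an illustration under stated assumptions: the poster's script is not shown, and `geno`, `max_missing`, and the toy dimensions are made up here. Storing the 0/1/2 codes as integers rather than doubles halves the memory used by the matrix, which may or may not be enough to avoid the allocation error.

```r
# Small stand-in for the 320,000 x 1000 genotype matrix (values 0, 1, 2, NA).
# The object names and sizes are hypothetical.
set.seed(1)
geno <- matrix(sample(c(0L, 1L, 2L, NA), 20 * 100, replace = TRUE,
                      prob = c(0.3, 0.3, 0.3, 0.1)),
               nrow = 20, ncol = 100)

max_missing <- 10L  # the post uses 46 for 1000 columns

# Count NAs per row in one vectorised pass, then keep the rows at or
# below the threshold. drop = FALSE keeps the result a matrix even if
# zero or one row survives.
na_per_row <- rowSums(is.na(geno))
kept <- geno[na_per_row <= max_missing, , drop = FALSE]
```

`rowSums(is.na(geno))` still builds a logical matrix of the same shape, so on a very large input it may be necessary to process the rows in blocks.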
2011 Dec 02
2
Imputing data
So I have a very big matrix of about 900 by 400 and there are a couple of NA in the list. I have used the following functions to impute the missing data data(pc) pc.na<-pc pc.roughfix <- na.roughfix(pc.na) pc.narf <- randomForest(pc.na, na.action=na.roughfix) yet it does not replace the NA in the list. Presently I want to replace the NA with maybe the mean of the rows or columns or
2009 Jul 11
1
help with winbind and groups
Hello, I have winbind working well out of the box. However, I am having problems with using groups to restrict ssh access to the box. I have a feeling there are some tricks that I haven't thought of yet. Here are the relevant parts of smb.conf: workgroup = FOO password server = server.foo.local realm = FOO.LOCAL security = ads idmap uid = 10000-20000 idmap gid =
2007 Aug 10
1
rfImpute
I am having trouble with the rfImpute function in the randomForest package. Here is a sample... clunk.roughfix<-na.roughfix(clunk) > > clunk.impute<-rfImpute(CONVERT~.,data=clunk) ntree OOB 1 2 300: 26.80% 3.83% 85.37% ntree OOB 1 2 300: 18.56% 5.74% 51.22% Error in randomForest.default(xf, y, ntree = ntree, ..., do.trace = ntree, : NA not
2010 Jun 30
2
anyone know why package "RandomForest" na.roughfix is so slow??
Hi all, I am using the package "random forest" for random forest predictions. I like the package. However, I have fairly large data sets, and it can often take *hours* just to go through the "na.roughfix" call, which simply goes through and cleans up any NA values to either the median (numerical data) or the most frequent occurrence (factors). I am going to start
2012 Mar 26
1
NA in R package randomForest
I have a question regarding NA in randomForest (in R). I have a dataset which includes both numerical and non-numerical variables, and the data includes some NA. I tried to use na.roughfix but then I get an error message "na.roughfix only works for numeric or factor". I also tried rfImpute but this does not work either because I have some NA in my response variable. Does anyone have som
2007 Oct 11
2
Matching and merging two rows with missing values
Hello, I have two rows which are almost identical but miss different values at different locations. I would like to merge these two rows so that the missing values are replaced by the element in the same column on the other row making one row. If both rows contain a NA, NA remains in the output. 1 2 3 4 5 Row1 AA AG GG NA NA Row2 NA AG GG AA NA The
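The merge rule stated in the post above (take the value from the other row wherever one row is NA, and keep NA when both are missing) can be sketched with base R's `ifelse()`. The vectors `row1` and `row2` are typed in from the example in the post; the name `merged` is an assumption made for illustration, and the sketch does not check that the two rows agree where both have values.

```r
# The two example rows from the post, as character vectors.
row1 <- c("AA", "AG", "GG", NA, NA)
row2 <- c(NA, "AG", "GG", "AA", NA)

# Where row1 is missing, fall back to row2. If row2 is also missing,
# the result stays NA.
merged <- ifelse(is.na(row1), row2, row1)
```

For rows stored in a data frame or matrix, the same expression works on `unlist(df[1, ])` and `unlist(df[2, ])`, provided the columns share a type.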
2007 Oct 30
0
Plotting question: how to plot SNP location data?
Hello, I would like to plot specific SNPs with their exact locations on a chromosome. Based on my genotyping results I would like to separate these SNPs in three different categories: 1, 2 and 3 and use different colours to represent these categories. The script below generates the sample data. I can plot these with the image function using the following: val <- 1:3 samp <- sample(val,
2006 Apr 26
2
help in R
Hi, I can't understand where I am going wrong. Below is my code. I would really appreciate your help. Thanks. > genfile<-read.table("c:/tina/phd/bs871/hw/genfile.txt",skip=1) > > #read in SNP data > snp.dat <- as.matrix(genfile) > snp.name <- scan("c:/tina/phd/bs871/hw/genfile.txt",nline=1,what="character") Read 100 items
2011 Jan 03
0
Using PCA to correct p-values from snpMatrix
Hi R-help folks, I have been doing some single SNP association work using snpMatrix. This works well, but produces a lot of false positives, because of population structure in my data. I would like to correct the p-values (which snpMatrix gives me) for population structure, possibly using principal component analysis (PCA). My data is complicated, so here's a simple example of what
2008 Feb 28
1
Errors melt()ing data...
Hi, I'm trying to melt() some data for subsequent cast()ing and am encountering errors. The overall process requires a couple of cast()s and melt()s. ########Start Session 1########## ## I have the data in a (fully) melted format and can cast it fine... > norm1[1:10,] Pool SNP Sample.Name variable value 1 1 rs1045485 CA0092 Height.1 0.003488853 2 1 rs1045485
2013 Jan 04
0
FW: Index out SNP position
I think you mean between column 1 and 2 of A? Why is 36003918 not included? It is clearly between 35838396 and 36151202 in the first row of A. My earlier solution should work fine. Just create a new matrix AX that has the columns switched so that the start is always column 1 and use that to identify the ones you want to select. That way you are not modifying B. This will be faster than checking
2011 Oct 27
3
2 x 3 Probability under the null
I have a 2 x 3 matrix called snp and I want to compute the following probability: choose(sum(snp[,1]), snp[1,1]) * choose(sum(snp[,2]), snp[1,2]) * choose(sum(snp[,3]), snp[1,3])/choose(sum(snp), sum(snp[1,])) but I keep getting Infs and NaNs. Is there a function that can do this in R? -- Thanks, Jim. [[alternative HTML version deleted]]
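One common way around overflow in products of binomial coefficients is to work on the log scale with base R's `lchoose()`. The sketch below computes the same expression as in the post above, assuming `snp` holds non-negative counts. The helper name `table_prob` and the example counts are hypothetical, chosen only so that plain `choose()` overflows.

```r
# Probability from the post, computed on the log scale to avoid Inf/NaN.
# `table_prob` is a hypothetical helper name, not part of any package.
table_prob <- function(snp, log = FALSE) {
  col_totals <- colSums(snp)
  # Sum of log binomial coefficients in the numerator, minus the log of
  # the denominator coefficient.
  log_p <- sum(lchoose(col_totals, snp[1, ])) -
    lchoose(sum(snp), sum(snp[1, ]))
  if (log) log_p else exp(log_p)
}

# Made-up counts large enough that choose() overflows to Inf.
snp <- matrix(c(1200, 3400,  900,
                1500, 3100,  850), nrow = 2, byrow = TRUE)
log_p <- table_prob(snp, log = TRUE)
```

Keeping the result as a log probability is often the safer choice, since `exp()` of a very negative value underflows to 0.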
2007 Jan 21
2
efficient code. how to reduce running time?
Hi, I am new to R, and even though I've made my code run and do what it needs to, it is taking forever and I can't use it like this. I was wondering if you could help me find ways to fix the code to run faster. Here is my code. The data set is a bunch of 0s and 1s in a data.frame. What I am doing is this. I pick a column and make up a new column Y with values associated with that
2009 Jan 10
0
Rserve/RandomForest does not work with a CSV?
Hi all, We're using Rserve and RandomForest to do classification from within a Java program. The total is about 4 lines of R code: library('randomForest') x y future fit<-randomForest(x,y,no.action=na.roughfix,importance=T,proximity=T) p<-predict(fit, future) What is very frustrating is that we have tried this two different ways (both work in R): 1. Load x, y, and future
2011 Apr 04
1
automating regression or correlations for many variables
Dear All, I have a large data frame with 10 rows and 82 columns. I want to apply the same function to all of the columns with a single command. e.g. zl <- lm (snp$a_109909 ~ snp$lat) will fit a linear model to the values in lat and a_109909. What I want to do is fit linear models for the values in each column against lat. I tried doing zl <- (snp[,2:82] ~ snp$lat[,1]) but got the following
2012 Jan 26
0
Request for help on manipulation large data sets
Dear All, I would like to ask for help on how to read different files automatically and do analysis using scripts. 1. Description of the data 1.1. there are 5 text files, each of which contains cleaned data for the same 100 SNPs. Observations (e.g., position on genome, allele type, ...) for SNPs are rows ordered by the SNP numbers, 1.2. there is 1 text file, containing the expression level of
2009 May 28
1
can you help me please :)
Hi there :) I want to use barplot with if else but I don't know how to do it. I tried this but it is not working for me SNP <- read.table("my.txt") >SNP[,2] [1] 1175 483 240 170 99 79 76 45 38 35 21 16 14 19 16 [16] 3 3 3 10 2 1 6 8 6 8 2 0 5 1 1 [31] 1 0 6 2 0 13 0 5 0 5 0
2020 Oct 09
0
2 D density plot interpretation and manipulating the data
My understanding is that this represents bivariate normal approximation of the data which uses the kernel density function to test for inclusion within a level set. (please correct me) In order to exclude the outlier to these ellipses/contours is it advisable to do something like this: SNP$density <- get_density(SNP$mean, SNP$var) > summary(SNP$density) Min. 1st Qu. Median Mean 3rd