Displaying 20 results from an estimated 1000 matches similar to: "Memory problem on a linux cluster using a large data set"
2007 Jan 10
1
Fw: Memory problem on a linux cluster using a large data set [Broadcast]
Hi
I listened to all your advice and ran my data on a computer with a 64-bit processor, but I still get the same error saying "it cannot allocate a vector of that size 1240 kb". I don't want to cut my data into smaller pieces because we are looking at interaction. So are there any other options for me to try out, or should I wait for the development of more advanced computers!
2006 Dec 21
1
Memory problem on a linux cluster using a large data set [Broadcast]
Thank you all for your help!
So with all your suggestions we will try to run it on a computer with a 64-bit processor. But I've been told that the new R versions all work on a 32-bit processor. I read in other posts that only the old R versions were capable of larger data sets and were running under 64-bit processors. I also read that they are adapting the new R version for 64 bits
2011 Dec 02
2
Imputing data
So I have a very big matrix of about 900 by 400 and there are a couple of NA
in the list. I have used the following functions to impute the missing data
data(pc)
pc.na<-pc
pc.roughfix <- na.roughfix(pc.na)
pc.narf <- randomForest(pc.na, na.action=na.roughfix)
yet it does not replace the NA in the list. Presently I want to replace the
NA with maybe the mean of the rows or columns or
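A minimal sketch of the column-mean replacement the poster describes, in base R. The matrix `m` and the helper `impute_col_mean` are hypothetical stand-ins, not part of the randomForest package. Note also that `na.roughfix()` returns a filled copy, so in the snippet above the result is in `pc.roughfix`, while `pc.na` itself is left unchanged.

```r
# Hypothetical stand-in for the poster's data: a small numeric matrix with NAs.
set.seed(1)
m <- matrix(rnorm(20), nrow = 5, ncol = 4)
m[2, 1] <- NA
m[4, 3] <- NA

# Replace each NA with the mean of its column (computed without the NAs).
impute_col_mean <- function(x) {
  col_means <- colMeans(x, na.rm = TRUE)
  na_idx <- which(is.na(x), arr.ind = TRUE)  # two-column matrix: row, col
  x[na_idx] <- col_means[na_idx[, "col"]]
  x
}

m_filled <- impute_col_mean(m)
```

For row means, the same idea should work with `rowMeans(x, na.rm = TRUE)` indexed by `na_idx[, "row"]`.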
2009 Jul 11
1
help with winbind and groups
Hello,
I have winbind working well out of the box. However, I am having
problems with using groups to restrict ssh access to the box. I have
a feeling there are some tricks that I haven't thought of yet.
Here are the relevant parts of smb.conf:
workgroup = FOO
password server = server.foo.local
realm = FOO.LOCAL
security = ads
idmap uid = 10000-20000
idmap gid =
2010 Jun 30
2
anyone know why package "RandomForest" na.roughfix is so slow??
Hi all,
I am using the package "random forest" for random forest predictions. I
like the package. However, I have fairly large data sets, and it can often
take *hours* just to go through the "na.roughfix" call, which simply goes
through and cleans up any NA values to either the median (numerical data) or
the most frequent occurrence (factors).
I am going to start
2007 Aug 10
1
rfImpute
I am having trouble with the rfImpute function in the randomForest package.
Here is a sample...
clunk.roughfix<-na.roughfix(clunk)
>
> clunk.impute<-rfImpute(CONVERT~.,data=clunk)
ntree OOB 1 2
300: 26.80% 3.83% 85.37%
ntree OOB 1 2
300: 18.56% 5.74% 51.22%
Error in randomForest.default(xf, y, ntree = ntree, ..., do.trace = ntree,
:
NA not
2012 Mar 26
1
NA in R package randomForest
I have a question regarding NA in randomForest (in R). I have a dataset
which include both numerical and non-numerical variables, and the data
includes some NA. I tried to use na.roughfix but then i get an error
message "na.roughfix only works for numeric or factor". I also tried
rfImpute but this does not work either because I have some NA in my
response variable. Does anyone have som
2006 Apr 26
2
help in R
Hi,
I can't understand where I am going wrong. Below is my code. I would really appreciate your help.
Thanks.
> genfile<-read.table("c:/tina/phd/bs871/hw/genfile.txt",skip=1)
>
> #read in SNP data
> snp.dat <- as.matrix(genfile)
> snp.name <- scan("c:/tina/phd/bs871/hw/genfile.txt",nline=1,what="character")
Read 100 items
2011 Jan 03
1
randomForest speed improvements
Hi there,
We're trying to use randomForest to do some predictions. The test-harness
for our code is pretty straightforward:
library ('randomForest');
data202 <- read.csv ("random.csv", header=TRUE);
x<- data202[1:50000,1:6];
y<- data202[1:50000,8];
y<- y[,drop=TRUE];
x2 <- data202[50001:60000,1:6];
y2 <- data202[50001:60000,8];
y2 <-
2011 Oct 27
3
2 x 3 Probability under the null
I have a 2 x 3 matrix called snp and I want to compute the following
probability:
choose(sum(snp[,1]), snp[1,1]) * choose(sum(snp[,2]), snp[1,2]) *
choose(sum(snp[,3]), snp[1,3])/choose(sum(snp), sum(snp[1,]))
but I keep getting Infs and NaNs. Is there a function that can do this in R?
--
Thanks,
Jim.
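One possible way around the Infs and NaNs, sketched here under the assumption that they come from `choose()` overflowing on large counts: do the arithmetic on the log scale with `lchoose()` and exponentiate once at the end. The counts in `snp` below are made up for illustration.

```r
# Hypothetical 2 x 3 table of counts standing in for `snp`.
snp <- matrix(c(1200, 900, 1500, 800, 1100, 950), nrow = 2)

# Sum the log binomial coefficients for the three columns, subtract the
# log of the denominator, then exponentiate once at the end.
log_p <- sum(lchoose(colSums(snp), snp[1, ])) -
  lchoose(sum(snp), sum(snp[1, ]))
p <- exp(log_p)
```

If the probability is itself too small to represent, `p` will underflow to 0, in which case `log_p` is the quantity to work with.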
2008 Feb 28
1
Errors melt()ing data...
Hi,
I'm trying to melt() some data for subsequent cast()ing and am
encountering errors.
The overall process requires a couple of cast()s and melt()s.
########Start Session 1##########
## I have the data in a (fully) melted format and can cast it fine...
> norm1[1:10,]
Pool SNP Sample.Name variable value
1 1 rs1045485 CA0092 Height.1 0.003488853
2 1 rs1045485
2007 Jan 21
2
efficient code. how to reduce running time?
Hi,
I am new to R,
and even though I've got my code to run and do what it needs to,
it is taking forever and I can't use it like this.
I was wondering if you could help me find ways to fix the code to run
faster.
Here are my codes..
the data set is a bunch of 0s and 1s in a data.frame.
What I am doing is this.
I pick a column and make up a new column Y with values associated with that
2011 Apr 04
1
automating regression or correlations for many variables
Dear All,
I have a large data frame with 10 rows and 82 columns. I want to apply the
same function to all of the columns with a single command. e.g. zl <- lm
(snp$a_109909 ~ snp$lat) will fit a linear model to the values in lat and
a_109909. What I want to do is fit linear models for the values in each
column against lat. I tried doing zl <- (snp[,2:82] ~ snp$lat[,1]) but got
the following
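A rough sketch of one way to fit the same model to every column, assuming a data frame shaped like the poster's (`lat` plus one column per marker). The data and the column names `a_1` to `a_3` are invented for illustration.

```r
# Hypothetical data frame standing in for `snp`: `lat` plus a few marker columns.
set.seed(42)
snp <- data.frame(lat = runif(10, 40, 60),
                  a_1 = rnorm(10), a_2 = rnorm(10), a_3 = rnorm(10))

response_cols <- setdiff(names(snp), "lat")

# One lm() per response column; reformulate() builds e.g. a_1 ~ lat.
fits <- lapply(response_cols, function(col) {
  lm(reformulate("lat", response = col), data = snp)
})
names(fits) <- response_cols

# Collect the slope on lat from each fit.
slopes <- sapply(fits, function(f) coef(f)[["lat"]])
```

Keeping the fits in a named list means `summary(fits$a_1)` still works for any single column. Passing a matrix of responses to `lm()` is another option when only the coefficients are needed.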
2009 Jul 26
1
splitting multiple data in one column into multiple rows with one entry per column
Dear R colleagues,
I annotated a list of single nucleotide polymorphisms (SNP) with the
corresponding genes using biomaRt. The result is the following
data.frame (pasted from R):
snp ensembl_gene_id
1 rs8032583
2 rs1071600 ENSG00000101605
3 rs13406898 ENSG00000167165
4 rs7030479
2012 Mar 05
1
Order a data frame based on the order of another data frame
Hi, I am trying to match the order of the rownames of a dataframe with
the rownames of another dataframe (I can't simply sort both sets
because I would have to change the order of many other connected
datasets if I did that): Also, the second dataset (snp.matrix$fam) is
a snp matrix slot:
so for example:
data_one:
x y
2003 Aug 05
1
na.action in randomForest --- Summary
A few days ago I asked whether there were options other than
na.action=na.fail for the R port of Breiman's randomForest; the function's
help page did not say anything about other options.
I have since discovered that a pdf document called "The randomForest
Package? and made available by Andy Liaw (who made the tool available in
R---thank you) does discuss an option. It is an implementation of
2013 Nov 08
1
SNPRelate: Plink conversion
Hi,
Following my earlier posts about having problems performing a PCA, I have
worked out what the problem is. The problem lies within the PLINK to gds
conversion.
It seems as though the SNPs are imported as "samples" and in turn, the
samples are recognised as SNPs:
>snpsgdsSummary("chr2L")
Some values of snp.position are invalid (should be > 0)!
Some values of
2020 Oct 09
0
2 D density plot interpretation and manipulating the data
Hi Abby,
Thanks for getting back to me, yes I believe I did that by doing this:
SNP$density <- get_density(SNP$mean, SNP$var)
> summary(SNP$density)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0 383 696 738 1170 1789
where get_density() is function from here:
https://slowkow.com/notes/ggplot2-color-by-density/
and keep only entries with density > 400
2024 Nov 15
1
R coding to extract allele frequencies from NCBI for ALL alleles of one SNP?
Dear All,
The following code extracts from NCBI very nice output for ONE allele of a SNP (often the allele with the second largest frequency - usually termed the minor allele). It gives an average minor allele frequency from all NCBI sources (which is what I want, except I'd like the addition of data for all the other alleles of one SNP) plus a table of minor allele frequencies from each
2008 Feb 07
1
Problems reshaping data with cast()
Hi,
I'm trying to cast() some data, but keep on getting the following error...
> norm.all.melted.height <- transform(all.melted.height,
+ norm.height = value / ave(value,
SNP, Pool, FUN = max)
+ )
Warning messages:
1: In FUN(X[[147L]], ...) :
no non-missing arguments to max; returning -Inf
2: In FUN(X[[147L]],