Displaying 20 results from an estimated 2000 matches similar to: "Imputing data"
2006 Dec 18
1
Memory problem on a linux cluster using a large data set
Hello,
I have a large data set with 320,000 rows and 1000 columns. All the data take the values 0, 1 or 2.
I wrote a script to remove all the rows with more than 46 missing values. This works perfectly on a smaller data set, but when I try to run it on the larger data set I get the error “cannot allocate vector size 1240 kb”. I’ve searched through previous posts and found out that it might
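A rough size check helps here: 320,000 × 1000 = 3.2 × 10^8 cells. Stored as double (8 bytes per cell) that is about 2.56 GB; stored as integer (4 bytes) about 1.28 GB; and calling is.na() on the whole matrix builds a logical matrix of the same size (R logicals also take 4 bytes per cell). The sketch below assumes the data sit in an integer matrix, here called `m` (a hypothetical name), with NA marking missing values. It counts missing values per row one block of rows at a time, so the full-size logical copy is never created at once; the helper name and block size are illustrative choices, not part of the original script.

```r
# Hypothetical toy stand-in for the real 320,000 x 1000 matrix of 0/1/2 codes.
# An integer matrix uses 4 bytes per cell; a double matrix would use 8.
m <- matrix(0L, nrow = 5, ncol = 100)
m[2, 1:50] <- NA  # 50 missing values: row should be dropped
m[4, 1:46] <- NA  # exactly 46 missing values: row should be kept

# Count NAs per row one block of rows at a time, so is.na() never builds
# a logical copy of the whole matrix. `chunk` is an arbitrary block size.
count_na_by_row <- function(m, chunk = 10000L) {
  out <- numeric(nrow(m))
  for (start in seq(1L, nrow(m), by = chunk)) {
    idx <- start:min(start + chunk - 1L, nrow(m))
    out[idx] <- rowSums(is.na(m[idx, , drop = FALSE]))
  }
  out
}

keep <- count_na_by_row(m, chunk = 2L) <= 46
m_clean <- m[keep, , drop = FALSE]
nrow(m_clean)  # 4
```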
2007 Jan 10
1
Fw: Memory problem on a linux cluster using a large data set [Broadcast]
Hi
I listened to all your advice and ran my data on a computer with a 64-bit processor, but I still get the same error saying "it cannot allocate a vector of that size 1240 kb". I don't want to cut my data into smaller pieces because we are looking at interactions. So are there any other options for me to try, or should I wait for the development of more advanced computers?
2007 Aug 10
1
rfImpute
I am having trouble with the rfImpute function in the randomForest package.
Here is a sample...
clunk.roughfix<-na.roughfix(clunk)
>
> clunk.impute<-rfImpute(CONVERT~.,data=clunk)
ntree OOB 1 2
300: 26.80% 3.83% 85.37%
ntree OOB 1 2
300: 18.56% 5.74% 51.22%
Error in randomForest.default(xf, y, ntree = ntree, ..., do.trace = ntree, :
NA not
2010 Jun 30
2
Does anyone know why na.roughfix in package "randomForest" is so slow?
Hi all,
I am using the package "randomForest" for random forest predictions. I
like the package. However, I have fairly large data sets, and it can often
take *hours* just to go through the "na.roughfix" call, which simply goes
through and cleans up any NA values to either the median (numerical data) or
the most frequent occurrence (factors).
I am going to start
2006 Dec 21
1
Memory problem on a linux cluster using a large data set [Broadcast]
Thank you all for your help!
So, following all your suggestions, we will try to run it on a computer with a 64-bit processor. But I've been told that the new R versions all work on a 32-bit processor. I read in other posts that only the old R versions were capable of handling larger data sets and were running on 64-bit processors. I also read that they are adapting the new R version for 64 bits
2011 Oct 11
1
Mean or mode imputation for missing values
Dear R experts,
I have a large database made up of mixed data types (numeric,
character, factor, ordinal factor) with missing values, and I am
looking for a package that would help me impute the missing values
using either the mean if numerical or the mode if character/factor.
I could perhaps use replacement like this:
df$var[is.na(df$var)] <- mean(df$var, na.rm = TRUE)
And go through all the many
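Rather than going through the columns one by one, the replacement can be applied to every column with lapply(). A base-R sketch, assuming numeric columns get the mean and every other column (character, factor, ordered factor) gets its most frequent non-missing value. `stat_mode()` and `impute_mean_mode()` are hypothetical helper names, not functions from an existing package, and ties in the mode go to whichever value appears first in the column.

```r
# Most frequent non-missing value; ties go to the value that appears first.
stat_mode <- function(x) {
  ux <- unique(x[!is.na(x)])
  ux[which.max(tabulate(match(x, ux)))]
}

# Mean for numeric columns, mode for everything else (character, factor, ...).
impute_mean_mode <- function(df) {
  df[] <- lapply(df, function(x) {
    miss <- is.na(x)
    if (!any(miss)) return(x)
    if (is.numeric(x)) {
      x[miss] <- mean(x, na.rm = TRUE)
    } else {
      x[miss] <- stat_mode(x)
    }
    x
  })
  df
}

# Hypothetical toy data with one NA per column.
df <- data.frame(num = c(1, NA, 3),
                 chr = c("a", "a", NA),
                 fac = factor(c("x", NA, "y")),
                 stringsAsFactors = FALSE)
impute_mean_mode(df)
```

Using `df[] <- lapply(...)` keeps the result a data frame with the original column names and order.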
2003 Aug 05
1
na.action in randomForest --- Summary
A few days ago I asked whether there were options other than
na.action=na.fail for the R port of Breiman's randomForest; the function's
help page did not say anything about other options.
I have since discovered that a pdf document called "The randomForest
Package" and made available by Andy Liaw (who made the tool available in
R, thank you) does discuss an option. It is an implementation of
2012 Aug 11
1
Imputing data below detection limit
Hello,
I'm trying to impute data below the detection limit (with multiple detection
limits),
so I just need a method or code for the imputation and then to extract the
complete dataset to do the analyses.
Is there a package that could do this simply? I'm a beginner in R.
Thank you
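One crude but common approach is single-value substitution: each value below its detection limit is replaced by that limit divided by a constant (sqrt(2) in this sketch; LOD/2 is another frequent choice). This is a shortcut rather than a proper imputation model, and because every censored value in a group gets the same number it can distort variances. Multiple detection limits are handled by keeping one limit per observation. The helper name, the `censored` flag and the toy numbers are hypothetical.

```r
# value:    measured concentrations (whatever is recorded for censored rows is ignored)
# censored: TRUE where the assay reported "below detection limit"
# lod:      the detection limit that applied to each observation
substitute_below_lod <- function(value, censored, lod, divisor = sqrt(2)) {
  stopifnot(length(value) == length(censored), length(value) == length(lod))
  value[censored] <- lod[censored] / divisor
  value
}

# Hypothetical toy data with two different detection limits (0.5 and 1.0).
value    <- c(3.2, NA, 5.0, NA)
censored <- c(FALSE, TRUE, FALSE, TRUE)
lod      <- c(0.5, 0.5, 1.0, 1.0)
substitute_below_lod(value, censored, lod)
```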
2005 Jan 11
1
transcan() from Hmisc package for imputing data
Hello:
I have been trying to impute missing values of a data
frame which has both numerical and categorical values
using the function transcan() with little luck.
Would you be able to give me a simple example where a
data frame is fed to transcan and it spits out a new
data frame with the NA values filled in?
Or is there any other function that I could use?
Thank you
avneet
2008 Dec 22
1
imputing the numerical columns of a dataframe, returning the rest unchanged
Hi R-experts,
How can I apply a function to each numeric column of a data frame and return
the whole data frame with changes in numeric columns only?
In my case I want to do a median imputation of the numeric columns and
retain the other columns. My dataframe (DF) contains factors, characters and
numerics.
I tried the following but that does not work:
foo <- function(x){
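A minimal base-R sketch of one way to do this, assuming the data frame is `DF` as in the post: select the numeric columns with vapply(), overwrite only those columns with lapply(), and return the whole data frame so that factor and character columns are left untouched. The function name and toy data are hypothetical.

```r
# Median-impute numeric columns only; factor and character columns are returned as-is.
median_impute_numeric <- function(DF) {
  is_num <- vapply(DF, is.numeric, logical(1))
  DF[is_num] <- lapply(DF[is_num], function(x) {
    x[is.na(x)] <- median(x, na.rm = TRUE)
    x
  })
  DF
}

# Hypothetical toy data with factor, character and numeric columns.
DF <- data.frame(f  = factor(c("a", NA, "b", "a")),
                 ch = c("p", "q", NA, "r"),
                 n  = c(1, NA, 5, 3),
                 stringsAsFactors = FALSE)
median_impute_numeric(DF)
```

Assigning to `DF[is_num]` rather than building a new data frame is what keeps the non-numeric columns, and the column order, unchanged.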
2009 Jul 11
1
help with winbind and groups
Hello,
I have winbind working well out of the box. However, I am having
problems with using groups to restrict ssh access to the box. I have
a feeling there are some tricks that I haven't thought of yet.
Here are the relevant parts of smb.conf:
workgroup = FOO
password server = server.foo.local
realm = FOO.LOCAL
security = ads
idmap uid = 10000-20000
idmap gid =
2007 Jun 22
1
Imputing missing values in time series
Folks,
This must be a rather common problem with real-life time series data
but I don't see anything in the archive about how to deal with it. I
have a time series of natural gas prices by flow date. Since gas is not
traded on weekends and holidays, I have a lot of missing values,
FDate Price
11/1/2006 6.28
11/2/2006 6.58
11/3/2006 6.586
11/4/2006 6.716
11/5/2006 NA
11/6/2006 NA
11/7/2006
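For prices that are simply not quoted on weekends and holidays, one common choice is to carry the last observed price forward. A base-R sketch using the prices from the post; `locf` is a hypothetical helper name (the zoo package's na.locf() does the same job). Leading NAs stay NA because there is nothing to carry forward.

```r
# Last observation carried forward: each NA takes the most recent non-NA value.
locf <- function(x) {
  idx <- ifelse(is.na(x), 0L, seq_along(x))  # position of each observed value, 0 for NA
  idx <- cummax(idx)                          # latest observed position so far
  idx[idx == 0L] <- NA_integer_               # leading NAs have nothing to carry
  x[idx]
}

# Prices for 11/1/2006 to 11/6/2006 from the post.
price <- c(6.28, 6.58, 6.586, 6.716, NA, NA)
locf(price)
```

If linear interpolation between trading days is preferred instead, base R's approx() can fill interior gaps.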
2004 Sep 01
3
Imputing missing values
Dear all,
Apologies for this beginner's question. I have a
variable Price, which is associated with factors
Season and Crop, each of which has several levels.
The Price variable contains missing values (NA), which
I want to substitute by the mean of the remaining
(non-NA) Price values of the same Season-Crop
combination of levels.
Price Crop Season
10 Rice Summer
12
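Base R's ave() computes a statistic within each combination of grouping factors and returns it aligned with the original rows, which matches the Season-Crop mean described above. A sketch with hypothetical toy data (only the first row comes from the post's table); `impute_group_mean` is an illustrative name.

```r
# Replace each NA Price with the mean of the non-NA prices in its Season-Crop cell.
impute_group_mean <- function(price, ...) {
  grp_mean <- ave(price, ..., FUN = function(v) mean(v, na.rm = TRUE))
  ifelse(is.na(price), grp_mean, price)
}

# Hypothetical toy data; only the first row is taken from the post.
d <- data.frame(Price  = c(10, NA, 14, 20, NA),
                Crop   = c("Rice", "Rice", "Rice", "Wheat", "Wheat"),
                Season = c("Summer", "Summer", "Summer", "Winter", "Winter"))
d$Price <- impute_group_mean(d$Price, d$Season, d$Crop)
d$Price
```

A Season-Crop combination whose prices are all NA would get NaN, since there is no observed value to average.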
2009 Sep 21
9
Handling missing data
I have to remove missing data from both character and numeric columns. I tried
using an NA condition, but it is not working. Please help me solve this.
--
View this message in context: http://www.nabble.com/Handling-missing-data-tp25530192p25530192.html
Sent from the R help mailing list archive at Nabble.com.
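One possible reason an NA test "does not work" on character columns is that missing entries are stored as empty strings rather than NA. A sketch, assuming blank or whitespace-only strings should count as missing: convert them to NA, then keep only complete rows with complete.cases(). The function name and toy data are hypothetical.

```r
# Treat blank or whitespace-only strings as missing, then keep complete rows.
drop_incomplete <- function(df, blank_is_na = TRUE) {
  if (blank_is_na) {
    is_chr <- vapply(df, is.character, logical(1))
    df[is_chr] <- lapply(df[is_chr], function(x) {
      x[which(trimws(x) == "")] <- NA
      x
    })
  }
  df[complete.cases(df), , drop = FALSE]
}

# Hypothetical toy data: row 2 has a blank name, row 3 a missing id.
df <- data.frame(id = c(1, 2, NA, 4),
                 name = c("a", "", "c", "d"),
                 stringsAsFactors = FALSE)
drop_incomplete(df)
```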
2004 Mar 15
2
imputation of sub-threshold values
Is there a good way in R to impute values which exist,
but are less than the detection level for an assay?
Thanks,
Jonathan Williams
OPTIMA
Radcliffe Infirmary
Woodstock Road
OXFORD OX2 6HE
Tel +1865 (2)24356
2008 Oct 29
1
Help with impute.knn
Dear all,
This is my first time using this listserv and I am seeking help from the
experts. OK, here is my question: I am trying to use the impute.knn function
in the impute library, and when I tested the sample code I got the following
error:
Here is the sample code:
library(impute)
data(khanmiss)
khan.expr <- khanmiss[-1, -(1:2)]
## ## First example
## if(exists(".Random.seed"))
2011 Mar 02
2
*** caught segfault *** when using impute.knn (impute package)
Hi,
I am getting an error when calling the impute.knn
function (see the screenshot below).
what is the problem here and how can it be solved?
screenshot:
##################
*** caught segfault ***
address 0x513c7b84, cause 'memory not mapped'
Traceback:
1: .Fortran("knnimp", x, ximp = x, p, n, imiss = imiss, irmiss,
as.integer(k), double(p), double(n), integer(p),
2008 Jun 30
3
Is there a good package for multiple imputation of missing values in R?
I'm looking for a package that has a state-of-the-art method of
imputation of missing values in a data frame with both continuous and
factor columns.
I've found transcan() in 'Hmisc', which appears to be possibly suited
to my needs, but I haven't been able to figure out how to get a new
data frame with the imputed values filled in (I don't have Harrell's book).
Any
2008 Mar 05
1
rrp.impute: for what sizes does it work?
Hi,
I have a survey dataset of about 20000 observations
where for 2 factor variables I have about 200 missing
values each. I want to impute these using 10 potentially
explanatory variables which are a mixture of integers
and factors.
Since I was quite intrigued by the concept of rrp I
wanted to use it but it takes ages and terminates with
an error. The first time, it stopped complaining about too
little
2012 Mar 26
1
NA in R package randomForest
I have a question regarding NA in randomForest (in R). I have a dataset
which includes both numerical and non-numerical variables, and the data
include some NA values. I tried to use na.roughfix, but then I get the error
message "na.roughfix only works for numeric or factor". I also tried
rfImpute, but this does not work either because I have some NA in my
response variable. Does anyone have som
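The error message itself points at the fix for na.roughfix: character columns need to become factors first. For rfImpute, which the poster reports fails when the response has NA, one option is to drop rows whose response is missing before imputing the predictors. A base-R preparation sketch; `prep_for_roughfix` and the column names are hypothetical, and the randomForest call is left as a comment so the block runs without the package installed.

```r
# Prepare a data frame for randomForest's NA helpers:
#  - drop rows whose response is NA (the poster reports rfImpute fails on them)
#  - convert character columns to factors (na.roughfix accepts numeric or factor)
prep_for_roughfix <- function(df, response) {
  df <- df[!is.na(df[[response]]), , drop = FALSE]
  is_chr <- vapply(df, is.character, logical(1))
  df[is_chr] <- lapply(df[is_chr], factor)
  df
}

# Hypothetical toy data: y is the response, x a character predictor.
df <- data.frame(y = c(1, NA, 3, 4),
                 x = c("a", "b", NA, "a"),
                 stringsAsFactors = FALSE)
prepared <- prep_for_roughfix(df, "y")
# With the randomForest package installed, one could then call:
# fixed <- randomForest::na.roughfix(prepared)
```

Missing values in the predictors are kept on purpose, so that na.roughfix or rfImpute can still fill them.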