similar to: Imputing data

Displaying 20 results from an estimated 2000 matches similar to: "Imputing data"

2006 Dec 18
1
Memory problem on a linux cluster using a large data set
Hello, I have a large data set with 320,000 rows and 1000 columns. All the data has the values 0, 1, 2. I wrote a script to remove all the rows with more than 46 missing values. This works perfectly on a smaller data set, but when I try to run it on the larger data set I get the error “cannot allocate vector size 1240 kb”. I’ve searched through previous posts and found out that it might
2007 Jan 10
1
Fw: Memory problem on a linux cluster using a large data set [Broadcast]
Hi, I listened to all your advice and ran my data on a computer with a 64-bit processor, but I still get the same error saying "it cannot allocate a vector of that size 1240 kb". I don't want to cut my data into smaller pieces because we are looking at interaction. So are there any other options for me to try out, or should I wait for the development of more advanced computers?
2007 Aug 10
1
rfImpute
I am having trouble with the rfImpute function in the randomForest package. Here is a sample... clunk.roughfix<-na.roughfix(clunk) > > clunk.impute<-rfImpute(CONVERT~.,data=clunk) ntree OOB 1 2 300: 26.80% 3.83% 85.37% ntree OOB 1 2 300: 18.56% 5.74% 51.22% Error in randomForest.default(xf, y, ntree = ntree, ..., do.trace = ntree, : NA not
2010 Jun 30
2
anyone know why package "RandomForest" na.roughfix is so slow??
Hi all, I am using the package "random forest" for random forest predictions. I like the package. However, I have fairly large data sets, and it can often take *hours* just to go through the "na.roughfix" call, which simply goes through and cleans up any NA values to either the median (numerical data) or the most frequent occurrence (factors). I am going to start
2006 Dec 21
1
Memory problem on a linux cluster using a large data set [Broadcast]
Thank you all for your help! With all your suggestions, we will try to run it on a computer with a 64-bit processor. But I've been told that the new R versions all work on a 32-bit processor. I read in other posts that only the old R versions were capable of handling larger data sets and ran on 64-bit processors. I also read that they are adapting the new R version for 64 bits
2011 Oct 11
1
Mean or mode imputation for missing values
Dear R experts, I have a large database made up of mixed data types (numeric, character, factor, ordinal factor) with missing values, and I am looking for a package that would help me impute the missing values using either the mean if numerical or the mode if character/factor. I could maybe use replace like this: df$var[is.na(df$var)] <- mean(df$var, na.rm = TRUE) And go through all the many
2003 Aug 05
1
na.action in randomForest --- Summary
A few days ago I asked whether there were options other than na.action=na.fail for the R port of Breiman's randomForest; the function's help page did not say anything about other options. I have since discovered that a pdf document called "The randomForest Package" and made available by Andy Liaw (who made the tool available in R---thank you) does discuss an option. It is an implementation of
2012 Aug 11
1
Imputing data below detection limit
Hello, I'm trying to impute data below the detection limit (with multiple detection limits), so I just need a method or code for imputation, and then to extract the complete dataset to do the analyses. Is there any package which could do that simply, as I'm a beginner in R? Thank you
2005 Jan 11
1
transcan() from Hmisc package for imputing data
Hello: I have been trying to impute missing values of a data frame which has both numerical and categorical values using the function transcan() with little luck. Would you be able to give me a simple example where a data frame is fed to transcan and it spits out a new data frame with the NA values filled in? Or is there any other function that I could use? Thank you avneet
2008 Dec 22
1
imputing the numerical columns of a dataframe, returning the rest unchanged
Hi R-experts, how can I apply a function to each numeric column of a data frame and return the whole data frame with changes in numeric columns only? In my case I want to do a median imputation of the numeric columns and retain the other columns. My dataframe (DF) contains factors, characters and numerics. I tried the following but that does not work: foo <- function(x){
2009 Jul 11
1
help with winbind and groups
Hello, I have winbind working well out of the box. However, I am having problems with using groups to restrict ssh access to the box. I have a feeling there are some tricks that I haven't thought of yet. Here are the relevant parts of smb.conf: workgroup = FOO password server = server.foo.local realm = FOO.LOCAL security = ads idmap uid = 10000-20000 idmap gid =
2007 Jun 22
1
Imputing missing values in time series
Folks, This must be a rather common problem with real life time series data but I don't see anything in the archive about how to deal with it. I have a time series of natural gas prices by flow date. Since gas is not traded on weekends and holidays, I have a lot of missing values, FDate Price 11/1/2006 6.28 11/2/2006 6.58 11/3/2006 6.586 11/4/2006 6.716 11/5/2006 NA 11/6/2006 NA 11/7/2006
2004 Sep 01
3
Imputing missing values
Dear all, Apologies for this beginner's question. I have a variable Price, which is associated with factors Season and Crop, each of which has several levels. The Price variable contains missing values (NA), which I want to substitute by the mean of the remaining (non-NA) Price values of the same Season-Crop combination of levels. Price Crop Season 10 Rice Summer 12
2009 Sep 21
9
Handling missing data
I have to remove missing data in both character and numeric data types. I tried using an NA condition but it is not working, please help me to solve this. -- View this message in context: http://www.nabble.com/Handling-missing-data-tp25530192p25530192.html Sent from the R help mailing list archive at Nabble.com.
2004 Mar 15
2
imputation of sub-threshold values
Is there a good way in R to impute values which exist, but are less than the detection level for an assay? Thanks, Jonathan Williams OPTIMA Radcliffe Infirmary Woodstock Road OXFORD OX2 6HE Tel +1865 (2)24356
2008 Oct 29
1
Help with impute.knn
Dear all, This is my first time using this listserv and I am seeking help from the experts. OK, here is my question: I am trying to use the impute.knn function in the impute library, and when I tested the sample code I got the following error. Here is the sample code: library(impute) data(khanmiss) khan.expr <- khanmiss[-1, -(1:2)] ## ## First example ## if(exists(".Random.seed"))
2011 Mar 02
2
*** caught segfault *** when using impute.knn (impute package)
Hi, I am getting an error when calling the impute.knn function (see the screenshot below). What is the problem here and how can it be solved? screenshot: ################## *** caught segfault *** address 0x513c7b84, cause 'memory not mapped' Traceback: 1: .Fortran("knnimp", x, ximp = x, p, n, imiss = imiss, irmiss, as.integer(k), double(p), double(n), integer(p),
2008 Jun 30
3
Is there a good package for multiple imputation of missing values in R?
I'm looking for a package that has a state-of-the-art method of imputation of missing values in a data frame with both continuous and factor columns. I've found transcan() in 'Hmisc', which appears to be possibly suited to my needs, but I haven't been able to figure out how to get a new data frame with the imputed values replaced (I don't have Harrell's book). Any
2008 Mar 05
1
rrp.impute: for what sizes does it work?
Hi, I have a survey dataset of about 20000 observations where for 2 factor variables I have about 200 missing values each. I want to impute these using 10 possibly explanatory variables which are a mixture of integers and factors. Since I was quite intrigued by the concept of rrp I wanted to use it but it takes ages and terminates with an error. First time it stopped complaining about too little
2012 Mar 26
1
NA in R package randomForest
I have a question regarding NA in randomForest (in R). I have a dataset which includes both numerical and non-numerical variables, and the data includes some NA. I tried to use na.roughfix but then I get an error message "na.roughfix only works for numeric or factor". I also tried rfImpute but this does not work either because I have some NA in my response variable. Does anyone have som