similar to: Mean or mode imputation for missing values

Displaying 20 results from an estimated 2000 matches similar to: "Mean or mode imputation for missing values"

2011 Dec 02
2
Imputing data
So I have a very big matrix of about 900 by 400 and there are a couple of NA in the list. I have used the following functions to impute the missing data data(pc) pc.na<-pc pc.roughfix <- na.roughfix(pc.na) pc.narf <- randomForest(pc.na, na.action=na.roughfix) yet it does not replace the NA in the list. Presently I want to replace the NA with maybe the mean of the rows or columns or
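For the column-mean replacement asked about here, a minimal base R sketch might look like the following. It assumes a purely numeric matrix; the toy matrix `m` and the helper name `impute_col_means` are made up for illustration and are not part of randomForest.

```r
# Hypothetical 4 x 3 numeric matrix standing in for the poster's 900 x 400 data.
# matrix() fills column by column, so each row below is one column of m.
m <- matrix(c(1, NA, 3, 4,
              10, 20, NA, 40,
              5, 5, 5, NA), nrow = 4)

# Replace every NA with the mean of its own column (NAs ignored when averaging).
impute_col_means <- function(x) {
  col_means <- colMeans(x, na.rm = TRUE)
  na_idx <- which(is.na(x), arr.ind = TRUE)   # two-column matrix: row, col
  x[na_idx] <- col_means[na_idx[, "col"]]     # look up the mean by column index
  x
}

m_filled <- impute_col_means(m)
```

To use row means instead, the same helper can be applied to the transpose, as in `t(impute_col_means(t(m)))`.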
2008 May 21
2
an unknown error message when using gamm function
Dear everyone, I'm encountering an unknown error message when using gamm function: > fitoutput <- gamm(cvd~as.factor(dow)+pm10+s(time,bs="cr",k=15,fx=TRUE)+s(tmean,bs="cr",k=7,fx=TRUE) + ,correlation=corAR1(form=~1|city),family=poisson,random=list(city=~pm10),data=mimp) Maximum number of PQL iterations: 20 iteration 1 iteration 2 iteration 3 iteration 4
2009 Apr 28
1
I can't install dprep
When I try to install "dprep" I always get this message: > install.packages("dprep") Warning in install.packages("dprep") : argument 'lib' is missing: using 'C:\Users\Documents/R/win-library/2.8' --- Please select a CRAN mirror for use in this session --- Warning message: package 'dprep' is not available I have tried a lot of mirror...
2009 Sep 06
1
Two packages and one method
Hi! I want to use the "combinations" function from the "gtools" package, but my code must also use the "dprep" package, which has a "combinations" function too. Maybe I should show you the result of my help call: Help on topic 'combinations' was found in the following packages: Package Library dprep /usr/lib64/R/library gtools
2007 Aug 10
1
rfImpute
I am having trouble with the rfImpute function in the randomForest package. Here is a sample... clunk.roughfix<-na.roughfix(clunk) > > clunk.impute<-rfImpute(CONVERT~.,data=clunk) ntree OOB 1 2 300: 26.80% 3.83% 85.37% ntree OOB 1 2 300: 18.56% 5.74% 51.22% Error in randomForest.default(xf, y, ntree = ntree, ..., do.trace = ntree, : NA not
2010 Jun 30
2
anyone know why package "RandomForest" na.roughfix is so slow??
Hi all, I am using the package "random forest" for random forest predictions. I like the package. However, I have fairly large data sets, and it can often take *hours* just to go through the "na.roughfix" call, which simply goes through and cleans up any NA values to either the median (numerical data) or the most frequent occurrence (factors). I am going to start
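The rule described in this post (median for numeric columns, most frequent value for factors) can be sketched in base R roughly as below. This is an illustration of the rule only, not the randomForest source; `mode_level`, `rough_fill` and the `toy` data frame are hypothetical names.

```r
# Most frequent non-missing level of a factor (ties go to the first level).
mode_level <- function(f) {
  counts <- table(f)                 # table() leaves out NA by default
  names(counts)[which.max(counts)]
}

# Fill NAs column by column: median for numeric columns, mode for factors.
# Columns of any other type are returned unchanged.
rough_fill <- function(df) {
  for (j in seq_along(df)) {
    col <- df[[j]]
    miss <- is.na(col)
    if (!any(miss)) next
    if (is.numeric(col)) {
      col[miss] <- median(col, na.rm = TRUE)
    } else if (is.factor(col)) {
      col[miss] <- mode_level(col)
    }
    df[[j]] <- col
  }
  df
}

# Toy data frame with one numeric and one factor column.
toy <- data.frame(
  x = c(1, 2, NA, 10),
  g = factor(c("a", "b", "b", NA))
)
filled <- rough_fill(toy)
```

Swapping `median` for `mean` in the numeric branch gives the mean-or-mode variant.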
2011 Oct 03
2
Import in R with White Spaces
Hi, I have a simple question about importing data; I would be very grateful if you could help me out. I have used read.csv(file name, header=T, sep=",") to bring in a csv file I saved in MS Excel. The problem is I have white spaces in the middle of values (not in the column names), and this messes up the column entries. Since I have many many files that I am importing and I have spaces
2008 Dec 09
4
Pre-model Variable Reduction
Hello All, I am trying to carry out variable reduction. I do not have information about the dependent variable, and have only the X variables as it were. In selecting variables I wish to keep, I have considered the following criteria. 1) Percentage of missing value in each column/variable 2) Variance of each variable, with a cut-off value. I recently came across Weka and found that there is an
2005 Feb 04
2
genetic algorithm
Hi, I am doing some research on feature selection for classification problems using a genetic algorithm in a wrapper approach. I am wondering if there is some package which is already built for this purpose. I was advised before about the dprep package, but I don't think it uses GA (if I am wrong, please correct me!) Thanks, Ed
2007 Jun 20
1
Error in function distancia() in package dprep(v1.0) (PR#9745)
Full_Name: Kang Yousan Version: 2.5.0 OS: Windows XP Submission from: (NULL) (211.137.211.67) There is a bug in function distancia() in package dprep. See the description below. > distancia 1 function (x, y) 2 { 3 if (class(y) == "matrix") { 4 distancia = drop(sqrt(colSums((x - t(y))^2))) 5 distancia = t(distancia) 6 } 7 else distancia =
2003 Aug 05
1
na.action in randomForest --- Summary
A few days ago I asked whether there were options other than na.action=na.fail for the R port of Breiman's randomForest; the function's help page did not say anything about other options. I have since discovered that a pdf document called "The randomForest Package" and made available by Andy Liaw (who made the tool available in R---thank you) does discuss an option. It is an implementation of
2012 Mar 05
1
Order a data frame based on the order of another data frame
Hi, I am trying to match the order of the rownames of a dataframe with the rownames of another dataframe (I can't simply sort both sets because I would have to change the order of many other connected datasets if I did that): Also, the second dataset (snp.matrix$fam) is a snp matrix slot: so for example: data_one: x y
2011 Oct 24
1
Creating data frame with residuals of a data frame
Dear experts, I am trying to create a data frame from the residuals I get after having applied a linear regression to each column of a data frame, but I don't know how to create this data frame from the resulting list since the list has differing numbers of rows. So for example: age<- c(5,6,10,14,16,NA,18) value1<- c(30,70,40,50,NA,NA,NA) value2<- c(2,4,1,4,4,4,4) df<-
2013 Feb 18
1
ggplot2 and facet_wrap help
Dear R experts, I am trying to arrange multiple plots, creating one graph for each size1 factor variable in my data frame, and each plot has the median price on the y-axis and the size2 on the x-axis grouped by clarity: library(ggplot2) df <- data.frame(price=matrix(sample(1:1000, 100, replace = TRUE), ncol = 1)) df$size1 = 1:nrow(df) df$size1 = cut(df$size1, breaks=11)
2012 Apr 14
3
Choose between duplicated rows
Dear r experts, Sorry for this basic question, but I can't seem to find a solution? I have this data frame: df <- data.frame(id = c("id1", "id1", "id1", "id2", "id2", "id2"), A = c(11905, 11907, 11907, 11829, 11829, 11829), v1 = c(NA, 3, NA,1,2,NA), v2 = c(NA,2,NA, 2, NA,NA), v3 = c(NA,1,NA,1,NA,NA), v4 = c("N",
2011 Oct 03
2
Merge two data frames and find common values and non-matching values
Hi, I am trying to find a function to match two data frames of different lengths for one field only. So, for example, df1 is: Name Position location francesca A 75 cristina B 36 And df2 is: location Country 75 UK 56 Austria And I would like to match on "Location" and the output to be something like: Name Position Location Match francesca A 75 1 cristina B 36 0 I have tried with
2012 Apr 17
1
Problem accessing .Rdata objects in a loop
Hi, I am trying to access many .Rdata objects and do some operations with them using a loop. I can load the files but can't access them. The files' names are stored in a character vector called "names". After loading the objects, I can view each one using ls() and see that two objects are present for each. I am trying to access the one with the name which is the same as the
2006 Dec 18
1
Memory problem on a linux cluster using a large data set
Hello, I have a large data set 320.000 rows and 1000 columns. All the data has the values 0,1,2. I wrote a script to remove all the rows with more than 46 missing values. This works perfect on a smaller dataset. But the problem arises when I try to run it on the larger data set I get an error “cannot allocate vector size 1240 kb”. I’ve searched through previous posts and found out that it might
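The row filter described here (drop rows with more than 46 missing values) might be written in base R as in the sketch below. The poster's script is not shown, so the helper name `drop_sparse_rows`, its `max_na` argument and the toy matrix are assumptions made for illustration.

```r
# Keep only the rows whose NA count is at most max_na.
drop_sparse_rows <- function(x, max_na = 46) {
  na_per_row <- rowSums(is.na(x))          # number of NAs in each row
  x[na_per_row <= max_na, , drop = FALSE]  # drop = FALSE keeps a matrix
}

# Toy 3 x 4 matrix of 0/1/2 values, with the threshold lowered to 1.
toy <- matrix(c(0,  1,  2, 1,
                NA, NA, 2, 0,
                1,  NA, 0, 2), nrow = 3, byrow = TRUE)
kept <- drop_sparse_rows(toy, max_na = 1)
```

Note that `is.na(x)` builds a logical matrix with the same dimensions as `x`, so on a data set of the size mentioned it may be necessary to process blocks of rows rather than the whole matrix at once.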
2003 Jul 27
1
multiple imputation with fit.mult.impute in Hmisc
I have always avoided missing data by keeping my distance from the real world. But I have a student who is doing a study of real patients. We're trying to test regression models using multiple imputation. We did the following (roughly): f <- aregImpute(~ [list of 32 variables, separated by + signs], n.impute=20, defaultLinear=T, data=t1) # I read that 20 is better than the default of
2008 Nov 26
1
multiple imputation with fit.mult.impute in Hmisc - how to replace NA with imputed value?
I am doing multiple imputation with Hmisc, and can't figure out how to replace the NA values with the imputed values. Here's a general outline of the process: > set.seed(23) > library("mice") > library("Hmisc") > library("Design") > d <- read.table("DailyDataRaw_01.txt",header=T) > length(d);length(d[,1]) [1] 43 [1] 2666