thr3ads.net - similar to: "Mean or mode imputation fro missing values"

Displaying 20 results from an estimated 2000 matches similar to: "Mean or mode imputation fro missing values"

2011 Dec 02

Imputing data

So I have a very big matrix of about 900 by 400, and there are a couple of NAs in it. I have used the following functions to impute the missing data: data(pc); pc.na <- pc; pc.roughfix <- na.roughfix(pc.na); pc.narf <- randomForest(pc.na, na.action=na.roughfix). Yet it does not replace the NAs. Presently I want to replace the NAs with maybe the mean of the rows or columns or
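For readers landing on this result, here is a minimal base-R sketch of column-mean imputation on a numeric matrix. The matrix `m` is a small made-up stand-in for the poster's 900 by 400 data, and the variable names are illustrative only. Note also that na.roughfix returns a new, imputed object rather than modifying its argument, so in the snippet above the filled-in values would be expected in pc.roughfix, not in pc.na.

```r
# Hypothetical 4 x 3 numeric matrix with a few NAs (filled column by column)
m <- matrix(c(1, NA, 3, 4,
              2, 2, NA, 4,
              5, 6, 7, 8), nrow = 4)

# Mean of each column, ignoring missing entries
col_means <- colMeans(m, na.rm = TRUE)

# (row, col) positions of every NA
na_idx <- which(is.na(m), arr.ind = TRUE)

# Replace each NA with the mean of the column it sits in
m[na_idx] <- col_means[na_idx[, "col"]]
```

For row means instead, the same pattern should work with rowMeans() and the "row" column of `na_idx`.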

2008 May 21

an unknown error message when using gamm function

Dear everyone, I'm encountering an unknown error message when using gamm function: > fitoutput <- gamm(cvd~as.factor(dow)+pm10+s(time,bs="cr",k=15,fx=TRUE)+s(tmean,bs="cr",k=7,fx=TRUE) + ,correlation=corAR1(form=~1|city),family=poisson,random=list(city=~pm10),data=mimp) Maximum number of PQL iterations: 20 iteration 1 iteration 2 iteration 3 iteration 4

2009 Apr 28

I can't install dprep

When I try to install "dprep", I always get this output: > install.packages("dprep") Warning in install.packages("dprep") : argument 'lib' is missing: using 'C:\Users\Documents/R/win-library/2.8' --- Please select a CRAN mirror for use in this session --- Warning message: package 'dprep' is not available. I have tried a lot of mirror...

2009 Sep 06

Two packages and one method

Hi! I want to use the "combinations" function from the "gtools" package, but my code must also use the "dprep" package, which has a "combinations" function too. Maybe I should show you the result of my help call: Help on topic 'combinations' was found in the following packages: Package Library dprep /usr/lib64/R/library gtools

2007 Aug 10

rfImpute

I am having trouble with the rfImpute function in the randomForest package. Here is a sample... clunk.roughfix<-na.roughfix(clunk) > > clunk.impute<-rfImpute(CONVERT~.,data=clunk) ntree OOB 1 2 300: 26.80% 3.83% 85.37% ntree OOB 1 2 300: 18.56% 5.74% 51.22% Error in randomForest.default(xf, y, ntree = ntree, ..., do.trace = ntree, : NA not

2010 Jun 30

anyone know why package "RandomForest" na.roughfix is so slow??

Hi all, I am using the package "random forest" for random forest predictions. I like the package. However, I have fairly large data sets, and it can often take *hours* just to go through the "na.roughfix" call, which simply goes through and cleans up any NA values to either the median (numerical data) or the most frequent occurrence (factors). I am going to start
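As an illustration of the rule described in this snippet (median for numeric columns, most frequent level for factors), here is a rough base-R sketch. The function `rough_fix` and the data frame `d` are hypothetical names made up for this example; this is not the randomForest package's implementation and makes no claim about its speed.

```r
# Hypothetical helper: fill NAs with the column median (numeric)
# or the most frequent level (factor). Other column types are left alone.
rough_fix <- function(df) {
  for (j in seq_along(df)) {
    x <- df[[j]]
    if (!anyNA(x)) next                       # nothing to fill in this column
    if (is.numeric(x)) {
      x[is.na(x)] <- median(x, na.rm = TRUE)
    } else if (is.factor(x)) {
      counts <- table(x)                      # NAs are excluded by default
      x[is.na(x)] <- names(counts)[which.max(counts)]
    }
    df[[j]] <- x
  }
  df
}

# Made-up example data
d <- data.frame(a = c(1, NA, 3, 10),
                b = factor(c("u", "v", NA, "v")))
fixed <- rough_fix(d)
```

Working column by column on vectors, as here, avoids repeated row-wise data frame indexing, which is often a slow path on large data sets.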

2011 Oct 03

Import in R with White Spaces

Hi, I have a simple question about importing data; I would be very grateful if you could help me out. I have used read.csv(file name, header=T, sep=",") to bring in a csv file I saved in MS Excel. The problem is I have white spaces in the middle of values (not in the column names), and this messes up the column entries. Since I have many many files that I am importing and I have spaces

2008 Dec 09

Pre-model Variable Reduction

Hello All, I am trying to carry out variable reduction. I do not have information about the dependent variable, and have only the X variables as it were. In selecting variables I wish to keep, I have considered the following criteria. 1) Percentage of missing value in each column/variable 2) Variance of each variable, with a cut-off value. I recently came across Weka and found that there is an

2005 Feb 04

genetic algorithm

Hi, I am doing some research on feature selection for classification problems using a genetic algorithm in a wrapper approach. I am wondering if there is some package which is already built for this purpose. I was advised before about the dprep package, but I don't think it uses GA (if I am wrong, please correct me!) Thanks, Ed

2007 Jun 20

Error in funcion distancia() in package dprep(v1.0) (PR#9745)

Full_Name: Kang Yousan Version: 2.5.0 OS: Windows XP Submission from: (NULL) (211.137.211.67) There is a bug in function distancia() in package dprep. See the description below. > distancia 1 function (x, y) 2 { 3 if (class(y) == "matrix") { 4 distancia = drop(sqrt(colSums((x - t(y))^2))) 5 distancia = t(distancia) 6 } 7 else distancia =

2003 Aug 05

na.action in randomForest --- Summary

A few days ago I asked whether there were options other than na.action=na.fail for the R port of Breiman's randomForest; the function's help page did not say anything about other options. I have since discovered that a PDF document called "The randomForest Package" and made available by Andy Liaw (who made the tool available in R---thank you) does discuss an option. It is an implementation of

2012 Mar 05

Order a data frame based on the order of another data frame

Hi, I am trying to match the order of the rownames of a dataframe with the rownames of another dataframe (I can't simply sort both sets because I would have to change the order of many other connected datasets if I did that): Also, the second dataset (snp.matrix$fam) is a snp matrix slot: so for example: data_one: x y

2011 Oct 24

Creating data frame with residuals of a data frame

Dear experts, I am trying to create a data frame from the residuals I get after having applied a linear regression to each column of a data frame, but I don't know how to create this data frame from the resulting list since the list has differing numbers of rows. So for example: age<- c(5,6,10,14,16,NA,18) value1<- c(30,70,40,50,NA,NA,NA) value2<- c(2,4,1,4,4,4,4) df<-

2013 Feb 18

ggplot2 and facet_wrap help

Dear R experts, I am trying to arrange multiple plots, creating one graph for each size1 factor variable in my data frame, and each plot has the median price on the y-axis and the size2 on the x-axis grouped by clarity: library(ggplot2) df <- data.frame(price=matrix(sample(1:1000, 100, replace = TRUE), ncol = 1)) df$size1 = 1:nrow(df) df$size1 = cut(df$size1, breaks=11)

2012 Apr 14

Choose between duplicated rows

Dear r experts, Sorry for this basic question, but I can't seem to find a solution? I have this data frame: df <- data.frame(id = c("id1", "id1", "id1", "id2", "id2", "id2"), A = c(11905, 11907, 11907, 11829, 11829, 11829), v1 = c(NA, 3, NA,1,2,NA), v2 = c(NA,2,NA, 2, NA,NA), v3 = c(NA,1,NA,1,NA,NA), v4 = c("N",

2011 Oct 03

Merge two data frames and find common values and non-matching values

Hi, I am trying to find a function to match two data frames of different lengths for one field only. So, for example, df1 is:

Name       Position  location
francesca  A         75
cristina   B         36

And df2 is:

location  Country
75        UK
56        Austria

And I would like to match on "Location" and the output to be something like:

Name       Position  Location  Match
francesca  A         75        1
cristina   B         36        0

I have tried with
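One way to get the 0/1 flag described here is base R's %in% operator. This is a sketch that rebuilds the two example data frames from the post, not necessarily the approach the original thread settled on. The column is named `location` in both frames here, which is an assumption, since the post mixes "location" and "Location".

```r
# Example data rebuilt from the post
df1 <- data.frame(Name = c("francesca", "cristina"),
                  Position = c("A", "B"),
                  location = c(75, 36))
df2 <- data.frame(location = c(75, 56),
                  Country = c("UK", "Austria"))

# 1 if the location also appears in df2, 0 otherwise
df1$Match <- as.integer(df1$location %in% df2$location)
```

If the matching Country is wanted as well, merge(df1, df2, by = "location", all.x = TRUE) keeps every row of df1 and fills NA where there is no match.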

2012 Apr 17

Problem accessing .Rdata objects in a loop

Hi, I am trying to access many .Rdata objects and do some operations with them using a loop. I can load the files but can't access them. The files' names are stored in a character vector called "names". After loading the objects, I can view each one using ls() and see that two objects are present for each. I am trying to access the one with the name which is the same as the

2006 Dec 18

Memory problem on a linux cluster using a large data set

Hello, I have a large data set 320.000 rows and 1000 columns. All the data has the values 0,1,2. I wrote a script to remove all the rows with more than 46 missing values. This works perfect on a smaller dataset. But the problem arises when I try to run it on the larger data set I get an error “cannot allocate vector size 1240 kb”. I’ve searched through previous posts and found out that it might

2003 Jul 27

multiple imputation with fit.mult.impute in Hmisc

I have always avoided missing data by keeping my distance from the real world. But I have a student who is doing a study of real patients. We're trying to test regression models using multiple imputation. We did the following (roughly): f <- aregImpute(~ [list of 32 variables, separated by + signs], n.impute=20, defaultLinear=T, data=t1) # I read that 20 is better than the default of

2008 Nov 26

multiple imputation with fit.mult.impute in Hmisc - how to replace NA with imputed value?

I am doing multiple imputation with Hmisc, and can't figure out how to replace the NA values with the imputed values. Here's a general outline of the process: > set.seed(23) > library("mice") > library("Hmisc") > library("Design") > d <- read.table("DailyDataRaw_01.txt",header=T) > length(d);length(d[,1]) [1] 43 [1] 2666
